Genome-wide association and functional follow-up reveals new loci for kidney function

Chronic kidney disease (CKD) is an important public health problem with a genetic component. We performed genome-wide association studies in up to 130,600 European ancestry participants overall, and stratified for key CKD risk factors. We uncovered 6 new loci in association with estimated glomerular filtration rate (eGFR), the primary clinical measure of CKD, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. Morpholino knockdown of mpped2 and casp9 in zebrafish embryos revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. By providing new insights into genes that regulate renal function, these results could further our understanding of the pathogenesis of CKD.

and Treatment Center) AdiposityDiseases (K7-37 to M Stumvoll and A Tönjes). We also thank Dr. Knut Krohn (Microarray Core Facility of the Interdisciplinary Centre for Clinical Research, University of Leipzig, Germany) for providing the genotyping platform.
(RIDE), the Ministry of Education, Culture, and Science, the Ministry for Health, Welfare, and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. The Erasmus Computing Grid, Rotterdam (The Netherlands) and the national German MediGRID and Services@MediGRID part of the German D-Grid were both funded by the German Bundesministerium fuer Forschung und Technology under grants #01 AK 803 A-H and # 01 IG 07015 G, for access to their grid resources. A Dehghan is supported by NWO grant (vici, 918-76-619). The Study of Health in Pomerania (SHIP) is part of the Community Medicine Research net of the University of Greifswald, Germany, funded by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania. Genome-wide data have been supported by the Federal Ministry of Education and Research (grant no. 03ZIK012) and a joint grant from Siemens Healthcare, Erlangen, Germany, and the Federal State of Mecklenburg-West Pomerania. The University of Greifswald is a member of the 'Center of Knowledge Interchange' program of the Siemens AG. The Vis study was supported through the grants from the Medical Research Council UK to H Campbell, AF Wright, and I Rudan; and Ministry of Science, Education, and Sport of the Republic of Croatia to I Rudan (number 108-1080315-0302) and the European Union framework program 6 EUROSPAN project (contract no. LSHG-CT-2006-018947). The WGHS is supported by HL 043851 and HL69757 from the National Heart, Lung, and Blood Institute and CA 047988 from the National Cancer Institute, the Donald W. Reynolds Foundation and the Fondation Leducq, with collaborative scientific support and funding for genotyping provided by Amgen. The 3 City Study was supported by the National Foundation for Alzheimer's disease and related disorders, the Institut Pasteur de Lille and the Centre National de Génotypage. The 3 City Study was performed as part of a collaboration between the Institut National de la Santé et de la Recherche Médicale (Inserm), the Victor Segalen Bordeaux II University and Sanofi-Synthélabo. The Fondation pour la Recherche Médicale funded the preparation and initiation of the study. The 3C Study was also funded by the Caisse Nationale Maladie des Travailleurs Salariés, Direction Générale de la Santé, MGEN, Institut de la Longévité, Agence Française de Sécurité Sanitaire des Produits de Santé, the Aquitaine and Bourgogne Regional Councils, Fondation de .
These authors contributed equally to this work.
These authors were joint senior authors on this work.
Using genome-wide association studies (GWAS) in predominantly population-based cohorts, we and others have previously identified more than 20 genetic loci associated with eGFR and CKD [8][9][10][11]. Although most of these genetic effects seem largely robust across strata of diabetes or hypertension status [9], evidence suggests that some loci, such as the UMOD locus, may have heterogeneous effects across these strata [11]. We thus hypothesized that GWAS in study populations stratified by four key CKD risk factors (age, sex, diabetes status, and hypertension status) may permit the identification of novel eGFR and CKD loci. We carried this out by extending our previous work [9] to a larger discovery sample of 74,354 individuals with independent replication in an additional 56,246 individuals, resulting in a total of 130,600 individuals of European ancestry. To assess potential heterogeneity, we performed separate genome-wide association analyses across strata of CKD risk factors, as well as in a more extreme CKD phenotype.

Results
Meta-analyses of GWAS on the 22 autosomes were performed for: 1) eGFR based on serum creatinine (eGFRcrea) and CKD (6,271 cases) in the overall sample, 2) eGFRcrea and CKD stratified by the four risk factors, and 3) CKD45, a more severe CKD phenotype defined as eGFRcrea <45 ml/min/1.73 m² in the overall sample (2,181 cases). For the stratified analyses, in addition to identifying loci that were significant within each stratum, we performed a genome-wide comparison of the effect estimates between strata of the four risk factors. A complete overview of the analysis workflow is given in Figure S1. All studies participating in the stage 1 discovery and stage 2 replication phases are listed in Tables S1 and S2. The characteristics of all stage 1 discovery samples by study are reported in Table S3, and information on study design and genotyping is reported in Table S4. Results of the eGFRcrea analyses are summarized in the Manhattan and quantile-quantile plots reported in Figures S2 and S3. A total of 21 SNPs from the discovery stage were carried forward for replication in an independent set of 56,246 individuals (Tables S5 and S6). These SNPs were selected for replication for the following reasons (Figure S1): 5 reached genome-wide significance in either the eGFRcrea overall or stratified analyses, 1 was selected based on a test of direction-consistency of SNP-eGFR associations across the discovery cohorts for eGFRcrea overall, 4 demonstrated a P value ≤10⁻⁶ and high between-study homogeneity (I² <25%) in the CKD45 analysis (Table S7), and 11 demonstrated a between-strata P value ≤5×10⁻⁵ along with a P value ≤5×10⁻⁵ for association with eGFRcrea in at least one of the two strata (Table S8).
While none of the loci identified for CKD45 or the test for between-strata difference analyses replicated, all 6 loci identified from the eGFRcrea overall analysis, stratified analyses, and the direction test did (Table 1). These 6 loci were identified and replicated in the overall analysis (rs3925584, located upstream of the MPPED2 gene; rs6431731 near the DDX1 gene), in the diabetes-free sub-group (rs2453580 in an intron of the SLC47A1 gene), in the younger age stratum (rs11078903 in an intron of the CDK12 gene; rs12124078 located near the CASP9 gene), and the direction test (rs2928148, located in the INO80 gene, see Methods for details). In the combined meta-analysis of all 45 studies used in the discovery and replication stages, all six SNPs met the genome-wide significance threshold of 5×10⁻⁸, with individual P values ranging from 4.3×10⁻⁸ to 8.4×10⁻¹⁸ (Table 1). The imputation quality of these SNPs is reported in Table S9, and Figure S4 shows the regional association plots for each of the 6 loci. We also confirmed all previously identified renal function loci in the current data (Table S10). Brief descriptions of the genes included within the 6 new loci uncovered can be found in Table S11. Forest plots for the associations between the index SNP at each of the 6 novel loci and eGFR across all discovery studies and all strata are presented in Figures S5 and S6. Most of the 6 new loci had similar associations across strata of CKD risk factors except for the CDK12 locus, which revealed a stronger association in the younger (≤65 years of age) as compared to the older age group (>65 years of age).
We further examined our findings in 8,110 African ancestry participants from the CARe consortium [12] (Table 2). Not surprisingly, given linkage disequilibrium (LD) differences between Europeans and African Americans, none of the 6 lead SNPs uncovered in CKDGen achieved significance in the African American samples. Next, we interrogated the 250 kb flanking regions from the lead SNP at each locus, and showed that 4 of the 6 regions (MPPED2, DDX1, SLC47A1, and CDK12) harbored SNPs that achieved statistical significance after correcting for multiple comparisons based on the genetic structure of each region (see Methods for details). Figure 1 presents the regional association plots for MPPED2, and Figure S7 presents the plots of the remaining loci in the African American sample. Imputation scores for the lead SNPs can be found in Table S12. We observed that rs12278026, upstream of MPPED2, was associated with eGFRcrea in African Americans (P value = 5×10⁻⁵, threshold for statistical significance: P value = 0.001). While rs12278026 is monomorphic in the CEU population in HapMap, rs3925584 and rs12278026 have a D′ of 1 (r² = 0.005) in the YRI population, suggesting that these SNPs may have arisen from the same ancestral haplotype.
We also performed eQTL analyses of our 6 newly identified loci using known databases and a newly created renal eSNP database (see Methods) and found that rs12124078 was associated with cis expression of the nearby CASP9 gene in monocytes (P value for the monocyte eSNP of interest = 3.7×10⁻¹³). CASP9 encodes caspase-9, the third apoptotic activation factor involved in the activation of cell apoptosis, necrosis and inflammation. In the kidney, caspase-9 may play an important role in the medulla response to hyperosmotic stress [13] and in cadmium-induced toxicity [14]. The other 5 SNPs were not associated with any investigated eQTL. Additional eQTL analyses of 81 kidney biopsies (Table S13) did not reveal further evidence of association with eQTLs (Table S14).

Author Summary
Chronic kidney disease (CKD) is an important public health problem with a hereditary component. We performed a new genome-wide association study in up to 130,600 European ancestry individuals to identify genes that may influence kidney function, specifically genes that may influence kidney function differently depending on sex, age, hypertension, and diabetes status of individuals. We uncovered 6 new loci associated with estimated glomerular filtration rate (eGFR), the primary measure of renal function, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. The CDK12 effect was stronger in younger individuals and absent in older individuals. The MPPED2, DDX1, SLC47A1, and CDK12 loci were associated with eGFR in African ancestry samples as well, highlighting the cross-ethnicity validity of our findings. Using the zebrafish model, we performed morpholino knockdown of mpped2 and casp9 in zebrafish embryos and revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. These results further our understanding of the pathogenesis of CKD and provide insights into potential novel mechanisms of disease.
¹Effects on log(eGFRcrea); post-GWAS meta-analysis genomic control correction applied to P values and SEs.
*Although uncovered in the younger samples, this locus showed consistent results in the non-diabetic group (combined-analysis P value 5.7×10⁻¹⁶) and in the overall population (P value 9.5×10⁻²²); see Tables S16 and S10 for additional details.
**The direction test was performed in the overall dataset; the genomic control corrected P value from the direction test for the SNP rs2928148 was 4.0×10⁻⁷. In the combined analysis, the largest effect size (0.0054 on log eGFR in ml/min/1.73 m²) and the smallest P value (3.7×10⁻⁸) were observed in the non-diabetic group.
†All results were confirmed by random-effect meta-analysis.
doi:10.1371/journal.pgen.1002584.t001
Casp9 morphants displayed diminished clearance of 70,000 MW fluorescent dextran 48 hours after injection into the sinus venosus compared to controls, revealing significant functional consequences of casp9 knockdown (Figure 2Q-2V). No clearance abnormalities were observed in mpped2 morphants. The occurrence of abdominal edema is a non-specific finding that is frequently observed in zebrafish embryos with kidney defects. We examined the occurrence of edema in mpped2 and casp9 knockdown embryos at 4 and 6 days post fertilization (dpf), both in the absence and presence of dextran, and observed a significant increase in edema prevalence in casp9 morphants with (P value <0.0001) and without (P value = 0.0234) dextran challenge but not in mpped2 morphants (Figure 2W).
To further demonstrate differences in kidney function in response to knockdown of mpped2 and casp9, we injected the nephrotoxin gentamicin, which predictably causes edema in a subset of embryos. Casp9 morphants were more susceptible to developing edema compared to both controls and mpped2 morphants (Figure 2X). In addition, edema developed earlier and was more severe, encompassing a greater area of the entire embryo (Figure S9). Together, these findings suggest that casp9 and mpped2 knockdowns result in altered kidney gene expression and function. Specifically, abnormal expression of pax2a and nephrin in casp9 morphants, in addition to dextran retention and edema formation, suggests that loss of casp9 impacts glomerular development and function.
The lead SNP at the MPPED2 locus is located approximately 100 kb upstream of the gene metallophosphoesterase domain containing 2 (MPPED2), which is highly evolutionarily conserved and encodes a protein with metallophosphoesterase activity [18]. It has been recognized for a role in brain development and tumorigenesis [19] but thus far not for kidney function.
To determine whether the association at our newly identified eGFRcrea loci was primarily due to creatinine metabolism or renal function, we compared the relative associations between eGFRcrea and eGFR estimated using cystatin C (eGFRcys) (Figure S10, File S1). The new loci showed similar effect sizes and consistent effect directions for eGFRcrea and eGFRcys, suggesting a relation to renal function rather than to creatinine metabolism. Placing the results of these 6 loci in context with our previously identified loci [8,9] (23 known and 6 novel), 18 were associated with CKD at a 0.05 significance level (odds ratio, OR, from 1.05 to 1.26; P values from 3.7×10⁻¹⁶ to 0.01) and 11 with CKD45 (OR from 1.08 to 1.34; P values from 1.1×10⁻⁵ to 0.047; Figure S11 and Table S15).
When we examined these 29 renal function loci by age group, sex, diabetes and hypertension status (Tables S16, S17, S18, and S19), we observed consistent associations with eGFRcrea for most loci across all strata, with only two exceptions: UMOD had a stronger association in older individuals (P value for difference 8.4×10⁻¹³) and in those with hypertension (P value for difference 0.002), and CDK12 was stronger in younger subjects (P value for difference 0.0008). We tested the interaction between age and rs11078903 in one of our largest studies, the ARIC study. The interaction was significant (P value = 0.0047) and its direction was consistent with the observed between-strata difference.
Finally, we tested for associations between our 6 new loci and CKD related traits. The new loci were not associated with urinary albumin-to-creatinine ratio (UACR) or microalbuminuria [20] (Tables S20 and S21), with blood pressure from the ICBP Consortium [21] (Table S22) or with myocardial infarction from the CARDIoGRAM Consortium [22] (Table S23).

Discussion
We have extended prior knowledge of common genetic variants for kidney function [8][9][10][11][23] by performing genome-wide association tests within strata of key CKD risk factors, including age, sex, diabetes, and hypertension, thus uncovering 6 loci not previously known to be associated with renal function in population-based studies (MPPED2, DDX1, CASP9, SLC47A1, CDK12, INO80). In contrast to our prior genome-wide analysis [8,9], the majority of the new loci uncovered in the present analysis have few known prior associations with renal function. This highlights a continued benefit of the GWAS approach by using large sample sizes to infer new biology. Despite our hypothesis that genetic effects are modified by CKD risk factors, most of the identified variants did not exhibit strong cross-strata differences. This highlights that many genetic associations with kidney function may be shared across risk factor strata. The association of several of these loci with kidney function in African Americans underscores the generalizability of identified renal loci across ethnicities. Zebrafish knockdown of mpped2 resulted in abnormal podocyte anatomy as assessed by expression of glomerular markers, and loss of casp9 led to altered podocyte and distal tubular marker expression, decreased dextran clearance, edema, and enhanced susceptibility to gentamicin-induced kidney damage. These findings demonstrate the potential importance of these genes with respect to renal function and illustrate that zebrafish are a useful in vivo model to explore the functional consequences of GWAS-identified genes.
Table 2. Interrogation of the six novel loci uncovered in the European ancestry (EA) individuals (CKDGen consortium) in individuals of African ancestry (AA) from the CARe consortium for the trait eGFRcrea.
¹The gene closest to the SNP is listed first and is in boldface if the SNP is located within the gene.
Despite these strengths, there are some limitations of our study that warrant discussion. Although we used cystatin C to separate creatinine metabolism from true filtration loci, SNPs within the cystatin C gene cluster have been shown to be associated with cystatin C levels [8], which might result in some degree of misclassification in absolute levels. While we used standard definitions of diabetes and hypertension in the setting of population-based studies, these may differ from those definitions used in clinical practice. In addition, we were unable to differentiate the use of anti-hypertension medications from other clinical indications of these agents or type 1 from type 2 diabetes. The absence of association between our six newly discovered SNPs and the urinary albumin to creatinine ratio, blood pressure, and cardiovascular disease may have resulted from disparate genetic underpinnings of these traits, the overall small effect sizes, or the cross-sectional nature of our explorations; and we were unable to differentiate between these potential issues. Finally, power was modest to detect between-strata heterogeneity.
With increased sample size and stratified analyses, we have identified additional loci for kidney function that continue to have novel biological implications. Our primary findings suggest that there is substantial generalizability of SNP associations across strata of important CKD risk factors, specifically hypertension and diabetes.

Phenotype definition
Serum creatinine and cystatin C were measured as detailed in Tables S1 and S2. To account for between-laboratory variation, serum creatinine was calibrated to the US nationally representative National Health and Nutrition Examination Survey (NHANES) standards in all discovery and replication studies as described previously [8,24,25]. GFR based on serum creatinine (eGFRcrea) was estimated using the four-variable MDRD Study equation [26]. GFR based on cystatin C (eGFRcys) was estimated as eGFRcys = 76.7 × (serum cystatin C)^−1.19 [27]. eGFRcrea and eGFRcys values <15 ml/min/1.73 m² were set to 15, and those >200 were set to 200 ml/min/1.73 m². CKD was defined as eGFRcrea <60 ml/min/1.73 m² according to the National Kidney Foundation guidelines [28]. A more severe CKD phenotype, CKD45, was defined as eGFRcrea <45 ml/min/1.73 m². Control individuals for both CKD and CKD45 analyses were defined as those with eGFRcrea >60 ml/min/1.73 m².
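The phenotype rules in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the consortium's code: all function names are hypothetical, cystatin C is assumed to be in mg/l and creatinine in mg/dl, and the MDRD coefficients (constant 186, exponents −1.154 and −0.203, factors 0.742 and 1.212) are the commonly published ones rather than values stated in the text; IDMS-traceable assays typically use 175 as the constant.

```python
def egfr_crea_mdrd(scr_mg_dl, age_years, female, black=False, constant=186.0):
    """Four-variable MDRD estimate (ml/min/1.73 m^2); coefficients are assumptions."""
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def egfr_cys(cystatin_c_mg_l):
    """eGFRcys = 76.7 x (serum cystatin C)^-1.19, as given in the text."""
    return 76.7 * cystatin_c_mg_l ** -1.19


def truncate_egfr(egfr, low=15.0, high=200.0):
    """Set values below 15 to 15 and values above 200 to 200."""
    return min(max(egfr, low), high)


def ckd_label(egfr_crea, case_below=60.0):
    """Label for CKD (case_below=60) or CKD45 (case_below=45).

    Controls are always those with eGFRcrea > 60; anyone in between
    is neither a case nor a control and gets None.
    """
    if egfr_crea < case_below:
        return "case"
    if egfr_crea > 60.0:
        return "control"
    return None
```

Note that under these definitions an individual with eGFRcrea between 45 and 60 is excluded from the CKD45 analysis altogether, since they are neither a CKD45 case nor a control.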

Covariate definitions
In discovery and replication cohorts, diabetes was defined as fasting glucose ≥126 mg/dl, pharmacologic treatment for diabetes, or by self-report. Hypertension was defined as systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg or pharmacologic treatment for hypertension.
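As a small illustration, the two covariate definitions translate directly into boolean rules. The function and parameter names below are hypothetical, and the handling of a missing glucose value is an assumption.

```python
def has_diabetes(fasting_glucose_mg_dl=None, on_treatment=False, self_report=False):
    """Diabetes: fasting glucose >= 126 mg/dl, treatment, or self-report."""
    high_glucose = fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126
    return high_glucose or on_treatment or self_report


def has_hypertension(sbp_mmhg, dbp_mmhg, on_treatment=False):
    """Hypertension: SBP >= 140 mmHg, DBP >= 90 mmHg, or treatment."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90 or on_treatment
```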

Discovery analyses
Genotyping was conducted as specified in Table S4. After applying quality-control filters to exclude low-quality SNPs or samples, each study imputed up to ~2.5 million HapMap-II SNPs, based on the CEU reference samples. Imputed genotypes were coded as the estimated number of copies of a specified allele (allelic dosage). Additional, study-specific details can be found in Table S1.

Primary association analysis
A schematic view of our complete analysis workflow is presented in Figure S1. Using data from 26 population-based studies of individuals of European ancestry, we performed GWA analyses of the following phenotypes: 1) ln(eGFRcrea), ln(eGFRcys), CKD, and CKD45 overall and 2) ln(eGFRcrea) and CKD stratified by diabetes status, hypertension status, age group (≤65 versus >65 years), and sex. GWAS of ln(eGFRcrea) and ln(eGFRcys) were based on linear regression. GWAS of CKD and CKD45 were performed in studies with at least 25 cases (i.e. all 26 studies for CKD and 11 studies for CKD45) and were based on logistic regression. Additive genetic effects were assumed and models were adjusted for age and, where applicable, for sex, study site and principal components. Imputation uncertainty was accounted for by including allelic dosages in the model. Where necessary, relatedness was modeled with appropriate methods (see Table S1 for study-specific details). Before inclusion in the meta-analysis, all GWA data files underwent careful quality control, performed using the GWAtoolbox package in R (www.eurac.edu/GWAtoolbox.html) [29].
Study-specific SNP-association results were meta-analyzed assuming fixed effects and using inverse-variance weighting, i.e., the pooled effect $\hat{\beta}_{pooled}$ is estimated as $\sum_{i=1}^{K} w_i \hat{\beta}_i / \sum_{i=1}^{K} w_i$, where $\hat{\beta}_i$ is the effect of the SNP on the outcome in the $i$th study, $K$ is the number of studies, and $w_i = 1/SE(\hat{\beta}_i)^2$ is the weight given to the $i$th study. The meta-analyses were performed using METAL [30], with genomic control correction applied across all imputed SNPs [31] if the inflation factor λ was >1, at both the individual study level and after the meta-analysis. SNPs with minor allele frequency (MAF) <1% were excluded. All SNPs with a meta-analysis P value ≤5×10⁻⁸ for any trait or any stratum were deemed genome-wide significant [32].
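The inverse-variance weighted estimator described above can be sketched as follows. This is an illustrative re-implementation with hypothetical function names, not METAL itself. Two details are assumptions: the pooled standard error is taken as the usual fixed-effect value, the square root of 1 over the sum of weights, and genomic control is expressed as multiplying the standard error by the square root of λ, which is equivalent to dividing the chi-square statistic by λ.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns the pooled effect, its standard error, and a two-sided
    P value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


def genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda), only when lambda > 1."""
    return se * math.sqrt(lam) if lam > 1.0 else se


def genome_wide_significant(p, threshold=5e-8):
    """Apply the genome-wide significance threshold used in the text."""
    return p <= threshold
```

For example, two studies with effects 0.01 and 0.03 and equal standard errors of 0.01 pool to an effect of 0.02 with a standard error of 0.01/√2.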
In the eGFRcrea analyses, after excluding loci that were previously reported [8,9], we selected for replication all SNPs with P value <5×10⁻⁸ in any trait or stratum that were independent (defined by pairwise r² <0.2) in the primary association analysis. This yielded five SNPs in five independent loci. The same criterion was applied to the CKD analysis, where no SNPs passed the selection threshold. Given the smaller number of cases with severe CKD resulting in less statistical power, a different selection strategy was adopted for the CKD45 analysis: selected for replication were SNPs with discovery P value ≤5×10⁻⁶, MAF ≥5%, and homogeneous effect size across studies (I² ≤25%). Four additional SNPs were thereby selected for replication from the CKD45 analysis.

Direction test to identify SNPs for replication
In addition to identifying SNPs for replication based on the genome-wide significance threshold from a fixed effect model meta-analysis, we performed a "direction test" to identify additional SNPs for which between-study heterogeneity in effect size might have obscured the overall association that was nevertheless highly consistent in the direction of allelic effects. Under the null hypothesis of no association, the a priori probability that a given effect allele of a SNP has either a positive or negative association with eGFRcrea is 0.5. Because the meta-analysis includes independent studies, the number of concordant effect directions follows a binomial distribution. Therefore, we tested whether the number of discovery cohorts with the same sign of association (i.e. direction of effect) was greater than expected by chance given the binomial distribution and a null expectation of equal numbers of associations with positive and negative sign. The test was only applied for eGFRcrea in the overall analysis. Multiple testing was controlled by applying the same P value threshold of 5×10⁻⁸ as in the overall GWAS. Given that no SNP met this criterion, we selected for replication one novel SNP with the lowest P value of 4.0×10⁻⁷.
Figure 2, panels Q-X. (Q-V) Embryos were injected with control, mpped2, or casp9 MO at the one-cell stage and subsequently injected with 70,000 MW fluorescent rhodamine dextran at 80 hpf. Dextran fluorescence was monitored over the next 48 hours. All dextran-injected embryos show equal loading into the cardiac sinus venosus at 2 hours post-injection (2 hpi/82 hpf; Q, S, U). Compared to control MO-injected embryos (R) and mpped2 knockdown embryos (T), knockdown of casp9 resulted in reduced dextran clearance at 48 hpi as shown by increased trunk fluorescence (V). (W) Casp9 knockdown results in increased susceptibility to edema formation both spontaneously (−dex) (P value = 0.0234, Fisher's exact test) and after dextran challenge (+dex) (P value <0.0001). Embryos injected with both MO and dextran did not survive to 6 dpf (N/A). (X) Edema develops earlier and with higher frequency in casp9 morphants following injection of the nephrotoxin gentamicin. doi:10.1371/journal.pgen.1002584.g002
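The direction test described in this section amounts to a binomial sign test on the number of cohorts sharing an effect direction. A minimal sketch follows; the function name is hypothetical, the two-sided form is an assumption (the text does not state sidedness), and the genomic control correction mentioned for the reported P value is not included.

```python
from math import comb


def direction_test_p(n_concordant, n_studies):
    """Two-sided binomial sign test with null probability 0.5.

    n_concordant is the number of studies whose effect estimate has a
    given sign; the test asks whether the split between signs is more
    lopsided than expected by chance.
    """
    extreme = max(n_concordant, n_studies - n_concordant)
    tail = sum(comb(n_studies, j) for j in range(extreme, n_studies + 1)) / 2 ** n_studies
    return min(1.0, 2.0 * tail)
```

With 26 discovery studies, the smallest P value this form of the test can produce is 2 × 0.5²⁶ ≈ 3.0×10⁻⁸, reached only when all 26 effects share the same sign, which illustrates how stringent the 5×10⁻⁸ threshold is for such a test.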

Genome-wide between-strata difference test to identify SNPs for replication
Based on the results of the stratified GWAS of eGFRcrea and CKD, for each SNP we tested the null hypothesis that the effect of the SNP on eGFRcrea or CKD was the same between strata, i.e. subjects with versus without diabetes, hypertensive versus normotensive, younger versus older, females versus males. We used a two-sample test defined as $Z = (b_1 - b_2)/\sqrt{SE(b_1)^2 + SE(b_2)^2}$, with $b_1$ and $b_2$ indicating the effect estimates in the two strata and $SE(b_1)$ and $SE(b_2)$ their standard errors [33]. For large samples, the test statistic follows a standard normal distribution. SNPs were selected for replication if they had a between-stratum difference P value ≤5×10⁻⁵, an association P value ≤5×10⁻⁵ in one of the two strata, and MAF ≥10%. Independent loci were defined using the same criteria as described above. Eleven further SNPs, one per locus, were selected for replication from the between-strata difference test.
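The between-strata test and the selection rule can be sketched as below. Function names are hypothetical; the two-sided P value from the standard normal distribution follows the large-sample statement in the text.

```python
import math


def between_strata_test(b1, se1, b2, se2):
    """Z statistic and two-sided P value for a difference in effects."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def select_for_replication(p_diff, p_stratum1, p_stratum2, maf):
    """Selection rule: difference P <= 5e-5, association P <= 5e-5
    in at least one stratum, and MAF >= 10%."""
    return p_diff <= 5e-5 and min(p_stratum1, p_stratum2) <= 5e-5 and maf >= 0.10
```

As a usage example with made-up numbers, effects of 0.03 (SE 0.003) and 0.00 (SE 0.004) give Z = 0.03/0.005 = 6, well beyond the selection threshold.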

Replication analysis
Replication was performed for a total of 21 SNPs including 5 from the overall and stratified eGFRcrea analyses, 1 from the direction test on eGFRcrea, 4 from the overall CKD45 analysis, and 11 from the between-strata difference test. Replication studies used the same phenotype definition, and had available genotypes from imputed in silico genome-wide SNP data or de novo genotyping. The same association analyses, including the identical stratifications, were performed as in the discovery studies. Details can be found in Tables S2, S5 and S6. Study-specific replication results for the selected SNPs were combined using the same meta-analysis approach and software as in the discovery stage. One-sided P values were derived with regard to the effect direction found in the discovery stage. Based on the P value distribution of all SNPs submitted for replication (the 10 from eGFRcrea and CKD45 and the 11 from the between-strata difference test), we estimated the False Discovery Rate as a q-value using the QVALUE [34] package in R. SNPs with q-value <0.05 were called significantly replicating, thus specifying a list of associations expected to include not more than 5% false positives.
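The two replication steps above, one-sided P values in the discovery direction and an FDR estimate, can be illustrated as follows. This sketch does not reproduce the QVALUE package: it substitutes Benjamini-Hochberg adjusted P values, which coincide with q-values when the proportion of true nulls is fixed at 1 and are therefore a conservative stand-in. Function names are hypothetical.

```python
import math


def one_sided_p(beta_rep, se_rep, discovery_sign):
    """One-sided P value testing for an effect in the discovery direction.

    discovery_sign is +1 or -1, the sign of the discovery-stage effect.
    """
    z = discovery_sign * beta_rep / se_rep
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P values (q-values with pi0 = 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A SNP would then be called replicating when its q-value is below 0.05, matching the rule stated in the text.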
Finally, study-specific results from both the discovery and replication stage were combined in a joint inverse-variance weighted fixed-effect meta-analysis and the two-sided P values were compared to the genome-wide significance threshold of 5×10⁻⁸ to test whether a SNP was genome-wide significant. Between-study heterogeneity of replicated SNPs was quantified by the I² statistic [35].
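For reference, the I² statistic can be computed from the same study-level effects and standard errors used in the meta-analysis. The sketch below uses the standard definition based on Cochran's Q, I² = max(0, (Q − df)/Q) × 100%; the function name is hypothetical.

```python
def i_squared(betas, ses):
    """I^2 heterogeneity statistic in percent, from Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= 0.0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```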

Replication genotyping
For de novo genotyping in 10,446 samples from KORA F3, KORA F4, SAPHIR and SAPALDIA, the MassARRAY system at the Helmholtz Zentrum (München, Germany) was used, using Assay Design v3.1.2 and the iPLEX chemistry (Sequenom, San Diego, USA). Assay design failed for rs1322199 and genotyping was not performed. Ten percent of the spectra were checked by two independent, trained persons, and 100% concordance between investigators was obtained. SNPs with a P value <0.001 when testing for Hardy-Weinberg equilibrium (rs10490130, rs10068737, rs11078903), SNPs with call rate <90% (rs500456 in KORA F4 only) or monomorphic SNPs (rs2928148) were excluded from analyses without attempting further genotyping.
The call rates of rs4149333 and rs752805 were near 0% on the MassARRAY system. These SNPs were thus genotyped on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, USA). Mean call rate across all studies and SNPs ranged from 96.8% (KORA F4) to 99% (SAPHIR). Duplicate genotyping was performed in at least 14% of the subjects in each study with a concordance of 95-100% (median 100%). In the Ogliastra Genetic Park Replication Study (n = 3000) de novo genotyping was conducted on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, USA), with a mean call rate of 99.4% and 100% concordance of SNPs genotyped in duplicate.

Between-strata analyses for candidate SNPs in replication samples
Twenty-nine SNPs, including the 6 novel loci reported in the current manuscript along with 23 previously confirmed to be associated with renal function [9], were tested for differential effects between the strata. The same Z statistic as described for discovery (above) was used, and the Bonferroni-adjusted significance level was set to 0.10/29 = 0.003.
SNP-by-age interaction, for the one SNP showing significantly different effects between strata of age, was tested in the ARIC study by fitting a linear model on log(eGFRcrea) adjusted for sex, recruitment site, and the first and the seventh genetic principal components (only these two were associated with the outcome at P value <0.05). Both the interaction term and the terms for the main effects of age and the SNP were included in the model.

Power to assess between-strata effect difference
To assess genome-wide between-strata differences, with alpha = 5×10⁻⁸ and power = 80%, the minimum detectable difference was 0.025 when comparing nonDM versus DM and 0.015 when comparing nonHTN versus HTN. Similarly, when testing the 29 known and new loci for between-strata differences (Bonferroni-corrected alpha = 0.003) in the combined sample (n ≈ 125,000 in nonDM and n ≈ 13,000 in DM), we had 80% power to detect differences of 0.035 or larger.
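The text does not state how these power figures were computed. One standard approximation for a two-sided Z test of a difference in effects is Δ = (z₁₋α/₂ + z_power) × √(SE₁² + SE₂²). The sketch below implements that formula as an assumption, with a hypothetical function name; the stratum-specific standard errors are inputs that would have to come from the data.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(alpha, power, se1, se2):
    """Smallest between-strata difference detectable with the given
    power by a two-sided Z test at level alpha (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(se1 ** 2 + se2 ** 2)
```

Because the detectable difference scales with the combined standard error, the much smaller diabetes stratum dominates the nonDM versus DM comparison, which is consistent with the larger detectable difference reported for diabetes than for hypertension.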

Look-up in African Americans (CARe)
For each of the 6 lead SNPs identified in our European ancestry samples, we extracted eGFR association statistics from a genome-wide study in the CARe African ancestry consortium [12]. We further investigated potential allelic heterogeneity across ethnicities by examining the 250 kb flanking region surrounding each lead SNP to determine whether other SNPs with stronger associations exist in each region. The SNP with the smallest association P value among those with MAF >0.03 was considered the top SNP in the African ancestry sample. We defined statistical significance of the identified lead SNP in African ancestry individuals based on a region-specific Bonferroni correction. The number of independent SNPs was determined based on the variance inflation factor (VIF) with a recursive calculation within a sliding window of 50 SNPs and pairwise r² of 0.2. These analyses were performed using PLINK.
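The regional look-up logic can be sketched as follows, taking the number of independent SNPs as given (for example from PLINK pruning, which is not reproduced here). The function names, the dictionary layout, and the family-wise alpha of 0.05 are assumptions; the 0.05 value is at least consistent with the 0.001 threshold reported for the MPPED2 region if that region held about 50 independent SNPs.

```python
def top_snp(region_snps, min_maf=0.03):
    """Pick the SNP with the smallest P value among those with MAF > min_maf.

    region_snps is a list of dicts with keys 'id', 'maf', and 'p'.
    """
    eligible = [s for s in region_snps if s["maf"] > min_maf]
    return min(eligible, key=lambda s: s["p"]) if eligible else None


def region_threshold(n_independent_snps, alpha=0.05):
    """Region-specific Bonferroni threshold."""
    return alpha / n_independent_snps


def region_significant(snp, n_independent_snps, alpha=0.05):
    """True when the SNP passes the region-specific threshold."""
    return snp["p"] < region_threshold(n_independent_snps, alpha)
```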

Analyses of related phenotypes
For each replicating SNP, we obtained association results for urinary albumin-to-creatinine ratio and microalbuminuria from our previous genome-wide association analysis [20], and for blood pressure and myocardial infarction from genome-wide association analysis from the ICBP [21] and CARDIoGRAM [22] consortia, respectively.
A second expression analysis of 81 biopsies from normal kidney cortex samples was performed as described previously [51,52]. Genotyping was performed using the Affymetrix 6.0 Genome-wide chip and called with GTC Software (Affymetrix). For eQTL analyses, expression probes (Affymetrix U133set) were linked to SNP probes with >90% call rate using RefSeq annotation (Affymetrix build a30). P values for eQTLs were calculated using linear multivariable regression in both cohorts and then combined using Fisher's combined probability test (see also [52]). Pairwise LD was calculated using SNAP [53] on the CEU HapMap release 22.
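For reference, Fisher's combined probability test sums −2·ln(P) over k independent P values and compares the result to a chi-square distribution with 2k degrees of freedom. The sketch below is a minimal standard-library version that uses the closed-form chi-square tail for even degrees of freedom; the function name and the example P values are hypothetical.

```python
from math import exp, log


def fisher_combined(p_values):
    """Combine independent P values with Fisher's method.

    Returns (chi-square statistic, combined P value) with 2k degrees of freedom.
    """
    k = len(p_values)
    statistic = -2.0 * sum(log(p) for p in p_values)
    # For even df = 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total


# Hypothetical eQTL P values from two cohorts (not study data).
stat, p_combined = fisher_combined([0.01, 0.04])
```

The method assumes the two cohort-level P values are independent, which holds when the cohorts do not share samples.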

Zebrafish functional experiments
Zebrafish were maintained according to established IACUC protocols. Briefly, we injected zebrafish embryos with newly designed (mpped2, ddx1) or previously validated (casp9 [54]) morpholino antisense oligonucleotides (MO, GeneTools, Philomath OR) at the one-cell stage at various doses. We fixed embryos in 4% PFA at the appropriate stages for in situ hybridization (http://zfin.org/ZFIN/Methods/ThisseProtocol.html). Different anatomic regions of the kidney were visualized using a panel of 4 established markers: pax2a (global kidney marker) [15], nephrin (podocyte marker) [16], slc20a1a (proximal tubule marker) [17], and slc12a3 (distal tubule marker) [17]. Abnormalities in gene expression were independently scored by two investigators. We compared the number of abnormal morphant embryos to that of control embryos, injected with a standard control MO designed by GeneTools, using Fisher's exact test at the Bonferroni-corrected significance level of 0.0125 (i.e., 0.05/4 markers). We documented the development of gross edema at 4 and 6 days post-fertilization in live embryos.
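The morphant-versus-control comparison for a single marker can be illustrated with a small two-sided Fisher's exact test evaluated at the 0.05/4 level. This is a minimal standard-library sketch; the two-sided definition used (summing all tables no more probable than the observed one) and the embryo counts are assumptions, not details taken from the study.

```python
from math import comb

ALPHA = 0.05 / 4  # Bonferroni correction for 4 markers = 0.0125


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = morphant, control; columns = abnormal, normal.
p_value = fisher_exact_two_sided(12, 8, 2, 18)
passes_threshold = p_value < ALPHA
```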
We performed dextran clearance experiments following previously described protocols [55]. Briefly, 80 hours after MO injection, we anesthetized embryos in 4 mg/ml Tricaine in embryo water (1:20 dilution), then positioned embryos on their back in a 1% agarose injection mold. We injected an equal volume of tetramethylrhodamine dextran (70,000 MW; Invitrogen) into the cardiac sinus venosus of each embryo. We then returned the embryos to fresh embryo water. Using fluorescence microscopy, we imaged the embryos at 2 hours post-injection (82 hpf) to demonstrate equal loading, then at 48 hours post-injection (128 hpf) to evaluate dextran clearance.
Embryos were injected with control, mpped2, or casp9 MOs at the one-cell stage. At 48 hpf, embryos were manually dechorionated, anesthetized in a 1:20 dilution of 4 mg/ml Tricaine in embryo water, and oriented on a 1% agarose injection mold. As previously described [56], embryos were injected with equal volumes of 10 mg/ml gentamicin (Sigma) in the cardiac sinus venosus, returned to fresh embryo water, and subsequently scored for edema (prevalence, time of onset) over the next 3 days.

Figure S1 Flowchart of the project. (TIF)
Figure S2 Genome-wide −log₁₀ P values plot from stage 1 discovery meta-analysis. Plots show the discovery analysis of eGFRcrea in the overall group, with known loci [8,9] highlighted in orange and novel loci highlighted in blue (A), and in strata of the main CKD risk factors (B, C, D, and E), with complementary groups being contrasted with each other. The dotted line indicates the genome-wide significance threshold at P value = 5×10⁻⁸. The unmarked locus is RNASEH2C on chromosome 11, colored in gray despite genome-wide significance. The P value for the current stage 1 discovery for rs4014195 was 2.7×10⁻⁹. This locus previously did not replicate [9]; when we additionally considered our prior non-overlapping in silico and de novo replication data, the current stage 2 P value was 0.8832, yielding a combined stage 1+stage 2 P value of 2.6×10⁻⁷. Therefore, we did not submit this SNP for further replication. (PDF)
Figure S3 Quantile-quantile plots of observed versus expected −log₁₀ P values from the discovery analysis of eGFRcrea overall (A) and by strata of the main CKD risk factors (B). The orange line and its 95% confidence interval (shaded area) represent the null hypothesis of no association. In panel (A), results are compared when considering all SNPs (black dots) and when removing SNPs from loci that were already reported in previous GWAS [8,9] (orange dots). The meta-analysis inflation factor λ is reported along with the discovery sample size. Individual-study minimum, maximum, and median λs are also reported for comparison. Genomic-control correction was applied twice: on individual study results, before the meta-analysis, and on the meta-analysis results. (PDF)
Figure S6 Results from discovery meta-analysis of eGFRcrea for the six new loci: overall sample and all strata are considered. Reported is the effect size on log(eGFRcrea) and its 95% confidence interval. The stratum where the SNP was discovered is marked with a triangle for discovery based on meta-analysis P value or with a circle for discovery based on direction test. (TIF)
Figure S9 Casp9 and mpped2 knockdown embryos are more susceptible to gentamicin-induced kidney injury. Compared to control embryos (A), casp9 and mpped2 knockdown embryos develop edema at 103 hpf (C, E), suggestive of a renal defect. When injected with gentamicin, a nephrotoxin that reproducibly induces edema in control embryos (B), mpped2 and casp9 knockdown embryos develop edema earlier, more frequently, and in a more severe fashion (D, F). Whereas control embryos primarily develop cardiac edema, mpped2 and casp9 knockdown embryos display cardiac (arrowhead), ocular (black arrow), and visceral (white arrow) edema, demonstrating that mpped2 and casp9 knockdown predisposes embryos to kidney injury. (G) Quantification of edema prevalence in control, mpped2, and casp9 knockdown embryos 2, 22, and 55 hours post-injection (hpi) of gentamicin. These numbers are presented graphically in Figure 2X. (TIF)
Figure S10 Comparison of the effect size on eGFRcrea and on eGFRcys for the lead SNPs of known and new loci. Results are based on the largest sample size available for each locus, i.e. the combined discovery and replication sample for the novel loci (N = 130,600) and the discovery sample only for the known loci (N = 74,354). The sign of the effect estimates has been changed to reflect the effects of the eGFRcrea-lowering alleles. Original beta coefficients and their standard errors for the two traits can be downloaded from File S1. (TIF)
Figure S11 Odds ratios (ORs) and 95% confidence intervals of CKD and CKD45 for the lead SNPs of all known and new loci, sorted by decreasing OR of CKD. (TIF)
File S1 Effect size on eGFRcrea and on eGFRcys for the lead SNPs of known and new loci. (XLSX)
Table S16 Association between novel and known loci and log(eGFRcrea) in individuals without and with diabetes and test for difference between strata. (DOC)
Table S17 Association between novel and known loci and log(eGFRcrea) in individuals without and with hypertension and test for difference between strata. (DOC)
Table S18 Association between novel and known loci and log(eGFRcrea) in individuals younger and older than 65 years and test for difference between strata. (DOC)
Table S19 Association between novel and known loci and log(eGFRcrea) in females and in males and test for difference between strata. (DOC)
Table S20 Effects of novel loci on the logarithm of urinary albumin-to-creatinine ratio (log(UACR)) in the overall sample and by diabetes and hypertension status. (DOC)