A Genome-Wide Search for Linkage of Estimated Glomerular Filtration Rate (eGFR) in the Family Investigation of Nephropathy and Diabetes (FIND)

Objective
Estimated glomerular filtration rate (eGFR), a measure of kidney function, is heritable, suggesting that genes influence renal function. Genes that influence eGFR have been identified through genome-wide association studies. However, family-based linkage approaches may identify loci that explain a larger proportion of the heritability. This study used genome-wide linkage and association scans to identify quantitative trait loci (QTL) that influence eGFR.
Methods
Genome-wide linkage and sparse association scans of eGFR were performed in families ascertained by probands with advanced diabetic nephropathy (DN) from the multi-ethnic Family Investigation of Nephropathy and Diabetes (FIND) study. This study included 954 African Americans (AA), 781 American Indians (AI), 614 European Americans (EA) and 1,611 Mexican Americans (MA). A total of 3,960 FIND participants were genotyped for 6,000 single nucleotide polymorphisms (SNPs) using the Illumina Linkage IVb panel. GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula.
Results
The non-parametric linkage analysis, accounting for the effects of diabetes duration and body mass index (BMI), identified the strongest evidence for linkage of eGFR on chromosome 20q11 (log of the odds [LOD] = 3.34; P = 4.4×10−5) in MA and chromosome 15q12 (LOD = 2.84; P = 1.5×10−4) in EA. In all subjects, the strongest linkage signal for eGFR was detected on chromosome 10p12 (P = 5.5×10−4) at 44 cM near marker rs1339048. A subsequent association scan in each ancestry-specific group and in the entire population identified several SNPs significantly associated with eGFR across the genome.
Conclusion
The present study describes the localization of QTL influencing eGFR on 20q11 in MA, 15q12 in EA and 10p12 in the combined ethnic groups participating in the FIND study. Identification of causal genes/variants influencing eGFR within these linkage and association loci will open new avenues for functional analyses and development of novel diagnostic markers for DN.


Introduction
Diabetes mellitus is responsible for approximately 50% of cases of incident end-stage renal disease (ESRD) in the United States and other Western societies, and this share was projected to reach up to 70% by 2015 [1]. Diabetic nephropathy (DN) is a serious complication of diabetes caused by hyperglycemia-induced renal injury, involving a complex interplay of metabolic and hemodynamic disturbances in genetically predisposed individuals. DN is typically characterized by persistent proteinuria and elevated blood pressure; however, progressive declines in estimated glomerular filtration rate (eGFR, an estimate of kidney function) are uniformly present and may occur in the absence of persistent proteinuria [2]. Individuals with DN have significantly increased cardiovascular morbidity and premature mortality. Among Pima Indians with type 2 diabetes, only those with overt DN had mortality rates higher than those of nondiabetic persons [3]. Rates of decline in eGFR were associated with albuminuria in type 2 diabetes [4], and assessments of eGFR facilitate the diagnosis, evaluation and management of patients with chronic kidney disease. Therefore, identifying the inherited and environmental causes of reduced eGFR would help target novel treatment strategies to prevent progression of DN to ESRD and reduce associated cardiovascular complications.
In an attempt to identify and characterize susceptibility genes influencing kidney disease in diabetes, we chose the family-based genome-wide linkage scan approach, which can identify genetic regions harboring multiple susceptibility variants or other complex mechanisms that may in aggregate explain a larger proportion of the heritability than the single polymorphisms typically identified in genome-wide association studies (GWAS). A genome-wide linkage screen for eGFR was performed with 6,000 single nucleotide polymorphisms (SNPs) in Mexican American (MA), African American (AA), European American (EA), and American Indian (AI) participants in the Family Investigation of Nephropathy and Diabetes (FIND). The FIND study was established to provide genome-wide coverage for localization of genes with pathogenically significant effects on risk of progressive DN and related traits, such as eGFR.

Study Participants
The FIND study protocol and patient recruitment procedures have been reported [25]. Briefly, families of self-reported AA, EA, AI and MA ethnicity were recruited from eight participating investigation centers. Families were ascertained based on a proband with advanced diabetic nephropathy (DN) or DN-attributed end-stage renal disease (ESRD) who had at least one additional diabetic sibling with or without DN. A variety of metabolic, hemodynamic, anthropometric, and demographic variables were collected. Diabetes was clinically diagnosed based on treatment regimen (insulin or oral hypoglycemic agents); the remainder of study participants were screened using hemoglobin A1C levels or fasting plasma glucose concentrations. Details of the proband and sibling selection criteria have been described [14]. A total of 3,960 subjects, comprising 3,547 sib pairs, were included in the analysis (Table 1).

Estimation of GFR
GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula. For participants receiving dialysis treatments or kidney transplants, eGFR was imputed at 5.0 ml/min/1.73 m² because (1) eGFR is meaningless with respect to the participant's true kidney function under these circumstances; and (2) imputing at zero, an extreme value, would give the data from ESRD cases undue influence relative to those of the non-ESRD cases.
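As an illustration of the eGFR definition above, the following Python sketch computes a 4-variable MDRD estimate and applies the fixed 5.0 ml/min/1.73 m² value for participants on dialysis or with a kidney transplant. The source does not state which MDRD variant or constant was used; this sketch assumes the 175 constant for IDMS-standardized creatinine (186 is the constant for non-standardized assays). The function and parameter names are hypothetical.

```python
# Assumption: 4-variable MDRD with the IDMS-standardized constant (175).
# Use 186 instead for creatinine measured with non-standardized assays.
MDRD_CONSTANT = 175.0
# Value imputed in FIND for participants on dialysis or with a transplant.
ESRD_IMPUTED_EGFR = 5.0  # ml/min/1.73 m^2


def mdrd_egfr(scr_mg_dl: float, age_years: float, female: bool, black: bool,
              dialysis_or_transplant: bool = False,
              constant: float = MDRD_CONSTANT) -> float:
    """Return eGFR (ml/min/1.73 m^2) from serum creatinine in mg/dl (hypothetical helper)."""
    if dialysis_or_transplant:
        # Creatinine no longer reflects native kidney function, and imputing
        # zero would give ESRD cases undue influence, so a fixed low value is used.
        return ESRD_IMPUTED_EGFR
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


print(round(mdrd_egfr(1.0, 50, female=False, black=False), 1))
print(mdrd_egfr(6.5, 60, female=True, black=True, dialysis_or_transplant=True))
```

Run as written, the first call gives about 79.1 for a 50-year-old non-Black man with a creatinine of 1.0 mg/dl, and the second returns the imputed 5.0 regardless of creatinine.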
Genotyping
DNA was isolated from lymphoblastoid cell lines or leukocyte buffy coats [25]. The Illumina SNP-based Linkage Panel IVb was employed for both linkage and association analysis as described previously [26]. This panel consists of 6,008 diallelic SNP markers distributed evenly across the genome. The average and median intervals between markers are 482 kb (0.64 cM) and 298 kb (0.35 cM), respectively. The largest interval between successfully
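The average and median inter-marker intervals quoted above are summaries of the gaps between adjacent markers on each chromosome. A minimal sketch, assuming marker positions grouped by chromosome so that no gap is taken across a chromosome boundary; the helper name and toy positions are hypothetical.

```python
from statistics import mean, median


def interval_summary(markers_by_chrom):
    """Mean, median and largest gap between adjacent markers (hypothetical helper).

    markers_by_chrom maps a chromosome label to marker positions (e.g. in kb).
    Gaps are computed within each chromosome only, then pooled.
    """
    gaps = []
    for positions in markers_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least two markers on some chromosome")
    return mean(gaps), median(gaps), max(gaps)


# Toy positions in kb (not Illumina IVb markers).
toy = {"1": [100, 400, 450], "2": [1300, 1500]}
print(interval_summary(toy))
```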

Statistical methods
Genetic analyses were performed using the S.A.G.E. (Statistical Analysis for Genetic Epidemiology) software package, version 5.3 (http://darwin.cwru.edu/sage/). Allele frequencies were estimated separately in the four ethnic groups using the maximum likelihood method implemented in the program FREQ. Mendelian inconsistencies were identified with the MARKERINFO program, and inconsistent genotypes were coded as missing. Errors in relationship specification were identified with the program RELTEST. When necessary, a second relationship testing program, RELPAIR version 2.0.1, was used to resolve potential errors involving complex relationships. Multipoint identity by descent (IBD) allele sharing probabilities were estimated by maximum likelihood, using all available information in the pedigree, as implemented in the program GENIBD. Multipoint IBD-sharing estimates are robust to misspecification of population allele frequencies, as may occur with admixed samples, because most of the parental information is inferred when the available information is high [27]. The Shannon information, as calculated by Merlin [28], available from the Illumina IV SNP panel was never less than 0.7, and seldom less than 0.8, except at the telomeric regions (data not shown). Using the multipoint IBD sharing estimates, a genome-wide linkage scan for quantitative trait loci potentially influencing eGFR was performed by the Haseman-Elston regression approach implemented in the program SIBPAL, using the W4 weighting option to maximize power. The rationale is that sib pairs who share more alleles IBD at a marker locus will be phenotypically more similar for traits influenced by a nearby linked gene. We converted the P values reported by SIBPAL to LOD scores using the one-sided chi-squared distribution with one degree of freedom (i.e., a 50:50 mixture of distributions with 0 and 1 degrees of freedom), appropriate for a one-sided test.
Evidence for linkage of eGFR was assessed with and without incorporating covariate effects of diabetes duration and body mass index (BMI), entered in the regression model as the sib-pair sum. Non-parametric multipoint linkage analysis was carried out separately in each ethnic group, and P values were combined across ethnicities using Fisher's method [29].
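Fisher's method combines k independent P values through X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. Because 2k is even, the upper tail has a closed form, so the sketch below needs only the standard library. The function name and example P values are hypothetical; with four ethnic groups, k = 4.

```python
import math


def fisher_combine(p_values):
    """Combine independent P values with Fisher's method (hypothetical helper).

    X = -2 * sum(ln p_i) ~ chi-squared with 2k df under the null. For even df
    the upper tail is exp(-X/2) * sum_{j=0}^{k-1} (X/2)**j / j!.
    """
    ps = list(p_values)
    if not ps or any(not 0 < p <= 1 for p in ps):
        raise ValueError("P values must lie in (0, 1]")
    k = len(ps)
    half_x = -sum(math.log(p) for p in ps)  # X / 2
    term, total = 1.0, 0.0
    for j in range(k):
        if j > 0:
            term *= half_x / j  # builds (X/2)**j / j! incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


print(round(fisher_combine([0.05, 0.05]), 4))
```

Two P values of 0.05 combine to about 0.0175, stronger than either alone; a single P value is returned unchanged.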
Empirical P values were obtained for the major linkage peaks using the "simulation" option in SIBPAL, which performs a permutation test. Association analysis was conducted as described previously [26] using the linear mixed model approach implemented in the S.A.G.E. program ASSOC. Results were combined across ethnic groups using Fisher's method [26,29]. The SNPs used in this analysis have been previously reported [26]. To assess the sensitivity of the association analysis to genetic admixture, the linear mixed model was fitted with and without adjustment for the first two principal components from a principal components analysis using 5,547 SNPs from the Illumina IV panel with minor allele frequencies of at least 0.05 in the combined sample. Principal components were obtained via the smartpca program in EIGENSOFT [30].

Results
Several quality control measures were implemented to determine the final set of markers for the linkage analysis. Briefly, SNPs were required to have median GenCall scores (a measure of how close a genotype is to the center of the cluster of other samples assigned the same genotype) ≥0.5, ethnicity-specific minor allele frequency (MAF) ≥0.05, and a P value for deviation from Hardy-Weinberg proportions >0.001. Because linkage disequilibrium (LD) between neighboring SNPs may bias estimates of IBD sharing among relatives, markers were screened such that pairwise |D′| was less than 0.3. After quality control, a final marker set of SNPs qualifying for further genetic analysis was identified as described previously [24]. Table 1 lists the ethnicities of the 3,960 subjects, comprising 3,547 sib pairs and 442 half-sib pairs from four ethnic groups, in whom eGFR and genotypic data were available. Of these, 40.7%, 24.1%, 19.7%, and 15.5% were MA, AA, AI, and EA, respectively. Table 2 displays the clinical characteristics of genotyped individuals from each ethnic group.
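The marker filters above can be sketched as a single pass/fail check per SNP within one ethnic group. This is an illustrative sketch under stated assumptions: genotype counts come from one ethnic group (ideally unrelated founders, since related individuals distort Hardy-Weinberg tests), Hardy-Weinberg proportions are tested with a 1-df chi-squared test rather than an exact test, and the |D′| pruning step is not shown. Function names are hypothetical.

```python
import math


def hwe_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-squared test of Hardy-Weinberg proportions (hypothetical helper)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: test undefined, handled by the MAF filter
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(median_gencall: float, n_aa: int, n_ab: int, n_bb: int,
              min_gencall: float = 0.5, min_maf: float = 0.05,
              hwe_alpha: float = 0.001) -> bool:
    """Apply the GenCall, MAF and Hardy-Weinberg thresholds to one SNP."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    maf = min(freq_a, 1 - freq_a)
    return (median_gencall >= min_gencall
            and maf >= min_maf
            and hwe_p(n_aa, n_ab, n_bb) > hwe_alpha)


print(passes_qc(0.8, 25, 50, 25), passes_qc(0.8, 50, 0, 50))
```

The first toy SNP is in exact Hardy-Weinberg proportions and passes; the second has no heterozygotes and fails the Hardy-Weinberg filter.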

Genome-wide linkage scans for eGFR
Adjusting for the covariate effects of diabetes duration and BMI, the genome-wide linkage scan in population-combined data identified the strongest evidence for linkage of eGFR on chromosome 10p12.31 (P = 5.5×10−4) at 44 cM near rs1339048 (Figures 1 and 2a; Table 3). Evidence for linkage was primarily contributed by the AA and EA groups, with a smaller contribution from MA. A second suggestive linkage signal across populations was observed on chromosome 20q11 at 56 cM (P = 1.9×10−3), flanked by SNPs rs221972 and rs735264 (Figure 1; Table 3).
In population-specific analysis, the strongest evidence for linkage of eGFR localized to a genetic region on 20q11.22 at 56.8 cM near rs736264 (LOD = 3.34; P = 4.4×10−5) in MA participants (Figures 1 and 2b). Other genetic regions with suggestive evidence for eGFR linkage in MA included 1q43 and 7q11.22 (Figure 1; Table 3). Accounting for the covariate effects of diabetes duration and BMI, the Haseman-Elston linkage scan in EA identified the strongest linkage signal for eGFR on chromosome 15q12 near rs2928714 (LOD = 2.84; P = 1.5×10−4), and suggestive evidence for linkage was found near rs913375 on 10p14 (Figure 1; Table 3).

Genome-wide association scans for eGFR
Following the linkage scans, coarse association analyses between eGFR and the SNPs that passed quality control were performed using the approach implemented in ASSOC. Table 4 shows the SNPs associated with eGFR with P<0.001 in at least one ethnic group, after adjusting for the covariate effects of diabetes duration and BMI, in population-specific or population-combined analyses. In the population-combined data, the most significant association with eGFR was found for rs486567 on chromosome 1q21.1 (P = 2.9×10−4), primarily contributed by EA (Table 4). We also observed a significant association between eGFR and rs580839 on 15q14 (P = 4.2×10−4) in the combined data, primarily driven by AI (Table 4). The population-combined association analysis also showed a significant association between eGFR and rs856830 on 6q12 (P = 4.3×10−4), driven by AA, AI and MA. Another SNP associated with eGFR in the combined data (rs1345561; P = 6.4×10−4) was located approximately 16 Mb from the eGFR linkage marker rs1339048 on 10p12 (Table 4; Figure 2); this association was primarily driven by AA (Table 4). Several SNPs were significantly associated with eGFR in population-specific analyses; the most significant were rs1703711 on chromosome 10q26.3 (P = 2.96×10−4), rs580839 on 15q14 (P = 4.81×10−5), rs666478 on 9p21 (P = 1.46×10−4) and rs2928972 on 18q21.2 (P = 1.30×10−4) in AA, AI, EA, and MA, respectively (Table 4). These results were robust to ethnic admixture: no P value changed by more than 2-fold, and the vast majority of P values changed by less than 1.1-fold, with adjustment for two principal components from a genome-wide principal components analysis (data not shown).
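The admixture-robustness criterion above compares each SNP's P value with and without principal-component adjustment. A minimal sketch of that comparison, assuming paired (unadjusted, adjusted) P values; the helper names and toy values are hypothetical and do not come from the study.

```python
def fold_change(p_unadjusted: float, p_adjusted: float) -> float:
    """Symmetric fold change between two P values (always >= 1)."""
    if p_unadjusted <= 0 or p_adjusted <= 0:
        raise ValueError("P values must be positive")
    return max(p_unadjusted, p_adjusted) / min(p_unadjusted, p_adjusted)


def admixture_sensitivity(pairs, large=2.0, small=1.1):
    """Count large P-value shifts and the share of negligible ones (hypothetical helper)."""
    if not pairs:
        raise ValueError("need at least one SNP")
    folds = [fold_change(u, a) for u, a in pairs]
    return {
        "n_above_large": sum(f > large for f in folds),
        "share_below_small": sum(f < small for f in folds) / len(folds),
    }


# Toy (unadjusted, adjusted) P values, not study results.
toy = [(0.010, 0.0105), (0.020, 0.019), (0.001, 0.003)]
print(admixture_sensitivity(toy))
```

Using the ratio of the larger to the smaller P value treats an increase and a decrease of the same size alike, which suits a check that only asks whether adjustment moved the result.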

Discussion
Estimated GFR provides an accurate index of the degree of renal dysfunction and plays a prominent role in the staging of chronic kidney disease [31]. Though variation in eGFR among individuals is partly explained by environmental influences, heritability estimates of eGFR in families suggest that genes play a major role in determining kidney function [32]. Despite high heritability estimates, the identification of genes influencing eGFR and its variability remains challenging. In attempts to identify quantitative trait loci influencing eGFR, the genome-wide linkage approach has been utilized in several genetic epidemiological studies [32]. Genome-wide linkage studies have identified several QTL influencing eGFR, but subsequent efforts to map the susceptibility genes have so far been unsuccessful and remain in progress. In an effort to identify and characterize the genes influencing kidney function, we performed a SNP-based genome-wide linkage scan followed by association analysis in the multi-ethnic FIND samples.
The most significant linkage to eGFR in ethnicity-combined data was found near rs1339048 on 10p12.31 (P = 5.5×10−4). Notably, the linkage of eGFR on 10p12 was contributed by three (AA, EA, and MA) of the four ethnic groups participating in this study, suggesting that the 10p12 region may harbor genes influencing GFR in the FIND participants (Figures 1 and 2a; Table 3). The localization of an eGFR linkage signal in the 10p12 region appears to be novel, as this region was not identified in previous genome-wide microsatellite scans for eGFR, including that in FIND [14]. However, several genome-wide microsatellite linkage scans have linked this 10p12 region with obesity and related traits. Furthermore, genes located near the eGFR linkage markers on 10p12, including calcium channel, voltage-dependent, beta 2 subunit (CACNB2), ADP-ribosylation factor-like 5B (ARL5B), and nebulette (NEBL), were previously associated with eGFR-related traits such as blood pressure and hypertension [33], sudden cardiac arrest and diabetic retinopathy [34]. By binding to actin, thin filaments and Z-line-associated proteins in striated cardiac muscle, nebulette regulates cardiac myofibril assembly. CACNB2 encodes a subunit of a voltage-dependent calcium channel, and mutations in CACNB2 have also been associated with sudden cardiac arrest.
In population-specific linkage analyses, suggestive evidence for linkage of eGFR was seen at rs736264 on 20q11.22 in MA (Figure 2b) and at rs2928714 on 15q12 in EA. Although our results did not replicate genetic regions previously linked to eGFR and related traits in FIND and non-FIND studies and appear to detect novel loci influencing eGFR, the 15q12 region has previously been associated with urine albumin:creatinine ratio (ACR) in MA in the San Antonio Family Diabetes Study [35]. We next performed a sparse association scan to determine whether the SNPs used in the linkage scan are associated with eGFR and potentially responsible for the observed linkage signals. While several SNPs across the genome were suggestively associated with eGFR, none were located within the eGFR linkage intervals identified in the population-specific or combined data sets. In the population-combined association analysis, the most significant associations with eGFR were observed for rs486567, rs580839, and rs1345561, with primary contributors EA, AI, and AA, respectively (Table 4). The rs1345561 SNP, whose association was primarily driven by AA, is located ~16 Mb from the eGFR linkage marker rs1339048 on 10p12 (Figure 2a; Table 4). The most significant population-specific associations with eGFR were found for rs1703711, rs580839, rs666478, and rs2928972 in AA, AI, EA, and MA, respectively (Table 4).
Of the SNPs most strongly associated with eGFR in population-specific analyses, rs666478 is located within an intronic region of the tyrosine kinase receptor (TEK) gene on 9p21. TEK is a cell-surface receptor for angiopoietins (ANGPT) 1, 2, and 4. Through TEK-dependent signaling, ANGPT regulates endothelial cell survival, proliferation, migration, adhesion and cell spreading, and controls vascular permeability and quiescence. Mutations in TEK were previously associated with autosomal dominant forms of venous malformations [36]. Although the functional relevance of the association between rs666478 and eGFR remains to be explored, genetic variants located about 5 Mb upstream of TEK in the cyclin-dependent kinase inhibitor (CDKN) 2A and 2B genes have been previously associated with type 2 diabetes mellitus [37] and coronary heart disease [38].
Population-specific association analysis identified two SNPs (rs1686430 and rs1734449) that are associated with eGFR only in the AA group. They are located 100 kb apart within an intronic region of the protein disulfide isomerase family A, member 6 (PDIA6) gene on 2p25. PDIA6 is a thioredoxin-superfamily oxidoreductase of the endoplasmic reticulum that acts as a redox signaling adaptor protein, adjusting reactive oxygen species intermediates to specific signals and redox signals to cell homeostasis [39]. It also catalyzes the formation and isomerization of disulfide bonds, thereby facilitating protein folding. Although the mechanism by which these two intronic PDIA6 variants might regulate renal function remains to be examined, genetic variants located about 7 Mb upstream of PDIA6 in the SRY (sex determining region Y)-box 11 (SOX11) gene were previously associated with T2DM and CKD in Europeans [17].
Table 4. Most significantly associated SNPs with eGFR in population-specific and in population-combined analysis.
Utilizing a relatively dense set of 6,000 SNPs as a linkage panel, as opposed to the conventional set of about 400 microsatellite markers, the present study localizes quantitative trait loci influencing eGFR to 20q11 in MA, 15q12 in EA and 10p12 in the combined ethnic groups from the FIND study. Several suggestive linkage peaks were also identified in population-specific and population-combined linkage scans in this multi-ethnic cohort. Whereas GWAS require very stringent P values (e.g., P<5×10−8) for statistical significance because of the large number of statistical tests involved, linkage studies can use less stringent P value thresholds because the number of effectively independent comparisons is much smaller. Conventionally, P<0.0001 (LOD>3) has been considered significant linkage, while P<0.001 (LOD>2) has been considered suggestive [40].
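The correspondence between these conventional P thresholds and LOD scores follows from the one-sided conversion described under Statistical methods: a one-sided P value p maps to a chi-squared statistic z² with z = Φ⁻¹(1 − p), and LOD = z²/(2 ln 10). The sketch below (function name hypothetical) applies this conversion to the two thresholds and to the two P values reported in the Results.

```python
import math
from statistics import NormalDist


def p_to_lod(p: float) -> float:
    """Convert a one-sided linkage P value to a LOD score (hypothetical helper).

    Assumes a 50:50 mixture of chi-squared with 0 and 1 df under the null, so
    p = P(Z > z) for a standard normal Z and the chi-squared statistic is z**2.
    LOD = chi2 / (2 * ln 10).
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if p >= 0.5:
        return 0.0  # no positive evidence: statistic sits at the point mass
    z = NormalDist().inv_cdf(1 - p)
    return z * z / (2 * math.log(10))


for p in (1e-3, 1e-4, 4.4e-5, 1.5e-4):
    print(f"P = {p:.1e} -> LOD = {p_to_lod(p):.2f}")
```

Run as written, it prints LOD values of about 2.07 and 3.00 for P = 0.001 and P = 0.0001, and 3.34 and 2.84 for P = 4.4×10−5 and P = 1.5×10−4, matching the LOD scores reported for 20q11 in MA and 15q12 in EA.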
Furthermore, the linkage approach can identify potential genetic regions harboring multiple susceptibility variants or other complex mechanisms that may in aggregate explain a larger proportion of the heritability than the single polymorphisms typically identified in GWAS.
As expected for a complex trait, multiple linkage peaks for eGFR were observed. Although the functional relevance of the linkage findings remains to be established and replicated, the genetic regions suggestively linked with eGFR in population-specific and population-combined analyses suggest that multiple loci are involved in regulating eGFR in diabetes. Disappointingly, there was no significant overlap with loci linked with renal function-related traits in other studies [5][6][7][8][9][10][11][12][13] or in our previous FIND microsatellite linkage study, which was carried out in a subset of the same study populations [14]. The lack of concordance in localizing QTL influencing eGFR between the present study and our previous study [14] using the same FIND population could be due to differences in sample size, the set of linkage markers and the covariates used. Our previous linkage scan for eGFR [14] used genotypic data for about 400 microsatellite markers and eGFR data available on 941 individuals and 882 sib pairs, whereas the present study used genotypic data for about 6,000 SNPs and eGFR data available on 3,960 individuals and 3,547 sib pairs. In addition, the previous study included diabetes duration and angiotensin converting enzyme inhibitor/angiotensin receptor blocker use as covariates in the linkage analysis [14], whereas the present study adjusted for diabetes duration and BMI. A limitation of analyzing eGFR as a continuous variable is that many determinants of high eGFR, such as uncontrolled hyperglycemia before diabetes treatment is optimized, may not be under genetic control or may be influenced by different genetic factors than those contributing to declining eGFR. This might, in part, account for differences between the present linkage results and those from analysis of diabetic nephropathy as a discrete trait [30].
The discrepancies between the present study and the non-FIND study results [5][6][7][8][9][10][11][12][13] may be related, in part, to heterogeneous study populations (some with and some lacking diabetes), pedigree structures, ascertainment criteria, treatment effects, definitions of kidney function, and diabetes duration. In contrast to existing publications, the FIND is a multi-ethnic collection of families ascertained based on a proband with advanced DN or ESRD with at least one other diabetic sibling with or without nephropathy. Furthermore, differences in allele frequencies and LD structure of the sets of SNPs contributing to linkage and association might have contributed to the lack of consistency across ethnic groups.
While this large study in a severely affected study sample had several advantages, potential limitations are that eGFR was estimated from a single random blood sample for serum creatinine concentration using the modified MDRD equation. This equation performs best for eGFR <60 ml/min per 1.73 m², whereas the CKD-EPI equation appears more accurate for eGFR values between 60 and 90 ml/min per 1.73 m². Although all analyses adjusted for diabetes duration and BMI, other potentially relevant confounding variables, such as the degree of blood pressure control and cardiovascular disease risk factors, were unavailable.
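For comparison with the MDRD estimates used here, the CKD-EPI creatinine equation mentioned above can be written as a short function. This sketch assumes IDMS-standardized serum creatinine in mg/dl and the published 2009 coefficients, including the race term (the 2021 refit omits race); the function name is hypothetical, and this equation was not used in the study's analyses.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool) -> float:
    """2009 CKD-EPI creatinine equation; returns eGFR in ml/min/1.73 m^2 (hypothetical helper)."""
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    kappa = 0.7 if female else 0.9        # sex-specific creatinine knot (mg/dl)
    alpha = -0.329 if female else -0.411  # slope below the knot
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209   # slope above the knot
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


print(round(ckd_epi_2009(1.0, 50, female=False, black=False), 1))
```

For a 50-year-old non-Black man with a creatinine of 1.0 mg/dl this gives about 87.4 ml/min/1.73 m², in the 60 to 90 range where the text notes CKD-EPI appears more accurate than MDRD.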
In conclusion, several loci influencing eGFR were identified in the multi-ethnic FIND cohort. Linkage and association results emanating from this multi-ethnic study represent a first step towards improving our knowledge of the mechanisms underlying genetic susceptibility to impaired renal function in diabetes. Furthermore, the results of linkage and association analyses reported in this study will help interpret future genome-wide association/whole-genome sequencing data that should accelerate the identification of causal genes for variation in kidney function in patients with diabetes. Defining the genetic architecture responsible for eGFR loss in individuals of different ethnicities may help develop ethnicity-specific intervention programs and services specifically targeted toward this devastating complication of diabetes. With existing high-throughput genome technologies and novel statistical methodologies, we envision promising new therapies to prevent loss of eGFR, a strong and independent risk factor for cardiovascular morbidity and mortality in patients with diabetes.