Replication of TCF4 through Association and Linkage Studies in Late-Onset Fuchs Endothelial Corneal Dystrophy

Fuchs endothelial corneal dystrophy (FECD) is a common, late-onset disorder of the corneal endothelium. Although progress has been made in understanding the genetic basis of FECD by studying large families in which the phenotype is transmitted in an autosomal dominant fashion, a recently reported genome-wide association study identified common alleles at a locus on chromosome 18 near TCF4 which confer susceptibility to FECD. Here, we report the findings of our independent validation study for TCF4 using the largest FECD dataset to date (450 FECD cases and 340 normal controls). Logistic regression with sex as a covariate was performed for three genetic models: dominant (DOM), additive (ADD), and recessive (REC). We found significant association with rs613872, the target marker reported by Baratz et al.(2010), for all three genetic models (DOM: P = 9.33×10−35; ADD: P = 7.48×10−30; REC: P = 5.27×10−6). To strengthen the association study, we also conducted a genome-wide linkage scan on 64 multiplex families, composed primarily of affected sibling pairs (ASPs), using both parametric and non-parametric two-point and multipoint analyses. The most significant linkage region localizes to chromosome 18 from 69.94cM to 85.29cM, with a peak multipoint HLOD = 2.5 at rs1145315 (75.58cM) under the DOM model, mapping 1.5 Mb proximal to rs613872. In summary, our study presents evidence to support the role of the intronic TCF4 single nucleotide polymorphism rs613872 in late-onset FECD through both association and linkage studies.


Introduction
Fuchs endothelial corneal dystrophy (FECD), first described by Ernst Fuchs [1], is a common progressive disorder of the corneal endothelium that typically becomes symptomatic during the fifth or sixth decade of life [2,3], although corneal endothelial abnormalities can be clinically detected several years before patients become symptomatic [4]. The disorder affects as much as 4% of the United States population over the age of 40 years [5][6][7] and occurs predominantly in women, who comprise approximately 75% of cases [8]. This debilitating disorder leads to corneal edema with a loss of corneal clarity, painful episodes of recurrent corneal erosions, severe impairment of visual acuity, and sometimes even blindness.
The first genome-wide association study (GWAS) for FECD was reported recently, using an initial dataset of 130 unrelated cases and 260 unaffected controls genotyped with the Illumina 370K BeadChip panel [29]. After genotyping their most significant findings in a replication dataset containing 150 cases and 150 controls, Baratz and colleagues concluded that a single nucleotide polymorphism (SNP) on chromosome 18q21, rs613872, in an intron of a gene encoding transcription factor 4 (TCF4, MIM: 602272) showed genome-wide significant association with FECD (P = 1.10x10 212 for the initial GWAS dataset; P = 1.79x10 213 for the replication dataset; P = 2.34x10 226 for the combined dataset).
Here we present additional evidence for the presence of a FECD locus on chromosome 18. We report the results from an expansion of our previous linkage study, as well as an association study analyzing the association of rs613872 with FECD in a dataset containing 450 unrelated FECD cases and 340 unaffected controls, the largest sample of FECD patients interrogated to date.

Ethics Statement
Our study was performed in accordance with the Declaration of Helsinki and the institutional review boards at Duke University Medical Center and Johns Hopkins University (JHU) specifically approved this study. Both FECD study sites, the Cornea Clinics at Duke University Eye Center (DUEC) and the Wilmer Eye Institute at JHU, obtained the appropriate institutional review board approval for research on human subjects prior to initiating subject recruitment, and all individuals gave written, informed consent. The control subjects from the Duke glaucoma genetics study were also recruited under the approval of the Duke Institutional Review Board, and consented to allow their biological samples to be used by other research studies.
Two Caucasian datasets were used for association and linkage studies. The association dataset consisted of 450 unrelated FECD cases, each with a Krachmer grade $2, and 340 unaffected controls with age at enrollment of $45 years old. The 450 unrelated cases used here were either probands from the family dataset (see below) or singleton cases, all of whom were ascertained through the cornea clinic at DUEC. The unrelated controls were from 26 unaffected married-in spouses from the DUEC family dataset and 314 control subjects from the glaucoma genetic study at DUEC [30]. Glaucoma control subjects underwent detailed eye examination and had no signs of corneal abnormalities at the time of subject enrollment.
The linkage dataset consisted of 64 multiplex families (at least two affected individuals per family) containing a total of 215 subjects (69.8% females). Families were ascertained independently from the Cornea Clinics at DUEC or at the Wilmer Eye Institute at JHU. Demographic data, including age at enrollment and gender, for the individuals analyzed in the association and linkage datasets are summarized in Table 1.

Marker Selection and Genotyping
Association analysis focused on the SNP rs613872 in TFC4, the most significantly associated SNP in the GWAS performed by Baratz and colleagues [29]. We also genotyped rs10490775 in PTPRG on chromosome 3, which was significant but not at a genome-wide level in the Baratz GWAS, to examine possible replication in our dataset. The two SNPs were genotyped with predesigned TaqManH allelic discrimination assays (Life Technologies, formerly Applied Biosystems, Inc., Foster City, CA), which use unlabeled polymerase chain reaction (PCR) primers and two allele-specific probes containing the TaqManH minor groove binding group (MGB) probe and either a FAM TM and VICH dye label in a 384-well plate format. PCR reactions were performed with TaqmanH Universal PCR Master Mix on the GeneAmpH PCR System 9700 (Applied Biosystems, Inc.), and the ABI 7900HT Fast PCR System (Applied Biosystems, Inc.) was used for reading allelic discrimination calls. Quality control (QC) samples, including two CEPH (Centre d'Etude du Polymorphisme Humain) pedigree individuals, one no-template sample, and two duplicate samples (one male, one female), were contained within each quadrant of each 384-well plate. These QC samples were used to provide duplicate samples within one quadrant, across quadrants within one plate, and across plates. Results of the CEPH and QC sample genotyping were compared to identify possible sample plating errors and genotype-calling inconsisten- cies, and none were observed. A threshold of 95% genotyping efficiency was required for submission to the analysis database. Two Illumina (San Diego, CA) SNP linkage panels were used for genome-wide linkage marker genotyping. Seventy-two of the 215 linkage samples were included in our previous linkage study [28], and were genotyped using the Illumina GoldenGate linkage panel IVB, which contains 5,858 SNPs. The remaining 143 samples were genotyped using the Illumina Infinium HumanLinkage-12 platform containing 6,090 SNPs, of which 4,811 overlap with the GoldenGate IVB platform. All genomic DNA samples were prepared at a concentration of 75 ng/mL, in a total volume of 10 mL, and were genotyped in four multiplexed assays according to the manufacturer's protocols. All DNA samples were extracted from blood using the PureGene system (Gentra Systems, Minneapolis, MN).

Data cleaning for the association dataset
Reproducibility of genotypes was examined for the replicate QC samples located in each 384-well Taqman plate to assess genotyping quality. Reproducibility rates of 100% were observed. The two SNP markers, rs613872 and rs10490775, were examined for possible deviations from Hardy-Weinberg equilibrium (HWE).

Data cleaning for the linkage dataset
We instituted several QC measures to determine the final set of markers for analysis. We first used the Illumina BeadStudio program to check genotype reproducibility rates using the replicated samples, family relationships, gender status of each DNA sample using Y chromosome markers, sample genotype call rates .99%, and the GenCall score which is a quality measure for each genotype used in the Illumina genotyping system. In particular, the GenCall score measures how close a genotype is to the center of the cluster of other samples assigned the same genotype, as compared with the centers of the clusters of the other genotypes, and ranges from 0 to 1. The higher the GenCall score the more reliable the genotype. A set of clean markers was chosen based on GenCall scores $0.15, as well as meeting the other QC measures described previously, for further data cleaning.
For additional data quality assurance, 1000 markers with approximately equal inter-marker distances across the genome were selected to examine family relationships using the RELPAIR [31] and PREST [32] programs. After correcting for family relationship errors, Mendelian inheritance inconsistencies were checked using the PEDCHECK program [33]. Missing genotypes were assigned to the family for those markers with Mendelian inheritance inconsistencies.
All SNPs were tested for deviations from HWE. We randomly chose one affected individual per family to include in an unrelated affected dataset, and one unaffected individual per family was selected to include in an unrelated, unaffected dataset. An exact test implemented in the Genetic Data Analysis (GDA) program was used to test HWE, by performing 3,200 permutations to estimate the empirical p-value for each marker [34]. We applied a Bonferroni correction (significance threshold = 0.05/total number of markers) to determine significant deviation from HWE. Markers that were out of HWE in the unaffected dataset were excluded in the linkage analysis.

Association Analysis
Genotypes of rs613872 (T and G nucleotide polymorphism) were tested using three genetic models: dominant (DOM), additive (ADD), and recessive (REC). The genotypes of (GG, TG, and TT) were coded as (1, 1, 0) for DOM; (2, 1, 0) for ADD; and (1, 0, 0) for REC where allele G is the rare allele. A logistic regression association analysis in PLINK [35] was applied with gender as a covariate. We also stratified our association dataset by gender and tested for association of this marker within the males and females. A nominal significance level of 0.05 was applied to declare significant association. The relationship between disease severity (grade level) and genotypes of rs613872 was examined using Fisher Exact test, because some genotype-phenotype has small number of samples, particularly for grade 4.

Linkage analysis
Two-point parametric linkage analyses using the FASTLINK [36] and HOMOG [37] programs (http://linkage.rockefeller. edu/, provided in the public domain by Rockefeller University, New York, NY) were performed to generate heterogeneity logarithm of the odds (HLOD) scores. Since the mode of inheritance is unknown for FECD, both DOM and REC genetic models were assumed in the parametric analysis with disease allele frequencies of 0.001 and 0.01 respectively. The penetrance (chance of an individual being affected if carrying the disease susceptibility genotype) was based on the assumed genetic models, and was consistent with the goal of conducting an affecteds-only parametric linkage analysis.
MERLIN [38] was used to perform both multipoint parametric and non-parametric linkage analyses (NPL). The same DOM and REC models described above were assumed in the MERLIN parametric analysis. It is known that linkage disequilibrium (LD) may inflate the type I error of a multipoint linkage analysis, particularly when parental genotype data are not available-as is the case for most of our families [39]. We set a threshold of the squared Pearson correlation coefficient (r 2 ) between markers at 0.16 in MERLIN to ensure independence between markers for multipoint linkage analysis.

Association results
We tested rs613872 in our case-control association dataset of 450 cases and 340 controls; Table 1 summarizes the demographic data of our subjects. We did not observe deviation from HWE in the control group for rs613872 (P = 0.60), indicating high quality genotyping results. However, we found strong deviation from HWE for this marker in the FECD case group (P,0.001). This may be the source of the highly significant association signal we observed in our dataset for all three genetic models (DOM:  [29]. Table 2 shows the comparison of genotype frequencies between our FECD cases and controls to the genotype frequencies observed in the FECD GWAS cases and controls studied by Baratz and colleagues. Clearly, the FECD case group has excessive heterozygous genotypes for rs613872 compared to the controls in both datasets, which is consistent to the observation of deviation from HWE in FECD cases. As for rs10490775 in PTPRG on chromosome 3, we did not detect significant association with FECD (DOM P = 0.98; ADD P = 0.96; REC P = 0.92).
When we examined gender-specific groups, we observed the same elevation of the heterozygous (GT) genotype in cases relative to controls in both genders ( Table 2). The SNP rs613872 is still significant in both genders (male P = 3610 226 ; female P = 1.2610 216 ). However, this result may not reflect the true level of association due to the small sample size, particularly for males. Furthermore, the results between genders should not be compared due to the unbalanced sample sizes. Among our 450 cases, we have 235 cases with grading information. We did not find the correlation between disease severity and genotypes of rs613872 (P = 0.13). However, more samples may be needed to make a solid conclusion.

Linkage scan samples and markers
After implementing QC measures for the linkage dataset we removed five individuals from three families due to relationship errors that could not be resolved and five individuals due to low sample call rates (,95%). In total, we analyzed 215 individuals in 64 families ( Table 1), including 41 ASPs from 64 families, making this the largest dataset used in a genomewide linkage scan for FECD to date. We noted a high proportion of females in both our linkage and association datasets, likely because of the known gender imbalance among FECD patients [8].
A total of 7,291 markers were genotyped between the two SNP genotyping platforms, with 4,551 markers overlapping between the two chips. Of those, 4,533 markers met our QC criteria for linkage analyses, which included having a GenCall score $0.15, a genotype call frequency $95%, and no significant deviation from HWE. Eleven markers produced Mendelian inheritance inconsistencies within three families and were assigned missing genotypes within those families. In the multipoint analysis using MERLIN, 3,927 markers met our LD criteria and were analyzed.

Linkage regions
Eighteen SNPs on ten chromosomes gave two-point HLOD scores $2 in either the DOM or REC models ( Table 3). In particular, three markers produced HLOD scores above 3: rs1889974 (chromosome 10, 119.34 cM, HLOD = 3.37) and Table 2. Comparison of genotype frequencies between Duke dataset and dataset used in Baratz et al. (2010), and between male and female in Duke dataset for rs613872.   Table 3, only rs998876 on chromosome 5 (194.22 cM) overlaps with a previously identified FECD linkage locus, FCD3 [27]. Table 4 summarizes the eight linkage regions that produced peak multipoint LOD scores $1.5 in any of the three multipoint linkage analyses, the nonparametric (NPL) and parametric (DOM and REC) analyses. The boundaries of a linkage region outlined in Table 4 are based on the chromosomal interval that covers one LOD score below the peak marker's LOD score. Two separate regions of linkage were found on chromosomes 16 ( The most promising multipoint linkage region was on chromosome 18 from 69.94 cM to 85.29 cM with a peak multipoint HLOD = 2.5 at rs1145315 (75.58 cM) under the DOM model. The nonparametric analysis showed consistent results for this region with a peak multipoint HLOD = 1.48 at the same peak marker ( Table 4). In addition, rs4941043 within this same interval produced a HLOD score = 1.99 under the two-point DOM model ( Table 3). This multipoint linkage peak on chromosome 18 overlaps with the FCD2 peak reported by Sundin et al. [26] and with the most significant GWAS hit, rs613872 in TCF4, reported by Baratz et al. [29] (Figure 1). The SNP rs613872 in TCF4 is only 1.4 Mb away from our peak DOM multipoint marker, rs1145315.

Discussion
To date, progress towards identifying the genetic underpinnings of FECD has been limited to a handful of genes, including COL8A2, SLC4A11, and TCF8 [3,13,20,21,23] that have arisen from candidate gene or genome-wide linkage studies. Linkage scans have additionally identified four genetic loci (FCD1 to FCD4) that appear to influence familial FECD [23,[25][26][27]. In spite of these insights, knowledge of the genetic basis of non-familial FECD has remained limited. Recently, Baratz and colleagues identified significant statistical association between a SNP in TCF4 and FECD in the first genome-wide association study carried out for FECD [29]. Their study identified an intronic SNP in TCF4, rs613872, with highly significant allelic and genotypic p-values of 2.34610 226 and 1.29610 218 , respectively, in their combined analysis of their small discovery and replication datasets.
Here, we present a large case-control association dataset that replicates these recent genome-wide association data [29], revealing a highly significant association between rs613872 and FECD with a pvalue of 9.33610 235 (DOM). Additionally, we show that the best  The consistent findings in both our linkage and association studies for an association of rs613872 with FECD together with the findings of Baratz et al. suggest that the association is probably not spurious but, rather, is due to certain recombination events in this region that may increase susceptibility to FECD. Additionally, the excess of heterozygous genotypes at rs613872 in both our cases and those of Baratz et al. warrants further investigation. Although we replicated the association of rs613872 with FECD in our dataset, we failed to replicate the association with rs10490775 in PTPRG. Since we also did not detect evidence of linkage to chromosome 3, we hypothesize that either this locus does not influence FECD risk or its effect on FECD risk is small. It is of great interest to know if rs613872 genotypes can predict disease severity. Our 235 cases with grading information did not show significant correlation to the genotypes of rs613872, which is consistent to the finding from Riazuddin et al. [40], where 180 cases were used. Although both studies obtained the same observation, the limited sample sizes are the drawback for making a solid conclusion. Many of our early samples were given disease status diagnosis rather than detailed grading information. As our recruitment of FECD cases moving forward, we will be able to increase the number of cases with grading. The disease severity vs. genotypes of any target marker will be able to be evaluated properly.
According to data from the Human Genome Diversity Project, the minor (risk) allele, G, of rs613872 is rare to nonexistent in populations from Africa, Eastern Asia, and Central and South America, yet is more frequent in European, Middle Eastern, and Southern Asian populations (Figure 2, [41]). Given that the FECD cases analyzed by our group and by Baratz et al. are of European descent, it will be important to examine whether the association between rs613872 and FECD risk replicates in other ethnic and racial populations. It is possible that alternative TCF4 risk alleles may be associated in other ethnic and racial populations, which would argue for this locus being functionally important in FECD pathogenesis. Alternatively, if the true FECD disease variant is in LD with rs613872, a lack of association in other ethnic and racial populations may help explain why the disorder has a lower prevalence in other populations like African, East Asian, and South American populations.
GWAS studies commonly raise the question of how to interpret the biological significance of statistically associated SNPs, which may be located in intronic or intergenic regions with no obvious connection to the disease phenotype. Baratz and colleagues did not find any coding variations within the TCF4 gene that might help explain the functional mechanism behind the association of rs613872 with FECD, so clearly further studies are needed to detect such variations if they exist. Although TCF4 is an attractive candidate gene within the FCD2 locus, much remains unknown. If TCF4 plays a role in familial FECD, further research is needed todiscover the mutation(s) within TCF4 that segregate with disease in families that show evidence of linkage to FCD2. l. Until such a causative mutation(s) is identified, restraint needs to be exercised in not drawing premature conclusions that a causal link between the TCF4 protein and FECD has been identified.  Table 3). The location of the FCD2 peak  and the most significantly associated SNP, rs613872, from the FECD GWAS performed by Baratz et al. (2010) are indicated by arrows. The location of the TCF4 gene is also indicated for reference. 2PT, two-point results; MPT, multipoint results; cM, centiMorgans; LOD, logarithm of the odds; HLOD, heterogeneity logarithm of the odds. doi:10.1371/journal.pone.0018044.g001 Expression data have indicated that TCF4 is expressed in eye tissues, particularly in the corneal endothelium [29,[42][43][44][45][46], However, Tcf4 mutant mice do not appear to contain vision/eye abnormalities as a phenotypic feature [47,48]. Additionally, ENCODE data on the UCSC genome browser [49] (March 2006 assembly) shows that rs613872 is contained within the chromatin immunoprecipitation sequence (ChIP-seq)-purported binding site for two transcription factors, Ini1 (SMARCB1) and Brg1 (SMARCA4). These are both components of the SWI/SNF chromatin remodeling complex that is required for transcriptional regulation of genes normally repressed by chromatin [50,51]. Given that TCF4 is a transcription factor, it is possible that variation at this site might have an effect on the spatiotemporal expression of TCF4 and, in turn, affect the expression of some of its targets. Further studies are required to investigate the veracity this hypothesis.
Our study provides supporting evidence of a linkage signal to the strong association results of rs613872. This not only further confirms the importance of a locus on chromosome 18 in influencing FECD risk, but also indicates that a causal variant for FECD may be identified within this locus. It is highly possible that rs613872 is tagging a rare variant within this locus that plays a role in the etiology of FECD. Additional sequencing studies using family datasets that are highly linked to this locus is a potential strategy to identify the biological mechanisms underlying how this locus influences FECD pathogenesis.