Gene expression in the corneal endothelium of Fuchs endothelial corneal dystrophy patients with and without expansion of a trinucleotide repeat in TCF4

Fuchs Endothelial Corneal Dystrophy (FECD) is a late onset, autosomal dominant eye disease that can lead to loss of vision. Expansion of a CTG trinucleotide repeat in the third intron of the transcription factor 4 (TCF4) gene is highly associated with FECD. However, only about 75% of FECD patients in the northern European population possess an expansion of this repeat. The remaining FECD cases appear to be associated with variants in other genes. To better understand the pathophysiology of this disease, we compared gene expression profiles of corneal endothelium from FECD patients with an expanded trinucleotide repeat (RE+) to those that do not have a repeat expansion (RE-). Comparative analysis of these two cohorts showed widespread RNA mis-splicing in RE+, but not in RE- samples. Quantitatively, we identified 39 genes in which expression was significantly different between RE+ and RE- samples. Examination of the mutation profiles in the RE- samples did not find any mutations in genes previously associated with FECD, but did reveal one sample with a rare variant of laminin subunit gamma 1 (LAMC1) and three samples with rare variants in the gene coding for the mitochondrial protein peripheral-type benzodiazepine receptor-associated protein 1 (TSPOAP1).


Introduction
Fuchs endothelial corneal dystrophy (FECD) is a common, heritable degeneration of the corneal endothelium with considerable locus heterogeneity. Mutations in 7 genes (AGBL1, COL8A2, LOXHD1, SLC4A11, TCF4, ZEB1, DMPK) have been shown to be either causal or highly associated with the disease [1][2][3][4][5][6][7][8][9]. A recent large genome-wide association study of FECD added an additional 3 genes to the candidate list (KANK4, LAMC1 and LINC00970/ ATP1B1) and re-confirmed the strongest genetic signal for FECD in a predominantly PLOS  caucasian U.S. population in the transcription factor 4 (TCF4) gene on chromosome 18 [10]. We have previously shown that an expansion of !45 CTG trinucleotide repeats in the third intron of TCF4 is highly associated with late onset FECD [9]. This observation has subsequently been replicated in other studies [11,12] and disease severity has been correlated with the length of repeat expansion [12]. Despite the progress in identifying the genetic underpinnings of FECD, our understanding of the pathogenic pathways in this disease remains limited. In TCF4-associated FECD, the CTG trinucleotide repeat expansion leads to nuclear accumulations of transcribed repeat (CUG) n RNA which sequesters critical RNA splicing factors and leads to widespread splicing changes [13,14]. Other than TCF4, the myotonic dystrophy-causing CTG repeat in the DM1 protein kinase (DMPK) gene is the only other repeat expansion implicated in FECD [5]. The remaining genes that have been implicated in FECD do not harbor repeat expansions nor have obvious interconnections or shared common pathways. In addition, the previously described loci still fail to explain the entire genetic contribution to FECD, suggesting that additional loci remain to be discovered.
In this study we examined corneal endothelial tissue from eyes with FECD to analyze for differences in RNA splicing patterns and gene expression between subjects with TCF4 trinucleotide repeat expansion (RE+) and those without this expansion (RE-). We also identified several rare genetic variants that may be novel contributors to the genetic basis of FECD.

Isolation of corneal tissue
Patients with FECD (modified Krachmer grade 5 or 6) requiring corneal transplantation and control participants without guttae (grade 0) were enrolled in a Mayo Clinic Institutional Review Board-approved Hereditary Eye Disease Study. Patients that participated in the study provided written informed consent and agreed to a blood draw and use of their excised central corneal endothelium/Descemet membrane specimen obtained at endothelial keratoplasty for FECD. DNA was isolated from peripheral blood leukocytes and RNA was isolated from corneal endothelium/Descemet membrane specimens following storage in RNAlater ICE (Ther-moFisher Scientific, Waltham, MA). This research was conducted in accordance with the Declaration of Helsinki.

RNA isolation and sequencing
A total of 24 corneal endothelial samples were collected in succession and processed for RNA-Seq at three different times over a 5-year period (2013-2017). Samples in each batch (Table 1) were processed and sequenced in the same manner and sequenced on the same machine approximately 14 months apart, although cDNA synthesis and sequencing methodology varied slightly between the 3 batches (described below). Total RNA was isolated from tissue samples by homogenization in QIAzol Lysis Reagent, chloroform extraction and the RNeasy Mini QIAcube Kit (Qiagen, Valencia, CA) [13,14]. All samples that had RNA integrity number (RIN) values of !6.0 were used in this study. RNA libraries were prepared for each corneal endothelial tissue sample using TruSeq RNA sample Prep kit version 1 or 2 (Illumina, San Diego, CA, USA). Ribosomal transcripts were depleted from total RNA and deoxythymidine triphosphate (dTTP) was replaced with deoxyuridine triphosphate (dUTP) during reverse transcription. DT-priming (Batch #1) or random priming (Batch #2 and #3) were used to generate the second strand synthesis using TruSeq stranded total library preparation kit (Illumina). Due to the differences in priming, Batch #1 was extended in a non-strand specific manner while Batches #2 and #3 were extended in a strand specific manner. The resulting libraries were minimally amplified to enrich for fragments using adapters on both ends and then quantified for sequencing at three samples/lane using a HiSeq2000 (Batch #1) or HiSeq4000 (Batches #2 and #3) Illumina sequencers. Due to differences in cDNA priming, cDNA extension, and sequencing on different machines between batches, each batch was analyzed as its own experimental group to reduce cross variability between batches. Datasets have been uploaded to the Gene Expression Omnibus (GEO) under accession number GSE112201.

Validation of differential splicing events by RT-PCR
Preparation of cDNA by reverse transcription, amplification by polymerase chain reaction (RT-PCR), and analysis by agarose gel electrophoresis was described previously [13,14]. Briefly, Platinum PCR Super Mix High Fidelity (Invitrogen, Carlsbad, CA) was used to amplify approximately 40 ng of genomic DNA isolated from FECD patients corneal endothelium

Analysis of differentially spliced genes
Whole transcriptomic sequencing data were analyzed using a comprehensive computational pipeline (MAP-RSeq) [15] to align, assess and deliver multiple genomic features. MAP-RSeq uses a variety of freely available bioinformatics tools along with in-house developed methods to obtain in-depth quality control data, transcriptome read alignment, abundance of gene expression, exon expression, and other transcriptomic features. The Binary Alignment Map (BAM) files from MAP-RSeq were then analyzed using MISO (Mixture of Isoforms) [16] packages that quantify the expression level of alternatively spliced genes between groups. For pairwise comparisons, MISO calculates Bayesian probabilities and calculates a percent spliced in (PSI) for every skipped-exon event (range from 0 to 1, with 0 being completely excluded and 1 being uniformly included in the splicing products) which identifies genuine differences in the splicing of a given exon between two samples. To identify differentially spliced genes, we utilized stringent filtering criteria (reads to support inclusive isoform is >2; reads to support exclusive isoform is >2; sum of inclusive and exclusive reads is at least 25; PSI change >0.2, Bayes factor >50) within MISO to perform genome-wide pairwise comparisons between RE + and RE-FECD samples within each batch of samples (28 total comparisons). Due to differences in sample preparation and sequencing instruments, like prepared samples were batched together and compared to each other. This was to reduce potential variability within the analysis. The comparisons for batches 1 and 3 were performed by R package edgeR (https:// academic.oup.com/bioinformatics/article/26/1/139/182458). For batch 2, we analyzed the data using a z-test using the assumption that the biological variation of gene expression in the FECD no expansion sample was similar to that of the FECD with expansion group. Once all comparisons within a batch were performed, results were compared between the 3 batches. Genes that were identified in 2 of the 3 batch comparisons and in 12 or more comparisons were reported. https://doi.org/10.1371/journal.pone.0200005.t002

DNA isolation and trinucleotide repeat characterization
TCF4 trinucleotide repeat length was determined as described previously [13,14]. Briefly, leukocyte-derived DNA was extracted using AutoGen FlexiGene (Qiagen) and suspended in 1x TE for a final concentration of 250 ng/μl. Trinucleotide repeat regions from each sample were PCR amplified in an iCycler (Bio-Rad, Hercules, CA.) by placing 100 ng of genomic DNA with 10 pmoles of oligonucleotide primers specific for TCF4 (5-TCF-Fuchs and 3-TCF-Fuchs1,  Table 2) was used in place of 5-TCF-Fuchs and PCR was performed as described above. After PCR amplification, 2 μl of DNA was mixed with 12 μl of diluted Map Marker 1000 Bio Ventures Inc. (Murfreesboro, TN.). Gene Scan was carried out using ABI 3730XL DNA Analyzer (Foster City, CA.).

Pathway analysis using PANTHER
Gene sets identified by filtering of MISO results were analyzed for overrepresentation of genes in specific PANTHER families and pathways using the default settings of the PANTHER web portal, including Bonferroni correction for multiple testing (http://www.pantherdb.org/).

Sample characteristics
The twenty-four corneal endothelium samples used for this study are described in Table 1. All samples were obtained during corneal endothelial replacement surgery for FECD. Eighteen samples were from RE+ patients (mean age = 71 yrs.; range = 56-85 yrs.; CTG repeat length !45) and six were from RE-patients (mean age = 68 yrs.; range = 60-81 yrs.; CTG repeat length <45). Five of 6 patients (83%) in the RE-group were female, whereas 11/18 (61%) patients in the RE+ group were female. Samples were collected over a 5-year period and grouped into 3 batches based on like RNA preparation and RNA sequencing techniques (Table 1). RNA sequencing was performed for each sample within each batch. Pairwise comparisons for gene splicing and expression between RE+ and RE-samples were performed within each batch, which allowed for 8, 11 and 9 comparisons in batches 1, 2 and 3 respectively, for a total of 28 pairwise comparisons.

Alternative splicing
To evaluate mRNA splicing differences between RE+ and RE-samples, we used MISO software as a screening tool. Splicing events that were common among all 3 batches that met our stringent cutoff criteria resulted in 20 differential splicing events (Table 3). Notably, splicing events in MBNL1, NUMA1 and PPFIBP1 were found in all 28 pairwise comparisons, whereas mis-splicing in INF2, SCARB1, SYNE1, ADD3 and MBNL2 was identified in >90% of the comparisons.
To assess several of the mis-splicing events identified by MISO analysis, we performed RT-PCR on FECD samples obtained from RE+ and RE-patients (Fig 1). In the ADD3, INF2, and CADM1 gene, the RE-sample (shown as the minus sign in Fig 1) showed amplification of a fragment that corresponded to the same size as the non-FECD sample (labeled as C in Fig 1). In contrast, the two RE+ samples (shown with a plus sign in Fig 1) showed 2 bands of similar intensity, one representing the same size as the control/RE-samples, and a larger band in the ADD3 and CADM1 genes representing the inclusion of an additional exon sequence in each gene. In INF2, a smaller band was identified in the RE+ samples suggesting the exclusion of exon sequence within this gene. Densitometry provided percentage of inclusion/exclusion for each band (Fig 1B, values in boxes), which correlated with percent spliced in values obtained by RNASeq (Fig 1B, value in parentheses). Additional PCR validation of MBNL1, MBNL2,  NUMA1, SYNE1, PPFIBP1, ITGA6 and CLASP1 has previously been reported in the context of RE+ vs. non-FECD control tissue, which are similar events identified in this study [13,14].

Differential gene expression
Quantitative differences in gene expression between RE+ and RE-samples were identified for each of the 3 batches of samples. Genes with a minimum log2 fold change of 1 were further compared between batches. Comparison of genes with a minimum log2 fold change of 1 between the 3 gene sets identified 28 genes in which expression was increased in RE+ compared to RE-samples and 11 genes in which expression was decreased (Table 4). Overrepresentation analysis of the 39 genes using Panther did not reveal any significant gene ontology term enrichments.

Characterization of novel genetic variants
To further analyze the RE-samples, we assessed whether these samples contained mutations in any of the other FECD associated genes. Using RNASeq data, ZEB1, SLC4A11, KANK4, ATP1B1 and COL8A2 were expressed in RE-cells, but no known FECD variants or other rare mutations were identified in these genes. AGBL1 and LOXHD1 were not expressed in any of our corneal endothelium (RE-or RE+) samples, so we were unable to evaluate possible variants in these genes from RNASeq data. Exome sequencing of leukocyte DNA from each of the RE-patients confirmed these observations. We did identify a rare mutation in LAMC1 in one of the RE-samples. Sample RNA79 had a heterozygous C->T variant at chr1: 183085942, leading to an arginine to tryptophan substitution at amino acid 490 (R490W) (Fig 2). This variant was detected in both the RNASeq data and in leukocyte DNA exome sequencing of this patient. Analysis of this C->T variant in the Exome Aggregation Consortium (ExAC) browser, which utilizes data obtained from numerous independent investigators performing large scale sequencing projects, showed that it was only observed 61 times in 121,388 analyzed chromosomes suggesting a rare nucleotide change (allele frequency = 5.0 x 10 −4 ). We also assessed genetic variation in RE-samples of the mis-spliced genes in RE+ samples (Table 3). We identified a rare hg19 chr17:g.56383714 C>T variant, resulting in an arginine to histidine substitution at amino acid 1738 (R1738H), in the TSPOAP1 (formerly designated BZRAP1) gene in RE-sample RNA142 (Fig 3). This variant has been found only once in the ExAC browser, leading to a calculated allele frequency of only 4.7 x 10 −5 . Sanger sequencing of leukocyte DNA from this patient confirmed the presence of this variant (Fig 3B). In RE+ samples, mis-splicing of TSPOAP1 led to the deletion of 280 amino acids, which removes part of the motif that has been shown to interact with TSPO, a mitochondrial outer membrane protein involved in the transport of cholesterol into mitochondria (Fig 4).
The finding of a rare TSPOAP1 variant in one of our RE-RNASeq samples led us to further investigate this genes sequence in affected members of a RE-FECD family we had previously identified. In this family, we identified and validated a different rare TSPOAP1 variant in two affected members within this family (Fig 5). As shown in the pedigree in Fig 5A, an hg19 chr17:g.56388483 C->T variant results in an arginine to histidine substitution at amino acid 1058 (R1058H). This particular variant was found only 5 times in 58,162 chromosomes analyzed in the ExAC database (allele frequency 8.6 x 10 −5 ). This particular DNA sequence variation was not found in the two unaffected siblings of the affected daughter in this family. These variants were identified in exome sequencing studies and have been confirmed by Sanger sequencing (Fig 5B).
Within this family, the TSPOAP1 variant was one of eleven rare variants that co-segregated with FECD (Fig 5C). Of these eleven candidates, only two (PLEKHF1 and TSPOAP1) have known interactions with well-established FECD genes. The Y64 Ã variant identified in PLEKHF1 [allele frequency = 5.0 x 10 −5 in ExAC database (6/121152)] also segregated with disease in this family and would clearly influence the function of the affected allele. Additionally, the PLEKHF1 gene is known to bind with ZEB1, a well-characterized FECD gene. However, PLEKHF1 is expressed at very low levels in our RNASeq data from the corneal endothelium.

Discussion
FECD is a debilitating eye disease with genetic association to multiple genes. Of these genes, an expansion of a CTG repeat sequence in the TCF4 gene is by far the most common genetic anomaly found in FECD patients in the United States [1,11,12]. In the current study, we found that mRNA splicing patterns in RE-corneal endothelium samples differ from those in RE+ corneal endothelium. This reinforces the premise that the mis-splicing events seen in the RE+ samples are a direct consequence of the expansion of the CTG repeats at the TCF4 locus rather than a secondary response to the disease process. In addition to these qualitative observations among 18 RE+ samples and 6 RE-samples, we also identified quantitative differences in gene expression between RE+ and RE-samples. Splicing patterns in RE-samples generally mirror those of control endothelium. Of the 20 differential splicing events found in comparisons of RE+ and RE-samples (Table 3), all but one (IFI44) were identified in a previous study comparing FECD RE+ samples to controls [14]. Interestingly, 14 of these 20 events were found in the group of top 24 differentially spliced genes between controls and RE+ samples ( Table 3, genes in bold font). In the same study, we compared alternative splicing in RE+ FECD to the well-characterized mis-splicing due to a similar CTG repeat expansion in the untranslated region of the DMPK gene in myotonic dystrophy, type 1. The list of top missplicing events is strikingly similar between FECD and DM1, and the finding of nuclear foci containing expanded (CUG) n and sequestered muscleblind protein further suggests a shared pathogenic mechanism.
The 10 alternative splicing events identified in previous comparisons [14] between RE+ samples and controls that did not meet criteria in the RE+ to RE-comparisons (Table 3, bottom) are also of potential interest. In principle, these are alternative splicing events that differ between controls and FECD, regardless of the TCF4 expansion status. A closer examination of these 10 splicing events revealed that all but 2 actually met our stringent criteria in two of the three sequencing batches and were detected in approximately half of the pairwise comparisons ( Table 3). The 2 splicing events that distinguished both RE+ and RE-FECD from controls were in transcripts from ABI1 and KIF13A, but even these 2 genes were noted in over 40% of the pairwise RE+/RE-comparisons. Therefore, we cannot conclude that any splicing event distinguishes FECD, regardless of expansion status, from control tissue or that any mis-splicing event is common to all variants of this genetically heterogeneous disease.
Reproducible quantitative changes in gene expression between RE+ and RE-corneal endothelium samples were identified for a set of 39 genes. These represent a set of genes that are differentially regulated by the repeat expansion in TCF4, but changes in the expression of these genes do not appear to be critical for the development of FECD. These data demonstrate that expansion of the CTG repeat in TCF4 leads to widespread changes in gene expression. While this dataset did not reveal any common pathway involvement, several genes can be found in The identification of rare variants in TSPOAP1, a gene exhibiting mis-splicing in both FECD and DM1, in three RE-patients from two separate families is also noteworthy. TSPOAP1 is a TSPO-binding protein, and it has also been shown to bind to TCF4. TSPO is thought to be important in the regulation of mitophagy, which is thought to be important in the pathogenesis of FECD [17][18][19]. Although the functional consequences of TSPOAP1 binding of TSPO are unknown, it can be speculated that structural variants of TSPOAP1 might have a differential effect on the activity of TSPO, which might in turn contribute to FECD pathogenesis.
Although the family data also revealed alternative variants that could be related to the pathogenesis of FECD, the segregation of the TSPOAP1 variants with disease in our family data is intriguing and suggests possibilities for common mechanisms between RE+ samples and at least a subset of RE-samples. Specifically, RE+ patients also produce significant amounts of a variant TSPOAP1 mRNA that codes for 280 fewer amino acids compared to the predominant form found in the corneal endothelium from RE-and control patients. Thus, the majority of all FECD patients in our population are likely to be producing a variant TSPOAP1 isoform as a result of the splicing changes that result from expansion of the CTG repeat in TCF4.
Clearly the RE-group of FECD patients is heterogeneous, because there is considerable locus heterogeneity in the 20-25% of FECD patients in our population that are not RE+. Within the RE-group characterized by RNASeq studies here, we did not identify any samples with rare variants in well-established FECD genes (SLC4A11, COL8A2, ZEB1, TCF4) but we did find that one patient had a rare variant in LAMC1, which has been implicated as an FECD gene by a recent large GWAS effort [10].
Of note, we did not identify expression of LOXHD1 or AGBL1 in any of our samples, whether RE+ or RE-. These genes have been implicated in FECD in prior studies, but this lack of expression in corneal endothelial tissue in adults brings into question whether genetic variants in these genes could have a functional role in the development of the disease.
The limitations of this study include limited statistical power due to the analysis of small groups of samples and the heterogeneity of the RE-group. Additionally, due to consecutive sample collection and processing, batch #2 only had 1 RE-sample to perform analysis with.
While this analysis appears limited, it is strengthened by the fact that we took data obtained from each batch and cross compared them to each other and only reported genes that were identified in 2 out of 3 comparisons and in 12 or more independent comparisons. This analysis plan would have eliminated most if not all of the false positives obtained within batch 2 due to a single RE-comparison. While the findings of differences in splice patterns and gene expression are likely to be significant, we cannot ensure that pathologically important differences were not missed. Resolving this limitation will depend not only on larger sample sizes but also in the further elucidation of the true causative genetic variants in the RE-group. Given the genetic mutations and implicated genes described in prior literature, the absence of those variants in this current RE-study group, and our finding of two novel variants, RE-FECD may very well represent a phenotype resulting from a very diverse source of genetic variants and pathways.
The work presented here supports our previous conclusion that corneal endothelium from RE+ patients exhibit characteristic mRNA splicing events, and we now confirm that many of these abnormalities are not present in the corneal endothelium from RE-patients. These qualitative differences in gene expression are supplemented by quantitative differences in the expression of at least 39 genes. These findings suggest that there are real biological differences between RE+ FECD and RE-FECD corneas and lend support to a hypothesis that mis-splicing is a key pathogenic feature of RE+ FECD but not RE-FECD. Therefore, future efforts to identify genetic influences on the development of FECD should consider stratification of study populations according to repeat status. This should also apply to studies of the basic science of the disease, such as cell culture systems in which RE+ or RE-models may express differential biology and differential responses to interventions or therapeutics. In addition, we identified a novel candidate locus, TSPOAP1, and confirm that a rare variant in a recently described locus, LAMC1, was found in a subset of RE-FECD.