Germ-line and somatic EPHA2 coding variants in lens aging and cataract

Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive.


Introduction
Cataract is a clinically heterogeneous disorder that causes clouding or opacification of the crystalline lens and, thereby, impairs refraction and focusing of light onto the photosensitive retina of the eye. Typically, cataract is acquired with aging (> 50 years) and, despite surgical treatment, age-related cataract remains a leading cause of adult visual impairment (17%-33%) [...] in EPHA2, of either germ-line or somatic origin, were associated with lens aging and/or age-related cataract.

Ethics statement
Ethical approval for this study was obtained from the University of Parma, the National Eye Institute, and Washington University (IRB ID #: 201111056 and 00-0320), and written informed consent was provided in accordance with the tenets of the Declaration of Helsinki.

Cataract case-control DNA panel
Genomic DNA was extracted using standard methods from blood samples donated by a case-control cohort of unrelated individuals aged ≥ 50 years from Northern Italy who were ascertained from the Clinical Trial of Nutritional Supplements (CTNS) and Age-Related Cataract Study [53,54]. Cataract status (nuclear, cortical, posterior sub-capsular, clear lens) was evaluated by grading slit-lamp and retro-illumination lens photographs according to a modification of the Age-Related Eye Disease Study (AREDS) cataract grading system as described [55].

Lens DNA panel
Post-mortem human donor lenses (≥ 48 years of age, with or without cataract) were obtained (on dry-ice) from the National Disease Research Interchange (http://ndriresource.org/). Lens genomic DNA was extracted using the DNeasy Kit (Qiagen, Valencia, CA) essentially according to the manufacturer's protocol with the following modifications to mitigate the high protein-to-DNA content of the lens. Each lens was homogenized (2 min, setting 8, Bullet Blender 24, Next Advance, Averill Park, NY) in buffer ATL (360 μl) then digested (16 hr, 56°C) with proteinase K (40 μl, 15 mg/ml). Samples were then diluted with buffer ATL (360 μl) and re-digested (2 hr, 56°C) with proteinase K (40 μl) followed by centrifugation (5 min, 10,000 x g) to remove excess protein before processing through spin-columns according to the manufacturer's instructions. DNA was eluted from the spin-columns in buffer AE (200 μl) and quantified (OD260) using a spectrophotometer (ND-2000, NanoDrop, Wilmington, DE). If necessary, samples were concentrated by air-drying in a laminar-flow hood and re-suspended in ultrapure water to give the minimum concentration of 50 ng/μl required for amplicon sequencing.
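The concentration requirement above reduces to simple arithmetic, sketched here in Python. The function names and the example spectrophotometer reading are hypothetical; only the 200 μl elution volume and the 50 ng/μl minimum are taken from the protocol.

```python
TARGET_NG_PER_UL = 50.0   # minimum concentration required for amplicon sequencing
ELUTION_VOLUME_UL = 200.0  # buffer AE elution volume from the protocol


def needs_concentration(measured_ng_per_ul, target=TARGET_NG_PER_UL):
    """Return True if the eluate is too dilute and must be dried down."""
    return measured_ng_per_ul < target


def max_resuspension_volume_ul(measured_ng_per_ul,
                               elution_volume_ul=ELUTION_VOLUME_UL,
                               target=TARGET_NG_PER_UL):
    """Largest volume of water that still yields at least the target concentration.

    Assumes no DNA is lost during air-drying (an idealization).
    """
    total_ng = measured_ng_per_ul * elution_volume_ul
    return total_ng / target


# Hypothetical reading of 20 ng/ul: 4000 ng in total, so re-suspend in at most 80 ul.
example_volume = max_resuspension_volume_ul(20.0)
```

In practice some DNA is lost on drying, so a somewhat smaller re-suspension volume than the computed maximum would leave a safety margin.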

Targeted-amplicon deep-sequencing and variant calling
Targeted-amplicon deep-sequencing was performed using the Access Array Integrated Fluidic Circuit (IFC) System with custom designed and validated gene-specific adaptor-primers (Fluidigm, San Francisco, CA). Each IFC enables nanoliter-volume high-throughput PCR to generate amplicons ( 200 bp) across 48 samples in a single run for subsequent next-generation (deep)-sequencing (NGS). Briefly, DNA samples (50 ng) and primers were mixed 'on-chip' (48.48 Access Array IFC/pre-PCR IFC Controller AX), and PCR amplified (FC1 Cycler). Amplicons for each sample were pooled on-chip (post-PCR IFC Controller AX) then indexed with sample barcodes and NGS adaptors (Access Array Barcode kit) to produce 48

Statistical analysis
Genetic association analysis and logistic regression analysis of selected SNVs found in the cataract case-control panel was performed using the Golden Helix SNP and Variation Suite 7 (Golden Helix, Bozeman, MT). Statistical comparison of somatic SNVs found in the post-mortem lens panel was performed using Fisher's Exact Test by means of the online spreadsheet at http://www.langsrud.com/fisher.htm. A probability (p) value of < 0.05 after correction for multiple testing was considered significant.
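For readers who wish to reproduce this kind of comparison without the online spreadsheet, a two-sided Fisher's Exact Test on a 2 x 2 table can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study; the function name is hypothetical and the example counts are invented for demonstration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Invented counts: rows are two groups, columns are "with variant" / "without".
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

The source does not state which multiple-testing correction was applied, so none is shown here; any correction would be applied to the p-values returned by such a function.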

DNA panels
The cataract case-control panel comprised 225 leukocyte DNA samples from 161 patients with age-related cataract (age 50+) and 64 age-matched clear lens controls from the N. Italian population [53,54]. The cataract cases included 67 nuclear only, 43 cortical only, and two posterior sub-capsular cataract (PSC) only. In addition to 'pure' forms of cataract, there were multiple cases of mixed cataract including 21 nuclear + cortical, 14 nuclear + PSC, 10 cortical + PSC, and four nuclear + cortical + PSC. The mean age of cataract cases was 74.2 ± SD 6.54 years (range 50-85 years) and the mean age of clear lens controls was 75.19 ± SD 4.2 years (range 57-86 years), with no significant difference between cases and controls (p = 0.21). The sex distribution was 50% female and 50% male in the cases and 44% female and 56% male in the controls. There was no association between any cataract and sex in the case-control panel using the chi-square test (p = 0.51).
Post-mortem donor lenses were briefly examined at the time of procurement for the presence or absence of obvious age-related cataract prior to cryopreservation. However, the donor information report did not identify age-related cataract sub-types (e.g. nuclear, cortical). Further, we cannot exclude the possibility that cataract in some of these donor lenses may have been associated with causes other than aging (e.g. uveitis). The post-mortem lens panel comprised 118 genomic DNA samples extracted from 74 clear lenses (37 pairs) and 44 cataract lenses (22 pairs) all obtained from Caucasian donors (age 48+ years). Two of the clear lens pairs failed amplicon sequencing and/or QC criteria leaving 114 lens samples (35 clear pairs, 22 cataract pairs) for variant analysis. The mean age of cataract lenses was 65.5 ± SD 6.67 years (range 48-74 years) and the mean age of clear lenses was 64.06 ± SD 7.37 years (range 48-78 years) with no significant difference between the two groups (p = 0.45). The sex distribution was 23% female and 77% male in the cataract lenses and 49% female and 51% male in the clear lenses. Despite the numerical sex difference in the cataract lenses there was no significant association between any cataract and sex in the post-mortem lens panel using the chi-square test (p = 0.095).
Optimal custom design of PCR primer pairs (Fluidigm) to amplify exons for deep-sequencing resulted in 35 amplicons for EPHA2 and 15 amplicons for TP53. Across the cataract case-control panel the mean total number of reads was 418,214 with > 99% on target of which > 82% attained 1000x coverage (S1 Table). Similarly, across the lens panel the mean total number of reads was 456,286 with > 99% on target of which > 70% attained 1000x coverage (S1 Table). All amplicons were fully sequenced in both directions with the exception of amplicon 35 in EPHA2 (part of exon-1), likely due to its high G/C content.
Following sequencing, germ-line SNVs in the cataract case-control panel (blood leukocyte DNA) were called using the FreeBayes program. Variant allele frequencies (VAFs) were calculated as a percentage by dividing the number of individual variant reads by the total number of amplicon reads and those SNVs with VAFs ≥ 20% were designated germ-line. Somatic variants in the lens DNA panel were called using the VarScan 2 program that was originally designed to call low-frequency (> 1%) somatic variants from deep-sequencing data derived from matched tumor (case) versus control tissue samples [56,57]. For our purposes, we compared left and right lenses from the same individual using the paired analysis or somatic mode. Rare variants present in both lenses were designated as germ-line, whereas those present in only the left or the right lens (i.e. discordant SNVs) were designated as somatic. In order to reduce the risk of false positives we excluded somatic SNVs with VAFs below 3% and/or coverage depths below 600 reads as potential sequencing errors. For convenience, germ-line and somatic SNVs were divided into novel and reference categories to denote their absence or presence, respectively, in public genome databases including the Single Nucleotide Polymorphism database (dbSNP build 138), Exome Variant Server (EVS), Exome Aggregation Consortium (ExAC), 1000 Genomes project (1000G), and Catalogue of Somatic Mutations in Cancer (COSMIC). Both categories predominantly contained synonymous and non-synonymous (i.e. missense) SNVs with in silico predictions of damaging or deleterious effects at the protein level determined using appropriate algorithms (e.g. SIFT and PolyPhen-2). Binary versions (.bam files) of the Sequence Alignment/Map (.sam) files have been deposited with the NIH Short Read Archive (SRA Accession no. PRJNA384802).
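The VAF calculation and the filtering rules described above can be summarized in a short Python sketch. This is an illustration of the stated thresholds only, not a re-implementation of FreeBayes or VarScan 2, which apply their own statistical models. All function names are hypothetical, and the 1% floor used to decide whether a variant is "present" in a lens is an assumption borrowed from the low-frequency limit quoted for VarScan 2.

```python
GERMLINE_MIN_VAF = 20.0    # percent; blood-derived germ-line threshold from the text
SOMATIC_MIN_VAF = 3.0      # percent; somatic calls below this were excluded
SOMATIC_MIN_DEPTH = 600    # reads; somatic calls below this depth were excluded
PRESENCE_MIN_VAF = 1.0     # percent; ASSUMPTION, not a rule stated by the authors


def vaf_percent(variant_reads, total_reads):
    """Variant allele frequency as a percentage of total amplicon reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * variant_reads / total_reads


def is_germline_blood_call(variant_reads, total_reads):
    """Germ-line designation used for the case-control (blood) panel."""
    return vaf_percent(variant_reads, total_reads) >= GERMLINE_MIN_VAF


def classify_paired_call(left, right):
    """Classify one site from (variant_reads, total_reads) in left and right lens.

    Returns 'germ-line' if the variant is present in both lenses, 'somatic' if it
    is discordant and passes the VAF and depth filters, 'excluded' if it is
    discordant but fails a filter, and 'absent' if seen in neither lens.
    """
    calls = [(vaf_percent(v, t), t) for v, t in (left, right)]
    present = [vaf >= PRESENCE_MIN_VAF for vaf, _ in calls]
    if all(present):
        return "germ-line"
    if not any(present):
        return "absent"
    # Discordant: apply the filters to the lens that carries the variant.
    vaf, depth = calls[present.index(True)]
    if vaf >= SOMATIC_MIN_VAF and depth >= SOMATIC_MIN_DEPTH:
        return "somatic"
    return "excluded"
```

For example, under these assumptions a site with 40 variant reads out of 1000 in the left lens and none out of 1200 in the right lens would be classified as somatic, whereas the same VAF at a depth of only 500 reads would be excluded.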

Germ-line EPHA2 variants in the cataract case-control panel
Exon deep-sequencing of EPHA2 in the cataract case-control panel detected 10 novel SNVs (all transitions) and 20 reference SNVs (18 transitions) in the exon regions of EPHA2 at VAFs > 20%, consistent with germ-line transmission (Table 1). Of the novel SNVs, two were synonymous and eight were non-synonymous, predicted to result in missense amino-acid substitutions. Two of the novel missense SNVs (p.I142T, p.W348R) occurred in controls and both were predicted in silico to be damaging. Of the remaining six missense SNVs found in cases, two were predicted in silico to be benign (p.A650T, p.A932T) and four damaging (p.G171E, p.G776S, p.N831D, p.L895P). Since nine of the novel SNVs occurred only once in the panel, and the other only twice, we were unable to perform further statistical analysis. Of the reference SNVs, 12 were synonymous and eight were predicted to result in missense amino-acid substitutions (Table 1). Of the eight missense reference SNVs six were predicted to result in damaging amino-acid substitutions, with two occurring in cases only (p.L41V, p.R175C) and three occurring in both cases and controls (p.R721Q, p.R876H, p.R890H). The minor allele frequencies (MAFs) for all reference SNVs found in the cataract case-control panel were similar to those reported in Caucasians by public genome variant databases (Table 1). Four of the synonymous reference SNVs that were relatively common in the Caucasian population (MAF 28%-44%) were also the most common in the cataract case-control panel (S2a Table). However, only one of these SNVs (rs6678616) showed weak association (p = 0.032) with nuclear cataract and nuclear cataract + PSC using Fisher's Exact Test (S2b Table). Correcting for sex using logistic regression in the association analysis of rs6678616 did not provide significant association with any type of cataract (p > 0.24). The remainder of the synonymous reference SNVs occurred in cases and/or controls but were comparatively rare in the panel (MAF < 1%), hampering further statistical analysis.

Germ-line TP53 variants in the cataract case-control panel
Exon deep-sequencing of TP53 in the cataract case-control panel detected no novel SNVs and only nine reference SNVs (5 transitions) of which five were also present in the COSMIC database (Table 2 and S3a Table). Two of these SNVs (rs1042522, rs730882008) were non-synonymous and predicted in silico (SIFT) to be damaging, with one (rs1042522, p.P72R) present at relatively high frequency in Caucasians (MAF 0.25) and in multiple cases and controls. However, rs1042522 was not associated with any type of cataract (p > 0.33) using Fisher's Exact Test (S3b Table). Correcting for sex with logistic regression in the association analysis of rs1042522 did not provide significant association with any type of cataract (p = 0.85). The other SNV (rs730882008, p.R282L) occurred at unknown frequency in the population and in only one case of cortical cataract preventing further statistical analysis.

Somatic EPHA2 variants in the post-mortem lens panel
[...] (Table 3). Of these SNVs, only 14 were listed in reference databases (e.g. snp138, cosmic70, exac01) suggesting that 52 were novel somatic SNVs. Of the 32 missense SNVs, only nine were listed in reference databases (e.g. snp138, cosmic70) and 29 were predicted in silico (by the SIFT algorithm) to be damaging (Table 3). Surprisingly, 31 of the 32 missense SNVs involved C/T or G/A transitions and 17 of these occurred at di-pyrimidine sites that are susceptible to UV-induced mutation [60]. Similarly, 18 of the 28 synonymous SNVs along with two UTR SNVs and two nonsense SNVs occurred at UV-susceptible di-pyrimidine sites (Table 3).
In the cataract lenses, 35 discordant EPHA2 SNVs occurred with a VAF ≥ 3% in 10 of the 22 cataract lens pairs with only two excluded due to low read-depth (S5 Table). The remaining 33 singly occurring SNVs included 12 synonymous SNVs, 19 non-synonymous or missense SNVs, and two stop-gain or nonsense SNVs (Table 4). Of these SNVs, six were present in reference databases suggesting that 27 were novel somatic SNVs and only one (at position 16460407 bp) was present in both cataract and clear lenses (Tables 3 and 4). Of the 19 missense SNVs only four were present in reference databases and 15 were predicted in silico (SIFT) to be damaging (Table 4). All 19 missense SNVs involved C/T or A/G transitions and 12 of these occurred at UV-susceptible di-pyrimidine sites. Ten of the 12 synonymous SNVs and both nonsense SNVs also occurred at UV-susceptible di-pyrimidine sites. Overall for EPHA2, there was no significant difference between the paired clear lens panel and the paired cataract lens panel with respect to total SNVs (p = 0.48), damaging SNVs (p = 0.85), or novel SNVs (p = 0.64) using Fisher's Exact Test (S6 Table). Correcting for sex in the lens panels using logistic regression analysis did not provide any significant association for total EPHA2 SNVs (p = 0.62), damaging EPHA2 SNVs (p = 0.63), or novel EPHA2 SNVs (p = 0.70).

Somatic TP53 variants in the post-mortem lens panel
[...] Table). Of the 16 missense SNVs, 13 were present in reference databases, seven were predicted in silico (SIFT) to be damaging and five occurred at UV-susceptible di-pyrimidine sites. Apart from the splicing SNV, none of the synonymous SNVs or UTR SNVs occurred at di-pyrimidine sites (S7d Table).
In the cataract lenses, 18 discordant TP53 SNVs (all transitions) occurred with VAFs > 3% in five of the 22 pairs of lenses including five synonymous SNVs, 12 non-synonymous or missense SNVs, and one UTR-3' SNV (S8d Table). Of these single occurrence SNVs, 12 were present in reference databases leaving six potentially novel somatic SNVs and only one (at position 7572892 bp) was present in both cataract and clear lenses (S7d and S8d Tables). Of the 12 missense SNVs, eight were present in reference databases, six were predicted to be damaging, and 11 occurred, along with the UTR SNV, at UV-susceptible di-pyrimidine sites (S8d Table). Overall for TP53, there was no significant difference between the paired clear lens panel and the paired cataract lens panel with respect to total SNVs (p = 0.73), damaging SNVs (p = 0.77), or novel SNVs (p = 0.78) using Fisher's Exact Test (S9 Table). Correcting for sex in the lens panels using logistic regression analysis did not provide any significant association for total TP53 SNVs (p = 0.39), damaging TP53 SNVs (p = 0.71), or novel TP53 SNVs (p = 0.57).

Discussion
In this study we utilized targeted-amplicon (exon) deep-sequencing to identify germ-line and somatic variants of EPHA2, particularly novel missense variants predicted in silico to result in deleterious amino-acid substitutions, that may be associated with lens aging and/or age-related cataract. First, we profiled germ-line SNVs (VAF > 20%) in EPHA2 for association with age-related cataract in a Caucasian case-control panel that had previously revealed association with common reference SNVs flanking EPHA2 [30]. Exon deep-sequencing detected six novel missense SNVs and eight reference missense SNVs in the cataract case-control panel that were predicted to be damaging (Table 1). However, the relatively small number of individuals in the cataract case-control panel that harbored these damaging EPHA2 SNVs (n < 20) limited the power of this study to detect disease association. For example, of two novel SNVs located in the extracellular LBD of EPHA2 one (p.I142T) was present in a control, while the other (p.G171E) occurred in a case with cortical cataract. Similarly, one of the reference missense SNVs, rs116506614 (c.2162G>A, p.R721Q), located in the TK domain of EPHA2, that has previously been associated with age-related cortical cataract [26], was present in a case with cortical cataract and in a control from our cataract case-control panel. Overall, while it is possible that such control individuals may be pre-symptomatic for age-related cataract, we note that other putatively deleterious SNVs were found only in controls, whereas putatively benign SNVs were present in cases (Table 1), rendering simple genotype-phenotype correlations inconclusive. Second, we profiled putative somatic SNVs in EPHA2 (VAF ≥ 3%) that arose in post-mortem lenses procured from Caucasian donors over 48 years of age (Tables 3 and 4).
Paired analysis of right and left lenses from the same individual for discordant SNVs, analogous to that of matched tumor versus control tissues, detected 19 novel missense SNVs in a clear lens panel (35 pairs) and 13 novel missense SNVs in a cataract lens panel (22 pairs) that were predicted to be damaging (Tables 3 and 4). By comparison, the same paired-lens analysis of TP53 for discordant SNVs yielded predominantly reference somatic SNVs found in the COSMIC database and no novel SNVs that were predicted to be damaging (S7 and S8 Tables). This difference in SNV profile between the two genes likely reflects the high frequency of somatic mutations identified in TP53 versus EPHA2. Currently, the COSMIC database lists over 29,480 somatic mutations in TP53 including 17,166 missense substitutions that have been detected in multiple tumor samples (e.g. cutaneous melanoma) at relatively high frequencies (~27%). By contrast, EPHA2 harbors some 275 somatic mutations including 164 missense substitutions that have been detected in multiple tumor samples (e.g. stomach, intestine, skin), at relatively low frequencies (typically < 5%) (http://cancer.sanger.ac.uk/cosmic). These observations suggest that novel somatic variants in EPHA2 that are predicted to be functionally deleterious are detectable in aging human lenses. Overall, our data are in agreement with a recent study that employed targeted-hybridization deep-sequencing of human lens epithelial samples to identify somatic variants in a panel of 151 cancer-related genes [61]. To the best of our knowledge, this is the first report of putative somatic mutations in a lens-expressed gene causally implicated in age-related cataract. However, since rudimentary statistical analysis confirmed that somatic SNVs in EPHA2 were present at comparable frequencies in both clear lenses and those with age-related cataract we are unable to determine if such variants are causative for disease.
A striking feature of both the germ-line and the somatic missense SNVs in EPHA2 detected here was the high frequency of transitions (C/T, G/A) versus transversions (G/C, G/T, A/C, A/T). Theoretically, transversions should occur twice as often as transitions; however, a review of the germ-line variation annotated in the EPHA2 reference sequence reveals that the vast majority of missense variants involve C/T or G/A transitions (http://www.ncbi.nlm.nih.gov/variation/view/). The occurrence of somatic C>T transitions is of particular interest since they may result from exposure to solar UV radiation [60]. Absorption of solar UV radiation (95% UV-A, 5% UV-B) by DNA promotes the formation of photodimeric lesions, mostly cyclobutane pyrimidine dimers (CPDs), at adjacent pyrimidine bases (C and T) that may escape nucleotide excision repair leading to base substitution and generation of UV-signature mutations (C>T or CC>TT) during DNA replication [62]. Among the somatic missense SNVs detected in our lens panel (clear and cataract) many of the C>T changes (G>A on the complementary strand) were present at di-pyrimidine (diPy) sites (CT, TC, CC) in both EPHA2 and TP53, raising the possibility that they represent UV-signature mutations (Tables 3 and 4 and S7 and S8 Tables). While there was no significant association between these somatic SNVs and cataract in our lens panel, epidemiological studies have established that lifetime exposure to solar UV radiation (particularly UV-B) is a significant risk factor for cortical cataract, particularly within the lens nasal quadrant [63,64]. In addition, UV-A radiation has been implicated in the increased prevalence of left-sided cortical cataract and facial skin cancer, likely due, in part, to increased exposure while operating left-hand drive vehicles [65]. Further, it has been suggested that oxidative stress secondary to solar UV exposure might contribute to age-related cataract [66]. However, since the cornea effectively absorbs most solar UV-B radiation (290-320 nm) and the levels of CPDs in lens epithelia obtained from cataract patients have been reported to be relatively low compared to those of oxidized purines, the cause-effect relationship between solar UV exposure and age-related cataract remains unclear [67,68]. Future studies of somatic variants, including UV-signature mutations, in EPHA2 and over 30 other known cataract genes, including those for crystallins (e.g. CRYAA), connexins (e.g. GJA8) and ocular transcription factors (e.g. HSF4) [14,15] may provide new insights regarding the molecular genetic mechanisms underlying age-related cataract.
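The two sequence-level ideas used in this discussion, the expected 2:1 ratio of transversions to transitions and the di-pyrimidine criterion for candidate UV-signature sites, can be made concrete with a small Python sketch. The function names are hypothetical and the rule for G>A changes (checking the complementary strand) is our reading of the description above, not code from the study.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def substitution_counts():
    """Count possible transitions and transversions among all 12 base changes."""
    substitutions = list(permutations("ACGT", 2))
    transitions = sum(is_transition(ref, alt) for ref, alt in substitutions)
    return transitions, len(substitutions) - transitions


def at_dipyrimidine_site(context, index):
    """Check whether the base at `index` of a reference-strand sequence lies at a
    di-pyrimidine site (CT, TC or CC).

    A C is tested directly for an adjacent pyrimidine. A G is tested on the
    complementary strand, where it pairs with C, which is equivalent to the G
    having an adjacent purine on the reference strand.
    """
    base = context[index]
    neighbours = [context[i] for i in (index - 1, index + 1)
                  if 0 <= i < len(context)]
    if base == "C":
        return any(n in PYRIMIDINES for n in neighbours)
    if base == "G":
        return any(n in PURINES for n in neighbours)
    return False


transitions, transversions = substitution_counts()
```

Enumerating all substitutions gives 4 possible transitions and 8 possible transversions, which is the basis for the theoretical expectation that transversions occur twice as often if every change were equally likely.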
Supporting information S1 Table.