Molecular Diagnosis of Putative Stargardt Disease by Capture Next Generation Sequencing

Stargardt Disease (STGD) is the commonest genetic form of juvenile or early adult onset macular degeneration, which is a genetically heterogeneous disease. Molecular diagnosis of STGD remains a challenge in a significant proportion of cases. To address this, seven patients from five putative STGD families were recruited. We performed capture next generation sequencing (CNGS) of the probands and searched for potentially disease-causing genetic variants in previously identified retinal or macular dystrophy genes. Seven disease-causing mutations in ABCA4 and two in PROM1 were identified by CNGS, providing a confident genetic diagnosis in these five families. We also provide a genetic basis to explain the differences among putative STGD cases caused by various mutations in different genes. In addition, we show for the first time that compound heterozygous mutations in the PROM1 gene can cause cone-rod dystrophy. Our findings support the enormous potential of CNGS in the molecular diagnosis of putative STGD.


Introduction
Stargardt disease (STGD) is the most frequent cause of macular degeneration in childhood, with a prevalence of approximately 1:10000 [1]. It is usually diagnosed within the first two decades of life and leads to progressive, irreversible loss of central vision, delayed dark adaptation and a poor final visual outcome. STGD is predominantly inherited as an autosomal recessive trait with mutations in ABCA4, also known as ABCR, although an autosomal dominant form has also been reported [2]. Rare cases of STGD or "Stargardt-like" disease phenotypes have been reported with mutations in PROM1, PRPH2, VMD2 (also known as BEST1) and ELOVL4, which are involved in various physiological pathways that are important for macular function [3]. This complex arena of genes and clinical features complicates the nomenclature in this field [3]; it is unclear how to classify individuals with the classic Stargardt phenotype. Classic STGD should be restricted to cases caused by ABCA4 mutations, and "Stargardt-like" or juvenile macular dystrophy should be used for other genetic etiologies. For the purposes of this study, we classify our participants with early-onset macular degeneration as "putative STGD" cases.
Stem cell-based therapy shows great promise for the treatment of STGD [4]. Accurate molecular diagnosis is therefore essential for the selection of patients for clinical trials, and is also crucial for prenatal STGD diagnosis. However, the genetic diagnosis of individuals with putative STGD is an ongoing challenge because of the relatively large sizes of some of the genes involved. ABCA4 and PROM1 are particularly large, containing 50 and 26 exons, respectively. VMD2, ELOVL4 and PRPH2 have 8, 6 and 2 exons, respectively.
Furthermore, although biallelic mutations in ABCA4 are found in most patients with autosomal recessive STGD, studies have shown that mutations in the ABCA4 gene are responsible for a wide variety of other retinal dystrophy phenotypes, such as cone-rod dystrophy (CRD) and retinitis pigmentosa (RP) [5,6]. It has also been proposed that individuals carrying mutations in ABCA4 may have a higher risk of developing age-related macular degeneration (AMD) [1,7]. We therefore sought to investigate whether retinal disease genes other than these five reported genes could lead to putative STGD.
In this study we initially selected known retinal disease genes as a gene capture panel and applied a capture next generation sequencing (CNGS) approach to identify genetic defects in seven putative STGD patients from five independent families. This approach was used to test whether additional retinal disease genes could lead to putative STGD.

Patient Recruitment
This study conformed to the tenets of the Declaration of Helsinki and was approved by the Ethics Committee of Eye Hospital, Wenzhou Medical University. Written informed consent was obtained from all participating individuals or their guardians. Patients from five families (A, B, C, D and E; Figure 1) were recruited. An ophthalmic examination was performed for each patient. Electroretinography (ERG) and optical coherence tomography (OCT) were performed as part of the routine retinal examination. A 5 ml venous blood sample was drawn from every subject into an ethylenediamine tetraacetic acid (EDTA) sample tube. Genomic DNA was extracted from peripheral blood leukocytes using a standard phenol/chloroform extraction protocol.

Targeted Exome Illumina Library Preparation
Genomic DNA was purified and quantified with Nanodrop 2000 (Thermo Fisher Scientific, DE). The generation of a targeted exome Illumina Library was performed according to the manufacturer's protocol (MyGenostics, Beijing, China). A final library size of 350-450 bp including adapter sequences was selected.

Additional Sequencing
Targeted amplification of ABCA4 and PROM1 sequences was performed using PCR (primer sequences and amplification conditions are given in Table S1 in File S1). PCR products were sequenced on an ABI PRISM 3730 DNA Sequencer. For mutations, nucleotide numbering reflects cDNA numbering with +1 corresponding to the A of the ATG translation initiation codon in the reference sequence, according to journal guidelines (www.hgvs.org/mutnomen). The initiation codon is codon 1.
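The numbering convention above can be illustrated with a minimal Python sketch. The function names and the example position are illustrative assumptions and are not part of the study's pipeline; the sketch only handles positions inside the coding sequence.

```python
def codon_number(cdna_pos: int) -> int:
    """Map a coding-DNA position (c.1 = A of the ATG) to its codon number.

    The initiation codon is codon 1, so c.1-c.3 map to codon 1,
    c.4-c.6 map to codon 2, and so on.
    """
    if cdna_pos < 1:
        raise ValueError("only coding positions (>= 1) are handled in this sketch")
    return (cdna_pos - 1) // 3 + 1


def position_in_codon(cdna_pos: int) -> int:
    """Return 1, 2 or 3: the place of the nucleotide within its codon."""
    if cdna_pos < 1:
        raise ValueError("only coding positions (>= 1) are handled in this sketch")
    return (cdna_pos - 1) % 3 + 1


# Illustrative position: the first base of codon 607.
print(codon_number(1819), position_in_codon(1819))
```

Under this convention a change at the first base of codon 607 is written at cDNA position 1819, which is how a nucleotide-level description and a protein-level description of the same variant can be cross-checked.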

Clinical Data
A total of seven patients (two females and five males, Figure 1) from five independent families were recruited for this study. Clinical summaries, including visual acuity, age at recruitment, gender and relevant ophthalmological findings, are given in Table 1, Figure 2 and Figure S1 in File S1. No night-blindness was observed in any of the enrolled patients. We noticed that the onset of disease in the patients from family E was later than that in the other patients, which led us to suspect a distinct genetic etiology in this family.

Mutation Analysis
We first assessed the quality of the CNGS results. On average, a mean coverage of 200× over the targeted region was achieved (Figure 3). Manual checking of the sequencing depth of the known STGD genes (ABCA4, PROM1, PRPH2, VMD2 and ELOVL4) (Table S2 in File S1) showed that a mean coverage of 216× was obtained. We observed missing coverage of one exon each in the PROM1 (exon 24) and VMD2 (exon 4) genes. We used Sanger sequencing for these two missing exons (primer sequences are shown in Table S1 in File S1).
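The coverage check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis actually used in the study: the function name, the `min_depth` threshold and all depth values are hypothetical toy data.

```python
from statistics import mean


def summarize_coverage(exon_depths, min_depth=1.0):
    """Summarize per-base depths keyed by (gene, exon).

    Returns the overall mean depth, the mean depth per exon, and the
    exons whose mean depth falls below min_depth (candidates for
    Sanger sequencing).
    """
    exon_means = {key: mean(depths) for key, depths in exon_depths.items()}
    all_bases = [d for depths in exon_depths.values() for d in depths]
    missing = sorted(key for key, m in exon_means.items() if m < min_depth)
    return mean(all_bases), exon_means, missing


# Hypothetical toy depths; real exons contain far more bases.
toy_depths = {
    ("ABCA4", 13): [220, 230, 210],
    ("PROM1", 23): [210, 190, 200],
    ("PROM1", 24): [0, 0, 0],
    ("VMD2", 4): [0, 0, 0],
}
overall, per_exon, missing = summarize_coverage(toy_depths)
print(overall, missing)
```

Reporting the uncovered exons separately from the overall mean matters because a high mean depth can hide exons with no reads at all.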
Given that autosomal recessive STGD is largely caused by homozygous or compound heterozygous mutations, we initially scanned the five reported STGD genes for mutations in our dataset. Sanger sequencing was used to validate any pathogenic mutations identified, and the segregation of mutations was tested in the familial cases. For any missense variants identified, computational prediction by three algorithms was used to refine the list of candidate mutations.
The mutations identified in these cases are summarized in Table 2. Briefly, seven mutations (one homozygous) in ABCA4 and two mutations in PROM1 were successfully identified in the STGD families via CNGS, Sanger sequencing and co-segregation analysis (Figure 1, Figure 4, Figure S2 in File S1 and Table S1 in File S1). Furthermore, multiple sequence alignments showed that the missense mutations in ABCA4 were located within phylogenetically conserved regions (Figure 5).
With the exception of family C, all families had compound heterozygous mutations. Additionally, we could determine whether each mutation originated from the paternal or the maternal allele and thus trace its origin (Figure 1, Figure 4). No de novo mutations were identified in these families.
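The logic of distinguishing homozygous, compound heterozygous and de novo findings from trio genotypes can be sketched as below. This is a simplified illustration, not the authors' procedure: the genotype encoding ('ref', 'het', 'hom'), the function name and the example variants are all hypothetical.

```python
def classify_recessive_genotype(variants):
    """Classify a proband's candidate variants within one gene.

    Each variant is a dict with keys 'id', 'proband', 'father', 'mother';
    genotypes use a hypothetical encoding: 'ref', 'het' or 'hom'.
    """
    if any(v["proband"] == "hom" for v in variants):
        return "homozygous"

    hets = [v for v in variants if v["proband"] == "het"]
    # A de novo variant is carried by the proband but by neither parent.
    if any(v["father"] == "ref" and v["mother"] == "ref" for v in hets):
        return "possible de novo variant"

    paternal = [v for v in hets if v["father"] != "ref" and v["mother"] == "ref"]
    maternal = [v for v in hets if v["mother"] != "ref" and v["father"] == "ref"]
    if paternal and maternal:
        # One variant on each parental allele: the two are in trans.
        return "compound heterozygous"
    if len(hets) >= 2:
        return "phase unresolved"
    return "single heterozygous" if hets else "no candidate"


family = [
    {"id": "variant_1", "proband": "het", "father": "het", "mother": "ref"},
    {"id": "variant_2", "proband": "het", "father": "ref", "mother": "het"},
]
print(classify_recessive_genotype(family))
```

Requiring one variant from each parent is what separates a true compound heterozygote from two variants that sit on the same allele.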
We searched for the identified mutations in multiple databases, including the 1000 Genomes Project (1000G, http://www.1000genomes.org/), ESP6500 (http://evs.gs.washington.edu/EVS/) and an in-house exome database of 702 samples used as normal controls. The candidate mutations were absent from these databases or present at extremely low frequency (Table 2). In these families, patients with PROM1 mutations had later disease onset than patients with ABCA4 mutations (17 years old vs. 6 to 13 years old).
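A database-frequency filter of this kind can be sketched in Python. The sketch assumes the 5% allele-frequency cutoff stated in the Discussion; the function name, the dictionary layout and the toy frequencies are hypothetical and do not reproduce Table 2.

```python
def is_rare(frequencies, max_af=0.05):
    """Return True if a variant is absent (None) or at or below max_af
    in every database that was queried."""
    return all(af is None or af <= max_af for af in frequencies.values())


# Hypothetical toy frequencies; None means the variant is not listed.
candidates = {
    "variant_a": {"1000G": None, "ESP6500": 0.00008, "in_house": None},
    "variant_b": {"1000G": 0.12, "ESP6500": 0.10, "in_house": 0.11},
}
rare = [name for name, freqs in candidates.items() if is_rare(freqs)]
print(rare)
```

Treating an absent variant as passing the filter reflects the text above, where candidates were either not present in the databases or present at extremely low frequency.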
In our cohort of putative STGD patients, no mutations were identified in retinal disease genes other than these five reported genes. We did observe additional heterozygous DNA variants (Table S3 in File S1).

Clinical Re-evaluation After Mutation Identification
After the mutations were identified, we re-evaluated the patients, particularly those in family E, because compound heterozygous PROM1 mutations had not previously been reported in putative STGD. We retrieved the clinical data from these families and confirmed that the diagnosis of the patients from families A-D was STGD. In family E, the retinal vessels of the patients were moderately attenuated (Figure 2) and macular atrophy was pronounced. The electroretinogram (ERG) showed that both cone and rod responses were affected, with the cone responses more severely affected than the rod responses (Figure 6A). Optical coherence tomography (OCT) showed severe atrophy of the photoreceptor segments and the retinal pigment epithelium (Figure 6B). Visual field testing showed central scotomas, while the periphery was spared (Figure S3 in File S1). The Arden ratios of the electro-oculogram (EOG) were 1.2 and 1.5, respectively (Figure S4 in File S1), compared with the normal value (1.8). Taken together, based on the clinical manifestations, the final diagnosis of the patients from family E was cone-rod dystrophy (CRD). This highlights that an accurate clinical diagnosis should be based on all the available clinical data, because the phenotypes of STGD and CRD overlap substantially (Table S4 in File S1). It also shows that CNGS is well suited to determining the genetic cause of a heterogeneous disease, since it screens all genes on the panel without bias toward a single candidate gene. CNGS may also support the accurate clinical diagnosis of heterogeneous diseases even when the researcher does not have access to clinically well-characterized patients with different forms of retinal disorders.

Discussion
STGD is the most common recessively inherited macular dystrophy of childhood. ABCA4 was the first disease gene linked with STGD, identified in 1997 by Allikmets et al. [9], and since then many additional mutations have been identified [1-3,7,10]. Here we recruited five families and found that four of them have ABCA4 mutations, which indicates that screening for ABCA4 mutations is very informative in STGD families.
So far, a variety of mutation detection techniques have been reported for STGD, including SSCP (single-strand conformation polymorphism)/heteroduplex analysis, high resolution melting, microarray, direct Sanger sequencing, PCR-Next-Generation Sequencing (PNGS) and whole exome sequencing (WES) [2,3,6,10-15]. With the exception of PNGS and WES, these methods are labor intensive or low throughput. Although PNGS has the advantage of high throughput, it may be challenging to amplify all the reported gene fragments in one tube. The bioinformatics analysis of WES results is still challenging for most laboratories, and the cost may be prohibitive. In contrast, CNGS allows for the comprehensive molecular diagnosis of these heterogeneous genetic diseases and has the advantages of speed (exons of 144 disease genes sequenced at one time) and cost-effectiveness (less than 1/40 of the cost of Sanger sequencing); here we demonstrated the usefulness of this approach.
We observed missing coverage of some exons in the CNGS-based molecular diagnosis of STGD, which indicates that the method still has limitations for clinical genetic diagnosis. It has been reported that, even with deep sequencing, some regions lack coverage [16,17]. The main reason may be the PCR step, which amplifies GC-rich or repetitive fragments unevenly under standard PCR conditions. We analysed the exons with missing coverage in the two genes (PROM1 and VMD2) and found a run of repeated A nucleotides in PROM1 and a GC content of more than 60% in the corresponding exon of VMD2. To fill in the missing data, we designed specific primers to amplify these fragments. The present results also suggest that the coverage of the targeted sequence should be checked before searching for disease-associated mutations, even though in this study no mutations were found in the regions with missing coverage.
Defining a "disease-associated" mutation is a difficult task, particularly if no simple functional assays to determine the phenotypic effects of specific variations are readily available [1]. In general, we use the following criterion: if the allele frequency of a mutation is over 5% in the general population, as identified by bioinformatics analysis of multiple databases, we treat it as non-pathogenic, since the prevalence of STGD is approximately 1:10000. We also checked whether each mutation had been reported in the literature or was novel. In this study, we identified nine mutations, five novel and four previously reported, which expands the mutation spectrum of ABCA4 and PROM1.
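The relationship between disease prevalence and the frequency cutoff can be made explicit with a short calculation. The sketch below assumes Hardy-Weinberg equilibrium and a fully penetrant, single-locus recessive disease, which is a simplification the text does not state; the function names are illustrative.

```python
import math


def max_disease_allele_frequency(prevalence):
    """Under Hardy-Weinberg equilibrium, the prevalence of a fully
    penetrant recessive disease equals q**2, where q is the combined
    frequency of all disease alleles. Any single disease allele
    therefore cannot be more frequent than q."""
    return math.sqrt(prevalence)


def carrier_frequency(q):
    """Expected frequency of heterozygous carriers, 2pq with p = 1 - q."""
    return 2 * q * (1 - q)


q = max_disease_allele_frequency(1 / 10000)
print(round(q, 4), round(carrier_frequency(q), 4))
```

With a prevalence of about 1:10000 the combined disease allele frequency is about 1%, so a 5% cutoff is a conservative filter: a variant above it is several times too common to be a cause of a disease this rare under these assumptions.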
Analysis of disease allele frequency in specific populations is important for clinical genetic diagnosis. There are reports that "population-specific" ABCA4 alleles, such as p.G863A/delG863, are founder mutations in Northern European patients. We searched the literature for all nine mutations and found that five were novel and four had been previously reported. Among the four reported mutations, p.A1773V in ABCA4 was reported as one of the founder mutations (up to 17%) in the Latin American population [18]; p.R2038W was reported in the USA, Estonian and South African populations; p.R602W in the USA and South African populations [2,3,19]; and p.G607R in the German population [20]. Taken together, this study confirmed that these four mutations are pathogenic, and p.A1773V, p.R2038W and p.R602W may have higher allele frequencies since they have been reported frequently in different populations. Two of the mutations (p.R2038W and p.G607R) [1,2] have extremely low allele frequencies in databases (0.000080 and 0.000077, respectively; Table 2), while no allele frequency data (only an upper bound of <1/1404) are available for the Chinese population owing to the relatively small in-house sample size. Further studies are needed to establish the real allele frequencies of the mutations identified in this study. That none of these mutations was detected in our in-house exome database of 702 samples is consistent with the prevalence of STGD (approximately 1:10000). We speculate that the p.G607R mutation may have a higher allele frequency, because the patient from family C is homozygous for p.G607R and the parents come from different regions. One previous study of STGD in the Chinese population screened part of the ABCA4 coding sequence (15 exons) and identified two relatively common mutations: T1428M and R2040X [21].
To further clarify the ABCA4 mutation spectrum in the Chinese population, studies with larger sample sizes are still needed.
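The limit that a 702-sample control set places on allele-frequency estimates can be quantified with a short sketch. It assumes an autosomal locus (two alleles per sample) and independent alleles; the function name and the 95% confidence level are illustrative choices, not values from the study.

```python
def zero_count_upper_bound(n_alleles, confidence=0.95):
    """Exact one-sided upper bound on an allele frequency p when the
    variant is seen in 0 of n_alleles: solve (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_alleles)


n_samples = 702
n_alleles = 2 * n_samples          # autosomal locus: two alleles per sample
resolution = 1 / n_alleles         # smallest non-zero frequency observable
upper = zero_count_upper_bound(n_alleles)
print(n_alleles, round(resolution, 6), round(upper, 5))
```

Seeing no carriers among 1404 alleles therefore only shows that the frequency is probably below roughly 0.2%, which is why a variant's absence from a control set of this size cannot by itself establish its true population frequency.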
So far, more than 200 disease-associated ABCA4 variants have been identified (http://www.uniprot.org/uniprot/P78363). We manually mapped these mutations to the ABCA4 protein and found that the majority are located in extracellular and intracellular loops (Figure S5 in File S1). There are four mutation-rich regions at the protein level, which suggests that they lie in key functional regions of ABCA4 (Figure S6 in File S1). For genetic diagnosis, it is meaningful to scan the mutation-rich exons. Therefore, we manually mapped all the mutations to exons and found that five exons (3, 13, 22, 29 and 47) have more mutations per unit length than the other exons (Figure S7 in File S1). This indicates that these exons may be prioritized for the detection of mutations in ABCA4.
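Ranking exons by mutations per unit length can be sketched as follows. The function name and all counts and lengths are hypothetical toy values chosen for illustration; they are not the ABCA4 data behind Figure S7.

```python
def rank_exons_by_mutation_density(mutation_counts, exon_lengths, top_n=5):
    """Rank exons by reported mutations per base pair.

    mutation_counts: exon number -> number of reported mutations
    exon_lengths:    exon number -> exon length in base pairs
    Exons with no reported mutation get a density of 0.
    """
    densities = {
        exon: mutation_counts.get(exon, 0) / length
        for exon, length in exon_lengths.items()
    }
    ranked = sorted(densities, key=lambda exon: densities[exon], reverse=True)
    return ranked[:top_n], densities


# Hypothetical toy data for three exons.
toy_counts = {1: 2, 2: 10, 3: 3}
toy_lengths = {1: 100, 2: 200, 3: 30}
top, density = rank_exons_by_mutation_density(toy_counts, toy_lengths, top_n=2)
print(top)
```

Normalizing by exon length prevents long exons from ranking first simply because they offer more positions at which a mutation can occur.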
From our study of five families, patients with PROM1 mutations have a later age of onset than patients with ABCA4 mutations. This may be a clinically relevant observation. We reviewed the literature and found that mutations in PROM1 can lead to several diseases (Table 3), including Stargardt disease (STGD4) [22,23], retinitis pigmentosa (RP41) [24][25][26][27][28], autosomal dominant cone-rod dystrophy (CORD12, CRD) [16], macular dystrophy (MCDR2) [29] and autosomal recessive cone-rod dystrophy (CRD) [21]. The major differences among these four diseases (STGD, CRD, MCDR2 and RP41) lie in the full-field ERG results and the fundus appearance, while night-blindness and age of onset may also be helpful for the clinical diagnosis. CRD is a panretinal photoreceptor degenerative disorder with predominant loss of cone function that affects the macula early in its course, while STGD is a progressive bilateral atrophy of the retinal pigment epithelium in the macula, with accumulation of a lipofuscin-like substance in the retinal pigment epithelium and a reduced foveal cone ERG. Full-field ERG is the key test, particularly when patients are asymptomatic and show a normal fundus at early stages, because it can distinguish the degree to which cone function and rod function are affected. In other words, cone function as measured by ERG would be affected earlier or more severely in CRD than in STGD.
Combining comprehensive clinical examination with genetic diagnosis, this study is, to our knowledge, the first report to show that compound heterozygous mutations in PROM1 can lead to CRD. It also demonstrates that genetic testing can help to improve the diagnostic accuracy for heterogeneous diseases.
In the present study, we did not identify mutations in additional retinal disease genes as a cause of putative STGD. This may be due to the limited sample size of our study; it also suggests that if additional causative genes for STGD exist, they may account for a relatively small fraction of cases. Since no data are available from similar reports, the real fraction still needs to be investigated. Moreover, we cannot exclude novel genes beyond the scope of existing knowledge of retinal diseases. Our results also indicate that it is worthwhile to screen ABCA4 for mutations before screening all retinal disease genes, since ABCA4 has to date been the main gene underlying this disorder.
In summary, we have demonstrated the utility of the CNGS approach for the molecular diagnosis of putative STGD in five independent families, with successful identification of disease-causing mutations, including five novel mutations. The study also provides a genetic basis for the differences among putative STGD patients caused by different mutations in different genes, which is a significant advance in the clinical genetic diagnosis of putative STGD. We showed for the first time that compound heterozygous mutations in PROM1 can cause cone-rod dystrophy. Our findings support the enormous potential of CNGS in the molecular diagnosis of putative STGD. With the progress of next generation sequencing technology, sequencing quality will improve and cost will decrease dramatically; these two factors are currently the key bottlenecks for the application of CNGS to clinical genetic testing. Here we only showed CNGS results from five families, and more studies should be performed before CNGS is applied as a routine clinical genetic testing method.

Supporting Information
File S1 contains the following files: Figure S1. Color fundus photographs of the patients. Color fundus photographs of the probands from family A (A), family C (B) and family D (C). All images showed bull's-eye maculopathy, which is one of the criteria for the diagnosis of Stargardt disease. Figure S2. DNA sequence chromatograms of the controls. Red arrows indicate the positions of the mutations identified in this study (the wild-type sequence is shown here). Figure S3. Visual field testing of the patient from family E. Visual field testing showed central scotomas, while the periphery was spared. Figure S4. Electro-oculogram of the patient from family E. The Arden ratios of the electro-oculogram (EOG) were decreased (1.2 and 1.5), compared with the normal value (1.8). Figure S5. ABCA4 mutations at the protein level. The mutations were mapped to each domain of the ABCA4 protein. Each red line represents one mutation. Blue lines represent mutations identified in this study. Figure S6. ABCA4 mutations and their relative frequencies at the protein level. The mutations were mapped to each domain of the ABCA4 protein. There are four mutation-rich loops (intracellular loop 4, intracellular loop 7, extracellular loop 1 and extracellular loop 5), which suggests that they lie in key functional regions of ABCA4. Figure S7. The mutations in each exon and their relative mutation rate in ABCA4-related diseases. We mapped all the ABCA4 mutations to its 50 exons and found that five exons (3, 13, 22, 29 and 47) have more mutations per unit length than the other exons. Table S1. PCR information for the amplification of the ABCA4, PROM1 and VMD2 genes. Table S2. Capture Next Generation Sequencing of the ABCA4, PROM1, PRPH2, VMD2 and ELOVL4 genes. Table S3. Additional DNA variants identified by CNGS.