Mutation Detection in Patients with Retinal Dystrophies Using Targeted Next Generation Sequencing

Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using an RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved, including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes.


Introduction
Retinal dystrophies (RD) are among the disorders with the highest level of heterogeneity. This includes genetic heterogeneity, allelic heterogeneity as well as clinical heterogeneity. Molecular genetic studies in the last two decades revealed ~225 genes that are mutated in one or more of the various clinical subtypes of RD (https://sph.uth.edu/retnet/). Some of the clinical subtypes of RD can be caused by mutations in up to 60 different genes, e.g. in retinitis pigmentosa (RP). Adding to the genetic complexity, there is considerable variation in clinical expression and overlap of symptoms of single disease entities, all of which may hamper making an exact clinical diagnosis. These obstacles also have practical implications for molecular diagnostics. Because it is difficult to predict the gene likely to be mutated, a gene-by-gene screening approach in RD patients is neither time- nor cost-efficient. On the other hand, establishing a molecular diagnosis is important for several reasons. It is vital for determining the recurrence risk for future children and therefore provides the basis for accurate genetic counseling. In many instances, it will also help to predict the clinical course, which is of central importance for the patients to plan and organize their professional and social lives. There is no effective cure for RD; however, ongoing clinical trials applying gene-replacement therapy approaches for several forms of RD have raised new hopes. Since these approaches require the identification of the causative mutation, the genetic diagnosis is an essential prerequisite.
The identification of the genetic defect in RD patients has been accelerated by the introduction of next-generation sequencing (NGS) technologies. Among NGS approaches, the targeted capture of known disease genes ("disease panels") has proven superior in terms of coverage compared with whole exome sequencing (WES), especially for previous generations of exome capture reagents [1]. Exome capture kits of newer generations, however, show improved performance and also offer the possibility to discover novel genes. On the other hand, neither conventional "disease panels" nor WES cover non-coding regions. Whole genome sequencing (WGS), besides its ability to sequence non-coding regions of the genome, has also been shown to outperform WES in the coding regions [1], but involves higher storage and analysis costs and is still challenging in terms of bioinformatic analysis.
In this study, we used a retinal capture panel, WES, and in one case WGS, to analyze, in a research context, 89 unrelated cases with different forms of RD. Our results have important implications for the design and analysis strategy of routine genetic diagnostics in RD.

Clinical diagnosis and sample collection
The clinical diagnosis of RD was established by ophthalmological and/or electrophysiological examination in different clinical centers. The majority of patients were examined in the outpatient clinic for Inherited Retinal Dystrophies at the Centre for Ophthalmology (Tuebingen, Germany). Others were clinically diagnosed at the outpatient clinic for Retinal Dystrophies at the University Eye Hospitals in Munich, Freiburg, and Berlin. Several cases were from Sweden, Denmark, France, Hungary and the USA. Genomic DNA of patients was extracted from peripheral blood using standard protocols. Samples from all patients and family members were recruited in accordance with the principles of the Declaration of Helsinki and were obtained with written informed consent accompanying the patients' samples. The study was approved by the institutional review board of the Ethics Committee of the University Hospital of Tuebingen.
Except for a few cases, most families had two or more affected members. For the sake of brevity, in the following, we refer to multiple patients from one family as one case.

Panel sequencing
We used a capture panel of 105 retinal disease genes (RD panel) to analyze 50 cases. Details of panel design, library preparation, capture sequencing and variant calling have already been published [2].

Exome sequencing
We performed duo-based WES (two affected family members) in a cohort of 84 RD patients from 42 families, while in four cases only one exome was sequenced. For one family, WES was performed for three affected family members (pedigree LCA70 is depicted in Fig 1). Eight cases had been previously analyzed by our RD panel. Exomes were enriched using the SureSelect XT Human All Exon 50 Mb kit, versions 4 or 5 (Agilent Technologies, Santa Clara, CA, USA). Sequencing was performed on HiSeq 2500 systems (Illumina, San Diego, CA, USA). Reads were aligned against the human assembly hg19 (GRCh37) using Burrows-Wheeler Aligner version 0.7.5 [3]. We performed variant calling using SAMtools version 0.1.18 [4], PINDEL version 0.2.4t [5] and ExomeDepth version 1.0.0 [6]. Subsequently, variants were filtered using the SAMtools varFilter script and custom scripts. Briefly, only SNVs and indels in coding regions (nonsense, missense and canonical splice site variants as well as frameshift indels) having a potential effect on protein function in silico (assessed using predictions from PolyPhen-2 [http://genetics.bwh.harvard.edu/pph2/], SIFT [http://sift.bii.a-star.edu.sg/] and CADD [http://cadd.gs.washington.edu/]) were considered. From those, only private variants or those with a minor allele frequency <1% in a cohort of more than 66000 control individuals (ExAC Browser [http://exac.broadinstitute.org/]; and 6742 in-house exomes) were kept for subsequent analyses.
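The filtering logic described above can be illustrated with a minimal sketch. It is not the custom script used in the study: the `Variant` record, the consequence labels, the single `predicted_damaging` flag (standing in for however the PolyPhen-2, SIFT and CADD predictions were combined) and the toy gene names are all hypothetical. Only the consequence classes and the 1% minor allele frequency cutoff are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence classes retained by the filter (labels are hypothetical).
KEPT_CONSEQUENCES = {"nonsense", "missense", "canonical_splice_site", "frameshift_indel"}
MAF_CUTOFF = 0.01  # keep private variants or those with MAF < 1%


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    predicted_damaging: bool      # assumed summary of the in silico predictions
    exac_maf: Optional[float]     # None = absent from ExAC
    inhouse_maf: Optional[float]  # None = absent from the in-house exomes


def is_rare(maf: Optional[float]) -> bool:
    """A variant is rare if it is absent from the control set or below the cutoff."""
    return maf is None or maf < MAF_CUTOFF


def passes_filter(v: Variant) -> bool:
    """Apply the consequence, prediction and frequency criteria in sequence."""
    return (
        v.consequence in KEPT_CONSEQUENCES
        and v.predicted_damaging
        and is_rare(v.exac_maf)
        and is_rare(v.inhouse_maf)
    )


# Toy data for illustration only; these are not variants from the study.
toy_variants = [
    Variant("GENE_A", "missense", True, None, None),            # private
    Variant("GENE_B", "frameshift_indel", True, 0.002, None),   # rare
    Variant("GENE_C", "missense", True, 0.05, 0.04),            # too common
    Variant("GENE_D", "synonymous", False, None, None),         # class not retained
]
kept = [v.gene for v in toy_variants if passes_filter(v)]
print(kept)
```

Requiring rarity in both control sets is one possible reading of the frequency criterion; a pipeline could equally pool the two cohorts before computing the frequency.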

Genome sequencing
One family was analyzed by whole genome sequencing. Details have already been published [7].

Molecular validation of the candidate variants
All putative mutations identified by exome sequencing were validated using conventional Sanger sequencing according to the manufacturer's protocols (3130XL Sequencer, Applied Biosystems, Weiterstadt, Germany) and tested for co-segregation within kinships.
Validation of the large deletion in the PRPF31 gene was performed in the seven affected and eleven unaffected members of family ADRP32 using a long-distance PCR assay. To refine the breakpoint, we used a forward primer located in exon 3 (aagcaagccaaagcttcaga) and a reverse primer located in exon 14 (cctgtgggttcacaatctcc). For amplification, we applied a long-distance PCR protocol using 80 ng of genomic DNA in a total volume of 25 μl containing 0.2 μM of each primer, 400 μM of each dNTP, LA Buffer (1X, without MgCl2), 0.5 mM MgCl2, and 2.5 U TaKaRa LA Taq DNA polymerase (Takara Bio Europe, Saint-Germain-en-Laye, France). Thermal cycling was performed with the following conditions: 1 min at 94°C followed by 14 cycles of 10 s at 98°C, 15 s at 55°C and 4 min at 68°C, a further 16 thermal cycles with an increment of 4 s/cycle for the elongation step, and a final add-on elongation for 10 min at 68°C.
Validation of the large deletion encompassing exons 15-22 of the EYS gene was performed using two duplex PCRs in the two affected and three unaffected members of family ARRP28. Due to the large intron sizes, breakpoints were not precisely defined. Briefly, primers were designed to co-amplify exons 14 and 15 in one PCR reaction, and exons 22 and 23 in another PCR reaction, respectively.
Screening for deep intronic variants (denoted as V1-V7) in the ABCA4 gene [8] was performed in family CACD25 in which exome sequencing had revealed a single heterozygous missense mutation in ABCA4. Screening for V1-V7 was performed as described previously [9].
Characterization of the deep intronic mutation in PROM1 in family RCD49 has been described before [7].
The mutational hot spot exon of RPGR, ORF15, was not accessible by our sequencing approaches in all cases due to its highly repetitive sequence. For the mutation screening of ORF15 in unsolved RP families with absence of male-to-male transmission, we used the protocol described in Neidhardt et al. [10].

Mutation detection rate
The overall mutation detection rate of our study was 61% (54/89). More specifically, causative mutations were detected in 25 of 50 cases which were analyzed with our custom RD panel (50%). Average coverage was 750 reads per base pair with approximately 55% reads on target. In all cases but one, samples from additional family members were used to verify segregation of the sequence variants identified in the index patient.
Of the 25 cases that remained unsolved using our RD panel, eight were selected for subsequent analysis applying WES, in addition to a further 39 cases that were selected for direct WES analyses. Overall, 91 affected members from 47 families were subjected to WES. A total of 1017 Gigabases of data on target genomic regions were generated for the 91 samples with a mean coverage of the targeted region of 142 fold (minimum mean coverage was 89 fold). Following WES and subsequent analyses of intronic regions in two cases, we were able to identify pathogenic mutations in 29 cases (Table 1), thereby achieving a detection rate of 62% (29/47). In cases attributed to autosomal recessive inheritance that showed two heterozygous mutations, compound heterozygosity was confirmed by segregation analysis in all cases except three in which no DNA samples of additional family members were available. In cases attributed to autosomal dominant inheritance, co-segregation in two subsequent generations was confirmed in all cases except two owing to the lack of additional DNA samples. Of note, we did not observe any de novo mutations in our cohort.
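The detection rates quoted in this section follow from simple arithmetic on the case counts given in the text. The short sketch below reproduces them; the helper name `detection_rate` is ours and the rounding to whole percentages is an assumption that matches the reported figures.

```python
def detection_rate(solved: int, total: int) -> int:
    """Detection rate as a percentage, rounded to the nearest whole number."""
    return round(100 * solved / total)


# Case counts as reported in the text.
panel_solved, panel_total = 25, 50
wes_solved, wes_total = 29, 47
rescreened = 8  # panel-negative cases that were subsequently analyzed by WES

# The rescreened cases are counted once in the cohort total.
cohort_total = panel_total + wes_total - rescreened
cohort_solved = panel_solved + wes_solved

print(detection_rate(panel_solved, panel_total))    # panel
print(detection_rate(wes_solved, wes_total))        # WES
print(cohort_solved, cohort_total, detection_rate(cohort_solved, cohort_total))
```

Because the eight rescreened cases were unsolved by the panel, no solved case is counted twice when the panel and WES results are summed.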
A total of 69 distinct mutations were identified, including 39 novel mutations.
Although we counted them as solved for the calculation of the total detection rate, three families of our cohort were only partially solved. All of them are now presumed to segregate two disease entities: two members of family LCA70 had a diagnosis of LCA while one sibling was diagnosed with Bardet-Biedl syndrome (Fig 1). Applying WES to all three siblings, we were able to identify a homozygous splice site mutation in BBS9 that was unique to the patient with Bardet-Biedl syndrome. The underlying mutation of the LCA phenotype in the remaining two siblings has not been identified so far.
One member of family RCD163 was diagnosed with LCA, while two siblings were diagnosed with cone-rod dystrophy (Fig 1). Using the RD panel, we could show that the patient with LCA was compound heterozygous for two frameshift mutations in RPGRIP1 while the two other siblings were only heterozygous for one of the frameshift mutations. We excluded other variants in the coding regions and canonical splice site mutations of RPGRIP1 in these patients using conventional Sanger sequencing. Whether they harbor a second disease-causing mutation in the non-coding regions of RPGRIP1, or whether their phenotype is caused by a second gene, remains unknown.
In family CHRO89, three brothers with a clinical diagnosis of achromatopsia/color vision deficiency were analyzed with the RD panel (Fig 1). Targeted sequencing revealed that only two of them harbor a homozygous frameshift mutation in the CNGB3 gene while the third brother shows two wildtype alleles. The clinical difference between the brothers has already been noted in a prior clinical report [41]. Follow-up clinical examination showed that the nonsegregating brother has reduced visual acuity and perifoveal depression of cone responses in the multifocal ERG; however, his color vision is not achromatic but deuteranopic, and he shows no nystagmus. The underlying mutation of his phenotype remains unclear.
Of note, eight cases that had been mutation-negative upon the analysis with our custom RD panel were selected for subsequent WES; five cases could be solved. Applying WES to family ARRP182 we were able to identify a known homozygous missense mutation in the CLN3 gene. Yet at the time when the RD panel was applied in this case, it was not known that mutations in CLN3 can cause nonsyndromic RP and therefore the gene was not included in the panel design. This clearly shows one of the major disadvantages of a panel-based sequencing approach: if a gene has not been linked to a specific disease at the time of its design, it will escape detection. In four cases, RCD70, RCD82, RCD285, and ZD345, WES detected mutations in PROM1, CRX, and ABCA4, respectively. Although these genes had been included in the RD panel

Comparison with other NGS studies on RD
In the present study, we applied a custom RD panel interrogating 105 RD genes to analyze 50 cases and were able to solve 25 cases. This detection rate of 50% is somewhat lower than that of a prior study using the same custom RD panel in a genetic diagnostic setting (55% detection rate; [2]) and than that of other studies which also used panel-based sequencing approaches. Eisenberger and colleagues [21] analyzed 55 genes in a cohort of 70 patients with RP and 56 patients with LCA and achieved an overall mutation detection rate of 70%. Other panel-based studies obtained similar results: in a cohort of 82 RP families from Northern Ireland, disease-causing mutations were identified in 60% [42]. Another study analyzed 179 Chinese families with RD and achieved a detection rate of 55.3% [43]. The fact that we obtained a detection rate of only 50% in this study might be due to several reasons: 1) our cohort is more diverse concerning clinical phenotypes and inheritance traits; 2) we used a very early version of the RD panel which had some technical limitations; and 3) our cohort is somewhat biased since most cases had been extensively pre-screened for mutations in frequently affected genes applying Sanger sequencing and/or APEX arrays. As for our detection rate of 62% in the cases that were analyzed by WES, a direct comparison with other studies is complicated by differences in both cohort size and composition regarding clinical phenotypes and inheritance traits. Corton and colleagues used WES to analyze twelve Spanish families with presumed recessive RD and were able to solve ten of them [44]. Another study was able to identify disease-causing mutations in four of six Spanish families with an initial diagnosis of autosomal dominant RP [45]. A very recent study that analyzed 90 patients from 68 Israeli and Palestinian families with diagnoses of RP and LCA achieved a detection rate of 49% [46].

Genetic heterogeneity
A total of 69 distinct mutations were identified in our study; 39 of them had not previously been reported (Tables 1 and 2). Among these novel mutations, 25 were nonsense, frameshift or splice site mutations presumably leading to functional null alleles while 14 were missense mutations that were predicted to have a deleterious effect on protein function in silico. Twelve of the novel missense mutations were absent from the ExAC database and two had a minor allele frequency of less than 0.00001.
With 36 genes implicated in disease in 54 families, and only few recurrent mutations in the same gene (Table 3), our observations reaffirm the known genetic heterogeneity of RD in an outbred European population. Similar genetic heterogeneity was also noted in 126 RP and LCA patients [21].
Besides the identification of mutations in already known RD genes, WES led to the identification of two genes that had not previously been associated with RD, demonstrating a major advantage of WES in a research setting. The extreme genetic heterogeneity in RD usually makes it very unlikely to identify, in a limited study cohort, more than one family carrying mutations in a novel RD gene. However, such initial findings of potential candidates may find a match in databases listing single genetic findings (GeneMatcher; https://genematcher.org) or within the public domain of large consortia (e.g. the European Retinal Disease Consortium; http://www.erdc.info/) or will guide targeted screening in larger patient cohorts. Applying this strategy, we were able to identify, or to substantiate the identification of, two novel genes associated with RD in our cohort of 47 families that underwent WES. In family CHRO249 we identified a homozygous nonsense mutation in RAB28 that led to the first description of this gene being associated with cone-rod dystrophy [30] and in family CHRO436 we identified two heterozygous frameshift mutations in the ATF6 gene, lending further support to our identification of ATF6 as a novel gene for achromatopsia [31].
Replication of initial findings may still be challenging for novel candidate genes of ultra-rare disease entities represented by several unsolved cases in our study cohort (e.g. autosomal dominant vitreoretinochoroidopathy in family BD49 and atrophy of the choroid and retina in family BD35, presenting with a fundus appearance of gyrate atrophy but without hyperornithinemia). Two cases were shown to harbor pathogenic deep intronic mutations: family CACD25 is compound heterozygous for a missense mutation and a deep intronic mutation in ABCA4 that affects splicing [8]. Two affected siblings of family RCD49 are homozygous for a deep intronic mutation in PROM1 that leads to the activation of a cryptic exon [7].
In several families we observed an inconsistency between expected findings based on the initial clinical diagnosis and the actual genetic result: family CHRO391, initially diagnosed with achromatopsia in childhood, was clinically re-examined since the only rare and potentially disease-causing exonic variant compatible with a model of autosomal recessive inheritance and shared by both siblings was a novel homozygous missense mutation in BBS5. Clinical re-examination revealed that the only symptom that could be attributed to Bardet-Biedl syndrome is obesity. Neither polydactyly, renal dysfunction, hypogonadism, nor cognitive impairment was observed. However, the phenotype of Bardet-Biedl syndrome is very variable and it has been shown that mutations in other BBS genes, like BBS1 and BBS2, can cause mild forms or even nonsyndromic retinal dystrophy [47][48]. It is therefore likely that the mutation in BBS5 we found is the underlying cause of the phenotype in family CHRO391, especially since we did not find any variants in other genes that were considered pathogenic.
WES also helped to clarify distinct disease causes in family ARRP230: only one of two sisters initially diagnosed with RP was shown to be homozygous for a known missense mutation in MAK while the clinical symptoms of her sister, who does not carry the mutation, were shown to be due to chronic uveitis (Fig 1).
Family ZD68 had an initial diagnosis of cone dystrophy, and as a differential diagnosis optic atrophy and cataract (Fig 1). Genetic analysis showed that the two siblings harbor a novel missense mutation in the OPA3 gene, which is implicated in autosomal dominant optic atrophy and cataract (ADOAC).
Two siblings of family ARRP210 were shown to harbor two heterozygous missense mutations in IFT140 (Fig 1). This gene encodes a member of the intraflagellar transport (IFT) proteins involved in bidirectional protein trafficking along the cilium. Mutations in genes coding for IFT components have been associated with several ciliopathies. In some instances, mutations might result in isolated forms of retinal degeneration, as has been shown for IFT172 [49], and only recently for IFT140 [50]. Prior to the latter publication, mutations in IFT140 had only been described in patients with Mainzer-Saldino syndrome and Jeune asphyxiating thoracic dystrophy [27,51]. Both syndromes involve skeletal, renal, hepatic and retinal abnormalities. Extra-ocular symptoms are not apparent in the two siblings of family ARRP210 but have not yet been excluded by radiologic and internal medicine examinations. Nevertheless, our findings might confirm the recent finding that mutations in IFT140 can result in nonsyndromic RP.

Copy number variation
Copy number variations (CNV) are an important cause of human disease [52]. In RD, pathogenic CNVs have been described in a number of genes. For instance, deletions of one or more exons account for a considerable part of the mutation spectrum of USH2A, EYS and PRPF31 (source: HGMD, http://www.biobase-international.com/product/hgmd). The accurate detection of large heterozygous deletions or duplications in WES data is considered one of the pitfalls of the method but is possible when applying suitable algorithms [53]. We used ExomeDepth [6] to discover CNVs in our data sets and were able to identify a large heterozygous deletion spanning exons 4-13 of the PRPF31 gene in family ADRP32. In addition to the computational approach, we manually compared the number of exon reads for known RD genes in the unsolved cases but were not able to identify additional CNVs.
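The principle behind read-depth-based CNV detection, and behind the manual comparison of exon read counts mentioned above, can be sketched as a simple depth ratio. This is a deliberately simplified illustration and not the statistical model implemented in ExomeDepth: the function names, the ratio thresholds and the toy read counts are all assumptions. A heterozygous deletion is expected to give roughly half the normalised depth of a set of reference samples.

```python
from statistics import median


def normalise(depths):
    """Scale per-exon read counts by the sample's median exon depth."""
    m = median(depths.values())
    return {exon: d / m for exon, d in depths.items()}


def depth_ratios(sample, references):
    """Ratio of normalised sample depth to the median normalised reference depth."""
    sample_n = normalise(sample)
    refs_n = [normalise(r) for r in references]
    return {
        exon: sample_n[exon] / median(r[exon] for r in refs_n)
        for exon in sample
    }


def flag_het_deletions(ratios, low=0.3, high=0.7):
    """Exons whose ratio lies around 0.5 (thresholds are illustrative assumptions)."""
    return [exon for exon, r in ratios.items() if low <= r <= high]


# Toy read counts for five hypothetical exons; not data from the study.
sample = {"ex1": 120, "ex2": 118, "ex3": 61, "ex4": 58, "ex5": 122}
references = [
    {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100, "ex5": 100},
    {"ex1": 200, "ex2": 200, "ex3": 200, "ex4": 200, "ex5": 200},
    {"ex1": 150, "ex2": 150, "ex3": 150, "ex4": 150, "ex5": 150},
]
flagged = flag_het_deletions(depth_ratios(sample, references))
print(flagged)
```

Normalising by the median rather than the total read count keeps the ratios of unaffected exons close to 1 even when several exons of the sample are deleted.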

RPGR ORF15
Despite the fact that our bioinformatic pipeline successfully identified a pathogenic deletion in ORF15 of the RPGR gene in family RCD291, several issues prompted us to screen ORF15 by conventional Sanger sequencing in unsolved RP families showing no male-to-male transmission: 1) the high incidence of X-linked RP among families initially classified as dominant [54], 2) the large proportion of ORF15 mutations in X-linked RP [55], and 3) the poor coverage of ORF15 in exome data due to its highly repetitive nature. However, we were not able to identify additional disease-causing mutations in ORF15 in our cohort.

Unsolved cases: possible explanations
Thirty-five cases of our cohort could not be solved so far. Of these, 17 have only been analyzed by means of our custom RD panel. As discussed above, we used an early version of the RD panel which did not interrogate several more recently discovered RD-associated genes and also had some technical limitations. Of the 18 cases that remained unsolved after WES, seven cases were analyzed using a previous version of the exome capture kit. It has been shown that libraries obtained with the most recent Agilent V5 kit result in 94.57% of the targeted region covered by at least 20x compared with only 88.75% of the targeted region when the Agilent V4 kit was used [1]. Yet we did not achieve a higher detection rate in the group of cases that have been analyzed with V5 compared with those that have been analyzed with V4.
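The coverage metric quoted above (the percentage of the targeted region covered by at least 20x) is the fraction of target positions whose read depth reaches a threshold. A minimal sketch follows, assuming per-base depths are available as a plain list; the function name and the toy depths are hypothetical and unrelated to the kit comparison in [1].

```python
def fraction_covered(depths, min_depth=20):
    """Fraction of target positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


# Toy per-base depths for eight target positions (illustrative only).
toy_depths = [0, 5, 19, 20, 35, 142, 89, 20]
print(f"{100 * fraction_covered(toy_depths):.2f}% of positions at >= 20x")
```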
Of note, within the cohort that was analyzed by WES, we were able to solve a significantly lower fraction of cases with dominant inheritance (9/19) compared to cases with recessive inheritance (18/24). On average, our variant detection and annotation pipeline identified 100-150 sequence variants per family with dominant inheritance that were rare, potentially affecting protein function and shared by two affected family members. Even after filtering for retinal expression, several dozen variants remained, which made prioritization of novel candidate genes for dominantly inherited RD challenging.
Regardless of the inheritance pattern, it is likely that some causal variants will be structural or reside within non-coding regions. Genomic analyses for genes involved in retinal degeneration like ABCA4 [8][9], USH2A [56][57] and CEP290 [58] have shown that a probably underestimated number of patients harbor deep intronic variants that interfere with splicing. Some families might therefore be solved by WGS, as was done for family RCD49, in which we were able to identify a pathogenic deep intronic mutation in PROM1. However, computational analyses of WGS data sets are challenging and most likely only have prospect for success in families with multiple affected members and evidence of linkage to known disease-gene loci.
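The duo-based step mentioned above, keeping only rare variants shared by two affected family members, amounts to a set intersection. The sketch below illustrates this under the assumption that each variant is represented as a (chromosome, position, reference allele, alternative allele) tuple; the function names and all variants shown are hypothetical toy data.

```python
def shared_variants(*members):
    """Variants present in every affected family member supplied."""
    if not members:
        return set()
    return set.intersection(*(set(m) for m in members))


def unshared_variants(member, *others):
    """Variants seen in one member but in none of the others."""
    return set(member).difference(*others)


# Toy variant calls (chromosome, position, ref, alt); not data from the study.
affected_1 = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"), ("chr3", 3000, "G", "A")}
affected_2 = {("chr1", 1000, "A", "G"), ("chr3", 3000, "G", "A"), ("chr4", 4000, "T", "C")}

candidates = shared_variants(affected_1, affected_2)
print(sorted(candidates))
print(sorted(unshared_variants(affected_1, affected_2)))
```

Intersecting is only valid if both members carry the same disease. In a family such as ARRP230, where one of the two sisters did not carry the causal mutation, the relevant variant appears only in the unshared set, so it is worth inspecting both.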

Lessons from our study
An important factor that might hamper identification of disease-causing mutations is inaccurate or insufficient pedigree information with regard to inheritance, disease entity or disease status. Although all our cases have been followed clinically for many years, we took into account the possibility of an imprecise clinical diagnosis or unexpected forms of inheritance since the wide phenotypic variability of RD with the clinical overlap of symptoms often hampers accurate clinical diagnosis. This obstacle was impressively demonstrated by the fact that our molecular findings led to the reclassification of the phenotype in several families. A good example of inaccurate pedigree information in our cohort is family ARRP230: both affected sisters had initially been diagnosed with retinitis pigmentosa but only one of them is homozygous for a known missense mutation in MAK. Retrospectively, the other sister was shown to suffer from chronic uveitis and not from retinitis pigmentosa. If we had only considered overlapping variants, we would have missed the MAK variant. Moreover, three families in our cohort were shown to segregate two disease entities. This shows how important it is to analyze exome data sets with a hypothesis-free approach, especially in RD, with its pronounced clinical and genetic heterogeneity.
There is an increasing number of reports describing disease-causing mutations in non-coding sequences in RD families [8,58-61]. However, such reports are mainly based on incidental findings and there is a lack of a systematic study on the prevalence of such "cryptic" mutations. In our cohort of 89 unrelated cases, we were able to identify coding mutations in 52 cases while non-coding mutations were found in two cases, corresponding to 5% of previously unsolved cases; this confirms the necessity of analysis of regions outside of the coding exons. We therefore recommend that in future studies mutation screening should include, at least as a second-level screening, the analysis of non-coding regions of known RD disease genes.
In summary, our study confirms the diagnostic value of NGS platforms in the identification of mutations in a heterogeneous disease like RD. The advantage of WES in discovering novel genes, together with its reliable variant calling of coding regions and competitive prices, makes it the technique of choice in the mutation screening of heterogeneous diseases.