Increasing the Yield in Targeted Next-Generation Sequencing by Implicating CNV Analysis, Non-Coding Exons and the Overall Variant Load: The Example of Retinal Dystrophies

Retinitis pigmentosa (RP) and Leber congenital amaurosis (LCA) are major causes of blindness. They result from mutations in many genes which has long hampered comprehensive genetic analysis. Recently, targeted next-generation sequencing (NGS) has proven useful to overcome this limitation. To uncover “hidden mutations” such as copy number variations (CNVs) and mutations in non-coding regions, we extended the use of NGS data by quantitative readout for the exons of 55 RP and LCA genes in 126 patients, and by including non-coding 5′ exons. We detected several causative CNVs which were key to the diagnosis in hitherto unsolved constellations, e.g. hemizygous point mutations in consanguineous families, and CNVs complemented apparently monoallelic recessive alleles. Mutations of non-coding exon 1 of EYS revealed its contribution to disease. In view of the high carrier frequency for retinal disease gene mutations in the general population, we considered the overall variant load in each patient to assess if a mutation was causative or reflected accidental carriership in patients with mutations in several genes or with single recessive alleles. For example, truncating mutations in RP1, a gene implicated in both recessive and dominant RP, were causative in biallelic constellations, unrelated to disease when heterozygous on a biallelic mutation background of another gene, or even non-pathogenic if close to the C-terminus. Patients with mutations in several loci were common, but without evidence for di- or oligogenic inheritance. Although the number of targeted genes was low compared to previous studies, the mutation detection rate was highest (70%) which likely results from completeness and depth of coverage, and quantitative data analysis. CNV analysis should routinely be applied in targeted NGS, and mutations in non-coding exons give reason to systematically include 5′-UTRs in disease gene or exome panels. 
Consideration of all variants is indispensable because even truncating mutations may be misleading.


Introduction
Retinal dystrophies result from degeneration of photoreceptor and retinal pigment epithelium cells. With a prevalence of ~1 in 3,000, they represent the major cause of hereditary blindness in developed countries [1]. Apart from the individual burden, retinal dystrophies significantly contribute to healthcare costs [2]. Retinal dystrophies are characterized by extensive genetic heterogeneity, with more than 60 genes currently known to underlie retinitis pigmentosa (RP), the most prevalent subtype that affects more than 1.5 million people worldwide [3,4]. Knowing the causative mutation is desirable for several reasons: It provides the basis for personalized genetic counseling and specification of the recurrence risk, and it may predict the natural clinical course (including the determination of a genetic syndrome). In clinically atypical presentations or ambiguous family history, the genotype may specify or even reverse the previous diagnosis or the assumed mode of inheritance. Regarding the progress of gene-replacement therapy approaches for several retinal dystrophies, the genetic diagnosis will be an essential prerequisite for gene-specific therapies [3,5]. However, apart from the c.2991+1655A>G mutation in CEP290 previously reported to be present in 20% of patients with Leber congenital amaurosis (LCA) and RPGR in male RP patients [6,7], there is no major mutation or disease gene for RP and LCA, and clear-cut genotype-phenotype correlations are largely lacking, which prevents efficient targeted Sanger sequencing. Because chip-based analysis for previously reported mutations detects only a fraction of the causative alleles [8], and gene-by-gene analysis by Sanger sequencing is too laborious and expensive, genetic testing has been the exception until recently. Now, next-generation sequencing (NGS) allows for simultaneous and efficient analysis of all known disease genes for a given trait.
In this study, we performed NGS of 55 genes involved in RP and LCA (the term “LCA” was applied for early-onset retinal dystrophies, including infant RP and infant cone-rod dystrophies, CRD; Additional Data File S1) in 126 patients. Causative mutations, including CNVs affecting one to multiple exons, were identified in the majority of patients and confirmed the extensive genetic heterogeneity. Our findings demonstrate the immense potential of NGS for diagnostics of retinal dystrophies and shed light on the genetic complexity of this disease group.

Performance of Two NGS Platforms in RD Gene Panel Analysis
Initially, 79 samples were sequenced on the Roche 454 GS FLX platform, followed by 38 samples sequenced on the Illumina MiSeq system. With the Roche platform, 90% of the target exons were covered more than 15-fold, with an average coverage of 75-fold per sample. With the Illumina MiSeq instrument, the average coverage was significantly higher (250-fold) and more complete (15-fold for more than 99% of target sequences). 37% of the samples sequenced on the 454 platform were mutation-negative (29 of 79 samples), compared to only 18% sequenced on the MiSeq (7 of 38 samples). CNV analysis was only possible with high-coverage NGS as obtained with the MiSeq system.
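The two coverage figures quoted above (average depth and the share of target sequence reaching 15-fold) can be derived from per-target read depths. The following is a minimal sketch, not the pipeline used in this study; the function name, the example depths and the use of "at least 15-fold" as the cut-off are assumptions made for illustration.

```python
def coverage_summary(depths, threshold=15):
    """Return (mean depth, fraction of targets with depth >= threshold).

    `depths` holds one read-depth value per target region. The >= comparison
    is an assumption; the text does not state whether 15-fold is inclusive.
    """
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= threshold) / len(depths)
    return mean_depth, covered


# Hypothetical depths for five target regions (not data from the study).
example_depths = [0, 10, 15, 80, 270]
mean_depth, covered = coverage_summary(example_depths)
print(f"mean depth {mean_depth:.0f}-fold, {covered:.0%} of targets at >= 15-fold")
```

A high mean depth alone does not guarantee completeness: in the toy example the mean is 75-fold, yet two of five targets stay below the threshold, which is why both numbers are reported.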

High Mutation Detection Rate, Extensive Genetic Heterogeneity and Predominance of Novel Mutations
The overall mutation detection rate was 70% (88/126 patients). More specifically, causative mutations were detected in 38/53 patients (72%) with autosomal recessive (ar) and in 12/14 (86%) with autosomal dominant (ad) RP (Figure 1A,B). Three patients turned out to have X-linked RP based on the genetic findings. In LCA, causative mutations were identified in 35/56 patients (63%; Figure 1C). Although mutations in some genes (RP1 and EYS in arRP, and RPGRIP1, GUCY2D and TULP1 in LCA) were more prevalent, mutations in many rare genes account for the majority of patients, confirming that these phenotypes are genetically highly heterogeneous and only comprehensively accessible by highly parallel sequencing of all known disease genes. CEP290, previously reported as the predominant LCA gene, was not a major contributor to this phenotype in our cohort, and its hot spot mutation, c.2991+1655A>G, was not found at all. This may partially be due to the ethnic background of LCA patients in our cohort with 43% of patients originating from the Arabian peninsula. In contrast to other large studies [9], USH2A mutations contributed only to a small proportion of arRP. Causative mutations were found in 28 different genes that encode proteins from diverse pathways and cellular compartments. Mutations in ciliary genes were most prevalent (Figure 1D), indicating the importance of the photoreceptor's connecting cilium, its associated structures and functions (such as intraflagellar transport) for visual integrity. Of 98 different mutations, 67 were novel (68%) and would thus have been missed by approaches exclusively targeting known alleles such as genotyping microarrays. Below, we describe several families with peculiar findings that further expand our understanding of RD genetics beyond the mere identification of the causative mutations.
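The percentages above follow directly from the reported counts, and the subgroup numbers add up to the overall figure. The short check below is only an arithmetic illustration; counting the three X-linked patients as three solved cases out of three is an assumption based on the cohort description, and the helper name is hypothetical.

```python
import math

# (patients with causative mutations, patients in subgroup), as given in the text.
# The X-linked entry (3 of 3) is an assumption derived from the cohort description.
subgroups = {
    "arRP": (38, 53),
    "adRP": (12, 14),
    "X-linked RP": (3, 3),
    "LCA": (35, 56),
}


def percent(part, whole):
    """Percentage rounded half up (Python's round() would turn 62.5 into 62)."""
    return math.floor(100 * part / whole + 0.5)


solved = sum(s for s, _ in subgroups.values())
patients = sum(n for _, n in subgroups.values())

print(f"overall: {solved}/{patients} = {percent(solved, patients)}%")
for name, (s, n) in subgroups.items():
    print(f"{name}: {s}/{n} = {percent(s, n)}%")
print(f"novel mutations: 67/98 = {percent(67, 98)}%")
```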

CNV Detection from High-coverage NGS Data
Virtually any gene may be captured and subjected to NGS aimed not only at qualitative, but also quantitative readout. This utilization of NGS data enables CNV detection and can favourably complement MLPA (multiplex ligation-dependent probe amplification), where the application depends on the availability of commercial kits that currently cover only a fraction of known RD genes. We identified four alleles with pathogenic CNVs comprising one to multiple exons. Below, we describe exemplary constellations with CNVs contributing to retinal disease.
CNV and point mutation in a non-coding EYS exon contributing to arRP. Mutations in EYS account for 5-18% of arRP cases depending on the population [10,11]. It has been suggested that at least 15% of patients with monoallelic point mutations may carry midsized rearrangements as second mutant alleles [12]. In our study, EYS mutations were found in 9.4% of arRP patients (five families). One patient was compound heterozygous for a truncating mutation in the coding region and a deletion of at least non-coding exon 1 (Figure 2A,C; Figure S2A in File S1). 5′ non-coding gene sequences, especially first exons, usually contain the promoter and are thus important for gene regulation and vulnerable to mutations [13]. In a recent example, a recurrent de novo mutation creating an aberrant initiation codon of the IFITM5 gene was found to cause a genetic subtype of osteogenesis imperfecta [14,15]. The promoter site prediction programs TSSG [16] and NNPP [17] predict the EYS transcription start site at the beginning of exon 1 and the TATA box upstream. The potential disease-causing effect of exon 1 mutations in EYS is supported by two siblings of a second family with a putative splice site mutation of exon 1 in trans to a truncating mutation in a coding exon (Figure 2B,C). We therefore propose that loss and aberrant splicing of EYS exon 1 should impair transcription of the mutant gene copy and result in a null allele. Our findings illustrate the potential benefit of including 5′-UTRs in NGS of disease gene or even exome panels. Evaluation of non-coding regulatory regions may identify the “missing hit” in heterozygous carriers of recessive mutations.
Hemizygosity of a CRX mutation in a consanguineous LCA family. In a consanguineous Turkish LCA family with two affected siblings (Figure 3A), homozygosity mapping by genomewide linkage analysis had initially failed to identify an unambiguous chromosomal candidate region, and the combined maximum parametric LOD score of 2.4 was not obtained (Figure 3B). NGS of a sample from the index patient identified an apparently homozygous CRX mutation in exon 4 that abrogates the natural translation termination codon (c.899A>G), predicting an elongated protein with 118 unrelated residues (p.*300Trpext*118). Subsequent quantitative analysis revealed a heterozygous deletion of exon 4 in trans to the no-stop mutation which was thereby recognized as hemizygous (Figure 3C; Figure S2B in File S1). Both mutations cosegregated with LCA in the family. Interestingly, CRX mutations have mostly been observed in autosomal dominant LCA and CRD [18,19]. Congenital retinal degeneration in a patient with homozygosity for a missense allele, p.Arg90Trp, suggested that CRX may also be a recessive LCA gene [20]. The lack of retinal degeneration in both parents of the index patient and LCA in her brother who also carried both mutations strongly indicate that both CRX mutations identified here represent recessive loss-of-function alleles, confirming the previous assumption that recessive LCA may result from biallelic CRX mutations. This example illustrates how CNV analysis from NGS data can prevent major interpretation pitfalls, especially in consanguineous families with compound heterozygous mutations, including a large deletion simulating the expected homozygosity of a point mutation.
CNVs are common in PRPF31, an adRP gene to be considered in “simplex” RP. Mutations of PRPF31 account for about 5-10% of adRP cases (RP11) [4,21]. RP11 families often display incomplete penetrance, and dominant inheritance may not be obvious from the family history. In five patients, we identified heterozygous PRPF31 mutations, including deletions of multiple (patient 116) or even all coding exons (patient 113) (see Figure S1 and Figure S2C in File S1). By Sanger sequencing and subsequent MLPA in seven patients with pedigrees suggesting incomplete penetrance, we identified point mutations in two patients, and three had multiple exon to whole-gene deletions (these patients were not part of this study), compatible with a previous study suggesting that the RP11 locus is prone to genomic rearrangements [22]. Patients 22, 23 and 116 had a provisional diagnosis of sporadic and thus recessive RP which was revised after the genetic findings, resulting in significantly higher recurrence risks of up to 50% for the patients' offspring to be communicated in genetic counseling. Evaluation of PRPF31, including CNV analysis, is therefore advisable in all RP patients independent of the assumed inheritance mode.
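The quantitative readout used in this section can be illustrated with a generic depth-ratio calculation: each exon's share of a sample's reads is compared with the median share in control samples, and a ratio near 0.5 suggests a heterozygous deletion. This is a simplified sketch and not the CNV algorithm applied in this study; the function names, the thresholds (0.7 and 1.3), the 0.9 allele-fraction cut-off and the depth values are assumptions for illustration. Only the 65 of 65 variant reads for the CRX no-stop mutation is taken from the text.

```python
from statistics import median


def relative_depth(depths):
    """Scale per-exon read depth by the sample's total depth."""
    total = sum(depths.values())
    return {exon: d / total for exon, d in depths.items()}


def copy_ratios(sample, controls):
    """Per-exon ratio of the sample's relative depth to the control median."""
    s = relative_depth(sample)
    c = [relative_depth(ctrl) for ctrl in controls]
    return {exon: s[exon] / median(x[exon] for x in c) for exon in s}


def call_exon(ratio, loss=0.7, gain=1.3):
    """Classify a depth ratio; the thresholds are illustrative assumptions."""
    if ratio < loss:
        return "deletion"
    if ratio > gain:
        return "duplication"
    return "normal"


def interpret_zygosity(variant_fraction, exon_call):
    """An apparently homozygous variant on a deleted exon is hemizygous."""
    if variant_fraction >= 0.9:
        return "hemizygous" if exon_call == "deletion" else "homozygous"
    return "heterozygous"


# Hypothetical depths: exon4 of the patient has about half the expected reads.
controls = [
    {"exon1": 200, "exon2": 220, "exon3": 180, "exon4": 200},
    {"exon1": 100, "exon2": 110, "exon3": 90, "exon4": 100},
]
patient = {"exon1": 210, "exon2": 230, "exon3": 190, "exon4": 100}

calls = {exon: call_exon(r) for exon, r in copy_ratios(patient, controls).items()}
print(calls)
print(interpret_zygosity(65 / 65, calls["exon4"]))
```

Normalizing by total depth keeps samples with different sequencing yields comparable, but it also means a large deletion slightly inflates the ratios of the remaining exons, so in practice thresholds need to be tuned against known controls.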

Oligogenic Heterozygosity: Accidental Carriership, Potential Modifiers and Non-pathogenic Truncating Mutations
Figure 1. Mutational spectrum in RP and LCA patients. Percentages refer to patients with mutations in the respective gene that are considered causative. The distribution of causative mutations across many genes, each contributing a relatively small fraction to the mutational spectrum, confirms the extensive genetic heterogeneity of retinal dystrophies. Note that the three patients that were found to carry X-linked mutations are not contained in the schemes A–B. A. arRP. B. adRP. Note that the percentages refer to a relatively small adRP cohort in this study. C. LCA. D. Functional categorization of genes that were found to carry causative mutations in our study. Mutations in genes encoding components of the photoreceptor's connecting cilium and associated structures were predominant. doi:10.1371/journal.pone.0078496.g001
Given the multitude of genes implicated in RP and LCA, it is not surprising that NGS, providing a “full picture” of the mutational load, identifies constellations with mutations in several genes. In view of a recent study of genome sequences from 46 control individuals from various regions of the world indicating that one in 4-5 individuals from the general population may be a carrier of null mutations in a gene for inherited retinal degeneration [23], constellations with mutations in multiple loci need to be anticipated in a comprehensive NGS approach. In our study, many patients with causative biallelic mutations carried singular heterozygous missense variants in other RD genes (Table 1). These additional alleles were frequently indicated as likely protein-damaging by the prediction programs applied herein, and their contribution to disease severity as modifiers or in an oligo-/digenic setting cannot be excluded.
Digenic inheritance has been reported for non-syndromic RP due to double heterozygosity for recessive mutations in RDS and ROM1, both encoding interacting structural components of rod outer segments [24], and for deafblindness with mutations in genes encoding interacting proteins (GPR98 and PDZD7) of the Usher protein interactome [25]. However, a final proof of causative oligogenic constellations is often impossible because it usually requires segregation analysis and precise phenotyping in extended families, determination of the variants' prevalence in large cohorts or simulation in animal models as previously reported for AHI1 and PDZD7 [25,26]. Although oligogenic inheritance cannot be excluded in some families, there was no clear evidence for digenic disease or a modifying effect in any patient from our cohort.
However, we identified patients with causative biallelic mutations in recessive RP genes who additionally carried heterozygous truncating mutations in secondary loci. RP1, the most prevalent arRP gene in our cohort, was frequently found together with mutations in other RD genes. The observed constellations resulted in different deductions regarding the pathogenicity of the respective RP1 allele:
Pathogenic RP1 truncations with causality in the family. RP1 mutations are mostly truncating and may cause adRP [27] or arRP [28]. Of note, no RP1 mutations were observed in our adRP patients, but RP1 was the most prevalent arRP gene, with clearly causative biallelic mutations in several cases (11.3%; Table 1). Patient 25 was compound-heterozygous for two truncating RP1 alleles, c.597C>A (p.Tyr199*) and c.3157delT (p.Tyr1053Thrfs*4), and additionally carried a nonsense mutation in CDH23 which has previously been described in recessive deafblindness (Usher syndrome type 1D, USH1D) [29]. Segregation analysis for the mutations in RP1 and CDH23, both encoding proteins of the photoreceptor's connecting cilium, was compatible with both RP1 alleles acting recessively (Figure 4A). Detailed ophthalmological investigation revealed no abnormalities in the mother who was double heterozygous for the c.3157delT RP1 mutation and the CDH23 mutation, excluding a digenic mechanism with an elevated recurrence risk for RP for the patient's offspring solely based on her genotype. Although the CDH23 mutation may modify disease expression in patient 25, her phenotype did not appear unusually severe compared to other patients with RP1-associated arRP. Patient 25 can hence be regarded an accidental carrier of the CDH23 mutation, and her RP is sufficiently explained by her RP1 mutations. Of note, c.3157delT RP1 has been reported as a dominant mutation in an RP1 screening study [30].
Based on our data, we assume that adRP in the reported family was possibly due to a mutation in another adRP gene, and the detection of the heterozygous c.3157delT RP1 mutation likely represented supplemental carriership for a recessive allele.
Accidental carriers of pathogenic RP1 truncations and refinement of the RP1 “critical position”. A monoallelic truncating RP1 mutation (p.Glu1750*) was found in LCA patient 124, on a homozygous TULP1 nonsense mutation background (Figure 4B). The TULP1 mutation was clearly causative. The RP1 mutation p.Glu1750* flanks a region referred to as “critical position” (Figure 4D) between residues p.1751 and p.1816 that may distinguish between non-functional and functional truncated RP1 proteins [31]: Homozygosity for p.Asn1751Ilefs*3 was shown to cause arRP [32] while homozygosity for the nonsense mutation p.Cys1816* did not evoke a retinal phenotype [33]. Hence, p.Glu1750* very likely represents a pathogenic recessive allele, an interpretation that is compatible with the homozygous mutation p.Asn1760Cysfs*46 segregating with arRP in another consanguineous arRP family from our cohort (patient 28), refining the “critical position” to p.1760-p.1816. However, patient 124 is obviously an accidental carrier of the RP1 mutation p.Glu1750*. The constellation resembles the findings in patient 49 who carries a heterozygous mutation of the SPATA7 initiation codon in addition to a likely causative homozygous RDH12 mutation.
Non-pathogenic RP1 truncations. In patient 55, RP was well explained by compound heterozygosity for the PROM1 mutations p.Tyr214* and p.Gln403_Ser410delinsHis (Figure 4C). In addition, he carried a heterozygous RP1 nonsense allele, p.Gln2102*, that localized beyond the “critical position” where truncations of C-terminal residues may result in RP1 proteins retaining full function and can be considered non-pathogenic [31,33].
These three scenarios with RP1 mutations (a-c) demonstrate that even truncating mutations must be assessed with caution and in the context of the full variant load of known disease genes in order to avoid false interpretations: If a monoallelic truncation is found in addition to clearly causative biallelic mutations in another gene, it may either represent accidental carriership for a pathogenic allele (unrelated to disease in that patient) or a nonpathogenic variant in a non-essential gene region (as is the case at the very C-terminal part of RP1).
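The reasoning applied to the RP1 truncations can be summarized as a small decision rule: the position of the truncation relative to the refined "critical position" (p.1760 to p.1816) indicates whether the allele is likely pathogenic, and the remaining variant load decides whether it is causative in the individual patient. The sketch below is an illustrative simplification, not a validated classifier; the function names and category labels are assumptions, it ignores the dominant versus recessive distinction, and it presumes that two pathogenic alleles lie in trans, which requires segregation analysis.

```python
PATHOGENIC_UP_TO = 1760  # homozygous p.Asn1760Cysfs*46 segregated with arRP
BENIGN_FROM = 1816       # homozygous p.Cys1816* did not evoke a retinal phenotype


def classify_rp1_truncation(residue):
    """Classify a truncating RP1 allele by the residue at which it truncates."""
    if residue <= PATHOGENIC_UP_TO:
        return "likely pathogenic"
    if residue >= BENIGN_FROM:
        return "likely non-pathogenic"
    return "uncertain (critical position)"


def interpret_rp1_finding(truncation_residues, other_gene_biallelic):
    """Interpret RP1 truncations in the context of the overall variant load."""
    classes = [classify_rp1_truncation(r) for r in truncation_residues]
    pathogenic = [c for c in classes if c == "likely pathogenic"]
    if len(pathogenic) >= 2:
        return "causative biallelic RP1 mutations"
    if len(pathogenic) == 1 and other_gene_biallelic:
        return "accidental carriership"
    if classes and all(c == "likely non-pathogenic" for c in classes):
        return "non-pathogenic variant"
    return "unresolved"


# The three constellations described in the text.
print(interpret_rp1_finding([199, 1053], other_gene_biallelic=False))  # patient 25
print(interpret_rp1_finding([1750], other_gene_biallelic=True))        # patient 124
print(interpret_rp1_finding([2102], other_gene_biallelic=True))        # patient 55
```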

Monoallelic Mutations in Genes Underlying Recessive Retinal Dystrophies
Monoallelic mutations in recessive disease genes represent a challenge for interpretation regarding their causality in the patient, especially if there are no biallelic mutations in another gene for the trait that would qualify such mutations as incidental findings (i.e. carrier status unrelated to the disease in the individual). While single non-synonymous variants in recessive disease genes may often represent rare non-pathogenic variants (see Table 1, “Additional Alleles”, and Table S2 in File S1), the nature of the alteration strongly suggests loss of function for two monoallelic mutations in recessive RP genes in our cohort: The same large in-frame deletion-insertion mutation in TULP1 (p.Asp124_132delinsAla) was identified in two independent simplex RP patients (patients 33 and 82), and an SAG nonsense mutation was found in patient 29. Patient 33 in addition carried two CRX missense variants, p.Arg41Gln and p.Tyr142Cys, that we consider likely benign (both are listed as disease-causing in HGMD, but also in dbSNP, and p.Tyr142Cys was found in several patients with disease-causing mutations in other genes). The sporadic occurrence of RP suggests autosomal recessive inheritance in all three patients, making a dominant-negative effect of the TULP1 and the SAG mutation unlikely. The three DNA samples with monoallelic TULP1 and SAG mutations were initially sequenced on the GS FLX system; subsequent analysis on the MiSeq platform did not identify additional mutant alleles, in particular no CNVs.
Patients 33, 82 and 29 are therefore either accidental carriers of the TULP1 and SAG mutations with the causative mutation in another arRP gene not known at the time of study design, or the “missing alleles” escaped detection by exonic sequencing because they are deep intronic (as exemplified by the LCA mutation c.2991+1655A>G in CEP290 or the only known RP23 mutation in the OFD1 gene [34]), or because they localize in regulatory non-coding regions (as shown for EYS exon 1 in this study).

Patients without Mutations: Possible Explanations
As discussed above, mutations in known retinal dystrophy genes may escape detection because of their localization: about 15% of disease-causing mutations localize outside coding exonic sequences [35]. Non-coding exons were not systematically included in our study; the identification of mutations in non-coding exon 1 of EYS suggests that such exons should be included in upcoming disease gene panels. Mutation-negative cases in our study will in part be due to mutations in RP and LCA genes that were identified after the design of our gene panel (e.g. NMNAT1, DHDDS, ZNF513, FAM161A, KCNJ13, IMPG2, IQCB1, CLRN1, MAK, C8ORF37, PRPF6, OFD1). For example, subsequent exome sequencing for patient 15 identified a homozygous nonsense mutation in IMPG2 (data not shown). Updating the panel accordingly will identify the causative mutations in additional patients.
Figure 3. […] a deletion of the same exon (delE4) in trans in patient 110 and her brother. B. Graphical view of the LOD score calculation from genomewide SNP mapping for this family previous to NGS testing: Genomewide homozygosity mapping prior to NGS did not identify a clear candidate locus. The combined maximum parametric LOD score of 2.4 was not obtained. C. Scheme of the CRX gene and coverage plots for CNV analysis from NGS data (Illumina MiSeq), indicating a heterozygous deletion of exon 4 (upper panel, absolute coverage based on read count; lower panel, SeqNext CNV analysis). See legend to Figure 2C. D. Schematic representation of the mapped sequencing reads for the no-stop mutation (Integrative Genomics Viewer). The mutation (arrow) was present in all 65 reads covering this region of the gene and therefore appeared homozygous. E. Electropherograms from Sanger sequencing of the no-stop mutation with hemizygosity in patient 110 (upper panel) and heterozygosity in her mother (lower panel). F. Summary of the disease-causing genetic constellation in patient 110 and her brother (superimposition on parental alleles […]
Mutations in the X-linked RP genes RP2 and RPGR have been reported to account for 8.5% of cases with RP of apparently autosomal dominant transmission and for 15% of males with simplex retinal degenerative disease [7,36]. While enrichment and NGS of RP2 is uncomplicated, the mutational hot spot exon of RPGR, ORF15, is not accessible by our NGS approach due to its highly repetitive sequence. Because about two thirds of RPGR mutations reside in ORF15 [37], its inaccessibility causes a diagnostic gap. Thus, male patients (but also females) without mutations in the genes investigated herein may carry mutations in RPGR ORF15. However, there was no excess of male mutation-negative RP patients: 41% of RP patients without mutations were male which corresponds to their percentage (40%) in the RP cohort (excluding the three proven X-RP patients).
The rate of mutation-negative samples sequenced on the Illumina MiSeq system was only half that on the Roche GS FLX platform (18% versus 37%). In contrast to 454 sequencing (GS FLX), analysis of homopolymer stretches is not problematic in Solexa sequencing (MiSeq). Because very few mutations identified by supplementary Sanger sequencing (as conducted for mutation-negative LCA samples and arRP samples with monoallelic mutations) or with the MiSeq were found in such sequence motifs, the higher detection rate on the MiSeq was mainly due to a better coverage in terms of completeness and depth (which allowed for CNV detection, too). As detailed above, CNVs in PRPF31 detected by quantitative analysis of MiSeq reads resolved the diagnosis in two patients with seemingly recessive RP, and a CNV in EYS represented the “missing allele” in one patient.
Finally, lack of mutations may result from unclear genetic diagnosis: If an older patient is for the first time seen by an ophthalmologist at a late stage of his disease, it may be impossible to assess if the initial disease was RP or CRD; CRD genes are not comprehensively covered by our current gene panel and the causative mutation could therefore be missed.

Comparison with Other NGS Studies on RD
Figure 4. Different arRP scenarios implicating truncating RP1 mutations with diverse impact on disease. A. Pedigree of patient 25 whose arRP is caused by two truncating recessive RP1 alleles. In addition, the patient carries a heterozygous CDH23 nonsense mutation that has been reported in USH1 patients but is probably unrelated to disease here. B. LCA in patient 124 is due to homozygosity for the founder mutation p.Gln301* in TULP1. Heterozygosity for the RP1 nonsense mutation p.Glu1750* likely reflects accidental carriership. It likely represents a recessive loss-of-function allele. Dotted horizontal line: likely consanguinity. C. Compound heterozygosity for two truncating PROM1 mutations can be considered pathogenic in arRP patient 55. The RP1 nonsense mutation p.Gln2102* locates near the C-terminus and likely represents an NMD-insensitive nonpathogenic variant. D. Scheme of the RP1 protein and overview of truncating RP1 mutations reported in this study (mutations shown in A–C in red). The four classes of RP1 truncating mutations [31] are displayed. Class I, NMD-sensitive truncations; class II, NMD-insensitive truncating mutations representing the majority of pathogenic truncation mutations in RP1 (dominant negative pathomechanism); class III, NMD-insensitive truncation mutations representing loss-of-function arRP mutations; class IV, NMD-insensitive, non-pathogenic truncations located 3′ of p.1816. CP, “critical position”: 65-residue region between p.1751 and p.1816 containing a yet undefined protein residue before which truncation causes disease. doi:10.1371/journal.pone.0078496.g004
This is the largest NGS study on retinal dystrophies to date. Compared to other NGS studies on this disease group, we obtained a significantly higher diagnostic yield, which is remarkable because the number of analyzed disease genes (55) in this study was much smaller than in similar studies [9,38–43] (Table 2).
This may in part be due to different enrichment and sequencing methods, factors that both influence depth and completeness of coverage and accuracy (for example, NGS with the 454 GS FLX platform results in a higher error rate in homopolymer stretches). High and extensive coverage, as obtained in this study, allows for systematic analysis for CNVs and reduces the risk of mutations escaping detection because of their localization in regions with low coverage. Finally, direct comparison of studies is difficult because of differences in cohort size and composition regarding phenotypes, clinical characterization and traits.
In conclusion, the identification of mutations in 28 RD genes in our cohort, with most alterations previously undescribed, clearly demonstrates that this disease group is accessible only by massively parallel multi-gene sequencing. Although our NGS study was rather conservative and confined to only 55 genes, we detected the causative mutations in the majority of patients from a large cohort of RP and LCA patients. Regular updating of such panels and inclusion of genes for related disorders (e.g. cone-rod dystrophies) is needed to maximize the mutation detection rate. CNV detection from high-coverage NGS data was a major benefit from switching to a high-capacity NGS platform. Therefore, we currently favor NGS of an RD gene panel over exome sequencing where RD gene coverage is reduced due to distribution of reads across some 20,000 genes. Both oligogenic heterozygosity and monoallelic constellations were observed and may require segregation analysis and careful evaluation of clinical data. Importantly, NGS readout should implicate the overall variant load in order to avoid interpretation pitfalls, as exemplified by the identification of RP1 truncations unrelated to disease in certain constellations. “Missing alleles” in seemingly accidental carriers of recessive RD gene mutations were partly large CNVs and mutations affecting non-coding 5′ exons, demonstrating that both UTR inclusion and quantitative analysis should be part of a comprehensive NGS approach. Because such mutations may also be deep intronic variants with impact on splicing, genomic sequencing, where necessary followed by RNA analysis, may complement primary exonic sequencing in the future. Careful consideration of all variants led to revision of the assumed mode of inheritance, e.g. in case of PRPF31 mutations in simplex RP patients.
As indicated by several exceptional findings in our study, scientific gain of knowledge will strongly benefit from the recent advent of NGS in routine diagnostics and the “byproducts” of such unprecedented large-scale analyses, not only for RD, but for many other genetically heterogeneous conditions.

Ethics Statement
All samples in this study were obtained with written informed consent accompanying the patients' samples. All clinical investigations have been conducted according to the principles expressed in the Declaration of Helsinki. The study was approved by the institutional review board of the Ethics Committee of the University Hospital of Cologne.

Patients and DNA Samples
A total of 126 patients (53 with arRP, 14 with adRP, 3 with X-RP and 56 with LCA) were included in this study. Genomic DNA was isolated from EDTA blood following standard protocols. The diagnoses of all patients were established by medical history, family history and detailed clinical evaluation of vision. Ophthalmological examination included stereoscopic funduscopy, standard ERG, perimetry, measurement of dark adaptation, and determination of best-corrected visual acuity in most patients.

NimbleGen SeqCap EZ Choice Library Design
Genomic coordinates of coding and non-coding exons in all isoforms were identified in the RefSeq database (hg19) using the University of California Santa Cruz (UCSC) table browser [44]. All coding exons (31 arRP genes, 413 exons; 23 adRP genes, 248 exons; 16 LCA genes, 215 exons) of 55 known genes (as of end of year 2010; Table S1 in File S1) including 35 bp of flanking 5′ and 3′ intronic sequence were targeted by a custom SeqCap EZ Choice library (NimbleGen, Madison, Wisconsin, USA). In total, 752 regions were targeted comprising 213 kb of target sequence. The final design covered about 99% of the requested target regions. Because of its highly repetitive sequence which precludes efficient enrichment and sequencing, RPGR exon ORF15 was excluded from panel design. Because the USH2A gene was not included in the panel design, all coding exons of the gene were analyzed either by conventional Sanger sequencing or by a complementary USH2A-including NGS gene panel in arRP patients without mutations in the RP genes covered by our panel.
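Extending each exon by 35 bp of flanking intronic sequence and collapsing overlapping intervals into single target regions can be sketched as follows. This is a minimal illustration of the interval arithmetic only, not the design procedure actually used for the capture library; the function name and the example coordinates are hypothetical, and all intervals are assumed to lie on one chromosome with half-open (start, end) coordinates.

```python
def pad_and_merge(exons, flank=35):
    """Extend each (start, end) exon by `flank` bp on both sides and merge overlaps."""
    padded = sorted((max(0, start - flank), end + flank) for start, end in exons)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exon coordinates on a single chromosome (not from the panel design).
exons = [(1000, 1100), (1150, 1250), (5000, 5200)]
regions = pad_and_merge(exons)
target_size = sum(end - start for start, end in regions)
print(regions, target_size)
```

In the example, the first two exons lie 50 bp apart, so their 35 bp flanks overlap and they are captured as one region; this is one way in which the number of targeted regions can end up smaller than the number of exons.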

Sequence Capture and Next-generation Sequencing (NGS)
Samples from 79 patients were subjected to NGS on the Roche GS FLX platform (454 Life Sciences, Branford, CT; average output 400-500 Mb). In the second part of the study, 38 samples were sequenced on the Illumina MiSeq system (Illumina, San Diego, CA; average output 1.5-5 Gb); only the latter allowed for CNV detection, owing to its high and uniform coverage. Samples from nine patients were analysed on both systems (two samples with no mutations, four samples with monoallelic mutations in 454 sequencing and three samples with confirmed mutations from 454 sequencing). Between eight (GS FLX) and 20 (MiSeq) samples were pooled and sequenced in a multiplexing procedure. Multiple DNAs were enriched using the NimbleGen SeqCap EZ Choice sequence capture approach and sequenced by Roche 454 GS FLX pyrosequencing or by Illumina MiSeq sequencing-by-synthesis technology according to the manufacturers' protocols. In brief, 0.5-1 µg of genomic DNA per sample was sheared using the Covaris S2 AFA system (Covaris Inc., Woburn, MA, USA) and ligated to barcoded adaptors for multiplexing. Pre-capture amplified samples were pooled and hybridized to the customized in-solution capture library for 72 hours, subsequently eluted, and post-capture amplified by ligation-mediated (LM-) PCR. The amplified enriched DNA was used either as input for emulsion PCR (emPCR) and subsequent massively parallel sequencing on one full PicoTiterPlate (PTP) of a Roche 454 GS FLX platform, or as input for direct cluster generation and sequencing on the Illumina MiSeq system (2×150 bp paired-end reads). In negative samples from patients designated as having LCA, uncovered regions of LCA genes (n = 16) were sequenced by conventional Sanger sequencing for completeness. In RP samples, gaps in the coverage of arRP genes (n = 31) were closed by Sanger sequencing only when searching for a second mutation in an incompletely covered arRP gene.

Read Mapping and Variant Analysis
Demultiplexed reads from the GS FLX platform or paired-end reads (2×150 bp) from the Illumina MiSeq instrument were mapped against the hg19 human reference genome using SMALT. The Genome Analysis Toolkit (GATK) [46] was applied for local realignment and base quality score recalibration of the mapped reads. Mapping and coverage statistics were generated from the mapping output files using the SeqCap analysis toolkit provided by Roche 454 as well as GATK. Identified variants were checked against dbNSFP v1.3 [47], dbSNP v135 and the HGMD® Professional 2011.4 database (released December 9, 2011). SNVs and indels were filtered by allele frequency, focusing on rare variants with a minor allele frequency (MAF) of 3% or less. Nonsense, frameshift and canonical splice site variants were considered pathogenic. The pathogenicity of rare non-synonymous single nucleotide variants (nSNVs) for which scores were not yet available in dbNSFP was assessed using five in silico prediction tools: SIFT [48], Mutation Taster [49], PolyPhen-2 [50], AlignGVGD [51,52] and PMut [53]. An nSNV was considered likely pathogenic when at least three of these algorithms predicted the variant to be probably damaging and when the affected position was predicted to be conserved by the conservation prediction algorithms PhyloP [54] and GERP++ [55]. The impact of splice site variants was assessed using the splice site prediction programmes NNSPLICE v0.9 [56], NetGene2 [57,58], SpliceView [59] and ESEfinder [60]. Variants not listed in HGMD [61] were considered novel. For visualization of the identified SNVs, SFF files (Roche 454) or FASTQ files (Illumina) of the patients' samples were loaded into the SeqPilot SeqNext module (v4.0, JSI medical systems, Kippenheim, Germany), and reads were mapped against the genomic sequences of the genes in the indicated subpanels arRP, adRP or LCA. SNVs were retained only if they occurred in at least 25% of the reads. Distinct variations were checked against the in-house database.
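The classification rules described in this section can be summarized in a short sketch. This is an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, the effect labels, the label "unclassified" and the combination of the MAF and read-fraction filters in a single function are hypothetical simplifications, and the predictor calls are assumed to have already been reduced to damaging/not damaging.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_MAF = 0.03            # rare-variant cutoff (MAF of 3% or less)
MIN_DAMAGING_CALLS = 3    # at least three of the five in silico predictors
MIN_READ_FRACTION = 0.25  # SNV must occur in at least 25% of the reads

# Hypothetical effect labels for variant classes considered pathogenic per se
TRUNCATING = {"nonsense", "frameshift", "canonical_splice"}


@dataclass
class Variant:
    effect: str                 # e.g. "nonsense", "missense" (labels are assumptions)
    maf: float                  # minor allele frequency
    read_fraction: float        # fraction of reads carrying the variant
    damaging: Dict[str, bool] = field(default_factory=dict)  # predictor -> damaging?
    conserved_phylop: bool = False
    conserved_gerp: bool = False


def classify(v: Variant) -> str:
    """Apply the frequency, read-support, prediction and conservation rules."""
    if v.read_fraction < MIN_READ_FRACTION or v.maf > MAX_MAF:
        return "filtered"
    if v.effect in TRUNCATING:
        return "pathogenic"
    if v.effect == "missense":
        n_damaging = sum(v.damaging.values())
        if (n_damaging >= MIN_DAMAGING_CALLS
                and v.conserved_phylop and v.conserved_gerp):
            return "likely pathogenic"
    return "unclassified"


calls = {"SIFT": True, "MutationTaster": True, "PolyPhen-2": True,
         "AlignGVGD": False, "PMut": False}
print(classify(Variant("missense", 0.01, 0.45, calls, True, True)))
```

Requiring both the predictor majority and both conservation calls mirrors the conjunctive wording of the criteria; a variant that fails either condition is left unclassified rather than called benign.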
Due to inaccurate sequencing of homopolymers by Roche 454 pyrosequencing, small indels in homopolymer stretches were filtered using stringent criteria (bidirectional occurrence in at least 20% of the forward reads and 40% of the reverse reads or vice versa) and visual inspection in the SeqNext software. Identified sequence variants were annotated according to the guidelines published by the Human Genome Variation Society.
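The bidirectional criterion for small indels in homopolymer stretches can be expressed as a simple check on strand-specific read counts. The sketch below is only an illustration; the function name and the way read counts are passed in are assumptions, and the visual inspection step is not modelled.

```python
def passes_strand_filter(fwd_alt: int, fwd_total: int,
                         rev_alt: int, rev_total: int,
                         low: float = 0.20, high: float = 0.40) -> bool:
    """Return True if the indel is supported on both strands.

    The indel must occur in at least `low` of the reads on one strand and
    at least `high` of the reads on the other strand, in either order.
    """
    if fwd_total == 0 or rev_total == 0:
        return False  # no bidirectional evidence possible
    fwd_frac = fwd_alt / fwd_total
    rev_frac = rev_alt / rev_total
    return ((fwd_frac >= low and rev_frac >= high)
            or (fwd_frac >= high and rev_frac >= low))


# Hypothetical read counts: 25% of forward reads, 50% of reverse reads
print(passes_strand_filter(5, 20, 10, 20))
```

An indel seen on only one strand fails regardless of how many reads support it, which is the behaviour intended for systematic homopolymer errors in pyrosequencing.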

Validation and Segregation Analysis
Sequence variants of interest identified by high-throughput sequencing were verified by Sanger sequencing following PCR amplification of the respective coding exons and adjacent intronic sequences by standard protocols. Purified PCR fragments were sequenced using Big Dye Terminator Cycle sequencing and analyzed on a 3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Where applicable, DNA from affected and unaffected family members was analyzed for segregation of putatively causative sequence variants. The resulting sequence data were compared to the reference sequence of the RefSeq database [62].

Copy Number Variation Analysis
Very high coverage was reproducibly achievable by sequencing with the Illumina MiSeq system and enabled copy number variation (CNV) analysis for most of the analyzed genes. Potential copy number alterations (CNA) were initially identified by VarScan [63] on mapped reads. For this purpose, the coverage of every target region in the sample of interest was internally normalized and compared with the normalized control data of the other samples of the same run (VarScan copy number mode and standard settings). A potential CNV was reported if it was detected in comparison with at least 75% of the control samples. CNVs were annotated using refGene from UCSC (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz). Potential CNVs were visualized and recalculated with the CNV mode of SeqNext using standard settings and the analysis mode "all vs. all". In this mode, the normalized relative coverage of every target ROI (region of interest) of a patient sample (relative product coverage, RPC P.) was divided by the normalized average relative target coverage of several control samples (RPC C.) to obtain the ratio of relative coverage (ratio RPC). Deletions were reported if the ratio RPC fell below 75%. CNVs that fulfilled these criteria were validated by multiplex ligation-dependent probe amplification (MLPA) for the affected gene. The SALSA MLPA probemix P-328-A1 EYS was used for the EYS gene, the SALSA MLPA probemix P221-B1 LCA for the CRX gene, and the SALSA MLPA KIT P235-B1 Retinitis Pigmentosa for the PRPF31 gene (MRC-Holland, Amsterdam, The Netherlands). Only CNVs that could be confirmed by MLPA were considered real. MLPA results were visualized with the MLPA module of the SeqPilot software (JSI Medical Systems). The ratio RPA (relative peak area) was calculated as the RPA of the patient versus controls.
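The ratio RPC calculation can be sketched as follows. This is a simplified illustration, not the SeqNext or VarScan implementation: normalizing each ROI by the sample's summed depth over all ROIs is an assumption, as are the function names and the example depths. Only the deletion cutoff of 75% is taken from the text.

```python
from typing import Dict, List

DELETION_RATIO = 0.75  # deletions reported when the ratio RPC falls below 75%


def relative_coverage(depth: Dict[str, float]) -> Dict[str, float]:
    """Normalize each ROI's depth by the sample's summed depth (assumed scheme)."""
    total = sum(depth.values())
    return {roi: d / total for roi, d in depth.items()}


def ratio_rpc(patient: Dict[str, float],
              controls: List[Dict[str, float]]) -> Dict[str, float]:
    """Ratio of patient relative coverage (RPC P.) to mean control coverage (RPC C.)."""
    if not controls:
        raise ValueError("at least one control sample is required")
    rpc_p = relative_coverage(patient)
    rpc_controls = [relative_coverage(c) for c in controls]
    ratios = {}
    for roi, p in rpc_p.items():
        rpc_c = sum(c[roi] for c in rpc_controls) / len(rpc_controls)
        ratios[roi] = p / rpc_c
    return ratios


def deleted_rois(ratios: Dict[str, float]) -> List[str]:
    """ROIs whose ratio RPC falls below the deletion cutoff."""
    return sorted(roi for roi, r in ratios.items() if r < DELETION_RATIO)


# Hypothetical mean depths per ROI; ROI "ex1" has half the depth in the patient
rois = ["ex1", "ex2", "ex3", "ex4", "ex5"]
controls = [{r: 100.0 for r in rois}, {r: 200.0 for r in rois}]
patient = {"ex1": 50.0, "ex2": 100.0, "ex3": 100.0, "ex4": 100.0, "ex5": 100.0}
print(deleted_rois(ratio_rpc(patient, controls)))
```

Because coverage is normalized within each sample, this scheme assumes that most ROIs are copy-neutral; a deletion spanning a large share of the panel would shift the ratios of the unaffected ROIs upwards. Candidate calls from such a calculation would still require MLPA confirmation, as in the study.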

Exclusion of the CEP290 Hot Spot Mutation in LCA Patients
To exclude the common c.2991+1655A>G mutation in the CEP290 gene in all LCA patients prior to NGS analysis, the region of interest in intron 26 was amplified by PCR. Genotyping for the presence of the mutation was performed by pyrosequencing using QIAGEN Pyro Gold chemistry according to the manufacturer's instructions, with subsequent analysis on a PSQ 96MA system (QIAGEN, Hilden, Germany).

Linkage Analysis
In the family of patient 110 afflicted with LCA, we performed genome-wide homozygosity mapping using the Affymetrix GeneChip Human Mapping 10K Array, version 2.0 (Affymetrix, Santa Clara, CA). GRR [64] and PedCheck [65] were used to verify relationships and to identify Mendelian errors. Nonparametric linkage analysis was done with MERLIN [66]. Parametric linkage and haplotype analysis was performed using the program ALLEGRO [67]. All data handling was performed using the graphical user interface ALOHOMORA [68]. Graphic output of haplotypes was generated with HaploPainter [69].

Supporting Information
File S1. File S1 contains the following files.
Figure S1. CNVs (from partial to complete gene deletions) of PRPF31 detected by analysis of NGS data. A heterozygous deletion of all 14 PRPF31 exons was identified in patient 113. In patient 116, exons 1-5 were deleted on one gene copy (the non-coding exon 1 was not yet included in target enrichment and subsequent NGS, but its deletion was confirmed by MLPA in both patients). The dashed line and red arrows indicate lower coverage for heterozygously deleted regions compared to one control sample.
Figure S2
Table S1. Genes analyzed in this study. A. arRP, adRP and LCA genes that were captured and subjected to NGS in this study. B. Functional categorization of genes with causative mutations.
Table S2. Additional variants classified as "likely pathogenic". Classification as pathogenic by at least three out of five bioinformatic prediction programs and a minor allele frequency below 3% in unresolved patients. Although a contribution of these variants to the phenotype cannot be excluded, they were not considered causative. In many cases, they represented monoallelic variants in recessive genes which would not sufficiently explain the phenotype.
References S1. References for Table 1 and Table S2.