Systematic evaluation of a targeted gene capture sequencing panel for molecular diagnosis of retinitis pigmentosa

Background Inherited eye diseases are major causes of vision loss in both children and adults and are characterized by clinical variability and pronounced genetic heterogeneity. Genetic testing may provide an accurate diagnosis of ophthalmic genetic disorders and enable gene therapy for specific diseases. Methods A targeted gene capture panel was designed to capture the exons of 283 inherited eye disease genes, including 58 known causative retinitis pigmentosa (RP) genes. A total of 180 samples were tested with this panel, 68 of which had previously been tested by Sanger sequencing. Systematic evaluation of our method and comprehensive molecular diagnosis were carried out in 99 RP patients. Results 96.85% of targeted regions were covered at a depth of at least 20-fold, and the accuracy of variant detection was 99.994%. In 4 of the 68 samples previously tested by Sanger sequencing, next-generation sequencing (NGS), but not Sanger sequencing, detected mutations associated with other diseases that were inconsistent with the clinical diagnosis. Among the 99 RP patients, pathogenic mutations were detected in 64 (64.6%), and in 3 patients the molecular diagnosis was inconsistent with the initial clinical diagnosis. After clinical revisiting, one patient's diagnosis was reclassified. In addition, 3 patients were found to carry large deletions. Conclusions We have systematically evaluated our method, compared it with Sanger sequencing, and identified a large number of novel mutations in a cohort of 99 RP patients. The results show that our method is sufficiently accurate and highlight the importance of molecular diagnosis in clinical diagnosis.



Introduction
Inherited ophthalmic disorders are a large group of clinically and genetically heterogeneous retinal diseases that constitute a major cause of blindness in children and adults [1]. This heterogeneity includes genetic, allelic and clinical heterogeneity [2,3]. Hereditary retinal diseases, a group of blinding diseases that includes cone-rod dystrophy (CRD) and RP, are the most common ophthalmologic genetic disorders. Although most monogenic eye diseases remain untreatable at present [4], advances in genetic studies have revealed more than 200 disease-causing genes associated with more than 30 retinal diseases (RetNet: https://sph.uth.edu/retnet/sum-dis.htm). These discoveries pave the way to accurate diagnoses, prognoses and effective genetic counseling, reducing the risk of disease recurrence in at-risk families and improving mechanism-specific care for these diseases [5,6].
However, there is still a substantial gap in clinical gene testing for monogenic eye diseases [7]. Many monogenic eye diseases, such as RP, CRD and Leber congenital amaurosis (LCA), display a very high degree of genetic heterogeneity (each involves 20 to 60 known disease-causing genes). In addition, because many patients lack a clear inheritance pattern and the phenotypes caused by different genes are often indistinguishable, parallel sequencing of nearly all known genes is required in both research and clinical genetic screening for these diseases. Genetic eye diseases can be inherited as autosomal recessive (ar), autosomal dominant (ad), X-linked (XL) or mitochondrial traits. Nevertheless, the majority of cases are sporadic. Identifying the genetic cause of a patient's disease is crucial for genetic counseling of patients and families and is a prerequisite for any form of genotype-based therapy. However, the enormous genetic heterogeneity of inherited ophthalmic disorders makes identifying causative mutations a challenging task [8]. Recent progress in NGS and target-enrichment technologies, which enable rapid, simultaneous sequencing and analysis of hundreds of genes at high accuracy and at a cost several orders of magnitude lower, provides new opportunities for bridging the gaps in gene testing for monogenic eye diseases [9,10]. Indeed, targeted gene capture NGS methods applied to RP and LCA gene testing in research settings raise the possibility of their use as routine diagnostic tools in clinical settings [11,12]. To date, more than ten laboratories provide prenatal or carrier testing services through sequence analysis of the entire coding region. For example, Emory Genetics Laboratory has developed an eye disorders panel that includes 143 genes related to 141 eye diseases (http://www.ncbi.nlm.nih.gov/).
Here, we present the development of a systematic gene testing panel for monogenic eye diseases, covering 283 genes, that couples NGS with a solution-phase hybridization-based target-enrichment method. After evaluating key technical parameters, such as reproducibility, the sequencing depth required for variant calling and the accuracy of variant detection, we sequenced a cohort of 64 patients previously tested by Sanger sequencing to compare the clinical sensitivity of the two methods, and we tested the performance of the panel in molecular diagnosis by screening 99 unselected Chinese RP patients. Our results demonstrate that the gene testing panel is a cost-effective, high-throughput method that could be applied in both research and clinical molecular diagnosis of genetic eye diseases.

Samples
We tested DNA samples from 180 individuals, comprising 163 patient samples and 12 additional family member samples. Of these, 68 samples (64 patients and 4 family members) were de-identified samples from the Casey Eye Institute, USA (IRB approval number IRB00008083; study title "Development of new diagnostic tests"). The 163 patient samples include 20 samples with previously identified causative variants, 44 samples with incomplete or no information on causative variants from previous Sanger sequencing-based gene testing, and 99 RP patients without any gene testing information (Table 1). The YH DNA sample (C004 and C005) was provided by BGI-Shenzhen. The Ethics Committee of BGI approved this study (IRB approval number BGI-IRB 14002). Informed consent was obtained from patients under procedures approved by the respective institutional review boards or research ethics boards.

Disease gene collections and targeted capture probes design
A total of 283 genes for 146 monogenic eye diseases, including 58 known RP-related genes (S1 File), were collected from databases (OMIM: http://www.ncbi.nlm.nih.gov/omim/, and RetNet: https://sph.uth.edu/Retnet/sum-dis.htm). The RefSeq entries of these genes are given in S2 File. Customized oligonucleotide probes were designed with the NimbleGen (Roche) oligonucleotide probe design system to capture the exonic sequences plus 30 bp flanking each exon.
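The exon-plus-flank target definition can be illustrated with a short Python sketch. It pads each exon by 30 bp on both sides and merges intervals that then overlap or touch, which is one common way to turn an exon list into a capture target. It is not the NimbleGen design algorithm, and the coordinates are hypothetical placeholders rather than panel data.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open

def pad_and_merge(exons: List[Interval], flank: int = 30) -> List[Interval]:
    """Extend each exon by `flank` bp on both sides, then merge overlapping/adjacent intervals."""
    padded = sorted((chrom, max(0, start - flank), end + flank) for chrom, start, end in exons)
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

def total_bp(intervals: List[Interval]) -> int:
    """Total number of bases covered by non-overlapping intervals."""
    return sum(end - start for _, start, end in intervals)

# Hypothetical exon coordinates for illustration only (not taken from the panel design).
exons = [("chr1", 1000, 1100), ("chr1", 1150, 1250), ("chr1", 5000, 5200), ("chr2", 10, 60)]
targets = pad_and_merge(exons)
print(targets)
print(total_bp(targets))
```

Here the two nearby chr1 exons become a single target once their 30 bp flanks overlap, so closely spaced exons are not double-counted in the target size.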

Targeted sequencing library preparation and sequencing
Targeted sequencing libraries were prepared as follows: 1 μg of genomic DNA was sonicated into 200-300 bp fragments, followed by end repair, A-tailing and ligation of Illumina adaptors, then 4 cycles of pre-capture PCR with sample barcode indexing. The indexed PCR products of 20-30 samples were then pooled. Targeted capture was performed by hybridization with the capture probes, followed by 15 cycles of PCR amplification and validation of the library products for sequencing. DNA sequencing was performed on Illumina HiSeq2000 sequencers to generate 90 bp paired-end reads and an 8 bp sample barcode read.

Data filtering and analysis
Image analysis and base calling were performed with the built-in Illumina pipeline. Indexed primers were used for data fidelity surveillance: only reads that matched the adapter and primer index sequences with no more than 3 nt mismatches were accepted as valid reads. Sequencing statistics for coverage, depth and coverage depth are listed in Table 2. The reference genome was obtained from NCBI, version GRCh37 (hg19). Sequence alignment was
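The rule that a read's index must match a sample barcode with no more than 3 mismatches can be sketched as nearest-barcode assignment by Hamming distance. This is a minimal illustration, not the Illumina pipeline: the sample names and 8 bp barcodes are hypothetical, and rejecting reads that are equally close to two barcodes is our own assumption.

```python
from typing import Dict, Optional

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read_index: str, barcodes: Dict[str, str], max_mismatches: int = 3) -> Optional[str]:
    """Return the sample whose barcode is closest to read_index, or None if too distant or ambiguous."""
    scored = sorted((hamming(read_index, bc), sample) for sample, bc in barcodes.items())
    best_dist, best_sample = scored[0]
    if best_dist > max_mismatches:
        return None                      # read is discarded as invalid
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None                      # equally close to two barcodes: ambiguous
    return best_sample

# Hypothetical 8 bp sample barcodes.
barcodes = {"S01": "ACGTACGT", "S02": "TGCATGCA"}
print(assign_sample("ACGTACGA", barcodes))  # 1 mismatch to S01
print(assign_sample("ACGTAGCA", barcodes))  # 3 mismatches to S01
print(assign_sample("ACGTTGCA", barcodes))  # 4 mismatches to both
```

With up to 3 mismatches tolerated, two 8 bp barcodes must differ at 7 or more positions to guarantee that no index lies within tolerance of both, so barcode design and the mismatch threshold have to be chosen together.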

Copy number variations detection
We screened for copy number variations (CNVs) using the method previously described by Wei XM et al. [15]. The method requires a significant correlation of exon-level sequencing depth between samples (r > 0.7). A z-score was then calculated from the depth of each capture region. A cut-off of |z| > 2.58 was chosen because, for a two-tailed test, more than 99% of normal samples fall within it. Regions with z < -2.58 were defined as deletions and regions with z > 2.58 as duplications.
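A z-score CNV screen of this kind can be sketched in Python with NumPy. The sketch assumes a regions-by-samples matrix of mean depth per capture region; the median-of-ratios normalization, the use of the remaining samples as references and the synthetic data are our assumptions for illustration and are not necessarily how Wei XM et al. [15] implement the method. Only the r > 0.7 reference criterion and the ±2.58 cut-off come from the text.

```python
import numpy as np

Z_CUTOFF = 2.58         # |z| > 2.58 ~ two-tailed 1% under a normal distribution
MIN_CORRELATION = 0.7   # reference samples must track the test sample's depth profile

def call_cnvs(depth: np.ndarray, test_idx: int):
    """Flag capture regions whose depth in one sample deviates from well-correlated references.

    depth: (regions x samples) array of mean read depth per capture region.
    Returns per-region z-scores and labels ("deletion", "duplication", "normal").
    """
    # Assumed normalization: median-of-ratios scaling corrects for different output per
    # sample while staying robust to the few regions affected by a CNV.
    profile = np.median(depth, axis=1, keepdims=True)
    size_factors = np.median(depth / profile, axis=0, keepdims=True)
    scaled = depth / size_factors
    test = scaled[:, test_idx]
    refs = [j for j in range(depth.shape[1])
            if j != test_idx and np.corrcoef(test, scaled[:, j])[0, 1] > MIN_CORRELATION]
    if len(refs) < 2:
        raise ValueError("fewer than two reference samples pass the correlation filter")
    ref = scaled[:, refs]
    z = (test - ref.mean(axis=1)) / ref.std(axis=1, ddof=1)
    labels = np.where(z < -Z_CUTOFF, "deletion",
                      np.where(z > Z_CUTOFF, "duplication", "normal"))
    return z, labels

# Synthetic example: 40 regions x 10 samples, with a simulated heterozygous
# deletion (half depth) of regions 10-14 in sample 0.
rng = np.random.default_rng(0)
baseline = rng.uniform(100, 500, size=40)
depth = baseline[:, None] * rng.normal(1.0, 0.05, size=(40, 10))
depth[10:15, 0] *= 0.5
z, labels = call_cnvs(depth, test_idx=0)
print(labels[10:15])
```

A heterozygous deletion halves the expected depth, so with per-region noise of a few percent its z-score falls far below -2.58; a homozygous deletion gives an even more extreme value.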

Quantitative real-time PCR (qPCR)
To validate the CNV results of our method, quantitative real-time PCR (qPCR) analysis was performed. Exons 17, 20 and 26 of CACNA2D4, exons 1, 2 and 4 of CRX, and exons 9, 11 and 13 of TULP1 were measured by qPCR on an ABI 7900HT Real-Time PCR system (Life Technologies, Carlsbad, CA, USA) using HS qPCR Master Mix according to the manufacturer's instructions. The primers used to amplify these exons are listed in Table 3. The PCR program began with an initial denaturation step at 95˚C for 10 min, followed by 45 amplification cycles of 95˚C for 10 s, annealing for 15 s (at a temperature specific to each primer pair) and 72˚C for 30 s. The DNA copy number in affected samples was compared with that in control samples from normal individuals.
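The text does not state how qPCR copy number was calculated. One widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions of roughly 100% amplification efficiency, a two-copy reference amplicon and a diploid control sample; the Ct values are hypothetical.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_control: float, ct_reference_control: float,
                         control_copies: int = 2) -> float:
    """Estimate target copy number with the comparative Ct (2^-ddCt) method.

    Assumes ~100% amplification efficiency, so each cycle doubles the product.
    """
    delta_sample = ct_target - ct_reference                   # normalize to reference amplicon
    delta_control = ct_target_control - ct_reference_control
    ratio = 2.0 ** -(delta_sample - delta_control)            # patient relative to control
    return ratio * control_copies

# Hypothetical Ct values: the target amplifies one cycle later in the patient than in the control.
print(relative_copy_number(26.0, 24.0, 25.0, 24.0))
```

A ratio near 0.5 (about one copy) is consistent with a heterozygous deletion, and a value near 0 with a homozygous deletion.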

Mutation interpretation procedure
To identify disease-causing mutations, we applied the following four-step procedure. First, we selected variants that could alter the protein-coding sequence: nonsense (stop) and missense variants, exonic small insertions/deletions (InDels), especially frameshift InDels, and variants at potential canonical splice sites (within ±10 bp of exons). Second, we retrieved allele frequencies from three databases (the 1000 Genomes dataset, dbSNP and HapMap). A variant with an allele frequency greater than 0.01 in any of these databases was filtered out, because the diseases studied here are very rare. To exclude polymorphisms found predominantly in the Chinese population, we also sequenced the 283 genes in 200 healthy Chinese individuals to build an internal control database. Third, we used five programs in dbNSFP (SIFT, PolyPhen2, MutationTaster, FATHMM and PhyloP) to predict the effects of novel missense variants; variants predicted to be damaging or possibly damaging by at least two programs were retained (a PhyloP score > 0 was treated as "damaging"). Finally, if no mutation was found in the first three steps, we examined CNVs in the candidate pathogenic genes.
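The first three filtering steps can be expressed as a small Python function. The field names, consequence labels and prediction codes are hypothetical, and the variants in the usage example are invented for illustration. The thresholds come from the text: 0.01 in public databases, < 0.05 in the internal control database (as stated in the Results), and at least two damaging calls among five predictors, with PhyloP > 0 counted as damaging. Step 4, the CNV check, is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical consequence labels for protein-altering variants (step 1).
PROTEIN_ALTERING = {"nonsense", "missense", "frameshift", "inframe_indel", "splice_site"}
PREDICTORS = ("SIFT", "PolyPhen2", "MutationTaster", "FATHMM")

@dataclass
class Variant:
    gene: str
    consequence: str
    public_af: Dict[str, float]        # allele frequency per public database
    internal_af: float                 # frequency in the in-house Chinese control database
    novel: bool = True
    predictions: Dict[str, str] = field(default_factory=dict)  # "D", "P" (possibly damaging), other = benign
    phylop: float = 0.0

def damaging_votes(v: Variant) -> int:
    """Count predictors calling the variant damaging; PhyloP > 0 counts as one vote."""
    votes = sum(v.predictions.get(tool) in ("D", "P") for tool in PREDICTORS)
    return votes + int(v.phylop > 0)

def passes_filters(v: Variant, max_public_af: float = 0.01, max_internal_af: float = 0.05) -> bool:
    if v.consequence not in PROTEIN_ALTERING:                      # step 1: protein-altering only
        return False
    if any(af > max_public_af for af in v.public_af.values()):     # step 2: rare in public data
        return False
    if v.internal_af >= max_internal_af:                           # step 2: rare in Chinese controls
        return False
    if v.consequence == "missense" and v.novel:                    # step 3: in silico support
        return damaging_votes(v) >= 2
    return True

rare = {"1000G": 0.0, "dbSNP": 0.0, "HapMap": 0.0}
candidates = [  # hypothetical variants
    Variant("USH2A", "missense", rare, 0.0, predictions={"SIFT": "D", "PolyPhen2": "B"}, phylop=1.2),
    Variant("ABCA4", "missense", rare, 0.0, predictions={"SIFT": "D"}, phylop=-0.4),
    Variant("RHO", "synonymous", rare, 0.0),
    Variant("CYP4V2", "nonsense", {"1000G": 0.02}, 0.0),
    Variant("PRPF31", "frameshift", rare, 0.0),
]
print([passes_filters(v) for v in candidates])
```

Because the predictors often disagree, the two-vote rule is permissive by design; it removes clearly benign-looking novel missense variants without requiring consensus.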

Systematic evaluation of the method
To evaluate the method, we performed targeted gene capture sequencing of the protein-coding regions plus 30 bp of immediately adjacent sequence of the 283 genes in 83 samples, including a cohort of 68 samples (64 patients + 4 family members) that had been tested by Sanger sequencing at the Casey Eye Institute, USA, and 2 RP families (10 members in total), 5 unaffected

Coverage and depth analysis of 283 monogenic eye diseases genes
On average, we generated 15.2 Mb of high-quality reads per sample, of which 67.36% mapped to the targeted regions, corresponding to an average depth of 400-fold on the targeted regions. At this sequencing depth, 97.85% and 96.85% of the targeted regions were covered by at least 4 and 20 reads, respectively (Table 2). Of the 4381 exons of the 283 genes, only 54 exons from 35 genes were poorly covered (<50%), presumably because of high GC content or repetitive sequence; these could be complemented by Sanger sequencing of PCR products. To determine the minimum sequencing depth required for adequate target coverage and variant detection, we randomly extracted subsets of reads with different average depths from the total mapped reads of each sample. At a sequencing depth of 200-250-fold, 97.5% of the targeted regions were covered by at least 1 read and 95.6% by at least 20 reads, and neither value improved markedly with further increases in depth (Fig 1A). Similarly, the number of identified SNPs saturated at a sequencing depth above 200-fold, and a similar trend was observed for InDels (Fig 1B). Thus, an average sequencing depth above 200-fold is adequate for reasonable target coverage and variant detection in this study.
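The saturation analysis can be approximated in Python by thinning per-base depths. Keeping each read independently with probability f leaves each base with a binomially distributed depth, so binomial thinning gives the correct expected coverage fractions, although it ignores the correlation between neighboring bases that real read subsampling preserves. The simulated depth distribution is purely hypothetical and is not the study's data.

```python
import numpy as np

def coverage_fraction(depths: np.ndarray, min_depth: int) -> float:
    """Fraction of target bases covered by at least min_depth reads."""
    return float(np.mean(depths >= min_depth))

def thin(depths: np.ndarray, keep_fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Approximate random read subsampling: each read at a base is kept with probability keep_fraction."""
    return rng.binomial(depths, keep_fraction)

rng = np.random.default_rng(42)
# Hypothetical uneven coverage with a mean of about 400x (gamma-Poisson mixture).
depths = rng.poisson(rng.gamma(shape=2.0, scale=200.0, size=100_000))

for f in (0.25, 0.5, 0.75):
    sub = thin(depths, f, rng)
    print(f"mean {sub.mean():6.1f}x  >=1x {coverage_fraction(sub, 1):.4f}  >=20x {coverage_fraction(sub, 20):.4f}")
print(f"full {depths.mean():6.1f}x  >=1x {coverage_fraction(depths, 1):.4f}  >=20x {coverage_fraction(depths, 20):.4f}")
```

Plotting these fractions against mean depth shows where the curve flattens; beyond that point extra sequencing mainly adds depth to regions that are already covered.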

Reproducibility and accuracy of variants detection
To assess the reproducibility of the targeted gene capture sequencing panel, we calculated the correlation coefficients of coverage rate and mean sequencing depth on target regions among samples, both within and between the four targeted capture experiments. Each batch showed high reproducibility (correlation coefficients of 0.816 to 0.996 for coverage and 0.781 to 0.999 for depth). Comparably high correlations (coverage > 0.816 and depth > 0.781) were observed for both intra- and inter-experiment measurements (Fig 2A and 2B), indicating that the targeted genes were captured reliably. The relatively high correlation coefficients for depth and coverage rate were expected: because the sequencing depth was sufficiently high, coverage of most target regions had reached saturation, so random fluctuation stayed within a reasonable range. Hence, the targeted gene sequencing method has a level of reproducibility that is acceptable for such studies.

To assess the accuracy of variant detection, we also performed targeted sequencing of the YH sample (C005), whose genome has been sequenced to 50-fold depth by whole-genome sequencing. Using the variant detection methods and criteria described in Materials and Methods, we identified 911 SNPs in the targeted sequencing data and 911 SNPs in the YH genome data within the 1,505,712 bp exonic region of the 283 genes. Of these, 868 SNPs were shared by the two data sets, 43 were specific to the targeted sequencing data and 43 were specific to the YH genome data. Accordingly, both the maximum false-positive (FP) rate and the false-negative (FN) rate were 4.7% (43/911). The sensitivity, specificity, precision and accuracy were calculated as follows, where TP, FP, FN and TN denote true-positive, false-positive, false-negative and true-negative positions:

$$\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad \text{Specificity} = \frac{TN}{TN+FP},$$
$$\text{Precision} = \frac{TP}{TP+FP}, \qquad \text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}.$$

... (Table 4), it showed that the pathogenic mutations revealed by NGS covered the detection spectrum of Sanger sequencing.
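The accuracy figures can be checked with a few lines of Python using the counts reported for the YH comparison (868 shared SNPs, 43 specific to each data set, 1,505,712 bp of target). Treating every remaining target position as a true negative is our assumption, but it reproduces the 99.994% accuracy quoted in the Abstract.

```python
def concordance_metrics(tp: int, fp: int, fn: int, target_bp: int) -> dict:
    """Per-base concordance metrics; TN = target positions called variant in neither data set."""
    tn = target_bp - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# TP: SNPs in both data sets; FP: only in targeted data; FN: only in the YH genome data.
metrics = concordance_metrics(tp=868, fp=43, fn=43, target_bp=1_505_712)
for name, value in metrics.items():
    print(f"{name:12s} {value:.5%}")
```

Sensitivity and precision both come out at 868/911 (about 95.3%), the complements of the 4.7% FN and FP rates, while specificity (about 99.997%) and accuracy are dominated by the very large number of invariant target positions.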
Because our NGS panel includes eye disease genes that were not on the previous Sanger RP gene list, the discrepant results in the 4 patients in Table 4 involved mutations in genes that had not been sequenced by Sanger sequencing. For example, patient P007 was diagnosed with autosomal recessive RP, and we identified compound heterozygous mutations in BBS2, p.Arg275X and p.Pro134Arg; the nonsense mutation has been reported as pathogenic and most likely has a significant effect on the function of the protein complexes [17][18][19]. The p.Pro134Arg mutation is novel and was predicted to be probably damaging by PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/). In patient P010, the two BBS1 mutations, c.1645G>T (p.Glu549X) and c.1169T>G (p.Met390Arg), have been reported in previous studies [20,21]. Mutations in BBS (Bardet-Biedl syndrome) genes are well known to cause syndromes characterized by visual defects and other systemic features such as renal abnormalities. However, 'RP-like' phenotypes without involvement of other organs have also been associated with BBS genes in some cases [22,23]. Patients P007 and P010 were diagnosed with arRP and arLCA, respectively, yet their pathogenic mutations were found in BBS genes rather than in RP- or LCA-associated genes. A similar situation was found in RP patient P062 and LCA patient P064. In patient P062, compound heterozygous CRB1 mutations p.Cys948Tyr and p.165_167delAspGlyIle were detected, both of which have been reported as pathogenic [24,25]. Patient P064 carried compound heterozygous CNGB3 mutations c.1600_1601insTT and p.Gly567Glu; the insertion causes a frameshift that leads to premature termination of translation of the CNGB3 transcript, and the missense mutation is a novel variant predicted to be pathogenic by PolyPhen-2.
In addition to SNPs and small InDels, our NGS-based approach also detects copy-number variations (CNVs). For instance, patient P041 was diagnosed with CRD. Our results showed a homozygous deletion of exons 17-26 of CACNA2D4 (Fig 3A), a gene associated with CRD in previous studies [26]. We also found that the patient's father, mother and brother carried a heterozygous deletion of exons 17-26 of CACNA2D4 (Fig 3). The z-scores of exons 17-26 had absolute values greater than 4.0 in patient P041, and almost all exceeded 2.58 in absolute value in his father, mother and brother. Consistently, quantitative real-time PCR (qPCR) further validated the CACNA2D4 CNV in the P041 family (Fig 3E). This deletion has previously been found in patients with late-onset bipolar disorder [27]. A similar situation was found for patient P048: he and his mother carried a heterozygous deletion of the entire CRX gene (Fig 3). Mutations in CRX are associated with either recessive LCA or dominant CRD.

Molecular diagnosis of 99 RP samples
After the systematic evaluation of our panel, we tested its utility by performing molecular diagnosis on 99 unselected Chinese RP patients, including 6 patients with Bietti crystalline corneoretinal dystrophy (BCD).

Sequencing of 99 RP patients using the developed panel
Using the panel described above, we performed targeted gene capture NGS on 99 unrelated Chinese patients with a clinical diagnosis of RP, followed by bioinformatics analysis (described in Materials and Methods). An average sequencing depth of 322-fold was achieved, 68.7% of reads mapped to the target region, and 98.1% and 97.2% of target bases were covered at ≥4× and ≥20×, respectively, indicating sufficient depth and coverage for variant detection. A total of 93,242 SNPs and 8965 InDels were identified in the 99 samples, an average of 941.8 SNPs and 90.6 small InDels per sample. Because RP is a rare Mendelian disease, only variants with a frequency < 0.01 in the 1000 Genomes database, dbSNP and HapMap were retained. In addition, to filter out polymorphic variants in the Chinese population, only variants with a frequency < 0.05 in our internal database (see "Mutation interpretation procedure" in Methods) were retained. As a result, an average of only 52.4 rare variants (SNPs + InDels) remained per sample, of which about 19 were located in protein-coding regions or potential splice sites. Finally, we used dbNSFP, which includes 5 prediction algorithms (SIFT [28], PolyPhen-2 [29], MutationTaster [30], FATHMM [31] and PhyloP [32]), to predict the pathogenicity of novel missense variants. Because the results of these algorithms were often contradictory, we used the predictions only as a reference.

Molecular diagnosis in 99 RP patients
Following our procedures, we identified 99 pathogenic mutations in the 99 RP patients, and all pathogenic mutations were validated by Sanger sequencing (RP original 54 genes). Missense mutations constituted 55% of the identified mutations, and splice-site, nonsense and InDel mutations together accounted for 35% (Fig 4A). We detected mutations consistent with the RP phenotype in 61 of the 99 cases (16 autosomal dominant, 40 autosomal recessive and 5 X-linked), and in 3 further cases the mutations explained other retinal diseases, such as LCA and fundus albipunctatus. Thus, our identification rate was 63.5% (61/96) for RP patients and 64.6% (64/99) for all patients (Fig 4B). Altogether, we identified 94 mutations in 27 different RP genes and 5 mutations in 3 genes associated with other retinal diseases. Of these, 72 are novel and 27 have been reported previously. The mutations were neither evenly distributed across the 27 RP genes nor concentrated in one or two genes. The most common gene was USH2A, which accounted for 9 cases, while mutations in ABCA4 and CYP4V2 were each identified in 6 cases. Recurrent mutations were rare, with few patients carrying the same mutation; however, the c.802-8_c.810del/insGC mutation in CYP4V2 was more frequent among patients with Bietti crystalline corneoretinal dystrophy (BCD, OMIM #210370), owing to a founder effect in Asian populations [33]. To examine co-segregation of the mutations, segregation analysis was performed in 16 cases, and the results were concordant with the molecular diagnosis in all 16.
After the pathogenic mutations were identified, patients with potential RP-causing mutations consistent with the mode of inheritance were classified into three groups according to confidence level. Patients in whom all detected mutations had been previously reported were defined as the highest-confidence group (Group 1); patients with at least one novel frameshift/nonsense mutation were categorized as the intermediate-confidence group (Group 2); and patients carrying only novel missense/splice mutations were defined as the lower-confidence group (Group 3). We identified 14, 16 and 31 patients in Groups 1, 2 and 3, respectively.
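The three-tier confidence grouping can be written as a small decision function. The `Mutation` record and its labels are hypothetical, and treating any patient with a novel frameshift/nonsense mutation as Group 2, regardless of the other allele, is our reading of the rule rather than a definition given in the text.

```python
from typing import List, NamedTuple

class Mutation(NamedTuple):
    kind: str        # hypothetical labels: "missense", "splice", "nonsense", "frameshift"
    reported: bool   # True if previously reported as pathogenic

TRUNCATING = {"nonsense", "frameshift"}

def confidence_group(mutations: List[Mutation]) -> int:
    """Return 1 (highest), 2 (intermediate) or 3 (lower confidence) for one patient."""
    if not mutations:
        raise ValueError("patient has no candidate mutations")
    if all(m.reported for m in mutations):
        return 1    # every mutation previously reported
    if any(not m.reported and m.kind in TRUNCATING for m in mutations):
        return 2    # at least one novel frameshift/nonsense mutation
    return 3        # novel mutations are missense/splice only

print(confidence_group([Mutation("missense", True), Mutation("nonsense", True)]))
print(confidence_group([Mutation("frameshift", False), Mutation("missense", True)]))
print(confidence_group([Mutation("missense", False), Mutation("splice", False)]))
```

Ordering the checks from most to least certain means each patient lands in the highest group whose condition is met.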
In addition to SNPs and small InDels, we found that patient YK13S0025 carried a heterozygous deletion of exons 9-13 of TULP1 as well as a heterozygous variant, c.349G>A (p.Glu117Lys), in the same gene. The large deletion was subsequently validated by qPCR (Fig 5). Pathogenic TULP1 mutations are concentrated in exons 10 to 15 [34,35]. The deletion of exons 9-13 results in loss of the C terminus, which contains the most conserved region among tub family members and is thought to be critical for TULP1 function [36].

Clinical revisiting of patients carrying mutations in non-RP-causing genes
Finally, among the 3 cases whose mutations explained other retinal diseases, 2 patients carrying novel frameshift/InDel mutations were defined as high confidence and 1 patient carrying novel missense/splice mutations was defined as low confidence (Table 5). We revisited patients RP023 and RP095.
Patient RP023 is a 33-year-old man carrying a novel splice-site mutation, c.-57+7T>G, and a novel missense mutation, p.Arg237His, in NMNAT1, the gene associated with LCA9 (Table 5) [37]. He had experienced night blindness and patchy loss of the peripheral visual field since the age of 8 years. His visual acuity decreased gradually from the age of 12, followed by nystagmus, tunnel vision, metamorphopsia and muscae volitantes. His best corrected visual acuity (BCVA) was 20/200 in the right eye and 20/50 in the left eye. Fundus examination revealed a waxy disc and markedly attenuated retinal vessels, and significant salt-and-pepper or bone-corpuscle pigmentary changes were noted. All these findings suggest that the clinical diagnosis is more likely to be RP accompanied by cataract than LCA.
Patient RP095 is a 26-year-old man carrying a homozygous InDel mutation, c.928delinsGAAG, in RDH5. He had night blindness since childhood, occasionally accompanied by myodystonia of the left side of the body. A scotopic ERG (rod response) after 30 min of dark adaptation showed a- and b-wave amplitudes that were more reduced than those recorded 2 years earlier. Fundus examination disclosed white, star-like dots in the periphery of the macula, a waxy disc and markedly attenuated retinal vessels, without any significant pigmentary changes. Hence, the clinical diagnosis was changed to fundus albipunctatus.

Discussion
In this study, we developed and systematically evaluated an NGS-based panel for the molecular diagnosis of inherited ophthalmic disorders. The evaluation demonstrates that our method meets a high standard on key analytical parameters and achieves clinical sensitivity comparable to that of Sanger sequencing, supporting its value in molecular diagnosis.
This panel achieved 99.994% accuracy of variant detection. Asan et al. analyzed the correlation coefficients of coverage and depth in their study, and their correlation for coverage rate (0.65 to 0.78) was lower than that for mean depth (0.90 to 0.96) [16]. In contrast to our results, the lower correlation coefficient of coverage rate in their study may be due to their low sequencing depth of 30-fold, which widened the random fluctuation. Moreover, the clinical sensitivity of our panel was not only as high as that of Sanger sequencing but also showed an additional advantage: four patients were found to carry mutations in genes related to other genetic eye diseases that were not considered in Sanger sequencing (Table 4; novel mutations are shown in bold). The mutations in these four patients might not have been detected if screening had been performed only on specific genes associated with one or several similar diseases, owing to the phenotypic variability of some non-syndromic and especially syndromic diseases. For example, the clinical manifestations of LCA/RP and related retinal diseases can vary and overlap at both early and late stages, which sometimes makes it difficult to distinguish among retinal dystrophies (Neveling et al., 2013) [38]. As a result, patients diagnosed with RP/LCA may actually carry mutations in non-canonical LCA/RP genes. Hence, clinical diagnosis should be refined by molecular diagnosis, and screening a larger set of genes related to ophthalmologic genetic diseases is essential for achieving a more accurate clinical diagnosis in these patients.
In addition, our method can detect large deletions. A homozygous deletion of exons 17-26 in CACNA2D4 and a heterozygous deletion of the whole CRX gene in two families (P041 and P048, respectively) were identified both by the Casey Eye Institute and by our method (Fig 3). RP025, one of the 99 RP patients, was also found to carry a large heterozygous deletion in TULP1 (Fig 5). Previously, two different methods were needed to detect CNVs on the one hand and SNVs and small InDels on the other; our pipeline detects all three kinds of variants in a single test. The CNV detection algorithm is based on sequencing depth and a z-score model [15]. This pipeline can raise the molecular diagnosis rate and reduce cost. In our opinion, the field is moving toward detecting more genes and more kinds of variants in a single test.
A molecular diagnosis rate of 63.5% was achieved for 96 Chinese RP patients using our method, while several recent studies using NGS for retinal diseases achieved molecular diagnosis rates ranging from 25% to 57% [38,39]. Our panel is flexible in identifying mutations in multiple pathogenic genes or associated with heterogeneous disorders. It reduces the dependence on specialized knowledge and skills in clinical diagnosis and can even provide evidence to modify a clinical diagnosis. Among the 99 patients, the molecular diagnosis of three was inconsistent with the initial clinical diagnosis. We revisited two of them: the clinical diagnosis of patient RP095 was reclassified from RP to fundus albipunctatus, while patient RP023 still presented an RP phenotype rather than LCA. The discrepancy in patient RP023 may be explained by the diversity of genotype-phenotype correlations, because many previously unsolved cases have been reported to carry mutations in genes associated with other retinal diseases, not necessarily RP [40]. This explanation may also apply to patient RP001, who carried a novel frameshift mutation, c.1666delA, and a novel splice-site mutation, c.5226+5_8delGTAA, in CEP290, a frequent cause of LCA [41,42]. This 26-year-old patient had exhibited "RP-like" phenotypes for 14 years, including night blindness, vision impairment and visual field constriction, without defects in other organs; these are not the most typical symptoms of LCA.
There are several possible reasons for the cases that remained unsolved: (a) a few exons were poorly captured because of the difficulty of designing baits in repeat regions or poor capture efficiency in GC-rich regions.
Analyzing the coverage of the 283 genes across all samples, 97.60% of genes had ≥1× coverage over at least 90% of their coding bases; 97 genes did not reach 100%, 49 genes did not reach 99%, and 4 genes had less than 80% coverage. (b) Although CNVs can be identified, deep intronic mutations and structural genomic variants are not detected. (c) Finally, some unsolved cases may be caused by as-yet unknown disease-causing genes, and others by our present limited understanding of the plethora of variants detected by NGS. Consequently, some variants could be overlooked because they are assumed to be non-pathogenic, while others may be predicted to be pathogenic when in fact they are not.
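Per-gene breadth of coverage of the kind summarized in the coverage analysis can be computed from per-base depth along each gene's coding bases. The gene names and depth arrays below are hypothetical, and how depths are aggregated across samples (pooled, averaged or per sample) is not stated in the text, so the sketch works on a single depth track per gene.

```python
import numpy as np
from typing import Dict

def gene_breadth(per_base_depth: Dict[str, np.ndarray], min_depth: int = 1) -> Dict[str, float]:
    """Fraction of each gene's coding bases covered by at least min_depth reads."""
    return {gene: float(np.mean(depth >= min_depth)) for gene, depth in per_base_depth.items()}

def summarize(breadth: Dict[str, float]) -> Dict[str, int]:
    """Count genes in the coverage bands used in the text."""
    values = list(breadth.values())
    return {
        "at_least_90pct": sum(b >= 0.90 for b in values),
        "below_100pct": sum(b < 1.00 for b in values),
        "below_99pct": sum(b < 0.99 for b in values),
        "below_80pct": sum(b < 0.80 for b in values),
    }

# Hypothetical genes with 100 coding bases each.
depths = {
    "GENE_A": np.full(100, 250),
    "GENE_B": np.concatenate([np.zeros(5, dtype=int), np.full(95, 300)]),
    "GENE_C": np.concatenate([np.zeros(30, dtype=int), np.full(70, 120)]),
}
breadth = gene_breadth(depths)
print(breadth)
print(summarize(breadth))
```

Listing the genes below each band is a direct way to decide which exons need complementary Sanger sequencing.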
The tremendous genetic and phenotypic heterogeneity of retinal diseases poses a major challenge for establishing a molecular diagnosis [43]. In the post-genomic era, NGS has revolutionized biological research and discovery, and targeted gene capture is being used as a cost-effective alternative to whole-genome sequencing (WGS) for investigating regions of interest when prior knowledge of potentially causal loci is available [44].
In conclusion, we systematically evaluated our targeted gene capture sequencing panel and compared it with Sanger sequencing. Our method showed high performance, and we identified pathogenic mutations in 64.6% of 99 unselected RP patients. Altogether, 72 novel mutations were found. The results show that our method is sufficiently accurate for molecular diagnosis and highlight the importance of molecular diagnosis for clinical diagnosis. Comprehensive genetic screening for eye diseases would allow geneticists and clinicians to improve diagnosis and perform treatment trials using updated molecular diagnostic technologies [7]. Genetic screening is likely to become an integral part of the care of patients with hereditary eye disease, and we expect the strategy used here to become a commonly used tool for genetically heterogeneous eye disorders in the next few years.

Conclusions
In summary, our study confirms the diagnostic value of NGS platforms in identifying mutations in heterogeneous diseases such as retinal disease. The ability of whole-exome sequencing (WES) to discover novel genes, together with its reliable variant calling in coding regions and competitive price, makes it the technique of choice for mutation screening in heterogeneous diseases. The aim of this study was to evaluate whether the targeted gene capture sequencing panel is appropriate for the molecular diagnosis of genetic eye diseases. We systematically evaluated our method, compared it with Sanger sequencing, and identified a large number of novel mutations in a cohort of 99 RP patients. The experiments also revealed several advantages. First, our method has slightly higher clinical sensitivity than Sanger sequencing. Second, the 64.6% molecular diagnosis rate suggests that our method is appropriate for molecular diagnosis and very helpful for confirming clinical diagnoses. Third, our method can detect SNVs, small InDels and CNVs in a single test, which helps to lower cost and shorten waiting time.
These results indicate that our method is sufficiently accurate for molecular diagnosis and underline the importance of molecular diagnosis in clinical diagnosis.
Supporting information
S1 File. Gene lists of the 283 captured genes and the 58 known RP disease-causing genes. (RAR)
S2 File. This file contains the following sub-files: Figures A-C, Tables A-E and the references for the detected mutations in Tables A-E. Figure A shows the overall coverage of the genes in the panel. Figures B and C show the genes that do not reach 100% and 99% coverage, respectively. Table A shows the numbers of variants detected by NGS in the 68 samples previously tested by Sanger sequencing. Table B shows the results for all 68 samples previously screened by Sanger sequencing. Table C shows the z-score results for CNV detection in families P041 and P048.