Combined Genetic and High-Throughput Strategies for Molecular Diagnosis of Inherited Retinal Dystrophies

Most diagnostic laboratories are confronted with an increasing demand for molecular diagnosis from patients and families and with the ever-increasing genetic heterogeneity of visual disorders. For Retinal Dystrophies (RD), almost 200 causative genes have been reported to date, and most families carry private mutations. We aimed to approach RD genetic diagnosis by using all the available genetic information to prioritize candidates for mutational screening, and thus to restrict the number of cases to be analyzed by massive sequencing. We constructed and optimized a comprehensive cosegregation RD-chip based on SNP genotyping and haplotype analysis. The RD-chip genotypes 768 selected SNPs (closely linked to 100 RD causative genes) in a single cost- and time-effective step. Full diagnosis was attained in 17/36 Spanish pedigrees, yielding 12 new and 12 previously reported mutations in 9 RD genes. The most frequently mutated genes were USH2A and CRB1. Notably, RD3, up to now associated only with Leber Congenital Amaurosis, was identified as causative of Retinitis Pigmentosa. The main assets of the RD-chip are: i) the robustness of the genetic information, which highlights the most probable candidates, ii) the invaluable clues in cases of shared haplotypes, which are indicative of a common founder effect, and iii) the detection of extended haplotypes over closely mapping genes, which substantiates cosegregation, although the assumptions on which the genetic analysis is based can exceptionally lead astray. The combination of the genetic approach with whole exome sequencing (WES) greatly increases the diagnostic efficiency, and revealed novel mutations in USH2A and GUCY2D. Overall, the RD-chip diagnostic efficiency ranges from 16% in dominant to 80% in consanguineous recessive pedigrees, with an average of 47%, well within the upper range of massive sequencing approaches, highlighting the validity of this time- and cost-effective approach while high-throughput methodologies become amenable for routine diagnosis in medium-sized labs.


Introduction
Retinal dystrophies (RD) are a group of more than 25 genetic visual disorders [1]. Although RDs rank among mendelian rare diseases, taken together they occur at an estimated prevalence of 1-2 patients per 1000 individuals. In fact, the most frequent form of RD, retinitis pigmentosa (RP), affects 1.5 million individuals worldwide [2]. The clinical traits underlying these disorders affect regions ranging from the macula (central vision) to the outlying retinal area (peripheral vision). In addition, at least 30 different syndromes (such as Usher and Bardet-Biedl) share some of these phenotypic alterations [3,4]. On the genetic side, more than 5000 mutations in almost 200 genes have been reported as causative of retinal dystrophies so far [1,5,6]. Yet, around 35% of the cases remain unassigned [7]. The extreme heterogeneity of RDs at the clinical and genetic levels hinders accurate clinical assessment, patient management, and genetic counseling. Within this context, molecular diagnosis, however challenging, is instrumental to improve the diagnosis and prognosis of RDs and to guide future therapies [7,8,9].
Currently, the most demanding issue in RD molecular diagnosis is the prioritization of methodological strategies, where the main parameters to be balanced are cost, time and yield. These parameters strongly depend on phenotypic clinical assessment, pedigree information, sample availability and methodological resources. Most genetic laboratories resort to direct mutational screening when the clinical traits and/or the genetic information associated with the disease limit the number of candidates to be analyzed. Unfortunately, this is seldom the case for RDs, where this type of analysis would imply screening more than 1500 exons. The search for an alternative cost-effective approach mainly relies on high-throughput platforms, in particular massive sequencing, which require powerful and sophisticated bioinformatics tools for analyzing and filtering the data [10,11].
To improve diagnosis, we have focused on a comprehensive strategy based on the clinical phenotype and all available genetic data before either analyzing a reduced, manageable number of candidate genes or resorting to massive sequencing. We have generated and optimized a SNP-based chip for haplotype cosegregation analysis [12,13] to genotype 7-10 SNP markers for each of one hundred genes associated with the most prevalent RDs: Cone Dystrophies (CD), Cone-Rod Dystrophies (CRD), Congenital Stationary Night Blindness (CSNB), Leber Congenital Amaurosis (LCA), Macular Degeneration (MD) and RP. Based on this methodology, a multi-tiered approach has been devised to cost-effectively diagnose [14] a panel of 36 Spanish families with nonsyndromic retinal dystrophies plus 5 patients with Usher's syndrome. As a result, we have identified the pathogenic mutation in 17 of the 36 families and in 3 of the 5 isolated Usher patients, overall reporting 14 novel mutations. After the RD-chip analysis discarded all known RD genes, Whole Exome Sequencing (WES) was undertaken in two pedigrees. The pathogenic mutations were unexpectedly identified in two RD candidates that had previously been discarded as non-cosegregating, on the basis of homozygosity by descent in a consanguineous family and of infrequent recombination of closely mapping SNPs, respectively.

Patients
Thirty-six Spanish families diagnosed with RP, LCA, CRD or CD plus 5 isolated Usher's syndrome patients were recruited for this study. Written informed consent was obtained from the patients and relatives following the tenets of the Declaration of Helsinki. Patient recruitment and sample collection procedures had been previously approved by the Bioethics Committee of the University of Barcelona (Barcelona, Spain). Peripheral blood DNA was obtained using the MoleStrips DNA Blood kit with the GeneMole instrument (Mole Genetics, Lysaker, Norway). DNA from Spanish control individuals was obtained from peripheral blood using the same methodology.

SNP Selection
The RD chip for the molecular diagnosis of Mendelian nonsyndromic retinal dystrophies was an optimized version of a previous cosegregation chip for RP-LCA disorders [12,13]. Seven to ten SNPs were selected for each candidate (100 genes in total), and genotyped on a customized Golden Gate Genotyping Assay (Illumina). The SNPs were selected according to the following criteria: i) high informativity according to SNPbrowser Software Version 4.0.1 and the dbSNP database (www.nlm.nih.gov/projects/SNP/); ii) physical location (covering upstream, intragenic and downstream regions); iii) inclusion in different haplotypic blocks. The genes analyzed by this RD chip are listed by chromosome position in Figure 1. In addition, some common mutations in ABCA4, CERKL, COL8A2, CRB1, LRP5, NR2E3, PRPF31, RHO and USH2A were included for direct genotyping.

High-throughput SNP Genotyping
One microgram of sample DNA (at 20 ng/µl) was loaded into 96-well plates. SNPs were genotyped using the Golden Gate Array (Illumina) platform following the instructions, protocol and software provided by the manufacturer. Haplotype and cosegregation analyses were performed by hand from the raw genotype data.

Mutational Screening
In Usher syndrome samples, where a major causative gene is involved, the Arrayed Primer Extension (APEX) approach plus direct gene sequencing was the molecular diagnosis procedure of choice. In all other cases, cosegregation analysis with the RD-chip highlighted the best candidates for mutational screening. All the exons and exon-intron boundaries of selected genes were directly screened for mutations in each patient. Genomic DNA was amplified, purified on High Pure 96 UF Cleaning Plates (Roche) and sequenced using the BigDye v3.1 kit (Applied Biosystems, Inc.) in the ABI PRISM 3730 DNA sequencer (Applied Biosystems, Inc.).
All missense changes identified were checked against control populations using the dbSNP database (Build 137, www.ncbi.nlm.nih.gov/projects/SNP/), the 1000 Genomes Project data (http://browser.1000genomes.org/index.html), and the ESP6500 data of the National Heart, Lung, and Blood Institute GO Exome Sequencing Project (http://evs.gs.washington.edu/EVS). To validate unreported missense genetic variants, over one hundred matched controls were analyzed to discard rare non-pathogenic polymorphisms restricted to the Spanish population.

Bioinformatic Analyses
All the sequences were analyzed using the sequence assembly software Seqman (DNAStar, Madison, WI) and aligned to the reference gene sequence (Genome Reference Consortium human genome build 37, GRCh37/hg19).

Whole-Exome Sequencing
Genomic DNA from peripheral blood was prepared using the QIAamp DNA Blood Maxi Kit (Qiagen). Exome capture was performed at the CNAG using SureSelectXT All Exome v4 kit (Agilent), following the manufacturer's protocol. Libraries were sequenced on a HiSeq2000 (Illumina) to at least 80x average coverage of the target region.
Reads were aligned to the human reference genome build GRCh37 (hg19) using the Burrows-Wheeler Aligner (BWA) [21] in family E5 and GEMMapper [22] in family 9RE. Mapped reads were filtered (leaving only those mapping to unique genomic positions with sufficient quality), sorted and indexed with SAMtools (version 0.1.18) [23]. Identification of single nucleotide variants and indels was performed using GATK standard hard filtering parameters [24] (family E5) or SnpEff [25] (family 9RE). In-house Perl scripts were used to select the variants shared by all affected individuals and predicted to produce a high or moderate impact, including variants at intron-exon junctions, non-annotated variants (synonymous, non-synonymous, and nonsense mutations) in coding regions, and short coding insertions or deletions. Variants mapping to the candidate genes were selected for further validation. For the final WES report, the VARIANT [26] annotation tool provided the putative functional consequence of the identified variants, as well as additional relevant information for the final candidate gene selection.
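The variant selection step described above can be illustrated with a minimal Python sketch. It is not the in-house Perl code used in the study, which is not available here; the `Variant` record, the `select_candidates` function, the impact labels and the toy data are all hypothetical and serve only to show the logic of keeping variants shared by all affected individuals, with high or moderate predicted impact, optionally restricted to a list of candidate genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str            # gene symbol
    change: str          # free-text variant label
    impact: str          # assumed impact classes: HIGH, MODERATE, LOW, MODIFIER
    carriers: frozenset  # IDs of sequenced individuals carrying the variant


def select_candidates(variants, affected, candidate_genes=None,
                      impacts=("HIGH", "MODERATE")):
    """Keep variants carried by every affected individual, with a high or
    moderate predicted impact, optionally restricted to candidate genes."""
    affected = set(affected)
    selected = []
    for v in variants:
        if v.impact not in impacts:
            continue  # predicted impact too low
        if not affected.issubset(v.carriers):
            continue  # not shared by all affected individuals
        if candidate_genes is not None and v.gene not in candidate_genes:
            continue  # outside the candidate gene list
        selected.append(v)
    return selected


# Hypothetical toy data, not taken from the families in this study.
toy_variants = [
    Variant("GENE_A", "missense_1", "MODERATE", frozenset({"P1", "P2"})),
    Variant("GENE_A", "synonymous_1", "LOW", frozenset({"P1", "P2"})),
    Variant("GENE_B", "nonsense_1", "HIGH", frozenset({"P1"})),
    Variant("GENE_C", "splice_1", "HIGH", frozenset({"P1", "P2", "C1"})),
]
shared = select_candidates(toy_variants, affected={"P1", "P2"})
in_panel = select_candidates(toy_variants, {"P1", "P2"},
                             candidate_genes={"GENE_A"})
```

Note that requiring a variant to be shared by all affected individuals is a per-variant criterion: in a pedigree where affected relatives carry different pathogenic allele combinations in the same gene, only the allele common to all of them would pass this filter.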

RT-PCR Expression Analysis
Blood samples from patients, relatives and unrelated controls were mixed with an RNA stabilizer solution (RNALater; Ambion) in a 1:4 ratio. Total RNA was obtained from 3 ml of blood using the RiboPure-Blood Kit (Ambion, Austin, TX), and retrotranscribed using the Transcriptor High Fidelity cDNA Synthesis Kit (Roche Applied Science, Indianapolis, IN) with a mixture of random hexamers and oligo(dT)18, according to the manufacturer's instructions. RPGRIP1 and G3PDH (used as control) transcripts were amplified using specific exon primers and the GoTaq Flexi DNA polymerase (Promega, Fitchburg, WI) in a final volume of 50 µl. The G3PDH PCR conditions were: denaturation for 5 min at 94 °C followed by 35 cycles of 20 s at 94 °C, 30 s at 60 °C, and 1 min at 72 °C, using 2 µl of cDNA. For RPGRIP1, primers were located in exons 14 and 16, and the PCR conditions were: denaturation for 5 min at 94 °C followed by 38 cycles of 20 s at 94 °C, 30 s at 60 °C and 40 s at 72 °C, using 5 µl of cDNA. Amplified bands were excised, purified from the gel using the Expin GeneAll Gel SV kit (GeneAll) following the manufacturer's protocol, and sequenced.

Plasmid Constructions and Expression Assay
The reconstructed RPGRIP1 minigene encompassed exons 12 to 18, plus at least 200 bp of each intron-exon boundary, after amplification of genomic DNA from patient 1 of the 59RE pedigree (heterozygote for the c.2367+23del mutation). The minigene was cloned in-frame at the C-terminus of the HA epitope into the pcDNA3.1 expression vector. Wild-type and mutated clones were confirmed by sequencing.
HEK293 cells were seeded on 12-well plates (4 × 10^5 cells/well) and grown in DMEM (Invitrogen, Barcelona, Spain) supplemented with 10% fetal bovine serum. After 12 hours, cells were transiently transfected with either the pcDNA3.1-wild type (wt) RPGRIP1 minigene, the pcDNA3.1-mut RPGRIP1 minigene (containing the c.2367+23del mutation), or the empty vector (Clontech-BD), using Lipofectamine 2000 (Invitrogen). Forty-eight hours after transfection, cells were collected and lysed, and total mRNA was used for RT-PCR (same protocol as above). RPGRIP1 cDNA amplification was performed using primers in exons 15 and 18 as follows: denaturation for 5 min at 94 °C, followed by 35 cycles of 20 s at 94 °C, 30 s at 62 °C, and 1 min 30 s at 72 °C, using 1 µl of cDNA. G3PDH amplification was used as a control.

Classification of the Patients According to Phenotype and Family Information
Our initial cohort comprised 41 families, 36 affected with nonsyndromic RDs and 5 affected by Usher syndrome type II. We attained full diagnosis for 22 of them, whose pedigrees are displayed in Figure 2. Cosegregation with the novel mutated alleles identified is shown in Figure S1.

Analysis of Usher Syndrome Cases
Usher syndrome is characterized by specific phenotypic traits that allow a clear clinical classification into three main forms, USH II being the most frequent type (between half and two thirds of all cases) and USH2A the major causative gene (75-80% of USH II cases) [27]. Our cohort had 5 USH II cases as isolated patients or small pedigrees. When these were directly tested for described mutations in USH2A (Asper Biotech), only 4 out of 10 mutated alleles were identified, all in heterozygosis. As no complete molecular diagnosis was achieved, direct mutational screening of all USH2A exons was then undertaken, prioritizing the analysis of exons where reported mutations cluster. Overall, three of the five patients (pedigrees shown in Figure 2) were completely diagnosed, with 3 missense mutations identified: c.1751G>T in heterozygosis (p.C584F), c.10636G>A in homozygosis (p.G3546R), and c.12574C>T in heterozygosis (p.R4192C) (

Genotyping of RD Families
Contrary to Usher syndrome, most RDs show high clinical and genetic heterogeneity, which greatly hampers molecular diagnosis. Our approach was to use genetic information and cosegregation analysis to decrease the number of candidate genes for mutational screening. When pedigrees were available, the use of an automated and robust SNP-based genotyping microarray greatly diminished the number of candidates. To this aim, 36 families (with at least four available samples) affected with retinal dystrophies (Retinitis Pigmentosa, Leber Congenital Amaurosis, Cone-rod Dystrophy or Cone Dystrophy) were analyzed with our optimized in-house RD-chip, which genotyped 768 SNP markers spanning the 100 most prevalent RD genes reported at that moment (Figure 1). Six families showed autosomal dominant inheritance and a large number of affected individuals; twenty-nine were autosomal recessive pedigrees with a low number of affected members, and the remaining two were compatible with X-linked inheritance.
After RD-chip genotyping, haplotypes were constructed for each family to assess cosegregation under the presumed inheritance pattern. In pedigrees where 90 to 99% of candidates were discarded, direct mutational screening was performed in the remaining non-excluded genes. For each case, the candidates were prioritized according to previous gene assignment to: 1) the same clinical diagnosis and mendelian pattern; 2) a closely related retinal dystrophy with the same inheritance pattern; 3) the same clinical phenotype irrespective of the inheritance pattern, and finally 4) the remaining RD candidates.
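The four-tier prioritization of non-excluded candidates can be sketched in Python as follows. This is only an illustration of the ordering rule under stated assumptions: the `GeneRecord` type, the function names, and the placeholder genes `GENE_X` and `GENE_Y` are hypothetical, and the gene-disease annotations are supplied by hand rather than drawn from any curated database. In this study the prioritization was done manually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    symbol: str
    associations: frozenset  # (diagnosis, inheritance) pairs reported for the gene


def priority_tier(gene, diagnosis, inheritance, related=frozenset()):
    """Return 1-4 following the prioritization order described in the text."""
    if (diagnosis, inheritance) in gene.associations:
        return 1  # same clinical diagnosis and mendelian pattern
    if any(d in related and i == inheritance for d, i in gene.associations):
        return 2  # closely related dystrophy, same inheritance pattern
    if any(d == diagnosis for d, _ in gene.associations):
        return 3  # same phenotype irrespective of inheritance
    return 4      # remaining RD candidates


def prioritize(genes, diagnosis, inheritance, related=frozenset()):
    """Sort candidates by tier, then alphabetically within a tier."""
    return sorted(
        genes,
        key=lambda g: (priority_tier(g, diagnosis, inheritance, related), g.symbol),
    )


# Illustrative annotations only; GENE_X and GENE_Y are placeholders.
candidates = [
    GeneRecord("GENE_X", frozenset({("CD", "ar")})),
    GeneRecord("MERTK", frozenset({("RP", "ar")})),
    GeneRecord("TOPORS", frozenset({("RP", "ad")})),
    GeneRecord("GENE_Y", frozenset({("CRD", "ad")})),
    GeneRecord("SNRNP200", frozenset({("RP", "ad")})),
    GeneRecord("SEMA4A", frozenset({("RP", "ad")})),
]
ranked = prioritize(candidates, "RP", "ad", related=frozenset({"CRD"}))
print([g.symbol for g in ranked])
```

Ties within a tier are broken alphabetically here for determinism; in practice additional evidence, such as an extended or shared haplotype, would decide the order within a tier.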

Inferred Haplotypes and Subsequent Mutation Screening of the Prioritized Candidates in X-linked and Dominant Pedigrees
The genotyping results for each family were first analyzed under the most probable mendelian pattern of inheritance to exclude non-cosegregating genes and prioritize the remaining candidates. The final results are presented by family and summarized in Table 1 B.
Seven pedigrees were compatible with an autosomal dominant (AD) pattern, but one (11NCE) could also be explained by X-linked inheritance, as all affected women showed a less severe phenotype. In this case, the haplotypes were first analyzed under an X-linked hypothesis (see below). Of the six AD families, only 2 (6ORG and 2NCE) were amenable to mutational screening, while in the rest more than 10 candidates remained. Indeed, in dominant diseases a large number of samples is required to attain genetic informativity, which seldom occurs. In family 6ORG, the RD-chip highlighted seven candidates, three of them responsible for adRP: SEMA4A, SNRNP200 and TOPORS. We prioritized the analysis of SNRNP200, as the cosegregating haplotype extended to three neighbouring RD genes (CNNM4, CNGA3, MERTK), overall covering more than 16 Mb (Figure 1). Direct exon sequencing revealed a novel missense mutation in SNRNP200, c.2042G>T (p.R681L) (Table 1 B), in a codon also mutated in other adRP cases (c.2041C>T, p.R681C and c.2042G>A, p.R681H) [28]. Bioinformatics analysis showed that this residue was highly conserved through evolution (data not shown) and predicted a damaging effect (Table 2). Moreover, none of the healthy siblings carried the variant, supporting its pathogenicity. In family 2NCE, ten candidates remained after the RD-chip analysis. Extended haplotypes with neighbouring RD genes decreased the number of candidates to five. Unfortunately, no mutation was found in any candidate (the pedigree is not included in Figure 2).
X-linked inheritance was assumed for 20NCE and 11NCE, the latter being also compatible with AD inheritance. SNP genotyping revealed a common deleted region comprising the full RP2 locus in all male patients of 20NCE, clearly highlighting RP2 as the disease-causing gene. Indeed, mutational screening confirmed the deletion of the whole coding region.
Concerning pedigree 11NCE, the milder phenotype of the affected women, suggestive of an X-linked trait, together with the cosegregation of the RP2 haplotype, pinpointed this candidate for mutational screening. Exon sequencing identified a previously reported mutation, c.409_411del causing p.I137del, in all the patients of the family (Table 1 B) [29].

Recessive Consanguineous Families
In five of ten known consanguineous families (10RE, 12ORG, 23NCE, T5, and 25NCE), CRB1 was the candidate of choice. Prioritization was established based on either an extended haplotype comprising the adjacent CFH locus in four pedigrees, or a shared haplotype with a previously diagnosed family, suggesting a founder effect.
Patient II.6 in family 10RE (Figure 2) carried a homozygous unreported deletion in the intron 9 splice donor site of CRB1, c.3749+2_3749+3del, which ablated the splice signal. This outcome was confirmed by in silico predictions (Table 3). His nephew was a double heterozygote for this mutation plus a frequent pathogenic variant, c.2843G>A p.C948Y [30].
In family 12ORG, all but four RD genes were discarded; of those, CRB1 showed an extended haplotype. Direct sequencing revealed a novel missense mutation in homozygosity, c.1702C>T p.H568Y, whose pathogenicity was supported by in silico analyses (Table 2). On the other hand, families 23NCE and T5 showed homozygosity for the known missense mutations c.2290C>T p.R764C and c.2843G>A p.C948Y, respectively [30]. Of note, the haplotype of the affected members of family T5 was also shared by the patients of another family (25NCE). Subsequent sequencing analysis of CRB1 confirmed the same causative mutation, supporting common ancestry.
Family 83RE showed an extended haplotype spanning USH2A and RD3. Given that the clinical diagnosis of the family was RP, USH2A was prioritized for direct mutational screening but was excluded after sequencing 72 exons. Screening of the RD3 candidate, previously reported only as an LCA-causative gene, revealed a new missense mutation, c.259A>G p.K87E, in homozygosity. In silico predictions supported its pathogenicity (Table 2). Notably, this variant had been identified in 4 out of 13,002 control chromosomes (NHLBI Exome Sequencing Project).
The consanguineous family 75RE was first analyzed assuming homozygosity by descent, and under this assumption all candidates were discarded. However, when non-consanguinity was assumed, an extended haplotype spanning candidates USH2A and RD3 emerged. Subsequent mutational screening of USH2A identified two disease-causing mutations in the patients: the reported nonsense c.2209C>T p.R737X and the novel missense c.8693A>C p.Y2898S. The pathogenicity of the latter was fully supported by in silico predictions (Table 2).
The non-discarded genes of family 17NCE were prioritized according to: 1) cosegregation and phenotype, pointing to LRAT, or 2) shared haplotype with a previously diagnosed Spanish family in MERTK [31]. LRAT was discarded after sequencing of all exons, whereas direct screening of MERTK exon 16 identified the expected mutation c.2189+1G>T in homozygosis, again supporting a founder effect.
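The effect of the homozygosity-by-descent assumption on candidate exclusion, as seen in family 75RE, can be made explicit with a small Python sketch. It is a simplified model restricted to a single sibship, with phased haplotypes at one locus written as pairs of arbitrary labels; the function name and the toy genotypes are hypothetical, and the actual analysis in this study was performed by hand.

```python
def cosegregates_recessive(affected, unaffected, assume_hbd):
    """Test one locus for recessive cosegregation within a sibship.

    affected / unaffected: lists of (haplotype_1, haplotype_2) label pairs.
    assume_hbd: if True, affected sibs must be homozygous for one haplotype.
    """
    # All affected sibs must carry the same unordered pair of haplotypes.
    pairs = {frozenset(genotype) for genotype in affected}
    if len(pairs) != 1:
        return False
    shared = next(iter(pairs))
    # Under homozygosity by descent the shared pair must be a single haplotype.
    if assume_hbd and len(shared) != 1:
        return False
    # No unaffected sib may carry the same haplotype combination.
    return all(frozenset(genotype) != shared for genotype in unaffected)


# Toy sibship: both patients inherited haplotypes A and B (compound heterozygotes).
patients = [("A", "B"), ("B", "A")]
healthy = [("A", "C"), ("C", "D")]

excluded_under_hbd = not cosegregates_recessive(patients, healthy, assume_hbd=True)
retained_without_hbd = cosegregates_recessive(patients, healthy, assume_hbd=False)
print(excluded_under_hbd, retained_without_hbd)
```

In this toy example the locus is discarded when homozygosity by descent is required and retained when it is not, which is the situation in which a compound heterozygous genotype in a consanguineous family escapes detection.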
Two remaining families were not further considered due to the lack of genetic informativity.

Recessive Non-consanguineous Families
After cosegregation analysis, three candidate genes remained in the 22NCE family (PROM1, RP1 and TEAD1). Although consanguinity had not been reported, the mutational screening of PROM1 revealed a novel homozygous mutation, c.1984-1G>T, which ablates the consensus acceptor splice site of intron 17. Its pathogenicity was fully confirmed by in silico predictions (Table 3).
In family 59RE, seven candidate genes cosegregated, although the clinical phenotype of the patients pointed to RPGRIP1 and TULP1 as the best candidates. TULP1 did not bear any mutation, whereas two previously unreported variants in RPGRIP1 were identified. The variant c.895_896del, p.E299Sfs*21, was clearly pathogenic and produced a truncated protein. The other variant, c.2367+23del, was intronic, and its possible pathogenic effect was unknown. In silico predictions for splice sites, splice enhancers and silencers did not reveal any strong molecular alteration. Besides, in vivo analysis of the patient's mRNAs was hampered by the low expression levels of RPGRIP1 in blood. Finally, the transfection in cultured cells of minigenes bearing either the WT or the variant sequence did not conclusively support its pathogenicity. As the analysis of 434 control chromosomes identified this variant once, the c.2367+23del variant could presumably be classified as a rare indel, and its pathogenicity remains to be proved.
Three families compatible with a recessive RP inheritance (5NCE, 18NCE and 21NCE) showed cosegregation with 5-12 candidates, USH2A among them. Given that direct genotyping of mutations in the RD-chip had already detected a frequent USH2A pathogenic allele, c.2276G>T p.C759F, in heterozygosis [32], direct sequencing of the full coding sequence was undertaken. Data revealed one novel pathogenic allele in family 5NCE, c.652-2A>C, which ablates the acceptor splice site, and two reported mutations, c.9799T>C p.C3267R (18NCE) and c.1434G>C p.E478D (21NCE), although the pathogenicity of this last variant is still controversial [32,33,34].
Family 93RE, whose clinical diagnosis was compatible with either CRD or recessive Stargardt disease, showed cosegregation with GNAT2, ZNF513, OPA1, RP1L1 and ABCA4. Based on the type of inheritance and phenotype, we prioritized the analysis of GNAT2, ZNF513 and ABCA4. Two unreported mutations in ABCA4 were identified, the nonsense c.3988G>T p.E1330X and the missense c.6410G>A p.C2137Y variants. In silico predictions by PolyPhen2 and SIFT supported the pathogenicity of the latter (Table 2).

WES Families
After exclusion with the chip of the one hundred RD candidates, WES was undertaken in suitable remaining families. Concerning family E5, most recessive RD genes were discarded under the assumption of claimed consanguinity (I.1 and I.2), and the remaining candidates were excluded by Sanger sequencing. WES was then undertaken for patients II.4 and III.2 (Figure 2). Unexpectedly, one novel (c.7364G>A) and one recently reported mutation (c.4325T>C) [35] in USH2A were identified in patient II.4, whereas his affected son (patient III.2) carried the c.4325T>C mutation from his father plus the reported pathogenic c.2167+5G>A allele inherited from his mother. Haplotype analysis of the pedigree confirmed cosegregation of these pathogenic variants (Figure 3). In this family, the exceptional non-compliance with the homozygosity-by-descent assumption had excluded USH2A as the causative gene in the RD chip analysis.
The 9RE pedigree (Figure 2) showed an autosomal dominant cone dystrophy. In this large cohort, the high genetic informativity allowed us to exclude most candidates after the RD cosegregation chip, and Sanger sequencing ruled out the few remaining genes. Exome sequencing was then undertaken and identified a new mutation in GUCY2D, c.2747T>C. This candidate had not been previously considered because the SNP haplotype analysis showed recombination within the 3' flanking region in one affected member (IV.5). Although the selected SNPs are closely linked to the genes to minimize it, recombination, however infrequent, may occur and mislead the analysis.
Table 4 summarizes the percentage of complete molecular diagnosis attained with the RD chip, categorized by inheritance pattern and the number of samples available per family. Six autosomal dominant families with at least 4 available samples (and a minimum of three patients) were analyzed, yet only one family, 6ORG (7 samples with three affected members), was successfully diagnosed. Indeed, when dealing with dominant diseases, the main drawback for successful diagnosis is the large number of samples required to attain discriminative genetic informativity. In contrast, in recessive families the percentage of success doubles to 33% (6 out of 18) in non-consanguineous pedigrees, and rises dramatically to 80% (8 out of 10) in consanguineous pedigrees. We conclude that the higher informativity of recessive pedigrees, particularly under the homozygosity-by-descent assumption in consanguinity, increases the efficiency of the RD chip and highlights its reliability for RD molecular diagnosis. Finally, two X-linked cohorts were successfully diagnosed.
The RD chip included markers for cosegregation analysis of the two known X-linked causative loci (RP2 and RPGR), given that X-linked inheritance can only be discarded in pedigrees with male-to-male transmission [36], and as a means to avoid the burdensome task of sequencing the hot-spot ORF15 of RPGR whenever cosegregation analysis does not support it.
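The per-category success rates quoted above, and the overall figure of 17 out of 36 families, can be recomputed with a few lines of Python. The counts are those stated in the text (Table 4 itself is not reproduced here); the variable names are arbitrary. Note that 1 out of 6 is about 16.7%, which the text reports as 16%.

```python
# (diagnosed, analyzed) families per inheritance category, as stated in the text
rd_chip_results = {
    "autosomal dominant": (1, 6),
    "recessive, non-consanguineous": (6, 18),
    "recessive, consanguineous": (8, 10),
    "X-linked": (2, 2),
}


def percent(solved, total):
    """Diagnostic yield as a percentage."""
    return 100.0 * solved / total


per_category = {name: percent(*counts) for name, counts in rd_chip_results.items()}
total_solved = sum(solved for solved, _ in rd_chip_results.values())
total_families = sum(total for _, total in rd_chip_results.values())
overall = percent(total_solved, total_families)

for name, value in per_category.items():
    print(f"{name}: {value:.1f}%")
print(f"overall: {total_solved}/{total_families} = {overall:.1f}%")
```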

Discussion
To meet the increasing demand for genetic diagnosis from clinicians and patients, we have devised and optimized an RD chip that allows us to extract maximum genetic informativity from pedigrees, exclude a large number of non-cosegregating candidates and focus on the most probable causative genes. Moreover, the exclusion of all analyzed candidates by the RD chip highlights the families suitable for next generation sequencing (NGS) and subsequent identification of novel RD genes.
To improve the efficiency and the scope of the RD-chip with respect to previous attempts [12,37] we have: i) increased the number of SNPs (7-10) per gene to maximize the genetic informativity, ii) extended the cosegregation study to 100 retinal dystrophy genes, and iii) included some point mutations prevalent in the Spanish population for direct genotyping. The RD-chip genotypes 768 selected SNPs in a single cost- and time-effective step and is designed for use on families, not on simplex cases.
With this optimized RD-chip version, 17 out of 36 Spanish pedigrees have been fully diagnosed. Twelve new and 12 previously reported pathological variants have been identified in 9 RD genes, adding to the high genetic diversity in retinal disorders. The fact that as many as half of the mutations identified are new underscores the efficiency of our RD chip compared to direct mutational screening, particularly in non-homogeneous genetic populations. Concerning the major candidates in our cohort, CRB1 and USH2A explain 10/19 families overall, in accordance with other reports [38,39] and consistent with their contribution to several clinical entities. CRB1 is responsible for RP and LCA (a more severe form of RP), and more than 150 mutations have been described so far (http://www.hgmd.org). In our panel, the LCA pedigree (10RE) carried a novel splicing mutation, whereas the RP pedigrees (12ORG, 23NCE, 25NCE, T5) were all associated with missense variants (Table 1). Our results agree with the CRB1-assigned phenotype-genotype correlations, where null alleles are mainly associated with the LCA phenotype [39,40]. Also in agreement with previous reports, p.C948Y is the most prevalent CRB1 mutation in our family panel (3 of the 5 CRB1 families) [39].
Five non-syndromic RP families presented 4 new (2 missense, one nonsense and one splicing mutation) and 7 reported (5 missense, one nonsense and one splicing mutation) pathogenic alleles in USH2A. In the Usher cohorts, analysis of USH2A rendered 2 new missense and 3 known (one frameshift and 2 missense) mutations. In two families, only one of the pathogenic alleles was identified. In fact, some reports suggest that as many as 35% of the second USH2A mutant alleles are duplications, deletions and deep intronic variants, which are extremely difficult to detect by DNA sequencing [27]. On the other hand, no clear phenotype-genotype correlation could be established between RP and Usher cases. The most prevalent USH2A mutation in nonsyndromic RP, p.C759F [32,41], was also present in three families of our cohort. Notably, double heterozygosis in USH2A was unexpectedly found in two consanguineous families. In fact, in pedigree E5, three pathogenic alleles were identified in two generations. In this particular case, the assumption of homozygosity by descent led us to wrongly assume non-cosegregation with all RD genes and to undertake WES analysis, which eventually identified an unexpected double heterozygous genotype. A seemingly higher number of mutation carriers has also been reported for other syndromic RPs, such as Bardet-Biedl, with no solid explanation for this finding [42].
The inclusion of X-linked markers in the RD-chip has proved to be extremely useful to diagnose families compatible with both autosomal and X-linked inheritance patterns. In fact, in pedigree 11NCE, the milder phenotype of the female patients was already suggestive of a pseudo-dominant effect, as was indeed confirmed (RP2 was the causative gene). On the other hand, family 20NCE, with an unassigned mendelian pattern, cosegregated with X-linked markers, which prompted us to focus on the X-linked candidates. A deletion comprising the RP2 locus was identified (Table 1).
Remarkably, the clinical heterogeneity of retinal disorders was highlighted by the identification of RD3, up to now associated only with LCA, as causative of RP (Table 1), expanding the range of phenotypes associated with mutations in this gene. This case would have remained unassigned by conventional methods had it not been for the comprehensive analysis of our RD chip, whose main asset is the robustness of the genetic information to highlight the most probable candidates, avoiding the still burdensome task of big data analysis. In particular, the identification of shared haplotypes (which indicate a common founder effect) or of extended haplotypes over closely mapping genes (which strengthen cosegregation) provides invaluable clues to directly pinpoint the causative mutation, unveil unexpected candidates, and/or prompt re-evaluation of clinical features.
Recently, a variety of NGS-based procedures have been developed for molecular diagnosis of RDs, from targeted long-range PCR coupled to NGS [43,44] and targeted capture and sequencing of one or several RD genes [45,46,47] to full WES analysis [11,48,49]. Although powerful tools, they still yield limited complete diagnostic success: from 37%-52% in non-related cohorts to 80% in cohorts with high consanguinity levels [48] (Table 5). Aside from high costs, the limitations of NGS-targeted approaches are due to the high genetic heterogeneity of retinal disorders, whereas the main drawbacks of WES are the high sequence coverage requirement and the functional interpretation of the identified variants. Within this context, the RD-chip efficiency ranges from 16% in dominant to 80% in consanguineous recessive pedigrees, with an average of 47%, well within the upper range of the NGS approaches. Besides, the design of this chip is extremely flexible, which allows new SNPs to be incorporated to expand the gene repertoire after new discoveries or upon demand. Before NGS paves the future of personalized diagnosis, our cost- and time-effective strategy allows a quick and reliable prioritization of candidates, which is suitable and affordable for middle-size diagnostic labs with a moderate to high number of family cases.
Figure S1 Cosegregation analysis of the novel mutations identified. M: mutation (TIF)