Investigating the Molecular Basis of Retinal Degeneration in a Familial Cohort of Pakistani Decent by Exome Sequencing

Purpose To define the molecular basis of retinal degeneration in consanguineous Pakistani pedigrees with early onset retinal degeneration. Methods A cohort of 277 individuals representing 26 pedigrees from the Punjab province of Pakistan was analyzed. Exomes were captured with commercial kits and sequenced on an Illumina HiSeq 2500. Candidate variants were identified using standard tools and analyzed using exomeSuite to detect all potentially pathogenic changes in genes implicated in retinal degeneration. Segregation analysis was performed by dideoxy sequencing and novel variants were additionally investigated for their presence in ethnicity-matched controls. Results We identified a total of nine causal mutations, including six novel variants in RPE65, LCA5, USH2A, CNGB1, FAM161A, CERKL and GUCY2D as the underlying cause of inherited retinal degenerations in 13 of 26 pedigrees. In addition to the causal variants, a total of 200 variants each observed in five or more unrelated pedigrees investigated in this study that were absent from the dbSNP, HapMap, 1000 Genomes, NHLBI ESP6500, and ExAC databases were identified, suggesting that they are common in, and unique to the Pakistani population. Conclusions We identified causal mutations associated with retinal degeneration in nearly half of the pedigrees investigated in this study through next generation whole exome sequencing. All novel variants detected in this study through exome sequencing have been cataloged providing a reference database of variants common in, and unique to the Pakistani population.


Introduction
Inherited retinal degenerations (IRD) are among the best studied of genetic disorders. To date more than 250 genes and an additional 40 loci have been identified to be associated with retinal degenerations [1]. In some instances there is a one-to-one correlation of the gene and disease phenotype, while in other instances a broad genetic and phenotypic heterogeneity [2] has been observed. IRD phenotypes can be inherited in an autosomal dominant [3], autosomal recessive [4], X-linked [5], or mitochondrial traits [6]. Complicating this further, mutations in the same gene can cause either syndromic or non-syndromic retinal degeneration [7,8].
Targeted sequencing and/or microarray-based studies identified the genetic basis of about 50% of the total IRD cases [9]. The availability of next generation sequencing (NGS) methodologies improved the ability to carry out comprehensive analysis and utilization of these technologies have improved identifying the genetic basis to nearly 70% of cases of IRD in certain populations [10][11][12][13]. Inbred populations provide an opportunity to investigate recessive genetic disorders. Pakistani population with high consanguinity offers an opportunity to investigate the genetic basis of inherited disease including IRD. A single study involving children at the Ida Rieu School for the blind and deaf concluded that roughly a third of blindness in Pakistan has a hereditary etiology [14]. Involvement of mutations in known retinal disease genes in causing IRD in Pakistani population has been previously described [4,[15][16][17][18][19][20][21][22][23]. In this study, we examined 26 pedigrees from the Punjab province of Pakistan using exome sequencing to determine the genetic basis of early onset retinal degeneration and the contribution of known gene mutations. These analyses also enabled identification of single nucleotide variants that are common in, and unique to the Pakistani population. fundus photography, Electroretinography, measurement of visual acuity and visual fields. Funduscopy was performed at Layton Rehmatulla Benevolent Trust (LRBT) Hospital (Lahore, Pakistan). Electroretinography (ERG) measurements were recorded by using equipment manufactured by LKC (Gaithersburg, MD). Dark-adapted rod responses were determined through incident flash attenuated by -25dB, whereas rod-cone responses were measured at 0 dB. The 30 Hz flicker responses were recorded at 0 dB to a background illumination of 17 to 34 cd/m 2 . Blood samples were collected from all available members and genomic DNA was extracted as previously described [24,25].
A total of 26 consanguineous pedigrees were examined from the Punjab region of Pakistan. These pedigrees contain between 1-6 consanguineous marriages, and present 3-12 family members affected with early onset retinal degeneration. Of these, 5-16 members from each pedigree participated bringing the total to 277 individuals in 26 pedigrees (Figs 1 and 2).

Exome Capture, Sequence Alignment, Variant Calling & Analysis
Sixty three individuals representing 25 of the 26 pedigrees were sequenced with one of the following exome capture kits: 9 samples with Nimblegen V2, 18 with Nimblegen V3, 7 with Agilent V5, 25 with Agilent V5+UTRs (S1 Table), and sequenced on an Illumina HiSeq 2500 as described earlier [26]. Reads were mapped with Novoalign to hg19; any reads that were not uniquely mapped were excluded from further analysis. PCR duplicates were removed with SAMtools, and variants were called with GATK.
Variants were annotated, filtered and prioritized for further analysis using exomeSuite [27]. The variants were first filtered for a recessive inheritance pattern since it was presumed the disease causative variant would be found in the homozygous state due to the consanguineous nature of the pedigrees examined. Variants were then filtered to eliminate: (1) all variant(s) with a homozygous genotype frequency of greater than 0.5% in HapMap [28], 1000 Genome [29], and/or NHLBI ESP6500 [30] databases, (2) all variants outside of a transcribed region, (3) all variants not residing within the six bases upstream of the start codon, (4) all variants not located within a miRNA targeted region as identified from the TargetScan [31] database, (5) all SNVs producing a missense change that was predicted to be benign by either MutationTaster [32], Polyphen [33], PROVEAN and SIFT [34,35], (6) all intronic variants residing 15 or more bases from the nearest splice site, (7) all intronic variants residing within 15 bases of the nearest splice site but not predicted to alter splicing by NNSplice [36], and (8) all variants pertaining to a transcript whose biotype as defined by Ensemble is "nonsense mediated decay," "retained intron," "pseudogene" or whose transcript is 5' and/or 3' incomplete. We investigated variants within genes known to be associated with retinal degeneration and passed the above filtering criteria [1] (S2 Table).
Targeted mutation screening for RPE65 c.1087 C>A (NM_000329.2) variant was performed on Pedigree PKRD285 by Sanger dideoxy sequencing. Segregation of variant(s) with the disease phenotype in all pedigrees was verified by Sanger dideoxy sequencing using primers listed in S3 Table. Novel variants segregating with the disease phenotype in their respective pedigree were also screened for their presence in 96 ethnically matched unrelated controls.
All single nucleotide variants with a read depth >10, mapping quality score >30, and not exhibiting strand bias during sequencing were tabulated by pedigree (S4 Table). All variants present in >5 pedigrees and absent from the dbSNP, the 1000 genomes and the UCSC database were cataloged.

Results
In an ongoing effort to understand the molecular basis of inherited retinal degeneration, we investigated 25 large highly inbred familial cases to identify the causal variants through nextgeneration whole exome sequencing. Affected individuals in these familial cases experienced night blindness and decreased visual acuity in their early childhood. Medical records and clinical reports of earlier examinations were suggestive of retinal dystrophy with early, most likely congenital onset. Fundus photographs of the affected individuals show bone spicule-like pigmentation in the mid-periphery and arteriolar attenuation (Fig 3) along with severely diminished rod and cone response in electroretinography examination (Fig 4).
Whole exome sequencing of members of 25 Pakistani pedigrees generated on average 8.7 gigabases of data for each exome with an average coverage depth of 63 within the targeted region (S4 Table). Filtering sequence variants using specified criteria identified one to seven candidate variants in 13 pedigrees (S5 Table). A total of 2,904 single nucleotide variants were identified in five or more presumably unrelated pedigrees, and absent from the HapMap, the 1000 Genome and NHLBI ESP6500 databases. Of these, 200 were unreported in dbSNP and believed to be unique to the Pakistani population (S6 Table).
PKRD077: Sequencing the exome of individual IV: 5 and subsequent analysis of variants identified a homozygous non-synonymous coding variant c.2384 G>A (NM_000180.3) in GUCY2D that fulfilled the filtering criteria used to identify genes implicated in causing IRD ( Fig 1A) and segregated with the IRD in PKRD077. The respective amino acid substitution: p. R795Q (NP_000171.1) in GUCY2D is predicted to be damaging by all four variant predictors algorithms used and has been reported previously [37]. An additional synonymous variant in COL11A1 predicted not to affect splicing was also observed in IV: 5 (Table 1 and S5 Table).
PKRD138: Affected individual V:3 of this six generation pedigree was selected for exome sequencing (Figs 1D, 3 and 4). Analysis of exome variants identified a novel homozygous single base deletion in USH2A c.11473delC (NM_206933.2) predicted to result in a frame shift  p.His3825fs was found to segregate with IRD. Analysis of Pakistani control DNA identified this to be a rare sequence alteration in the Pakistani population (Table 1 and S5 Table). PKRD141: A single affected individual (V:2) from this five generation pedigree with four consanguineous marriages was selected for exome capture and sequencing (Figs 1E, 3 and 4). Two candidate variant that fulfilled the filtration criteria were identified: a nonsense variation in USH2A (NM 206933.2): c.4645 C>T, predicted to result in a premature termination: p.R1549 Ã , and a missense variant in BEST1 (NM_001139443.1:c.1633 C>A, NP_001132915.1: p.P545T) predicted to be damaging by PROVEAN and MutationTaster, but benign by SIFT and Polyphen. Dideoxy sequencing of all available members of PKRP141 confirmed segregation of the USH2A variant, but not the BEST1 variant (Table 1 and S5 Table).
PKRP142: In a six generation pedigree with eight consanguineous marriages, one individual (V:13) was selected for exome capture and sequencing (Figs 1F, 3 and 4). Analysis of the exome variants detected two homozygous changes in known IRD genes. One of them was a non-synonymous coding variant in CEP164 c.1702 A>C (NM_014956.4: p.T568P) predicted to be benign by PROVEAN, SIFT and MutationTaster, but damaging by Polyphen. And the second variant is an insertion-deletion in CNGB1 (NM_001297.4:c.2493-2_2495delinsGGC, p.S831fs). Dideoxy sequencing of nine members of PKRP142 confirmed segregation of the CNGB1 variant, but not the CEP164 variant with the disease phenotype (Table 1 and S5 Table).
PKRP176: This is a five-generation Pakistani pedigree involving three consanguineous marriages (Figs 1G, 3 and 4). Sequencing the exome of one individual (V:3) identified a novel homozygous nonsense variant in FAM161A (NM_001201543.1:c.1600 A>T,:p.K534 Ã ) and two non-synonymous homozygous coding missense variants: ALMS1 (NM 015120.4:c.5242 A>G,.4:p.T1748A) predicted to be damaging by Polyphen, PROVEAN and SIFT, but not by MutationTaster, and RAX2 (NM_032753.3:c.236 G>A,:p.R79Q) predicted to be damaging by all algorithms used in this study. Dideoxy sequencing of 12 members of this pedigree confirmed segregation of the FAM161A and ALMS1 variants, but not the RAX2 variant with IRD (Table 1 and S5 Table).
PKRP185: The exome of a single affected individual (IV:2) was selected for exome capture and sequencing (Fig 1H). Two non-synonymous variants satisfied the filtering criteria. The first one, a c.12523 T>G change in USH2A (NM 206933.2: p.W4175G) was predicted to be damaging by all algorithms. And the second, in ABHD12 (NM_001042472.2:c.1045 G>A, 1:p.A349T) was predicted to be damaging by MutationTaster and Polyphen, but not by PRO-VEAN and SIFT. Dideoxy sequencing confirmed segregation of the USH2A variant, but not the ABHD12 variant with disease phenotype (Table 1 and S5 Table).
PKRP281: A single affected member (V:5) of this pedigree was selected for exome capture and sequencing (Fig 2A). Two non-synonymous coding variants c.1087 C>A (NM 000329.2: p.P363T) in RPE65, and second, a c.2612 C>G (NM_000214.2): p.P871R) in JAG1 were identified based on the filtering criteria used. Dideoxy sequencing of the seven PKRP281 members confirmed segregation of the RPE65 variant but not the JAG1 variant with the disease phenotype (Table 1 and S5 Table).
PKRD282: An affected and unaffected sibling pair (IV: 12 and IV: 13, respectively) and affected individual (IV: 7) were selected for exome capture and sequencing from this four generation pedigree (Fig 2B). Similar to PKRP281, exome analysis identified a variant in RPE65 c.1087 C>A (NM_000329.2, p.P363T) as the causal mutation segregating with the IRD in the family PKRD282 (Table 1 and S5 Table).
PKRP283: Two affected individuals (IV:1 and IV:12) were selected for exome capture and sequencing from this four generation pedigree involving two consanguineous marriages ( Fig  2C). Similar to PKRD281 and PKRD282, exome analysis identified a variant in RPE65 c.1087 C>A (NM_000329.2, p.P363T) as the causal mutation segregating with the IRD in the family PKRD282 (Table 1 and S5 Table).
PKRP284: An affected individual and his unaffected sibling (IV: 14 and IV: 17, respectively) from this five generation pedigree with three consanguineous marriages were selected for exome capture and sequencing (Fig 2D). Analysis of variants detected two non-synonymous coding variants that satisfied the filtering criteria. One of them is a variant in CDH23 c.127 G>A (NM_001171933.1; p.V43I). The second one is the RPE65 c.1087 C>A (NM_000329.2, p.P363T) variant that is detected in PKRD281, PKRD282 and PKRD283 pedigrees. Sanger dideoxy sequencing confirmed segregation of RPE65 variant, but not the CDH23 variant with the disease phenotype (Table 1 and S5 Table). The RPE65 variant has been previously reported as a causative mutation [38].
PKRD285: Subsequent to identification of the RPE65 c.1087 C>A (NM_000329.2, p. P363T) variant in PKRD281, PKRD282, PKRD283 and PKRD284 pedigrees, targeted mutation screening for this variant was performed on PKRD285. This analysis revealed that the RPE65 c.1087 C>A variant segregated with the IRD phenotype in this five generation pedigree with six consanguineous marriages (Fig 2E).
Analysis of exome sequence variants in remaining 13 large consanguineous pedigrees using the specified filtering criteria did not reveal potential pathogenic changes in known IRD associated genes.
In summary, analysis of exome variants of 25 consanguineous Pakistani pedigrees identified nine unique homozygous causal variants in seven genes that are known to be associated with a retinal degeneration phenotype. All candidate variants identified in each pedigree are listed in S5 Table. Discussion Here, we report next-generation sequencing based genetic analyses of 26 consanguineous pedigrees of Pakistani decent that identified a total of nine causal mutations, which include six novel variants, in RPE65, LCA5, USH2A, CNGB1, FAM161A, CERKL and GUCY2D as the underlying cause of IRD in 13 of 26 pedigrees. In addition to the nine causal variants, 200 variants that were absent from the dbSNP, the 1000 genomes and the UCSC databases were identified. Each of these 200 variants was detected in five or more unrelated pedigrees investigated in this study suggesting that they are common in, and unique to the Pakistani population. A catalog of these novel variants provides a reference to variants common in, and unique to the Pakistani population.
Consistent with the phenotype observed in the cohort of patients analyzed in this study, the RPE65, LCA5, USH2A, CNGB1, FAM161A, CERKL and GUCY2D genes consisting mutations in 13 of the 26 pedigrees have been implicated in causing recessive non-syndromic retinal degeneration [1]. All novel causative variants detected in this study result in truncation of the protein or located in highly conserved domains. Mutations reported so far in FAM161A are either nonsense or frame-shift changes [39][40][41]. In pedigree PKRD176, a novel nonsense mutation, NM_001201543.1:c.1600A>T (NP_001188472.1:p.K534 Ã ), causing premature termination of the longer of two protein coding transcripts for FAM161A was identified. This transcript was observed to be the minor transcript in a human retinal cDNA library [41]. However, the Ocular Genomics Institutes Retinal Transcriptome data [42] suggested the opposite (169 reads that contain portions of the 168 base exon versus 63 that bridge it). Therefore the impact of the p.K534 Ã variant in retinal tissue is unclear. In addition, a second novel variant in ALMS1 (NM_015120.4:c.5242A>G; NP_055935.4:p.T1748A) predicted to be damaging by Polyphen, PROVEAN and SIFT, but not by MutationTaster was also found to segregate with disease in this pedigree (Fig 1G). Mutations in FAM161A are reported to be associated with early onset recessive retinitis pigmentosa while mutations in ALMS1 are associated with Alstrom syndrome and non-syndromic Leber's congenital amaurosis [7,8]. The FAM161A and ALMS1 genes are located 10 Mb apart on chromosome 2 and the novel variants observed in these genes may exist in linkage disequilibirium in pedigree PKRD176. The ALMS1 variant is listed in the ExAC database with an allele frequency of 0.0023, where as the frequency of FAM161A variant is 0.0000088. While the nonsense mutation in FAM161A may be sufficient to cause retinal pathology in this pedigree, additional studies are needed to establish if the early onset retinal degeneration phenotype observed in this pedigree is due to one or both of the potentially pathogenic variants segregating in this pedigree.
Due to the high consanguinity in Pakistani population, involvement of a smaller number of common disease causing mutations might be expected in pedigrees with IRD from a single province in Pakistan. Consistent with this, a single RPE65 mutation was observed in five out of 26 pedigrees studied. However, the remaining 8 mutations are detected in single pedigrees. Among the nine total mutations detected in 13 pedigrees, seven causative variants were not reported previously in Pakistani patients. Furthermore, identification of six novel causative variants in known IRD genes emphasizes the understudied nature of this population. These novel mutations will enhance our Additionally, discovery of novel potentially disease causing variants in well-investigated retinal disease genes will expand our understanding of their role in ocular tissue and the disease mechanism. Although a limited number of pedigrees were investigated in this study, the identification of causal mutations in 13 out of 26 pedigrees accounting for 50% of the cases is consistent with similar observations on the contribution of known IRD gene mutations in other populations [10][11][12][13]43].
An inherent advantage of next-generation based exome analysis is the identification of all potential variants in the exome in addition to the causative mutation that might act as modifiers of the disease phenotype. Consistent with this, presence of additional potentially pathogenic sequence alterations in IRD genes was observed in the set of pedigrees analyzed (S5 Table). Detailed and systematic analysis of the phenotype and genotype of patients in these pedigrees may elucidate the potential role of additional pathogenic variants as genetic modifiers.
In all but four pedigrees investigated in this study, sequencing was carried out on probands alone. The results indicate that sequencing a single affected patient can identify the causative mutations in known IRD genes when consanguinity is present. The identification of causative mutations enables appropriate molecular diagnosis that is critical for choosing therapeutic options as gene based therapies become available. The lack of identification of mutations in known IRD genes in half of the pedigrees analyzed suggests that novel genes are involved in causing IRD pathology in some of these families. Identification of additional genes involved in IRD pathology will further improve our understanding of the molecular pathology of retinal degenerations.
Supporting Information S1