
Avidity sequencing of whole genomes from retinal degeneration pedigrees identifies causal variants

  • Pooja Biswas,

    Roles Methodology, Writing – review & editing

    Affiliation Department of Ophthalmology, Shiley Eye Institute, University of California at San Diego, San Diego, California, United States of America

  • Adda Villanueva,

    Roles Conceptualization, Investigation, Methodology

    Affiliation Department of Ophthalmology, Mejora Vision MD, Merida, Yucatan, Mexico

  • Benjamin J. Krajacich,

    Roles Investigation, Methodology

    Affiliation Element Biosciences, San Diego, California, United States of America

  • Juan Moreno,

    Roles Investigation

    Affiliation Element Biosciences, San Diego, California, United States of America

  • Junhua Zhao,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliation Element Biosciences, San Diego, California, United States of America

  • Anne Marie Berry,

    Roles Investigation

    Affiliation Department of Ophthalmology, Shiley Eye Institute, University of California at San Diego, San Diego, California, United States of America

  • Danielle Lazaro,

    Roles Investigation

    Affiliation Department of Ophthalmology, Shiley Eye Institute, University of California at San Diego, San Diego, California, United States of America

  • Bryan R. Lajoie,

    Roles Formal analysis, Software, Visualization, Writing – review & editing

    Affiliation Element Biosciences, San Diego, California, United States of America

  • Semyon Kruglyak,

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    semyon.kruglyak@elembio.com

    Affiliation Element Biosciences, San Diego, California, United States of America

  • Radha Ayyagari

    Roles Conceptualization, Investigation, Methodology, Writing – review & editing

    Affiliation Department of Ophthalmology, Shiley Eye Institute, University of California at San Diego, San Diego, California, United States of America

Abstract

Whole genome sequencing has been an effective tool in the discovery of variants that cause rare diseases. In this study, we determined the suitability of a novel avidity sequencing approach for rare disease applications. We built a sample-to-results workflow, combining this sequencing technology with standard library preparation kits, analysis workflows, and interpretation tools. We applied the workflow to ten pedigrees with an inherited retinal degeneration (IRD) phenotype. Candidate variants of interest identified through whole genome sequencing were further evaluated using segregation analysis in additional family members. Potentially causal variants in known IRD genes were detected in five of the ten cases. These high-confidence variants, which could be sufficient to cause pathology, were found in the ABCA4, CERKL, MAK, PEX6, and RDH12 genes, all associated with retinal degeneration. Pending confirmatory clinical evaluation, we observed a 50% diagnostic yield, consistent with previously reported outcomes of IRD patient analysis. The study confirms that avidity sequencing is effective in detecting causal variants when used for whole genome sequencing in rare disease applications.

Introduction

Rare genetic diseases affect millions of people around the world [1]. Because a definitive diagnosis is difficult to obtain in many rare disease cases, whole genome sequencing (WGS) has become a popular approach to identify variants that may explain the underlying cause [2–8]. WGS enables a comprehensive analysis across regions (coding and noncoding) and variant classes, from single nucleotide variants (SNVs) to structural or copy number variants (CNVs) [9]. An important category of rare diseases is inherited retinal degeneration, a group of rare eye diseases affecting approximately 1 in 2,000 individuals worldwide [10]. Retinal degeneration leads to progressive loss of vision as a result of cell death within the retina. The phenotypes, including age of onset, vary widely, suggesting that the underlying genetics may also be variable. Previous publications have demonstrated that exome and genome sequencing are effective approaches for establishing a molecular diagnosis in retinal degeneration cases [2, 11–13]. Whole genome sequencing has been applied in a wide diversity of studies at both the cohort and population scale, but nearly all of these studies have used the sequencing by synthesis technology from Illumina [14]. Assessing the performance, efficiency, and relative capabilities of multiple technologies has only recently become possible as more sequencing technologies have entered the commercial market. This study applied a novel technology, termed sequencing by avidity [15], for the genetic analysis of retinal degeneration by whole genome sequencing. Briefly, the sequencing technology works as follows: DNA of interest is fragmented, circularized, and hybridized to a flowcell. Rolling circle amplification is used to generate concatemers from each circular molecule. 
Following a primer hybridization, avidity sequencing proceeds by sequentially identifying nucleotides in the DNA of interest by iterating on the following steps: (1) the binding of a dye labeled polymer, called an avidite, to identify the nucleotide and (2) the incorporation of an unlabeled but 3’ blocked nucleotide to advance along the DNA template (S1 Fig). One implementation of avidity sequencing generates 2x150 base pair reads and produces approximately 1 billion read pairs per flowcell, making it a good throughput match for the WGS application. To evaluate the technology on this application, components upstream and downstream of sequencing are required. Specifically, library preparation methods that are compatible with the sequencer and analysis tools that utilize the sequencer output to call and interpret variants are needed. Here we demonstrate that avidity sequencing can be integrated with standard upstream and downstream components to create an effective workflow for the WGS rare disease application.
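The throughput figures quoted above can be turned into a rough coverage estimate. The sketch below is only a back-of-the-envelope check: the 3.1 Gb genome size and the function name are assumptions introduced here, and the calculation ignores duplicates, unmapped reads, and trimming, so realized coverage would be somewhat lower.

```python
def mean_coverage(read_pairs, read_length=150, genome_size=3.1e9, samples=1):
    """Idealized mean coverage per sample for paired-end reads.

    genome_size (3.1 Gb) is an assumed round figure for the human genome.
    No allowance is made for duplicates, unmapped reads, or trimming.
    """
    total_bases = read_pairs * 2 * read_length  # two reads per pair
    return total_bases / genome_size / samples


# Approximately 1 billion read pairs per flowcell, as stated in the text.
single_genome = mean_coverage(1e9)
two_genomes = mean_coverage(1e9, samples=2)
print(f"one genome per flowcell:  {single_genome:.0f}X")
print(f"two genomes per flowcell: {two_genomes:.0f}X")
```

Under these assumptions, a flowcell shared by two genomes still leaves headroom above a 30X target.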

Results

We developed a modular architecture (Fig 1) to evaluate compatibility of components within a sample-to-results workflow. Extracted DNA was used as input to multiple library preparation kits and final libraries from two kits (one PCR-free and one PCR+) were used in this study. This was done to verify that both library types were compatible with the sequencing technology and application. Once the library was prepared and sequenced, a FASTQ file was generated and used as input to standard alignment and variant calling tools. The output of the avidity sequencing platform commercialized by Element Biosciences (Element Biosciences, CA, USA) is a standard FASTQ file compatible with all aligner and variant caller combinations tested. We selected BWA-MEM paired with either Sentieon DNAScope [16] or Google DeepTrio [17], following a benchmarking study. The VCF file produced by variant callers was used by interpretation tools to prioritize findings. Here we primarily relied on the Franklin by Genoox interpretation engine (franklin.genoox.com), though a subset of cases were also reviewed in other interpretation tools. The workflow summarized in Fig 1 was applied to samples from ten pedigrees, five of which are shown (Fig 2). One of the cases was a positive control that had previously been diagnosed via WGS on another technology [11], though the interpretation process was blinded to prior results. In the other nine cases, WGS had not been previously applied and a molecular diagnosis had not been made. For one of the pedigrees (Fig 2E), we sequenced the affected individual and the parents. For all other pedigrees, we sequenced a single affected individual. Following sequencing and analysis, candidate variants that were classified as either pathogenic or likely pathogenic were further evaluated via segregation analysis to determine whether variants segregated with the phenotype across the broader pedigree. 
A prioritized variant that also passed segregation analysis was deemed to be causal and referred for orthogonal confirmatory sequencing in a CLIA laboratory to validate the observation and to provide results that could be shared with and considered by the clinicians managing the patients. Table 1 details the findings for cases in which causal variants were identified. For the positive control, the previously published compound heterozygote in the PEX6 gene was identified as the top candidate [11]. The PEX6 compound heterozygous variants consisted of a frameshift indel and a missense variant. No segregation analysis was performed on these variants because this case was a previously published positive control [11]. In four of the cases, the variants that were prioritized for validation segregated as expected with the phenotype. In the remaining cases, no variants were classified as either pathogenic or likely pathogenic. Several were categorized as variants of uncertain significance (VUS) but failed to segregate with the phenotype. The variants detected in ABCA4 (homozygous frameshift), MAK (homozygous missense), RDH12 (homozygous missense), and CERKL (homozygous stop gain) had existing ClinVar submissions classified as pathogenic or likely pathogenic. We reviewed each variant that passed segregation analysis via the Integrative Genomics Viewer (IGV) [18]. The displays for the solved cases are shown (Fig 3). All calls were supported by multiple reads and occurred in generally well-behaved genomic regions, meaning that coverage was adequate, mapping quality was high, and there were few mismatches. In the case of the trio, the parents were observed to be heterozygous carriers who each passed the pathogenic allele to the homozygous proband.

Fig 1. WGS for rare disease.

The modular design supports multiple options for each step. In the current study, two library prep kits (with and without PCR) were used. Following avidity sequencing, Sentieon alignment and variant calling were used for single samples, and DeepTrio was used for the trio. Franklin was used for interpretation, with a subset of cases also reviewed via Opal.

https://doi.org/10.1371/journal.pone.0307266.g001

Fig 2. Pedigree diagrams.

(A) The positive control pedigree that was previously published [11]. (B) RF.FR.0916: Two affected and three unaffected individuals were recruited for the study. Whole genome analysis of individual II:2 identified a previously reported homozygous c.247_250dupCAAA p.Ser84ThrfsTer16 variant in the ABCA4 gene associated with retinal degeneration (PMID: 33369172). This variant is also reported as pathogenic in ClinVar (# 854791). The homozygous c.247_250dupCAAA p.Ser84ThrfsTer16 variant in ABCA4 segregated with the disease in the RF.FR.0916 pedigree. (C) RF.BR.0416: A previously reported homozygous causal variant, c.485C>T, p.Thr162Ile in MAK, was identified in individual II:2 of this pedigree (ClinVar # 867222; PMID: 28559085). Segregation analysis confirmed that this variant is homozygous in the two affected individuals (II:2 and II:3) and heterozygous in three unaffected individuals (I:1, I:2 and II:1), including the parents (I:1 and I:2). (D) RF.VI164.0516: A family with two affected and four unaffected individuals participated in this study. A homozygous variant, c.377C>T, p.Ala126Val in the RDH12 gene, was identified in the proband (II:1) by whole genome analysis. This rare variant (ClinVar # 2061) segregated with the disease. (E) RF.OS.0916: Both the proband-only analysis and the trio analysis of the parents and the proband from this pedigree identified a homozygous nonsense variant, c.769C>T, p.Arg257Ter in the CERKL gene (ClinVar # 2364), in the proband. We note that pedigrees are referenced via anonymized IDs that cannot be used to identify patients or family members by anyone outside of the research group.

https://doi.org/10.1371/journal.pone.0307266.g002

Fig 3. Display of IGV read alignment of the sequence encompassing the causal variants that passed segregation analysis.

(A) ABCA4 (Pedigree RF.FR.0916). Coverage track shows no color because the causal variant is an insertion. (B) MAK (Pedigree RF.BR.0416). (C) RDH12 (Pedigree RF.VI164.0516). (D) CERKL (Pedigree RF.OS.0916): read alignments in heterozygous parents I:1 and I:2 and the homozygous proband II:1. Uncolored reads in the parents support the reference allele in a heterozygous state. Deviations from 50% allele fraction may be the result of sampling variation exaggerated at reduced coverage.

https://doi.org/10.1371/journal.pone.0307266.g003
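The remark in the Fig 3 legend about allele fractions drifting from 50% at reduced coverage can be illustrated with a simple binomial model. This is a sketch under stated assumptions: reads are treated as independent draws, and the genome-wide mean coverages reported for the trio (20X, 22X, 52X) stand in for the depth at the variant site, which the source does not report.

```python
import math


def allele_fraction_sd(depth, true_fraction=0.5):
    """Binomial standard deviation of the observed alternate-allele fraction."""
    return math.sqrt(true_fraction * (1 - true_fraction) / depth)


# Depths are assumed equal to the mean coverages of the trio samples.
for depth in (20, 22, 52):
    sd = allele_fraction_sd(depth)
    print(f"{depth}X: expected 0.50 +/- {sd:.2f} (1 SD)")
```

The spread shrinks with the square root of depth, so a heterozygous site in a parent sequenced at about 20X is expected to wander further from 50% than the same site would at the proband's depth.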

Discussion

We evaluated the new avidity sequencing technology in the context of whole genome sequencing for rare diseases. To enable this evaluation, we first implemented a sample-to-results workflow that included library preparation, sequencing, variant calling, and variant interpretation (Fig 1). Our results demonstrate the success of one version of the workflow, but several alternative library preparation and analysis options exist and can be used based on laboratory preference. All twelve samples from ten pedigrees (Fig 2) were sequenced using 2x150 base pair reads to a target of 30X genomic coverage. For the trio analysis (Fig 2E), the parents were sequenced at a lower coverage so that all three samples could be combined on a single flowcell, thus minimizing cost while retaining the ability to check inheritance patterns and identify de novo variants. High-confidence variants predicted to be pathogenic were identified in five of the ten pedigrees that we sequenced, including the positive control (Fig 2A). The variants segregated with disease through the respective pedigrees, providing support for our findings. In three pedigrees with multiple affected individuals (Fig 2B–2D), the sequencing of a single affected sample followed by segregation analysis resolved the entire pedigree, making this an efficient and cost-effective approach compared to sequencing the entire pedigree or all affected individuals. This study demonstrates that avidity sequencing is compatible with common library preparation and analysis methods and provides an effective option for rare disease analysis using whole genome sequencing. The study has several important limitations. We started with a small cohort, so estimates of diagnostic yield may not be accurate. This was the first such study leveraging a pre-commercial version of avidity sequencing. 
The analysis workflows used at the time this study was completed identified only single nucleotide variants, small indels, and copy number variants; more complex structural variants were not considered. We are in the process of expanding the patient cohort to additional retinal degeneration pedigrees and applying the approach to other disease classes [19], while also extending the analysis methods to include all variant classes.

Methods

DNA extraction from blood

Blood samples were collected from all available family members after obtaining their written consent to participate in our study. DNA was extracted from peripheral whole blood samples collected from all participants using DNeasy Blood & Tissue Kits (Cat. No.: 69504, Qiagen, Hilden, Germany) following the manufacturer’s protocol.

Library preparation

For the singleton samples: 0.5 pmol (30 µl of 16.7 nM) of pooled libraries generated with the Roche KAPA HyperPlus Prep Kit (Cat# 07962428001) was processed in a single reaction using the Adept Compatibility Workflow Kit (Cat# 830–00003, Element Biosciences, CA, USA). The final circularized library was quantified using the qPCR standard and primer mix provided in the Adept Compatibility Workflow Kit, following the manufacturer’s protocol.

For the trio samples: DNA library preparation was performed using the combination of the Roche KAPA HyperPlus Prep Kit (Cat# 07962428001, Roche) and the Element Elevate Index and Adapter Kit (Cat# 830–00005, Element Biosciences, CA, USA), following Roche’s protocol. Extracted DNA (100 ng) underwent enzymatic fragmentation for 10 min, followed by end repair and A-tailing. Element Elevate adapters were used in the adapter ligation step. Ligated products were purified with SPRI beads at a 0.5X/0.66X ratio for size selection. Five cycles of PCR amplification were used to introduce Element indexes via Elevate unique index pairs (1A-1G). The PCR product was cleaned up with 1X SPRI beads and quantified by Qubit.

0.5 pmol (30 µl of 16.7 nM) of the linear library prepared above was processed in a single reaction using the Element Elevate Library Circularization Kit (Cat# 830–00001). The final circularized library was quantified using the qPCR standard and primer mix provided in the Elevate Library Circularization Kit, following the manufacturer’s protocol.
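The library input quoted for both workflows follows from volume times concentration. The short sketch below reproduces that arithmetic; the function names and the 20 nM example concentration are hypothetical and are not taken from any kit protocol.

```python
def picomoles(volume_ul, concentration_nm):
    """Amount of library in pmol: 1 nM equals 1 fmol per microliter."""
    femtomoles = volume_ul * concentration_nm
    return femtomoles / 1000.0


def volume_for(target_pmol, concentration_nm):
    """Volume in microliters that delivers target_pmol at the given concentration."""
    return target_pmol * 1000.0 / concentration_nm


loaded = picomoles(30, 16.7)
print(f"30 ul at 16.7 nM = {loaded:.3f} pmol")
# Hypothetical library at 20 nM, to show the inverse calculation.
print(f"volume for 0.5 pmol at 20 nM = {volume_for(0.5, 20.0):.1f} ul")
```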

Sequencing

The quantified libraries were pooled, denatured, and sequenced on the AVITI system (Element Biosciences, San Diego, CA) using 2x150 paired-end reads. Two genomes were typically multiplexed per flowcell, and genome-wide coverage exceeded 30X. For the trio, the three libraries were pooled on a single flowcell at unequal concentrations, yielding proband coverage approximately twice that of the parental samples: 52X, 20X, and 22X for the proband, maternal, and paternal samples, respectively. The trio was sequenced to enable detection of de novo variants and to track inheritance patterns of germline variants.
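The unequal pooling of the trio can be expressed as each sample's share of the flowcell output. The sketch below derives those shares from the reported coverages, assuming that coverage is proportional to a library's share of the pool; the pooling ratios actually used at the bench are not given in the source.

```python
# Mean coverage per sample as reported for the trio.
observed = {"proband": 52, "maternal": 20, "paternal": 22}

total = sum(observed.values())
shares = {sample: cov / total for sample, cov in observed.items()}

for sample, share in shares.items():
    print(f"{sample}: {share:.1%} of flowcell output")
print(f"combined coverage on the flowcell: {total}X")
```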

Analysis

Following sequencing, FASTQ files were generated as the input to secondary analysis tools. For singleton samples, the Sentieon workflow for BWA-MEM alignment and DNAScope variant calling was used. For the trio, BWA-MEM was used for alignment and Google DeepTrio was used for variant calling. The resulting variant calls were interpreted using the Franklin by Genoox software. Genoox software was also used to generate CNV candidates, though none were found in genes associated with the phenotype of interest. The phenotype was described as retinal degeneration for all cases. Franklin provided a list of 0 to 10 variants of interest for each case. To be considered further, variants had to be rare (< 0.01% gnomAD allele frequency), associated with the phenotype, and classified as likely pathogenic or pathogenic by the automated ACMG calculator. If no such variants were identified, variants of uncertain significance were also reviewed. Four of the cases were also reviewed using the Opal by Fabric software [20] to determine whether another tool would yield additional promising candidates. If one or more variants of interest were identified, a group review was used to determine whether to proceed with segregation analysis. The review group consisted of scientists with expertise in retinal degeneration genetics and those with expertise in the likely impact of a given variant. There was no disagreement regarding the variants and cases reviewed. For the trio, the inheritance patterns were reviewed for variants of interest. For the selected variant (homozygous stop gain in CERKL), we verified that each parent was a heterozygous carrier.
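The prioritization criteria described above can be sketched as a small filter. This is an illustration only and is not the Franklin or Opal implementation: the `Candidate` record, its field names, and the demo variants are hypothetical, and applying the rarity and phenotype filters to the fallback VUS tier is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass

RARE_AF = 0.0001  # 0.01% gnomAD allele frequency, expressed as a fraction
REPORTABLE = {"pathogenic", "likely pathogenic"}


@dataclass
class Candidate:
    gene: str
    gnomad_af: float
    phenotype_match: bool
    acmg_class: str


def prioritize(candidates):
    """Return P/LP candidates if any exist, otherwise fall back to VUS."""
    eligible = [c for c in candidates if c.gnomad_af < RARE_AF and c.phenotype_match]
    reportable = [c for c in eligible if c.acmg_class in REPORTABLE]
    if reportable:
        return reportable
    # Assumption: the VUS tier keeps the same rarity and phenotype filters.
    return [c for c in eligible if c.acmg_class == "uncertain significance"]


# Hypothetical candidates, not data from the study.
demo = [
    Candidate("GENE_A", 0.00002, True, "likely pathogenic"),
    Candidate("GENE_B", 0.00500, True, "pathogenic"),  # too common
    Candidate("GENE_C", 0.00001, True, "uncertain significance"),
    Candidate("GENE_D", 0.00001, False, "pathogenic"),  # phenotype mismatch
]
print([c.gene for c in prioritize(demo)])
```

In the study, the output of this kind of filter was not accepted automatically; it went to group review before segregation analysis.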

Segregation analysis

Segregation analysis of the identified variants of interest in additional family members was performed using standard PCR amplification and Sanger sequencing.
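For a homozygous recessive variant, the segregation check reduces to a simple rule over Sanger genotypes: every affected member carries two copies and no unaffected member does. The sketch below encodes that rule under the assumptions of full penetrance and a single homozygous variant (compound heterozygotes would need phasing); the function name is hypothetical, and the example genotypes are those described for pedigree RF.BR.0416 in Fig 2C.

```python
def segregates_recessive(alt_allele_counts, affected):
    """Check autosomal recessive segregation of one variant.

    alt_allele_counts maps each family member to 0, 1, or 2 alternate alleles.
    affected is the set of members showing the phenotype.
    """
    for member, count in alt_allele_counts.items():
        if member in affected and count != 2:
            return False  # affected member is not homozygous
        if member not in affected and count == 2:
            return False  # unaffected member is homozygous
    return True


# MAK c.485C>T genotypes as described for RF.BR.0416.
genotypes = {"I:1": 1, "I:2": 1, "II:1": 1, "II:2": 2, "II:3": 2}
affected = {"II:2", "II:3"}
print(segregates_recessive(genotypes, affected))
```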

Data access

BAM and VCF files for sequenced samples associated with the 5 pedigrees that passed segregation analysis are in the process of being submitted to dbGaP (https://www.ncbi.nlm.nih.gov/gap/).

Supporting information

S1 Fig. Detailed view of a single sequencing cycle in avidity sequencing.

The number of sequencing cycles equals the total read length. The figure depicts four DNA concatemers attached to the flowcell surface. In the illustrated cycle, each of the four concatemers is being sequenced at a different nucleotide, so together they bind each of the four avidites (A). The imaging step (B) determines the base, followed by avidite removal (C). An unlabeled nucleotide is incorporated (D) to advance along the template strand, and the block is removed (E) so that the next cycle of sequencing can start.

https://doi.org/10.1371/journal.pone.0307266.s001

(TIF)

References

  1. Ferreira CR. The burden of rare diseases. Am J Med Genet A. 2019;179: 885–892. pmid:30883013
  2. Carss KJ, Arno G, Erwood M, Stephens J, Sanchis-Juan A, Hull S, et al. Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease. Am J Hum Genet. 2017;100: 75–90. pmid:28041643
  3. Sanford EF, Clark MM, Farnaes L, Williams MR, Perry JC, Ingulli EG, et al. Rapid Whole Genome Sequencing Has Clinical Utility in Children in the PICU. Pediatr Crit Care Med. 2019;20: 1007–1020. pmid:31246743
  4. Souche E, Beltran S, Brosens E, Belmont JW, Fossum M, Riess O, et al. Recommendations for whole genome sequencing in diagnostics for rare diseases. Eur J Hum Genet. 2022;30: 1017–1021. pmid:35577938
  5. Stranneheim H, Lagerstedt-Robinson K, Magnusson M, Kvarnung M, Nilsson D, Lesko N, et al. Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Med. 2021;13: 40. pmid:33726816
  6. Turro E, Astle WJ, Megy K, Gräf S, Greene D, Shamardina O, et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020;583: 96–102. pmid:32581362
  7. Willig LK, Petrikin JE, Smith LD, Saunders CJ, Thiffault I, Miller NA, et al. Whole-genome sequencing for identification of Mendelian disorders in critically ill infants: a retrospective analysis of diagnostic and clinical findings. Lancet Respir Med. 2015;3: 377–387. pmid:25937001
  8. Wright CF, FitzPatrick DR, Firth HV. Paediatric genomics: diagnosing rare disease in children. Nat Rev Genet. 2018;19: 253–268. pmid:29398702
  9. The 100,000 Genomes Project Pilot Investigators, Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care: Preliminary Report. N Engl J Med. 2021;385: 1868–1880. doi:10.1056/NEJMoa2035790
  10. Sohocki MM, Daiger SP, Bowne SJ, Rodriquez JA, Northrup H, Heckenlively JR, et al. Prevalence of mutations causing retinitis pigmentosa and other inherited retinopathies. Hum Mutat. 2001;17: 42–51. pmid:11139241
  11. Biswas P, Villanueva AL, Soto-Hermida A, Duncan JL, Matsui H, Borooah S, et al. Deciphering the genetic architecture and ethnographic distribution of IRD in three ethnic populations by whole genome sequence analysis. Iyengar SK, editor. PLOS Genet. 2021;17: e1009848. pmid:34662339
  12. Biswas P, Duncan JL, Maranhao B, Kozak I, Branham K, Gabriel L, et al. Genetic analysis of 10 pedigrees with inherited retinal degeneration by exome sequencing and phenotype-genotype association. Physiol Genomics. 2017;49: 216–229. pmid:28130426
  13. Villanueva A, Biswas P, Kishaba K, Suk J, Tadimeti K, Raghavendra PB, et al. Identification of the genetic determinants responsible for retinal degeneration in families of Mexican descent. Ophthalmic Genet. 2018;39: 73–79. pmid:28945494
  14. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456: 53–59. pmid:18987734
  15. Arslan S, Garcia FJ, Guo M, Kellinger MW, Kruglyak S, LeVieux JA, et al. Sequencing by avidity enables high accuracy with low reagent consumption. Genomics; 2023 May. pmid:37231263
  16. Freed D, Pan R, Chen H, Li Z, Hu J, Aldana R. DNAscope: High accuracy small variant calling using machine learning. Bioinformatics; 2022 May.
  17. Kolesnikov A, Goel S, Nattestad M, Yun T, Baid G, Yang H, et al. DeepTrio: Variant Calling in Families Using Deep Learning. Bioinformatics; 2021 Apr.
  18. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative Genomics Viewer. Nat Biotechnol. 2011;29: 24–26. pmid:21221095
  19. Ramsey K, Kruglyak S, Naymik M, Lajoie BR, Wiseman KN, Sanchez-Castillo M, et al. An efficient design for whole genome trio sequencing identifies key variants in rare neurological disorder cases. 2023.
  20. Coonrod EM, Margraf RL, Russell A, Voelkerding KV, Reese MG. Clinical analysis of genome next-generation sequencing data using the Omicia platform. Expert Rev Mol Diagn. 2013;13: 529–540. pmid:23895124
  21. Maggi J, Koller S, Bähr L, Feil S, Kivrak Pfiffner F, Hanson JVM, et al. Long-Range PCR-Based NGS Applications to Diagnose Mendelian Retinal Diseases. Int J Mol Sci. 2021;22: 1508. pmid:33546218
  22. Lew YJ, Rinella N, Qin J, Chiang J, Moore AT, Porco TC, et al. High-resolution Imaging in Male Germ Cell–Associated Kinase (MAK)-related Retinal Degeneration. Am J Ophthalmol. 2018;185: 32–42. pmid:29103961
  23. Stone EM, Andorf JL, Whitmore SS, DeLuca AP, Giacalone JC, Streb LM, et al. Clinically Focused Molecular Investigation of 1000 Consecutive Families with Inherited Retinal Disease. Ophthalmology. 2017;124: 1314–1331. pmid:28559085
  24. Zou X, Fu Q, Fang S, Li H, Ge Z, Yang L, et al. Phenotypic Variability of Recessive RDH12-Associated Retinal Dystrophy. Retina. 2019;39: 2040–2052. pmid:30134391
  25. Benayoun L, Spiegel R, Auslender N, Abbasi AH, Rizel L, Hujeirat Y, et al. Genetic heterogeneity in two consanguineous families segregating early onset retinal degeneration: The pitfalls of homozygosity mapping. Am J Med Genet A. 2009;149A: 650–656. pmid:19140180
  26. Tuson M, Marfany G, Gonzàlez-Duarte R. Mutation of CERKL, a Novel Human Ceramide Kinase Gene, Causes Autosomal Recessive Retinitis Pigmentosa (RP26). Am J Hum Genet. 2004;74: 128–138. pmid:14681825
  27. Aleman TS, Soumittra N, Cideciyan AV, Sumaroka AM, Ramprasad VL, Herrera W, et al. CERKL Mutations Cause an Autosomal Recessive Cone-Rod Dystrophy with Inner Retinopathy. Investig Ophthalmology Vis Sci. 2009;50: 5944. pmid:19578027