Comprehensive Mutation Analysis for Congenital Muscular Dystrophy: A Clinical PCR-Based Enrichment and Next-Generation Sequencing Panel

The congenital muscular dystrophies (CMDs) comprise a heterogeneous group of heritable muscle disorders whose muscle pathology is often difficult to interpret, making them challenging to diagnose. Serial Sanger sequencing of suspected CMD genes, while the current molecular diagnostic method of choice, can be slow and expensive. A comprehensive panel test for simultaneous screening of mutations in all known CMD-associated genes would be a more effective diagnostic strategy. Thus, the CMDs are a model disorder group for development and validation of next-generation sequencing (NGS) strategies for diagnostic and clinical care applications. Using a highly multiplexed PCR-based target enrichment method (RainDance) in conjunction with NGS, we performed mutation detection in all CMD genes of 26 samples and compared the results with Sanger sequencing. The RainDance NGS panel showed great consistency in coverage depth, on-target efficiency, versatility of mutation detection, and genotype concordance with Sanger sequencing, demonstrating the test's appropriateness for clinical use. Panel implementation gave a higher diagnostic yield than single-gene tests. The panel's limitation is the amplification failure of select gene-specific exons, which require Sanger sequencing for test completion. This study demonstrates successful validation and application of the CMD NGS panel to improve the diagnostic yield in a clinical laboratory.


Introduction
The main objective of human genetics is to identify the genetic variants underlying specific phenotypes and provide molecular diagnoses to guide clinical management [1]. Genetic heterogeneity in inherited disorders including breast cancer, intellectual disability, ataxia, hearing loss, immunodeficiency, cardiomyopathies, and inherited muscle disorders such as the congenital muscular dystrophies has driven the development of novel screening and testing approaches [2]. Over the past decade, our molecular understanding of the congenital muscular dystrophies (CMDs) has expanded greatly [3]. CMDs are rare genetic muscle disorders that present early at birth or within the first 2 years of life, with variable inheritance patterns. Generally, they are characterized by congenital hypotonia, delayed motor development, progressive muscle weakness, and dystrophic features on muscle biopsy [4].
CMDs are genetically and phenotypically heterogeneous and include disorders caused by: 1) recessive or dominant COL6A1, COL6A2, and COL6A3 mutations, which manifest as Ullrich or Bethlem CMD; 2) recessive LAMA2 mutations, which result in merosin-deficient CMD (MDC1A); 3) recessive mutations in the POMT1, POMT2, POMGNT1, FKRP, FKTN, or LARGE genes, which manifest as purely muscular or syndromic conditions, such as Fukuyama CMD, muscle-eye-brain disease (MEB), Walker-Warburg syndrome (WWS), or CMD type 1C (MDC1C); 4) recessive mutations in the SEPN1 gene, which manifest as rigid spine syndrome (RSS); 5) dominant lamin A/C mutations, which result in the congenital form of Emery-Dreifuss muscular dystrophy; and 6) recessive ITGA7 mutations, which manifest as CMD with integrin A7 deficiency [4,5]. Novel genes remain to be identified in patients with clinical collagen VI-like, dystroglycanopathy-like, and not previously described phenotypes with congenital onset and dystrophic muscle biopsy findings. In a recent survey of a national UK referral service for CMD diagnostics, a genetic diagnosis was reached in 53 of 116 patients, with the most common diagnoses being collagen VI-related disorders (19%), dystroglycanopathy (12%), and merosin-deficient congenital muscular dystrophy (10%) [6].
Diagnosing a specific CMD subtype can be challenging and requires multidisciplinary expertise (neurology, pathology, genetics, and neuroradiology). Currently, there are only a few centers with the expertise to recognize and delineate the wide range of overlapping clinical features [4,7,8]. Even then, patients undergo a battery of immunostains and individual gene sequencing analyses to arrive at an exact diagnosis [9]. The most frequently used CMD diagnostic method is sequential gene-by-gene mutation detection by Sanger sequencing. Sequencing a gene is rightly the preferred approach for mutation detection in CMD, since most mutations identified to date are either point mutations or small insertions and deletions. However, the difficulty in disease delineation and the increasing number of genes implicated mandate a more comprehensive molecular diagnostic approach to improve both diagnostic and cost efficiencies.
In this study, we describe the development and validation of an NGS panel, using RainDance (as enrichment technology) and Applied Biosystems SOLiD3 (as sequencing platform), for comprehensive mutation detection in CMD genes, along with its diagnostic yield and clinical implementation in patients with confirmed diagnosis, serving as positive controls, and with clinically suspected CMD (Table 1).

Sequencing yields and optimal target base coverage
The targeted next-generation sequencing panel was designed to amplify all exons of the 12 known CMD-associated genes (Table 2). The overall target enrichment and next-generation sequencing yielded an average of 1,195,183 reads; 57% of these reads mapped to the genome and 35% mapped to the targeted regions. The total coverage of all targeted bases ranged from 87 to 94% at 5× and from 84 to 94% at 20×. The mean gene depth of coverage across all samples ranged from 0× for POMT1 to 108× for POMGNT1, with an average of 50×. Despite the high mean gene read depth and target region coverage, several exons, including exon 1 of SEPN1 and all exons of POMT1, had no mapped reads [23]. Of all the exons targeted, 49 consistently showed less than 20× average coverage across all samples, which could be due largely to sequence complexity, problematic library synthesis, or unusual GC content of the fragments (Table 3). SEPN1, COL6A1, and POMT2 had very high GC content, namely, 87%, 73%, and 73%, respectively, specifically in their first coding exon, while POMT1 had an average of 56% GC content across all exons. In contrast, exons 34, 13, 6, 7, and 27 of COL6A1, COL6A3, FKTN, FKTN, and LAMA2, respectively, had very low GC content and consequently had no mapped reads in the NGS data.
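The coverage figures above (percentage of targeted bases at a given depth, and mean depth) can be illustrated with a minimal Python sketch. It assumes that per-base read depths for the targeted region are already available as a list of integers; the function names and the toy depths are hypothetical and are not taken from the study's pipeline or data.

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of targeted bases covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("no targeted bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def mean_depth(depths: Sequence[int]) -> float:
    """Average read depth across the targeted bases."""
    if not depths:
        raise ValueError("no targeted bases supplied")
    return sum(depths) / len(depths)


# Toy per-base depths for illustration only (not data from the study).
toy_depths = [0, 3, 5, 12, 20, 25, 40, 60, 80, 110]

print(breadth_of_coverage(toy_depths, 5))   # fraction of bases at >= 5 reads
print(breadth_of_coverage(toy_depths, 20))  # fraction of bases at >= 20 reads
print(mean_depth(toy_depths))
```

Reporting breadth at two depth thresholds, as the text does, separates bases that are merely touched by reads from bases covered deeply enough to support a confident variant call.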
High variant detection rates in targeted regions with excellent coverage, Phred scores, and allele percentages
Known variants/mutations in the 5 variant-positive control samples re-detected by the CMD NGS test had coverage, Phred score, and allele percentages well above acceptable thresholds (Table 4). This included all previous Sanger-detected variants in exons not problematic for NGS. As expected, none of these variants were detected in the wild-type control (CMD-13) sample. However, additional variants (ranging from 0-14 per sample), not confirmed by Sanger sequencing, passed these threshold values and are thus false positives (Table 5). These false-positive calls are due to several factors including low coverage, low Phred scores, and skewed allele percentages of those specific genomic areas. As shown in Table 5, the false-positive rates ranged from 0% in CMD-11 to 37% in CMD-8. Overall, a total of 85 variants were detected in all 5 variant-positive control samples, of which 68 are reported in the dbSNP database and are, thus, true variants. This in turn reflects the low false-positive rate of the targeted approach via this panel test. Similarly, in the 20 blinded samples, 271 variants were detected, of which 98 are dbSNP calls (Table 6).
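As a quick arithmetic check on the counts quoted above, the share of detected variants that are already listed in dbSNP can be computed directly. The sketch below uses only the numbers given in the text; the helper name `percent` is hypothetical.

```python
def percent(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Counts quoted in the text.
controls_dbsnp = percent(68, 85)   # variant-positive control samples
blinded_dbsnp = percent(98, 271)   # blinded samples

print(f"controls: {controls_dbsnp:.1f}% of variants listed in dbSNP")
print(f"blinded:  {blinded_dbsnp:.1f}% of variants listed in dbSNP")
```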

Versatility of variant detection
The ability of NGS to efficiently detect all kinds of mutations, including point mutations and small insertions/deletions, was interrogated using previously Sanger-confirmed variants or mutations in the variant-positive samples (Table 4). These variant-positive samples represented all the different types of variants NGS is expected to detect. These included silent, c.1770G>C (p.T590) (CMD-8); missense, c.2084C>T (p.D695V) (CMD-9); small deletion, c.6617delT (CMD-11); and small duplication, IVS24-3dupC (CMD-10) variants in genes COL6A2, LAMA2, COL6A3, and COL6A2, respectively (Table 4). Some of these insertion and deletion variants, as detected by NGS and confirmed by conventional Sanger sequencing, are represented in Figure 1. Additionally, potential causative mutations and variants of different types were identified in blinded samples and were concordant with previous Sanger sequencing results obtained in Dr. Carsten Bonnemann's laboratory (Table 7). All confirmed variants had a coverage of at least 7× and mutant allele percentage of 17-71% for heterozygous and 78-100% for homozygous variants (Tables 4 and 7).

Multiple parameters as data filters for identification of causative mutations
For clinical applications, it is crucial to reduce the number of variants that need confirmation by Sanger sequencing (lower false-positive rate) to maintain an acceptable cost-benefit ratio. This was evaluated with the variant call data from known positive samples (CMD-8 to CMD-12) by including multiple parameters for variant filtering as illustrated in Figure 2. Following this, variants with coverage less than 20× were all filtered out, unless they were listed in HGMD, dbSNP, or EGL (Emory Genetics Laboratory) databases as definitive known pathogenic variants or mutations. Similarly, variants with high frequency (observed in multiple samples) were removed, unless they were found to be in HGMD, frameshift, or nonsense changes. Additionally, synonymous variants and dbSNP variants with allele frequency >1% were filtered out of the list as well. When the coverage is greater than 20×, variant calls with mutant allele percentages greater than 85% and less than 30% were retained as homozygous and heterozygous calls, respectively, and others were filtered out. Simultaneous consideration of the multiple parameters described above, for data filtering, significantly reduced the number of variant calls requiring Sanger confirmation (Table 5). Using these parameters, all potential disease-causing mutations previously identified by Carsten Bonnemann's laboratory, but initially blinded to EGL staff, were detected in the 20 blinded clinical samples (Table 7).
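The filtering rules described above can be summarized as a small decision function. This is a minimal sketch and not the laboratory's actual pipeline: the `Variant` record, its field names, the order in which the rules are applied, and the `recurrent_in` cutoff for "observed in multiple samples" are all assumptions made for illustration. The coverage and allele-percentage cutoffs are copied as written in the text and exposed as parameters, since a laboratory would tune them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    coverage: int                 # read depth at the variant position
    allele_pct: float             # percentage of reads carrying the variant allele
    effect: str                   # e.g. "missense", "synonymous", "frameshift", "nonsense"
    in_hgmd: bool = False         # listed in HGMD
    known_pathogenic: bool = False  # listed as pathogenic in HGMD, dbSNP, or EGL
    dbsnp_freq_pct: Optional[float] = None  # dbSNP allele frequency (%), if listed
    n_samples: int = 1            # samples in which the variant was observed


def classify(v: Variant, min_cov: int = 20, recurrent_in: int = 3,
             hom_min: float = 85.0, het_max: float = 30.0) -> str:
    """Apply the filters in an assumed order and return the outcome."""
    # Low coverage: keep only known pathogenic variants, for Sanger confirmation.
    # (Coverage exactly equal to min_cov is treated as passing; the text is silent.)
    if v.coverage < min_cov:
        return "confirm" if v.known_pathogenic else "filtered"
    # Recurrent variants are dropped unless HGMD-listed, frameshift, or nonsense.
    protected = v.in_hgmd or v.effect in {"frameshift", "nonsense"}
    if v.n_samples >= recurrent_in and not protected:
        return "filtered"
    # Synonymous changes and common dbSNP variants (>1%) are dropped.
    if v.effect == "synonymous":
        return "filtered"
    if v.dbsnp_freq_pct is not None and v.dbsnp_freq_pct > 1.0:
        return "filtered"
    # Zygosity cutoffs as stated in the text.
    if v.allele_pct > hom_min:
        return "homozygous"
    if v.allele_pct < het_max:
        return "heterozygous"
    return "filtered"


print(classify(Variant(coverage=100, allele_pct=95.0, effect="missense")))
print(classify(Variant(coverage=10, allele_pct=50.0, effect="missense",
                       known_pathogenic=True)))
```

Combining the rules in one function makes explicit which exemptions (known pathogenic, HGMD-listed, frameshift, nonsense) override which filters, a point the prose leaves partly open.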

NGS panel approach has higher clinical yield compared to a sequential Sanger sequencing approach
Since the NGS CMD panel was made available at EGL, a number of tests have been ordered and yielded a higher call percentage than single-gene tests (Table 8). Call percentage refers to the identification and report of pathogenic mutations in the respective gene. For example, LAMA2 and FKTN yielded the highest percentages for the single-gene test, 64% and 35%, respectively. In contrast, panels 1 and 2, containing LAMA2 and FKTN, had call percentages of 54% and 94%. Generally, the total percentage call for the gene-by-gene approach was 17% compared to the higher total call percentage of 41% obtained by the 4 NGS panels. The low call percentage or mutation detection rate for individual gene tests is due mostly to incorrect presumption of genetic etiology and suspicion of the incorrect gene.

Discussion
Recent studies have highlighted the potential of NGS in mutation detection [2,24-27]; however, it remains imperative to query and validate its performance efficiency prior to implementation in a clinical testing laboratory. To this end, we investigated the potential of NGS technology and its efficiency as a clinical diagnostic tool by implementing a high-throughput gene panel for CMD. Target coverage in a panel approach varies based on the target enrichment method employed prior to NGS. The relatively high sensitivity, specificity, and accuracy of target enrichment for NGS offered by the highly multiplexed PCR-based technology (RainDance) over hybridization-based technologies has been previously demonstrated for CMD [23].
The evaluation of NGS performance parameters is critical to offering such diagnostic strategies in a clinical laboratory. Our study design using negative (wild-type, CMD-13) and positive controls (variant-positive samples CMD-8 to CMD-12) demonstrated the efficiency of the NGS technology in detecting all potential mutation types (Table 4). The study also established the limitations of the technology. For the entire CMD panel, there were at least 49 exons across 9 genes that had significantly low coverage (<20×) and required Sanger sequencing (Table 3). Among these 49 exons were the POMT1 exons (19 exons) that failed amplification and hence had no coverage at all, most likely due to unusual GC content and sequence complexity. Amplification of exon 1 of most genes was problematic for similar reasons [1,18,28]. Empirically, the average failure rate for target enrichment is 15-20%. Though not comparable to the low failure rates of Sanger sequencing (3%: 2 out of 63 amplicons), the NGS failure rate is compensated for by other advantages of the technology [29]. The flexibility of both batch processing and single-sample processing offered by RainDance minimizes reagent wastage and maintains rapid turnaround times, even when tests for rare disorders are ordered infrequently in diagnostic laboratories. One additional advantage of RainDance for target enrichment is the ability to differentiate genes from pseudogenes by the use of gene-specific primers for PCR amplification, unlike hybridization-based technologies.
Table footnote: Total NGS Variants, includes variants that passed the threshold settings described in the text; Non-dbSNP Variants, NGS variants not listed in dbSNP; Filtered Variants, variants that were initially detected by NGS but did not meet thresholds and were filtered by the criteria described in Figure 2.
Comparable low coverage (<10× coverage in 20% of exons) has also been observed in non-CMD genes as reported by a recent study of ataxia gene targets involving array-based enrichment and NGS sequencing [2]. Sanger sequencing may remain the strategy of choice for confirmation of low-coverage variants. Following this successful validation, a clinical CMD NGS panel was launched at the Emory Genetics Laboratory (EGL) and has been used successfully by clinicians in CMD cases presenting with overlapping phenotypes, inconclusive biochemical studies, or nondiagnostic brain or muscle MRIs. This expedited approach to molecular diagnosis avoids the diagnostic odyssey and cost associated with a serial gene testing approach. For example, on average the number of exons comprising a CMD gene is 25 (ranging from 1-65), costing around $2500 per gene for molecular analysis, clinical interpretation, and report issuance. Alternatively, the NGS-based sequencing panel offered for just $5000 includes comprehensive analysis of the current 12 disease-associated genes. In addition, as more and more disease-causing genes are identified, they can be added to the panel without a significant increase in the overall cost, which is very unlikely to be the case for a gene-by-gene approach. In this study, the CMD panel approach convincingly showed better mutation detection or diagnostic yield compared to a single-gene analysis (Table 8). The efficiency and better yield of the panel approach is further illustrated by the analysis of the 20 blinded samples included in our study (Table 7).
Figure caption (variant filtering flowchart): The flowchart demonstrates the criteria used to select variants that were Sanger sequencing confirmed. In essence, selected variants that had >20× coverage, a low allele frequency, and nonsynonymous changes were Sanger confirmed if they were listed on HGMD, frameshift, or nonsense changes. In addition, interesting variants with a coverage <20× were also confirmed.
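The cost argument in the discussion above can be made concrete with a short calculation. The sketch below uses only the two prices quoted in the text (about $2500 per single-gene test and $5000 for the 12-gene panel); the function names are hypothetical, and real laboratory pricing may differ.

```python
import math

SINGLE_GENE_COST = 2500  # approximate cost per gene quoted in the text (USD)
PANEL_COST = 5000        # quoted cost of the NGS panel (USD)
PANEL_GENES = 12         # genes currently on the panel


def serial_cost(genes_tested: int) -> int:
    """Cost of testing `genes_tested` genes one at a time by Sanger sequencing."""
    if genes_tested < 0:
        raise ValueError("genes_tested must be non-negative")
    return genes_tested * SINGLE_GENE_COST


def break_even_genes() -> int:
    """Smallest number of serial single-gene tests costing at least the panel price."""
    return math.ceil(PANEL_COST / SINGLE_GENE_COST)


print(break_even_genes())        # serial tests needed to match the panel price
print(serial_cost(PANEL_GENES))  # cost of testing all panel genes serially
```

Under these quoted prices the panel matches the cost of serial testing as soon as a second gene has to be sequenced, which is common when the clinical picture does not point to a single gene.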
Several samples, which underwent a series of single-gene tests, and others which remained CMDs of unknown molecular etiology due to inconclusive biochemical or immunologic assays, all received a definitive diagnosis through this NGS approach. Others, in which no mutations could be detected, could either be negative for all known genes or have larger deletions or duplications requiring array CGH analysis. Although whole-exome sequencing is making its way into clinical genetics, the high false-positive rate, long turnaround times requiring 3 to 6 months for analysis and interpretation alone, ethical challenges involving secondary findings, and the high test price when data and interpretation are included, may present prohibitive barriers to commercial application [12,30-32]. Exome sequencing currently covers about 92% of the exome. From the experience of the whole exome analysis pipeline at EGL, approximately 10-20% of the 92% had low or zero coverage. The failed exons vary between individual runs and may involve genes of interest strongly associated with the patient phenotype. Given the large number of failed exons, they cannot practically be followed up and confirmed by Sanger sequencing, making the test necessarily incomplete. In contrast, targeted panel sequencing gives the laboratory the ability to complete the test by tracking all NGS failed exons and confirming by Sanger sequencing. These current limitations further highlight the significance of panel approaches over current gene-by-gene or potential whole exome-based tests. In agreement with the newly released CAP and Gargis et al. guidelines for clinical NGS validation and implementation, our study demonstrates that an NGS panel approach can be successfully adopted in clinical laboratories to improve disease diagnosis [33].
In conclusion, clinical CMD NGS panels offer cost-effective and more rapid turn-around molecular diagnostic testing than the conventional sequential Sanger sequencing of associated genes. A detailed step-wise approach to recommending NGS panel tests and diagnosing the disease subtype is illustrated as a flowchart ( Figure 3). We anticipate that following this diagnostic algorithm will ensure a more efficient and rapid diagnosis for patients with CMD who currently lack molecular characterization.

Primer Library Design
A primer library for target amplification of all exons of the known CMD genes was designed using the manufacturer's design parameters (RainDance Technologies) and the Primer3 algorithm (http://frodo.wi.mit.edu/primer3/). All SNPs from dbSNP build 130 were filtered from the primer selection region. The in-house primer design pipeline performed an exhaustive primer design and selection across the 65-kb targeted region. The final library consisted of primer pairs for successful amplification of 383 amplicons with Tm ranging from 57 to 59°C and primer length ranging from 16 to 21 bp.

Target Enrichment by Droplet-Based Multiplex PCR (RainDance)
Intact genomic DNA samples were sheared to yield 3-4 kb long fragments with the Covaris S2 instrument following the manufacturer's instructions. For target sequence amplification, input DNA template mixture was prepared by mixing 1.5 µg of the above sheared DNA fragments, 4.7 µl of High-Fidelity Buffer (Invitrogen), 1.26 µl of MgSO4 (Invitrogen), 1.6 µl of 10 mM dNTP (Invitrogen), 3.6 µl of 4 M Betaine (Sigma), 3.6 µl of RDT Droplet Stabilizer (RainDance Technologies), 1.8 µl of dimethyl sulfoxide (Sigma), 0.7 µl of 5 units/µl Platinum High-Fidelity Taq (Invitrogen), and nuclease-free water to bring to a final reaction volume of 25 µl, per the RainDance protocol. These samples were then subjected to emulsification using the RDT1000 instrument (RainDance Technologies) to generate individual PCR droplets. Droplets for each sample were automatically dispensed as an emulsion into separate PCR tubes and transferred to a standard thermal cycler for PCR amplification. Samples were cycled in an Applied Biosystems GeneAmp 9700 thermocycler as follows: initial denaturation at 94°C for 2 min, 55 cycles at 94°C for 15 s, 54°C for 15 s, 68°C for 30 s, final extension at 68°C for 10 min, and a 4°C hold.
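The reaction setup above leaves the water volume to be worked out from the DNA stock concentration. The following sketch shows that arithmetic, with all volumes in microlitres and taken from the reagent list in the text. The stock concentration used in the example (300 ng/µl) is a hypothetical value chosen for illustration, as are the function and variable names.

```python
from decimal import Decimal

# Reagent volumes in microlitres, as listed in the text.
REAGENTS_UL = {
    "High-Fidelity Buffer": Decimal("4.7"),
    "MgSO4": Decimal("1.26"),
    "10 mM dNTP": Decimal("1.6"),
    "4 M betaine": Decimal("3.6"),
    "RDT Droplet Stabilizer": Decimal("3.6"),
    "DMSO": Decimal("1.8"),
    "Platinum High-Fidelity Taq": Decimal("0.7"),
}
FINAL_VOLUME_UL = Decimal("25")
DNA_INPUT_NG = Decimal("1500")  # 1.5 micrograms of sheared DNA


def dna_volume_ul(stock_ng_per_ul: Decimal) -> Decimal:
    """Volume of DNA stock needed to deliver the 1.5 microgram input."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return DNA_INPUT_NG / stock_ng_per_ul


def water_volume_ul(stock_ng_per_ul: Decimal) -> Decimal:
    """Nuclease-free water needed to reach the final reaction volume."""
    remaining = (FINAL_VOLUME_UL - sum(REAGENTS_UL.values())
                 - dna_volume_ul(stock_ng_per_ul))
    if remaining < 0:
        raise ValueError("DNA stock too dilute to fit in the reaction volume")
    return remaining


# Hypothetical stock concentration of 300 ng/ul, for illustration only.
stock = Decimal("300")
print(sum(REAGENTS_UL.values()))  # total volume of the fixed reagents
print(dna_volume_ul(stock))       # DNA volume at the assumed concentration
print(water_volume_ul(stock))     # water to top up to the final volume
```

`Decimal` is used instead of floats so that the pipetting volumes add up exactly. The check on `remaining` flags DNA stocks too dilute to deliver the input within the space left after the fixed reagents.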
After PCR amplification, an equal volume of RDT 1000 Droplet Destabilizer (RainDance Technologies) was added to each emulsion PCR tube, vortexed for 15 s, and centrifuged at 13,000 g for 5 min. The oil from below the aqueous phase was carefully removed from the sample. The remaining sample was then purified using a MinElute column (Qiagen) following the manufacturer's recommended protocol.

NGS Sequencing by SOLiD3
The ends of the amplicons were blunt end-repaired by adding the following reagents to the purified DNA (diluted to 68 µl): 10 µl 10× Blunting Buffer (Epicentre), 10 µl 2.5 mM dNTP Mix (Invitrogen), 10 µl 10 mM ATP, 2 µl End-it enzyme mix (Epicentre), and sterile water to a total reaction volume of 100 µl. The reaction was incubated at room temperature for 30 min, and the DNA was immediately purified using Ampure XP beads (Agencourt). The amplicons were then concatenated using the NEB Quick Ligation kit according to the manufacturer's protocol. DNA was purified using Ampure XP beads and eluted in 105 µl of low TE. An Agilent 7500 Bioanalyzer chip was run to confirm the concatenation of PCR products. The sample was then fragmented and processed as described in the manufacturer's standard SOLiD3 workflow (Applied Biosystems).

Validation of Variants and Mutations by Sanger Sequencing
Exon-specific primers were designed to amplify individual exons along with 20-50 bp of flanking intronic regions on either side for LAMA2, COL6A1, COL6A2, COL6A3, FKTN, POMGNT1, POMT1, POMT2, FKRP, LARGE, ITGA7, and SEPN1. Samples were prepared by fluorescence sequencing on the ABI 3730XL DNA analyzer with BigDye Terminator chemistry and the BigDye XTerminator purification kit (Applied Biosystems). Individual sequences were aligned against reference sequences (downloaded from NCBI) using Mutation Surveyor v3.30 software (SoftGenetics) and analyzed for variations or mutations.