Pulse-Field capillary electrophoresis of repeat-primed PCR amplicons for analysis of large repeats in Spinocerebellar Ataxia Type 10

Large expansions of microsatellite DNA cause several neurological diseases. In spinocerebellar ataxia type 10 (SCA10), repeat interruptions change the disease phenotype; an (ATTCC)n or an (ATCCT)n/(ATCCC)n interruption within the (ATTCT)n repeat is associated with the robust phenotype of ataxia and epilepsy, while mostly pure (ATTCT)n may have reduced penetrance. Large repeat expansions of SCA10, and many other microsatellite expansions, can exceed 10,000 base pairs (bp) in size. Conventional next generation sequencing (NGS) technologies are ineffective in determining the internal sequence content or size of these expanded repeats. Using repeat-primed PCR (RP-PCR) in conjunction with a high-sensitivity pulsed-field capillary electrophoresis fragment analyzer (FEMTO-Pulse, Agilent, Santa Clara, CA) (RP-FEMTO hereafter), we successfully determined the sequence content of large expansion repeats in genomic DNA of SCA10 patients and in transformed yeast artificial chromosomes containing SCA10 repeats. RP-FEMTO is a simple and economical methodology that could complement emerging NGS technologies for very long sequence reads, such as Single Molecule, Real-Time (SMRT) and nanopore sequencing.

Analysis of the sequence structure of long repeat tracts larger than 2000 bp in SCA10 has been very limited because conventional sequencing procedures cannot read through the repeat. Some success has been seen using single molecule real time (SMRT) sequencing to determine the sequence of SCA10 patient DNA [2], but that technique is limited by expense and by its intrinsic single-molecule methodology, which in the case of SCA10 restricts the scope of information to a limited number of expanded repeat DNA molecules, since the normal allele is preferentially sequenced. Nanopore sensor technology also allows single-molecule long sequencing reads (10^4 to 10^6 bases) with a minimal sample requirement. However, its effectiveness in reliably sequencing long repeats that possess complex structures has yet to be determined. We demonstrate the use of repeat-primed PCR in conjunction with high-resolution pulsed-field capillary electrophoresis analysis using the FEMTO Pulse Automated Pulsed-Field CE Instrument (Agilent, Santa Clara, CA) to successfully determine the sequence content of large expansions of pentanucleotide repeats in SCA10 patients. In addition, this technique was valuable in determining the stable integration of large pentanucleotide expansions into a yeast artificial chromosome in Saccharomyces cerevisiae.

benefits between academic investigators and PacBio investigators. PacBio provided support in the form of salaries for authors [TC and YCT], but did not have any additional role in the study design, decision to publish, or preparation of the manuscript. However, they performed No-Amp SMRT sequencing of the DNA samples that we provided and performed computational analyses of the data to generate circular consensus sequences of the repeat. The specific roles of these authors are articulated in the 'author contributions' section.
The inclusion criteria were: (1) the genetic diagnosis of SCA10, (2) 18 years of age or older, AND (3) capable of providing informed consent. The exclusion criteria were: (1) having ataxic disorder(s) other than SCA10, OR (2) unwillingness to participate in this study. The diagnosis of SCA10 was established by the referring physician based on the clinical phenotype of ataxia and genetic testing. The mean and standard deviation of the age of study subjects were 54 ± 16 years (range 34-89), and those of SCA10 repeat expansion size were 1,440 ± 506 repeats (range 822-2360). The Houston Methodist IRB specifically approved this study (Pro00013782).

DNA isolation
Genomic DNA from SCA10 patient blood was isolated using the DNA Blood and Tissue Kit (Qiagen). Genomic DNA from yeast cultures maintaining a yeast artificial chromosome with SCA10 repeat DNA was isolated using the Yeast DNA Extraction Kit (Thermo Scientific).

SMRT sequencing
SMRT sequencing was performed as previously described [2]. Briefly, PCR amplicons or unmodified genomic DNA containing the SCA10 pentanucleotide repeat were restriction digested and ligated to the SMRTbell adaptors. The SMRTbell fragments of PCR amplicons underwent SMRT sequencing using the PacBio RSII or Sequel II sequencer. SMRTbell fragments from unmodified genomic DNA were cleaved with CRISPR-Cas9 using a gRNA for sequence upstream of the SCA10 repeat. A DNA hairpin adapter with polyA sequence was attached to the Cas9 cleavage site and pulled down using Magbeads carrying polyT DNA oligos [11]. The purified templates containing the SCA10 repeat sequence were subjected to SMRT sequencing.

FEMTO Pulse Analysis of Repeat-primed PCR (RP-PCR) Products (RP-FEMTO)
RP-PCR was performed as described previously [25,26]. RP-PCR is an amplification method that pairs a fluorescently labelled primer specific to the SCA10 locus upstream of the pentanucleotide repeat with repeat-specific primers that anneal at multiple sites within the repeat region. When capillary electrophoresis is performed on the RP-PCR products, peaks are detected at sizes corresponding to where the repeat primers bind, indicating the size and location of the specific repeated DNA.
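The ladder of products that RP-PCR generates can be illustrated with a small simulation. The sketch below is a toy model, not the analysis used in this study: the template, the 170 bp flank length, the function name, and the size formula (flank plus distance to the end of the primer site, ignoring the tail) are all illustrative assumptions. It also assumes that a repeat primer of eight units anneals only where eight consecutive copies of its motif occur, and for simplicity the motifs are written on the repeat strand.

```python
def predicted_product_sizes(tract, motif, flank_len, primer_units=8):
    """Sizes (bp) of toy RP-PCR products for every site where the primer fits.

    A site is any offset in `tract` where `primer_units` consecutive copies
    of `motif` occur. Size = flank + offset + primer length (tail ignored).
    """
    probe = motif * primer_units
    sizes = []
    for offset in range(len(tract) - len(probe) + 1):
        if tract[offset:offset + len(probe)] == probe:
            sizes.append(flank_len + offset + len(probe))
    return sizes


# Hypothetical expanded allele: (ATTCT)20-(ATTCC)10-(ATTCT)12
tract = "ATTCT" * 20 + "ATTCC" * 10 + "ATTCT" * 12
FLANK_LEN = 170  # assumed distance from the flanking primer to the repeat start

attct_sizes = predicted_product_sizes(tract, "ATTCT", FLANK_LEN)
attcc_sizes = predicted_product_sizes(tract, "ATTCC", FLANK_LEN)

print("ATTCT ladder:", attct_sizes)
print("ATTCC ladder:", attcc_sizes)
```

In this toy allele the ATTCT ladder shows a gap at the sizes where the ATTCC ladder appears, which is the kind of pattern used to place an interruption within the repeat.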
Briefly, 100 ng of genomic DNA from blood from SCA10 patients, or 25 ng of yeast genomic DNA containing a pentanucleotide repeat expansion on a yeast artificial chromosome (YAC), was added to 1X Amplitaq Gold™ 360 Master Mix (Thermo Fisher) with final concentrations of 0.6 nM FAM-tagged forward flanking primer, 0.6 nM Tail primer, and 0.06 nM of the respective internal repeat primer in a 30 μl total volume to determine repeat motifs (see Table 1 and Fig 1 for primer and assay details). Internal repeat primers for each known interrupting repeat within the expanded repeats, i.e., (GGAAT)8, (AGGAT)8 and (GGGAT)8, were used to detect (ATTCC)n, (ATCCT)n and (ATCCC)n interruptions. The PCR was run in a Veriti ABI PCR Thermal Cycler under the following conditions: initial denaturation at 93°C for 3 minutes; 17 cycles of 93°C for 15 seconds, 61°C for 30 seconds and 64°C for 5 minutes; 18 cycles of 93°C for 15 seconds, 61°C for 30 seconds and 64°C for 5 minutes increasing by 15 seconds per cycle; and a final extension at 72°C for 10 minutes.
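The relationship between each internal repeat primer and the interruption it detects can be checked directly: each primer unit is the reverse complement of the corresponding repeat motif. The following is a minimal sketch of that check; the helper name and the dictionary are illustrative and not part of the published protocol.

```python
# Complement table for unambiguous DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Interruption motif -> repeat unit of the internal primer named in the text
INTERNAL_PRIMER_UNIT = {
    "ATTCC": "GGAAT",
    "ATCCT": "AGGAT",
    "ATCCC": "GGGAT",
}

for motif, primer_unit in INTERNAL_PRIMER_UNIT.items():
    # An 8-unit primer is the reverse complement of 8 units of the motif
    matches = reverse_complement(primer_unit * 8) == motif * 8
    print(f"({primer_unit})8 targets ({motif})n: {matches}")
```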

Analysis
Raw data obtained from running the samples in the FEMTO Pulse System was analyzed using the PROSize Data Analysis Software from Agilent.

Comparison of RP-FEMTO data with SMRT sequencing from genomic DNA of SCA10 patients
SCA10 pentanucleotide repeat expansions with (ATTCC)n, (ATCCT)n, and (ATCCC)n interruptions were observed in SCA10 patients' DNA [2,27]. Sequencing of the large repeats by conventional methodology, including NGS, did not work because of the large repeat sequence structure. The repeat units within internal sequences of expanded SCA10 repeats were initially identified by long PCR amplification of relatively small SCA10 repeat expansion alleles, shearing of the amplicon, shotgun cloning of sheared fragments, and Sanger sequencing of cloned sequences, which produced histograms of repeat units within the SCA10 expansion [2]. SMRT sequencing of the same long amplicons provided long single molecule reads of these expanded allele sequences. Comparison of these two sets of data showed general agreement in repeat unit composition [2].

Table 1 (primer and assay details).
[...] Detects length of ATTCT repeat
RP-PCR of (ATTCT)n with (ATTCC)n interruptions
  Forward flanking primer, Tail primer, ATTCT primer: detects length of ATTCT repeat
  Forward flanking primer, Tail primer, ATTCC primer: detects length of ATTCC repeat and position within the ATTCT repeat
RP-PCR of (ATTCT)n with (ATCCT)n and (ATCCC)n interruptions
  Forward flanking primer, Tail primer, ATTCT primer: detects length of ATTCT repeat
  Forward flanking primer, Tail primer, ATCCT primer: detects approximate length of ATCCT interruption and position within the ATTCT repeat
  Forward flanking primer, Tail primer, ATCCC primer: detects approximate length of ATCCC interruption and position within the ATTCT repeat
RP-PCR product was diluted 1:400 in Tris-EDTA and 2 μl was run on the FEMTO Pulse analyzer (Agilent, Santa Clara, CA) using either the Large DNA Separation FP-5001 Gel or the Small DNA Separation FP-5201 Gel (Agilent), following the manufacturer's instructions.
https://doi.org/10.1371/journal.pone.0228789.t001

Furthermore, in long sequence reads of SMRT sequencing, the basic composition and arrangement of heterogeneous repeat units were maintained between the circular consensus sequences (CCSs) obtained from the genomic DNA of the same individuals (unpublished data). Thus, we considered the results of SMRT sequencing of the expanded SCA10 repeat to be reliable reference sequences. We utilized the RP-FEMTO to identify the various interruptions and determine the approximate repeat length of the SCA10 pentanucleotide repeat expansion. RP-PCR was used to amplify pentanucleotide expansions of SCA10 patients' DNAs (see Table 1, Fig 1 and Methods). RP-PCR amplicons underwent fragment analysis using the FEMTO Pulse system (Agilent, Santa Clara, CA), which can generate data for fragments up to 165 kb. We used the Large DNA Separation FP-5001 Gel to detect large expansions over 6000 bp and the Small DNA Separation FP-5201 Gel to detect smaller expansions under 6000 bp (Agilent, Santa Clara, CA). Fig 2 and Fig 3 compare the results obtained with SMRT sequencing and those from the RP-FEMTO analysis.
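The gel choice described above amounts to a simple size threshold. A minimal sketch, assuming a hypothetical helper name and an arbitrary tie-break at exactly 6000 bp (the text does not specify which gel applies at the boundary):

```python
GEL_CUTOFF_BP = 6000  # size threshold stated in the text


def choose_gel(expected_expansion_bp):
    """Pick a separation gel label for an expected expansion size in bp.

    Sizes of exactly 6000 bp are assigned to the small gel here; that
    tie-break is an assumption, not something stated in the protocol.
    """
    if expected_expansion_bp > GEL_CUTOFF_BP:
        return "Large DNA Separation Gel (FP-5001)"
    return "Small DNA Separation Gel (FP-5201)"


for size_bp in (940, 4000, 7200):
    print(size_bp, "bp ->", choose_gel(size_bp))
```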

Fig 2. Comparison of SMRT sequencing data with RP-FEMTO results of an (ATTCT)n pentanucleotide repeat expansion from an SCA10 patient. (a) SMRT sequencing results for a pure (ATTCT)n repeat from SCA10 patient blood DNA. Single cell sequences, represented by the blue bars, resulted in eight expanded alleles ranging between 6200-6450 bp, and 154 normal length alleles ranging between 60-70 bp (for graph clarity, only 15 normal sequences are shown). (b) RP-FEMTO results of the same DNA using the ATTCT internal primer and running on the large separation gel, capable of detecting fragments up to 165 kb. The repeat starts at 170 bp. The main capillary peak is between 6000-8000 bp but can continue to above 15,000 bp, indicating PCR amplification of larger repeats. Repeat length mosaicism was observed with SMRT sequencing data in blood genomic DNA from the SCA10 patient. (c) RP-FEMTO results of the same DNA using the ATTCT internal primer and running on the ultrasensitive NGS small separation gel, which provides improved sensitivity, showing peaks of individual repeat units, but is limited to smaller fragments of 100-6000 bp. Considering that the repeat length of this genomic sample was above 7000 bp, this may not provide accurate sizing.
https://doi.org/10.1371/journal.pone.0228789.g002

DNA samples from 12 patients with SCA10 were analyzed by SMRT sequencing. From each patient's blood genomic DNA, 79 to 269 SMRT CCSs were obtained. We expected SMRT sequence reads of normal and expanded SCA10 repeats to show a 1:1 ratio. However, expanded alleles were grossly underrepresented: only 2 to 33 of these sequence reads consisted of repeat expansions, while the rest were normal-length alleles. These expanded SCA10 repeats consisted of either (ATTCT)n or (ATTCT)n-(ATTCC)n-(ATTCT)1-10 configurations. With the RP-FEMTO, the size of expanded alleles was larger than that detected by SMRT sequencing. Although RP-PCR may also have an amplification bias toward shorter alleles, the high sensitivity of the FEMTO-Pulse system allowed detection of a small number of larger expanded repeats.
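To give a sense of how far the observed read counts depart from the expected 1:1 ratio, the sketch below computes an exact binomial lower tail for the counts reported in the Fig 2 caption (8 expanded and 154 normal-length reads). This is an illustrative calculation, not an analysis from the study: it assumes each read is an independent draw with equal probability of coming from either allele, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_lower_tail(k, n):
    """Exact P(X <= k) for X ~ Binomial(n, 1/2)."""
    return Fraction(sum(comb(n, i) for i in range(k + 1)), 2 ** n)


expanded_reads = 8    # expanded alleles in the Fig 2 caption
normal_reads = 154    # normal-length alleles in the Fig 2 caption
total_reads = expanded_reads + normal_reads

observed_fraction = expanded_reads / total_reads
tail = binomial_lower_tail(expanded_reads, total_reads)

print(f"expanded fraction: {observed_fraction:.3f}")
print(f"P(X <= {expanded_reads}) under 1:1 sampling: {float(tail):.2e}")
```

Under these assumptions the tail probability is vanishingly small, consistent with a systematic bias against expanded alleles rather than sampling noise.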
For example, SMRT sequencing showed variable lengths of eight molecules consisting of a pure expanded (ATTCT)n repeat ranging from 5400 bp to 6500 bp (Fig 2A). Whereas SMRT sequencing gives results for a limited number of individual repeat sequences, the RP-FEMTO represents the aggregate of a greater number of molecules: the main capillary peak is around 6000 to 8000 bp, but there were also some PCR products more than 10,000 bp long, suggesting that much larger expansions exist in the patient's DNA but cannot be detected by SMRT sequencing. RP-FEMTO of blood DNA of normal individuals shows a single prominent peak corresponding to 60-70 bp (12-14 pentanucleotide repeats) and no extension of the peak, indicating that the RP-PCR peak extension observed in the capillary traces of the expanded repeats is specific to the expansion (data not shown). Southern blot data for this sample show a band of 7000 bp. In a heterogeneous population, Southern data would also be biased towards the repeat lengths that are more prevalent in the genomic population. The RP-FEMTO data and the SMRT sequencing data for SCA10 samples containing (ATTCC)n interruptions show similar results (Fig 3A).
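Sizes in this section are quoted both in base pairs and in repeat units; for a pentanucleotide repeat the two are related by a factor of five. A minimal sketch of the conversion, assuming the size refers to the repeat tract alone (flanking sequence and primer tails excluded) and using hypothetical helper names:

```python
UNIT_BP = 5  # length of one pentanucleotide repeat unit


def repeats_to_bp(n_repeats):
    """Length in bp of a tract of n pentanucleotide repeats."""
    return n_repeats * UNIT_BP


def bp_to_repeats(n_bp):
    """Number of pentanucleotide repeats in a tract of n_bp base pairs."""
    return n_bp / UNIT_BP


# Normal-allele peak quoted in the text: 60-70 bp
print(bp_to_repeats(60), bp_to_repeats(70))

# Mean and largest expansion sizes quoted for the study subjects
print(repeats_to_bp(1440), repeats_to_bp(2360))
```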

The use of RP-FEMTO in the determination of stability of repeat length and composition after transformation of large pentanucleotide repeats into a yeast artificial chromosome in Saccharomyces cerevisiae
To establish a yeast model of SCA10, large pentanucleotide repeat sequences derived from blood DNA from patients with SCA10 maintaining either a pure expanded (ATTCT)n repeat or an (ATTCT)n repeat with an (ATTCC)n interruption were transformed into a yeast artificial chromosome (YAC) in Saccharomyces cerevisiae. To determine whether the large repeat maintained its size and content in yeast, we used the RP-FEMTO for high throughput sequence evaluation of these YACs. Fig 5 shows results from an RP-FEMTO run of two individual yeast integrants. A pure ATTCT pentanucleotide maintains a length of more than 4000 bp in the

Discussion
We demonstrated that the RP-FEMTO provides a fast and economical analysis of interrupting repeat sequence structures in large expanded repeats of SCA10. This technology may be applicable to other repeat expansions that are known to contain heterogeneous repeats at disease loci, including ATXN8OS in SCA8 [3], BEAN in SCA31 [4], NOP56 in SCA36 [28], DAB1 in SCA37 [6], FXN in Friedreich ataxia [29], RFC1 in AR-CANVAS [17], SAMD12, TNRC6A and RAPGEF2 in BAFME [13], STARD7 in BAFME2 [14], MARCHF6 in BAFME3 [15], YEATS2 in BAFME4 [16], DMPK in DM1 [30], and TCF4 in Fuchs endothelial corneal dystrophy (FECD) [31]. These internal heterogeneous repeats within the expanded allele may contribute to reduced penetrance and variable disease phenotypes. In large repeat expansions of fragile X syndrome [32] and C9ORF72-ALS/FTD [33], SMRT sequencing has shown no interrupting sequences, although the number of expanded alleles studied was limited.

[...] a pure ATTCT SCA10 pentanucleotide repeat transformed into a YAC and run on the RP-FEMTO using the small separation gel. As expected, no peak is present using the ATTCC primer since no ATTCC sequence should be present in this sample. (b) RP-FEMTO results of yeast genomic DNA using ATTCT (black line) or ATTCC (blue line) internal primers of an SCA10 pentanucleotide repeat containing an (ATTCC)n interruption transformed into a YAC and run on the FEMTO Pulse system using the small separation gel. Using the RP-FEMTO methodology, we were able to determine an (ATTCT)188 and (ATTCC)650 configuration. There may exist some heterogeneity in the yeast population since there appears to be some overlap between the ATTCT repeat and ATTCC insertion.

In large repeat expansion diseases in which other repeat insertions may lie several kilobases away from the flanking PCR primer, RP-FEMTO would be a good solution for detecting repeat insertions of known sequence, since only a very small amount of RP-PCR product is necessary for detection, as observed with the SCA10 patient samples. In these disorders, determining the sequence composition of the expanded repeat is important for diagnosing the disease, understanding the pathogenic mechanism, and developing therapeutic strategies. Large repeat expansions are currently difficult to sequence using typical sequencing methods: both the size of the repeat and the secondary structures that the repeats can form hinder these methods. SMRT sequencing is one of the newer technologies able to sequence such large repeat expansions. For SCA10, up to 7500 bp of pentanucleotide repeats were sequenced using SMRT sequencing. However, SMRT sequencing is limited by preferential sequencing of small repeat-size alleles, including normal alleles and small mutant alleles, and by difficulties in generating circular consensus sequences of expanded SCA10 repeats containing large (ATCCC) repeat interruptions. We believe that this is due to the high GC content, although SMRT sequencing has been successfully applied to expanded GGGGCC repeats of C9ORF72-ALS/FTD [33] and CGG expansions in fragile X syndrome [32]. Secondary structures unique to ATCCC repeats may play a role in this hindrance to SMRT sequencing.
To bypass these obstacles, we used RP-FEMTO to determine pentanucleotide repeat sequence content and repeat stability. Because repeat DNA is intrinsically unstable in both patients and laboratory systems, variable repeat lengths may be present within the same system or patient. SCA10 patients' blood DNA showed great repeat length heterogeneity with single molecule (SMRT) sequencing. Examination of the same SCA10 patient samples with the RP-FEMTO showed that some cells may contain larger repeats than evident with SMRT sequencing and that there may be even more heterogeneity within the patient DNA sample. Due to the sensitivity of the FEMTO-Pulse system, even a small amount of PCR product will be detected, enabling detection of amplified very large repeats that may be scarce in the genomic DNA sample and may elude other methods, such as Southern hybridization because of its sensitivity or SMRT sequencing because of the low ratio of the rare repeat to the entire genomic DNA content. Southern hybridization has the same drawback as SMRT sequencing in that it is more likely to reveal the majority repeat length (the repeat length that is highest in concentration) in a heterogeneous DNA repeat population. In addition, 5-10 μg of human genomic DNA is generally required for Southern hybridization, as opposed to the 50 pg requirement for RP-PCR.
The RP-FEMTO was essential for quickly assessing the length and repeat motifs of pentanucleotide repeats after integration into a YAC in the yeast Saccharomyces cerevisiae. The large size and sequence content of the repeat made it difficult to amplify and sequence using canonical PCR and sequencing methods. The RP-FEMTO overcomes this problem with only 25 ng of RP PCR product necessary for the analysis. The RP-FEMTO is an easy and quick methodology that allows many integrants to be assessed simultaneously, albeit not individually at the single molecule level, for repeat length and repeat motifs in the yeast genome.
The RP-FEMTO can determine the configuration of known interrupting runs of heterogeneous repeat units in large repeat expansions but does not sequence expanded repeats. To use the RP-FEMTO effectively, users need to understand its limitations (Table 2). The RP-FEMTO cannot replace SMRT or Oxford Nanopore sequencing technologies but should be used to complement and enhance the utility of these sequencing technologies. We have not tested the Oxford Nanopore systems, which can also generate long reads. As the accuracy and efficiency of new sequencing technologies that can generate long reads of expanded repeats improve, the RP-FEMTO may become unnecessary in the analysis of very large repeat expansions. However, until then, and even after then, this method should be useful for analysis of large repeat expansions in some disorders. We conclude that the RP-FEMTO is useful for quick and cost-effective analysis of very large microsatellite repeat expansions, especially when the expanded repeat contains known runs of heterogeneous repeats.