The Unstable CCTG Repeat Responsible for Myotonic Dystrophy Type 2 Originates from an AluSx Element Insertion into an Early Primate Genome

Myotonic dystrophy type 2 (DM2) is a subtype of the myotonic dystrophies, caused by expansion of a tetranucleotide CCTG repeat in intron 1 of the zinc finger protein 9 (ZNF9) gene. The expansions are extremely unstable and variable, ranging from 75 to 11,000 CCTG repeats. This unprecedented repeat size and somatic heterogeneity make molecular diagnosis of DM2 difficult, and yield variable clinical phenotypes. To better understand the mutational origin and instability of the ZNF9 CCTG repeat, we analyzed the repeat configuration and flanking regions in 26 primate species. The 3′-end of an AluSx element, flanked by target site duplications (5′-ACTRCCAR-3′ or 5′-ACTRCCARTTA-3′), followed the CCTG repeat, suggesting that the repeat was originally derived from the Alu element insertion. In addition, our results revealed lineage-specific repetitive motifs: pyrimidine (CT)-rich repeat motifs in New World monkeys, dinucleotide (TG) repeat motifs in Old World monkeys and gibbons, and dinucleotide (TG) and tetranucleotide (TCTG and/or CCTG) repeat motifs in great apes and humans. Moreover, these di- and tetra-nucleotide repeat motifs arose from the poly (A) tail of the AluSx element, and evolved into unstable CCTG repeats during primate evolution. Alu elements are known to be the source of microsatellite repeats responsible for two other repeat expansion disorders: Friedreich ataxia and spinocerebellar ataxia type 10. Taken together, these findings raise questions as to the mechanism(s) by which Alu-mediated repeats developed into the large, extremely unstable expansions common to these three disorders.


Introduction
Myotonic dystrophy type 2 (DM2) is an autosomal dominant multi-system disorder. It is caused by expansion of a tetranucleotide CCTG repeat in intron 1 of the zinc finger protein 9 (ZNF9) gene on chromosome 3q21 [1]. Patients with DM2 exhibit a wide range of phenotypes that include myotonia, muscle weakness, cardiac anomalies, cataracts, diabetes mellitus, and testicular failure [2][3][4][5]. In a normal allele, the repeat shows a complex motif with an overall configuration of (TG)n(TCTG)n(CCTG)n. The CCTG tract contains fewer than 30 repeats, with interruptions by GCTG and/or TCTG motifs [6], and is stably transmitted from one generation to the next [1]. However, in the expanded allele, only the CCTG tract elongates, and the GCTG and TCTG interruptions disappear from the repeat tract. The sizes of expanded alleles are extremely variable, ranging from 75 to 11,000 repeats, with a mean of 5,000 repeats. The expanded DM2 alleles show marked somatic instability, with significant increases in length over time [1,5]. Although the mechanism(s) underlying this unprecedented instability remain largely unknown, the uninterrupted CCTG repeat is prone to form a stable hairpin/dumbbell DNA structure and to expand due to an error in the recombination-repair mechanism [7][8][9]. To date, DM2 mutations have been identified predominantly in European Caucasians [6][10][11][12]. Haplotype analysis indicates that the European DM2 mutations originate from a single founder approximately 4,000 to 11,000 years ago [10]. Liquori et al. (2003) reported that humans, chimpanzees, gorillas, mice, and rats share a conserved DM2 repeat motif and flanking sequences, suggesting a conserved biological function [6]. However, the origin and evolutionary history of the DM2 repeat remain unclear.
A group of microsatellite repeat expansion disorders has been identified in the last two decades [13]. Most of these mutations involve unstable triplet repeats located in different regions of the respective genes. The roles of repeat expansion mutations in the pathogenic mechanisms of these diseases are diverse and complex. Similar to DM2 [1], Friedreich ataxia (FRDA) and spinocerebellar ataxia type 10 (SCA10) are caused by large intronic expansions and show marked somatic and germ line instability [14][15][16][17][18]. Interestingly, in both FRDA and SCA10, Alu elements are proposed to be a source of the microsatellite repeats implicated in disease [19][20][21][22][23]. Alu elements are abundant in the human genome, with >1.1 million copies, and preferentially accumulate in gene-rich regions [24]. Due to this abundance, insertional mutation or unequal homologous recombination of Alu elements causes various inherited diseases [25].
To gain insight into the unstable DM2 repeat expansion mutation, we addressed the evolutionary history of the complex di- and tetra-nucleotide repeat configuration of (TG)n(TCTG)n(CCTG)n and the flanking Alu element.

Results
To define the mammalian origin of the DM2 (TG)n(TCTG)n(CCTG)n repeat (hereafter referred to as the "DM2 repeat"), we compared human, chimpanzee, orangutan, rhesus macaque, marmoset, galago, tree shrew, mouse, rat, kangaroo rat, guinea pig, squirrel, rabbit, pika, alpaca, dolphin, cow, horse, cat, dog, microbat, megabat, hedgehog, shrew, elephant, tenrec, armadillo, sloth, and opossum genomes. We found repetitive elements of a DNA transposon and short interspersed repetitive elements (SINEs), namely MER2, AluSx and AluY, located adjacent to the DM2 repeat, in the inverse orientation relative to the ZNF9 reading frame (Figure 1A). The genomic region corresponding to the human TCTG and CCTG tetranucleotide repeats was entirely absent from other mammalian species except chimpanzees (yellow shaded box in Figure 1A). Interestingly, the DM2 repeat was immediately adjacent to an AluSx element (Figure 1B), and target site duplications (TSDs) were observed at both ends of AluSx (5′-ACTRCCAR-3′; black-shaded box in Figure 1B) and AluY (5′-ATTTTTTT-3′; light gray-shaded box in Figure 1B). Because the DM2 repeat and AluSx are situated between the TSDs, the repeat itself is likely to have evolved from the AluSx and its poly (A) tail. It was reported that the DM2 repeat and the ~200-bp 3′-flanking sequence are conserved among human, mouse, and rat [6]. While the 200-bp region is conserved between mouse and rat, we could not find the corresponding region in any other mammalian species (Figure S1A). Notably, the rodent dinucleotide (TG)n repeat was followed by an identifier (ID) element (Figure S1B), which is a rodent-specific SINE [26,27].
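The TSD consensus quoted above uses IUPAC ambiguity codes (R stands for A or G). As an illustrative sketch only, and not the procedure used in this study, a degenerate motif of this kind can be located on both sides of a candidate insertion by translating it into a regular expression. The function names and the toy sequence are hypothetical and were chosen for the example; only the motif 5′-ACTRCCAR-3′ comes from the text.

```python
import re
from typing import List

# IUPAC nucleotide ambiguity codes (subset sufficient for this example).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

def iupac_to_regex(motif: str) -> str:
    """Translate a degenerate motif such as ACTRCCAR into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Toy sequence (hypothetical, NOT real ZNF9 sequence): two copies of the
# TSD consensus bracketing a placeholder insert.
toy = "GG" + "ACTACCAG" + "TTTTGGCCGGGC" + "ACTGCCAA" + "CC"
positions = find_motif(toy, "ACTRCCAR")
print(positions)
```

Two hits separated by roughly the length of an Alu element plus its tail would be the pattern expected for a flanked insertion; a real analysis would also need to tolerate mismatches that accumulate in old TSDs.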
To elucidate the origin of the DM2 repeat sequences in primate evolution, we next analyzed the sequence and genomic structure of ZNF9 intron 1 in 26 primate species. PCR and sequence analysis revealed that the DM2 repeats and the surrounding regions varied considerably among the examined primate species, except for Old World monkeys (Table 1, Figure 2, and Figure S2). In prosimians, poly (T) tracts interrupted by AG and AA (5′-(T)15AG(T)10AA-3′), which seem to be the poly (A) tail of an Alu inserted in the opposite orientation to the ZNF9 gene, were followed by the Alu element. The poly (T) tracts were conserved in both the small-eared galago and the greater galago (Figure S3). RepeatMasker classified these Alu elements as AluJo, which is one of the oldest Alu elements [24]. The 165-bp region following the AluJo repeat in the small-eared galago is similar (56% identity) to the 131-bp region following human AluY (light blue dot plot in Figure 3A). Although the 5′ portion of prosimian AluJo was truncated (dotted line in Figure 3B), the 3′ portion of prosimian AluJo and the TSD sequence were more similar to those of human AluY (56% identity) than to those of AluSx (red dot plot in Figure 3A and Figure 3B), suggesting that prosimian AluJo (an older Alu element) was inserted around the region where human AluY (a younger Alu element) was inserted. On the other hand, there was no Alu element located in the region corresponding to human AluY in New World monkeys (Figure 2 and dotted line in light gray-shaded boxes in Figure S4). Taken together, these observations suggest that the prosimian AluJo has a different origin from AluSx and AluY and was retrotransposed independently into the same site at which AluY was inserted later in primate evolution.
Contrary to prosimians, simians shared a common AluSx element (Figure 2) and its flanking TSDs consisting of 5′-ACTRCCAR-3′ or 5′-ACTRCCARTTA-3′ (black shaded box in Figure S4), although the 3′ TSD was absent in Old World monkeys (dotted line in black shaded box in Figure S4). Although both AluSx and AluY were observed in Old World monkeys, apes, and humans, AluY was completely absent in New World monkeys (Figure 2 and Figure S4), suggesting that AluY was inserted into the genome after the divergence of New World monkeys. Instead of the AluY insertion, additional AluS insertions (AluSc insertions in white-throated capuchin and squirrel monkey, and AluSp insertions in black-handed and long-haired spider monkeys) were observed (Figure 2 and Figure S5). Since these additional AluS insertions occurred at different sites and carried their own TSDs (blue, light green, and pink boxes in Figure S5), they are likely to have occurred independently in each species of New World monkeys.
As with the DM2 repeat, a pyrimidine CT-rich sequence followed the 3′-end of AluSx in New World monkeys (Figure 2 and Table 1). In Old World monkeys and gibbons, the repeat motif consisted mainly of TG dinucleotides, while a single TCTG and/or CCTG sequence motif is present at the 3′-end of the repeat in gibbon sequences (Table 1). Orangutan, gorilla, bonobo and chimpanzee sequences also contain di- and tetra-nucleotide repeat motifs, similar to the human sequence. Of note, there was no CCTG motif in the orangutan sequence, and the TCTG motif did not constitute repetitive forms in the chimpanzee (Table 1). Interestingly, in the gorilla, 38 bp of the 3′-end of AluSx overlapped with the middle of the DM2 repeat (sequence underlined in Table 1), indicating that the duplication event occurred independently in the gorilla lineage.
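Identity values such as those quoted above depend on how the two regions are aligned and on which columns are counted. The following is a minimal sketch under one common convention (matches divided by all alignment columns, gaps included); it is an assumption for illustration, since the text does not state which convention or tool produced the 56% figures. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Both strings must have the same length and use '-' for gaps.
    Columns where both rows are gaps are ignored; a gap opposite a
    base counts as a mismatch (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

# Toy alignment (hypothetical): 8 matching columns out of 10.
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(f"{identity:.1f}% identity")
```

Because regions of different lengths (for example 165 bp versus 131 bp) necessarily contain gaps when aligned, the choice of denominator can shift the reported value noticeably.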
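Repeat configurations such as (TG)n(TCTG)n(CCTG)n can be written out mechanically from a raw tract. The sketch below is a simple greedy decomposition offered for illustration; it is not the annotation method used for Table 1, and the motif list, function name, and toy tracts are assumptions made for the example.

```python
from itertools import groupby
from typing import Sequence

def repeat_configuration(
    tract: str,
    motifs: Sequence[str] = ("CCTG", "TCTG", "GCTG", "TG"),
) -> str:
    """Decompose a repeat tract into run-length notation, e.g. (TG)2(CCTG)3.

    Motifs are tried in the given order at each position, so the
    tetranucleotides are tested before the TG dinucleotide. A base that
    starts no listed motif is emitted as a single-base unit.
    """
    tract = tract.upper()
    units = []
    i = 0
    while i < len(tract):
        for motif in motifs:
            if tract.startswith(motif, i):
                units.append(motif)
                i += len(motif)
                break
        else:
            units.append(tract[i])  # base not covered by any motif
            i += 1
    # Collapse consecutive identical units into (motif)count.
    return "".join(f"({m}){sum(1 for _ in g)}" for m, g in groupby(units))

# Toy tracts (hypothetical, not taken from Table 1).
print(repeat_configuration("TGTGTCTGTCTGCCTGCCTG"))
print(repeat_configuration("CCTGCCTGGCTGCCTG"))
```

The second toy tract shows how a single GCTG interruption splits an otherwise continuous CCTG run, which is the kind of feature that distinguishes normal from expanded alleles.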

Discussion
Alu elements are primate-specific SINEs accounting for more than 10% of the human genome [28]. While most Alu elements lost their transpositional ability long ago, some active Alu elements can still increase their copy number. New insertions arise at a rate of approximately one in 20 births [29,30]. Because there is no known mechanism specifically for Alu element excision, most remain in the genome as a record of ancient retrotransposition. Human Alu elements are classified into subfamilies according to insertion time, from the oldest (AluJ) to intermediate (AluS) and young (AluY) [24,31]. A number of Alu elements are associated with microsatellite repeats; in fact, 5.7% of Alu poly (A) tails contain a patterned A-rich sequence such as (TA3)n, (CA4)n, (GA3)n, or (TA2)n [32]. Alu elements are therefore suggested to be a source of microsatellite repeats [33,34]. However, there are few examples in which an Alu-derived microsatellite repeat is responsible for human genetic disease.
In this study, we determined that AluSx and the associated complex DM2 repeat in the ZNF9 gene are unique to primates, and are completely absent in other mammals (Figure 1A). This argues against previous findings that the complex repeat and the 3′-flanking region are conserved among humans, mice, and rats [6]. The region corresponding to the rodent dinucleotide (TG)n repeat and the following 3′-flanking region is absent in other mammalian species (Figure S1A), and a rodent-specific ID element follows the dinucleotide repeat (Figure S1B). As a result, we conclude that the rodent dinucleotide repeat has a different origin from the primate DM2 repeat.
Among the primates, the AluSx element and the DM2 repeat are present in simians (humans, apes, Old World monkeys and New World monkeys) (Figure 2). In addition, the AluSx and the complex repeat in the human ZNF9 gene are flanked by TSDs (5′-ACTRCCAR-3′ or 5′-ACTRCCARTTA-3′; black-shaded boxes in Figure 1B and Figure S4). These findings indicate that the Alu element was retrotransposed into the genome very early in primate evolution, which coincides with the time that Alu elements explosively increased in number [24]. We also observed one of the oldest Alu elements, AluJo, in prosimian ZNF9 intron 1 (Figure 2 and Figure 3). The small-eared galago and the greater galago have a similar pattern of AluJo insertion (Figure S3). The AluJo element in prosimians and the AluSx element in simians appear to have different origins, because the position of the AluJo and its 3′-flanking TSD are inconsistent with those of the AluSx, and instead more closely resemble those of the AluY (Figure 3). The time discrepancy between AluJo and AluY [24] also suggests that the AluJo element may have been independently retrotransposed into the prosimian lineage before divergence of the small-eared and greater galago.
Focusing on the 3′-end of the simian AluSx element, we discovered pyrimidine (CT)-rich repetitive motifs in New World monkeys, (TG) dinucleotide repetitive motifs in Old World monkeys and gibbons, and (TG), (TCTG), and/or (CCTG) repetitive motifs in great apes and humans (Table 1). One of the most parsimonious scenarios of CCTG repeat evolution arises from these lineage-specific motifs (Figure 4). First, a poly (A) tail of AluSx was introduced into the genome in an inverse direction to the ZNF9 gene, generating a TTTT repeat motif. Second, T to G    mutant allele are thought to have acquired the DM2 instability and pathogenicity [1].
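The first step of this scenario, in which an Alu poly (A) tail inserted in the inverse orientation reads as a poly (T) tract on the ZNF9 strand, follows directly from strand complementarity. The short sketch below illustrates only that point; the helper name and the 12-base tail length are arbitrary choices for the example and are not values from this study.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# A poly (A) tail on the Alu strand (length chosen arbitrarily).
poly_a_tail = "A" * 12
on_gene_strand = reverse_complement(poly_a_tail)
print(on_gene_strand)

# The same relationship links the CCTG repeat to CAGG on the other strand.
print(reverse_complement("CAGG"))
```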
Alu dispersion throughout the genome provides opportunities for a higher level of unequal homologous recombination. Alu-mediated recombination is widely known as a source of local duplication and deletion [35], and is responsible for several human inherited disorders, including α-thalassaemia, Tay-Sachs disease and Duchenne muscular dystrophy [25]. In DM2, unequal crossing over has also been proposed as the mechanism that generates the long uninterrupted CCTG allele [7,8], which is the basis of unstable expansion [36]. The primate-specific burst of Alu retrotransposition would initiate the expansion of segmental duplication in gene-rich regions, a possibility consistent with an Alu-to-Alu mediated recombination event. In fact, significant enrichment of Alu repeats is observed near or within the boundaries of duplication sites in the human genome [37]. It is noteworthy that 38 bp of the 3′-end of the AluSx element showed duplication in the middle of the DM2 repeat in the gorilla sequence (Figure 2 and the underlined sequence in Table 1), implying that AluSx-mediated unequal crossing over occurred in the gorilla lineage.
Although Alu elements have been recognized as a source of various microsatellite repeats [32][33][34], there are, to date, only two known examples in which Alu elements underlie inherited microsatellite repeat expansion disorders: GAA triplet expansion in Friedreich ataxia (FRDA), derived from the middle A-rich site of Alu [19][20][21][22], and ATTCT pentanucleotide expansion in spinocerebellar ataxia type 10 (SCA10), derived from the poly (A) tail of Alu [17,23]. Our results reveal that the DM2 CCTG tetranucleotide repeat is also derived from the 3′-end of the Alu element, similar to the ATTCT repeat. It is interesting that the repeats responsible for these three disorders all originate from the AluSx element [19,20,23]. Moreover, the AluSx insertion events occurred at approximately the same time for the three disorders, before the divergence of New World monkeys [20,23]. That all three derive from AluSx may simply be a coincidence, because AluSx (one of the older Alu subfamilies [31]) is old enough to have allowed time for the evolutionary changes that create repeats susceptible to expansion. Taken together, our data strengthen the evidence that Alu elements may be responsible for a wide variety of other hereditary microsatellite repeat expansion disorders, especially large non-coding repeat expansions [38,39]. Because the characteristic common to DM2, FRDA and SCA10 is an extremely unstable and large repeat expansion (up to thousands of repeats), the detailed molecular mechanism responsible for the instability of these Alu-mediated repeats warrants further investigation.

Ethics Statement
This study was carried out in accordance with the guidelines for the use of non-human primate subjects of the Primate Research Institute, Kyoto University. Blood samples were not taken specifically for this study [23,40]. The protocol was approved by the Ethical Committee of Nagoya University (#511).

Genomic PCR and Sequencing of Primate ZNF9 Genes
Genomic PCR reactions were performed in a 50 µl volume consisting of 1× buffer for KOD-plus-DNA polymerase, 15 pmol of genomic PCR primers (Table S1), 1 mM MgSO4, 200 µM of dNTP mixture, 5% dimethyl sulfoxide, 0.5 unit of KOD-plus-DNA polymerase (Toyobo, Osaka, Japan), and 10-100 ng of template DNA. The PCR conditions included an initial denaturation at 94°C for 2 min, followed by 35 cycles at 94°C for 30 sec, 54°C for 30 sec, and 68°C for 1 min 30 sec, with an additional extension at 68°C for 3 min. The amplified PCR fragments were gel purified using Wizard SV gel and PCR clean-up systems (Promega) and directly sequenced using a CEQ 8000 DNA sequence system (Beckman Coulter). For samples showing ambiguous sequences and heterozygosity, the PCR products were gel purified and cloned into a pTA2 plasmid vector using the TArget Clone-plus cloning system (Toyobo) and sequenced using sequencing primers (Table S1 and Table S2). Nucleotide sequences were deposited in the DDBJ/EMBL/GenBank (accession numbers AB595981-AB596009).
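For planning purposes, the cycling program described above can be written down as data and its total programmed hold time summed. This is a bookkeeping sketch only: the class and variable names are hypothetical, and the total ignores ramp times between temperatures, which depend on the thermocycler.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # programmed hold time

# Values transcribed from the PCR conditions in the text.
INITIAL_DENATURATION = Step(94, 120)
CYCLE: List[Step] = [Step(94, 30), Step(54, 30), Step(68, 90)]
N_CYCLES = 35
FINAL_EXTENSION = Step(68, 180)

def total_hold_seconds(initial: Step, cycle: List[Step],
                       n_cycles: int, final: Step) -> int:
    """Sum programmed hold times; ramping between steps is not included."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds

total = total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.1f} min")  # prints "5550 s = 92.5 min"
```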

Figure 4. Evolutionary diagram of ZNF9 repetitive motifs in the primate lineage. Evolutionary divergence after the AluSx retrotransposition event is indicated by dark bars. Parentheses imply multiple units. The number at each node represents divergence time according to TimeTree [44]. Time scale is in millions of years. doi:10.1371/journal.pone.0038379.g004

Figure S1 Genomic structure of the mouse Cnbp (Znf9) gene surrounding the dinucleotide (TG)n repeat tract and the 200-bp 3′-flanking region in intron 1 [6], compared with other species. (A) Genomic alignment of the mouse Cnbp (Znf9) gene and the corresponding regions of other mammalian species. A yellow box highlights the location of the mouse dinucleotide (TG)n repeat and the 200-bp 3′-flanking region [6]. (B) Sequence alignment of the dinucleotide repeat and the 3′-flanking region in mouse and rat. A blue box and a gray thick arrow indicate the dinucleotide (TG)n repeat and the rodent-specific ID element, respectively. (PDF)
Figure S2 PCR analysis of intron 1 of the ZNF9 gene of primates, including Alu elements and the DM2 region. (A) Genomic structure spanning ZNF9 exons 1 and 2. Arrows indicate PCR primers. (B) 1% agarose gel electrophoresis of PCR-amplified genomic fragments from human, ape, Old World monkey, and New World monkey samples. "M" denotes 1 kb DNA ladder (Invitrogen). (PDF)
Figure S3 Sequence alignment of small-eared galago and greater galago. A gray thick arrow and a white box indicate the AluJo element and the poly (T) tract, respectively. (PDF)
Figure S4 Multiple sequence alignment around the DM2 repeat region of human, apes, Old World monkeys, and New World monkeys. A dark gray-shaded box, a light gray-shaded box, a purple box, black shaded boxes, and white boxes indicate AluSx, AluY, the ZNF9 exon 2, target site duplications of AluSx, and target site duplications of AluY, respectively. A yellow box highlights the position of the DM2 repeat sequences, abbreviated as "REPEAT". Dotted lines indicate sequence gaps. (PDF)
Figure S5 Multiple sequence alignment of seven species of New World monkeys showing species-specific AluS insertions. A yellow-shaded box, a purple-shaded box, a dark gray-shaded box, and light gray-shaded boxes indicate the (CT)-rich repeat, the ZNF9 exon 2, AluSx, and other AluS insertions (AluSc insertions in white-throated capuchin and squirrel monkey, and AluSp insertions in black-handed and long-haired spider monkey), respectively. Target site duplications are shown at both ends of the Alu elements: AluSx in black-shaded boxes, AluSc of white-throated capuchin in blue boxes, AluSc of squirrel monkey in light green boxes, and AluSp of spider monkeys in pink boxes. (PDF)

Table 1. Sequence configurations of the DM2 repeat in 24 primate species.