AmericaPlex26: A SNaPshot Multiplex System for Genotyping the Main Human Mitochondrial Founder Lineages of the Americas

Phylogeographic studies have described a reduced genetic diversity in Native American populations, indicative of one or more bottleneck events during the peopling and prehistory of the Americas. Classical sequencing approaches targeting the mitochondrial diversity have reported the presence of five major haplogroups, namely A, B, C, D and X, whereas the advent of complete mitochondrial genome sequencing has recently refined the number of founder lineages within the given diversity to 15 sub-haplogroups. We developed and optimized a SNaPshot assay to study the mitochondrial diversity in pre-Columbian Native American populations by simultaneous typing of 26 single nucleotide polymorphisms (SNPs) characterising Native American sub-haplogroups. Our assay proved to be highly sensitive with respect to starting concentrations of target DNA and could be applied successfully to a range of ancient human skeletal material from South America from various time periods. The AmericaPlex26 is a powerful assay with enhanced phylogenetic resolution that allows time- and cost-efficient mitochondrial DNA sub-typing from valuable ancient specimens. It can be applied in addition to, or as an alternative to, standard sequencing of the D-loop region in forensics, ancestry testing, and population studies, or where full-resolution mitochondrial genome sequencing is not feasible.


Introduction
Population genetic studies on modern-day Native American populations have described the presence of five haplogroups (hgs), termed A, B, C, D and X [1][2][3]. These five hgs are shared with East Asian populations and support an entry route to the Americas via the Bering landmass. However, Native American populations can be distinguished from their East Siberian source populations by exhibiting distinct sub-haplogroups (sub-hgs), which can only be found in the Americas. These so-called 'founder lineages' have been used to describe the demographic history of Native American populations and to shed light on the timing of the entry into and spread throughout the Americas [4][5][6]. The fact that the mtDNAs of all human populations native to the Americas can be assigned to one of the founder lineages pertains to stochastic events that would have affected the initial colonizers of the Americas [7]. The low genetic variation found in modern Native American groups is believed to be due to either population bottlenecks or genetic drift [8,9].
Most mitochondrial DNA (mtDNA) studies on prehistoric American populations involve sequencing of the D-loop, which contains Hypervariable Regions 1 and 2 (HVR1 and HVR2, respectively), to describe a sequence haplotype, from which the hg can be inferred [10,11]. Sequencing of the HVR regions of mtDNA is relatively cost-effective and less time-consuming than full mtDNA sequencing, and is therefore still the method of choice for many labs that study human populations [11]. Yet not all lineages harbour enough variation in the D-loop from which to infer a sub-hg at a deeper level than the overall hg, let alone a specific founder lineage [11][12][13][14]. As a result, many past and present studies on Native American population history have been restricted to the information gained from the distribution of the five major Pan-American hgs.
The coupling of multiplex polymerase chain reaction (PCR) with Single Base Extension (SBE) reactions, based on the established SNaPshot (Applied Biosystems) or minisequencing principle, has been widely used to design panels of single nucleotide polymorphisms (SNPs) for forensic and anthropological studies [15]. It has also found wide use in population genetic studies focussing on mtDNA and Y-chromosome SNPs, either including SNPs with a global representation or via a targeted selection of characteristic SNPs representing specific geographic regions [13][16][17][18][19][20][21]. The design of a SNP panel including those markers defining the 15 American founder lineages described by Perego et al. [22] and more had not been attempted, although 'Multiplex 3' in van Oven et al. 2011 [23] covered 12 out of these 15. The primary aim of this study was therefore to design a novel SNaPshot assay that enables fast and cost-efficient high-resolution typing of the majority of known Native American sub-hgs by targeting 26 characteristic SNPs. Our goal was to develop an assay that is universally applicable and accommodates the specific needs of damaged and degraded DNA in ancient DNA work and forensics. Selective sequencing of the SNP regions of interest not only allows for flexibility in the number and choice of SNP sites but also allows (with reservations) the design of ultrashort amplicon lengths (50-80 bp) suitable for degraded DNA typing [13], while using far less DNA than traditional sequencing methods or SNP-typing in individual singleplex PCRs. This is of great importance in forensic and ancient DNA studies where sample DNA is a limited resource [23,24].
The secondary aim was to develop an assay that could complement an established assay with a global set of SNPs (GenoCore22, see [24] but also [23]) and at the same time provide a fast and efficient screening tool that allows the assessment of overall sample quality (presence of very short fragments of endogenous mtDNA and absence of contaminant hgs) for further use in mitochondrial genome sequencing via DNA library preparation and targeting enrichment techniques, e.g. [25][26][27].

AmericaPlex26 SNP selection
We developed a multiplex SNaPshot reaction targeting 26 SNP sites in total including characteristic SNPs of the four major Pan-American sub-hgs A2, B2, C1 and D1, as well as SNP sites for the minor Pan-American lineages C4c, D2a, D4h3a and X2a [6]. The initial choice of SNPs was based on a study by Perego et al. [22] describing 15 American founder lineages. Additional SNP sites were chosen for sub-hgs within each major hg based on the most up-to-date mtDNA phylogeny available at the time (phylotree.org, mtDNA tree Build 13, 28 Dec 2011) in order to enhance the discriminating power of the assay. For sub-hgs defined by more than one characteristic SNP we employed selection criteria during the primer design stage based on the ability to design primers with high specificity in the short flanking region around the SNPs, and under a consensus-melting temperature for all pairs in a multiplex environment.
Presented below is a summary of each major Native American sub-hg, their distribution throughout the Americas, as well as the SNP sites chosen to represent the hg and their respective sub-hgs. The representative SNPs typed in the AmericaPlex26 are given in parentheses and a simplified tree illustrating the phylogenetic relationship is shown in Figure 1.
Haplogroup A2 (G12007A) is found throughout the Americas, but its derivatives A2a (C3330T) and A2b (T11365C) are mainly found in the northern parts of North America in Inuit, Na-Dené and Siberian populations such as Koryaks and Chukchi [22][28][29][30], whereas particular subgroups of A2a were also reported from Athapaskan territories in the Southwest [30]. A16265G (defining A2b in [22]) was also added to the assay as it was further resolved to represent sub-hg A2b1, which can be found in Eskimoan-speaking populations (such as the Inuit and Yupik) across the Arctic [31].
Haplogroup C is represented in the Americas by sub-hgs C1b, C1c and C1d [22,32], whereas sub-hgs C1a and C1e are Siberian/East Asian and European sister-clades, respectively. Sub-hg C1b (A493G) can be found throughout South America. Sub-hg C1c is most frequent in Mexico [22] and was split into sub-hgs C1c1a (A12978G) and C1c2 (C14356T), since the immediate flanking regions of the two SNPs defining C1c (G1888A and G15930A) were not suitable for primer design. A recent study by Perego et al. [22] has further resolved Central American sub-hg C1d (A16051G), which now includes sub-hg C1d1 (G7697A). Minor Pan-American hg C4c (C14433T) was recently discovered in an ancient sample from British Columbia, and was found to be one of the founding lineages of the Americas based on coalescent age estimates [33,34].
Haplogroup X2a (A8913G) has only been found in a limited number of samples in North American populations as compared to those of A2, B2, C1 and D1, and is therefore described as a minor founding lineage in this paper [8,35].
Lastly, SNP site T14783C was included as a control to define macro-hg M, which encompasses hgs C and D. In contrast, this SNP retains the ancestral state in hgs A, B and X, which belong to macro-hg N.
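The marker-to-lineage logic described in this section can be illustrated with a minimal Python sketch. It covers only the SNPs named in the text above (not the full 26-site panel), assumes all bases are reported on the forward strand, and uses a simplified, hypothetical `PARENT` table in which the C lineages hang directly under macro-hg M. The function and variable names are illustrative and are not part of the published assay.

```python
# Position -> (derived base, lineage), taken from the SNPs named in the text.
# This is a partial list for illustration, not the complete AmericaPlex26 panel.
MARKERS = {
    12007: ("A", "A2"),
    3330: ("T", "A2a"),
    11365: ("C", "A2b"),
    16265: ("G", "A2b1"),
    493: ("G", "C1b"),
    12978: ("G", "C1c1a"),
    14356: ("T", "C1c2"),
    16051: ("G", "C1d"),
    7697: ("A", "C1d1"),
    14433: ("T", "C4c"),
    8913: ("G", "X2a"),
    14783: ("C", "M"),
}

# Assumed, simplified parent links (intermediate nodes such as C and C1 are skipped).
PARENT = {
    "A2a": "A2", "A2b": "A2", "A2b1": "A2b",
    "C1d1": "C1d",
    "C1b": "M", "C1c1a": "M", "C1c2": "M", "C1d": "M", "C4c": "M",
}


def ancestors(lineage):
    """Return all ancestors of a lineage that are covered by PARENT."""
    chain = []
    while lineage in PARENT:
        lineage = PARENT[lineage]
        chain.append(lineage)
    return chain


def assign(calls):
    """Map observed bases (position -> base) to lineages.

    Returns (terminal, conflicts): the most derived lineages observed, and
    any ancestor markers that were expected to be derived but were not.
    """
    derived = {name for pos, (base, name) in MARKERS.items() if calls.get(pos) == base}
    expected = {a for name in derived for a in ancestors(name)}
    terminal = sorted(derived - expected)
    conflicts = sorted(expected - derived)
    return terminal, conflicts


print(assign({12007: "A", 3330: "T"}))   # A2 and A2a derived
print(assign({493: "G", 14783: "T"}))    # C1b derived, but M control ancestral
```

A non-empty `conflicts` list flags profiles in which a sub-hg marker is derived while a marker further up the tree is not, which is the kind of inconsistency the macro-hg M control site is meant to expose.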
Primer and probe design
PCR and SBE primers were designed and quality-controlled using default settings and features in the software package Geneious v5.2 (Geneious version (5.2) created by Biomatters. Available from http://www.geneious.com/) and Batchprimer3 v1.0 [38], both based on the program primer3 [39], generally following the guidelines set out in Sanchez et al. 2006 [40]. Amplicon sizes were deliberately kept smaller than 90 bp in length to allow amplification of highly fragmented DNA as is typical in forensics and ancient DNA studies. Given how short the flanking regions of each SNP were, which already constrained our selection of suitable SNPs, we could not consider potential polymorphic sites in these areas nor nuclear insertions, and relied on empirical testing of PCR primer efficiency. SBE primers were then ranked according to quality score and orientation (forward or reverse) for efficient use of fluorescent dyes and fragment length spacing. The latter was adjusted to 4 bp by adding poly-CT tails to the 5′ end of each SBE primer (Table 1) [41].
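The length-spacing step can be sketched as follows. This is an illustrative Python example only: the text states the 4 bp spacing and the poly-CT tails, but not how individual tail lengths were chosen, so the greedy rule below is an assumption, the primer sequences are placeholders, and the sketch ignores the fact that primers labelled with different dyes do not need to be separated by length.

```python
def add_tails(primers, spacing=4, tail_unit="CT"):
    """Prefix 5' poly-CT tails so that sorted primer lengths differ by >= spacing.

    primers: list of SBE primer core sequences (5'->3').
    Returns the tailed primers, ordered from shortest to longest.
    """
    tailed = []
    target = 0
    for core in sorted(primers, key=len):
        if tailed:
            # Next primer must be at least `spacing` nt longer than the previous one.
            target = max(len(core), target + spacing)
        else:
            target = len(core)
        pad = target - len(core)
        # Build a tail of exactly `pad` nucleotides from repeated "CT" units.
        tail = (tail_unit * (pad // len(tail_unit) + 1))[:pad]
        tailed.append(tail + core)
    return tailed


# Placeholder core sequences of 18, 19 and 20 nt (not real AmericaPlex26 primers).
cores = ["ACGTACGTACGTACGTAC", "ACGTACGTACGTACGTACG", "ACGTACGTACGTACGTACGT"]
for primer in add_tails(cores):
    print(len(primer), primer)
```

The tail is added at the 5′ end so that the 3′ end, which anneals immediately adjacent to the SNP, is left unchanged.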

Ethics statement
All necessary permits were obtained for the described study, which complied with all relevant regulations. Permissions to [...]

Figure 1. [...] [52]) and typing scheme of the 26 SNPs targeted in the AmericaPlex26. Sub-haplogroups that can be unambiguously assigned are shown in blue. Basal hgs, which cannot be unambiguously assigned, are shown in black in order to illustrate the phylogenetic relationship and the inherent limitations of our assay. The phylogenetic position of the revised Cambridge Reference Sequence (rCRS [53]) is indicated within macro-hg N. SNPs with an SBE primer in reverse direction targeting the opposite strand are given in italics.

Sample preparation, DNA extractions and PCR amplification from ancient samples were performed at the Australian Centre for Ancient DNA in Adelaide, Australia, applying established methods and authentication criteria as described previously [26,42,43]. In brief, we used an in-house silica extraction method, detailed in [26], to extract DNA from two independent samples per individual. PCR amplifications from each extract and direct sequencing of the HVR-I were performed using four overlapping primer pairs with reaction conditions described in [42,44]. Details of the four primer pairs are given in Table 2.

Multiplex PCR amplification
PCR amplifications were carried out in a final reaction volume of 12.5 µL consisting of 0.5 µL DNA sample (3 µL for ancient DNA), 1x PCR Gold Buffer, 6.5 mM MgCl2, 0.1 U AmpliTaq Gold DNA polymerase (all Applied Biosystems), 1.25 mM dNTP solution (Bioline Pty Ltd), (0.8 mg RSA for ancient DNA samples), and a primer mix consisting of 26 primer pairs, with concentrations given in Table 1. PCR was carried out on a Tetrad 2 Peltier Thermal Cycler (Bio-Rad Laboratories) using the following conditions: 95 °C for 6 min and 30 cycles (45 cycles for ancient DNA samples) of 95 °C for 30 s, 55 °C for 30 s, 65 °C for 30 s, and a final extension at 65 °C for 6 min. Amplification success was monitored via gel electrophoresis on a 3.5% agarose gel (100 V for 40 min; Hyperladder V DNA size ladder (Bioline Pty Ltd)). PCR products were purified by mixing 5 µL of PCR reaction with 1 U ExoSAP-IT (Thermo Fisher Scientific Australia Pty Ltd), followed by incubation at 37 °C for 50 min, 80 °C for 15 min and 15 °C for 10 min. Single Base Extension reactions consisted of a final volume of 5 µL containing 1 µL PCR product, 2.5 µL SNaPshot ready reaction mix (Applied Biosystems), and 0.5 µL extension primer mix (individual concentrations are given in Table 1). Thermocycling of the SBE reactions was performed in a Tetrad 2 Peltier Thermal Cycler (Bio-Rad Laboratories) with the following conditions: 96 °C for 10 s, followed by 35 cycles of 55 °C for 5 s and 60 °C for 30 s. SBE products were purified by adding 1 U Shrimp Alkaline Phosphatase (Thermo Fisher Scientific) to the reaction solution and incubating it at 37 °C for 50 min, 80 °C for 15 min and 15 °C for 10 min.
Capillary electrophoresis was performed on a 3130xl Genetic Analyser (Applied Biosystems) using POP-6 polymer and a customised run module, by adding 1 µL sample DNA to 18.5 µL Hi-Di Formamide and 0.5 µL GeneScan-120 LIZ internal size standard (Applied Biosystems). Electropherograms were analysed using the software Genemapper ID version 3.2.1 (Applied Biosystems), applying custom panel and bin settings available on request.

Sensitivity tests
We performed sensitivity studies using serial dilutions of 1, 1:10, 1:100, 1:1000 and 1:10,000 of DNA from a buccal swab sample from a lab member (AC). The mtDNA copy number of the modern sample was determined through qPCR using the SYBR-Green kit (Qiagen), targeting a short 77 bp fragment of human mitochondrial DNA with primer pair L13258 and H13295 [45]. Serial dilutions were treated as separate samples, and each sample was analysed in triplicate. The qPCR reaction was performed in a total reaction volume of 10 µL consisting of 1 µL of each sample dilution, 2x Brilliant SYBR Green Master Mix and 0.1 µM of each primer. The qPCR was carried out on a Rotor-Gene Q Real-Time PCR cycler (Qiagen) with thermocycling conditions as follows: 95 °C for 5 min, followed by 45 cycles of 95 °C for 10 s, 58 °C for 20 s and 72 °C for 20 s.

Results and Discussion
Optimization of the multiplex protocol
The AmericaPlex26 assay was initially tested with default concentrations of 0.017 µM for each primer (3 µL of 25 µM stock) and 0.015 µM for each SBE primer (3 µL of 50 µM stock) to assess the generic efficiency of primers or probes when used in the multiplex assay. Twenty-two out of 26 SNP sites could be readily amplified, albeit with highly variable peak heights across the assay. Primers and probes for the four problematic SNP sites were each tested in singleplex PCR and SNaPshot reactions to ensure they performed individually as expected. If the SNP fragment was successfully amplified in the singleplex PCR, the concentration of the primer was doubled in the following multiplex PCR reaction mix.
We chose 3000 relative fluorescence units (rfu) as a default average peak height based on the ancestral allele status observed in our European modern control sample (AC), and calculated the percentage difference between peaks and the 3000 rfu average. Multiplex primer concentrations were adjusted according to this percentage difference to allow amplification of problematic SNP sites and to balance the peak heights of those that did amplify. Based on poor performance of the primer pair chosen to amplify the C1b SNP site (A493G), we performed a second round of balancing primer concentration with a new primer pair for this site.
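The balancing step can be illustrated with a short Python sketch. The 3000 rfu reference comes from the text; the exact rule used to convert a percentage difference into a new primer concentration is not stated, so the linear scaling and the clamp to between half and double the current concentration are assumptions made purely for illustration.

```python
TARGET_RFU = 3000  # default average peak height chosen in the text


def percent_difference(peak_rfu, target=TARGET_RFU):
    """Percentage by which an observed peak deviates from the target height."""
    return (peak_rfu - target) / target * 100.0


def adjusted_concentration(current_conc, peak_rfu, target=TARGET_RFU):
    """Suggest a new primer concentration from the observed peak height.

    Assumed rule: scale the concentration against the percentage difference
    (weak peak -> more primer, strong peak -> less primer), limited to
    between 0.5x and 2x of the current concentration.
    """
    diff = percent_difference(peak_rfu, target)
    factor = min(2.0, max(0.5, 1.0 - diff / 100.0))
    return current_conc * factor


# Hypothetical peak heights for three SNP sites at a 0.02 µM starting concentration.
for peak in (1500, 3000, 9000):
    print(peak, percent_difference(peak), adjusted_concentration(0.02, peak))
```

Under this assumed rule a site that fails to amplify at all (0 rfu) has its primer concentration doubled, which matches the doubling described for the sites that worked in singleplex but not in the multiplex.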
To further refine the balance in peak height, the concentration of some SBE extension primers was adjusted to the final recommended concentrations given in Table 1. Changes to probe concentrations resulted in a more balanced electropherogram and amplification of all 26 SNP sites using modern buccal swab and ancient DNA samples (Figure 2).

Sensitivity studies
The amount of mitochondrial DNA was measured for a modern sample (1,171,699 copies/µL) and four serial dilutions of 1:10, 1:100, 1:1000 and 1:10,000 using real-time quantitative PCR (Figure 3). A near-complete SNP profile could be observed for serial dilutions down to 28,278 copies/µL DNA (1:100), which is similar to other published multiplex assays [23,24,46].

Method application on ancient samples
The AmericaPlex26 assay was tested on ancient samples from three successive pre-Columbian cultures from the Huaca Pucllana archaeological site in Lima, Peru. They included samples from the Early Intermediate (n = 20; 200-600 AD), the Middle Horizon (n = 20; 600-1000 AD) and the Late Intermediate (n = 12; 1000-1476 AD) [47], plus an Early Medieval European sample as a control for the ancestral state. Samples from each period varied in their state of preservation, due to differences in mortuary customs. From our test dataset of 52 samples in total, we were able to unambiguously type 29 samples (56%) (Table 3, Figure 4). A typing result was considered reliable when two samples from the same individual could be unambiguously assigned to the same sub-hg in two independent experiments. We subsequently compared the AmericaPlex26 assay results to our previous attempts at amplifying and sequencing the mitochondrial HVR-I with four overlapping primer pairs and found that the AmericaPlex26 assay improved the typing efficiency from ancient samples (Table 3, Figure 4). The AmericaPlex26 assay allowed reliable SNP typing for eleven Early Intermediate (55%) and ten Late Intermediate samples (83%), whereas HVR-I sequencing gave reliable sequence haplotypes for seven (35%) and seven (58%) samples, respectively. For example, HVR-I sequencing for samples 10802A and 10803A failed, while the AmericaPlex26 assay revealed specific hg B2 (Table 3). Samples from the Middle Horizon culture were in general less well preserved, resulting in eight consensus sub-hg calls (40%) using the AmericaPlex26 assay, whereas HVR-I sequencing did not produce any reliable sequence haplotype from the sample replicates (0%). This highlights the genotyping power of our assay when dealing with challenging samples.
Importantly, SNP typing with the AmericaPlex26 assay also gave a higher resolution compared to traditional HVR-I sequencing. For example, samples 10809A and 10810A of the Early Intermediate period were assigned to the major hg C1 by HVR-I sequencing, yet the AmericaPlex26 assay allowed further resolution to sub-hg C1b. In addition, while many of the HVR-I results from Late Intermediate samples remained tentative, i.e. non-reproducible, the AmericaPlex26 assay provided reliable and specific sub-hgs for both replicates (Table 3, Figure 4). Taking all results together, the AmericaPlex26 assay showed a significantly higher success rate when compared to the standard HVR-I sequencing (p < 0.0001, Wilcoxon matched-pairs signed rank test; Figure 4). This is likely due to the difference in amplicon sizes between the two methods.
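The replicate criterion and the reported success rates can be restated as a minimal Python sketch. The counts are those given in the text; the function names are illustrative, and the representation of a failed or ambiguous replicate as `None` is an assumption.

```python
def consensus_call(rep_a, rep_b):
    """Return the sub-hg if two independent replicates agree, else None.

    A replicate that failed or was ambiguous is represented as None (assumed).
    """
    if rep_a is not None and rep_a == rep_b:
        return rep_a
    return None


def success_rate(n_typed, n_total):
    """Percentage of reliably typed samples, rounded to the nearest integer."""
    return round(100 * n_typed / n_total)


# (reliably typed with AmericaPlex26, reliably typed with HVR-I, total), from the text.
PERIODS = {
    "Early Intermediate": (11, 7, 20),
    "Middle Horizon": (8, 0, 20),
    "Late Intermediate": (10, 7, 12),
}

for period, (snp, hvr, total) in PERIODS.items():
    print(period, success_rate(snp, total), success_rate(hvr, total))

typed = sum(snp for snp, _, _ in PERIODS.values())
total = sum(n for _, _, n in PERIODS.values())
print("Overall", typed, total, success_rate(typed, total))
```

Summing the per-period counts reproduces the overall figure of 29 reliably typed samples out of 52.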

The overall effectiveness of the assay
Overall, the effectiveness of the multiplex SNaPshot method in analysing ancient DNA lies in the fact that it only requires minimal flanking regions either side of the SNP, which in theory allows the design of very short overall amplicon sizes [40,48], 56-90 bp in [...] [15,49,50]. As such, SNaPshot typing is able to generate results for ancient and/or degraded samples for which traditional sequencing methods often fail, as these methods require longer fragment lengths to be cost-effective [51]. Multiplexing also allows the combination of many informative SNP sites, which are otherwise spread across longer sequence regions, into one reaction. On its own, the multiplex PCR and SNaPshot method is time- and cost-effective, and requires substantially smaller amounts of valuable DNA extract compared to HVR-I sequencing, as fewer individual reactions are needed from preparation of the multiplex PCR to capillary electrophoresis [48,50]. We show that the AmericaPlex26 can be used to complement or expand upon standard mtDNA sequencing approaches for ancient Native American populations, and especially for ancient samples where DNA preservation does not allow amplification of longer (> 100 bp) DNA molecules. It efficiently and economically targets characteristic SNPs from the coding region of mtDNA [15] in order to corroborate HVR-I sequencing results and to define a particular sub-hg [14]. Alternatively, the AmericaPlex26 can be used in addition to global mtDNA SNP multiplexes, such as the GenoCore22 and others [23,24]. Moreover, it is flexible enough to add newly discovered SNPs/lineages in order to enhance subregional resolution (see e.g. [32,35]). In our experience, the new method provided an extremely useful one-reaction test to screen larger numbers of degraded samples, allowing the assessment of the general state of preservation and the authenticity of the result (i.e. absence of potential contaminating lineages), while at the same time allowing a categorisation of potentially interesting sub-hgs. We are currently using this approach in order to further dissect the phylogenetic resolution via DNA library creation, targeted mtDNA enrichment and Next Generation Sequencing [25,26].

Conclusions
We present a powerful, optimized SNP assay, which allows unambiguous typing of Native American mtDNA 'founder lineages' and additional SNPs for further resolution. This short-amplicon AmericaPlex26 assay is highly efficient, time- and cost-effective compared to classical HVR-I sequencing, and allows highly resolved SNP typing of degraded DNA samples in forensic and ancient DNA work. It is suitable as a qualitative 'screening' method to identify samples with sufficient DNA preservation, free of contaminants that complicate full mitochondrial sequencing (and beyond) via Next Generation Sequencing techniques.