Revising mtDNA haplotypes of the ancient Hungarian conquerors with next generation sequencing

As part of the effort to create a high resolution representative sequence database of the medieval Hungarian conquerors we have resequenced the entire mtDNA genome of 24 published ancient samples with Next Generation Sequencing, whose haplotypes had been previously determined with traditional PCR based methods. We show that PCR based methods are prone to erroneous haplotype or haplogroup determination due to ambiguous sequence reads, and many of the resequenced samples had been classified inaccurately. The SNaPshot method applied with published ancient DNA authenticity criteria is the most straightforward and cheapest PCR based approach for testing a large number of coding region SNP-s, which greatly facilitates correct haplogroup determination.


Introduction
Comparing ancient DNA (aDNA) sequences extracted from well dated archaeological remains from different periods and locations provide crucial information about past human population history (reviewed in [1]). Phylogeographic inferences are drawn from phylogenetic and population genetic analyses of sequence variations, the quality of which can be biased by data quantity and quality. Nowadays Next Generation Sequencing technology (NGS) provides a growing number of high quality aDNA sequence data, but until recently the majority of aDNA studies have been restricted to short fragments from the hypervariable region-1 (HVR-I) of the mitochondrial DNA (mtDNA) genome, using PCR based methods. PCR based methods are very sensitive for contamination, as low amounts of exogenous DNA can easily dominate PCR products resulting in the recovery of irrelevant sequences [2][3][4][5]. As a result, in spite of the applied authenticity criteria [6], many of the published databases may contain unreliable sequences, which distort statistical analyses. This problem is especially relevant for many of the ancient populations, from which only PCR based HVR data are available. PLOS

NGS library construction
First 50 μl DNA extract was subjected to partial uracil-DNA-glycosylase (UDG) treatment followed by blunt end repair, as described in [13]. DNA was then purified on MinElute column (Qiagen), and double stranded library was made as described in [14], except that all purifications were done with MinElute columns, and after adapter fill-in libraries were preamplified in 2 x 50 μl reactions containing 800 nM each of IS7 and IS8 primers, 200 μM dNTP mix, 2 mM MgCl 2 , 0,02 U/μl GoTaq G2 Hot Start Polymerase (Promega) and 1X GoTaq buffer, followed by MinElute purification. PCR conditions were 96˚C 6 min, 16 cycles of 94˚C 30 sec, 58˚C 30 sec, 72˚C 30 sec, followed by a final extension of 64˚C 10 min. Libraries were eluted from the column in 50 μl 55˚C EB buffer (Qiagen), and concentration was measured with Qubit (Termo Fisher Scientific). Libraries below 5 ng/μl concentration were reamplified in the same reaction for additional 5-12 cycles, depending on concentration, in order to obtain 50 μl preamplified library with a concentration between 10-50 ng/μl. 50 ng preamplified libraries were double indexed according to [15]  Mitochondrial DNA capture and sequencing Biotinilated mtDNA baits were prepared from three overlapping long-range PCR products as described in [16], but using the following primer pairs, L14759-H06378, L10870-H14799, L06363-H10888, described in [10].
Capture was done according to [16] with the following modifications: Just four blocking oligos, given below were used in 3 μM (each) final concentration:

Data analysis
The adapters of paired-end reads were trimmed with the cutadapt software [17] in paired end mode. Read quality was assessed with FastQC [18]. Sequences shorter than 25 nucleotide were removed from this dataset. The resulting analysis-ready reads were mapped to the GRCh37.75 human genome reference sequence using the Burrows Wheeler Aligner (BWA) v0.7.9 software [19] with the BWA mem algorithm in paired mode and default parameters. Aligning to the GRCh37.75 human reference genome that also contains the mtDNA revised Cambridge Reference Sequence (rCRS, NC_012920.1) [20] helped to avoid the forced false alignment of homologous nuclear mitochondrial sequences (NumtS) to rCRS, though the proportion of NumtS, derived from low copy nuclear genome, is expexted to be orders of magnitudes lower than mtDNA in aDNA libraries. Samtools v1.1 [21] was used for sorting and indexing BAM files. PCR duplicates were removed with Picard Tools v 1.113 [22]. Ancient DNA damage patterns were assessed using MapDamage 2.0 [23], and read quality scores were modified with the rescale option to account for post-mortem damage. Freebayes v1.02 [24] was used to identify variants and generate variant call format (VCF) files with the parameters -q 10 (exclude nucleotids with <10 phred quality) and -P 0.5 (exclude very low probability variants). Each variant call was also inspected manually. From VCF files FASTA format was generated with the Genom Analysis Tool Kit (GATK v3.5) FastaAlternateReferenceMaker walker [25].

NGS sequencing
We have sequenced 24 complete mtDNA genomes of the ancient Hungarians with multiple coverage (Table 1) without gaps and determined the haplotypes of the individuals ( Table 2 and Data refer to paired-end sequences from UDG treated libraries. The Szegvá r-Oromdü lő/412/anc5 and Karos-III/11 samples were sequenced twice from tooth and femur with identical results, then these sequence reads were merged, and statistics are given for the merged reads.  Table). For two samples (Table 1) we have replicated the experiments from two independent extracts, one from bone another from tooth derived from the same individual, and in each case received identical sequence reads. UDG treated and non UDG treated libraries derived from the same extract also gave the same sequence reads. MapDamage profile of our partial UDG treated and control non treated library molecules displayed typical aDNA damage distribution (S1 Fig), as described in [13]. MapDamage computed proportions of sequence reads with aDNA specific C-to-T and G-to-A transitions at the ends of molecules which remained after partial UDG treatment are shown in Table 1. The average length of the obtained mtDNA fragments ranged from 56 to 85 bp (Table 1), an expected size range for aDNA [26]. These data indicated that the majority of sequences were derived from endogenous DNA molecules. Then we have estimated the percentage of possible contaminating molecules (Table 1) with a similar logic as in [27], by calculating the proportion of reads which did not correspond with the diagnostic positions of the consensus sequence given in S1 Table, which revealed very low contamination levels. Phylogenetic analyses (HaploGrep 2, [28]) of all consensus sequences resulted unambigous classifications without contradictory positions. Consensus sequences were submitted to NCBI GenBank under Accession No: KY083702-KY083725.
In NGS sequence reads typical aDNA sequence alterations, present in individual molecules, are disclosed and excluded by averaging multiple reads. Moreover aDNA specific sequence alterations, primarily C-T and G-A transitions accumulating at the end of molecules, serve as markers to distinguish ancient molecules from contaminating modern DNA. Therefore NGS eliminates most sequencing uncertainties inherent in PCR based methods (reviewed in [29]), resulting in very reliable sequence reads. So we could use our NGS data to reevaluate and compare previous haplotyping strategies used in [7][8][9]. For this end, from our NGS data, we collected all SNP-s within the HVR stretches and coding region positions, which had been examined in [7] and [9], then contrasted these with the original dataset (Table 2).

Contrasting NGS and PCR based sequence data
We found that in [7] haplotypes of 5 out of 9 samples were determined correctly, while in one sample haplogroup was correct with inaccurate haplotype, and in 3 samples NGS detected entirely different haplogroups. In the 15 samples of [9] the same haplogroups were assigned from NGS data in all cases, however only 8 haplotypes proved to be correct. In both studies the majority of deviations originated from undetected SNP-s in sequencing reactions of PCR fragments, but [7] also identified 3 SNP-s erroneously (lined through nucleotide positions in Table 2). These results indicate that haplotypes from both studies were rather unreliable, but haplogroup classification with the approach of [9] is more trustworthy than with approach used in [7].

Discussion
As multicopy mtDNA is best preserved in archaeological remains than low copy nuclear DNA, most ancient sequences are derived from mitochondria [30]. Within mtDNA, the most polymorphic HVR control region contains outstanding phylogenetic information, therefore HVR sequencing has been the primary method of choice for mtDNA hapolotyping. However HVR polymorphisms have a limited reliability for haplogroup determination, therefore in addition several informative coding region SNP-s (CR-SNP) were selected to unambiguously define haplogroups [31]. At the beginning individual CR-SNP-s were determined with RFLP [32] or direct sequencing of PCR clones, but soon multiplex PCR combined with the SNaPshot technique [33] offered a more straightforward solution for identifying multiple SNP-s simultaneously. Latter method was soon adapted in the ancient DNA field [34] [10].
Determining individual CR-SNP-s separately is very time consuming and expensive, so it is tempting to test just those CR-SNP-s which are in line with HVR-I data. This is exactly what we read in [7]: "In cases when haplogroup categorization was not possible on the basis of HVSI motifs alone, analysis of the diagnostic polymorphic sites in the HVSII region and mtDNA coding region was also performed." A major problem with this approach is the ambiguity of sequence reads derived from aDNA PCR clones, as amplification typically starts from a mixture of endogenous and contaminating human DNA molecules [3]. Erroneous HVR reading will lead to inappropriate CR-SNP selection, and in case of dubious CR-SNP results, false Hg classification. This is the most probable explanation of the 3 incorrectly defined haplogroups in [7] ( Table 2). A major advantage of the GonoCore22 SNaPshot assay is that all Hg specific CR-SNP-s are examined irrespectively of HVR reads. The 22 CR-SNP alleles independently define a certain Hg, which must correspond with that based on HVR sequence. As both HVR and CR-SNP reads may give ambiguous results, this approach provides a double control for correct Hg designation, but is not immune against incorrect HVR haplotype reads. This is the explanation of correct Hg-s and erroneous haplotypes in [9] ( Table 2).
The problem of ambiguous aDNA sequence reads is demonstrated on Fig 1. In [9] consequently the higher peaks were taken into account, which also matched with the GenoCoRe22 data. However in position 16399 the correct nucleotide is defined by the neglected lower peak (G instead of A, see Table 2), which resulted in incorrect haplotyping. In contrast in the neighboring double peak (16403 in Fig 1), the correct nucleotide is defined by the selected higher peak.
Coding region SNP testing with either RFLP, sequencing or SNaPshot method also suffers from the same problem as demonstrated on Fig 2. After multiplex PCR amplification of 22 mtDNA fragments two separate Single Base Extension (SBE) reactions are performed, and each reveals 11 Hg defining alleles. Both independent SBE reactions shown in Fig 2 contain several double peaks, and one of each must have derived from contamination. Some of these can be excluded from repeated SNaPshot reactions, for example the lower electropherogram excludes the ancestral preHV allele, since it has a single peak (T) in this position. If such exclusion is not possible, the higher peaks are preferably chosen, as the blue peak (G) for Hg B and the green (A) for Hg N on Fig 2. These decisions however must be handled with caution, therefore the presence of the B Hg defining 9 bp deletion also had been confirmed in [9], with singleplex PCR and agarose gelelectrophoresis. In other cases phylogenetic relations are taken into account [35], for example if the preHV allele is derived the HV allele must also be derived, this is why we have considered the lower peak (A) for HV in Fig 2 [9]. The summary of repeated SNaPshot reactions considered together with multiple HVR sequence reads warrants trustable Hg classification.
The studied conqueror samples were excavated between the 1930-90s, and had been handled by a large number of researchers, many with untraceable identity. It follows that these samples were inevitably contaminated during sample collection and storage. Tömöry et al. 2007 [7] collected samples from a large number of cemeteries, and published the ones with best DNA preservation. In spite of careful sampling their available method was error prone. Neparáczki et al. 2016 [9] aimed at characterizing an entire cemetery which limited the ability of sample selection, so in spite of the more reliable method their haplotype determination proved error prone. The lesson from this study is that PCR based haplotypes need to be handled cautiously, which has been well known in the aDNA field [2] [36][37][38]. It also follows that incorrect haplotypes particularly distort sequence based statistical analysis, like Fst statistics or shared haplotype analysis applied in [7,8]. The accumulation of authentical NGS ancient DNA sequence data in databases will greatly facilitate reliable population genetic studies.