Conceived and designed the experiments: VAI AMM MC AS. Performed the experiments: VAI AMM MC BQ. Analyzed the data: VAI AMM MC AS. Contributed reagents/materials/analysis tools: MTZ IC MVL OG LPJ AC AS. Wrote the paper: VAI AMM MC AS.
The authors have declared that no competing interests exist.
R0 encompasses the most common mitochondrial DNA (mtDNA) lineage in West Eurasia, namely haplogroup H (∼40%). R0 sub-lineages are poorly defined in the control region; therefore, the analysis of diagnostic coding region polymorphisms is needed to gain resolution in population and medical studies.
We sequenced the first hypervariable segment (HVS-I) of 518 individuals from different North Iberian regions. The mtDNAs belonging to R0 (∼57%) were further genotyped for a set of 71 coding region SNPs characterizing major and minor branches of R0. We found that the North Iberian Peninsula shows moderate levels of population stratification; for instance, haplogroup V reaches its highest frequency in Cantabria (north-central Iberia), but is less frequent in Galicia (northwest Iberia) and Catalonia (northeast Iberia). When compared to other European and Middle East populations, haplogroups H1, H3 and H5a show frequency peaks in the Franco-Cantabrian region, declining from the West towards Eastern and Southern Europe. In addition, we have characterized, by way of complete genome sequencing, a new autochthonous clade of haplogroup H in the Basque country, named H2a5. Its coalescence age, 15.6±8 thousand years ago (kya), dates to the period immediately after the Last Glacial Maximum (LGM).
In contrast to other H lineages that experienced re-expansion outside the Franco-Cantabrian refuge after the LGM (e.g. H1 and H3), H2a5 most likely remained confined to this area until the present day.
Haplogroup R0, formerly known as pre-HV
By way of complete genome sequencing, Achilli et al.
On the other hand, the mtDNA phylogeny needs continuous updating in order to ease future population and phylogenetic studies (e.g.
The goals of the present study are: i) to provide new insights into the distribution and population variability of haplogroup H sub-lineages in North Iberia at a high level of phylogenetic resolution; ii) to resolve the many existing conflicts in the nomenclature and phylogeny of R0 that currently represent a challenge for future inter-population studies; iii) to refine the phylogeny of R0 by inspecting the existing mtDNA complete genomes (plus coding region segments) available in the literature and GenBank (>1,100); and iv) to help enrich the known phylogeny of haplogroup H at the level of complete genome sequencing, by characterizing a new autochthonous clade observed in the Basque Country, namely H2a5.
We have collected samples from three main North Iberia regions. A total of 282 healthy unrelated individuals were obtained from Galicia (northwest Iberia) (which is an independent sample to the one reported in
Oral informed consent was required for the samples collected in Galicia and Cantabria, and all of them were anonymized. Written informed consent was required for the samples collected in Catalonia, which were also anonymized; DNA extracts were then submitted to the laboratory in Santiago de Compostela, where the genotyping was carried out. In addition, the study was approved by the Ethical committee of the University of Santiago de Compostela. The study conforms to the Spanish Law for Biomedical Research (Law 14/2007 of 3 July).
All the samples from Galicia, Cantabria, and Catalonia were sequenced for the HVS-I region (
The protocol for PCR amplification and automatic minisequencing is fully described in
MtDNA variation is referred to the revised Cambridge Reference Sequence
We have used the mtDNA tree as a reference to avoid as much as possible artefactual profiles and documentation errors in mtDNA sequences and in SNP genotypes
DnaSP 4.10.3 software
The geographical representations of haplogroup frequencies were obtained using Surfer 8.0 (
R0 and its different sub-lineages are the main focus of the present article; however, there are only a few studies focusing on the internal variability of R0 suitable for population comparisons
Estimation of the time to the most recent common ancestor of each cluster and SDs were carried out according to Saillard et al.
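As a rough illustration of this kind of dating, the following Python sketch computes the ρ statistic (the average number of mutations separating the sampled sequences from the root of a clade) and its standard error in the form proposed by Saillard et al. The branch list and the mutation rate below are hypothetical placeholders chosen for illustration only; they are not the tree or the rate used in this study.

```python
from math import sqrt

def rho_age(branches, n_samples, years_per_mutation):
    """Estimate clade age from a rooted tree summarized branch by branch.

    branches: list of (mutations_on_branch, n_descendant_samples) tuples.
    n_samples: total number of sampled sequences in the clade.
    years_per_mutation: assumed calibration (hypothetical in this sketch).
    """
    # rho = sum(l_i * n_i) / n
    rho = sum(l * n for l, n in branches) / n_samples
    # sigma^2 = sum(l_i * n_i^2) / n^2
    sigma = sqrt(sum(l * n * n for l, n in branches)) / n_samples
    return rho, sigma, rho * years_per_mutation, sigma * years_per_mutation

# Toy tree with 4 samples: one branch with 1 mutation shared by 2 samples,
# and two private branches carrying 1 and 2 mutations.
toy_branches = [(1, 2), (1, 1), (2, 1)]
HYPOTHETICAL_RATE = 5000  # years per mutation, placeholder value

rho, sigma, age, age_sd = rho_age(toy_branches, 4, HYPOTHETICAL_RATE)
print(f"rho = {rho:.2f} +/- {sigma:.2f}; age = {age:.0f} +/- {age_sd:.0f} years")
```

Because the standard error grows with the number of samples that share a single branch, small clades with few, deep branches tend to yield wide intervals.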
R0 differs from R* by lacking A73G and G11719A. R0 contains haplogroup HV which likewise embraces the most common haplogroup in Europe, H, but also haplogroup HV0a (where haplogroup V is nested) and some other minor branches such as HV1 and HV2. Within haplogroup H, there are at least 25 sub-haplogroups; many of them can be further sub-divided into minor branches.
MtDNA coding region SNP genotyping has been designed here with the aim of covering as much of the R0 phylogeny as possible, giving priority to those SNPs representing the most frequent sub-lineages, and also to those characterizing branches that do not have any known diagnostic polymorphism in the control region. SNP selection in the present study considers the full set of SNPs reported in Brandstätter et al.
When selecting mtDNA SNPs, we noticed many inconsistencies in the nomenclature of haplogroup H and its sub-lineages. One of the aims of the present study was therefore to resolve these nomenclature conflicts in order to ease inter-population genetic studies. These problems and the rationale to determine new sub-branches of R0 are shown in
An expanded view of the haplogroup H phylogeny is shown in
See legend of
The advantages of using a minisequencing multiplex genotyping procedure
The three North Iberian samples analyzed in the present study show a typical West European mtDNA haplogroup composition (
All the HVS-I profiles obtained were searched among datasets compiled from the literature (more than 83,000 profiles), considering only the common sequence range from position 16090 to 16365. A total of ∼5%, ∼10%, and ∼14% of the mtDNAs from Cantabria, Galicia and Catalonia, respectively, had not previously been observed in the literature. Catalonia shows the highest levels of sequence diversity, followed by Galicia and Cantabria (see also below and
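The restriction to a common sequence range can be sketched as follows. This is a minimal Python illustration, assuming profiles are written as space-separated variants relative to the reference sequence (for example `T16189C` or `16169+C`); the function names and the example profile are hypothetical and do not describe the software actually used.

```python
import re

def variant_position(variant):
    """Return the nucleotide position embedded in a variant such as 'T16189C'."""
    match = re.search(r"\d+", variant)
    if match is None:
        raise ValueError(f"no position found in variant {variant!r}")
    return int(match.group())

def restrict_to_range(profile, start=16090, end=16365):
    """Keep only the variants that fall inside the shared range (inclusive)."""
    return [v for v in profile.split() if start <= variant_position(v) <= end]

# Hypothetical profile: the last variant lies outside the common range.
profile = "T16189C C16223T T16325C T16362C T16519C"
print(restrict_to_range(profile))
```

Two profiles would then be compared only after both have been trimmed in this way, so that differences in sequenced range between studies do not produce spurious mismatches.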
HG | Population | π±SE | ||||||||||
Galicia | 282 | 150 (0.53) | 93 | 102 | 0.952±0.010 | 0.0138±0.001 | 3.76 | 5.06 | 0.012 | −2.328 | −4.727 |
Catalonia | 101 | 79 (0.78) | 71 | 73 | 0.984±0.007 | 0.0166±0.001 | 4.59 | 6.29 | 0.014 | −2.187 | −3.557 |
Cantabria | 135 | 61 (0.45) | 60 | 62 | 0.971±0.007 | 0.0135±0.001 | 3.72 | 3.85 | 0.018 | −2.099 | −2.596 |
Galicia | 124 | 51 (0.41) | 49 | 50 | 0.800±0.038 | 0.006±0.001 | 1.73 | 2.08 | 0.035 | −2.528*** | −4.447 |
Catalonia | 44 | 30 (0.68) | 33 | 33 | 0.937±0.030 | 0.009±0.001 | 2.48 | 1.93 | 0.043 | −2.300 | −3.836 |
Cantabria | 52 | 26 (0.50) | 25 | 26 | 0.875±0.042 | 0.006±0.001 | 1.78 | 1.33 | 0.067 | −2.251 | −2.480 |
Volga-Ural | 50 | 18 (0.36) | 17 | 18 | 0.819±0.049 | 0.006±0.001 | 1.61 | 1.39 | 0.050 | −1.884 | −1.966 |
Finland | 31 | 16 (0.52) | 15 | 16 | 0.908±0.035 | 0.009±0.001 | 2.42 | 1.53 | 0.092 | −1.338 | −1.083 |
Estonia | 50 | 31 (0.62) | 30 | 31 | 0.936±0.026 | 0.009±0.001 | 2.54 | 2.31 | 0.035 | −2.114 | −3.113 |
Slovakia | 50 | 30 (0.60) | 31 | 30 | 0.939±0.027 | 0.009±0.001 | 2.49 | 2.23 | 0.045 | −2.090 | −2.455 |
France | 50 | 19 (0.38) | 17 | 19 | 0.762±0.063 | 0.005±0.001 | 1.31 | 1.33 | 0.097 | −2.187 | −2.569 |
Balkans | 50 | 31 (0.62) | 30 | 31 | 0.953±0.018 | 0.009±0.001 | 2.52 | 1.77 | 0.053 | −2.120 | −2.852 |
Turkey | 50 | 31 (0.62) | 27 | 31 | 0.914±0.032 | 0.008±0.001 | 2.13 | 1.59 | 0.055 | −2.311 | −3.113 |
Near East | 50 | 36 (0.72) | 30 | 36 | 0.943±0.023 | 0.009±0.001 | 2.56 | 2.08 | 0.040 | −2.301 | −4.097 |
Asia | 48 | 29 (0.60) | 26 | 29 | 0.947±0.019 | 0.010±0.001 | 2.89 | 2.37 | 0.029 | −1.962 | −2.261 |
Eastern Slavs2 | 50 | 30 (0.60) | 31 | 30 | 0.944±0.023 | 0.009±0.001 | 2.35 | 1.67 | 0.057 | −2.162 | −3.280 |
Arabian Peninsula | 52 | 29 (0.56) | 30 | 30 | 0.947±0.017 | 0.008±0.001 | 2.32 | 1.34 | 0.074 | −2.153 | −3.050 |
Armenia | 54 | 27 (0.50) | 33 | 33 | 0.914±0.031 | 0.009±0.001 | 2.53 | 2.35 | 0.030 | −2.158 | −1.685 |
Daghestan | 60 | 26 (0.43) | 33 | 33 | 0.859±0.042 | 0.008±0.001 | 2.17 | 2.28 | 0.023 | −2.268 | −2.323 |
Georgia | 30 | 15 (0.50) | 16 | 16 | 0.874±0.050 | 0.008±0.001 | 2.11 | 2.12 | 0.031 | −1.617 | −0.682 |
Jordan | 33 | 18 (0.55) | 25 | 25 | 0.847±0.062 | 0.008±0.001 | 2.24 | 2.30 | 0.024 | −2.227 | −2.586 |
Karatchaians-Balkanians | 50 | 21 (0.42) | 23 | 23 | 0.943±0.017 | 0.012±0.001 | 3.23 | 2.00 | 0.059 | −1.202 | 0.411 |
Lebanon | 34 | 20 (0.59) | 23 | 23 | 0.907±0.041 | 0.008±0.001 | 2.09 | 1.88 | 0.061 | −2.171 | −3.548 |
Northwest Caucasus | 69 | 35 (0.51) | 38 | 38 | 0.895±0.034 | 0.009±0.001 | 2.42 | 2.70 | 0.026 | −2.256 | −2.953 |
Ossetia | 45 | 22 (0.49) | 26 | 27 | 0.883±0.002 | 0.009±0.001 | 2.58 | 2.84 | 0.029 | −1.950 | −2.445 |
Syria | 28 | 19 (0.68) | 23 | 23 | 0.966±0.019 | 0.009±0.001 | 2.38 | 1.38 | 0.098 | −2.139 | −2.667 |
Turkey | 90 | 46 (0.51) | 44 | 46 | 0.898±0.029 | 0.008±0.001 | 2.24 | 2.10 | 0.037 | −2.408 | −2.957 |
Austria | 964 | 116 (0.12) | 75 | 81 | 0.683±0.017 | 0.005±0.001 | 1.15 | 1.07 | 0.041 | −2.468*** | −5.322 |
Germany | 28 | 20 (0.71) | 20 | 20 | 0.952±0.030 | 0.010±0.001 | 2.73 | 1.88 | 0.042 | −1.657 | −1.116 |
Hungary | 55 | 15 (0.27) | 22 | 22 | 0.677±0.070 | 0.006±0.001 | 1.64 | 2.61 | 0.073 | −2.059 | −2.160 |
Macedonia | 88 | 30 (0.34) | 28 | 29 | 0.892±0.025 | 0.007±0.001 | 2.01 | 1.84 | 0.058 | −2.000 | −1.707 |
Romania | 100 | 29 (0.29) | 29 | 29 | 0.917±0.017 | 0.009±0.001 | 2.48 | 2.04 | 0.034 | −1.690 | −2.160 |
π = nucleotide diversity and standard error.
Statistical significance: *,
Present study.
Loogväli et al.
Roostalu et al.
Brandstätter et al.
A small percentage of the total mtDNAs analyzed belonged to non-Eurasian lineages. Thus, several sub-Saharan mtDNA profiles were detected in Galicia (∼2.5%) and Catalonia (∼3%); none in Cantabria. Curiously, six out of the ten sub-Saharan haplotypes observed belong to haplogroup L1b; this clade originated in western Africa but it was also carried to America during the period of the Atlantic slave trade
Some typical Native American profiles were also observed in the Catalonian sample. For instance, the haplogroup D1 profile T16189C C16223T T16325C T16362C (excluding the ‘speedy’ transversion A16183C) is commonly found in South America
In Catalonia we have also observed one rare East Asian profile, C16104T C16111T T16140C A16162G 16169+C A16182C A16183C T16189C C16228T C16234T T16243C, belonging to B5b. Members of this haplogroup appear frequently in Japan, Taiwan, Korea, etc.
The low-frequency presence of non-western European lineages in Catalonia could be explained by recent gene flow, because this region is well known to have received a substantial influx of immigrants in recent decades, more so than Galicia and Cantabria.
Several diversity indices were computed for the three North Iberian samples analyzed in the present study (
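For readers unfamiliar with these indices, the sketch below shows one common way to compute haplotype (sequence) diversity, the mean number of pairwise differences, and nucleotide diversity from a set of aligned sequences. It is an illustrative Python implementation under simplifying assumptions (equal-length sequences, no gaps or ambiguous bases) and is not the code path of the software used in the study; the toy sequences are invented.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

def mean_pairwise_differences(seqs):
    """Average number of mismatching sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])

# Invented 4-bp alignment of four individuals, for illustration only.
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
print(haplotype_diversity(toy), mean_pairwise_differences(toy), nucleotide_diversity(toy))
```

Haplotype diversity depends only on how often each distinct sequence occurs, whereas nucleotide diversity also reflects how different the sequences are from each other, which is why the two indices can rank populations differently.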
The patterns of variability within haplogroup H are quite different around Europe and Middle East. For instance, Galicia shows one of the lowest sequence diversity values within Eurasia (
Both the Tajima's
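As background, Tajima's D contrasts the mean number of pairwise differences with the value expected from the number of segregating sites under neutrality and constant population size; strongly negative values are commonly read as a signal of population expansion. The following Python sketch implements the standard formula from Tajima (1989). The inputs in the example are invented, and whether the software used in the study counts segregating sites or total mutations is not stated here, so that choice is left to the caller.

```python
from math import sqrt

def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: difference between the two estimators of theta.
    # Denominator: estimated standard deviation of that difference.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Invented example: 4 sequences, 4 segregating sites, 2.0 mean pairwise differences.
print(round(tajimas_d(4, 4, 2.0), 3))
```

An excess of rare variants lowers the mean pairwise differences relative to the number of segregating sites and pushes D below zero.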
Using the SNP genotyping strategy described above, less than 10% of the lineages within haplogroup H could not be allocated to some of the already known H sub-branches (
The distribution of haplogroup frequencies along the North Iberian fringe shows moderately stratified patterns.
On average, ∼42% of the mtDNAs in the total sample belong to haplogroup H; the Galician sample reaches the highest frequency (∼44%), and it is slightly lower (∼39%) in Cantabria and Catalonia.
H* represents 15% and 10% of the total haplogroup H lineages in Catalonia and Galicia, respectively, but only 4% in Cantabria. H1a (without counting H1a1 and H1a2) represents 8% of the haplogroup H mtDNAs in Cantabria, but it makes up only 2% in Galicia and 0% in Catalonia. Again with respect to haplogroup H, H1 is also more frequent in Cantabria (46%) than in Galicia (38%) and Catalonia (36%), whereas haplogroup V has a clear peak in Cantabria, at ∼16% of the total R0 haplotypes (but only ∼9% in Galicia and ∼6% in Catalonia).
The maps of
Dots in the map of H* indicate the location of the populations used. Codes for populations are:
In addition, haplogroups H1, H3, and H5a display clinal patterns as determined by their spatial correlograms (
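To make the idea of a spatial correlogram concrete, the sketch below computes a spatial autocorrelation coefficient for one distance class; repeating it over successive distance classes yields a correlogram, and a clinal pattern typically shows positive values at short distances that decline to negative values at long distances. The text does not state which statistic was used, so Moran's I is assumed here as a common choice. The coordinates and frequencies are invented, and plain Euclidean distance stands in for geographic distance.

```python
from math import hypot

def distance_class_weights(coords, lo, hi):
    """Binary weights: 1 when two distinct sites are between lo (inclusive) and hi apart."""
    n = len(coords)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
                if lo <= d < hi:
                    w[i][j] = 1
    return w

def morans_i(values, weights):
    """Moran's I for site values under a given weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)

# Invented transect: haplogroup frequency rising steadily from site to site.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
freqs = [0.1, 0.2, 0.3, 0.4]
near = morans_i(freqs, distance_class_weights(coords, 0.5, 1.5))
far = morans_i(freqs, distance_class_weights(coords, 2.5, 3.5))
print(near, far)
```

In this toy gradient the nearest-neighbour class gives a positive coefficient and the most distant class a negative one, which is the signature usually described as clinal.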
It was first noticed in a study by Pereira et al.
A scrutiny of more than 5,500 coding region segments (most of them available in GenBank and some only in the literature) and in Google searches (
This analysis revealed a new sub-clade of haplogroup H, baptized here as H2a5. All these sequences share the following diagnostic variants: A1842G C4592T G13708A C16291T T16519C (
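Screening a collection of sequences for this motif amounts to a subset test, as in the following Python sketch. The diagnostic variants are those listed in the text; the sample profiles and function names are hypothetical, and a real classification would also need to confirm the upstream markers of the clade, which are not modelled here.

```python
# Diagnostic variants for H2a5 as listed in the text.
H2A5_MOTIF = frozenset("A1842G C4592T G13708A C16291T T16519C".split())

def carries_motif(sample_variants, motif=H2A5_MOTIF):
    """True when every diagnostic variant is present in the sample."""
    return motif <= set(sample_variants)

def missing_variants(sample_variants, motif=H2A5_MOTIF):
    """Diagnostic variants absent from the sample, sorted for stable output."""
    return sorted(motif - set(sample_variants))

# Hypothetical samples: the first carries the full motif plus a private variant,
# the second lacks two of the diagnostic variants.
sample_a = ["A1842G", "C4592T", "G13708A", "C16291T", "T16519C", "T152C"]
sample_b = ["A1842G", "G13708A", "T16519C"]

print(carries_motif(sample_a), carries_motif(sample_b), missing_variants(sample_b))
```

Reporting the missing variants, rather than only a yes/no answer, helps distinguish partial coding region segments from sequences that truly lack the motif.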
Analysis of mtDNA variation based exclusively on a few RFLP markers and/or the HVS-I region has led in the past to simplistic perceptions of Europe as a uniform population. The results presented in previous studies
A total of 518 samples from three main locations in North Iberia were sequenced for the HVS-I segment. About 55% of them could be ascribed to R0. All these samples were further screened for a set of 71 coding region SNPs in order to sub-classify them into different R0 sub-clades. As indicated by the various diversity indices computed, Galicia and Cantabria show low diversity values, especially for the overall haplogroup H. The present study also revealed moderate levels of stratification in North Iberia, which could be relevant in other fields of research, such as in forensic casework
When compared to other European and Middle East populations, we observed geographical patterns for H1, H3 and H5a that are statistically clinal, with frequency peaks in the Franco-Cantabrian region decreasing towards East Europe. This is compatible with a process of demographic repopulation of Europe after the LGM period centered in this climatic and geographic refuge, as it was previously demonstrated by Torroni et al.
We have also described a new minor autochthonous clade in Basques, H2a5. This lineage has been dated to 15.6±8 kya; this age also fits with the period of population expansion that followed the LGM (although with a large standard error). However, this branch was found exclusively in the Basque country, at a significant frequency (∼6%). The absence of this clade in other parts of Europe could be due to the limited sample size still available in the literature; however, we can speculate that all the evidence taken together resembles the findings of Torroni et al.
Genotyping protocols.
(0.27 MB DOC)
Reconciliation of the nomenclature conflicts in haplogroup R0.
(0.14 MB DOC)
Note about the advantages of using minisequencing high throughput SNP genotyping and report of the phylogenetic inconsistencies observed in the data from North Iberia.
(0.04 MB DOC)
Compendium of the problems related to the nomenclature of the R0 phylogeny and update of the nomenclature.
(0.05 MB XLS)
HVS-I and coding region SNP variation for the Iberian samples analyzed in the present study.
(0.71 MB XLS)
Comparative population frequencies of different haplogroup H (sub)lineages. In bold we collapse frequencies into higher hierarchical phylogenetic clades as a function of the SNPs genotyped in the referred studies, such that only these ‘bolded’ categories are fully comparable between the different studies considered. This is because haplogroup categories are not fully comparable among populations when the samples have undetermined (nd) SNPs; for instance, H* embraces different lineages in our study because we genotyped to a higher level of resolution than previous attempts (where different lineages were already collapsed into H*). For nomenclature we follow the scheme of
(0.08 MB XLS)
Autocorrelograms for the most frequent R0 sub-clades observed in North Iberia.
(0.12 MB PPT)
We would like to thank Francesc Calafell and Oscar Lao for their help with the Surfer software and the spatial representations. The eight complete mtDNA genomes analyzed in the present study are available in GenBank under accession numbers FJ527772–FJ527779.