Uniparental Markers of Contemporary Italian Population Reveals Details on Its Pre-Roman Heritage

Background: According to archaeological records and historical documentation, Italy has been a melting pot for populations of different geographical and ethnic matrices. Although Italy has been a favorite subject for numerous population genetic studies, genetic patterns have never been analyzed comprehensively, including uniparental and autosomal markers throughout the country. Methods/Principal Findings: A total of 583 individuals were sampled from across the Italian Peninsula, from ten distant (if homogeneous by language) ethnic communities, and from two linguistic isolates (Ladins, Grecani Salentini). All samples were first typed for the mitochondrial DNA (mtDNA) control region and selected coding region SNPs (mtSNPs). This data was pooled for analysis with 3,778 mtDNA control-region profiles collected from the literature. Secondly, a set of Y-chromosome SNPs and STRs was also analyzed in 479 individuals together with a panel of autosomal ancestry informative markers (AIMs) from 441 samples. The resulting genetic record reveals clines of genetic frequencies laid according to the latitude slant along continental Italy, probably generated by demographical events dating back to the Neolithic. The Ladins showed distinctive, if more recent, structure. The Neolithic contribution was estimated for the Y-chromosome as 14.5% and for mtDNA as 10.5%. Y-chromosome data showed larger differentiation between North, Center and South than mtDNA. AIMs detected a minor sub-Saharan component; this is however higher than for other European non-Mediterranean populations. The same signal of sub-Saharan heritage was also evident in uniparental markers. Conclusions/Significance: Italy shows patterns of molecular variation mirroring other European countries, although some heterogeneity exists based on different analyses and molecular markers. From North to South, Italy shows clinal patterns that were most likely modulated during Neolithic times.


Introduction
Italy has historically been a convenient destination for human populations migrating from Africa, the Middle East and European locations, in part due to the geomorphological characteristics of the Italian Peninsula [1]. These groups settled preferentially on the islands and coastal territories [1] 500,000 years ago (ya), that is, during the Lower Paleolithic, the longest period of human prehistory, which was dominated by the notable diffusion of tools made from flaked stone [2]. Although rich in tools and animal bones, only some of these sites have provided a small quantity of human skeletal remains resembling those from the more recent sites of the Middle Paleolithic, dating to the Riss-Würm interglacial period and part of the succeeding Würm glaciation (circa 120,000 to 36,000 ya). These bones belong to a species named Homo sapiens neanderthalensis [2]. In this long Paleolithic period, navigation across the Mediterranean was probably rare and some present-day islands were accessible across land bridges later covered by the rising sea [3]. During the Upper Paleolithic, from 36,000 to 10,000 ya, the icecap expansion of the Last Glacial Maximum (LGM) pushed groups of hunters living in Central European areas southward [1], and the Neanderthals gave way to the present human species, Homo sapiens sapiens, during the final phases of the Würm glaciation. The numerous traces from this period are particularly rich in burials, animal bones and tools, the latter having been worked with increased precision [2]. In the few thousand years of the following Mesolithic period (circa 10,000 to 6,000 ya) the climate continued to grow milder, and sites from this period have been found throughout the entire Italian peninsula: along the coasts, in the plains and in the mountains. Sardinia experienced the flourishing of a non-Indo-European Nuragic civilization and, then, the Phoenician colonization.
Genetics alone cannot disentangle the extremely complex demography of Italy through history. Some demographic movements have however left signals on uniparental and nuclear markers. Most of the genetic studies targeted local, e.g. [7], or regional, e.g. [8][9][10][11], Italian populations.
For the Y-chromosome, some attempts have been made to analyze Italian variation on a more general scale [12][13][14]. Many studies have analyzed specific haplogroups in the Y-chromosome, e.g. [15,16], or the mtDNA, e.g. [8,9]. In general, the different studies indicate that the genetic structure of the present Italian population seems to reflect, at least in part, the ethnic stratification of pre-Roman times [14]. Studies carried out in the past appear to show a major North-South cline consistent with archaeological estimates of two distinct processes: the first colonization of the area during the Paleolithic period and the subsequent Neolithic expansion from the Middle East after the last glacial period [14]. There is some correspondence between patterns of variation at the Y-chromosome and geography. Thus, northern Italy shows haplogroup frequencies similar to those of Central Europe, with prevalence of the western R1-M173 haplogroup compared to the eastern I-M170. In the North, E3b1-M35 and J2-M172 show low frequencies but are more prevalent in the South, which has been interpreted to be a signal of the gene flow coming from Central European Neolithic farmers [17]. R1a1-M17 is rather rare, both in the North, where it probably originates from eastern Europe, and in the South, where it is possibly of Greek provenance [17]. Occurrence of J2-M172 Y-chromosomes in Tuscany has been related to the Etruscan heritage of the region (see [17]). The two major Italian islands, Sicily and Sardinia, show a different demographic history. The Y-chromosome variability of Sicily shares a common history with that of southern Italy, enriched by an additional Arab contribution, but also North African and Greek influences [18].
On the other hand, Sardinia has been considered to be a genetic outlier within Europe showing clear signals of founder effects; some scholars suggest that its peoples could be of ancient Iberian origin [19]; recent genetic studies point to genetic contribution coming from southern France [20].
On the other hand, mitochondrial DNA studies show that Italy does not differ too much from other European populations; however, some populations have some peculiarities and preserve signals of ancient demographic events, such as the Tuscans [8,9], or the Ladins [7,21,22]. Recently, patterns of variation observed in haplogroup U5b3 demonstrated for the first time the existence of a North Italian prehistoric human refuge from the hostile Central European regions covered by the ice of the Last Glacial Maximum period [20]; this area, like the Franco-Cantabrian region [23][24][25][26], served as a region of European repopulation during the beginning of the Holocene.
The main aim of the present study was to analyze comprehensively the patterns of mtDNA and Y-chromosome variation in Italy. This study differs from previous ones in that: (1) it provides mtDNA data from 12 new sample populations from Italy; (2) we analyzed two linguistic isolates, Ladin and Grecani Salentini, the latter sampled for the first time in this study; (3) we analyzed a sample population from Lucera (Southern Italy) for the first time, a population that according to documentation received an important input of North African immigrants during the thirteenth century; (4) we analyzed the patterns of mtDNA variation in Italy globally, that is, by combining more than 3,700 control region profiles from the literature (41 population samples in total) coupled with the more than 580 new profiles provided here; (5) Y-chromosome haplotype and haplogroup patterns are analyzed in parallel with the mtDNA data in order to determine the possible differences that occurred historically in the male versus female demographic movements; and (6) the influx of migrants from Africa (North and sub-Saharan) and other regions is also analyzed using phylogeographic inferences, together with a model of admixture based on haplotypic data and a panel of ancestry informative markers (AIMs).

Ethics statement
Written informed consent was obtained from all sample donors. Analysis of mtDNA sequences was approved by the institutional review boards of the Università Cattolica del Sacro Cuore (Roma). Moreover, the study conforms to the Spanish Law for Biomedical Research (Law 14/2007 of 3 July).

Samples
A total of 583 individuals were sampled from along the Italian Peninsula, representing 12 different populations (Figure 1), two of them (Ladin and Grecani Salentini) being linguistic isolates, and the Lucera being a historical enclave of Arabs coming from North Africa. A brief description of these latter three populations is given below.
In the Italian territory, the Alpine arc represents one of the main areas of presence of alloglot populations, some of them biologically isolated for historical and geographic reasons [27]. At the end of the medieval period (~1200 AD) and especially in the valley zone, a first colonization of native peasants began, starting with the use of lands previously exploited only for pasture and lumber. Subsequently, with different modalities and under the control of lay and ecclesiastical owners, the colonization process involved migrant nuclei from the Tyrol, the Carinthian area and other zones [28]. Currently, the Alpine arc populations show a remarkable cultural diversity that is well represented by linguistic elements. Thus, besides the official main languages, numerous minority languages or dialects are also the cultural patrimony of linguistic minorities [27,29]. Ladin is often regarded as a relic of Vulgar Latin dialects associated with Rhaeto-Romance languages. In the vast multi-ethnic Holy Roman Empire, and then after 1804 the Austrian Empire, the Ladins were left in relative peace and were allowed to continue the use of their language and culture.
Grecani Salentini is a Hellenic-speaking linguistic island of Salento, situated in southern Puglia, and consisting of nine municipalities in which a neo-Greek dialect, also known as Grecanic or Griko, is spoken. The origins of this linguistic island in Salentine Greece are uncertain. The German linguist G. Rohlfs proposed its origin in the Magna Graecia region, while O. Parlangeli suggested a Byzantine derivation of the Griki of Salento. Greek researchers (e.g. A. Karanastasis) claim the input of Byzantine elements into the pre-existing Magna Graecia matrix. The Greek arrival in the Salentine Peninsula occurred both during the Magna Graecia period and during the later Byzantine domination. The numerous villages of Grecani Salentini had a Greek culture and language and practiced the Greek Orthodox religion. From the beginning of the Norman conquest (eleventh century), and more intensively with the arrival of different casati (clans) (Swabian, Angevin, Aragonese, etc.), the Catholic clergy supplanted those of the Orthodox faith [30].
The Lucera population has received an important influx from North African Arab peoples (see [31]). Thus, after the collapse of the Roman Empire in Europe, the Arab domination spread into the Mediterranean Basin. Referred to either as Moors in Iberia or Saracens in Southern Italy and Sicily, Arabs arrived in Europe in 711 AD, and in 831 AD Iberia and Sicily were almost completely subjected to Arab domination [31]. In the thirteenth century, Frederick II moved the Sicilian Arabs to the city of Lucera (North Apulia) [32]. This sample was genotyped for STRs and Y-chromosome SNPs in Capelli et al. [31]. To the best of our knowledge, none of the individuals collected in the present study were closely related, either maternally or paternally; they had different surnames and all the donors traced their ancestry back at least two generations in the region where the samples were collected.
All the samples were analyzed for the control region and selected mtSNPs (see below). A subset of the samples comprised unrelated males (n = 292) representing seven different populations. These samples were genotyped for a panel of 17 Y-chromosome SNPs (see below), and were previously genotyped for the Yfiler [33]. In addition, autosomal ancestry informative markers (AIMs) were genotyped in 441 individuals (see below).

DNA extraction
DNA extraction was performed with a salting-out method [34], modified and re-adapted to buccal cells. Swabs were incubated in 500 µl of 0.2 M sodium acetate, 35 µl of 10% SDS and 20 µl of 20 mg/ml Proteinase K for 16 hours at 56 °C. They were then removed and 500 µl of 3 M NaCl solution was added. Proteins were removed by centrifugation, and the DNA was precipitated by adding 1 ml of 100% ethanol at -20 °C for a few hours. After centrifugation, the DNA pellet was washed twice with 70% ethanol, dried and re-suspended in water. For the blood samples, aliquots of 500 µl each were thawed and red cells selectively lysed with a 1× lysis buffer. After three washes with the lysis buffer, white cells were pelleted and the DNA extracted using the salting-out protocol. All the samples were quantified by direct comparison with standards on 1% agarose minigels (1 g of agarose in 100 ml of 1× TBE, from a 1:10 dilution of 10× TBE).

PCR and mtDNA control region sequencing
mtDNA was sequenced for the complete control region, from position 16024 (in HVS-I) to 569 (in HVS-II). The first and second hypervariable regions (HVS-I/II) were amplified via the polymerase chain reaction (PCR) using primers reported by Álvarez-Iglesias et al. [35].
PCR was carried out in a 25 µl reaction mix with 1× reaction buffer (20 mM Tris-HCl, pH 8.0, 0.1 mM EDTA, 1 mM DTT, 50% (v/v) glycerol), 1.5 mM MgCl2, 200 µM each dNTP, 0.4 µM each primer and 2.5 U (units) of Taq polymerase (Taq DNA Polymerase, recombinant; Invitrogen Corporation); 0.1-1 ng of DNA template was added to the reaction mixture. Amplification was carried out in a GeneAmp® PCR System 9700 (Applied Biosystems, Foster City, California, U.S.A.) using a hot start at 95 °C for 1 min, followed by 36 cycles at 95 °C for 30 sec, 55 °C for 60 sec, and 72 °C for 30 sec, and a final extension at 72 °C for 15 min. Before the sequencing reaction, PCR products were checked by electrophoresis in a non-denaturing polyacrylamide gel (T9, C5), and the gel was subsequently stained with silver nitrate. PCR products were then purified with a MultiScreen® PCRµ96 Plate (Millipore, Bedford, MA 01730, U.S.A.), a 96-well device. The vacuum-based, size-exclusion separation effectively and quickly removed contaminating salts, unincorporated dNTPs and primers from the PCR reactions. Cycle sequencing was performed on both strands in a GeneAmp® PCR System 9700 (AB) thermal cycler using the ABI Prism® dRhodamine Terminator Cycle Sequencing Ready Reaction Kit (AB). This kit consists of a reaction mix composed of a modified thermostable DNA polymerase, Tris-HCl buffer (pH 9.0), MgCl2, dNTPs and dichlororhodamine-labeled ddNTPs. An aliquot of 30 ng of amplicon and 3.2 pM primers were added to a 2 µl reaction mix. Sequencing was carried out using a hot start at 96 °C for 4 min, followed by 36 cycles at 96 °C for 15 sec, 50 °C for 10 sec and 60 °C for 2 sec, and a final extension at 60 °C for 10 min. The removal of excess dideoxy terminators, primers and buffer was accomplished with an alcoholic purification.
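As a rough check on the amplification profile described above, the programmed duration of the control-region PCR can be worked out from the step times. The sketch below is only illustrative: the function name is hypothetical, and ramping time between temperatures is ignored, so the real run on the thermal cycler would take longer.

```python
def programmed_seconds(initial_hold, cycles, cycle_steps, final_hold):
    """Sum of the programmed hold times of a PCR profile, in seconds.

    Ramp times between temperatures are deliberately ignored (assumption).
    """
    return initial_hold + cycles * sum(cycle_steps) + final_hold


# Control-region PCR as stated in the text:
# 95 C for 1 min; 36 cycles of 30 s, 60 s, 30 s; final extension of 15 min.
control_region_pcr = programmed_seconds(
    initial_hold=60,
    cycles=36,
    cycle_steps=[30, 60, 30],
    final_hold=15 * 60,
)

print(control_region_pcr / 60, "minutes of programmed holds")
```

The cycling block dominates the total, which is why the number of cycles is the first parameter to revisit when shortening such a protocol.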
The sequence products were denatured with deionized formamide and analyzed by capillary electrophoresis on an ABI PRISM® 3130 Genetic Analyzer (AB). The resulting data were analyzed with PE/ABD software Sequencing Analysis 5.2 and sequences were aligned and compared with the Cambridge sequence [36] from position 16024 to 16569 for HVS-I and from position 1 to 600 for HVS-II by the SeqScape v.2.0 (AB).
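The comparison against a reference sequence yields variant labels of the form reference base, position, observed base (for example G16129A). A minimal sketch of that bookkeeping is given below; it is not the SeqScape procedure, it assumes the two sequences are already aligned without gaps, it handles substitutions only, and the 10-base reference used in the example is a toy string rather than the real reference sequence.

```python
def call_substitutions(reference, sample, start_position):
    """Return substitutions such as 'A16129G' for two aligned, gap-free sequences.

    start_position is the reference coordinate of the first base.
    Ambiguous calls ('N') in the sample are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if obs_base != ref_base and obs_base != "N":
            variants.append(f"{ref_base}{start_position + offset}{obs_base}")
    return variants


# Toy data (hypothetical, NOT the real reference sequence).
toy_reference = "ACGTACGTAC"
toy_sample = "ACGTGCGTAT"
print(call_substitutions(toy_reference, toy_sample, start_position=16125))
```

Insertions, deletions and length heteroplasmy need an explicit alignment step and are outside the scope of this sketch.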

Analysis of mtDNA coding region SNPs
Biallelic markers were genotyped using a multiplex approach [37]. The selected SNPs were combined into two multiplex reactions. Multiplex 1 included a selection of SNPs defining common European haplogroups [38]. Multiplex 2 included exclusively polymorphisms defining sub-lineages inside haplogroup H. Primers were designed in order to adjust the annealing temperatures and amplicon lengths to allow analysis in multiplex reactions [37]. The sizes of the PCR products ranged from 80 to 224 bp.
Both multiplexes were performed using 10 ng of DNA template in a 25 µl reaction volume comprising 1× Taq Gold Buffer (AB), 200 µM of each dNTP, 2 mM MgCl2 and 0.5 U of AmpliTaq Gold Polymerase (AB). For the primer concentrations, see [37].
Amplification was carried out using a GeneAmp® PCR System 9700 (AB) thermocycler. After a 95 °C pre-incubation step for 11 min, PCR was performed for a total of 32 cycles using the following conditions: 94 °C denaturation for 30 sec, annealing at 60 °C for 30 sec and extension at 72 °C for 1 min, followed by a 15 min final extension at 72 °C. PCR products were checked by polyacrylamide gel electrophoresis (T9, C5) visualized by silver staining.
After amplification, PCR products required purification to remove primers and unincorporated dNTPs. Post-PCR purification was performed with ExoSAP-IT (Amersham Pharmacia Biotech): 1 µl of PCR product was incubated with 0.5 µl of ExoSAP-IT for 15 min at 37 °C followed by 15 min at 80 °C for enzyme inactivation. The minisequencing reaction was performed in a GeneAmp® PCR System 9700 (AB) thermocycler following the recommendations of the manufacturer: 2 µl of SNaPshot ready reaction mix, 0.2 µM of extension primer for each SNP (see [37]) and 1 µl of both purified PCR products in a total volume of 7 µl. The reaction mixture was subjected to 25 single-base extension cycles of denaturation at 96 °C for 10 sec, annealing at 50 °C for 5 sec and extension at 60 °C for 30 sec. After the minisequencing reactions, a post-extension treatment to remove the 5'-phosphoryl group of the ddNTPs helped to prevent co-migration of unincorporated ddNTPs with extended primers and the production of a high background signal. The final volume (7 µl) was treated with 0.7 µl of SAP (Amersham Biosciences) for 60 min at 37 °C, followed by 15 min at 80 °C for enzyme inactivation.
Figure 1 (caption fragment; refers to Table 1). Pie charts on the left display the distribution of mtDNA haplogroup frequencies, and those on the right the Y-chromosome haplogroup frequencies. doi:10.1371/journal.pone.0050794.g001
The minisequencing products (1.5 µl) were mixed with 10 µl of Hi-Di™ formamide and 0.2 µl of GeneScan-120 LIZ size standard (AB), and electrophoresis was performed on an ABI PRISM® 3130 Genetic Analyzer (AB). The resulting data was analyzed with GeneMapper ID.

Genotyping of Y-SNPs
Biallelic markers were genotyped using a multiplex approach [39]. A set of 30 SNPs was tested, allowing assignment of the analyzed Y-chromosomes to haplogroups (Hg), following the nomenclature and the phylogenetic relationships defined by the Y Chromosome Consortium [40]. The selected method for allele discrimination was a single base extension reaction using the SNaPshot multiplex kit (AB). We added the M269 marker to the first of the four multiplexes, in order to better dissect the sub-haplogroup R1b (R1b3). The primers of this marker were M269-F 5'-TCA TGC CTA GCC TCA TTC CT-3' and M269-R 5'-TCT TTT GTG TGC CTT CTG AGG-3', and the minisequencing primer 5'-GGA ATG ATC AGG GTT TGG TTA AT-3'.
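Basic properties of the M269 oligonucleotides listed above (length, GC content, reverse complement) can be tabulated with a few lines of code. The sketch below also reports a melting temperature from the Wallace rule, 2(A+T) + 4(G+C); this rule of thumb is introduced here only as an illustration and is not stated to be the method used for primer design in this study. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def clean(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule; short oligos only)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primers = {
    "M269-F": clean("TCA TGC CTA GCC TCA TTC CT"),
    "M269-R": clean("TCT TTT GTG TGC CTT CTG AGG"),
    "M269-miniseq": clean("GGA ATG ATC AGG GTT TGG TTA AT"),
}

for name, seq in primers.items():
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

More accurate estimates would use nearest-neighbor thermodynamics and the actual salt and primer concentrations of the reaction.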

Genotyping of AIMs
A panel of 52 AIMs was genotyped according to Sánchez et al. [41] in a subset of 441 individuals. Several other population datasets were used for inter-population comparisons. This data corresponded to the CEPH panel (http://www.cephb.fr/en/cephdb/) as reported in HapMap (http://hapmap.ncbi.nlm.nih.gov/) and was collected using the data-mining tool SPSmart [42,43]; it includes population samples from all over the world (Africa, Europe, Asia, etc.); see legend of Figure 2 for more information.

Statistical analysis
A total of 42 Italian population samples were analyzed for mtDNA in the present study. Comparative inter-population analyses were also carried out for the HVS-I segment ranging from 16024 to 16365, since this is the analyzed segment common to all of them. Haplotype (H) and nucleotide diversity (π) and other diversity indices [44][45][46] were computed using DnaSP 4.10.3 software [47]. Problematic variation located around 16189, usually associated with length heteroplasmy, e.g. 16182C or 16183C, was ignored. Analysis of molecular variance (AMOVA) was carried out using Arlequin 3.5 [48]. Nomenclature of mtDNA lineages followed previous studies, e.g. [23,25,38,49,50]; see Phylotree for a compilation of the worldwide phylogeny and an update of the nomenclature based on entire mtDNA genomes [51]. Genotyping and documentation errors were monitored following the phylogenetic principles previously applied, e.g. [52][53][54][55][56][57][58][59].
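The two main diversity indices can be illustrated with a short sketch. It is not the DnaSP implementation: it assumes each profile is a list of variants written as position plus observed base (e.g. 16129A), restricts them to the shared 16024-16365 range, drops 16182C and 16183C as described above, and uses the standard textbook estimators, H = n/(n-1) × (1 - Σ p_i²) and π = mean pairwise differences / segment length. The example profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations

HVS1_START, HVS1_END = 16024, 16365
IGNORED = {"16182C", "16183C"}  # variants around 16189 named in the text


def clean_profile(variants):
    """Keep in-range variants, drop ignored ones, return a hashable haplotype."""
    kept = set()
    for variant in variants:
        position, base = int(variant[:-1]), variant[-1]
        if HVS1_START <= position <= HVS1_END and variant not in IGNORED:
            kept.add((position, base))
    return frozenset(kept)


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    squared = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    return n / (n - 1) * (1 - squared)


def pairwise_differences(a, b):
    """Number of positions at which two haplotypes differ."""
    return len({position for position, _ in a ^ b})


def mean_pairwise_differences(haplotypes):
    pairs = list(combinations(haplotypes, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def nucleotide_diversity(haplotypes, length=HVS1_END - HVS1_START + 1):
    """Mean pairwise differences per site over the analyzed segment."""
    return mean_pairwise_differences(haplotypes) / length


# Hypothetical profiles, for illustration only.
raw_profiles = [["16129A", "16183C"], ["16129A"], ["16192T", "16270G"], ["16400T"]]
sample = [clean_profile(p) for p in raw_profiles]
print(haplotype_diversity(sample), nucleotide_diversity(sample))
```

Because the average number of nucleotide differences is simply π multiplied by the segment length, the two quantities are necessarily correlated across populations.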
Mitochondrial DNA and Y-chromosome data was collected from the literature. The mtDNA data generated in the present study was analyzed together with 3,834 mtDNA HVS-I Italian profiles collected from the literature (Table S2; 76 sample populations). The Y-SNPs were analyzed together with 1,251 Italian profiles reported in the literature (16 population samples). A full list of references for all the data used in the present study is given in Table S2.
Haplogroup frequencies were estimated by chromosome counting. Statistical differences in haplogroup frequencies were evaluated using Pearson's chi-square test, setting the nominal significance level α at 0.05.
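Frequency estimation by counting and the comparison of one haplogroup between two samples can be sketched as follows. This is an illustration under stated assumptions: the comparison is treated as a 2×2 table (carriers versus non-carriers), no continuity correction is applied, and the P-value uses the closed form for one degree of freedom, P = erfc(√(χ²/2)). The counts in the example are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def haplogroup_frequencies(assignments):
    """Relative frequency of each haplogroup, estimated by direct counting."""
    total = len(assignments)
    return {hg: count / total for hg, count in Counter(assignments).items()}


def chi_square_2x2(carriers_a, total_a, carriers_b, total_b):
    """Pearson's chi-square (1 df, uncorrected) for one haplogroup in two samples.

    Returns (chi2, p_value). Expected counts must all be non-zero.
    """
    observed = [
        [carriers_a, total_a - carriers_a],
        [carriers_b, total_b - carriers_b],
    ]
    row_totals = [total_a, total_b]
    grand_total = total_a + total_b
    col_totals = [carriers_a + carriers_b, grand_total - carriers_a - carriers_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 20 of 40 chromosomes in sample A vs 10 of 40 in sample B.
chi2, p = chi_square_2x2(20, 40, 10, 40)
print(round(chi2, 3), p < 0.05)
```

When many haplogroups and region pairs are tested, the resulting P-values are unadjusted, and a reader may wish to bear the number of comparisons in mind.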
Finally, classification of mtDNA sequences into haplogroups was performed following phylogenetic criteria (Phylotree Build 14, http://www.phylotree.org/) and using both the control region sequence profile and mtSNPs.

Molecular diversity of mtDNA and Y-chromosome Italian profiles
Diversity indices were computed for all the populations analyzed in the present study and also for those Italian population samples reported in the literature (Tables 1 and 2). Population samples were also grouped into main regions (North, Central, South, West, and East) in order to investigate the role of geography in the distribution of mtDNA variation.
Mitochondrial DNA haplotypes for the samples analyzed in the present study are reported in Table S3. Table 1 shows the molecular diversity values based on mtDNA data for 41 Italian population samples. The values indicate that the Isle of Elba is, by far, the Italian population sample that shows the lowest diversity for all the indices computed, probably as a consequence of its relative isolation from the country. It has been reported that this was a well-known enclave of Etruscan influence, and some mtDNA particularities have been described before [8,9]. Alternatively, low molecular diversity could be due to low sample sizes, although this fact is mirrored in the standard deviation of the different estimates. Excluding the Isle of Elba, haplotype diversity in Italy ranges from 0.834 to 1, nucleotide diversity from 0.01003 to 0.02409, and the average value of nucleotide differences from 3.4 to 8.19 (a value that is correlated with the nucleotide diversity). In general, Italy shows some level of heterogeneity when examined for diversity values.
When grouping populations by main geographical regions, it can be observed that Central Italy has slightly lower values than North and South Italy for all the indices computed (Table 1). The highest diversity values were found in South Italy. Diversity values are however very similar when examining populations located in West Italy versus those in the East. The inclusion of Sicily (as part of South Italy) in the computation does not substantially change these estimates (Table 1).
Y-SNP data were obtained for all the samples analyzed in the present study (Table S4). Table 2 shows the diversity indices for the Y-SNPs in different Italian populations. The Y-STR diversity values for the samples analyzed in the present study and other Italian and European samples have already been reported in Brisighelli et al. [33]. As expected, diversity values of Y-SNP haplogroup patterns are lower than those obtained for the mtDNA haplotypes given that the indices are based on haplogroup and not on Y-STR haplotypes. In fact, values based on Y-STR profiles (minimum or extended Yfiler profiles) [33]
Mitochondrial DNA haplotypes of African origin are mainly represented by haplogroups M1 (0.3%), U6 (0.8%) and L (1.2%); from here onwards, L will be used to refer to all mtDNA lineages, excluding the non-African branches N and M [60,61].
A total of 282 Y-chromosomes were analyzed for a set of Y-SNPs and were classified into 22 different haplogroups (Figure 3). Two haplogroups were not found, even though markers defining these clades were tested: N3 and R1a1. Five haplogroups represented 76.71% of the total chromosomes: R1b3, J2, I(xI1b2), E3b1 and G. The frequencies averaged across populations were 26%, 21.2%, 10.2%, 9.9% and 9.2%, respectively. The remaining haplogroups sum to 23.2% in the total sample, and never above 4% in single population samples.
R1b3 frequency was found to be higher in the northern part of the country, while the frequencies of the Y-chromosome haplogroups G and E3b1, and J2 and I(xI1b2), were higher in the south and in the central part of the country, respectively (Figure 1).
Regional differences are substantially higher in the Y-chromosome than in the mtDNA. Thus, for instance, haplogroup R in the Y-chromosome was 54% in the North, 18% in the Center, and 31% in the South. Frequency differences were statistically significant between North vs Center (Pearson's chi-square, unadjusted P-value = 0.0014), and North vs South (Pearson's chi-square, unadjusted P-value < 0.00004). Haplogroup J2 also revealed important regional differences; it accounted for 9% in the North, 37% in the Center, and 22% in the South, with statistically significant differences between North vs Center (Pearson's chi-square, unadjusted P-value < 0.00002), North vs South (Pearson's chi-square, unadjusted P-value < 0.00148), and, at the limit of significance, Center vs South (Pearson's chi-square, unadjusted P-value < 0.049).

Autosomal ancestry in Italy
A panel of 52 AIMs was genotyped in 435 Italian individuals in order to estimate the proportion of ancestry from a three-way differentiation: sub-Saharan Africa, Europe and Asia. Structure analyses allowed us to infer membership proportions in population samples, and these proportions can be graphically displayed, as in Figure 2. This analysis indicated that Italians have a basal proportion of sub-Saharan ancestry that is higher (9.2%, on average) than other central or northern European populations (1.5%, on average). The amount of African ancestry in Italians is however more comparable to (but slightly higher than) the average in other Mediterranean countries (7.1%). Figure 2 shows in a triangle plot the relationships of Italians compared to other European, African and Asian populations.
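Population-level ancestry figures such as those quoted above are averages of per-individual membership proportions. The sketch below shows that averaging step only; it does not reproduce the Structure analysis itself, the input layout (a population label plus three proportions per individual, ordered as sub-Saharan Africa, Europe, Asia) is an assumption, and the values are hypothetical.

```python
def mean_membership(rows):
    """Average per-individual membership proportions within each population.

    rows: iterable of (population_label, [q_africa, q_europe, q_asia]).
    Each individual's proportions are expected to sum to 1.
    """
    sums, counts = {}, {}
    for label, proportions in rows:
        if abs(sum(proportions) - 1.0) > 1e-6:
            raise ValueError(f"proportions for {label} do not sum to 1")
        running = sums.setdefault(label, [0.0] * len(proportions))
        for k, value in enumerate(proportions):
            running[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in running]
        for label, running in sums.items()
    }


# Hypothetical individuals, for illustration only.
toy_rows = [
    ("PopA", [0.10, 0.80, 0.10]),
    ("PopA", [0.30, 0.60, 0.10]),
    ("PopB", [0.00, 1.00, 0.00]),
]
print(mean_membership(toy_rows))
```

An average of this kind hides the spread among individuals, so it is best read alongside the individual-level triangle plot.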
PCA observations confirmed the results from Structure analysis, clustering Italian profiles tightly with other European ones. Thus, PCA indicated that North, Central and South Italy do not show differences between them, nor from other European populations (Figure 2). PCA also indicated clear-cut differences between Italians, Africans and Asians (Figure 2).

AMOVA
AMOVA analyses were carried out following different grouping schemes. The samples were pooled into a single population, but also grouped by main Italian regions. Analyses were carried out over haplogroups and haplotypes of the Y-chromosome and the mtDNA (Table 3).
AMOVA indicated that, among populations, variance was more strongly stratified for the Y-chromosome than for the mtDNA (Table 3). Again, the Y-chromosome showed slightly higher values of among-population variance than did the mtDNA. For the Y-chromosome, a significant proportion of the within-population variance moved to among-population within-groups variance, probably due to the fact that all population samples had a very high proportion of singleton Yfiler haplotypes, elevating the maximum values of haplogroup diversity for all of them [33].
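The among-population variance component reported by AMOVA can be illustrated with a simplified, single-level sketch. It is not the Arlequin implementation and omits the regional grouping level and the permutation test: populations are compared directly, the number of pairwise differences between equal-length sequences is used as the squared distance, and the output is the proportion of variance among populations (Φ_ST). The sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_sum(items, distance):
    """Sum of distances over all unordered pairs."""
    return sum(distance(a, b) for a, b in combinations(items, 2))


def phi_st(populations, distance=hamming):
    """Single-level AMOVA: share of variance found among populations."""
    pooled = [h for population in populations for h in population]
    n_total, n_pops = len(pooled), len(populations)

    # Sums of squared deviations from pairwise distances.
    ssd_total = pairwise_sum(pooled, distance) / n_total
    ssd_within = sum(pairwise_sum(p, distance) / len(p) for p in populations)
    ssd_among = ssd_total - ssd_within

    # Mean squares and variance components.
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    n0 = (n_total - sum(len(p) ** 2 for p in populations) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)


# Two hypothetical populations fixed for different haplotypes: all variance is among.
print(phi_st([["AAAA", "AAAA"], ["AAAT", "AAAT"]]))
```

With small samples the estimate can be negative, which is usually read as no detectable differentiation among populations.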

Linguistic isolates: Ladin and Grecani Salentini
Two linguistic isolates are represented in the samples analyzed in the present study: the Ladin and the Grecani Salentini.
Other population samples of the Ladin have already been analyzed in the literature [22,62,63]. We here sampled 41 new individuals from the locality of Val Badia. As reported in Table 4 for the mtDNA, Val Badia Ladins showed relatively high nucleotide diversity patterns compared to other Ladin populations, but intermediate haplotype diversity values. Compared to other Italian populations, diversity in Ladin populations is generally lower (Table 1). For Y-chromosome haplogroups, the differences between Ladin and the rest of Italy were more evident, with the Ladin showing much lower values than average Italians. The differences between Ladin and other populations were more evident when examining haplogroup frequency patterns (Figure 4). The frequency of haplogroup H (58%) was above the frequency of H in North Italy (55%), and was extremely high compared to the average for Italy (38%) (Pearson's chi-square test, P-value = 0.0005). While haplogroup U was found to have approximately the same frequency as other Italian populations, haplogroup T was 5% compared to 12% in Italy generally (7% in the North). Other differences were apparent, but sample sizes were too low to yield statistically significant differences.
Table 3. AMOVA analysis of main Italian regions (permutations: 20,000; P-value < 0.0000) for the mtDNA control region data and the Y-chromosome STRs and SNPs.
Differences are more important when examining Y-chromosome haplogroup frequencies. R1b3 reached 52% in Ladin populations but only 31% in the general population, and also in the North (Pearson's Chi-square test, P-value = 0.0087); Figure 4. More remarkable are the differences when considering the remaining R1b lineages, that is, R1b(xR1b3), which account for 15% of the lineages in Ladins, but only for 1% in the general population (Pearson's Chi-square test, P-value = 0.0001). Other haplogroups showed substantial haplogroup differences (e.g. J2) but the sample size was again too small.
Due to the availability of data for mtDNA in several Ladin communities, we were able to carry out an AMOVA analysis in order to investigate the level of population stratification in these communities. The data indicated that among-population variance is 1.09%, a value that is therefore higher than the average for the Italian Peninsula (0.79%; Table 4).
Some interesting features were also found for Ladin populations when examined at the haplotype level. For instance, the HVS-I profile G16129A C16192T A16270G T16304C was found in four Ladins from Val Badía; this profile belongs to haplogroup U5b3f [20]. In a large in-house database of worldwide profiles (.130,000 HVS-I segments), this sequence was only found sporadically in other Italian regions and in Spain (Catalonia, Galicia, and Ibiza in the Balearic Islands). U5b3f is a minor clade of U5b3, the only haplogroup reported to date that has been found to represent the glacial refuge zone in Northern Italy and a source population for human re-colonization of the continent at the beginning of the Holocene. The study of Pala et al. [20] indicates that this lineage mainly expanded along the Mediterranean coast towards the Iberian Peninsula; one sub-clade also reached Sardinia 7000-9000 years ago. The branch observed in the Ladins is younger and could also have participated in the Mediterranean spread of U5b3f towards Iberia, given its presence in modern-day Spain. The data suggest that the U5b3f members observed in the Ladins probably predate the Ladin ethnogenesis and, given that this population has somehow become isolated from other neighboring populations, could reach a substantial frequency in some other Ladin communities, as is the case for the Val Badia. Another example is the U3 profile A16233G C16256T T16311C A16343G, which was only found in five Ladins from Alto Adige (Val Badia and Val Gardena), while T16352C C16354T was found in six individuals from Val Badia in South Tyrol.
Diversity values in the Grecani Salentini samples were similar to those observed in other Italian regions. Moreover, they also show haplogroup frequency patterns in the Y-chromosome and the mtDNA that match well with other Italian samples. The haplogroups are typically European (Figure 4); given the southern location of the Grecani Salentini in the Italian Peninsula, it is noticeable that there is no evidence of North African lineages. Note, however, that at a higher level of phylogenetic resolution, there are signals on the Y-chromosome of North African enrichment in South Italy [31].

The North African historical legacy in South Italy
We sampled 60 individuals from Lucera. This population sample showed diversity values that fell within the average of a typical Italian population, regarding the mtDNA (Table 1) and the Y-chromosome (Table 2). Additionally, at the level of haplogroup frequencies, Lucera matched well with other Italian populations (Figure 4).
There are two mtDNA haplogroups, namely U6 and M1, that can be considered to be of North African origin and could therefore be used to signal the documented historical input of this African region into Lucera. In our full set of samples, we observed five U6 haplotypes belonging to sub-haplogroups U6a, U6a2, and U6a4. Only one of these haplotypes was observed in Lucera. However, three other U6 haplotypes were observed in the vicinity of the population of South Apulia, and another at the tip of the Peninsula (Calabria). Regarding M1 haplotypes, we observed only two carriers in our samples, sharing the same HVS-I haplotype; both were found in Trapani (West Sicily).
Therefore, while South Italy shows evidence of female-mediated introgression from North Africa, this African influence does not seem to be particularly centered in Lucera. In the Y-chromosome, we did not observe any signal of North African introgression; at least, no more than for other regions of Italy (perhaps with the exception of Sicily [31]). This again contrasts with the results of previous studies based on the Y-chromosome (but at a higher or different level of phylogenetic resolution, involving the genotyping of African minor sub-lineages), where signals of North African influence were observed at this latitude of the Peninsula [31].

Discussion
A meta-analysis of Y-chromosome and mtDNA sequence data was undertaken in order to investigate patterns of genetic variation throughout Italy. Molecular indices indicated that most of the Italian samples show diversity values that are comparable to other European populations. However, some differences were shown to exist, especially in isolated Ladin populations. Regional differences were much more evident when examining haplogroup frequencies in both uniparental markers. The differences were again more remarkable for the two linguistic isolates, the Ladins and Grecani Salentini. AMOVA also indicated the existence of significant population stratification along the length of the country, which appeared more remarkable for the Y-chromosome and for haplogroups than for haplotypes. These figures must, however, be considered with caution given the different mutability of the markers being analyzed [64]; see also a discussion in [65].
Over the last few years, interest in genetically isolated populations has increased, especially in biomedical studies, where there is a growing interest in revealing genetic variants associated with disease. Genetic isolates generally originate as a result of group "foundation" by a small number of individuals presenting initially low variability. We have here analyzed a new sample of the Ladins, a well-known linguistic and genetic isolate from the Italian Alps. Some investigations have focused on the Ladin Romance-speaking populations, distributed between Trentino, the Veneto regions and the South Tyrol area [22,62,63,66]. As also observed in the present study, Ladin communities show marked genetic differentiation from neighboring (non-Ladin) populations. Differences were also observed between the different Ladin groups; for instance, AMOVA indicated that the different Ladin communities show a level of population stratification that is higher than the average in the rest of Italy. These results are also consistent with the recent study by Coia et al. [67], derived from micro-geographical analysis of nine sample populations from Trentino (Eastern Italian Alps). Genetic differences between Ladin samples are most likely to be due to the limited historical gene flow existing between these communities [22]. In this regard, it is also noticeable that, while the South Tyrol populations show clear signatures of isolation, the Veneto groups presented a high degree of genetic variability [68].
The Grecani Salentini also showed signatures of genetic isolation when compared to other Italian populations, but the differences are not as marked as observed for the Ladins. The differences with respect to neighboring Italian populations were not evident when observing individual haplotypes (as occurs with the Ladins), but were clearer when considering haplogroup frequencies (Figure 4). Larger sample sizes are needed in order to gather more signatures about the demographic past of this population. Thus, the Ladins show a more distinctive pattern than the Grecani Salentini, which is to be expected given that not only is the Ladin population a linguistic isolate, but also that these communities are confined to isolated geographical areas of the Alps.
Apart from the regional and local genetic differences observed in Italy, it is also worth examining global genetic patterns along the length of continental Italy.
Geographical clines of Y-chromosome haplogroups in Europe have been previously reported in the literature [13]; these patterns have found support in archaeological and linguistic evidence. In the Italian peninsula, the Y-chromosome variation also shows a clinal pattern along the North-South axis; the Mesolithic haplogroup R1*(xR1a1) shows higher frequency in the North, while the Neolithic haplogroup J2-M172 is superimposed on this Mesolithic stratum with frequency patterns running in the opposite direction [14,69]. The results of the present study agree with these earlier findings. Thus, for instance, R1b3 reached 31% in the North, 16% in the Center, and 14% in the South. Frequency of Y-chromosome haplogroup J2 was found to be 9% in the North, 37% in the Center, and 22% in the South (average in Italy: 14.5%). Haplogroup J2 is widely believed to be associated with the spread of agriculture from Mesopotamia. The main spread of J2 into the Mediterranean area is thought to have coincided with the expansion of agricultural populations during the Neolithic period. As reported by Di Giacomo et al. [12], haplogroup J "…constitutes not only the signature of a single wave-of-advance from the Levant but, to a greater extent, also of the expansion of the Greek world, with an accompanying novel quota of genetic variation produced during its demographic growth…"; also that "…in the central and west Mediterranean, the entry of J chromosomes may have occurred mainly by sea, i.e., in the south-east of both Spain and Italy…". J2-M12 is almost totally represented by its sublineage J2-M102, which shows frequency peaks in both the southern Balkans and north-central Italy (14%; [13]). J2-M67 is most frequent in the Caucasus, and J2-M92 indicates affinity between Anatolia and southern Italy (21.6%; [13]).
For the J1-M170 clade, the peaks of J1-M267 are in the Levant and in northern Africa, and it is closely associated with the diffusion of the Arab people, dropping abruptly outside of this area (including Anatolia and the Iberian peninsula), even if it shows an appreciable percentage in Sicily [70]. In a recent study, Pala et al. [71] confirmed that mtDNA haplogroups J and T and their major sub-clades (J1 and J2, T1 and T2) most likely arose in the Near East at the time of the first settlement by modern humans and the LGM. These haplogroups started to spread from the Near East into Europe immediately after the peak of the last glaciation, about 19 kya, with major expansions in Europe in the Late Glacial period, about 16-12 kya, thus indicating that many of the Neolithic expansions from southern Europe into Central Europe and the Mediterranean might have been indigenous dispersals of these lineages.
Latitudinal clinal frequency patterns are also observed for the mtDNA haplogroups, mirroring those of the Y-chromosome. As reported by Richards et al. [38], haplogroups H, K, T*, T2, W, and X are the major contributors to the Late Upper Paleolithic, and the central-Mediterranean region has the greatest Middle Upper Paleolithic component outside the Caucasus. In agreement with the Y-chromosome, we observed that all these Paleolithic haplogroups together add up to approximately 70.3% in the North, 60.8% in the Center, and 54% in the South of Italy. The opposite pattern was observed for the main mtDNA Neolithic component, represented by haplogroups J and T1, which accounted for 5.8% in the North, 10.3% in the Center, and 14.1% in the South (Italian average: 10.5%).
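The opposing mtDNA gradients can be tabulated with a short sketch such as the one below, which takes the regional percentages quoted in the text and reports whether each component rises or falls from North to South. The helper `cline_direction` is a hypothetical name introduced for illustration; with only three regional values this is a descriptive check of direction, not a statistical test of clinality.

```python
def cline_direction(freqs_north_to_south):
    """Classify a series of frequencies ordered from North to South."""
    pairs = list(zip(freqs_north_to_south, freqs_north_to_south[1:]))
    if all(a > b for a, b in pairs):
        return "decreasing southward"
    if all(a < b for a, b in pairs):
        return "increasing southward"
    return "non-monotonic"


# Percentages in the North, Center and South, as quoted in the text.
mtdna_percent = {
    "Paleolithic (H, K, T*, T2, W, X)": [70.3, 60.8, 54.0],
    "Neolithic (J, T1)": [5.8, 10.3, 14.1],
}

for component, freqs in mtdna_percent.items():
    print(f"{component}: {cline_direction(freqs)}")
```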
As early as 1934, Vere Gordon Childe [72] suggested that the indigenous communities of hunters and gatherers of the Mesolithic European cultures were replaced by communities of farmers migrating to the North from the Middle East, a process that lasted for several generations. The first stream of emigration followed the route along the continental Balkan Peninsula and the Danube, while another, slightly later, emigration spread along the coasts of the Mediterranean Sea from East to West. The latter path would fit well with the distribution of other Neolithic cultural features, such as the so-called Cardium Pottery (or Cardial Ware) [73], the ceramic decorative style that best defines the Neolithic culture. This culture entered from Greece towards the South-Center of Italy through the Adriatic Sea, carried by the same farmers that introduced, for instance, Y-chromosome haplogroup J2 at about the same frequency in Central and South Italy, but with lower introgression into the North; from here followed further Mediterranean expansions towards Iberia.
The sub-clade E3b1 (probably originating in eastern Africa) has a wide distribution in sub-Saharan Africa, the Middle East and Europe. This haplogroup reaches a frequency of 8% in the North and Center and is slightly higher in the South of Italy, at 11% (Figure 1). It has also been argued that the European distribution of E3b1 is compatible with the Neolithic demic diffusion of agriculture [15]; thus, two sub-clades, E3b1a-M78 and E3b1c-M123, present a higher occurrence in Anatolia, the Balkans and the Italian peninsula. Another sub-clade, E3b1b-M81, is associated with the Berber populations and is commonly found in regions that have had historical gene flow with Northern Africa, such as the Iberian peninsula [74-78], including the Canary Islands [75], and Sicily [70,79]; the absence of microsatellite variation suggests a very recent arrival from North Africa [80]. If we assume that E3b1 represents the only Y-chromosome continental African contribution to Italy, and L and U6 lineages the continental African mtDNA component, the African component in Italy is higher for the Y-chromosome (8-11%) than for mtDNA (1-2%). The origin of sub-Saharan African mtDNAs in Europe (including Italian samples) has been recently investigated by Cerezo et al. [81]; the results indicate that a significant proportion of these lineages could have arrived in Italy more than 10,000 years ago; therefore, their presence in Europe does not necessarily date to the time of the Roman Empire, the Atlantic slave trade or to modern migration.
In addition, the Northern African influence in the Italian Peninsula is evidenced by the presence of Northern African Y chromosome haplogroups (E1-M78) in three geographically close samples across the southern Apennine mountains: East Campania, Northwest Apulia and Lucera [31]. The Lucera sample analyzed in the present study did not however show a higher impact from North Africa than for other areas from southern Italy [31].
Finally, in agreement with uniparental markers, the analysis of AIMs carried out in the present study indicated that Italy shows a very minor sub-Saharan African component that is, however, slightly higher than that of non-Mediterranean Europe. This agrees with the recent findings of Cerezo et al. [82], based on the analysis of entire mtDNA genomes, pointing to the arrival in ancient and historical times of sub-Saharan African people in Mediterranean Europe, followed by admixture.
The present study represents the largest meta-analysis carried out to date for the Italian peninsula. We observed that the Y-chromosome and the mtDNA retain the imprint of the major ancestral events occurring in Italy; however, the Y-chromosome shows more marked regional differences than does the mtDNA. It is difficult to infer what proportion of these differences can be attributed to sex-specific demographic differences and what proportion to the fact that the two markers were analyzed at different levels of molecular resolution. Italy shows clines of variation attributable to the demographic movements of the first Paleolithic settlements, subsequently shaped by the Mesolithic and, to a lesser extent, Neolithic farmers. Regional differences arose with time, and are more notable in linguistic isolates, such as the Ladin populations and, to a minor extent, the Grecani Salentini. A great deal of effort has been dedicated during the last two decades to the study of Italian populations. Further studies are needed in order to investigate some of the many demographic movements that have occurred in the Italian peninsula throughout history. Entire genome sequencing of particular lineages (along the lines of e.g. [20]) and nuclear DNA genomic studies are needed in order to explore hypotheses beyond what has been done to date in Italy.

Supporting Information
Table S1 mtSNPs and primers used to characterize J/T and U and some of their sub-clades. (XLS)