Demographic Histories, Isolation and Social Factors as Determinants of the Genetic Structure of Alpine Linguistic Groups

Great European mountain ranges have acted as barriers to gene flow for resident populations since prehistory and have offered a place for the settlement of small, and sometimes culturally diverse, communities. Therefore, the human groups that have settled in these areas are worth exploring as an important potential source of diversity in the genetic structure of European populations. In this study, we present new high-resolution data concerning Y chromosomal variation in three distinct Alpine ethno-linguistic groups: Italian, Ladin and German. Combining unpublished and literature data on Y chromosome and mitochondrial variation, we were able to detect different genetic patterns. In fact, the within- and among-population diversity values observed vary across linguistic groups, with German and Italian speakers at the two extremes, and seem to reflect their different demographic histories. Using simulations, we inferred that the joint effect of continued genetic isolation and reduced founding group size may explain the apportionment of genetic diversity observed in all groups. Extending the analysis to other continental populations, we observed that the genetic differentiation of Ladins and German speakers from Europeans is comparable to or even greater than that observed for well-known outliers like Sardinians and Basques. Finally, we found that in South Tyroleans, the social practice of Geschlossener Hof, a hereditary norm which might have favored male dispersal, coincides with a significant intra-group diversity for mtDNA but not for Y chromosome, a genetic pattern which is opposite to that expected among patrilocal populations.
Together with previous evidence regarding the possible effects of “local ethnicity” on the genetic structure of German speakers that have settled in the eastern Italian Alps, this finding suggests that taking socio-cultural factors into account together with geographical variables and linguistic diversity may help unveil some yet to be understood aspects of the genetic structure of European populations.


Introduction
A considerable body of evidence shows that geographic distance is a good predictor of the genetic structure of European populations. A southeast-northwest cline, possibly associated with the Pleistocene settlement of the continent and the Neolithic demic diffusion from the Fertile Crescent [1,2] (but see [3]), was initially highlighted for classic genetic markers [1] and later corroborated by the analysis of Y chromosome and autosomal polymorphisms [4,5,6]. One exception to this scenario, however, is that no clear evidence of clinal variation has been observed for mitochondrial DNA, which is supposedly a consequence of the higher female than male migration rate associated with the prevalence of patrilocality [7,8,9]. Finns, Sardinians, Basques and European Jews provide important departures from this pattern, a finding which is currently explained by bottlenecks and/or their reduced genetic exchange with other European populations [10,11,12,13,14,15]. A potential but as yet poorly explored source of diversity in the European genetic landscape is represented by groups that have settled in mountainous environments. In particular, great mountain ranges, such as the Alps, Pyrenees and Carpathians, may not only have acted as barriers to gene flow for resident populations, but may also, since prehistory, have offered a place for the settlement of small, and sometimes culturally diverse, communities.
The Alps are one of the broadest mountain ranges of Europe, with a longitudinal extension of approximately 1,200 kilometers. They span eight different countries and include over 100 peaks higher than 4,000 m a.s.l. There is a substantial consensus among archeologists that many Alpine areas had already been inhabited in the Paleolithic [16,17], with a more intense peopling starting from the Neolithic [18,19]. However, settlements in the upper valleys remained scattered and few in number until a more systematic process of colonization and demographic expansion began in the late Middle Ages [20]. Another key passage in the demographic history of the Alps is represented by the "breakup of isolates". In fact, a dramatic decline of endogamy began in the first half of the 20th century due to an increase in individual mobility and the depopulation of the mountain areas resulting from socio-cultural changes linked to industrialization [21,22].
At present, Alpine populations can be considered as a mosaic of groups that are separated by physical and cultural boundaries, whose remarkable cultural diversity is clearly demonstrated by the presence of minorities that speak Franco-Provençal, Occitan, French, German, Ladin, Friulian and Slovene [23,24]. From a bio-anthropological point of view, they offer a unique opportunity to study the impact of geographical, demographic and cultural factors on genetic structure [25]. Such a goal requires the simultaneous investigation of distinct linguistic groups and, ideally, the analysis of genetic systems with different modes of evolution and transmission. Unfortunately, the population genetic studies that have been carried out so far are scarce, and most of them focused only on a limited number of populations or single groups [26,27,28,29,30].
In this study, we present new high resolution data on Y chromosomal variation in three distinct Alpine ethno-linguistic groups, Italian, Ladin and German. Combined with data on Y chromosome and mitochondrial variation taken from our previous research work and the literature, these results are used to answer four questions: (i) how is genetic diversity patterned in alpine ethno-linguistic groups?; (ii) what microevolutionary forces might have shaped their genetic structure?; (iii) how do the observed patterns compare with what has been noticed in other European groups, in particular with well known genetic outliers and other groups settled in great mountain ranges?; (iv) are there factors, other than geography and language, that should be taken into account when studying the genetic structure of European mountain populations?

The Populations under Study
Our study is primarily based on unpublished Y chromosome data (17 Short Tandem Repeats, STRs, and 50 Single Nucleotide Polymorphisms, SNPs) from 610 unrelated individuals belonging to 15 populations from the Eastern Italian Alps (Trentino-Alto Adige, Veneto and Friuli regions; see Table 1 and Figure 1).
Ladins are thought to be related to pre-Indo-European-speaking tribes who probably represent the most ancient settlers of the Alps [31]. The Dolomitic Ladins are the remnant of a wider group that started settling in a broader territory in 1000 AD. As with the Ladins, the other Romance-speaking groups of Italians are thought to be linked to the most ancient peopling of the area [31]. Finally, the ethno-linguistic Germanic islands of the Eastern Alps are in continuity with nuclei that migrated from Bavaria, Carinthia and Tyrol in the late Middle Ages, a process driven by the landed aristocracy and the monasteries with the objective of a more intensive exploitation of marginal territories [20].
The dataset was supplemented through an extensive search of literature data on unilinearly transmitted markers [32] for populations living in the Alps or in other European mountain ranges (Pyrenees; see Table S1).

Sampling and ethics statement
Buccal swabs were collected from apparently healthy and unrelated donors, selected according to the place of birth of the sampled individuals and of their parents and grandparents. The procedure and informed consent were reviewed and approved by the "Comitato Etico per la Sperimentazione con l'Essere Umano" of the University of Trento (samples from Trentino), the "South Tyrolean Ethics Committee" (samples from Alto Adige, POLYS project) and the institutional review board of the Istituto Italiano di Antropologia (samples from Veneto and Friuli). All participants provided written informed consent to participate in this study.

Statistical analysis
Unless otherwise stated, statistical analyses were performed using 15 STRs, having excluded the duplicated DYS385 loci. The level of intra-population genetic variation was analyzed through the calculation of haplotype diversity (HD) and the number of different haplotypes (H). A Multi-Dimensional Scaling plot of Fst genetic distances based on Y chromosome STRs (Reynolds' distances, [37]) and a Principal Component Analysis plot based on haplogroup frequencies were obtained using SPSS software (release 16.0.1 for Windows, SPSS Inc.). We partitioned genetic variance at different hierarchical levels of population subdivision according to language groups (Italian, Ladin and German) by means of an analysis of molecular variance (AMOVA). In this analysis, we also used mitochondrial DNA literature data (HVR1, 333 bp from 16033 to 16365; see Table S2) [32]. All parameters of intra- and inter-population genetic diversity were calculated using the Arlequin software (version 3.5.1.2, [38]).
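As an illustration of the intra-population statistics named above, the following minimal Python sketch computes the number of different haplotypes (H) and haplotype diversity (HD) using Nei's unbiased estimator, HD = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of the i-th haplotype in a sample of n chromosomes. It is assumed here that this is the estimator behind the reported HD values, since the text does not spell out the formula. The function name and the toy three-locus haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Return (H, HD) for a sample of haplotypes.

    H is the number of distinct haplotypes; HD is Nei's unbiased
    estimator n/(n-1) * (1 - sum(p_i**2)).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled chromosomes are required")
    counts = Counter(haplotypes)
    sum_squared_freqs = sum((c / n) ** 2 for c in counts.values())
    hd = n / (n - 1) * (1.0 - sum_squared_freqs)
    return len(counts), hd


# Toy sample (hypothetical): repeat counts at three unnamed STR loci.
toy_sample = [(14, 13, 30), (14, 13, 30), (15, 13, 29), (14, 12, 30)]
h, hd = haplotype_diversity(toy_sample)
print(f"H = {h}, HD = {hd:.3f}")
```

Representing each haplotype as a tuple of repeat counts keeps it hashable, so identical multi-locus profiles are pooled before frequencies are computed.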
We used a coalescent-based simulation approach in order to evaluate whether the observed values of within-group genetic diversity may be attributed solely to the size of the founding group (see Tofanelli et al. [39] for a review of simulation methods for uniparental markers). We separated Italians into two sub-groups, western and eastern, according to their different current census size and previous mtDNA evidence [40]. Adige valley and Cimbrian populations were not included in the simulations because of the difficulties and uncertainties in modeling their evolutionary history. Based on current historical records, we designed two different topologies, one for the German-speaking island group and one for the two Italian sub-groups and the Ladin-speaking group. In both topologies (see Figure S1), three sub-populations split at a certain time (T1) from a large source population, identified as Central-Western Europe; the two topologies differ in splitting times (32-40 generations for German-speaking islands and 90-110 generations for all the other groups). Following Bramanti et al. [41], effective population sizes for source and sink populations were set as 1/10 of census size. The growth rate for the source population was set at 0.0018 from 1800 to 300 generations ago, and increased to 0.022 from then to the present day [42]. The growth rate for the sink populations was set as half of the highest value of the source. A symmetrical gene flow between source and sink was allowed (0.005-0.01), while admixture between sink populations was allowed to vary between 0.01-0.02 and 0.02-0.03. We simulated 10,000 random genealogies for the Y chromosome (15 STRs) using the mutation rate estimates of Ballantyne et al. [43] and assuming a generation time of 25 years. For each scenario, we randomly sampled 50 individuals from each sink population and analyzed within-group diversity for each simulation using Arlequin 3.5 [38].
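The arithmetic behind some of these settings can be made explicit with a short Python sketch. It only restates values given in the text (generation time, splitting times, the 1/10 ratio between effective and census size, and the source growth rates); the function names and the census size of 5,000 are hypothetical and serve purely as an example. With a 25-year generation, 32-40 generations correspond to 800-1,000 years and 90-110 generations to 2,250-2,750 years.

```python
GENERATION_YEARS = 25  # generation time stated in the text

# Splitting times in generations, as stated for the two topologies.
SPLIT_GENERATIONS = {
    "German-speaking islands": (32, 40),
    "Italian and Ladin groups": (90, 110),
}

# Source growth rates stated in the text (earlier phase, later phase).
SOURCE_GROWTH_RATES = (0.0018, 0.022)


def split_years(generation_range):
    """Convert a (min, max) range in generations to years before present."""
    low, high = generation_range
    return low * GENERATION_YEARS, high * GENERATION_YEARS


def effective_size(census_size):
    """Effective size set as 1/10 of census size, as in the text."""
    return census_size / 10


def sink_growth_rate():
    """Half of the highest source growth rate, as in the text."""
    return max(SOURCE_GROWTH_RATES) / 2


for group, generations in SPLIT_GENERATIONS.items():
    print(group, split_years(generations))

print(effective_size(5000))  # 5000 is a hypothetical census size
print(sink_growth_rate())
```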

Patterns of genetic diversity in the linguistic groups of the Italian Alps
The Eastern Italian Alps embrace an important portion of the ethno-linguistic diversity of the Alpine arc, encompassing Romance (including Ladins and Italians) and German speakers. Their genetic characterization highlights a high level of diversity not only among single populations, but also within linguistic groups, a pattern which is likely to be due to a complex interplay of demographic histories and isolation determined by environmental and cultural factors.
The extent of diversity among Alpine populations is shown by the plots based on STR and SNP data (Figure 2A and 2B). The spatial relationships among populations differ between the two plots, with the SNP-based patterns probably mirroring more ancient population relationships due to their slower evolutionary rate. However, with both data types, the populations under study are well separated and no linguistic structure of genetic diversity is detectable. This latter feature may be appreciated in a quantitative way by an AMOVA performed among linguistic groups, which produced low values of intergroup variation (from 0.007 to 0.020; see Table S3).
To gain further insights into the genetic diversity occurring within each linguistic group, we went one step further by focusing on their genetic structure. The Italian-speaking group was found to be the most genetically homogeneous. Within-group variation (0.04, p<0.05) is lower than in other Alpine groups and geographically distant European populations, but higher than observed among Northern Italian populations (Table S4-S5). Furthermore, these populations show high haplotype diversity values, with the highest observed in the Adige valley (0.997 ± 0.004; see Table S6). R1b S28*, a haplogroup found at high frequencies in most Alpine groups, is the most frequent in all populations (from 17 to 45%), the only exception being the Primiero, where G-M201 prevails (~49%, see Table S7). This pattern may be explained in two, not mutually exclusive, ways. Italian-speaking populations have, since historical times, constituted the most numerous ethno-linguistic group in the Eastern Alps, and they did not suffer from any historically documented bottleneck. Their present census values are comparable to or higher than those of the other two groups under study (see Table 1). Furthermore, having settled in zones which are characterized by wider valleys, lower altitudes (from 200 to 1022 m a.s.l.) and more accessible mountain passes, they have probably been less geographically isolated than other groups (e.g. Ladins). Finally, the Adige river has provided a supplementary communication route, favoring population movements and interactions [44].
The genetic differentiation among Ladin valleys noticed in the plots is supported by other analyses of STR haplotype distribution. Their intra-group variation (0.075, p<0.05) is similar to what has been found in geographically distant European populations (6 populations, distances ranging from 366 to 2520 km, 0.074, p<0.05; see Table S4-S5) but much higher than what has been found in a set of Northern Italian populations (6 populations, distances ranging from 57 to 396 km, 0.006, p>0.05; see Table S4-S5). Another likely effect of genetic drift may be seen in the intra-population diversity values (HD), which are lower than those observed in Italian-speaking communities and in most European populations (Table S6). This is particularly evident for the communities from the Gardena and Badia valleys (South Tyrol), which, correspondingly, depart more evidently from the main central group in the genetic distance plot (Figure 2A). Signatures of intra-group diversity are also provided by a phylogeographic approach: the prevalent haplogroups in the Fassa and Badia communities (S28*-R1b*) and in the Gardena valley (S145-R1b*; Table S7) do not coincide. These two lineages of the main S116-R1b* haplogroup show quite distinct continental distributions, with the former reaching its highest frequencies in south-central Europe (with frequency peaks in France, northern Italy and the Alps), and S145-R1b found mainly in north-Atlantic Europe [3,45]. On the whole, our results support the definition of Ladins as "small genetically isolated populations (subject to strong genetic drift), having a predominantly European ancestry" [27]. However, it should be noted that the inclusion of a third population (Fassa Valley) and the higher resolution of Y chromosome genotyping make our inferences more robust.
The genetic signatures we observed may be an echo of the processes of fragmentation and/or assimilation of Ladin communities, first by Latin groups starting from the 15th century BC, and then by German-speaking people (Gardena and Badia valleys) from the end of the 4th century, and of the consequent reduction in their settlement area and demographic size [46]. Moreover, the considerable altitude of the Ladin valleys (from 1120 to 1345 m a.s.l.) might have further increased reciprocal isolation among fragmented Ladin communities [31].
The German-speaking populations show the most marked signatures of genetic drift. As predicted by the outlying positions of Sappada, Timau and Luserna in the plot of genetic distances, the intra-group variation is very high (0.240, p<0.05), around two times higher than that found for geographically distant European populations. Moreover, the haplotype diversity values in these populations are the lowest of the dataset, with the exception of Lessinia (see Table S6). Different haplogroups prevail in Sappada (E1b-V13 63%) and Timau (R1a-M17 56%), and different R1b subhaplogroups in Sauris (S139 34%), Lessinia (S116 17%) and Luserna (M269 84%). The considerable differentiation among German-speaking populations may also be seen as a consequence of their demographic history. In fact, they are in continuity with small founding groups [47] which settled in their present-day locations in Medieval times. Furthermore, as we have recently proposed [30], a relative reciprocal isolation could have occurred even among the linguistically closely related communities of Sappada, Timau, and Sauris as a result of "local ethnicity". In this condition, the members of each community tend to identify their ancestry with their own village rather than considering themselves as part of the same ethnic group, similarly to what occurs in other Alpine regions [48].
The genetic differentiation between the two Cimbri populations of Luserna and Lessinia deserves further discussion. Both these communities derive from Bavarian populations that colonized a vast territory of the Eastern Italian Alps starting from 1053 AD (Veneto; [49]) to 1216 AD (Trentino; [44]). Luserna is genetically very distant from all the other Alpine populations (average Fst=0.328; see Table S6) and shows a strikingly low intra-population diversity (0.483±0.119). Paternal lineages are represented mostly by R1b-M269* (frequency of 84%), with six different STR haplotypes associated with only one founder surname. Lessinia shows different, if not opposite, features. The average genetic distance from other populations (Fst=0.097; see Table S6) is less than one third of that of Luserna, while HD is close to the highest values of our dataset (0.978±0.019; Table S6). The prevalent haplogroup, R1b-M269*, accounts for only one third of the total, the rest being represented by different lineages (G-M201, I1-M253, M410-J2A and K-M9), which are associated with twenty-three different surnames. The demographic history of the Luserna and Lessinia communities may help explain their differentiation. Luserna was founded by a few families which moved from Lavarone, the first known Cimbrian settlement in the territory of Trentino [44]. This could have led to a strong founder effect in this community, a hypothesis supported by a previous study of mtDNA polymorphisms [40]. Moreover, Luserna is located on a high plateau (1,333 m a.s.l.) and is quite isolated from the surrounding areas. By contrast, Lessinia is a more extensive area with reliefs of low altitude (Giazza, 758 m a.s.l.), and has been colonized since the XIII century AD through several migration waves of small groups of settlers over more than one century. From the XV century AD, this community opened to, and probably admixed with, neighboring Italian groups [49].
On the whole, our genetic characterization indicates three main genetic patterns. Italian-speaking populations show a level of within-group diversity that is lower than that observed among distant European populations but higher than that of Northern Italian populations. The strongest signals of departure from the European genetic background can be seen among German-speaking populations, while the intra-group and intra-population diversity levels of Ladins fall between those of the other two groups. These signals seem to reflect the different demographic history of the three groups and their genetic isolation due to the mountainous environment (for all groups) and the use of different languages from their neighbors (Ladins and German speakers). Nonetheless, the fact that the Y chromosome is a single locus transmitted from father to son means that our inference needs further support from other genetic systems with a different mode of inheritance. Therefore, we thought it would be useful to repeat the analysis of intra- and inter-population diversity with maternally transmitted mitochondrial DNA polymorphisms (hypervariable region 1) [27,30,40]. Despite some minor differences regarding Sauris and Ladins from the Gardena valley (both show an outlying position in the mtDNA MDS plot and the latter a lower rank for haplotype diversity), mtDNA and Y chromosome patterns substantially match (see Figures S2-S4 and Table S3).
As discussed above, the intensity of the genetic signals observed in the Alpine linguistic groups seems to comply with what is to be expected for isolated population groups characterized by different demographic profiles. Therefore, a cause-effect relationship between these two conditions and the different patterns of genetic diversity is worth taking into consideration. However, it did not escape our attention that such intensity seems to be inversely correlated with the supposed size of the founding groups, reflecting present census values (see Table S8). We then decided to test the alternative hypothesis that our observations could be the result of differences in the long-term effective size among groups, without any substantial effect of genetic isolation. To this purpose, we carried out coalescent simulations for all of our linguistic groups, with levels of gene flow comparable to those expected for non-isolated groups. The distributions obtained (Figure 3) are incompatible (Ladins and Italians) or only marginally compatible (German speakers) with the observed Fst values. A scenario combining the effects of founding group size and continued genetic isolation seems, therefore, to provide the best explanation for the observed level of within-group differentiation detected in both geographic and geographic/linguistic isolates.
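The text does not state the criterion used to judge compatibility between simulated distributions and observed values. One simple option, sketched below in Python under that caveat, is to compute the fraction of simulated values that are at least as large as the observed one: a fraction close to zero indicates that the no-isolation scenario rarely produces differentiation as strong as that observed. The function name is hypothetical, and the simulated values are invented placeholders, not output of the simulations described here; only the two observed values (0.075 for Ladins, 0.240 for German speakers) are taken from the text.

```python
def upper_tail_fraction(simulated, observed):
    """Fraction of simulated values greater than or equal to the observed one."""
    if not simulated:
        raise ValueError("the simulated distribution is empty")
    return sum(1 for value in simulated if value >= observed) / len(simulated)


# Placeholder values standing in for one simulated Fst distribution.
toy_simulated = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.08]

# Observed intra-group variation reported in the text.
observed_values = {"Ladin": 0.075, "German": 0.240}

for group, observed in observed_values.items():
    print(group, upper_tail_fraction(toy_simulated, observed))
```

In practice each group would be compared with the distribution generated under its own topology, so a single placeholder list is used here only to keep the example short.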

The Alpine linguistic groups in the European genetic background
The genetic distinctiveness of Alpine populations can be better appreciated by placing our results in the context of the body of knowledge regarding European populations. A first comparison is to be made with open populations, to see whether the groups under study actually depart from the continental genetic structure. As shown by Roewer et al. [50], the distribution of Y chromosome variation at the continental level complies with an isolation by distance model. By contrast, the historical stratification and complexity of the peopling processes that occurred in the Eastern Alps do not predict any simple relation between genetic structure and geographic distances. Accordingly, the correlation between geographic and Y chromosomal genetic distances is statistically insignificant (see Table S4). In fact, the average and median values of genetic distances between Alpine and open populations (0.095; 0.078) are substantially greater than those between open populations (0.061; 0.061).
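A correlation between two distance matrices is commonly assessed with a Mantel permutation test, because pairwise distances are not independent observations. The text does not say which test was used, so the Python sketch below is offered only as one plausible way of carrying out such a comparison. The function names, the four populations and both toy matrices are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the upper triangle (without the diagonal) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel_test(geo, gen, permutations=999, seed=1):
    """Return (r, p): matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(geo)
    geo_vec = upper_triangle(geo)
    r_obs = pearson(geo_vec, upper_triangle(gen))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute population labels of the genetic matrix (rows and columns together).
        rng.shuffle(order)
        shuffled = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(geo_vec, upper_triangle(shuffled)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy symmetric matrices for four hypothetical populations.
geo_km = [[0, 50, 120, 300], [50, 0, 80, 260], [120, 80, 0, 200], [300, 260, 200, 0]]
fst = [[0.00, 0.02, 0.05, 0.12], [0.02, 0.00, 0.03, 0.10],
       [0.05, 0.03, 0.00, 0.07], [0.12, 0.10, 0.07, 0.00]]

r, p = mantel_test(geo_km, fst)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Shuffling rows and columns with the same permutation preserves the internal structure of the genetic matrix while breaking its association with geography, which is what gives the null distribution.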
Using the same approach (Figure 5B), we observed that the genetic differentiation of Ladins and German speakers from Europeans is comparable to or even greater than that observed for well-known continental outliers (see Table S4). In fact, the average value for Ladins (0.092) is higher than that for Sardinians (0.078), whereas their median Fst is slightly lower (0.075 vs 0.088). The signal is even stronger for the German speakers, whose average (0.144) exceeds that of Basques (0.121), whereas the two median values are rather close (0.111 vs 0.121) and 14.3% of Fst values are above the upper bound of the range of genetic distances between Europeans and Basques. However, all these values are lower than those obtained for Finns (average 0.209; median 0.208), who are known to have undergone severe bottlenecks and further local episodes of drift [13].
As the final step of our study, we further extended our dataset by including other populations that have settled in great mountain range systems: five from the Pyrenees [51] and three from South Tyrol [27]. The results of the AMOVA (Table 2 and Table S9) show that Y chromosome intra-group variation within human groups that have settled in mountainous environments is relatively high and statistically significant, the South Tyroleans being the only exception (see below). Not unexpectedly, this is in sharp contrast with the low and statistically non-significant diversity observed among open populations settled on plains at comparable geographic distances (-0.003, p=0.555; see Table S9). Focusing on mountain populations, it turns out that Alpine groups host the greatest Y chromosome among-population diversity. Interestingly, this holds not only for German speakers and Ladins, who are the only groups subject to both geographic and linguistic isolation, but even for Italians, who show the apparently weakest signals of genetic drift.
South Tyroleans provide an exception to the high and statistically significant Y-chromosome intra-group diversity of Alpine populations. A possible explanation for this finding comes from their particular social structure. In fact, since at least the early fourteenth century, South Tyroleans have mostly complied with an inheritance and succession system known as Geschlossener Hof ("closed holding"), which entails an impartible transfer of the farm [52,53]. This system typically prescribes that only one son (generally the first born) takes over the economic unit consisting of the farmstead and the attached lands and succeeds to the position of peasant house-father, while the other sons have the option to remain on the family farm as employees or to receive economic compensation and relocate elsewhere [54,55,56,57]. Therefore, this practice may favor male dispersal, increasing the probability that sons other than the first born marry far from the original community. Conversely, female mobility is less socially favored than in patrilocal groups. In the long term and under regimes of prevalent male mobility within the original groups, the Geschlossener Hof may lead to a pattern which is opposite to what would be expected for patrilocal groups. This is, in fact, the case of Tyrolean populations, who show statistically significant intra-group variation only for mtDNA polymorphisms (0.004, p=0.220 and 0.030, p=0 for Y chromosome and mtDNA, respectively). Therefore, the Geschlossener Hof could have shaped intra-group variation of paternal lineages in the opposite way to "local ethnicity" [30]. If this was supported by further evidence, it would provide an
In conclusion, the comparison between Y chromosomal and mitochondrial patterns of variation suggests that not only geographic factors and linguistic diversity, but also socially induced sex-biased gene flow should be taken into account when studying the genetic structure of Alpine populations.
We believe this is an important avenue for any future research that aims to shed light on the as yet unexplored complexity of the genetic structure of European populations.

Values in brackets refer to minimum and maximum values obtained by a jackknife procedure (see Table S9). Statistically insignificant values are in bold.
* This group is composed of three geographically close plain populations (Brescia, Treviso, Vicenza; see Table S4).