Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based FST-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the FST-outlier methods detected together 11 FST-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with FST-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. 
The results also suggest that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation.
Citation: Scalfi M, Mosca E, Di Pierro EA, Troggio M, Vendramin GG, Sperisen C, et al. (2014) Micro- and Macro-Geographic Scale Effect on the Molecular Imprint of Selection and Adaptation in Norway Spruce. PLoS ONE 9(12): e115499. https://doi.org/10.1371/journal.pone.0115499
Editor: Roberto Papa, Università Politecnica delle Marche, Italy
Received: August 20, 2014; Accepted: November 19, 2014; Published: December 31, 2014
Copyright: © 2014 Scalfi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files. Illumina SNP sequencing data are available from the Genbank database (ID: PiceaAbiesGoldenGate; accession numbers 1457253114 - 1457253446).
Funding: MS's study was financed by the project "PECC-Genetic and molecular analysis of Picea abies: Variability and adaptive evolution of the species under conditions of global change”, funded by the Autonomous Province of Trento (Italy), with the regulation N. 23, 2007. EM and EADP were supported by the ACE-SAP project "Alpine Ecosystems in a Changing Environment: Biodiversity Sensitivity and Adaptive Potential" partially funded by the Autonomous Province of Trento (Italy), with the regulation No. 23, June 12, 2008. GGV was supported by a grant of the European Commission through the FP7-project FORGER, “Towards the Sustainable Management of Forest Genetic Resources in Europe” (KBBE-289119). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Adaptation of forest tree species to their environment is of great interest in forest management, as climate change is considered to be a major threat to forest health and sustainability. Forest tree species of temperate and boreal regions have undergone a long history of demographic changes. During glacial maxima, many of these taxa were restricted to southern refugia, from where they expanded northwards during interglacials. Range contractions and expansions have been intensively studied using palaeobotanical and genetic approaches, demonstrating that past range changes were important determinants of the genetic structure of extant populations. Genetic structures are likely to have been influenced also by evolutionary adaptations, enabling populations to adapt to local environments. In fact, provenance trials and genecological studies have revealed phenotypic traits with clear clines along diverse environmental gradients, both across species ranges and at the local scale. Yet, the underlying genes controlling adaptation remain poorly understood.
The development of forest tree genome sequences, single nucleotide polymorphism (SNP) databases and high-throughput genotyping platforms has facilitated the use of multi-locus scan approaches to identify loci involved in adaptation. Two main groups of methods are currently used to identify loci related to adaptation. The first group is based on population differentiation and provides tools to detect loci that show significantly higher FST values than neutral expectations. The second group of methods is based on correlations between allele frequencies and environmental variables and can be used to detect selection along gradients or in heterogeneous environments. An important limitation of these methods is that they are sensitive to other evolutionary forces that can mimic selection, such as demographic history and population structure.
Molecular studies in conifers, incorporating both population history and landscape features, have identified numerous loci likely to be involved in adaptation. These studies were mainly designed to investigate genetic diversity at the macro-geographic scale, i.e. across entire species ranges. Few studies have focused on a local scale, where gene flow is more effective and population structures are weak. On the other hand, gene flow can constrain adaptive divergence by homogenizing allele frequencies across space. Nevertheless, when selection pressure is high, local adaptation may also occur at the local scale. For example, tree populations along altitudinal gradients often show pronounced clines in phenotypic traits, and thus may be well suited for detecting adaptive loci.
Norway spruce (Picea abies [L.] Karst.) is a broadly distributed European conifer of great ecological and economic importance. Its range is divided into two major regions, a northern, boreal region and a central and south-eastern European region. In the southern region, Norway spruce mainly grows in mountains, with widespread population occurrences found in the Alps, Carpathians, and Hercynia, the latter including the Bohemian massif and its surrounding mountains. The biogeography of Norway spruce has been intensively studied using fossil pollen and genetic markers. Surveys of genetic variation consistently revealed two distinct genetic lineages, separating populations of the north from those of the south. Fossil pollen data combined with mitochondrial DNA data have shown that Norway spruce in the north is derived from a single large refugium, while in the south it persisted during the Last Glacial Maximum (LGM) in several distinct refugia. At the phenotypic level, several potentially adaptive traits have been identified, such as bud set, bud burst, and shoot growth, with clear geographic clines along latitudinal and altitudinal gradients. Notably, a recent study of northern populations using SNPs in functional genes has identified several components potentially involved in the control of bud set. Other genes underlying local adaptation, however, remain unknown.
In this study, we focus on Norway spruce of central and south-eastern Europe with the primary research goal of identifying adaptive loci through screening SNP markers at different geographic scales, taking into account population structures. SNP markers, representing 290 genes, were used to examine the role of genetic structure and environmental variation in shaping the distribution of species genetic variation and its adaptation. To achieve this purpose, the sampling was designed at the micro-geographic scale, where trees were sampled along two altitudinal gradients within the Alps, and at the macro-geographic scale, where trees were sampled in 27 natural populations across the southern range of Norway spruce. First, population structure was estimated to assess the possible presence of different genetic pools at micro- and macro-geographic scales. Second, to assess the role of selection in structuring genetic variation, we applied FST-outlier methods taking into account the population structure, and tested for correlations between allele frequencies and climatic variables at both geographic scales.
Materials and Methods
Norway spruce is a very common and not endangered tree species in Europe. For each tree, approximately 500 mg of needle tissue was sampled. No specific permissions were required for these locations/activities and we did not sample in any protected areas. The geographic coordinates are reported in Table 1 and Table 2.
The micro-geographic scale study included two altitudinal transects on south-west (Celentino-Pejo) and north (Mezzana) aspects in the Trentino province (Italy) (Table 1). Six populations were sampled along each transect, with each of the populations separated by 200 m of altitude. On average, 25 adult trees (60–70 years old) were sampled from each site, for a total of 300 trees (Table 1).
The macro-geographic scale sampling consisted of 27 putatively natural populations, distributed across the range of Norway spruce in central and south-eastern Europe. Each population was represented by 15–24 individuals, for a total of 546 trees. Eight of the populations were sampled in the IUFRO 1964/68 provenance test (Table 2). To compare the micro- with the macro-geographic scale study, the Mezzana site located at 1600 m a.s.l. was included in the macro-geographic investigation and more sites were sampled in the Alps. Total DNA was extracted from needles according to Doyle and Doyle or using the DNeasy 96 Plant Kit or the DNeasy Plant Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions.
In the micro-geographic scale study, two climatic variables were used. Ten years of average monthly mean temperature and monthly mean precipitation were obtained from the local spatial database  using climatic data collected from 1990 to 1999 by 64 weather-stations distributed in the Trentino province (Table 1).
In the macro-geographic scale study, we considered the 19 bioclimatic variables publicly available from WorldClim - Global Climate Data (free climate data for ecological modelling and GIS; http://www.worldclim.org). Based on the species distribution and its ecological preferences, and to describe the climate of the sampling sites, five bioclimatic variables were included in the analyses: mean annual temperature (bio01), temperature seasonality (bio04), mean temperature of the warmest quarter (bio09), mean temperature of the coldest quarter (bio11), and annual precipitation (bio12) (Table 2). Climatic data were extracted from a 30 arc-second GIS layer using Quantum GIS (Q-GIS).
SNP discovery and genotyping
SNP discovery was based on Sanger re-sequencing of a panel of 12 unrelated trees using primers derived from almost 1000 loblolly pine expressed sequence tags (ESTs), representing genes having various biological functions (http://dendrome.ucdavis.edu/NealeLab/crsp/overview.php). DNA was extracted from the haploid megagametophyte, obtained from one seed per sampled tree. Individual sequence alignment and SNP identification were performed using PineSAP. A final set of 384 SNPs, chosen among those having quality design scores above 0.6, was selected for genotyping, considering a maximum of two SNPs per locus and preferring SNPs that change the predicted protein (92 non-synonymous SNPs were selected). A total of 846 trees were genotyped. The SNP genotyping was performed at the Piattaforma Tecnologica Padana (Lodi, Italy) using the Illumina SNP bead array platform (Illumina, San Diego, USA) and the GoldenGate assay.
For each SNP, the percentage of individuals genotyped (call rate), the minor allele frequency (maf), the expected (HE) and observed (HO) heterozygosity, and the Wright's inbreeding coefficient (FIS) were calculated using the Genepop 18.104.22.168 program . To remove uncertain and rare SNPs, loci with call rates <90%, maf <1%, or absolute FIS values >0.25 were discarded. An individual call rate value was calculated for each sample, and samples with call rates lower than 95% were excluded.
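The per-SNP quality filter described above can be illustrated with a minimal sketch. It assumes biallelic genotypes coded as the hypothetical strings 'AA', 'AB' and 'BB' (None for a missing call) and uses the simple estimator FIS = 1 - HO/HE; Genepop's own estimator differs in detail, so the values are not expected to match its output exactly.

```python
from collections import Counter
import math

def snp_summary(genotypes):
    """Summarise one biallelic SNP.

    genotypes: one entry per tree, 'AA', 'AB', 'BB' or None (missing call).
    The genotype coding is an assumption made for this illustration.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes for this SNP")
    counts = Counter(called)
    freq_a = (2 * counts["AA"] + counts["AB"]) / (2 * n)
    h_obs = counts["AB"] / n                 # observed heterozygosity (HO)
    h_exp = 2 * freq_a * (1 - freq_a)        # expected heterozygosity (HE)
    # Simple inbreeding coefficient; undefined for monomorphic SNPs.
    f_is = 1 - h_obs / h_exp if h_exp > 0 else math.nan
    return {
        "call_rate": n / len(genotypes),
        "maf": min(freq_a, 1 - freq_a),
        "ho": h_obs,
        "he": h_exp,
        "fis": f_is,
    }

def passes_qc(stats, min_call_rate=0.90, min_maf=0.01, max_abs_fis=0.25):
    """Apply the thresholds quoted in the text (call rate, maf, |FIS|)."""
    return (
        stats["call_rate"] >= min_call_rate
        and stats["maf"] >= min_maf
        and abs(stats["fis"]) <= max_abs_fis   # False when FIS is NaN
    )
```

A monomorphic SNP fails this filter twice over: its maf is 0 and its FIS is undefined.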
For each geographic scale, values of genetic diversity among individuals (FIS), among populations (FST) and for the total population (FIT) were calculated for each locus using Genepop (S1B Supporting Material). With the same software, fixation index (FIS) statistics per population were calculated over all loci with the gene diversity among individuals within population (1-Qinter). Differences among populations were tested using the pairwise FST analysis in Arlequin 3.5 .
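To make the among-population statistic concrete, here is a minimal sketch of a heterozygosity-based FST for one biallelic locus. It is deliberately simpler than the estimators implemented in Genepop and Arlequin: populations are weighted equally and no sample-size correction is applied, so it only illustrates the quantity being estimated.

```python
def fst_from_frequencies(freqs):
    """FST = (HT - HS) / HT for one biallelic locus.

    freqs: frequency of the same allele in each population.
    Equal population weights and no sample-size correction are
    simplifying assumptions of this sketch.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                     # HT: pooled expected heterozygosity
    h_within = sum(2 * p * (1 - p) for p in freqs) / k    # HS: mean within-population value
    if h_total == 0:
        return 0.0                                        # locus fixed everywhere
    return (h_total - h_within) / h_total
```

Identical frequencies in all populations give 0, and populations fixed for alternative alleles give 1.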
To identify FST outliers, both Bayesian and coalescent simulations were applied. The first method, implemented in BayeScan 2.1, considers locus-specific and population-specific effects and performs a genome scan for positive and balancing selection. The method tests two alternative models and assigns a Bayes factor to each locus. We used prior odds of 10 and a false discovery rate (FDR) of 0.001. The second method, proposed by Excoffier, assumes two possible situations: an equal probability of migration between populations (finite island model) and the presence of structured populations (hierarchical island model). Both approaches were applied twice: using populations assigned according to their geographic position and according to STRUCTURE clustering.
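The role of the prior odds can be shown with a short sketch. It assumes the BayeScan convention that the prior odds are stated in favour of the neutral model, so prior odds of 10 mean that the neutral model is considered ten times more likely a priori than the model with selection. The function name is hypothetical.

```python
def posterior_prob_selection(bayes_factor, prior_odds_neutral=10.0):
    """Posterior probability of the selection model for one locus.

    bayes_factor: evidence for the selection model over the neutral model.
    prior_odds_neutral: prior odds in favour of the neutral model
    (assumed convention; 10 is the value quoted in the text).
    """
    posterior_odds = bayes_factor / prior_odds_neutral
    return posterior_odds / (1.0 + posterior_odds)
```

With prior odds of 10, a locus needs a Bayes factor of 10 just to reach a posterior probability of 0.5, which is why higher prior odds make the scan more conservative.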
To identify loci with extreme correlations between allele frequencies and climatic variables, regression analyses were carried out. We used linear regression models where the dependent variable was the arcsine-transformed major allele frequency (MAF) of each SNP, and the independent variables were climatic variables, ancestry coefficients, and an error term. The ancestry coefficients that describe the population structure were included as covariates, as suggested by Korves . In the micro-geographic study, each SNP was tested in three models, considering the mean temperature, mean precipitation, and mean temperature and precipitation combined as independent variables (S1 Table). In the macro-geographic study, each SNP was tested in 9 models. Four models included a temperature variable (bio01, bio04, bio09, or bio11), one model the precipitation variable (bio12), and four models a temperature variable plus the precipitation variable (S1 Table). For each SNP, the model showing the minimum Akaike's Information Criterion (AIC) was selected as the best fit of the data. Using this model, the proportion of SNP variation explained by the climatic variables was estimated. FDR-corrected P-values (Q-values) were estimated using the software Q-value  implemented in R . Correlations with Q-value <0.05 were considered as significant.
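The model-selection step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: the arcsine transformation is taken to be the arcsine square-root transform, AIC is computed from the Gaussian residual sum of squares up to an additive constant, and all function and variable names are hypothetical.

```python
import numpy as np

def arcsine_transform(freq):
    """Arcsine square-root transform of allele frequencies (assumed form)."""
    return np.arcsin(np.sqrt(freq))

def fit_ols(y, predictors):
    """Ordinary least squares with an intercept; returns (coefficients, AIC)."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1                      # coefficients plus error variance
    aic = n * np.log(rss / n) + 2 * k       # Gaussian AIC up to a constant
    return beta, aic

def best_climate_model(freqs, ancestry, climate_models):
    """Pick the climate model with the minimum AIC for one SNP.

    freqs: allele frequency per population.
    ancestry: list of 1-D arrays of ancestry coefficients (covariates).
    climate_models: dict mapping a model name to a list of 1-D climate arrays.
    """
    y = arcsine_transform(np.asarray(freqs, dtype=float))
    aics = {
        name: fit_ols(y, list(cols) + list(ancestry))[1]
        for name, cols in climate_models.items()
    }
    return min(aics, key=aics.get), aics
```

Because the ancestry covariates enter every candidate model, the AIC comparison reflects only the choice of climatic variables.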
Patterns of population structure were analysed by principal component analysis (PCA) and by Bayesian cluster analysis. To further characterize population structures, hierarchical F-statistics and analysis of molecular variance (AMOVA) were applied . The PCA was performed on the normalized genotypic data matrix. To identify the top k significant PCs, each PC eigenvalue was standardized and compared to the Tracy-Widom distribution (TW statistics) . A significance cut-off of 5% was used to determine the significant PCs representing population structure. Then, hierarchical fixation indices were calculated from variance components according to Yang  as applied in the HIERFSTAT library  in R. Bayesian cluster analysis was performed on the SNP data matrix using the program STRUCTURE ver.2.2  on Bioportal (www.bioportal.uio.no). STRUCTURE runs were performed with a Markov Chain Monte Carlo (MCMC) burn in of 500,000 steps, followed by an MCMC of 600,000 steps. An admixture model was used in the simulations. Each analysis was replicated 10 times for each K, with K ranging from 1 to 12 and from 1 to 30 at the micro- and macro-geographical scale, respectively. The best K was assigned using the log likelihood value, and populations were assigned to each genetic cluster considering the assignment of the majority of individuals within each population.
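The PCA step can be illustrated with a minimal sketch. It assumes that "normalized" means the common per-SNP scaling in which each column is centred on twice the allele frequency and divided by sqrt(p(1-p)), and that genotypes are coded as 0/1/2 allele counts without missing data. The Tracy-Widom test itself is not implemented here.

```python
import numpy as np

def genotype_pca(genotypes):
    """PCA on a normalized genotype matrix.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts,
    with no missing values (an assumption of this sketch).
    Returns (scores, eigenvalues, explained_fraction).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(p * (1 - p))      # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigvals = s ** 2 / (g.shape[0] - 1)
    return u * s, eigvals, eigvals / eigvals.sum()
```

The returned eigenvalues are the quantities that would be standardized and compared with the Tracy-Widom distribution to decide how many PCs are significant.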
To investigate partitioning of genetic variation at different hierarchical levels, and to corroborate the results obtained with HIERFSTAT and STRUCTURE, an AMOVA analysis was performed at both geographic scales, assuming the presence of four genetic groups (see Results) at the macro-geographic scale, and two groups (transects) at the micro-geographic scale. The AMOVA was performed using Arlequin software .
The 384 SNPs considered represent 290 genes (S1A Supporting Material) encoding proteins with various biological functions. Among those SNPs, 41 failed to amplify, 63 were monomorphic in all samples, and 54 SNPs (micro-geographic scale) and 43 (macro-geographic scale) did not pass the quality control. A total of 226 SNPs across 224 genes (micro-geographic scale) and 237 SNPs across 247 genes (macro-geographic scale) were successfully genotyped.
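The SNP totals follow from the counts above, assuming that the failed, monomorphic, and quality-filtered sets do not overlap:

```latex
\begin{align*}
384 - 41 - 63 &= 280 \\
280 - 54 &= 226 \quad \text{(micro-geographic scale)} \\
280 - 43 &= 237 \quad \text{(macro-geographic scale)}
\end{align*}
```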
At the micro-geographic scale, the overall genetic diversity (considering all SNPs together), expressed as observed heterozygosity (HO) per population, ranged from 0.230 (M18) to 0.259 (C14) with a grand mean of 0.248 (± SD = 0.168) (Table 1). At the macro-geographic scale (Table 2), HO ranged from 0.206 (population X141) to 0.243 (X267) with a grand mean of 0.229 (± SD = 0.159).
No significant differences were found in FST value between population pairs at the micro-geographic scale (S2A Table). At the macro-geographic scale, several population pairs had a significant FST-value (P<0.0001) according to the permutation test (S2B Table) and the FST values were between 0.012 (X350 and S1U) and 0.680 (BOE and MN).
FST values were calculated for each locus at both geographic scales following Weir and Cockerham . At the micro-geographic scale, no outliers were detected using either the Bayesian simulation or considering the neutral island model.
At the macro-geographic scale, FST-values calculated among populations varied between -0.008 and 0.36 (S1B Supporting Material; mean FST = 0.024). The BayeScan simulation was run twice: using populations assigned according to their geographic position and according to STRUCTURE clustering. The first simulation detected 8 outlier loci (Table 3 and S1A Fig.). The outlier with the highest FST-value (0.234), SNP locus 2_10483_01-340, encodes a haloacid dehalogenase-like hydrolase and was detected only in Alpine populations (S2 Fig.). The other SNP loci encode a sucrose synthase (CL813Contig1_03), a transcription factor (0_10267_01), translation-elongation factor (0_8642_01), UBX domain-containing protein (0_9922_01), acyl-CoA thioesterase (2_8491_01), acetyltransferase component (CL866Contig1_01) and an unknown protein (2_5073_01-321). BayeScan simulations taking into account the four STRUCTURE clusters identified a single outlier (1_3086_01-101; FST = 0.128; S1B Fig.) that was not detected in the first simulation (S1A Fig.). This locus encodes a protein of unknown function (Table 3).
Among the outliers detected by coalescent simulations assuming a neutral island model, five were highly significant (P<0.0001; Table 3). Three of them (2_10483_01-340, CL813Contig1_03-235, 1_3086_01-101) overlapped with those identified by BayeScan. The other two encode an ovule receptor-like kinase (0_12021_01) and a protein with unknown function (CL4578Contig1_02). Simulations considering the four STRUCTURE clusters identified the same five outliers and an additional outlier (UMN_4091_02-458), with its locus encoding an F-box family protein.
Associations between allele frequencies and climatic variables were analysed with linear regression models taking into account ancestry coefficients. At the micro-geographic scale, the 226 SNPs were analysed with three models, including either temperature, precipitation, or temperature and precipitation combined. Overall, two SNP loci showed a significant correlation with climate (Table 4; S2A Supporting Material). One of them (2_5636_01), encoding a pentatricopeptide repeat-containing protein, was correlated with annual mean temperature, and the other (2_9466_01-179) with mean temperature and precipitation combined. The latter locus was shared with the macro-geographic scale study. The amount of its frequency variation explained by climate was 83%.
For each of the 237 SNPs used in the macro-geographic scale study, 9 models were applied, which included one or two climatic variables related to temperature or precipitation (S1 Table). The analyses resulted in the identification of 38 SNPs significantly correlated (at Q<0.05) with either temperature, precipitation, or temperature and precipitation variables combined (Table 4, S2B Supporting Material). Twelve loci were significant determinants (Pvar<0.01) in the model. The amount of SNP frequency variation explained by climatic variables ranged from 37% (CL3363Contig1_04-85) to 72% (2_9466_01-179), the latter locus encoding a hypothetical protein. Six of the identified loci overlapped with FST-outliers (0_9922_01-345, 2_8491_01-519, 0_12021_01-161, 0_8642_01-166, 2_10483_01-340, CL813Contig1_03-235).
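Multiple-testing correction across SNPs can be illustrated with the Benjamini-Hochberg adjustment. This is a stand-in for illustration only: the study reports Q-values from the Q-value software, which additionally estimates the proportion of true null hypotheses, so the two procedures generally give different numbers.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A SNP would then be called significant when its adjusted value falls below the chosen threshold, for example 0.05.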
To examine patterns of population structure, PCA and Bayesian clustering were applied at both geographic scales. At the micro-geographic scale, PCA identified only one significant PC according to the TW statistics. The absence of population stratification was confirmed by HIERFSTAT and STRUCTURE analyses. The average level of genetic differentiation among sampling sites was extremely low, both between transects (Ftransect/total = 0.0002, P = 0.587) and between sampling sites within transects (Fsampling_site/transect = 0.0018, P = 1) (S3A Table), and no clusters were identified using STRUCTURE (Fig. 1A). A further confirmation of the lack of structure at the micro-geographic scale was provided by the AMOVA (S3B Table).
Log likelihood values, Ln Pr(X|K), from the Pritchard plot are shown for the micro- and macro-geographic scales (A). Clustering of the macro-geographic populations according to the Bayesian method implemented in STRUCTURE (B). The colour of each population dot represents the cluster that includes the majority of individuals within that population. The species distribution range is shown in green (created using Q-GIS based on a published description).
At the macro-geographic scale, the PCA showed a significant population structure (S3A Fig.). Three PCs, explaining 13.1% of the total variance, were significant at the 5% threshold. The first PC was significantly correlated with longitude (R2 = 0.11; P = 1.9e−15) and distinguished populations of the Alps from all other populations. The second PC highlighted the peculiarity of a population (MN) located in Montenegro in the Dinaric Alps; this was the southernmost population included in the study, which explains the correlation between PC2 and latitude (R2 = 0.12, P<2.2e−16). The Bayesian cluster analysis with STRUCTURE detected four clusters (Fig. 1B). The population structure was similar to that of the PCA, but the populations of the Carpathians were separated from those of Hercynia. All but one of the populations of the Carpathians formed a first cluster. The second cluster included 11 of the 16 populations of the Alps. The third cluster was characterised by populations of Hercynia and included the five remaining populations of the Alps. As in the PCA, Bayesian clustering assigned the Montenegro population to a separate cluster. To further characterize the population clustering, each population was assigned to the cluster that includes the majority of the samples, and the percentage of variation among the four clusters was calculated with AMOVA (S3B Table). The analysis revealed that a very low (1.54%), but highly significant (P<0.0001) portion of the total variation is explained by differences among clusters, as confirmed by the F-statistics analysis (S3A Table).
This research confirms the findings from previous studies describing the genetic structure of Norway spruce at the European level and highlights the importance of integrating the effects of demography in outlier detection studies. The experimental design we used (micro- and macro-geographic scale) and the application of different approaches in the data analysis provided new insights into the underlying genes that may be responsible for local adaptation. Some potential adaptive loci were found to be associated with temperature at both geographic scales, confirming the importance of this factor in driving adaptation in forest species.
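The majority rule used to assign populations to clusters can be sketched as follows. The input layout (one population label and one list of membership coefficients per individual) is an assumption made for this illustration, and ties are broken by the order in which clusters are first encountered.

```python
from collections import Counter, defaultdict

def assign_populations(individual_pops, membership):
    """Assign each population to the cluster holding most of its individuals.

    individual_pops: population label for each individual.
    membership: per-individual list of cluster membership coefficients,
    as produced by an admixture analysis (layout assumed here).
    """
    votes = defaultdict(Counter)
    for pop, q in zip(individual_pops, membership):
        best_cluster = max(range(len(q)), key=q.__getitem__)
        votes[pop][best_cluster] += 1
    return {pop: counts.most_common(1)[0][0] for pop, counts in votes.items()}
```

For example, a population in which two of three trees have their highest membership coefficient in cluster 0 is assigned to cluster 0, even if the third tree belongs mostly to another cluster.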
Signature of adaptation
Theoretical and empirical studies show that demographic history can inflate the detection of FST outliers. In the macro-geographic scale study, 13% of the variance was explained by population structure, presumably due to population demographic changes. Consequently, both BayeScan and Arlequin simulations were carried out with and without taking into account the population structure. BayeScan simulations without population structure correction detected eight outliers, whereas only one locus was detected with structure correction. The simulation considering the hierarchical island model detected seven outlier loci, with two of them being found only with this method. Altogether, seven FST outliers were identified taking into account the population structure, corresponding to 2.95% of the SNPs tested. This discovery rate is comparable to that observed in other conifers using similar approaches, including black spruce (Picea mariana), where few SNPs were identified as outliers, and only within a specific lineage.
To account for variation along clines due to demographic processes and/or selection, population structure was included as a covariate in the linear regression models. In the 27 populations analysed in the macro-geographic scale study, 38 SNP loci were significantly correlated with temperature and/or precipitation variables, corresponding to a 16% discovery rate. A greater ratio (22%) was detected in loblolly pine (Pinus taeda) using a different model. The majority of these SNP loci showed significant correlations in models with temperature variables, consistent with results observed in a lodgepole pine (Pinus contorta Dougl. ex Loud) field transplant experiment and in other coniferous species, suggesting that temperature is a significant force in shaping genetic diversity.
In the micro-geographic scale study, the number of loci potentially involved in adaptation was much smaller: no FST-outliers were detected and the regression analyses identified only two SNPs significantly correlated with climatic variables. This finding was unexpected, because Alpine slopes are highly variable environments, where small changes in altitude can lead to significant variation in temperature, humidity and soil composition. Adaptation of populations to such environments is likely to result in genetic clines associated with altitude. The average differentiation between populations distributed along the two altitudinal transects was very low, comparable to that previously reported for Norway spruce at the local scale, and is in accordance with estimates of gene flow described for tree-line ecotones. This low differentiation was confirmed by Bayesian and PCA clustering, which both revealed absence of population structure. Assuming high levels of gene flow, it seems likely that gene flow constrained the effects of selection, at least to some extent. On the other hand, Norway spruce populations growing along altitudinal gradients typically show clear clines in growth and timing of bud set, indicating strong diversifying selection. In our study, we tested only a limited number of loci, whose selection was largely based on quality scores derived from the original sequence data, rather than on functional annotations. Identification of loci involved in growth and bud set control would require analysis of more genes.
The higher rate of locus discovery at the macro- than at the micro-geographic scale may also be a result of spatial heterogeneity in selection regimes. In particular, the Alps, Carpathians and the Bohemian massif differ considerably in topography and their continental location, and thus are characterised by distinct climatic conditions. Notably, quantitative traits assessed in provenance trials revealed clear differences among populations of the Alps, Carpathians, and Hercynia, supporting different selection regimes for these areas. It is therefore likely that heterogeneity in selection regimes contributed to the signatures of selection we identified at the macro-geographic scale.
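The two discovery rates quoted for the macro-geographic scale can be reproduced by assuming that the denominator is the 237 SNPs that passed quality control:

```latex
\[
\frac{7}{237} \approx 0.0295 = 2.95\%,
\qquad
\frac{38}{237} \approx 0.160 \approx 16\%
\]
```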
Putative adaptive SNPs
The six SNP loci that were identified by both correlation-based and FST-outlier analyses were considered as ‘putative adaptive loci’. The locus with the highest FST value (2_10483_01) encodes a haloacid dehalogenase-like hydrolase, an enzyme with a putative function in the biosynthetic pathway of the vitamin riboflavin, which plays a role in a variety of redox processes affected in plant defence responses. The SNP was significantly correlated with combined temperature and precipitation, and was only found in a subset of Alpine populations, further supporting its potential role in adaptation. At the micro-geographic scale, the frequency of its particular allele was very low, possibly explaining why this locus was not detected as an FST-outlier. An additional candidate locus (CL813Contig1_03) encodes a sucrose synthase, an enzyme of the primary metabolism responsible for energy supply. The expressed sequence tag was isolated from Aleppo pine (Pinus halepensis) and was shown to be induced by water stress, consistent with our finding that the SNP was correlated with annual precipitation. Allelic changes in enzymes of the primary metabolism are a general response of plants to stress, and in the case of sucrose synthase a function in water stress tolerance has been proposed. A third locus (2_8491_01) encodes an acyl-CoA thioesterase and was associated with annual mean temperature. This enzyme catalyses the hydrolysis of acyl-CoAs to free fatty acids and coenzyme A, and thus regulates the intracellular levels of acyl-CoAs and free fatty acids. In white spruce (Picea glauca), acetoacetyl-CoA thiolase was demonstrated to be involved in the up-regulation of transcripts in response to stress. The remaining three loci were a translation elongation factor, a UBX domain-containing protein, and an ovule receptor-like kinase protein.
Since genetic structure was detected only at macro-geographic scale, we assumed that both altitudinal transects sampled at micro-geographic scale belonged to the same gene pool. No structure effects at micro-geographic scale were observed.
At the macrogeographic scale, the analysis of population structure using SNPs of functional genes revealed three major clusters, which were largely congruent with those delineated in previous studies . The most detailed information about the glacial and postglacial history of Norway spruce has been provided by the combined analysis of fossil pollen and mitochondrial DNA , . The data indicate that Norway spruce of the southern part expanded out of three major refugia, giving rise to populations in the Alps, Hercynia, and the Carpathians. The cluster with Alpine populations, identified by both PCA and Bayesian clustering, corresponds to a mitochondrial lineage derived from a refugium probably located in the south-eastern Alps. Populations of Hercynia and the Carpathians were delineated only by Bayesian clustering, and probably corresponds to mitochondrial lineages derived from refugia located in the southern Bohemian massif and Carpathians. Both PCA and Bayesian clustering assigned the Montenegro population to a separate cluster. Compared to other populations, its population differentiation was quite high, which may be a result of a distinct glacial history and/or its occurrence at the southern range limit. Norway spruce in the southern Dinaric Alps typically occurs in scattered populations , which may promote genetic drift and thus population differentiation.
In this study, we confirmed the confounding effect of genetic structure on the detection of outlier loci (see previous section). Therefore, estimating the genetic structure of a species is a crucial step in the identification of adaptive loci, as previously reported.
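The confounding effect of genetic structure can be illustrated with a minimal sketch. It assumes the regression form given in the models table of this study, with the arcsine-transformed allele frequency as the response and a structure term ("A") and annual mean temperature as predictors. All data values, variable names and helper functions below are hypothetical and were chosen only for illustration; they are not the study's data or software. The sketch uses the standard result that the partial correlation between response and temperature, after removing the structure covariate from both, tests the same effect as the temperature coefficient in the two-predictor regression.

```python
import math


def mean(values):
    return sum(values) / len(values)


def residuals(y, x):
    """Residuals of the least-squares line y ~ x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(y, x, covariate):
    """Correlation of y and x after regressing the covariate out of both."""
    return pearson(residuals(y, covariate), residuals(x, covariate))


# Hypothetical toy data for eight populations (NOT from the study).
# 'structure' stands in for the structure term "A", e.g. a cluster
# membership coefficient or a principal-component score.
structure = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
noise = [0.3, -0.3, -0.3, 0.3, 0.3, -0.3, -0.3, 0.3]

# Temperature covaries with structure; allele frequency depends on
# structure only, so any link to temperature is purely confounded.
temperature = [2.0 + 5.0 * a + e for a, e in zip(structure, noise)]
allele_freq = [0.5 + 0.4 * a for a in structure]

# Transformation written as asin(MAF) in the models table; the
# arcsine square-root variant would be math.asin(math.sqrt(p)).
response = [math.asin(p) for p in allele_freq]

raw_r = pearson(response, temperature)
partial_r = partial_correlation(response, temperature, structure)

# t statistic for the temperature term in response ~ structure + temperature
# (n - 3 residual degrees of freedom: intercept plus two predictors).
n = len(structure)
t_stat = partial_r * math.sqrt((n - 3) / (1.0 - partial_r ** 2))

print(f"uncorrected r = {raw_r:.3f}")
print(f"structure-corrected partial r = {partial_r:.3f}, t = {t_stat:.3f}")
```

In this toy example the uncorrected correlation is strong, whereas the structure-corrected one is weak, which is the pattern expected when an apparent genotype-environment association arises from shared population history rather than from selection.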
This study indicates that the genetic diversity of Norway spruce was shaped by both demographic and evolutionary processes, confirming the population structure identified with other marker types, but inferred from a much lower number of loci. The structure results were taken into account in the detection of signals of selection and adaptation at the molecular level. The combined analyses of FST-outliers and environmental associations led to the identification of several potentially adaptive genes and corroborate previous suggestions that temperature is an important factor in shaping genetic diversity in conifers. A strong relation was found between genetic structure and environmental variables, but this correlation does not allow the identification of the physiological function affected by the environmental factor. Therefore, in future studies it is crucial to complement genetic studies with transplant experiments, in which phenotypic variation or the effect of an environmental stress can be assessed. Finally, sampling at different spatial scales allowed us to provide insights into the effects of gene flow on local adaptation. Moreover, our results highlight the importance of combining different approaches to investigate species adaptation.
BayeScan results at macro-geographic scale: populations assigned according to their geographic position (A) and according to STRUCTURE clustering (B).
Plots of some loci significantly associated with bioclimatic variables; colours identify the locus minor allele frequency (m.a.f.) within each population.
Plot of the first two significant principal components (PCs) at the micro-geographic scale, where one cluster was identified (A). Plot of the first two PCs at the macro-geographic scale (B). Population labels are coloured according to the population ID. Eigenvalues for all PCs are shown in the bar plots.
Models used for the regression analysis at the micro- and the macro-geographic scale. The letter “A” represents the population structure. The following variables were used: major allele frequency (MAF) with the arcsin transformation (asin(MAF)), annual mean temperature (T, bio01), annual precipitation (P, bio12), temperature seasonality (bio04), mean temperature of driest quarter (bio09) and mean temperature of coldest quarter (bio11).
Pairwise FST between population pairs at micro- (A) and macro-geographic scale (B). Population IDs are described in Table 1 and Table 2. Values in bold are significantly different (P-value <0.0001) according to a permutation test (N = 1000).
Analysis of variance at micro- and macro-geographic scales. F statistics were calculated at different levels using HIERFSTAT library in R: between transects or among clusters, among populations, among samples (A). AMOVA analysis calculated using Arlequin at micro- and macro-geographic scales. Fixation indexes statistically significant (*** P<0.000) (B).
SNP position and locus ID with BLAST-N (A). FIT, FST and FIS values calculated per locus at both scales (B). (XLS)
Regression model analysis at micro- (A) and macro-geographic scale (B).
The authors thank the following people for their kind help in collecting the material for this study: Luca Bronzini (Panstudio), Fabio Angeli (Forest Services of Trentino Province) and Emanuele Endrizzi (Fondazione Edmund Mach). We also thank Felix Gugerli (WSL) for providing DNA of the populations D2U, S1U, and S3U.
Conceived and designed the experiments: MS MT GGV NLP DBN. Performed the experiments: MS. Analyzed the data: MS EM EADP. Contributed reagents/materials/analysis tools: MS EM EADP MT CS GGV NLP DBN. Wrote the paper: MS EM EADP MT CS GGV NLP DBN. Provided material and collected the data: MT CS GGV.
- 1. Lindner M, Maroschek M, Netherer S, Kremer A, Barbati A, et al. (2010) Climate change impacts, adaptive capacity, and vulnerability of European forest ecosystems. Forest Ecol Manag 259:698–709.
- 2. Cheddadi R, Vendramin GG, Litt T, Francois L, Kageyama M, et al. (2006) Imprints of glacial refugia in the modern genetic diversity of Pinus sylvestris. Global Ecol Biogeogr 15:271–282.
- 3. Magri D, Vendramin GG, Comps B, Dupanloup I, Geburek T, et al. (2006) A new scenario for the Quaternary history of European beech populations: palaeobotanical evidence and genetic consequences. New Phytol 171:199–221.
- 4. Tollefsrud MM, Sønstebø JH, Brochmann C, Johnsen Ø, Skrøppa T, Vendramin GG (2009) Combined analysis of nuclear and mitochondrial markers provide new insight into the genetic structure of North European Picea abies. Heredity 102:549–562.
- 5. Savolainen O, Pyhajarvi T, Knurr T (2007) Gene flow and local adaptation in trees. Annu Rev Ecol Evol Syst 38:595–619.
- 6. Alberto FJ, Aitken SN, Alía R, González-Martínez SC, Hänninen H, et al. (2013) Potential for evolutionary responses to climate change – evidence from tree populations. Global Change Biol 19:1645–1661.
- 7. Neale DB, Kremer A (2011) Forest tree genomics: growing resources and applications. Nature Reviews Genetics 12:111–122.
- 8. Beaumont M, Nichols R (1996) Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society B: Biological Sciences 263:1619–1626.
- 9. Beaumont M, Balding DJ (2004) Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 13:969–980.
- 10. Foll M, Gaggiotti O (2006) Identifying the environmental factors that determine the genetic structure of populations. Genetics 174:875–891.
- 11. Excoffier L, Hofer T, Foll M (2009) Detecting loci under selection in a hierarchically structured population. Heredity 103:285–298.
- 12. Coop G, Witonsky D, Di Rienzo A, Pritchard JK (2010) Using environmental correlations to identify loci underlying local adaptation. Genetics 185:1411–1423.
- 13. Nielsen R, Hubisz MJ, Hellman I, Torgerson D, Andres AM, et al. (2009) Darwinian and demographic forces affecting human protein coding genes. Genome Research 19:838–849.
- 14. Namroud MC, Beaulieu J, Juge N, Laroche J, Bousquet J (2008) Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Mol Ecol 17:3599–3613.
- 15. Eckert AJ, Wegrzyn JL, Pande B, Jermstad KD, Lee JM, et al. (2009) Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas fir (Pseudotsuga menziesii var. menziesii). Genetics 183:289–298.
- 16. Eckert AJ, Bower AD, González-Martínez SC, Wegrzyn JL, Coop G, et al. (2010a) Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae). Mol Ecol 19:3789–3805.
- 17. Eckert AJ, Heerwaarden van J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, et al. (2010b) Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics 185:969–982.
- 18. Grivet D, Sebastiani F, Alia R, Bataillon T, Torre S, et al. (2011) Molecular footprints of local adaptation in two Mediterranean conifers. Mol Biol Evol 28(1):101–116.
- 19. Chen J, Källman T, Ma X, Gyllenstrand N, Zaina G, et al. (2012) Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics 191:865–881.
- 20. Mosca E, Eckert AJ, Di Pierro EA, Rocchini D, La Porta N, et al. (2012) The geographical and environmental determinants of genetic diversity for four alpine conifers of the European Alps. Mol Ecol 21:5530–5545.
- 21. Prunier J, Gérardi S, Laroche J, Beaulieu J, Bousquet J (2012) Parallel and lineage-specific molecular adaptation to climate in boreal black spruce. Mol Ecol 21:4270–4286.
- 22. Mosca E, González-Martínez SC, Neale DB (2014) Environmental versus geographical determinants of genetic structure in two subalpine conifers. New Phytol 201:180–192.
- 23. Kremer A, Ronce O, Robledo-Arnuncio JJ (2012) Long-distance gene flow and adaptation of forest trees to rapid climate change. Ecology Letters 15:378–392.
- 24. Schmidt-Vogt H (1974) Die Fichte. Verlag Paul Parey, Hamburg, Germany.
- 25. Schmidt-Vogt H (1974) Das natürliche Verbreitungsgebiet der Fichte (Picea abies [L.] Karst.) in Eurasien. Allgemeine Forst- und Jagdzeitung 145:185–197.
- 26. Latałowa M, van der Knaap WO (2006) Late Quaternary expansion of Norway spruce Picea abies (L.) Karst. in Europe according to pollen data. Quaternary Sci Rev 25:2780–2805.
- 27. Lagercrantz U, Ryman N (1990) Genetic structure of Norway spruce (Picea abies): concordance of morphological and allozymic variation. Evolution 44:38–53.
- 28. Lockwood JD, Aleksic J, Zou J, Wang J, Liu J, Renner SS (2013) A new phylogeny for the genus Picea from plastid, mitochondrial, and nuclear sequences. Mol Phylogenet Evol 69(3):717–727.
- 29. Bucci G, Vendramin GG (2000) Delineation of genetic zones in the European Norway spruce natural range: preliminary evidences. Mol Ecol 9:923–934.
- 30. Sperisen C, Büchler U, Gugerli F, Brunner L, Brodbeck S, et al. (2001) Tandem repeats in plant mitochondrial genomes: application to the analysis of population differentiation in the conifer Norway spruce. Mol Ecol 10:257–256.
- 31. Heuertz M, De Paoli E, Källman T, Larsson H, Jurman I, et al. (2006) Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics 174:2095–2105.
- 32. Tollefsrud MM, Kissling R, Gugerli F, Johnsen Ø, Skrøppa T, et al. (2008) Genetic consequences of glacial survival and postglacial colonization in Norway spruce: combined analysis of mitochondrial DNA and fossil pollen. Mol Ecol 17:4134–4150.
- 33. Sogaard G, Johnsen O, Nilsen J, Junttila O (2008) Climatic control of bud burst in young seedlings of nine provenances of Norway spruce. Tree Physiol 28:311–320.
- 34. Olsson C, Bolmgren K, Lindstrom J, Jonsson AM (2013) Performance of tree phenology models along a bioclimatic gradient in Sweden. Ecol Model 266:103–117.
- 35. Skroppa T, Magnussen S (1993) Provenance variation in shoot growth components of Norway spruce. Silvae Genet 42:111–120.
- 36. Gömöry D, Longauer R, Hlasny T (2011) Adaptation to common optimum in different populations of Norway spruce (Picea abies Karst.). Eur J For Res 131:401–411.
- 37. Krutzsch P (1974) The IUFRO 1964/68 Provenance Test with Norway spruce (Picea abies (L.) Karst.) Silvae Genet. 23 (1–3):58–62.
- 38. Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue. Focus 12:13–15.
- 39. Sboarina C, Cescatti A (2004) Il clima del Trentino. Distribuzione spaziale delle principali variabili climatiche. Report 33. Centro di Ecologia Alpina, Trento.
- 40. Quantum GIS Development Team (2009) Quantum GIS Geographic Information System. Open Source Geospatial Foundation Project. Available: http://qgis.osgeo.org.
- 41. Wegrzyn JL, Lee JM, Liechty J, Neale DB (2009) PineSAP – sequence alignment and SNP identification pipeline. Bioinformatics 25:2609–2610.
- 42. Rousset F (2008) GENEPOP'007: a complete reimplementation of GENEPOP software for Windows and Linux. Mol Ecol Res 8:103–106.
- 43. Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Res 10:564–567.
- 44. Excoffier L, Hofer T, Foll M (2009) Detecting loci under selection in a hierarchically structured population. Heredity 103:285–298.
- 45. Korves TM, Schmid KJ, Caicedo AL, Mays C, Stinchcombe JR, et al. (2007) Fitness effects associated with the major flowering time gene FRIGIDA in Arabidopsis thaliana in the field. Am Nat 169:141–157.
- 46. Dabney A, Storey JD, Warnes GR (2012) qvalue: Q-value estimation for false discovery rate control. R package version 1.32.0.
- 47. R Development Core Team (2012) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- 48. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909.
- 49. Yang RC (1998) Estimating hierarchical f-statistics. Evolution 52:950–956.
- 50. Goudet J (2005) Hierfstat, a package for R to compute and test hierarchical F-statistics. Mol Ecol Notes 5:184–186.
- 51. Pritchard J, Stephens M, Donnelly P (2000) Inference of population structure using multi-locus genotype data. Genetics 155:945–959.
- 52. Weir BS, Cockerham C (1984) Estimating F-statistics for the analysis of population structure. Evolution 38(6):1358–1370.
- 53. Nei M, Maruyama T (1975) Lewontin-Krakauer test for neutral genes. Genetics 80:395.
- 54. Wang T, O'Neill GA, Aitken SN (2010) Integrating environmental and genetic effects to predict responses of tree populations to climate. Ecol Appl 20(1):153–163.
- 55. Körner C (2003) Alpine Plant Life - Functional plant ecology of high mountain ecosystem. 2nd edition, Springer, Heidelberg.
- 56. Maghuly F, Pinsker W, Praznik W, Fluch S (2006) Genetic diversity in managed subpopulations of Norway spruce [Picea abies (L.) Karst.]. Forest Ecol Manag 222:266–271.
- 57. Piotti A, Leonardi S, Piovani P, Scalfi M, Menozzi P (2009) Spruce colonization at treeline: where do those seeds come from? Heredity 103:136–145.
- 58. Collignon AM, Van de Sype H, Favre JM (2002) Geographical variation in random amplified polymorphic DNA and quantitative traits in Norway spruce. Can J For Res 32:266–282.
- 59. Zhang S, Yang X, Sun M, Sun F, Deng S, et al. (2009) Riboflavin-induced priming for pathogen defense in Arabidopsis thaliana. J Integr Plant Biol 51:167–174.
- 60. Loopstra CA, Sathyan P (2004) Genes induced by water-deficit-stress are differentially expressed in two populations of aleppo pine (Pinus halepensis). Submitted (AUG-2004) to the EMBL/GenBank/DDBJ databases
- 61. Rolland F, Moore B, Sheen J (2002) Sugar sensing and signaling in plants. Plant Cell 14 Suppl: S185–205
- 62. Ruan YL, Jin Y, Yang YJ, Li GJ, Boyer JS (2010) Sugar input, metabolism, and signaling mediated by invertase: roles in development, yield potential, and response to drought and heat. Mol Plant 3:942–955.
- 63. Bedon F, Bomal C, Caron S, Levasseur C, Boyle B, et al. (2010) Subgroup 4 R2R3-MYBs in conifer trees: gene family expansion and contribution to the isoprenoid- and flavonoid-oriented responses. J Exp Bot 61:3847–3864.
- 64. Keller SR, Levsen N, Olson MS, Tiffin P (2012) Local Adaptation in the Flowering-Time Gene Network of Balsam Poplar, Populus balsamifera L. Mol Biol Evol 29 (10):3143–3152.
- 65. Prunier J, Laroche J, Beaulieu J, Bousquet J (2011) Scanning the genome for gene SNPs related to climate adaptation and estimating selection at the molecular level in boreal black spruce. Mol Ecol 20:1702–1716.
- 66. Storey JD, Tibshirani R (2003) Statistical significance for genome-wide experiments. P Natl Acad Sci 100:9440–9445.