Assessing The Spatial Dependence of Adaptive Loci in 43 European and Western Asian Goat Breeds Using AFLP Markers

Background During the past decades, neutral DNA markers have been extensively employed to study demography, population genetics and structure in livestock, but less interest has been devoted to the evaluation of livestock adaptive potential through the identification of genomic regions likely to be under natural selection. Methodology/Principal findings Landscape genomics can greatly benefit the entire livestock system through the identification of genotypes better adapted to specific or extreme environmental conditions. Therefore we analyzed 101 AFLP markers in 43 European and Western Asian goat breeds both with Matsam software, based on a correlative approach (SAM), and with Mcheza and Bayescan, two FST based software able to detect markers carrying signatures of natural selection. Matsam identified four loci possibly under natural selection – also confirmed by FST-outlier methods – and significantly associated with environmental variables such as diurnal temperature range, frequency of precipitation, relative humidity and solar radiation. Conclusions/Significance These results show that landscape genomics can provide useful information on the environmental factors affecting the adaptive potential of livestock living in specific climatic conditions. Besides adding conservation value to livestock genetic resources, this knowledge may lead to the development of novel molecular tools useful to preserve the adaptive potential of local breeds during genetic improvement programs, and to increase the adaptability of industrial breeds to changing environments.


Introduction
Neutral DNA markers have been extensively employed, during the last decades, to infer population genetics parameters, population structure and demographic trends, both in wildlife and livestock species [1][2][3]. Much scientific interest is now focused on investigating adaptive genetic variation [4][5] and on identifying genomic regions likely to be under selection [6][7][8][9]. So far, several methods have been proposed [5][6][10][11][12][13]: some are based on candidate gene approaches which test whether or not a specific locus is a true target of selection by means of a number of different statistical methods [14][15]; others are designed to identify chromosomal regions affecting the phenotypes of complex adaptive traits (e.g. disease resistance), by measuring the associ-ation between different genotypes and the phenotype of interest [16].
The population genomics approach [4] searches for selection signatures by analyzing the variation of genetic diversity parameters along chromosomes, to discriminate between genomic regions under locus-specific (selection) and genome-wide (genetic drift, inbreeding and migration) effects [17]. The major limitation of this approach, however, is that it is blind respect to the causative selection forces. Signatures of selection for adaptive traits can be partially targeted by properly designing the experiment (e.g. contrasting groups of breeds reared in different environmental conditions), but the disentanglement of the effects linked to specific environmental variables remains impossible.
The approach based on spatial analysis and multiple parallel logistic regressions [9] offers a different perspective. Based on the geographical coordinates of the sampling locations, a set of environmental variables describing these locations are regressed on the genetic profile of each individual in search for robust association. The association between alleles and climatic variables is suggestive of signatures of adaptation and gives information on the environmental forces acting on the genome. This approach was named landscape genomics [9,[18][19] and permits to seek genomic regions influencing the ability of animals to cope with climatic variations. It is fast and integrates spatial heterogeneity, but it also makes implicit assumptions that have to be taken into account, among which the main ones are that a) the functional relationship between the spatial distribution of alleles and the environmental variables is assumed to be constant (this might not be the case if the environment has changed over time), and that b) selection has had enough time to create a functional relationship between the allele distributions and the environmental variables [20]. However, landscape genomics can potentially benefit the entire livestock system through the identification of better adapted genotypes, hence more suitable to rear in specific environmental contexts. This, in turn, can significantly reduce the costs required to counter the consequences of climate changes on animal production systems.
So far, this type of analysis has been rarely applied to livestock breeds (sheep [9], goats [21], cattle [22]), while much research has been devoted to wild animal and plant species (see [18] and references therein). Most livestock genetics and genomics research is still focused on production traits and industrial breeds, while the characterization, the evaluation of genetic resources and the promotion of their sustainable use are receiving less attention, even if local breeds are a mean to provide income to a very large number of resource-poor smallholders.
The domestic goat is one of the five most important livestock species worldwide and plays an important role in the livelihood of a large proportion of small and marginal farmers and landless shepherds.
Goat genetic resources comprise 627 described breeds, distributed throughout the continents and representing a unique reservoir of genetic diversity [23]. Among these, about 400 are native to Asia (176) and Europe (195), including the Caucasus. Africa has the highest number of regional transboundary populations (15 out of 46 worldwide) followed by Europe (12) and Asia (11). About 40 breeds have been registered as international transboundary or cosmopolite.
The sustainability of goat farming in marginal and difficult areas is strongly dependent on local breeds, well adapted to specific agro-ecological conditions (e.g. harsh climate, tropical diseases, poor nutrition and beverage) and which represent, in both developed and developing countries, an invaluable source of meat, milk and skin [24]. At present, indiscriminate crossbreeding with cosmopolitan breeds and uncontrolled intermixing are threatening a number of local caprine genetic resources. In order to conserve the diversity of goat germplasm -now recognized as an issue of international concern -the characterization of local breeds molecular diversity and adaptive potential is urgently needed to serve as a rational basis for conservation strategies and sustainable breeding.
With this paper, our intention is to contribute to the understanding of the environmental forces acting on the goat genome, by applying both landscape genomic and F ST -outlier approaches to a set of presumably neutral AFLP markers genotyped in 43 European and Southwestern Asian goat breeds.

Ethics statement
All experimental procedures were approved by the European Commission, in accordance with the EU Directive 86/609. The EU Econogene contract was issued within FP5 (Framework Programme 5) in 2001. At that time within the EU there was no dedicated animal welfare/ethics committee to evaluate and approve issues specifically related to these aspects of research projects. The permission to carry out the sampling at each farm was obtained directly from the owners. Blood samples have been collected by veterinarians, complying with relevant national and international regulations and animal welfare requirements. The sampling strategy consisted in a balanced effort of collecting information from a list of selected key breeds while ensuring spatial representativeness to prevent from over-representing any environmental condition. Thus peripheral blood samples were collected from 1239 animals belonging to 43 European and Southwestern Asian goat breeds (Table 1). Between 28 and 33 unrelated individuals were sampled in 10 farms covering the area of origin and the present distribution of breeds (Table 1; Figure S1). A maximum of 3 individuals per farm were sampled to reduce the relatedness among animals and to increase the breed representativeness. There are however exceptions in Turkey, where Hair (#43), Gurcu (#42) and Abaza goats (#40) were sampled in only two farms per breed; similarly, in Albania, Mati (#5) and Muzhake (#6) goats were collected in a single farm per breed, causing a slight overrepresentation of corresponding climatic conditions. Sex bias was minimized by sampling at least 1 male per farm. Geographical coordinates (longitude and latitude, decimal degrees, WGS84 geodetic reference system) of sampling locations, were either recorded with a GPS device with a dedicated protocol or derived from local maps.

Environmental variables
Altitudinal and climatic information was used to characterize the natural environment at each sampling site. Altitude was measured in the field with either an altimeter, or a GPS device, whenever possible, or derived from SRTM30 NASA, http:// www2.jpl.nasa.gov/srtm/. All data were validated with the 3 arc second (approximately 90 meters) digital elevation model (DEM) of the Shuttle Radar Topography Mission [25]. The use of the DEM turned out to be efficient both in detecting anomalies (wrong measures in the field) and in assigning coherent altitudes in case of missing data. As for climatic data, they were recorded according to a latitude/longitude grid with a spatial resolution of 10 minutes (equivalent to approximately 12 km at the latitude of Switzerland). In each cell of the grid we took into account 9 monthly variables, plus their yearly mean (Table 2), produced by the Climatic Research Unit in Norwich (CRU; http://www.cru.uea.ac.uk) to characterize continental regions from 1961 to 1990 [26].

DNA extraction and AFLP production
Genomic DNA has been extracted from frozen whole blood using a commercial kit (GenElute TM Mammalian Genomic DNA Miniprep Kit-Sigma). AFLP marker profiles were generated following a standardized protocol (modified from [27]. See File S1 for further details). Selective amplification was performed with three highly informative EcoRI/TaqI primer combinations (see Table 1. List of goat breeds included in the study (ID numbers on the first column was used to identify the breeds on the maps in Figures 5, 6 and Figure S1).  Table S1 in File S1) selected according to levels of polymorphism and number of bands.

Identification of loci possibly under selection
Correlative approach. The Spatial Analysis Method (SAM) described by Joost and colleagues [9] was used to compute association models between environmental variables and AFLP markers. As recently pointed out by Jones et al. [28], being dominant and loosely distributed on the genome, these markers have lower information content when compared to multiallelic (e.g. microsatellites) or more dense (e.g. SNP chip) marker panels, but at the time when the research was carried out, they represented the only tool available to perform a genome-wide scan of molecular variation in goat.
Association models were calculated with the software MATSAM [29], which is based on spatial coincidence, one of the six spatial analysis concepts distinguished by Goodchild [30].
The association models relate the genetic profile of the individuals to environmental parameters measured at their geographical coordinates of origin. The method employs a georeferenced data set describing individuals with environmental parameters and a binary matrix indicating the presence or the absence of AFLP bands. Univariate logistic regression analysis is used to determine the degree of association between the values of the environmental parameters and the frequencies of each AFLP phenotype. Then, the significance of the models generated by all possible pair-wise combinations of genetic marker versus environmental parameter is calculated. To this end, two statistical tests are used: the Wald and the Likelihood ratio statistics, integrating a correction for multiple hypothesis testing [9]. A model is considered significant only if both tests reject the corresponding null hypothesis. Markers included in statistically significant models are likely to host molecular variation involved in the processes of adaptation to the environment. In this study, we deliberately implemented univariate models only, as an exploratory step to detect the main environmental variables possibly selecting particular regions of the genome.
Finally, to validate the results produced by MATSAM, the goat AFLP data were also analysed with two other population genomic approaches able to detect selection sweeps.
F ST -outlier methods. MCHEZA [31] is a front-end program facilitating the use of DFDIST software [32][33], a modification of FDIST [32] which allows for dominant markers and implements the method of Zhivotovsky [34] to estimate allele frequencies. The FDIST method relies on the principle of genetic differentiation between populations, expressed in this case through the F ST index [35]. In a simulation model of the infinite number of possible values for populations of constant size, with a migrant exchange rate at equilibrium (the infinite island model), the F ST index introduced by Wright [36] corresponds to the probability that two alleles drawn at random from a population descend from an ancestral allele present in the same population [6]. The FDIST method is a variant of the Lewontin-Krakauer test [37], which relies on the idea that, in a given population, all genomic loci have comparable F ST values because they share identical demographic histories. Only loci subject to active selection deviate from this rule [6].
The other software used was BAYESCAN version 2.1 [12], which exploits a Bayesian inference method to directly estimate the posterior probability (PP) for each locus to be under selection. The procedure of statistical inference implemented in BAYESCAN uses observations to update or to newly infer the probability that a hypothesis may be true. To identify loci possibly under selection, the software both provides Bayes factors and Posterior Odds (PO) scores. The latter, in particular, indicate how more probable a model with selection is compared to a neutral model by calculating the ratio between their PPs [38].

Spatial statistics
Using the geographical coordinates of the farms, the spatial autocorrelation of the frequency of the markers detected as possibly under selection was measured with the help of Local Indicators of Spatial Association (LISA) [39]. LISA indicators are based on the statistical index I developed by Moran [40] and evaluate the existence of clusters in the spatial arrangement of a given variable due to underlying autocorrelation.
In this paper, we used bivariate LISA within a given spatial weighting scheme (see below), to identify and map local clusters of high or low marker frequencies (the first variable) significantly correlated with environmental variables (the second variable) possibly exerting selective pressures on them. This allowed to refine the identification of association signals detected by MATSAM over the whole study area and to check whether these associations involved particular breeds only.
The study area is very large, and the spatial distribution and density of farms are heterogeneous (e.g. a dense distribution of sampling points in the Alps versus four sparse locations in Turkey). Since it was difficult to apply a weighting specific to each situation,  the K = 90 nearest neighbours criterion was adopted. Indeed, due to our sampling scheme, on average 29 nearest neighbours have a high probability to belong to the same breed; in particular, there is a small area in Eastern Turkey with 50 animals belonging to Gurcu and Abaza breeds only, and another small area in Albania with 60 animals belonging to Liqenasi and Capore breeds only. Due to these occurrences, we set K to 90 to guarantee the inclusion of a representative number of animals from outside the local area.

AFLP markers
The three AFLP primer combinations produced a total of 101 polymorphic bands (34 per primer combination on average). The percentages of polymorphic loci per breed ranged between 50.0% in HUNAT breed and 68.9% in ROCAR breed ( Table 1).
The final dataset included 125139 data points, with a level of missing data of 0.50%, and is available from the authors upon direct request.  Association models (MATSAM) With a standard 99% confidence level, on a total of 11918 models computed (118 environmental parameters 6101 AFLP markers) MATSAM identified 1123 significant associations (9.4%, Figure 1). When applying the most conservative confidence level (8.31E-17), 67 significant models (0.56% of the total) were shared between markers M16, M65, M82, M86, M89 and M96, with marker M82 showing the largest number (n = 26) of significant associations. The environmental variables involved were wind, diurnal temperature range, precipitation frequency, relative humidity and percentage of maximum possible sunshine. Marker M16 is significantly associated (16 models) with diurnal temperature range (dtrjun, dtrsep, dtroct, dtrnov, dtryear), frequency of precipitation (rdosep, rdooct, rdonov), relative humidity (rehsep, rehoct) and solar radiation (sunapr, sunjun, sunaug, sunsep, sunoct, sunyear). Marker M65 is associated (10 models) with precipitation, frequency of precipitation and solar radiation. Marker M96 is involved in 8 models showing association with ground frost, temperature and frequency of precipitation. Finally, marker M86 is significantly associated with relative humidity (1 model) and the percentage of maximum possible sunshine (1 model).

Combined approaches
With a significance level of 8.31E-15, among the most significant models identified by MATSAM, five markers (M16, M48, M86, M89 and M96) were also detected as outliers by population genomics approaches (Figure 1, panel A). Among them, three (M16, M48 and M86) were detected by MCHEZA (with a 99% confidence level) and three by BAYESCAN (M86, M89 and M96 with a PP of 1.0 and a significance threshold set to log10(PO) = 3). Only locus M86 was identified by all three methods ( Figure S2).
When only the most conservative significance level of MATSAM (8.31E-17) was considered, the number of models was reduced to 67 and we could, therefore, single out the environmental variables associated with the 4 highest ranking markers (M16, M86, M89 and M96; Figure 3).

Signatures of natural selection and adaptation
The analysis of the spatial distribution of the frequency of the four markers most significantly associated with environmental variables (Figures 1 and 3) highlights some interesting evidence. M16, identified to be under natural selection by MATSAM and MCHEZA, is present in more than 90% of the sampled animals, and is geographically distributed in a rather homogeneous way, with frequencies decreasing to about 0.6 on average in the easternmost part of the sampling range (Jordan and Turkey; Figure S3). Any environmental interpretation is made difficult because M16 is significantly associated with 4 different environmental variables ( Figure 3): diurnal temperature range (DTR, 5 variables), the percentage of maximum possible sunshine (SUN, 6 variables), frequency of precipitation (RDO, 3 variables), and relative humidity (REH, in September and October). Figure 4 well shows that the frequency of marker M16 is negatively correlated with the yearly mean of DTR and with the yearly mean of SUN (panels A and B, respectively), while it is positively correlated with RDO in November and REH in October (panels C and D, respectively), highlighting a possible selective effect of the frequency of precipitation and humidity on this genomic region. By the way, the latter shows the best goodness-of-fit value among these models as translated by an Akaike Information Criterion (AIC) [42] of 594.5 (it is equal to 605.7, 607.9 and 613.6 for sunyear, dtryear and rehoct respectively). Joost and colleagues [9] obtained a similar result in sheep, where DYMS1 microsatellite markerknown to be involved in parasite resistance [43] -was found to be associated with the number of wet days, an environmental variable greatly influencing parasite load. Accordingly, in Figure 5, a large cluster of M16 high frequencies correlated with a high number of rain days in November is observed in the Alps and in Northern Europe. In this situation, the Moran's I measuring the global spatial autocorrelation over the whole study area is 0.26, while highlighting local regimes as evidenced by the different LISA clusters. Opposed to the North-Western ''high-high'' cluster mentioned above, the South-Eastern part of the study area shows several places (e.g. southern Greece, Jordan and Turkey; Figure 5) where correlation clusters with opposite behaviours coexist: either ''low-low'' (low marker frequencies correlated with low number of rain days) or ''high-low'' (high marker frequencies correlated with low number of rain days). It has to be noted that because of the superposition of many animals in Turkey, the map hides additional blue ''low-low'' and pale red ''high-low'' dots.  red. Most of the area where M16 frequency shows this dual behaviour is close to the goat domestication centre [44]; possible explanations can be: i) a lack of directional selective pressure in this area due to different precipitations regimes alternating frequently during the year. Therefore, the present day ''high-high'' North-Eastern European cluster may have derived from a gradual selective pressure exerted by different precipitation regimes on the genomic region surrounding M16, during the migration of goats towards North-Eastern Europe along with humans; ii) in Anatolian and European populations the levels of Linkage Disequilibrium between M16 and the selected genomic region may differ, due to changes in population structure or effective population size, so that M16 has become linked to the selected gene after the spread out of the domestication centre.
M86, which resulted under selection according to all three methods, is present in only 11% of sampled goats (Figure 3) and shows a higher frequency in breeds located in the Iberian Peninsula and in the Pyrenees ( Figure S3). This marker appears to be inversely correlated with REH in July and positively with the percentage of maximum possible sunshine in March (Moran's I = 0.23), as shown by the large cluster of M86 low frequencies correlated with a low percentage of maximum possible sunshine in the Alps (Figure 6). In this area, the few pale red exceptions are represented by French Alpine, Swiss Alpine, and Valais Black Neck breeds in which M86 is present ( Figure S3). The Mediterranean islands and the South-Eastern part of the study area are characterized by a very low frequency of M86 associated with a high percentage of possible sunshine. Finally, the Iberian Peninsula and the Pyrenees exhibit a different picture with breeds showing a high M86 frequency in a sundrenched geographical region. Interestingly, the processing of the logistic regression between M86 and SUN (in March) restricted to individuals from the Iberian Peninsula and the Pyrenees provided a result worse than when calculated between all individuals in the data set. The partial model was significant with a 95% confidence level (and not with 99%), showed a pseudo R 2 of 0.033 and an Area Under the Curve (AUC) of 0.59. In comparison, the global model shows a pseudo R 2 of 0.13 with an AUC of 0.75. The details of the bivariate LISA cluster analysis reveal that on a total of 184 animals from the Iberian Peninsula and the Pyrenees, 77 have a high M86 frequency associated with high SUN values, while 107 show low M86 frequencies associated with high SUN values. Importantly, all breeds are represented in each cluster categories with a minimum of 3 individuals. This means that in this area the marker in question is spatially dependent on SUN (there is no neutral location), but the latter environmental variable until now either selected M86 only in part of the individuals, or the high SUN local values coincide by chance with a high frequency of this marker as a result of demographic events (e.g. genetic drift or introgression from Africa).
M89 and M96 were identified by MATSAM and BAYESCAN as under selection. For these two markers, the signature of the environmental pressure is clear (Figure 3). They are both present in about the 60% of the animals in the dataset, and have a homogeneous spatial distribution ( Figure S3). M89 is likely to be selected by high values of diurnal temperature range in autumn, while M96, which is less frequent in the breeds of the Iberian Peninsula and in the Pyrenees, is possibly selected by the number of days with ground frost in winter (positive correlation), and by temperature in winter (inverse correlation).

Compared advantages and drawbacks of F ST -outlier methods and correlative approaches
Advantages and drawbacks of correlative and population genomics approaches have already been discussed [9,18,20]. Here we want to extend the discussion into the usefulness and relevance of the concurrent use of the two types of approaches. The comparison of the results produced by F ST -outlier methods (MCHEZA, BAYESCAN) and by a landscape genomics correlative approach (MATSAM) definitely strengthens the robustness of the list of loci identified as under selection and possibly involved in adaptation processes, and, in the same time, constitutes a useful cross-validation or double-check [18].
Having said this, it has to be noted that the results of F ST -outlier approaches like MCHEZA and BAYESCAN can be biased by factors like the strength, pervasiveness or time frame of the selection processes [45]. Further difficulties in the use of these methods may rise when processing very large datasets produced by highthroughput sequencing technologies. Calculation time will increase dramatically and may become an obstacle for many users. Correlative approaches that do not require the processing of simulated data are much more computation-effective.
Finally, F ST -outlier detection methods can be affected by population structure, since demographic processes (e.g. bottlenecks) or introgression can sometimes leave genomic signatures that mimic selection [46]. This complicates the task of disentangling selection from demography [47][48]. Novel approaches like the detection of patterns of linkage disequilibrium, as reviewed by Li and colleagues [47], may permit to address this issue once applied to loci identified as possibly under selection by correlative approaches. Indeed, a combined approach able to capitalize on the robust theoretical framework of theoretical population genetics as well as on the incorporation of the effect of landscape spatial heterogeneity would constitute an optimal solution [42].
Opposite to F ST -outlier methods which only identify selected loci, correlative approaches as MATSAM return both the list of selected markers and of the environmental variables constituting the probable selective pressure. The landscape genomics approach, by providing knowledge about the nature of the selective pressure, is likely to favour the emergence of working hypotheses about the role of the genomic regions surrounding the markers under study, and thus facilitate the discovery of genes possibly involved in specific functional processes; in this specific hypothesisdriven context, it is useful to mention that bivariate models may be elaborated to refine the analysis. An important difference is that, being independent of any theoretical genetic framework and having no constraint like Hardy-Weinberg or Linkage equilibrium assumptions, landscape genomics is not able to discriminate between directional and balancing selection.

The role of livestock landscape genomics
In spite of the reservations mentioned above, we want to stress here the importance of livestock landscape genomics, in particular in developing countries. In Africa, it is widely recognized that livestock genetic resources -and as a consequence their germplasm and adaptive potential -are now endangered, since local breeds are being gradually substituted by or ''improved'' through crossing with cosmopolitan breeds. Local breeders decisions are more and more influenced by the need to increase short-term productivity [49], with the consequence of becoming dependent on expensive external inputs, such as veterinary intervention to progressively face increased diseases susceptibility or fertility loss. This leads local breeders to lose their means of sustenance and join shantytowns around big cities, with a reduced level of livelihood and of food security. Landscape genomics can also be used to obtain an empirical evaluation of biodiversity patterns within a geo-environmental context when no direct information is available on livestock diversity levels, and may constitute an opportunity to reverse the trend of farmers' livelihood worsening described above. Indeed, if more attention was paid to local environmental conditions, animals and breeds could be more efficiently selected to increase the overall livestock performance within a specific environment [50]. The underlying idea, in fact, is that the genotype of animals living in a specific habitat will show signatures of selection in those traits significantly associated with habitat characteristics, and that the added effect of human selection will point to genome regions linked to productivity traits [49]. It is, therefore, conceivable that in the near future the concurrent application of landscape genomics and genomic selection will provide flocks tailored for specific environments and with improved production efficiencies [50]. Concerning our study, an important next step will be to locate the AFLP markers most significantly associated to environmental variables on the goat genome, once a complete and reliable assembly will be available, and to relate them to functionally relevant genes [17,51]. In the next future, the increasing use of goat SNP panels for genotyping purposes will greatly increase the resolution power of the landscape genomics approach and will eliminate mapping problems. Main issues regarding the adoption of SNP panels data for landscape genomic evaluations will be the absence of ascertainment bias and the incorporation of genetic variants typical of local breeds from different agro-climatic areas worldwide.
In developing countries, the conservation of local genetic resources is fundamental to guarantee a sustainable livelihood for over one billion people on the Earth. Livestock landscape genomic approach holds the potential to explore the genomic bases of the adaptation of livestock breeds to specific environmental conditions. Besides adding conservation value to livestock genetic resources and helping in the definition of conservation units [52], in the present-day context of rapid global climatic change, the incorporation of genome-landscape interaction data will permit to develop novel molecular tools useful to i) conserve the adaptive potential during genetic improvement programs of local breeds, and ii) increase adaptability in highly productive industrial breeds. Figure S1 Geographical position of the farms where the goats have been sampled. For the correspondence between numbers and breed names, see Table 1.  Table 1.

(TIF)
File S1 Lab protocol and primer sequences used for the production of AFLP markers. (DOCX)