Genetic and ecological insights into glacial refugia of walnut (Juglans regia L.)

The distribution and survival of trees during the last glacial maximum (LGM) has been of interest to paleoecologists, biogeographers, and geneticists. Ecological niche models that associate species occurrence and abundance with climatic variables are widely used to gain ecological and evolutionary insights and to predict species distributions over space and time. The present study deals with the glacial history of walnut to address questions related to past distributions through genetic analysis and ecological modeling of the present, LGM and Last Interglacial (LIG) periods. A maximum entropy method was used to project the current walnut distribution model on to the LGM (21–18 kyr BP) and LIG (130–116 kyr BP) climatic conditions. Model tuning identified the walnut data set filtered at 10 km spatial resolution as the best for modeling the current distribution and to hindcast past (LGM and LIG) distributions of walnut. The current distribution model predicted southern Caucasus, parts of West and Central Asia extending into South Asia encompassing northern Afghanistan, Pakistan, northwestern Himalayan region, and southwestern Tibet, as the favorable climatic niche matching the modern distribution of walnut. The hindcast of distributions suggested the occurrence of walnut during LGM was somewhat limited to southern latitudes from southern Caucasus, Central and South Asian regions extending into southwestern Tibet, northeastern India, Himalayan region of Sikkim and Bhutan, and southeastern China. Both CCSM and MIROC projections overlapped, except that MIROC projected a significant presence of walnut in the Balkan Peninsula during the LGM. In contrast, genetic analysis of the current walnut distribution suggested a much narrower area in northern Pakistan and the surrounding areas of Afghanistan, northwestern India, and southern Tajikistan as a plausible hotspot of diversity where walnut may have survived glaciations. Overall, the findings suggest that walnut perhaps survived the last glaciations in several refugia across a wide geographic area between 30° and 45° North latitude. However, humans probably played a significant role in the recent history and modern distribution of walnut.

Introduction Near Eastern regions as the origin and primary center of diversity of walnut. The modern day walnut represents postglacial expansion, colonization, and cultivation comprising diversity resulting from complex interactions of natural and human selection and domestication [38,39]. Dode [40] described six taxa to accommodate the variation and ecotypic differentiation within the Eurasian populations of walnut, with additional taxa recognized by Soviet and other botanists.
Chloroplast genomic diversity has been extensively used to analyze the historical phylobiogeography of plants at interspecific and intergeneric levels, but limited organelle DNA polymorphisms make it unsuitable to study infraspecific genetic diversity, population structure, and differentiation. Alternatively, genomic DNA polymorphisms offer excellent opportunities to study spacio-temporal genetic diversity, population structure, and differentiation resulting from the dynamic interaction of evolutionary forces at infraspecific levels. This study focuses on: (1) examining the genetic structure and differentiation of modern walnut to identify the plausible hotspots of diversity, and (2) ecological niche modeling (ENM) to elucidate present and project past distributions during the last glacial maximum (LGM; 21-18 kyr BP), and the Last Interglacial (LIG or Eemian; 130-116 kyr BP). We address the following questions: (1) where did walnut survive during the LGM and LIG? (2) does the modern genetic structure and differentiation patterns provide evidence for the potential location(s) of Pleistocene refugia; and (3) does ecological niche modeling identify location(s) of refugia congruent with genetic evidence?

Plant material, DNA extraction, and microsatellite analysis
The study used 643 genotypes comprising 317 diverse accessions representing the modern range of distribution of walnut maintained at the National Clonal Germplasm Repository, USDA-ARS, Davis, California (S1 Table). Five major distribution centers (Caucasus, Central Asia, East Asia, Southwest (SW) Asia, and Eastern Europe) were considered.
Fresh leaf tissue was collected from each accession and total DNA isolated following a standard CTAB protocol [41] and treatment with RNase A and diluted to approximately 50ng/μL. Nineteen microsatellite loci, WGA001, WGA004, WGA009, WGA069, WGA089, WGA106, WGA118, WGA178, WGA202, WGA223, WGA225, WGA237, WGA318, WGA321, WGA331, WGA332, WGA338, WGA349, and WGA384 [42,43] were amplified by polymerase chain reaction (PCR) with fluorescent labeled forward and unlabeled reverse primers. The microsatellite loci were amplified in a triplex format in a 15 μL reaction mixture containing 10 mM Tris-HCl, pH 8.3, and 50 mM KCl (all included in 10X PCR buffer), 2 mM MgCl 2 , 0.9 pmol of each primer, 0.2 mM of each dNTP, 0.6 U of Taq polymerase (New England BioLabs, Ipswich, MA), and approximately 25 ng of template DNA. The PCR conditions were as follows: 1 cycle of 94˚C for 5 min, 30 cycles of 94˚C for 30 sec, 55˚C for 30 sec, and 72˚C for 40 sec, and a final elongation of 72˚C for 7 min. Amplified products were resolved by capillary electrophoresis using an ABI 3130xl Genetic Analyzer with Data Collection software, version 3.0 (Applied Biosystems, Foster City, CA). The data was further analyzed using Genotyper, Version 2.5 (Applied Biosystems) and data assembled as bi-allelic genotypes (S2 Table) and in a binary matrix (1 = presence, 0 = absence) format.
possible pair-wise combinations [45]. The bootstrap interior branch test [46] was used to test reliability of interior braches on the tree. The principal components analysis (PCA) was performed on the multilocus genotype data using the R package adegenet [47]. The accessions were projected onto a two dimensional space bound by the first two principal axes to elucidate the genetic relationships within and among geographic groups.
The genotypic data were subjected to a Bayesian model-based CA using the software package STRUCTURE 2.3.1 [48] to determine the optimum number of groups reflecting the genetic structure. STRUCURE allocates individuals into clusters (K) based on multilocus genotype data, so as to minimize deviations from Hardy-Weinberg and linkage equilibrium. The program uses a Markov Chain Monte Carlo (MCMC) procedure to estimate P(X|K), the posterior probability that the data (X) fit the hypothesis of K clusters. The analysis assigns individuals into each of the K clusters based on the membership coefficient (Q-value) which sums to unity over the number of clusters (K) assumed. STRUCTURE was set to ignore population information, and to use an admixture model with correlated allele frequencies, as it is considered the best option for subtle population structure [49]. The degree of admixture (α) was allowed to be inferred from the data. α is close to zero when most individuals are from one population or another, while it is greater than one when most individuals are admixed [49]. The allele frequency parameter (λ) was set to one as suggested in the STRUCTURE manual. From a pilot study, we found that burn-in and MCMC simulation lengths of 100,000 replicate runs were optimum to achieve accurate parameter estimates. We let the number of clusters (K) vary between 2 and 18 with 20 replicate runs to quantify the variation of the likelihood for each K. The K value that provides the maximum likelihood (Ln P(D) in STRUCTURE) across runs is generally inferred as the most probable number of clusters. However, the interpretation of K should be treated with care as it merely provides an ad hoc approximation [48] and sometimes genuine and subtle population structure may be missed by STRUCTURE. Therefore we used an ad hoc statistic ΔK to choose the optimum number of clusters (K) based on the second order rate of change in the log probability of data between successive K values as proposed by Evanno et al. [50].

Genetic diversity within and among groups
The multilocus genotype data were pooled into five geographic groups matching the results of the CA and subjected to analysis of total and within-group genetic diversity measures such as mean number of alleles per locus (A), observed (H o ) and expected (H e ) levels of heterozygosity, and fixation index (F) for different loci. Allelic richness (Ar) and private allelic richness (PAr) for each population were estimated using the rarefaction method [51], which compensates for differences in sample size (i.e. rarified allelic richness) among populations as implemented in HP-RARE 1.1 [52]. The estimates of Ar and PAr were geographically projected using an inverse distance weighted (IDW) interpolation tool implemented in the ArcMap 10.1 (ESRI, Redlands, CA USA). Gene diversity analysis was performed on the allele frequency data from the five geographic groups by following the method suggested by Nei [53]. The total gene diversity (H T ) was partitioned into gene diversity due to variation within groups (H G ), and the component due to variation between groups (D GT ). Differentiation between groups was calculated as G GT = D GT /H T , where G GT can vary between 0 (when H G = H T ) and 1 (when H G = 0), i.e. groups fixed for different alleles.
The group-wise microsatellite data were also analyzed using the analysis of molecular variance (AMOVA) as implemented in the software package ARLEQUIN version 3.6 [54]. The total variance was partitioned into variation within and among groups. The variance components from AMOVA were used to estimate the population subdivisions within and among groups. Contingency χ 2 analysis was performed to determine the heterogeneity among groups before performing AMOVA. A population pair-wise F ST matrix was computed to assess genetic differentiation among different geographic groups.

Ecological niche modeling
We used 237 unique walnut occurrence locations with corresponding georeferenced data gleaned from the Genetic Resources Information Network (GRIN, USDA-ARS; http://www. ars-grin.gov/npgs/index.html), the Global Biodiversity Information Facility (GBIF; http:// www.gbif.org), field collections, and published literature (S3 Table) representing the current walnut distribution. Modeling of modern distribution of walnut was performed using the maximum entropy algorithm implemented in MaxEnt 3.3.3e [55] with the current climatic data from the WorldClim database [56]. Past climatic data from two general circulation models (GCM), the Community Climate System Model (CCSM) [57], and the Model for Interdisciplinary Research on Climate (MIROC, version 3.2; [58]) at 2.5' spatial resolution, were used to hindcast LGM distributions. Data for LIG [59] at 0.5' spatial resolution aggregated to 2.5' resolution were used to model LIG distribution. Highly correlated environmental variables (Pearson's correlation coefficient >0.7) were excluded from modeling, leaving eight bioclimatic variables: mean annual temperature, mean diurnal range, isothermality, temperature seasonality, mean temperatures of the wettest quarter, mean temperature of the driest quarter, annual precipitation, and precipitation seasonality.
Correction of sampling bias. The occurrence data often exhibit spatial autocorrelation and could potentially introduce environmental bias into modeling [60][61][62][63]. In order to minimize environmental bias, we filtered walnut data using the rarefying tool in the species distribution model (SDM) toolbox [64] implemented in ArcMap 10.1 (ESRI, Redlands, CA). We rarefied data at 10 and 25 km spatial resolutions based on climatic heterogeneity of the mountainous regions where the samples originated. The filtering resulted in 136 and 112 unique localities for the 10 and 25 km rarefying resolutions, respectively.
Presence-only data are inherently biased due to uneven sampling over the species landscape [65]. In order to infer meaningful information from such data we need to correct for the sampling bias. We account for sampling bias by providing MaxEnt with a bias grid of the sampling probability surface roughly representing the sampling efforts and giving weights to random background data used for modeling. Ideally a bias file would represent the actual sampling intensity across a large rectilinear study area, which can be roughly estimated by the aggregation of occurrences from a closely related taxon or a taxon group. However, such data or information are difficult to find for the native range of walnut or for that region as whole and a large spatial extent can also lead to the selection of a higher proportion of less informative background points [66]. Instead, we produced a bias grid by deriving a Gaussian kernel density map to be more selective in the choice of background points focusing on sampling locations of walnut. This method produces a bias grid that up-weights presence-only data points with fewer neighbors in the landscape; bias values of 1 reflect no bias while higher values indicate increased sampling bias [62,64].
Tuning model settings. The unfiltered data with 237 data points and two filtered data with 136 and 112 occurrence points, were subjected to model tuning using an R package ENMeval [67] to identify the optimum data set for modeling current distribution of walnut. The ENMevaluate function in the ENMeval package performs tuning and evaluation of models by automatically implementing MaxEnt with a range of user-defined settings. It executes a series of tasks: (1) partitions occurrence and background data points into spatially independent evaluation bins using six different methods for k-fold cross validation [60,68]; (2) builds a series of models with different user-specified feature classes (FCs) and regularization multipliers (RMs); and (3) computes five different evaluation metrics to aid in selecting optimum model settings. The evaluation metrics include: (i) the area under the curve (AUC) of the receiver operating characteristic (ROC) for the test data (AUC TEST [60]); (ii) AUC DIFF which is the difference between AUC TRAIN and AUC TEST [69]; (iii) minimum training presence omission rate (OR MTP [60]); (iv) 10% training omission rate (OR 10 [60]); and (v) the Akaike information criterion (AIC c [69]). AUC TEST measures the model's ability to discriminate conditions at test occurrence localities from those at background localities averaged over k iterations, with higher values indicating better discrimination. AUC DIFF is positively associated with the degree of overfitting. Omission rates provide information regarding the ability to discriminate between suitable and unsuitable sites as well as quantify model overfitting by comparing threshold-dependent omission rates with theoretically anticipated levels of omission. OR MTP indicates the proportion of test localities with suitability values lower than those associated with the lowest-ranking training locality with values greater than zero typically indicating model overfitting. OR 10 indicates the proportion of test localities with suitability values (relative occurrence rate corresponding to MaxEnt's raw output) lower than those excluding the 10% of training localities with the lowest predicted suitability. Under either threshold rule, pixels with values equal to or higher than the threshold are considered suitable. Omission rates greater than the theoretical expectation for a given threshold typically indicate model overfitting. The AIC corrected for small sample size (AIC c ) reflects both model goodness-of-fit and complexity, where the best model has the lowest value (i.e. ΔAIC c = 0). We applied the "block" method to partition both occurrence and background data, which splits data along the latitude and longitude lines, and allocates equally into four bins for cross validation. It is the best method for studies involving model transfer across space and time [70]. We built models with the RMs ranging from 1.0 to 5.0 at increments of 0.5 and six FC combinations: Linear (L); Linear and Quadratic (LQ); Hinge (H), Linear, Quadratic, and Hinge (LQH); Linear, Quadratic, Hinge, and Product (LQHP); and Linear, Quadratic, Hinge, Product, and Threshold (LQHPT) with 10000 background points. The RM imposes a penalty on model complexity and FC determines the shape of response curves, both act in concert with each other to reduce complexity of models. Computation of all evaluation metrics used MaxEnt raw output values, which is interpreted as relative occurrence rate (ROR) [71]. The model with ΔAIC c equal to zero is considered the best model [69]. We computed Schoener's D statistic that considers the geographic variability pixel-by-pixel to quantify pair-wise similarity among different models. Based on model tuning for different data sets, we selected the data set filtered at 10 km with 137 occurrence points as the best for hindcasting LGM and LIG distributions of walnut. We ran MaxEnt modeling with settings identified as optimum by model tuning to produce the current climatic projection and to hindcast past distributions of walnut with the Gaussian kernel density bias grid file to account for any residual sampling bias in the data set. Predicted habitat suitability maps for the current, LGM, and LIG distributions of walnut showing the relative rate of occurrence were generated in ArcMap 10.1.

Genetic polymorphism and population structure
The walnut germplasm collection examined exhibited considerable polymorphism with observed number of alleles ranging from 8 for WGA089, WGA237, and WGA384 to 20 for WGA 202 with an overall mean of 12 alleles/locus ( Table 1). The observed and expected levels of heterozygosity showed significant deficiency of heterozygotes for all loci as compared to Hardy-Weinberg expectations. The observed heterozygosity ranged from 0.326 for WGA349 to 0.651 for WGA178, with an overall mean of 0.501 and the fixation index, which indicates non-random assortment of alleles due to significant population sub-structuring, ranged from 0.136 for WGA178 to 0.610 for WGA349, with an overall mean of 0.285. Deficiency of heterozygotes is sometimes attributed to presence of null alleles, but their effects on population differentiation is not fully understood. The conventional methods for detecting null alleles are less reliable and inconsistent when applied to non-equilibrium populations, and provide only a sub-optimal solution [72].
Multivariate genetic structure revealed by the CA identified five major groups closely matching with the geographic affiliations of different walnut accessions ( Fig 1A). Eastern European accessions from the Balkans, Carpathians, Russia, western Europe mainly French showed close genetic affinity with the SW Asian and the Caucasus groups. East Asian accessions from China and the Central Asian germplasm from Kyrgyzstan formed two unique groups somewhat allied to each other. The SW Asian germplasm from Afghanistan and neighboring Tajikistan, India, Nepal, and Pakistan formed a loose conglomeration exhibiting subtle differentiation among them. The Transcaucasian germplasm from Azerbaijan and Georgia formed an exclusive group closely associated with the SW Asian group.
The PCA based on mutlilocus genotype data unraveled genetic relationships within and among different geographic groups similar to CA. The two-dimensional projection of accessions defined by the first two principal axes accounting for 13.66% and 9.82% of the total variation, respectively, revealed genetic differentiation within and among groups ( Fig 1B). The first axis discriminated the Central Asian and East Asian groups from the SW Asian, Caucasian, and Eastern European groups, whereas the second axis differentiated the East Asian from the Central Asian group and among the Caucasus, Eastern European, and the SW Asian groups. The model-based Bayesian CA produced results comparable to the distance based CA and PCA. The estimated mean likelihood values (Ln Pr X|K) attained a maximum value at K = 5 (Fig 2A). The ad hoc statistic ΔK related to the second order rate of change of log probability of data between successive Ks produced a distinct peak at K = 5 with some minor peaks at K = 9, 13 and 16 ( Fig 2B). Plotting the Q-matrix of estimated membership coefficients for each individual genotypes for K = 5, sorted by Q revealed clusters somewhat similar in size and composition to distance based CA and PCA ( Fig 2C). However, genotypes with mixed ancestry, often involved members from each of the five geographic groups of walnut.

Pattern of distribution of genetic diversity within and among geographic groups
The contingency χ 2 analysis indicated that the five geographic groups differed significantly in the number, composition, and frequency of alleles. However, there were a number of high frequency alleles common across the groups that often possessed frequencies lower than 0.1 in some groups. There were 87 unique low frequency alleles among groups with the SW Asian group possessing the largest number with 50 unique alleles followed by East Asia with 20, Central Asia with nine, the Caucasus with six and the Eastern European group with two (S4 Table). Estimates of within-group diversity parameters indicated that the total number of alleles across 19 loci ranged from 191 with a mean of 10.1 alleles/locus for the SW Asian group to 100 with a mean of 5.26 alleles/locus for the Caucasus. The allelic richness adjusted to the minimum sample size of 49 genes ranged from 7.19 for the SW Asian group to 4.52 for the Central Asian group with an average of 5.29 alleles/locus and the private allelic richness followed the same trend ( Table 2, Fig 3A and 3B). There was a deficiency of heterozygotes in all the five   groups suggesting moderate levels of population subdivisions within groups. Partitioning of variation within and among geographic groups indicated that most of the molecular variation (87%) resided within populations and only 13% of the total variation accounted for genetic differentiation among groups ( Table 3). The estimated degree of among-group differentiation (F ST ) averaged over loci among groups was 0.128 (P < 0.01).
Nei's gene diversity analysis based on allele frequencies for the five groups identified from the CA indicated that the total gene diversity, a measure of heterozygosity in the total population is reasonably high across loci ranging from 0.418 for WGA106 to 0.873 for WGA349 with an average of 0.706. Only 12.4% of the total gene diversity (G GT ) accounted for genetic differentiation among groups and there was considerable variation among loci ranging from 8.3% for WGA331 to 22.2% for WGA384, and on average 88% of the total variation was found within group variation (Table 4). Geographic differentiation among the walnut groups was estimated using Wright's fixation index (F ST ) ( Table 5). The Caucasus group exhibited the highest divergence from the East Asian group (0.165) followed by the Central Asian group (0.158), the Eastern European group (0.116), and the SW Asian group (0.09). The SW Asian group is closely related to the rest of the groups in the study with F ST ranging from 0.045 with the Eastern European group, followed by the Caucasus and the East Asian groups (0.09 each) and the Central Asian group (0.103).

Ecological niche modeling
Model tuning results are presented in S5 Table, Table 6. Examining the metrics of AIC c -selected models suggested that the data set filtered at 10 km logged in the lowest values for AUC DIFF (0.014), OR MTP (0.029) and OR 10 (0.103) among the three models followed by the unfiltered data set with 0.029, 0.042, and 0.118, and filtered at 25 km with 0.040, 0.065 and 0.185, respectively, suggesting filtering somewhat improved model efficiency. Visual examination of models generated from hindcasting LGM and LIG distribution of walnut using the three data sets (S2 Fig, Fig 4) showed minor difference among the projections suggesting filtering did affect only marginally the LGM and LIG predictions and Schoener's D statistics further confirmed these results (Table 7). Based on evaluation metrics, we selected the model from the data set filtered at 10 km (Fig 4) to hindcast LGM and LIG walnut distribution.
The current climatic model predicts a moderate to high rate of occurrence of walnut in the regions mainly between 30˚N to 45˚N latitude, and 20˚E to 80˚E longitude, comprising eastern Turkey bordering the Black Sea and western Iran, the Talysh region of Azerbaijan, southern Turkmenistan, western Uzbekistan, Kyrgyzstan, southern Kazakhstan, Tajikistan, northern Afghanistan, northwestern Pakistan extending into southeastern regions, southcentral Tibet, and northeastern India. Parts of western and central Turkey, the Balkan Peninsula extending into eastern Greece and southern Bulgaria, southeastern Carpathians (Romania), and northeastern Danube region (Hungary, Slovakia, Czech Republic, Austria), Spain, the Atlas   Table 6 for feature class and regularization multiplier settings). https://doi.org/10.1371/journal.pone.0185974.g004 Genetic and ecological insights into glacial refugia of walnut showed relative high rate of occurrence of walnut. Overall, the current climatic model roughly predicted the current natural distributional range of walnut (Fig 4).
The LGM-CCSM projection predicted the areas of relatively high rate of occurrence of walnut shifted to lower latitudes than projected in the current climatic model. Distribution was fragmented and interspersed with areas of marginal occurrence. Eastern Pakistan extending in the north to Tajikistan and parts of northeastern Afghanistan, southeastern Turkmenistan, western Iran, southern Turkey bordering the Mediterranean Sea, the Hyrcanian and Colchic regions of the southern Caucasus including the Talysh and Alburz mountain ranges of Azerbaijan and Iran, Armenia and border areas of the Black Sea, showed relatively high rates of occurrence of walnut. However the entire Turkey, southern Balkans and eastern coastal regions of Adriatic Sea and in Central Asia, southern Turkmenistan, Uzbekistan down to Tajikistan, northeastern Afghanistan, and western Himalayan state of Kashmir extending up to southwestern Tibet exhibited moderate to low rates of occurrence. The LGM-MIROC model projected a similar distribution as LGM-CCSM, but regions of high relative occurrence concentrated in north eastern Pakistan, Tajikistan, northeastern Afghanistan, southern Turkmenistan on eastern coast of Caspian Sea, southwestern Balkans (southern Bulgaria), northwestern Turkey, north western coastal regions of the Adriatic Sea. Both CCSM and MIROC models projected a moderate rate of occurrence of walnut in Xinjiang province of western China, low occurrence in central China and relatively high occurrence rate in the southeastern China (Fujian and Jiangxi provinces and neighboring areas), and somewhat fragmented distributions in northeastern India. There were regions of low occurrence in central China, but overall there was a reduction in the occurrence of walnut in East Asia during LGM, as compared to the present day distribution. The LIG projection indicated relatively high rates of occurrence in a narrow region comprising southern Iran and northwestern Pakistan extending into southern Afghanistan, tapering off eastward along the southern Himalayan foothills extending into Nepal, Sikkim, Bhutan into Arunachal and north Assam. There was also a region of high occurrence in the northeastern and central regions of Xinjiang province. There

Discussion
The distribution and survival of trees during the LGM has been of interest to paleoecologists, biogeographers, and geneticists. Paleodistribution modeling in conjunction with population genetic analyses can predict the past distributions and aid in locating Pleistocene refugia of plant species. Ecological niche models (ENMs) that associate species occurrence and abundance with climatic variables are extensively used to gain ecological and evolutionary insights, and to predict species distributions across landscapes over space and time. The present study deals with the glacial history of walnut to address questions related to past distributions during the LGM and LIG periods. The results include population genetic analysis of a germplasm collection representing the modern range of walnut, and ecological modeling of present distribution and the LGM and LIG projections, to predict past climatic niches and locate the Pleistocene refugia. It is widely accepted that during the LGM, most nemoral tree species were restricted to refugia in the Iberian, Italian, and Balkan peninsulas [79][80][81]. However, during the interglacial stages of the lower Pleistocene southeastern Europe supported extensive mixed-broad leaved forests of Fagus, Juglans, Pterocarya, and Tsuga south of 57˚-58˚N [82,83], as the climate deteriorated the proportion of broad-leaved species was reduced and eventually eliminated. During the Eemian interglacial (130 -116 Kyr BP), it is generally believed that the thermal optimum was higher than today, and the dendroflora pollen spectra in the vicinity south of the Gulf of Finland and Central Europe supported broad leaved deciduous species such as J. regia, Carpinus betulus, Tilia cordata, T. tomentosa, Quercus spp. Corylus avellana and Alunus spp. [84,85]. Fossils of walnut were also discovered in Bilzingsleben, a Paleolithic site in Germany dating back to the Eemian interglacial period, indicating walnut persisted until the Ionian stage of the middle Pleistocene [85]. There is palynological evidence of existence of walnut in the Balkan refugia during the LGM but it is interpreted either as representing long distance dispersal from southern refugia, or as in situ refugia [86]. The LGM-MIROC model (Fig 4) in our study strongly supports a high rate of occurrence of walnut in the southern Balkan regions of Bulgaria and Romania adjacent to the Black Sea coast. However, the Holocene landscape comprising Juglans, Castanea, Platanus, Olea and Fagus is thought to be anthropogenic intervention [32,[87][88][89] during the Greco-Roman period.

Historical biogeography and glacial history of walnut
During pre-glacial periods the section Juglans endemic to Eurasia probably had ample opportunity to diversify and much of the ancestral taxa must have gone extinct during glaciations. Incomplete palaeobotanical records from Eurasia and perhaps the difficulty in recognizing intrasectional diversity in pollen and other microfossil flora have obscured the ancestral taxonomic diversity of the section Juglans. However, palynological evidence suggests that walnut survived in Central Europe in small cryptic refugia during the LIG [11,[84][85][86] and gradually became extinct [90] during the LGM. In contrast, the southern Caucasus and SW Asia have sheltered a large number of Tertiary relict nemoral trees, including walnut, during the LGM [77,[91][92][93]. Our ENMs suggest that walnut probably had multiple refugia spread out from the southern Caucasus to Southwestern and Central Asian regions surrounding the Pamir Mountain ranges, where more favorable Pleistocene and early Holocene climates prevailed in most of Eurasia (Fig 4).

Glacial refugia, postglacial recolonization, and genetic differentiation
The climatic deterioration during the late Tertiary followed by the Quaternary glacial and intergalcial fluctuations played a major role in shaping the present-day genetic diversity, population structure, and differentiation patterns of plant species [2,15,94]. Whether Quaternary vegetation dynamics fostered increased or decreased genetic diversity is unknown, but demographic fluctuations during range expansion and contraction could cause undesirable stochastic effects resulting in widespread extinctions [95]. However, the genetic signatures of historical biogeographic events persist long after post glacial recolonization from refugial populations [95,96]. Glacial refugia were important for species survival in glacial and interglacial periods and sheltered many species which had been widespread [15]. Knowledge of the size, distribution, isolation within and among refugia, and the mode of postglacial expansion are important to understanding the mode and tempo of evolution of modern day species [90]. Therefore, postglacial expansion of species is an important issue in the study of historical biogeography of the Quaternary [97]. It has been shown that postglacially colonized regions are known to exhibit lower genetic diversity than refugia [2,97], and as expansion proceeds its leading edge will have progenies from the nearest neighborhoods as compared to the ones far behind [98]. As colonization continued, the natural selection, local adaptation and gene flow within and among the new neighborhoods and populations from different refugia will eventually build dynamic species-wide population genetic structure. Consequently the study of contemporary genetic structure of species populations should be able to shed light on past glacial events that shaped genetic diversity and modern distribution of species aiding in identification of areas where species may have survived glaciations. Here we test whether or not the amount and pattern of distribution of genetic diversity within and among different geographic groups of walnut shed light on the Pleistocene glacial history and the postglacial re-colonization, domestication and distribution.
Humans apparently played a role in shaping the modern genetic structure, but the signature of biogeographic events should permit speculation on the mode and tempo of the evolution of walnut [99][100][101]. Our study of genetic diversity suggests five genetic groups reflecting regional centers of genetic diversity and differentiation, and we hypothesize that these groups embody the biogeographic history of walnut. Among the groups, SW Asian walnuts from the regions of Afghanistan, Pakistan, southern Tajikistan and parts of northwestern India represented the most diversity as indicated by high levels of allelic richness, private allelic richness, and heterozygosity, followed by groups from Eastern Europe, East Asia, and the Caucasus (Table 2 and Fig 3). This suggests that walnut may have survived in SW Asia during the LGM and served as a founder for recolonization of neighboring Central Asia, the Caucasus, East Asia, and Eastern Europe during the current Holocene interglacial. The LGM-CCSM and LGM-MIROC projections (Fig 4) show high occurrence of walnut interspersed with regions of moderate to low occurrence indicating a mosaic of isolated populations thriving in South and Southwest Asia during the LGM. Hemery et al. [102] suggested that walnuts may have migrated northward towards Central Asia from South Asia sometime during the Holocene. Beer et al. [99] proposed expansion of walnut from South and SW Asia to Central Asia during Chalcolithic period based on palynological data.
A significant amount of genetic diversity was detected in the walnut germplasm collection, and the loci assayed differed considerably for the number of alleles per locus, observed and expected levels of heterozygosity, and fixation index ( Table 1). General deficiency of heterozygotes and relatively high fixation index across loci is attributed to the Wahlund effect caused by significant intra-and inter-regional genetic differentiation, which is perhaps due to sampling effect in germplasm collections of outbreeding species like walnut. Finite and isolated populations in the mountainous terrains where walnut is native probably experienced drift leading to stochastic loss and/or fixation of alleles.
Central Asian walnuts show the lowest level of allelic richness and low heterozygosity as expected in recently colonized populations. Moderate allelic richness and heterozygosity of Eastern European walnut observed here is unexpected and probably due to historic migration of germplasm, recent introductions from other walnut growing regions, and directional selection during domestication that occurred in this region compared to other walnut regions. Surprisingly, the East Asian walnut exhibited moderate allelic richness and private allelic richness compared to the Transcaucasian and Central Asian walnuts, probably due to historic introductions of diverse germplasm from Persia, Tibet, and India [35], and possible interspecific gene flow between walnut and its native butternut counterparts, which were prevalent in northeastern China. The low allelic richness within the Caucasus walnuts may be due to severe bottleneck within and among the fragmented populations growing in diverse topographic, pedological, temperature and moisture conditions eroding the allelic diversity [103]. Human habitation and expansion of agriculture in this region during the late Pleistocene and Holocene have caused profound changes in soil cover and vegetation on a vast geographic scale impacting ecosystems in the southern Caucasus. Further, over harvesting and grazing in walnut forests, and more recently the forest farming systems have hampered regeneration and fragmented walnut distribution, eroding the genetic diversity and promoting differentiation among populations in the Caucasus. A recent study showed significant genetic differentiation among moderately variable fragmented walnut populations in the greater and lesser Caucasus Mountain ranges [104].
The CA and PCA results suggest close association of the SW Asian, Caucasus, and Eastern European walnut groups, while Central Asian and East Asian walnut are somewhat separate groups. Presence of one or more moderate to high frequency alleles common across loci and among groups and low differentiation of SW Asia walnut from other groups indicate that walnut probably expanded from SW Asia into other regions following glaciations. At the same time, the presence of several moderate frequency alleles across loci restricted to one or more groups suggest either local genetic differentiation after recolonization or separate expansion events from different refugia. However, high genetic diversity and close genetic affinity of SW Asian walnut strongly support a single refugial source located in the mountainous regions of SW Asia during the LGM that further expanded and spread northward into Central Asia and westward into Europe and other regions, which was further facilitated by human migration along ancient trade routes. Furthermore, human mediated dispersal and local domestication events since Greek and Roman times perhaps significantly influenced the current distribution of genetic diversity in walnut. Historic migration of walnut along the ancient trade routes from Persia, Tibet, and the Himalayan regions of India into China during the Han dynasty founded an important secondary center of diversity for walnut [35].
The two likely scenarios for recolonization of trees: (1) rapid colonization from southern refugia mediated by long-distance dispersal [105], which is unlikely as walnut is mainly dispersed by small mammals and birds and (2) slow dispersal from wide-spread refugia with some closely located to the modern range [106]. The latter is more likely as our results indicate that post-glacial spread of walnut probably occurred gradually to neighboring Central Asia, the Caucasus, East Asia and then to Eastern Europe. Our LGM and LIG models indicate the possibility of the Balkans, Caucasus, Central Asia and neighboring regions also supporting glacial refugia which may have contributed to rapid postglacial colonization of walnut. Further, it is widely believed that the post-glacial colonization of nemoral Europe comes from one or both southern refugia; Caucasus, SW Asia. Despite our genetic analysis supporting SW Asian walnut as a single founder source for post-glacial recolonization, the ENMs suggest the possibility of many more refugia in the Balkans, southern Caucasus, west, central, and south Asian regions. Our LIG projection (Fig 4) supports widespread but fragmented and low rate of occurrences of walnut throughout southern and western Europe as far north as southern Scandinavia, southern Ukraine, the coastal Adriatic regions of Greece and Albania, extending east into Turkey, southwestern Kazakhstan, eastern Iran, northeastern Pakistan, southern Tibet, and foothills of the Himalayas extending into northeastern India (Sikkim and Arunachal), and Bhutan. There were isolated populations of low to medium occurrences in northern Afghanistan and northern Pakistan and it was missing in Central Asia and southern Caucasus. Expansion and contraction of walnut populations during the Pleistocene interglacials probably resulted in isolation of subpopulations within and among regional groups as evidenced by the significant deficiency of heterozygotes and inbreeding coefficients for all groups across loci contributing to moderate differentiation within groups. The Bayesian CA exhibited subtle differentiation among the five groups showing genetic admixture, which is probably due to shared ancestral polymorphisms or recent dispersal mediated by human migration along the silk routes and gene flow between bordering populations. Eastern European walnuts showed a greater percentage of admixture suggesting the strong influence of historic introductions and human selection.

Possibility of multiple southern refugia
The presence of the Plio-Pleistocene cryptic refugia in regions other than SW Asia is not ruled out as they present moderate levels of genetic variation and differentiation within each group. In the Caucasus, the Colchis and Talysh regions served as species-rich refugia for many members of the Arcto-Tertiary flora where perhaps walnut survived during the glaciations [77,103]. The Caucasus had much more favorable Pleistocene and early Holocene climates than most of Eurasia with its complex topography providing diverse habitats and isolation favoring the formation of refugia in which ancient species survived Pleistocene climate deterioration [77,92]. The first fossil remains of walnut in Georgia date back to the Paleocene and the Sarmatian flora, a Miocene relict flora of Abkhazia (Colchis) somewhat similar to present flora containing subtropical elements such as walnut [107], where it remained dominant until the Early Pleistocene. Walnut currently survives in small isolated populations and in planted stands throughout Transcaucasia. In a recent study we showed limited diversity and significant differentiation among the walnut populations from the Talysh Mountains in the Lesser and Greater Caucasus Mountains [104].
Central Asian Mountain ranges such as the Pamir, Kopet Dagh, and Tien Shan are important centers of biological diversity and believed to be a center of origin and diversity of walnut [108][109][110]. The Kopet Dagh riparian forests along the southern and southwestern shores of the Caspian Sea, which to some degree resemble the Hyrcanian forests, where walnut has been reduced to sparse isolated populations from over harvesting and intense grazing [111]. The northwest Pamir Mountains of southern Tajikistan, especially the Gissar, Darvaz and Peter the First Ridges, support mesophyllic forest ecosystems consisting of walnuts (J. regia and J. fallax) and willow-poplar-birch forests at altitudes of 1000 to 1400 m and are considered to be relict formations of the Iranian and Turanian floras with eastern Mediterranean species occurring within distinct areas [112]. The walnut forest of western Kyrgyzstan is considered a Tertiary relict [110], but a recent palynological study indicated that it is probably of anthropogenic origin and at most 2000 years old [99]. Our analysis indicates that the Central Asian walnuts from the Fergana and Chatkal Ranges of Kyrgyzstan intergrade into the East Asian group forming a loose alliance with SW Asia and the Caucasus walnut groups (Fig 1). These results combined with our ENMs appear to suggest that walnut possibly (1) survived in small populations during glaciations in the Tien Shan Mountains extending up to the Fergana Ridge and southern Kazakhstan, (2) spread from the South and SW Asia into Central Asia following glaciation, and (3) survived in multiple refugia in many southern locations during glaciations. The LGM projections also suggest a low rate of occurrence indicating possible refugia of walnut in northeastern Xinjiang province and southeastern China. However, the high genetic variability and close genetic affinity of SW Asian walnuts to the Caucasus, East Asian, Eastern European, and Central Asian groups strongly suggest that walnut survived in SW Asia during glaciations (Tables 2 and 5). SW Asia served as a founder to post glacial range expansion of walnut into neighboring Central Asia, the Caucasus, and East Asia probably occurred during the Holocene. Strikingly, Eastern European walnuts showed closer relationship to the SW Asian and Central Asian groups, suggesting historic and repeated introductions of walnut from Asia into Southern and Eastern Europe during the Greco-Roman period and human selection and cultivation of these early Asian introductions. Further, ancient Chinese records indicate walnut was introduced to China from Iran, Tibet, and Kashmir region of India during the Han Dynasty [35]. This is further substantiated by F ST , which suggests that the East Asian group exhibits closer affinity to the SW Asian group than the other four regional groups, perhaps suggesting historical migration of walnut from this region through early trade along the ancient silk route connecting these two regions. Nonetheless, during the last glacial maximum, at least two independent refugia were maintained across northeastern China for J. mandshurica, a species representing the section Cardiocaryon within the genus Juglans [113].

Ecological niche models
Ecological niche modeling of current climatic and species occurrence data predicted southern Caucasus, parts of West and Central Asia extending into South Asia encompassing northern Afghanistan, Pakistan, northwestern Himalayan region, and southwestern Tibet as the favorable climatic niche matching the modern distribution of walnut. Hindcasting explicitly correlates climatic factors with species distributions and complements the genetic analysis in locating Pleistocene refugia. Our LGM hindcasts using data from the CCSM and MIROC models suggested disjunct distributions of walnut populations restricted to Transcaucasia, Central and South Asian regions extending into southwestern Tibet, northeastern India, Himalayan region of Sikkim and Bhutan, and southeastern China. CCSM and MIROC projections overlapped, but MIROC projected a significant presence of walnut in the Balkan Peninsula during the LGM (Fig 4). In contrast, population genetic analysis of the modern walnut distribution suggested a much narrower area in northern Pakistan and the surrounding areas of Afghanistan, northwestern India and southern Tajikistan, as a plausible hotspot of diversity where walnut may have survived glaciations (Fig 4). Paleo-projections of walnut distributions correspond to pollen finds reported from Ljubljana in Slovenia [114] and Staro-Orjachovo near Varna on the Black Sea coast [115], suggesting walnut occurred in the Balkan region during the Eemian interglacial, but vanished completely during the LGM [25], as projected by the hindcast with the LGM-CCSM simulated climatic data (Fig 4). During the LIG and later, walnut still occurred in the Ghab Valley in Syria [116] as shown in both LGM projections (Fig 4). The Colchis region is regarded as a glacial refugium for thermophilous plants of the Neogene flora [77,117], and our ENM results agree with the previous report that species such as Pterocarya and walnut survived in this region throughout the Pleistocene [82]. Hyrcanian forests stretching from the Talysh Mountains in southeastern Azerbaijan along the southern shores of the Caspian to Golestan National Park in Iran have been an important refugia for temperate broad-leaved trees including walnut during the Quaternary glaciations [118][119][120] confirming our LGM and LIG projections. Climatic conditions around the Black Sea and the Caspian Sea were favorable for walnut during the last glacial period [121]. The LIG predictions suggest that walnut probably had an extended low rate of occurrence in southern and western Europe from the Iberian Peninsula through southern France, the Italian Peninsula, Adriatic Coastal regions, Greece, the Caucasus, southern Black Sea regions of Turkey to southern Russia, western Kazakhstan, East Asia, and scattered distribution in SW Asia. Palynological evidence confirms the occurrence of walnut in many of these areas during the Quaternary period. The LIG climatic predictions in higher latitudes suggest small pockets of marginal climate for a low rate of occurrence of walnut in the UK, Germany, and Sweden. However, Juglans pollen found in two peat bogs in Kashmir between 17,000 and 10,000 cal yr BP onward supports the results of population genetic analysis of modern walnut distribution.

Conclusion
The paleoclimatic predictions show that the distribution of walnut was affected by Quaternary climatic fluctuations with population contractions and fragmentation. The LGM-CCSM and LGM-MIROC models suggested broad areas of the Caucasus, SW Asia including northeastern Afghanistan, Northern Pakistan, northeastern India and Central Asian Republic of Tajikistan as favorable climatic regions where walnut probably survived in multiple refugia. The LIG prediction suggested that walnut perhaps had expanded distribution in southern Europe from the Iberian Peninsula through southern France, Italian Peninsula, Adriatic Coastal regions, Greece, the Caucasus, southern Black Sea regions of Turkey up on to southern Russia, western Kazakhstan, East Asia and scattered distribution in SW Asia. Palynological evidence confirms occurrence of walnut in many of these areas during the Quaternary period. However, a cautionary note on paleoreconstructions is that they only predict the potential climatic niche suitable for species and may not confirm actual existence of refugia.
Population genetic analysis of walnut representing the modern distributional range suggested a general area of northern Pakistan and surrounding areas including northeastern Afghanistan, southern Tajikistan and northwestern India as the possible hotspot of diversity where walnut probably survived the last ice age. The genetic analysis also indicated that walnut probably spread into neighboring Central Asia, the Caucasus, West Asia and eventually Eastern Europe during the Roman period as confirmed by fossil pollen evidence. Overall, the findings suggest that walnut possibly survived the last glaciations in several refugia across a wide geographic area between 30 and 45 degrees north latitude. However, humans have played a significant role in the recent history and modern distribution of walnut.
Supporting information S1  AIC c -selected model prediction of occurrences of walnut for current, last glacial maximum (LGM; 21-18 kyr BP), and last interglacial (LIG; 130-107 kyr BP) climatic conditions for unfiltered data set with 237 occurrence points and filtered at 25 km spatial resolutions with 112 occurrence points, respectively (refer to