Genetic Diversity and Population Structure in a Legacy Collection of Spring Barley Landraces Adapted to a Wide Range of Climates

Global environmental change and increasing human population emphasize the urgent need for higher yielding and better adapted crop plants. One strategy to achieve this aim is to exploit the wealth of so called landraces of crop species, representing diverse traditional domesticated populations of locally adapted genotypes. In this study, we investigated a comprehensive set of 1485 spring barley landraces (Lrc1485) adapted to a wide range of climates, which were selected from one of the largest genebanks worldwide. The landraces originated from 5° to 62.5° N and 16° to 71° E. The whole collection was genotyped using 42 SSR markers to assess the genetic diversity and population structure. With an average allelic richness of 5.74 and 372 alleles, Lrc1485 harbours considerably more genetic diversity than the most polymorphic current GWAS panel for barley. Ten major clusters defined most of the population structure based on geographical origin, row type of the ear and caryopsis type – and were assigned to specific climate zones. The legacy core reference set Lrc648 established in this study will provide a long-lasting resource and a very valuable tool for the scientific community. Lrc648 is best suited for multi-environmental field testing to identify candidate genes underlying quantitative traits but also for allele mining approaches.


Introduction
Cultivated barley, Hordeum vulgare L. ssp. vulgare, the domesticated form of wild barley H. vulgare L. ssp. spontaneum (C. Koch) Thell., is one of the oldest cereal crops [1]. Barley withstands hot and dry climates, marginal soils, to some extent salinity and a broad range of soil pH conditions [2,3]. Today, barley is grown from 70˚N in Norway to 46˚S in Chile. The morphological, physiological and functional variation in barley reflects the underlying genetic diversity, which eases the environmental adaptation of this species [4,5].
In barley, as in most crops, genetic bottlenecks occurred during domestication and crop improvement. For most loci current elite varieties harbour less genetic diversity than their wild relatives or early domesticates [6][7][8][9]. Landraces are traditional domesticated populations of locally adapted genotypes maintained by local farmers over generations. Early barley cultivars were direct selections among landraces or descended from genetic recombination of different landraces. Since then, new barley varieties are mainly developed through the reshuffling of alleles resulting in a more or less constant repertoire of alleles within the elite gene pool. Overall, the genetic basis in present elite barley breeding materials is rather limited.
Since the beginning of the 20 th -century landraces were largely replaced by modern cultivars [10,11], which are higher yielding under optimal conditions but which can completely fail under harsh environments [12]. Today, most barley landraces have disappeared from practical farming. Many of them are still maintained in ex situ repositories. Globally, landraces represent the largest part of barley germplasm conserved in genebanks (44%, 128.870 accessions) [13,14].
Genome-wide association analysis (GWAS), a population based method to identify marker-trait associations based on linkage disequilibrium (LD) is being recently used extensively in crop plants [22]. Genetic diversity, population size and stratification, extent of genome-wide LD, allele frequency distribution, marker type and coverage as well as other parameters determine the accuracy, resolution and power of GWAS [22][23][24]. As a consequence of genetic bottlenecks during domestication and crop improvement, allele frequency changes resulted in different levels of LD and genetic diversity. Thus, the extent of LD increases from wild barley to landrace and to elite cultivars, whereas the reverse trend was observed for genetic diversity [25]. The trade-off of higher LD in current cultivars is lower resolution in GWAS [26]. To fine-map QTL, the varying extent of LD observed in different genepools of barley (e.g. wild, landraces, and cultivars) could be exploited and provides a great opportunity for high-resolution association mapping [27].
Several studies were performed in barley to investigate genetic diversity in different germplasm collections using molecular markers, and few of these collections (panels) were established for GWAS. However, most studies were based on either cultivar collections [28][29][30], or mixtures of cultivars and landraces [31][32][33] or landrace panels from distinct geographical regions only [34][35][36][37]. Very few collections of barley landraces collected from a wider geographic range were established but none of them was designed for higher resolution GWAS [11,[38][39].Here we describe the establishment of a comprehensive and diverse collection of spring barley landraces adapted to a wide range of climates (Landrace Collection 1485, ''LRC1485''). We studied genetic diversity and population structure using microsatellite (Simple Sequence Repeat, SSR) markers, which provided the basis for targeted research activities. The legacy core reference set (CRS) LRC648 established here is available for the scientific community to integrate data and to improve our elite barley varieties under changing environmental conditions.

Plant material
In total, 1491 spring barley landrace accessions adapted to diverse climate conditions were originally considered for this study. This representative collection was carefully selected among 22 and 16.62˚-71.5˚E. The selection was based on the following criteria: i) Mansfeld's taxonomical classification system considering growth habit, row type of the ear, kernel coverage, spike density, and seed colour [40], ii) the collection site had to be well documented (S1 Table), and iii) passport data (Characterization and Evaluation data since 1946) -obtained from IPK's Genebank Information System, GBIS, http://gbis.ipk-gatersleben.de). Barley landraces collected during targeted expeditions before 1992 were considered whenever possible to work with trustable materials. From Syria and Jordan only a few landraces (six accessions) were included -as barley landraces from this region were comprehensively investigated by [34,41]. Apart from other considerations, the proportion of landraces selected from each country should represent the overall composition of IPK's spring barley landrace collection containing 6,800 accessions meaning the number of accessions per country differed (Fig. 1, Table 1, S1 Table).
The IPK genebank practices the splitting of original landrace accessions and maintains them as morphologically distinct accessions to counteract the possible loss of rare alleles in the original population of genotypes [42,43]. For this reason, a current landrace accession might correspond to a single representative genotype of the original landrace population collected. Whenever possible, latitude and longitude coordinates were inferred using the original collection site descriptions. A broader source location (nearby city or province or country capital) was considered to infer the geographic coordinates, when the exact collection location was not documented. Searches were performed using Google maps (http://maps. google.com/maps) and global Gazetteer version 2.2 (http://www.fallingrain.com/ world/index.html).

Genotyping using SSR markers
Four seeds per landrace accession were randomly selected and sown under greenhouse conditions at IPK in 2008. Leaf material from one representative plant per accession was harvested three weeks after sowing. DNA was isolated from freeze-dried leaves using a BioRobot 9600 Work Station and the MagAttract 96 DNA plant core kit (Qiagen, Germany). Forty-five fluorescence-labelled SSR markers were selected based on their mapping position in the barley genome, covering all seven chromosomes [44,45] (Table 2, S1 Fig.). Primers were labelled with HEX, FAM and TAMRA dyes allowing multiplexing of primers pairs into 15 multiplexes (M1 to M15) with three primer pairs per amplification. PCR reactions were performed following the protocols described by [46]. Amplification products were separated on a MegaBACE 1000 capillary sequencer (Amersham Biosciences). Fragment sizes were recorded and analyzed using MegaBACE fragment profiler software version 1.2 and inspected manually. Allele sizes and peak intensities were recorded. Low intensity bands were assigned missing values. Two markers that were monomorphic (GBM1043 & GBM1036) and one marker that amplified multiple fragments (GBM1326) were excluded from further analysis ( Table 2, S1 Fig.). Six accessions were excluded from diversity and population differentiation analysis due to pure DNA quality leaving 1485 accessions for analysis (S1 Table).

Inferring the population structure
The population structure of the 1485 landraces (LRC1485) considered was inferred using STRUCTURE 2.2 [47,48] based on 42 SSR markers. This approach uses a Bayesian clustering analysis to assign individuals to clusters (K) without prior knowledge of their population affinities. STRUCTURE simulations were performed with the number of presumed clusters from K51 to K520 and five runs per K  Table 1, Table 3, Table 4, S3 Table and S1 Table for  value. For each run, the initial burn-in period was set to 50,000 followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations. The most probable number of clusters was determined by plotting the estimated likelihood values [LnP(D)] as a function of K [48]. Furthermore, delta (K) values were also calculated as proposed by [49]. A cut-off limit of 60% membership coefficient (Qmatrix) was considered to assign the individuals to a particular group as suggested by [50]. Accessions that did not meet this criterion were considered as admixed. Principal Component Analyses (PCA) were performed using PAST 2.12 [51]. Neighbor-Joining tree and Neighbor-Net planar graph based on Hamming distances (uncorrected p-distance) between 1485 landraces were constructed using SPLITSTREE 4.13.1 [52].

Genetic diversity and population differentiation
The number and frequency of alleles, gene diversity and heterozygosity (H e ), were determined for all loci across the total population using POWERMARKER 3.25 [53]. Polymorphism Information Content (PIC) values were determined according to [54]. Genetic variation within and among populations was estimated by Analysis of Molecular Variance (AMOVA) using ARLEQUIN 3.1 [55]. AMOVA was conducted between morphological, geographical and STRUCTURE inferred groups. Genetic differentiation among groups was calculated based on unbiased F st estimators [56]. Pair wise population comparisons using Fixation statistics (F st ) were determined among all groups as well as allelic richness, gene diversity (GD) and the number of alleles for each group were calculated using FSTAT 2.9.3 [57]. Allelic richness values for subpopulations were calculated based on rarefaction to account for varying sample size [58]. In all analyses, statistical significance was determined by performing 1000 permutations.

Spatial distribution of groups in relation to geography and climate
Geographic ground distances in kilometres between accessions were calculated based on latitude and longitude coordinates. The genetic diversity index [59] was calculated, and the genetic distance matrix was calculated using the shared allele distance approach of [60] based on allele frequencies at 42 loci. Based on this, Mantel tests were conducted between genetic distance and all other distance matrices. Mantel correlograms were generated analogous to the autocorrelation function [61] -and allowed to assess the overall relationship between matrices and to determine the significance level of correlation for each distance class. Mantel correlograms were constructed using PASSaGE 2.1 [62].   The spatial distribution of accessions, categorized into population-based groups, was visualized in relation to climate zones based on the Köppen-Geiger classification [63]. These climate zones are obtained by classifying the mean climate conditions on land areas around the globe using climate variables such as annual and seasonal mean temperatures and precipitation. The five main zones range from tropical climate through arid and temperate climates to polar climates. Subgroups can be determined by, for example, hot summers, dry and cold winters, year-round precipitation or monsoonal conditions.

Establishing a Core Reference Set (CRS) for high resolution LDmapping
To build a legacy CRS [64] comprising c. 600-700 genotypes, we first genetically purified the entire collection using two rounds of single seed descent (SSD) [65] under greenhouse conditions in 2008-2009 (started with the same plants from which we isolated DNA, see above). Afterwards, the materials were multiplied and surveyphenotyped under field conditions at IPK using local standard agricultural practices in two subsequent years: i) 2010 (single row per landrace genotype, isolated by one row of a spring wheat genotype, 8 plants per row, 20 cm distance between the plants), and ii) in 2011 (micro plots, 1.261 m; all available seeds from the 2010 harvest were sown and equally distributed over six rows). Plants were harvested and threshed manually to avoid mixtures of seeds. In total, 1014 genotypes produced sufficient seeds after two rounds SSD and the two subsequent multiplication cycles (S1 Table). Subsequently, the M strategy as implemented in MSTRAT [66] was used to establish the legacy CRS. All data sets available were considered for this: i) geographical origin; ii) 42 SSR markers, and iii) morphological traits as quantitative parameters, which were obtained during seed multiplication in 2011, such as row type, caryopsis type, heading date, plant height and spike length (S1 Table). The best CRS that maximizes the number of observed alleles was then established based on five replications using MSTRAT. Diversity scores calculated based on allelic richness [67] were compared and validated with a random sampling approach as well as by comparing phenotypic diversity.
High levels of polymorphism and many rare alleles in the collection Data quality was high, and the level of missing information across SSR markers was very low (1.528%). Based on the 42 SSR markers considered, 372 alleles were obtained with fragment sizes ranging from 90 to 360 bp ( Table 2, S2 Table). The number of alleles per locus ranged from 3 (GBM1363) to 22 (GBM1007, GBM1015) with an average of 8.86 alleles per locus. Major allele frequencies per locus ranged between 0.213 (GBM1256) and 0.974 (GBM1404) (mean value of  Table 2). An average gene diversity (GD) value of 0.6036 was obtained, indicating a high level of genetic variation among the accessions. Heterozygosity levels (He) were very low ranging 0-0.05, with an average value of 0.0119 per locus ( Table 2). A total of 157 rare alleles (allelic frequency ,1% in the total collection) were identified amounting to 42% of the total number of alleles discovered ( Table 2, S2 Table).
Population structure within the panel STRUCTURE runs were performed for K51 to K520 based on the distribution of 372 alleles at 42 SSR loci among 1485 accessions. LnP(D) values increased slowly starting from K510, thus probably representing the most appropriate number of  major clusters in this collection. However, the maximum delta (K) value was reached at K54 (Fig. 2). The primary division at K52 was observed mainly between Ethiopian (Group 1) and non-Ethiopian landraces (Group 2) (S1 Table). Here 92.59% of the landraces were assigned to either of two groups (G) considering the 60% membership coefficient (Q-matrix). The least number of admixtures (110 accessions; 7.4%) was observed at this K value.
Pairwise F st values among STRUCTURE inferred groups at K510 ranged from 0.196 (between G4 -from Georgia and G7) to 0.556 (between G1 -naked Ethiopian, and G4). Similarly pairwise F st comparisons for inferred groups at K54 ranged from 0.187 to 0.388 (S3 Table). Highly significant levels of genetic differentiation between and within populations were found using AMOVA. The percentage of genetic variation among populations ranged from 6.38% (all tworowed vs. all six-rowed Ethiopian) to 37.6% (STRUCTURE inferred groups (K510) ) and the percentage within the populations ranged from 62.40% (STRUCTURE inferred groups (K510) ) to 93.62% (all two-rowed vs. all six-rowed Ethiopian). The values for the comparison of all Ethiopian naked vs. all non-Ethiopian naked barleys are interesting and possibly indicating two evolutionary lineages (S4 Table).

Association between eco-geographical factors and genetic diversity
Significant relationships were found between genetic distances of accessions and eco-geographical parameters at the site of origin (S5 Table). Most significantly correlated with genetic distances was geographical distance, followed by latitudinal distance (S5 Table). The Mantel test between the genetic distance matrix (shared allele distance matrix) and the geographical distance matrix revealed a significant Mantel correlation of 0.357 (P ,0.0001). The correlation between genetic distance and longitudinal distance (r50.305, P ,0.0001) was high. The correlation between genetic distance and latitudinal differences (r50.193, P ,0.0001) was 2-fold lower than the correlation between genetic and geographic distances. In the Mantel correlogram for genetic distance vs. geographic distance, the matrix was subdivided into 20 discrete distance classes. Within the distance class of 0-300 km between the collection sites of accessions, the correlation was highest (r50.523). The r -values (Mantel correlation) and their significances declined with increasing distances. Spatial Mantel correlograms provided similar trends (S6 Fig., S5 Table).
STRUCTURE inferred clusters at K510 were assigned to distinct Köppen climates, although value should not be given to the few suspect lines (see below) ( Fig. 1, S3  Fig., S4 Fig.). S6 Table shows the major Köppen climates for each of the STRUCTURE inferred group. Climate zones range from winter dry tropical climates (Aw) to fully humid boreal climates (Dfb). Most samples are located in warm temperate climates, either summer dry (Csa) or fully humid (Cfb). In more detail one can observe, that all accessions from G1 and G5 are located in winter dry climates in Ethiopia; G4 lines were only sampled from Georgia, where fully humid boreal climates (Dfb) prevail. G2 is mainly located in hot and dry desert climates (BWh) and G6 (six-rowed) prevails in wintercold steppe climates (BSk). G7 occurs along a narrow latitudinal range (30-43˚N) in Minor Asia and the Fertile Crescent in warm temperate summer dry climates (Csa). G8 (two-rowed) and G10 (six-rowed) were sampled further north, majorly between 40-50˚N (Cfb, temperate fully humid) and 36-47˚N (Csa, temperate summer dry), respectively.

Establishment of a core reference set
To determine the theoretical optimum size of a core group representing most of the genetic diversity within the whole LRC1485 collection, different sizes of core groups from n51 to n51485, with a step size increase of 50 accessions were computed using MSTRAT and plotted against their diversity score calculated based on allelic richness. The M strategy performed better in efficiently capturing maximum allelic diversity than the random sampling approach (S7 Fig., Table 5). The theoretical optimum size of the core group was determined as in the range between c. 600 to 750 samples. More specifically, 745 individuals must be selected to harbour the maximum allelic diversity (S7 Fig.). Subsequently, based on the 1014 genotypes, which were available after two rounds of SSD and multiplication, the core reference set LRC648 was established using MSTRAT consisting of 308 tworowed (242 hulled and 66 naked types) and 340 six-rowed (285 hulled and 55 naked types) genotypes (S8 Fig., S1 Table, S7 Table). We validated the superiority of the M strategy by comparing phenotypic diversity between the core set LRC648 and a random set of landraces comprising the same number of accessions (LRC648R) (S8 Table).

Discussion
The demand for higher yielding and better-adapted crop varieties has raised the need to exploit the large ex situ genebank collections [68]. So far, in case of barley, only very few collections of landraces were investigated, mostly sampled from particular geographical regions [34,35,69,70]. The primary aim of this study was to establish and to characterize a diverse legacy collection of barley landraces adapted to a wide range of climates.

Lessons learned from genebank materials
Most probably due to careful selection and accurate description of the material (conservation management of IPK genebank certified according to -DIN EN ISO 9001:2008; line splitting practice for heterogeneous accessions), very few geographical outliers were observed. Another reason for the good fit of accessions into geographical clusters could be that 1035 (69.7%) accessions were sampled during IPK collection missions (or from collections which are hosted at IPK e.g. AUTMAYR-22-32, HIND-35/36, IRNFAOKU-52-54, S1 Table) and then maintained at IPK -thus significantly reducing confusions arising from seed exchange with other genebanks. For most accessions considered here, the collection site information was available (which is usually rarely the case for landraces from other genebanks), thus significantly improving spatial-evolutionary studies, too. Few accessions appeared to be suspect and do not seem to represent their original collection site (e.g. at K510 for G1: the seven lines sampled in Europe and Morocco). These accessions were probably not collected originally in these areas. We assume they were sampled elsewhere and then grown ex situ in these countries, or they may have been incorrectly recorded or mixed up during seed exchange and propagation. As recently shown by [50] consequent elimination of any doubtful line identified would provide the best resolution.
Higher values of genetic diversity within the LRC1485 and the legacy collection LRC648 compared to the world-wide GENOBAR GWAS panel In the collection of 1485 barley landraces, 372 alleles were detected using 42 SSR markers, with an observed average allele number (AN) of 8.95 alleles per locus, which is larger than in most other studies, including [71] using 45 SSRs in 223 cultivars of worldwide origin but also including some wild barleys (AN57.7), [69] using 39 SSRs for Eritrean landraces (AN57.6), [35] with 44 SSRs for Himalayan landraces (AN55.54) or [72] with 12 SSRs for Sardinian landrace populations (AN55.6). On the other hand, [73] reported AN516.7 for a worldwide collection of 953 barley accessions including wild barleys based on 48 genomic SSRs -which are much more polymorphic than the cDNA derived SSRs in our study. Certainly, the number of alleles per locus depends on the genotypes considered, the loci investigated and the marker type.
A total of 157 rare alleles were detected in the whole collection. Rare alleles were mostly detected at SSR loci, which displayed a high number of polymorphic alleles. The presence of 42% rare alleles highlights the potential of this collection for subsequent allele mining studies but could potentially limit GWAS.
To evaluate the potential of LRC1485 for GWAS in more detail, genetic diversity values were directly compared to the potentially most diverse GWAS panel for barley comprising of 224 spring barley cultivars and landraces of worldwide origin (''GENOBAR'' panel) [32,46,74,75] -using the same set of 42 SSR markers (Table 5). Overall, the LRC1485, the CRS LRC648 but also the subpanels of LRC648 based on the row type of the ear -are all more diverse than the GENOBAR panel as indicated by e.g. the total number of alleles, number of unique alleles or average gene diversity. Unique alleles observed in the GENOBAR panel came from genotypes collected from East Asia as well as from the Americas. Such regions were not considered in LRC1485 (Table 5).
LD estimates between SSR markers were computed (S9 Fig.). All pair-wise comparisons showed very low LD (r 2 ,30), which is not surprising due to the relatively low marker coverage (S1 Fig.) and the large population size. Inferring genome wide LD dynamics in this population from few SSR markers would be a vague interpretation. Therefore, high-density marker coverage across the genome is required (a least a few hundred of mapped and equally spaced SNPs) to estimate more precisely the LD extent and the pattern at the population level and across the genome.
Ten major clusters define most of the population structure within the spring LRC1485 panel We assessed population structure by different approaches. Based on Bayesian clustering and PCA analyses we considered K510 as the most appropriate number of major clusters in the LRC1485 collection. Although the maximum delta (K) value was reached at K54, clusters at K510 were better defined based on geographical origin, row type of the ear and caryopsis type -and also associated to distinct Köppen climate zones.
Clusters G1, G3, G4, G5 G8 and G10 were distinct and well supported. An intrinsic genetic substructure was visible for groups, which included accessions from a larger geographical range (G2, G6, G7, G9). PCA provided mostly congruent results for these clusters. However, PCA does not classify accessions into discrete clusters in all cases, especially not when admixed accessions and accessions of various geographical origins with a constant gene flow are included [76]. NEIGHBOR-JOINING and NEIGHBOR-NET analysis supported these findings. All clusters obtained by STRUCTURE analysis were distinguishable, although a high amount of reticulation was evident, owing to the fact that common alleles per locus were shared among geographic regions (S10 Fig., S11 Fig.).
To explore the genetic diversity and relationships among and within STRUCTURE inferred groups, various diversity statistics were assessed (Table 4). Gene diversity (GD) over 42 loci was highest for G7, G6, G2 and G3, which is comparable with other local collections [69,77]. Highest values for allelic richness and group specific rare alleles were found in G3, which might be due to i) materials sampled from a wide range of eco-geographical conditions around the Mediterranean Basin (local climates and niches at similar latitudes) (Fig. 1, S4 Fig.); or ii) gene flow among barley genepools [31].
As expected for barley, the row type of the ear was an important determinant of the population structure [28,78] and six groups comprised majorly either tworowed (G7, G8) or six-rowed (G2, G3, G6, G10) types, respectively (Table 3). Karyopsis type defined G1 and G9 as they included only naked types. Adaptation to local climates is defined e.g. through G4 (from Georgia) and G5 (from Ethiopia) containing both row types. However, within groups G1, G4, G5 and G9 subclusters can largely be explained by the row type of the ear.

Eco-geographical factors and spatial genetics
Detailed knowledge of environmental parameters at a certain collection site can help determining the role of relevant climatic factors influencing the genetic differentiation and adaptation of genotypes to their environments. Furthermore, this knowledge might help selecting most suitable parental lines for population development and breeding programs.
The distribution of accessions was found rather along the latitudes, meaning a wider W-E than N-S window. Just Ethiopia is the southern-most sampling area, i.e. stretching the latitudinal direction the most. In general, climatologically, climate zones are determined by i) incoming solar radiation (more at equator, less at poles); ii) b) altitude (higher elevation will yield similar climate zones as usually on higher latitudes); and iii) proximity to the sea yielding maritime vs. continental climates. Thus the general pattern of climate zones (as the word ''zonal'' means in this context) is band along the latitudes.
The distribution pattern of STRUCTURE inferred groups probably indicate a preferred distribution path rather zonal (W-E) than meridional (N-S), i.e. higher correlation with longitude and indeed this was observed for LRC1485 groups. As domesticated barley spread meridionally from the Fertile Crescent to northwestern Europe, the crop encountered considerable ecological and environmental change. Natural mutation, selection and enrichment of favorable alleles at key loci such as Ppd-H1 [79], HvCEN [80] or vernalization-related genes VRN-H1 [81,82], VRN-H2 [83] and VRN-H3 (also known as HvFT1, [84]) contributed to successful environmental adaptation and range extension in barley.
However, based on the IPK genebank information system, we selected only spring barleys, which flowered at IPK without the need for any cold period to promote flowering. In Central Europe (such as at IPK), spring types are sown between early March and end of April depending on the weather conditions every year [68]. Thus, vernalization-or frost tolerance-related loci should be less relevant for the adaptation of spring barleys to their environments. However, as shown by [85], the GENOBAR spring barley collection, harbored 8 haplotypes at VRN-H3. Genotyping the same set of 224 accessions at VRN-H2 suggested the presence of recessive alleles at VRN-H2 (Kilian et al. unpublished), observed as deletions of a cluster of up to three ZCCT family genes, which contribute to the spring growth habit [86]. Thus, ideally, genotypes should be characterized at molecular and phenotypic levels, before assigning them to a specific growth habit (i.e. winter, facultative, spring). However, in the genebank context this has not been achieved for most of the collections yet. Therefore, we expect some facultative types within LRC1485.
As already shown partly for the GENOBAR collection, which harbors six haplotypes at PpdH1 [85] and six out of seven haplotypes detected in the domesticated genepool at HvCEN studied by [80], we also expect a remarkable number of haplotypes at key genes responsible for environmental adaptation within LRC648. Thus, we suggest using the legacy CRS LRC648 or their two subpanels in particular for allele mining and gene discovery studies.
Although altitude must be considered as an important factor influencing the genetic diversity distribution [87], this factor was not considered in our studies because precise geographic coordinates were not available for all landraces.
The initial Köppen analysis presented here does not give a very clear picture due to outliers and thus a rather high within-group-variability (S6 Table). However for some groups a particular climate applies. Regarding climate change one could investigate Köppen maps based on future climate projections and check where climate zones suitable for barley cultivation will be located in the future. Modelling crop performance under changing climates will help guiding the breeding programs to the expected future needs [88][89][90].

New insights into barley evolution: two examples
With our genetic analysis in 1485 barley landraces, new insights into barley domestication history can be obtained. In total, 299 accessions from Ethiopia were genotyped (Table 1), thus providing probably the largest SSR data set generated for Ethiopian landraces so far (S1 Table, S2 Table). Overall, Ethiopian barleys were apparently found to be distinct from all other groups (S3 Table, S4 Table), which is in line with previous studies [87,[91][92][93]. Different evolutionary forces (environment) and domestication histories (e.g. agricultural practices, cultural preferences of human tribes) in the Ethiopian highlands compared to the Fertile Crescent might be reasons for this distinctness [5,94]. At K510, 277 Ethiopian landraces were assigned to either of two groups comprising naked (G1, 194 lines) and hulled types (G5, 83 lines). Both groups were further sub-structured according to their row type of the ear (S5 Fig.).
Although morphologically diverse, Ethiopian hulled barleys harboured a relatively low level of nucleotide diversity as indicated by the lowest gene diversity value, the second lowest level for allelic richness and only five group specific rare alleles. These results provide further evidence that Ethiopian barleys went through a major genetic bottleneck followed by adaptation to climatic (e.g. rainfall patterns, altitude) and edaphic conditions in the Abyssinian highlands (Table 4,  S6 Table) [87,[95][96][97][98]. It is interesting that the early flowering haplotype IV at HvCEN (which derived from major haplotype II) predominated in genotypes assigned to G5 (89%) [80]; S1 Table). Furthermore, based on our preliminary data set at PpdH1 (Sharma et al. unpublished), most lines from G5 carry insensitive alleles, thus in combination with haplotype IV at HvCEN, providing favorable alleles for the two growing seasons of barley in Ethiopia -Meher and Belg [87].
Interestingly, Ethiopian naked barleys (G1) harbored 57 group specific alleles (second largest number found) and higher value for allelic richness compared to G9. Pairwise comparison of F st values between STRUCTURE inferred groups showed that G1 is most closely related to G5 (0.35) supporting a common origin of Ethiopian barleys. However, G1 and G9 (the two naked groups) are genetically less related (0.49), while G9 is closer connected to G7 (F st 0.29). Thus, our data suggest at least two evolutionary lineages of naked barleys, both of which probably originated in the eastern Fertile Crescent from a monophyletic natural mutation (17 kb deletion harboring an ethylene response factor (ERF) family transcription factor gene) at the nud locus on chromosome 7H [99][100][101]. However, resequencing larger germplasm sets at the nud locus are required to test this hypothesis and to shed more light on the origin of naked barley. Interestingly, outside Ethiopia, two-rowed naked landraces were mainly sampled from Iraq and Iran, while six-rowed naked types were collected further east (Afghanistan) [102,103].
Interestingly, the second highest variation explained among groups by AMOVA was found between all Ethiopian naked vs. all non-Ethiopian naked barleys (32.02%) (and thus supporting two evolutionary lineages for naked barley), while all Ethiopian hulled vs. all non-Ethiopian hulled barleys explained only 18.50% (S4 Table). AMOVA of all hulled vs. all naked types explained 16.17% of variation, which is nearly two-fold higher compared to the variation between all two-rowed vs. all six-rowed types (8.87%). Also AMOVA of all hulled Ethiopian vs. all naked Ethiopian types (26.69) compared to all two-rowed Ethiopian vs. all six-rowed Ethiopian landraces (6.38) suggested that hulled and naked genepools are genetically more distant than the two-rowed and six-rowed clusters. Hulled and naked types probably evolved largely independent under cultivation and were domesticated for different end-use qualities [35].

Conclusions
The LRC1485 harbors great genetic diversity. However, the collection size of 1485 genotypes is not manageable in most phenotypic studies, and smaller panels are needed. The legacy CRS LRC648 established here is best suited for multienvironmental field testing under various climates. However, to work with even more manageable sets, we suggest dividing the LRC648 into two-rowed and sixrowed subpanels, depending on the trait of interest (Table 5). Increased marker coverage [80] and precise phenotypic data for LRC648 will help to identify candidate genes also for agronomic and adaptation-related traits using GWAS [27]. Re-sequencing candidate genes [104,105] or genomic regions underlying quantitative traits using next generation sequencing (NGS) approaches [106][107][108][109] can be applied for LRC648.
Seeds of the LRC648 can be requested in small quantities from IPK. Seed delivery just awaits the Standard Material Transfer Agreement (SMTA) procedure.  Table. The LRC1485 collection. a) Details of accession names, their taxonomical designations, row type and kernel coverage, collection sites, collection missions and all other information available from IPK genebank documentation System (GEBIS); and b) STRUCTURE assignments to groups K52 to K520, phenotypic data for LRC648 and haplotype information for HvCEN obtained from [74], if available are given. Accessions selected for the CRS LRC648 are indicated. Six accessions were excluded from further analysis due to large extent of missing values. NS -not selected for core group of 648 genotypes; *lost during single seed descent and multiplication. doi:10.1371/journal.pone.0116164.s012 (XLSX) S2  Table. Mantel correlogram tables. a) Mantel correlogram tables between: genetic distance and geographic distance; b) genetic distance and longitude difference matrix; c) genetic distance and latitude difference matrix; d) genetic distance and annual mean temperature; e) genetic distance and mean diurnal range; f) genetic distance and temperature of warmest quarter; and g) genetic distance and annual precipitation. 1 different distance classes are shown; 2 lower and upper boundary values for each class; 3 number of pairs for which the correlation was calculated within each distance class; 4 the mantle correlation for each class; 5