Genetic Diversity and Linkage Disequilibrium in Chinese Bread Wheat (Triticum aestivum L.) Revealed by SSR Markers

Two hundred and fifty bread wheat lines, mainly Chinese mini core accessions, were assayed for polymorphism and linkage disequilibrium (LD) based on 512 whole-genome microsatellite loci representing a mean marker density of 5.1 cM. A total of 6,724 alleles ranging from 1 to 49 per locus were identified in all collections. The mean PIC value was 0.650, ranging from 0 to 0.965. Population structure and principal coordinate analysis revealed that landraces and modern varieties were two relatively independent genetic sub-groups. Landraces had a higher allelic diversity than modern varieties with respect to both genomes and chromosomes in terms of total number of alleles and allelic richness. In the landrace and modern variety gene pools, 3,833 (57.0%) and 2,788 (41.5%) rare alleles with frequencies of <5% were found, respectively, indicating greater numbers of rare variants, or likely new alleles, in landraces. Analysis of molecular variance (AMOVA) showed that the A genome had the largest genetic differentiation and the D genome the lowest. In contrast to genetic diversity, modern varieties displayed a wider average LD decay across the whole genome for locus pairs with r²>0.05 (P<0.001) than the landraces. Mean LD decay distance for the landraces at the whole genome level was <5 cM, whereas modern varieties showed a higher LD decay distance of 5–10 cM. LD decay distances also differed among the 21 chromosomes, being higher for most chromosomes in modern varieties (<5 to 25 cM) than in landraces (<5 to 15 cM), presumably reflecting the influences of domestication and breeding. This study facilitates predicting the marker density required to effectively associate genotypes with traits in Chinese wheat genetic resources.


Introduction
Bread wheat (Triticum aestivum L.) is one of the most important cereal crops worldwide, including China. Wheat is grown in 30 of China's 31 provinces in 10 major agro-ecological zones based on wheat type, growing season, and varietal response to temperature and photoperiod [1,2]. China is also regarded as one of the centers of diversity of common wheat [3]. Due to a long cultivation history and artificial selection in different ecological regions, about 23,135 domesticated accessions (11,694 landraces and 11,441 modern varieties) constitute the Chinese basic collection conserved in the national genebank [http://icgr.caas.net.cn/cgris_english.html]. Recently, a candidate wheat core collection (5,029 accessions) was established based on geographical regions, ecotypes, and 21 agronomic and botanic characters of the basic collections [3]. According to the utility of core collections in crop wild relatives [4], using a strategy for unlocking genetic potential in crops proposed by Tanksley and McCouch [5], both a core collection, with 1,160 accessions (5% of the national collection) representing 91.5% of the genetic diversity, and a mini core collection consisting of 262 accessions with an estimated 70% representation of the genetic variation in the full collection, were constructed based on 4610 5 SSR data-points [6]. This mini core collection is a suitable platform for in-depth evaluation, effective utilization and genetic research in Chinese wheat genetic resources [7,8].
Linkage disequilibrium (LD), or nonrandom association of alleles between loci (linked or unlinked), is becoming increasingly important for identifying genetic regions associated with agronomic traits [9][10][11][12]. Recently, genome-wide LD studies were performed on various crop plants, such as maize (Zea mays L.) [13][14][15], rice (Oryza sativa L.) [16,17], barley (Hordeum vulgare L.) [18,19], sorghum (Sorghum bicolor L. Moench) [20], durum wheat (T. turgidum L. var. durum) [21] and soybean (Glycine max L. Merr.) [22,23]. From an analysis of 242 genomic SSRs among 43 elite US wheat cultivars, Chao et al. [24] reported genome-wide LD estimates of generally less than 1 cM for genetically linked locus pairs, and that most of the LD was between loci less than 10 cM apart. Somers et al. [25] genotyped 189 bread wheat accessions at 370 loci and 93 durum wheat accessions at 245 loci to examine linkage disequilibrium across the genome, and found that LD mapping of wheat can be performed with simple sequence repeats to a resolution of <5 cM. Most of the diversity and LD analyses on wheat were undertaken at the whole genome level, with the exception of two recent studies at the chromosome level [26,27]. Breseghello and Sorrells [26] found consistent LD of less than 1 cM for chromosome 2D and about 5 cM in the centromeric region of 5A using 33 and 20 SSR markers, respectively. Horvath et al. [27] suggested that chromosome 3B had a lower diversity than average for the entire B genome; LD was weak in all materials studied, and marker pairs in significant LD were generally concentrated around the centromere in both arms and at distal positions on the short arm. However, all LD studies to date were based on limited numbers of loci and small sample sizes. It would be valuable to estimate LD decay in bread wheat at the whole genome level and with a larger genetic representation of wheat genotypes.
Based on genotyping of 250 accessions mostly from the Chinese bread wheat mini core collection (70% genetic diversity of the initial collection) using 512 microsatellite loci distributed over all 21 chromosomes, the objectives of this study were: 1) to evaluate the allelic diversity within the Chinese wheat collection; 2) to analyze the population structure and compare the diversity level between landraces and modern varieties; 3) to investigate genetic differentiation of wheat genomes within the two gene pools; and 4) to examine the extent and genomic structure of LD between pairs of SSR markers on both genome-wide and chromosome scales.
The results of this study should describe the level of genetic diversity and linkage disequilibrium decay of a representative Chinese collection for breeding and genetic research, and provide a molecular basis to enrich genetic diversity of bread wheat worldwide.

Overall Diversity of Chinese Wheat Collections
The genetic characteristics of the 250-member Chinese wheat mini core collection based on 512 microsatellite loci are listed in Table 1. Among the 512 SSR loci, 99.4% (509) were polymorphic and just 3 were monomorphic. A total of 6,724 alleles, ranging from 1 to 49 per locus, were detected. PIC values ranged from 0 to 0.967, and the total number of rare alleles with a frequency of less than 5% reached 4,424 (65.8%), indicating that many new alleles occurred in the mini core collection. As expected from the way in which the collection was constructed, the combination of mean genetic richness (13.1) and genetic diversity index (0.650) indicated high levels of polymorphism.
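As an illustration of how the per-locus statistics above can be derived, the following minimal sketch computes the allele number, the PIC value (using the formula given in Data Analysis, PIC = 1 minus the sum of squared allele frequencies) and the rare alleles (frequency below 5%) for a single locus. The function name `locus_summary` and the fragment sizes are hypothetical; this is not the PowerMarker implementation, and it assumes one allele call per homozygous accession with missing data already removed.

```python
from collections import Counter


def locus_summary(calls, rare_threshold=0.05):
    """Summarise one SSR locus from a list of allele calls (one per accession)."""
    counts = Counter(calls)
    n = sum(counts.values())
    # Allele frequencies at this locus.
    freqs = {allele: count / n for allele, count in counts.items()}
    # PIC as defined in the text: 1 minus the sum of squared allele frequencies.
    pic = 1.0 - sum(p * p for p in freqs.values())
    # Rare alleles: frequency strictly below the threshold (5% in the study).
    rare = sorted(a for a, p in freqs.items() if p < rare_threshold)
    return {"n_alleles": len(freqs), "pic": pic, "rare_alleles": rare}


# Hypothetical fragment sizes (bp) for 40 accessions at one locus.
calls = [150] * 20 + [152] * 10 + [154] * 9 + [160] * 1
summary = locus_summary(calls)
print(summary)
```

A monomorphic locus gives a PIC of 0 under this formula, which matches the lower bound of the PIC range reported above.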

Genetic Structure of the Wheat Collection
Population structure of the whole collection was investigated with a Bayesian clustering approach, implemented in STRUCTURE v2.2 [28], to infer the number of clusters (populations). K = 2 gave the best separation, with the highest ΔK value (Figure 1).
Principal coordinate analysis also indicated two major sub-groups within the mini core collection (Figure 2). A large proportion of accessions formed one sub-group indicated to the left, and the other sub-group (right) included accessions predominantly in the modern cultivar sub-group. The greater scattering of the landrace sub-group indicated its higher diversity. Overlap between the two gene pools, indicated by intermediates in both sub-groups, was probably caused by breeding activities in the 1940s-1950s. During this period new varieties were produced from landrace × introduced cultivar hybrids [2]. Consistent with previous studies [29][30][31], principal coordinate analysis of the mini core collection clearly indicated that Chinese landraces and modern varieties comprised separate sub-groups of genotypes.

Genetic Characteristics within the Landrace and Modern Variety Sub-groups
The basic statistics of genetic diversity between the landrace and modern variety sub-groups at the genome level are listed in Table 2. In total, 6,122 alleles ranging from 1 to 46 per locus were identified at 512 SSR loci in the landrace sub-group, compared with 5,004 alleles ranging from 1 to 35 per locus in the modern variety sub-group. Similarly, there were 1,720 (28.1%) private alleles in the landraces, but just 602 (12.0%) in modern varieties. Correspondingly, there were 3,833 (62.6%) rare alleles with frequencies <5% for the landrace sub-group whereas this number was 2,788 (55.7%) for modern varieties, indicating higher genetic variation in novel alleles for the landraces than in modern varieties. Allele number per locus (Figure 3A) and PIC per locus (Figure 3B) for all SSR loci were continuously distributed in both landraces and modern varieties. Within the sub-groups allelic numbers per locus ranged from 4 to 13, and PIC values ranged from 0.6 to 0.9, indicating high polymorphism levels in both sub-groups. Both mean genetic richness (12.0) and genetic diversity index (0.640) of the landraces were higher than those for modern varieties at 9.8 and 0.628, respectively. Consistent with landraces having higher diversity than modern varieties at the whole genome level, these relationships were retained when the three genomes were individually compared. To eliminate the influence of sample size on evaluations of genetic diversity, allelic richness calculated following rarefaction on samples of 68 accessions per sub-group likewise indicated that landraces had higher genetic diversity than modern varieties (Table 2). A comparative analysis of genetic characteristics between sub-groups was performed at the chromosome level (Table 3). For all chromosomes, the total number of alleles for landraces, ranging from 130 to 450, was higher than in modern varieties (99 to 361).
Again, comparing 68 accessions from each sub-group confirmed that landraces had more alleles per locus than modern varieties at the chromosome level. Except for chromosomes 1B, 5A and 5B, landraces had much higher PIC values than modern varieties for all other chromosomes. The total number of private alleles for landraces, ranging from 26 to 140, was higher than that of modern varieties (6 to 49) on all individual chromosomes, and the same held for the distribution of rare alleles in the two gene pools. The mean Fst value between landraces and modern varieties over all chromosomes was 0.021, ranging from 0.010 to 0.035. Chromosome 3A (0.035) showed the highest genetic differentiation and 1D (0.010) the lowest (Table 3).
By comparing PIC values between the landrace and modern variety sub-groups on homoeologous group 2 chromosomes (Figure 4), we found that genetic differentiation between them might not be genome-wide, but rather confined to selected loci or chromosome intervals, exemplified by the chromosome 2A interval gwm558-gwm312. Within this region, modern varieties show a large reduction in diversity (lower PIC values), with selection as one of several possible explanations. Similar comparisons for all chromosomes can be made from Figure S2.
Comparisons of the landrace and modern variety sub-groups on the basis of genomes are shown in Table 4. Evaluated by Shannon's information index (I) and genetic distance (GD), the A genome was the most diverse and the D genome was the least. Gene flow estimated from Fst (Nm) and genetic identity (GI) placed the A genome lowest among the three genomes, while the D genome ranked first. This indicated that the largest genetic differentiation between the two sub-sets was within the A genome, with the least differences in the D genome.
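The inverse relationship between differentiation and gene flow described here can be illustrated with a short sketch. It assumes Wright's island-model approximation, Nm = (1 - Fst) / (4 Fst); whether the software used in the study applied exactly this form is an assumption, and the approximation is only rough for a self-fertilizing crop. Because the genome-level values of Table 4 are not reproduced in the text, the inputs are the chromosome-level Fst values quoted above for Table 3. The function name `gene_flow_from_fst` is hypothetical.

```python
def gene_flow_from_fst(fst):
    """Approximate gene flow (Nm) from Fst under Wright's island model (assumed)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return 0.25 * (1.0 - fst) / fst


# Fst values quoted in the text for Table 3 (landraces vs modern varieties).
chromosome_fst = {"3A (highest)": 0.035, "mean": 0.021, "1D (lowest)": 0.010}

for name, fst in chromosome_fst.items():
    print(f"{name}: Fst = {fst:.3f}, Nm = {gene_flow_from_fst(fst):.2f}")
```

Under this approximation a larger Fst always yields a smaller Nm, which is why the genome with the greatest differentiation ranks lowest for gene flow.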
Analysis of molecular variance (AMOVA) between the landraces and modern varieties by genomes was also carried out (Table 5). All sources of variation were highly significant (P<0.001) and more than 95% of the variance was explained by differences within the A, B and D genomes, whereas only a small part of the overall variance (less than 5%) could be attributed to differences between landraces and modern varieties. The AMOVA also revealed structures of genetic differentiation consistent with the basic statistics (Table 3) when comparing the three genomes. The amount of variation between sub-groups in the A genome (4.73%) was higher than that in the B (4.25%) and D (3.05%) genomes, again indicating that the A genome had the largest molecular variance and the D genome the lowest between the two sub-groups.

Linkage Disequilibrium at the Whole Genome Level
After deletion of low-frequency alleles (<5%) in both sub-groups, 495 loci were chosen to evaluate the extent of linkage disequilibrium (LD) on a whole genome level in the two wheat gene pools (Table 6). There were 143, 171 and 181 loci (of the 149, 177 and 186 genotyped) on the A, B and D genomes, respectively, available for LD evaluations. Across all 495 loci, 6,171 possible linked locus pairs (in the same linkage groups) and 116,094 unlinked locus pairs (from different linkage groups) could be detected in both sub-groups. Among linked locus pairs, 149 (2.41%) of 4,577 compared were in LD at the P<0.001 level for landraces, whereas there were 275 (4.46%) of 4,736 in significant LD among modern varieties. In addition, the numbers of locus pairs in LD with r²>0.1 or r²>0.2 in modern varieties were also higher than those in landraces. Furthermore, the mean r² for all significant LD (P<0.001) in modern varieties (0.049, ranging from 0.015 to 0.348) was larger than for landraces (0.030, ranging from 0.008 to 0.371). Although the landraces possessed more significant unlinked locus pairs (P<0.001) than modern varieties, i.e. 1,509 (1.30%) vs 1,019 (0.88%), modern varieties had higher r² values for the other parameters. LD comparisons on a genome basis showed similar trends of higher LD in modern varieties than in landraces, even though genome-wide LD levels were very low in both sub-groups. Plots of significant r² values (P<0.001) between locus pairs in different genomes of the two sub-groups (Figure 5) further supported these results.
To reveal LD decay distances in the two sub-groups on a whole genome scale, we plotted the percentage of locus pairs with significant (P<0.001) LD and the mean r² against distance intervals for each gene pool (Figure 6). The percentage of locus pairs in significant (P<0.001) LD decreased as genetic distance increased, and significant LD was generally most frequent within 10 cM. However, mean r² was unevenly distributed along distance intervals, i.e. there were some points with relatively higher mean r² at larger intervals. Considering the low LD values for our samples (Table 6, Figure 6), we determined average LD decay distance in the different genomes for locus pairs with r²>0.05 at P<0.001 in the landrace and modern variety sub-groups (Table 7). Mean LD decay distance for landraces at the whole genome level was <5 cM, with higher LD decay distances in modern varieties for the same genomes. In the modern variety sub-group, the decay distance increased to 5-10 cM for the B, D and whole genomes, and to 15-20 cM for the A genome, which might be caused by demographic history acting at the genome level in modern varieties.
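The interval summary plotted in Figure 6 can be sketched as follows: linked locus pairs are grouped into genetic-distance intervals, and for each interval the percentage of pairs in significant LD and the mean r² of those significant pairs are reported. This is a minimal illustration under stated assumptions, not the authors' procedure: the 5 cM bin width, the function name `ld_by_distance_interval` and all input values are hypothetical.

```python
def ld_by_distance_interval(pairs, bin_width=5.0, p_cutoff=0.001):
    """Summarise LD per distance interval.

    pairs: iterable of (distance_cM, r2, p_value) for linked locus pairs.
    Returns rows of (bin_start, bin_end, n_pairs, pct_significant, mean_sig_r2).
    """
    bins = {}
    for distance_cm, r2, p_value in pairs:
        index = int(distance_cm // bin_width)
        entry = bins.setdefault(index, {"n": 0, "sig_r2": []})
        entry["n"] += 1
        if p_value < p_cutoff:
            entry["sig_r2"].append(r2)

    rows = []
    for index in sorted(bins):
        entry = bins[index]
        n_sig = len(entry["sig_r2"])
        # Mean r2 is undefined when no pair in the interval is significant.
        mean_r2 = sum(entry["sig_r2"]) / n_sig if n_sig else None
        rows.append((index * bin_width, (index + 1) * bin_width,
                     entry["n"], 100.0 * n_sig / entry["n"], mean_r2))
    return rows


# Hypothetical locus pairs: (distance in cM, r2, permutation P value).
pairs = [(1.2, 0.12, 0.0001), (3.5, 0.02, 0.2), (4.9, 0.08, 0.0005),
         (7.0, 0.03, 0.01), (8.4, 0.06, 0.0002), (22.0, 0.01, 0.5)]
rows = ld_by_distance_interval(pairs)
for row in rows:
    print(row)
```

Intervals that contain no locus pairs are simply absent from the output, so sparse marker coverage shows up as gaps rather than as zero LD.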

Linkage Disequilibrium at the Chromosome Level
After scanning the extent and structure of LD between landraces and modern varieties on a whole genome scale, the same evaluations were performed at the single chromosome level based on 495 SSR loci in the two gene pools (Tables 8 and 9). Comparing SSR locus pairs in significant (P<0.001) LD and mean r² values between landraces and modern varieties (Table 8), the mean number of locus pairs in significant LD per chromosome was 7.1 (2.41%), ranging from 1 to 30, for the landrace sub-group, and 13.1 (4.47%), ranging from 2 to 35, in the modern variety sub-group. Correspondingly, mean r² of the landrace sub-group was only 0.033, ranging from 0.011 to 0.140, whereas in the modern variety sub-group it was 0.053, ranging from 0.026 to 0.194. At the individual chromosome level, except for chromosomes 1A and 6D, the modern varieties had more SSR locus pairs in significant LD. Similarly, the mean r² for modern varieties was larger than for landraces for all chromosomes except 4A and 4D. Therefore, compared with the landrace sub-group, the modern variety gene pool had higher numbers of SSR locus pairs in significant LD and higher mean r² values for almost all wheat chromosomes. However, these parameters were not compared among chromosomes within the same gene pool because the number of loci selected differed greatly among chromosomes in the present study. Average LD decay distances on different chromosomes for locus pairs with r²>0.05 at P<0.001 in the two sub-groups are given in Table 9. In the landrace sub-group, LD decay distance was <5 cM for 19 of 21 chromosomes, but 5-10 cM for 2B and 10-15 cM for 5A. In the modern variety sub-group, chromosomes 1B, 1D, 2D, 3A, 3B, 3D, 4A, 4B, 6B, 6D and 7B had <5 cM LD decay distances similar to the landraces, but the other 10 chromosomes showed wider LD decay distances than those of the landraces, especially 5A and 7D at 20-25 cM.
These general descriptions of LD decay distance provide important information concerning decisions on marker densities for future association analyses at the chromosome level, and also guidance on different strengths of selective signals in breeding imprinted on each chromosome.

Genetic Relationship and Population Structure
Our previous studies of 43 cornerstone breeding parents used before 1980 and widely grown varieties in current use in China [29], 96 random samples with maximized genetic diversity [30], a 340-accession candidate core collection from the Northwestern Spring Wheat Region [31], and a 1,110-member Chinese core collection [6] consistently demonstrated that Chinese landraces and modern varieties are relatively independent genetic sub-groups. To address possible limitations in the number of loci used in the above-mentioned studies, we employed 512 microsatellite loci identifying 6,724 alleles to obtain a genetic structure of Chinese wheat genetic resources using principal coordinate analysis and Bayesian clustering approaches. The larger number of alleles identified in 512 SSR loci also indicated that individual microsatellite loci have higher information content [32][33][34]. Using a relatively large set of molecular data-points, the Chinese mini core collection was divided into two major sub-groups, essentially landraces and modern varieties. This was considered consistent with the history of Chinese wheat breeding. Within each sub-group there were some intermediate genotypes. Adopting a threshold probability of >0.50 for assignment to one of the clusters [24,26], 78 of 93 modern varieties were clearly assigned to one sub-group and 135 of 157 landraces to the other. Examples of the 37 varieties with a lower probability (<0.50) of fitting either sub-group included Lianglaiyoubaipixiaomai (Inner Mongolia), Bihongsui (Inner Mongolia), Mingxian 169 (Shanxi), Shite 14 (Hebei), Fuzhuang 30 (Shaanxi), and Jingyang 60 (Shaanxi). Even though they were arbitrarily classified as modern varieties, most of them were selections from landraces or were from hybrid progeny of landraces [2,6], and still retained most of the genetic characteristics of landraces.

Genetic Diversity in Chinese Wheat Gene Pools
Allelic diversity analysis in this study revealed that the total number of alleles amplified at 512 SSR loci in 250 accessions was up to 6,724 (13.1 alleles per locus on average, ranging from 1 to 49), and polymorphism information content values ranged from 0 to 0.967 (mean 0.650). These values were higher than previously reported estimates of SSR marker diversity in wheat [24][25][26]35,36]. In those studies, the mean allele number per locus ranged from 4.81 to 10.5 and the mean PIC value from 0.46 to 0.62. On the other hand, a genetic diversity of 0.77 and 18.1 alleles [37], 14.5 alleles and a genetic diversity of 0.662 [38], and 23.9 alleles per locus over 38 SSR markers [39] were also reported. By comparison, the high SSR allele diversity found in the mini core collection approximately reflects the genetic representation of the entire set of Chinese wheat collections. Notably, a total of 4,424 alleles had frequencies of less than 5% among all accessions, and these so-called rare alleles represented 65.8% of all alleles detected. Like common alleles, rare variants or new alleles that have not been subject to artificial selection also play an important role in genome-wide genetic research [40].
The amounts of genetic diversity in the two gene pools and PIC values were significantly different at both the genome (Table 2) and individual chromosome (Table 3) levels, in terms of allelic richness calculated using equivalent numbers of accessions from each sub-group. Results of allelic diversity using 512 SSR markers indicated that the landraces (mean genetic richness: 12.0; genetic diversity index: 0.640; allelic richness: 10.7) had higher genetic diversity than modern varieties (mean genetic richness: 9.8; genetic diversity index: 0.628; allelic richness: 9.5). This was consistent with a previous study of a Chinese wheat core collection composed of 762 landraces and 348 modern varieties using 78 microsatellite markers [6]. As for the whole genome, similar results were obtained for individual genomes and chromosomes. This implied there were more potentially rare variants or new alleles in the landrace gene pool, which could be of value for genetic research or breeding.
China has a history of wheat cultivation of more than four millennia, and landraces became isolated because of limited transportation in earlier times [6]. Scientific breeding in China can be traced back only 50-90 years [2]. The history of Chinese wheat breeding shows that new varieties were usually selected from landraces in the early period, later from crosses between landraces and introduced varieties, and more recently from crosses between Chinese modern varieties. In this study, genetic analyses including Shannon's information index (I), genetic distance (GD), genetic differentiation coefficient (Fst), and analysis of molecular variance (AMOVA) between the landrace and modern variety sub-groups for different genomes suggested that the A genome (4.73%) was significantly more variable than the B (4.25%) and D (3.05%) genomes, indicating stronger selective pressure on the A genome during Chinese wheat breeding. However, the pattern of selective sweeps across genomes suggested that some important loci or chromosomal intervals rather than whole genomes (or chromosomes) were responsible for the differences (Figure 4). This is consistent with findings in sunflower reported by Chapman et al. [41].

LD Level in Chinese Bread Wheat
A number of LD mapping studies in wheat were performed at the genome or chromosome levels [24][25][26][27], so it is important to examine the extent of LD in Chinese wheat genomes. This determines the genetic distances over which LD will decay back to a random association of alleles and facilitates prediction of marker density needed to effectively associate genotypes with traits [11]. In the present study 512 SSR loci with a mean marker density of 5.1 cM per locus, ranging from 2.2 to 9.4 cM for all 21 chromosomes, were used to measure LD in Chinese wheat genetic resources at both the genome and chromosome levels (Tables 6-9, Figures 5 and 6).
Population structure is one of several important factors that have strong influences on LD, besides recombination, mutation, population size, genetic drift, population mating pattern, admixture, and selection [10]. The presence of population stratification and an unequal distribution of alleles within groups can result in nonfunctional, spurious associations [42]. In our LD estimations, we took into account the effect of population structure by subdividing the genetic resources into two main gene pools, i.e. Chinese landraces and modern varieties, which were validated by the results of genetic structure with the software STRUCTURE v2.2 [28] (Figure 1) and principal coordinate analysis using NTSYS-pc version 2.1 software [43] (Figure 2).
Table 6. SSR locus pairs in significant (P<0.001) linkage disequilibrium (LD) and r² values between Chinese landraces and modern varieties.
LD decay distance with r²>0.05 at P<0.001 was consistent for all linked locus pairs in each gene pool. Mean LD decay distance for the landraces at the whole genome level was <5 cM, whereas higher values applied in the modern varieties. In detail, in the modern variety sub-group the decay distance increased to 5-10 cM for the B, D and whole genomes, and to 15-20 cM for the A genome, possibly due to demographic history acting at the genome level [44]. At the chromosome level, LD decay distance was <5 cM for 19 of the 21 chromosomes in the landrace sub-group, but 5-10 cM for 2B and 10-15 cM for chromosome 5A. In the modern variety sub-group, chromosomes 1B, 1D, 2D, 3A, 3B, 3D, 4A, 4B, 6B, 6D, and 7B had <5 cM values similar to the landraces, but the other 10 chromosomes showed wider LD decay distances, extending to 20-25 cM for 5A and 7D. This indicated that these two chromosomes may carry more QTLs or genes related to important agronomic traits that were strongly selected in breeding [12].
Our results further demonstrated population-dependent and genome-dependent LD characteristics in comparison with genome-wide LD estimates of less than 1 cM in 43 US wheat elite cultivars [24], <5 cM for LD decay distance across the genome among 189 bread wheat accessions from western Canadian wheat breeding programs [25], and less than 1 cM on chromosome 2D and about 5 cM in the centromeric region of 5A in 95 soft winter wheat accessions from the eastern United States [26].
In general, a significance level of P<0.001 was adopted as a comparison threshold. Somers et al. [25] found that bread and durum wheat collections had 47.9% and 14.0% of all locus pairs in significant LD, but within the groups only 0.9% (bread wheat) and 3.2% (durum wheat) of locus pairs were in LD with r²>0.2. Malysheva-Otto et al. [19] also showed that 100% of locus pairs were in significant LD based on r²>0.05 at P<0.001 in a wide set of barley varieties, but this fell to 45% in a European 2-rowed spring barley sub-group. This indicated that the number of loci in significant LD, as well as the extent of LD, was clearly dependent on the population structure and on different genomes. In the present study, 2.41% and 4.46% of locus pairs for the Chinese landrace and modern variety gene pools were in significant LD (P<0.001) on a whole genome level, but only 0.02% and 0.03% were in LD based on r²>0.2 within the two gene pools (Table 6). The extremely low values of LD in the two clusters can be seen as evidence that many of the recombination events in past breeding history have been maintained and fixed in homozygous self-fertilizing bread wheat, as well as a reflection of the higher genetic diversity that is maintained in the mini core collection in Chinese wheat genetic resources. Understanding the patterns of LD across the genome will facilitate prediction of marker densities required for efficient association of genotypes with traits in Chinese wheat genetic resources at both the genome and chromosome levels.

Plant Materials
A total of 250 wheat accessions were used in the present study. These included 93 modern varieties and 157 landraces (Table S1), among which 245 (98%) were from the Chinese wheat mini core collection constructed in our group [6,31]. This collection, representing just 1% of the national collection, captures more than 70% of its genetic diversity.

Microsatellite Analysis
Genomic DNA of all materials was extracted using lyophilized pooled young leaves of ten seedlings following Sharp et al., [45]. A total of 512 pairs of SSR primers with good genome coverage were selected to genotype the collection. The primers comprised 212 GWM [46], 114 BARC [47], 89 WMC [48], 66 CFD, 18 Figure S1). In total, these SSR loci covered 2,631.3 cM with a mean genetic distance of 5.1 cM between adjacent loci (Table S3). More information concerning these wheat microsatellite markers is available in the GrainGenes 2.0 database [http://wheat.usda.gov/GG2/index.shtml]. Fluorescence-labeled primers that were relatively evenly distributed on the 21 wheat chromosomes were synthesized at Applied Biosystems Company. An ABI 3730 Analyzer (Applied Biosystems) was used to capture amplification products by a fluorescence detection system for microsatellite markers. More detailed experimental procedures are given in Hao et al., [31]. Fragment sizes were evaluated using GeneMapper v3.7 software (Applied Biosystems), and the molecular data-points for all SSR markers are listed in Table S4.

Data Analysis
Population structure analysis for the 250 Chinese wheat accessions was performed using the molecular datasets of 512 whole-genome SSR markers with STRUCTURE v2.2 software [28]. We adopted the "admixture model", with a burn-in period of 50,000 iterations followed by 100,000 Markov Chain Monte Carlo (MCMC) replications. For each value of K, 5 independent runs of STRUCTURE were performed, with the number of clusters (K) varying from 1 to 11, giving 55 STRUCTURE outputs. We then estimated the number of subpopulations and the best output on the basis of the Evanno criterion [50].
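The Evanno criterion selects the K with the largest ΔK, the absolute second-order change in the log likelihood of the data between successive K values, scaled by the standard deviation of the log likelihood across replicate runs at that K. The sketch below follows the common convention of taking the second difference of the replicate means; that convention, the function name `evanno_delta_k` and the log-likelihood values are assumptions for illustration, not output from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of ln P(D) values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_by_k[k])  # sample SD across replicates at this K
            if sd > 0:
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                delta_k[k] = second_diff / sd
    return delta_k


# Hypothetical ln P(D) values for K = 1..4 with three replicates each.
lnp_by_k = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4450.0, -4460.0, -4440.0],
    4: [-4420.0, -4400.0, -4440.0],
}
delta_k = evanno_delta_k(lnp_by_k)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

ΔK is undefined for the smallest and largest K tested, so with K running from 1 to 11 as in the study it can be evaluated for K = 2 to 10.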
Genetic dissimilarities between accessions were calculated using the simple matching coefficient in DARwin software [51]. Cluster analysis and dendrogram construction were performed based on dissimilarity matrices with the un-weighted pair-group method using arithmetic averages (UPGMA). Principal coordinate analysis was also used to reveal the relationships among the 250 accessions based on the above dissimilarity matrices, with the help of NTSYS-pc version 2.1 software [43]. Basic statistics of genetic diversity, including total number of alleles and polymorphism information content (PIC) at each SSR locus according to the formula $\mathrm{PIC} = 1 - \sum_i p_i^2$ [52], where $p_i$ is the frequency of the ith allele, were calculated with PowerMarker v3.25 [53]. Genetic differentiation between landraces and modern varieties on a genome basis was assessed with POPGENE software [54] using gene flow (Nm), genetic distance (GD), genetic identity (GI), Shannon's information index (I) and the coefficient of gene differentiation (Fst). The genetic variation within and among populations of wheat accessions for different genomes was evaluated using analysis of molecular variance (AMOVA) implemented in Arlequin v3.11 software [55]. Due to the different sample sizes of the two sub-groups, an allele rarefaction method was used to standardize the allelic richness of samples [56].
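To illustrate how rarefaction removes the effect of unequal sample sizes, the sketch below uses a widely used estimator: the expected number of distinct alleles in a random subsample of g gene copies drawn without replacement. Whether this is exactly the method of [56] is an assumption. The sketch also assumes one gene copy per inbred accession, takes g = 68 from the subsample size stated in the Results, and uses hypothetical allele counts; the function name `rarefied_allelic_richness` is likewise hypothetical.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a subsample of g gene copies.

    allele_counts: observed copy number of each allele at one locus.
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size exceeds the number of sampled copies")
    total = comb(n, g)
    # For each allele: 1 minus the probability that it is absent from the subsample.
    return sum(1.0 - comb(n - count, g) / total for count in allele_counts)


# Hypothetical allele counts at one locus (157 landraces, 93 modern varieties).
landrace_counts = [60, 40, 30, 15, 8, 3, 1]
modern_counts = [50, 30, 10, 3]

landrace_richness = rarefied_allelic_richness(landrace_counts, 68)
modern_richness = rarefied_allelic_richness(modern_counts, 68)
print(landrace_richness, modern_richness)
```

Rare alleles contribute less than one expected allele each after rarefaction, so the larger sub-group is no longer favoured simply because more accessions were sampled.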
Linkage disequilibrium (LD) between markers, including the pairwise estimated squared allele-frequency correlations (r²) and the significance of each pair of loci [57], was calculated with the dedicated procedure of the TASSEL software [58]. In the process of LD estimation, SSR datasets were filtered for rare alleles with frequencies of less than 5% in the whole collection and computed using 100,000 permutations.
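A minimal sketch of the two quantities described here, r² between a pair of multi-allelic loci and its permutation-based significance, is given below. It is an illustration and not TASSEL's implementation: it assumes one haplotype per homozygous accession, combines the r² of every allele pair into a frequency-weighted average (which reduces to the usual r² for two biallelic loci), and uses far fewer permutations than the 100,000 stated above so that it runs quickly. The function names and the example haplotypes are hypothetical.

```python
import random
from collections import Counter


def weighted_r2(haplotypes):
    """Frequency-weighted average r2 over all allele pairs of two loci."""
    n = len(haplotypes)
    count_a = Counter(a for a, _ in haplotypes)
    count_b = Counter(b for _, b in haplotypes)
    count_ab = Counter(haplotypes)
    if len(count_a) < 2 or len(count_b) < 2:
        return 0.0  # r2 is not informative for a monomorphic locus
    total = 0.0
    for a, ca in count_a.items():
        p = ca / n
        for b, cb in count_b.items():
            q = cb / n
            d = count_ab[(a, b)] / n - p * q  # disequilibrium coefficient D
            total += p * q * (d * d) / (p * (1 - p) * q * (1 - q))
    return total


def permutation_p_value(haplotypes, n_permutations=1000, seed=0):
    """Permute alleles at the second locus to build a null distribution of r2."""
    rng = random.Random(seed)
    observed = weighted_r2(haplotypes)
    first = [a for a, _ in haplotypes]
    second = [b for _, b in haplotypes]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(second)
        if weighted_r2(list(zip(first, second))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical haplotypes: complete association between the two loci.
linked = [("A", "X")] * 10 + [("B", "Y")] * 10
r2_linked, p_linked = permutation_p_value(linked, n_permutations=200)
print(r2_linked, p_linked)
```

With n permutations the smallest attainable P value is 1/(n + 1), so a threshold of P<0.001 requires at least 1,000 permutations.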