Identifying Loci Influencing 1,000-Kernel Weight in Wheat by Microsatellite Screening for Evidence of Selection during Breeding

  • Lanfen Wang,

    Affiliation Key Laboratory of Crop Germplasm Resources and Utilization, Ministry of Agriculture, The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China

  • Hongmei Ge,

    Affiliation Key Laboratory of Crop Germplasm Resources and Utilization, Ministry of Agriculture, The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China

  • Chenyang Hao,

    Affiliation Key Laboratory of Crop Germplasm Resources and Utilization, Ministry of Agriculture, The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China

  • Yushen Dong,

    Affiliation Key Laboratory of Crop Germplasm Resources and Utilization, Ministry of Agriculture, The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China

  • Xueyong Zhang

    Affiliation Key Laboratory of Crop Germplasm Resources and Utilization, Ministry of Agriculture, The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China


The Chinese wheat mini core collection (262 accessions) was genotyped at 531 microsatellite loci representing a mean marker density of 5.1 cM. One-thousand-kernel weights (TKW) of the lines were measured in five trials (three environments in four growing seasons). Structure analysis based on 42 unlinked SSR loci indicated that the materials formed two sub-populations, viz., landraces and modern varieties. A large difference in TKW (7.08 g, P<0.001) was found between the two sub-groups. TKW is therefore a major yield component that was improved over the past six decades; it increased from a mean of 31.5 g in the 1940s to 44.64 g in the 2000s, representing a 2.19 g increase in each decade. Analyses based on a mixed linear model (MLM), population structure (Q) and relative kinship (K) revealed 22 SSR loci that were significantly associated with mean TKW (MTKW) of the five trials estimated by the best linear unbiased predictor (BLUP) method. They were mainly distributed on chromosomes of homoeologous groups 1, 2, 3, 5 and 7. Six loci, cfa2234-3A, gwm156-3B, barc56-5A, gwm234-5B, wmc17-7A and cfa2257-7A, individually explained more than 11.84% of the total phenotypic variation. Favored alleles for breeding at the 22 loci were inferred according to their estimated effects on MTKW, based on the mean difference between varieties grouped by genotype. Statistical simulation showed that these favored alleles have additive genetic effects. Frequency changes of alleles at loci associated with TKW are much more dramatic than those at neutral loci between the sub-groups. The numbers of favored alleles in modern varieties indicate there is still considerable genetic potential for their use as markers for genome selection of TKW in wheat breeding. Alleles that can be used globally to increase TKW were inferred according to their distribution by latitude and their frequency changes between landraces and modern varieties.

Introduction

China is the largest wheat producer and consumer in the world, with 23.6 million ha, a mean yield of 4,762 kg/ha, and a total production of 112 million tonnes in 2008. There is a long history of wheat cultivation in China, extending over more than 2,000 years. Production extends from latitude 22°49′ to 48°03′, and much progress has been achieved in breeding and production in the last 60 years. Average wheat yields increased annually by 1.9% and production increased more than six-fold [1]. Thousand-kernel weight (TKW), one of the three major components of yield in wheat, has steadily increased over the period. Based on phenotyping of 1,800 cultivars released since the 1940s, TKW increased from a mean of 31.5 g in the 1940s to 44.64 g in the 2000s, a 2.19 g increase in each decade (Zhang et al. unpublished). Previous studies also showed that TKW was one of the three yield components with the highest heritability, which varied from 59% to 80% [2]. Most genes affecting TKW have additive effects. Selection for TKW in the early generations of breeding is highly effective [2].
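The per-decade figure can be checked with simple arithmetic. The sketch below assumes that the reported 2.19 g is the difference between the two decade means divided by the six decade-to-decade steps from the 1940s to the 2000s; the function name is illustrative and not taken from the study.

```python
def gain_per_decade(start_mean, end_mean, n_decade_steps):
    """Average change per decade between two decade means (same units as input)."""
    return (end_mean - start_mean) / n_decade_steps


# Decade means reported in the text (grams per 1,000 kernels).
tkw_1940s = 31.5
tkw_2000s = 44.64

# Assumption: 1940s -> 2000s is counted as six decade-to-decade steps.
gain = gain_per_decade(tkw_1940s, tkw_2000s, 6)
print(round(gain, 2))
```

Under that assumption the result rounds to the 2.19 g per decade quoted above.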

Crop domestication is an artificial evolutionary process of combining traits to meet human needs. During the domestication of cereals, for example, traits such as reduced plant height to avoid lodging, large spikes, increased grain size, and disease resistance were selected and conserved. Modern breeding involved further directional selection, which resulted in lower genetic diversity within the domesticated population than in the entire species. At the genome level, only a small number of genes (alleles) were positively selected and conserved [3]. Many other alleles at specific loci were gradually eliminated, leading to reduced genetic diversity at these loci compared with the diversity present in the entire species. Diversity in genomic regions flanking the target genes was simultaneously reduced because of linkage. This phenomenon is referred to as linkage drag, hitchhiking, or selective sweep [4]. Hitchhiking generally leads to reduced diversity at target loci, linkage disequilibrium at loci surrounding the selected gene, and changed distribution patterns of alleles within the selected region [5], [6]. These effects also provide the basis for association of neutral markers, such as SSR and DArT, with agronomic traits [6]–[10].

We established a Chinese common wheat core collection (CC) and a mini core collection (MCC) after genotyping 5,029 candidate accessions at 78 SSR loci [11]. The choice of candidate entries was based on documentary data in the national gene bank [12]. The MCC contains 231 accessions, or 1% of the basic collection (23,135 accessions), with an estimated 70% representation of the genetic variation in that collection [11], [13]. The higher genetic diversity and artificial diminishment of dominant allelic frequencies in the MCC make it a suitable population for detection of major QTLs controlling yield traits. It was shown to be a good reference set for revealing the geographic distribution of, and changes over time in, important functional genes [10], [14]–[18]. In this study, we target loci associated with TKW to show the value of the MCC in dissecting complex yield traits in wheat. This association analysis provides useful information for marker-assisted selection in breeding wheat for increased yield.

Results

Phenotypic Assessment

TKWs of the Chinese mini core wheat collection were measured in 4 growing seasons and 3 environments, including Luoyang, Henan province 2002, 2005, and 2006; Shunyi, Beijing 2010; and Qingdao, Shandong 2010 (Table 1). Minor differences in mean TKW occurred among different planting environments, and there were major differences between landraces and modern varieties. The MTKW of modern varieties (39.23 g) calculated using BLUP methods based on multiple environments was significantly higher (P<0.001) than that of landraces (32.15 g), confirming that TKW was a yield trait improved by breeding. The maximum TKW was not in a modern variety, but was in a landrace. This indicates that further genes for this trait are present in landraces and can be accessed for breeding. The total agronomic data were considered in whole genome association analysis.

Table 1. Comparison of 1,000-kernel weights between landraces and modern varieties in the Chinese wheat mini core collection in the 5 environments.

Population Structure Analysis

Population structure analysis can identify locus associations that are statistically significant, but biologically invalid due to strong correlation with population structure. However, if the population structure is properly dealt with, the likelihood of spurious associations can be minimised [7], [19]. Forty-two loci distributed across every arm of the 21 wheat chromosomes were chosen to examine the population structure of entries in the mini core collection. We selected K values of assumed groups from 1 to 10. After 80 cycles of simulation, we found that K = 2 was the best separator, providing the highest Δk value and showing that the MCC entries comprised two sub-populations. One group was mainly the landraces, and the other included modern varieties and introduced lines (Fig. 1). Overlap occurs between the two groups because in the early breeding period (1940s–1960s), most of the released varieties were derived from crosses between Chinese landraces and introduced European or American varieties [20]. This was consistent with results based on 512 SSR loci using a similar set of materials [13].

Figure 1. Population structure analysis of 262 wheat cultivars based on 42 unlinked SSR loci.

a: Population structure as determined by Structure v2.2 analysis. Since Δk peaks at k = 2, the varietal set was split into two sub-groups. b: Structure analysis reveals that the 262 wheat cultivars are clustered into two sub-populations. I. Landraces. II. Modern varieties and introduced lines.

SSR Loci Associated with TKW

We first used the MLM model [21] for marker/MTKW (TKW) association analysis. Thirty-two loci were significantly (P<0.05) associated with MTKW. An association with cfa2257 on 7AL was detected in all five trials; 8 loci were detected in four trials, i.e. wmc304-1A, wmc147-1D, gwm312-2A, gwm547-3B, gwm234-5B, gwm174-5D, gwm55-6D, and wmc17-7A; 6 loci were detected in three environments, viz., gwm268-1B, cfa2234-3A, gwm156-3B, cfd266-5D, gwm356-6A and gwm471-7A; 9 loci were detected in two environments, and 8 loci were detected in one trial (Fig. 2, Table S1).

Figure 2. Genome wide association analysis of 1,000-kernel weight with SSR loci.

TKWs collected from 5 trials were used to estimate mean values (MTKW). TKW-L02, TKW-L05, TKW-L06, TKW-S10, and TKW-Q10 indicate 1,000-kernel weights in 2002, 2005 and 2006 in Luoyang (Henan province), 2010 in Shunyi (Beijing) and Qingdao (Shandong), respectively.

Among the 24 loci associated with MTKW in at least two trials, we found breeder-favored alleles with strong positive effects on MTKW at 22 loci; these were mapped to 11 chromosomes, viz. 1A, 1B, 1D, 2A, 3A, 3B, 5A, 5B, 5D, 6D, and 7A. The 7A effect spanned four loci, gwm471, wmc168, wmc17 and cfa2257. The genetic distance between wmc17 and cfa2257 is 2.72 cM. No strong linkage disequilibrium (LD) was found between the two loci (r2 = 0.10, P>0.05), indicating they may not relate to a single yield gene, a result also suggested by previous QTL studies of TKW [22]–[26]. Three loci on each of chromosomes 1B and 2A were associated with MTKW. The allelic effect at each locus on MTKW was estimated by ANOVA (SPSS 16.0). Significant or extremely significant differences in MTKW were detected between varieties with the favored allele and those with other alleles. Six loci with the strongest effects, individually explaining more than 10% of the total variation (R2>10%), were detected on chromosomes 3A, 3B, 5A, 5B and 7A (Table 2).
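The text does not state how r2 between wmc17 and cfa2257 was computed. As a rough illustration only, the sketch below uses the common two-locus definition r2 = D^2 / (pA(1−pA)pB(1−pB)), assuming each inbred line contributes one haplotype and that each multi-allelic SSR is recoded as 1 (favored allele present) or 0 (any other allele). The function name and the toy data are hypothetical.

```python
def ld_r2(pairs):
    """r^2 between two loci from (a, b) indicator pairs, one pair per inbred line.

    a and b are 1 if the line carries the favored allele at locus A / locus B,
    otherwise 0 (an assumed recoding of the multi-allelic SSR calls).
    """
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical lines: alleles combine at random, so r^2 is zero.
independent = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0)]
# Hypothetical lines: the two favored alleles always travel together.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]

print(ld_r2(independent), ld_r2(linked))
```

A value near 0.1, as reported for the two 7AL loci, lies much closer to the first case than to the second.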

Table 2. Favored alleles, their frequencies, genetic effects and R2 at 22 SSR loci significantly (P<0.05) associated with MTKW.

The Distribution of Favored Alleles at Associated Loci

We estimated the frequencies of favored alleles at each of the 22 loci in the landrace and modern entries groups in the mini core collection. Except at gwm403-1B, favored allele frequencies were much higher in modern varieties than in the landraces (Figure 3, Table S2). This reflects positive selection of those alleles in breeding programs.

Figure 3. Comparative frequencies of favorable alleles at 22 loci for landraces and modern varieties in the Chinese wheat mini core collection.

Modern varieties usually have fewer allelic variations than landraces [13]. However, the major allele frequency is not always higher in modern varieties than in landraces (Table S3). At the 42 loci without obvious signs of selection, the average major allelic frequency increased from 28.46% in landraces to 32.87% in modern varieties; the t-test indicated that the change was not significant (t = 1.661, p = 0.1), with equal variances (F = 1.23 < F0.05 = 1.69). However, at the 22 loci significantly associated with MTKW, the average frequency of favored alleles increased from 11.58% to 30.52%, an extremely significant difference (Table S2; t = 4.591, p = 7.95E-05), with unequal variances (F = 5.13 > F0.05 = 2.07).
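The comparison above pairs a variance-ratio F statistic with a two-sample t statistic. The following is a minimal sketch of that logic using only the Python standard library, assuming an unpaired test that pools variances when F falls below the critical value and otherwise uses the unpooled (Welch) standard error. The frequency lists are hypothetical, and the critical value must be supplied by the user, as in the text.

```python
from math import sqrt
from statistics import mean, variance


def f_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def t_statistic(a, b, equal_var):
    """Two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    if equal_var:
        # Student's t: pool the two variances.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
    else:
        # Welch's t: keep the variances separate.
        se = sqrt(va / na + vb / nb)
    return (mean(b) - mean(a)) / se


# Hypothetical allele frequencies at three loci in each group.
landraces = [0.1, 0.2, 0.3]
modern = [0.4, 0.5, 0.6]

f = f_ratio(landraces, modern)
f_critical = 1.69  # value quoted in the text for the 42-locus comparison
t = t_statistic(landraces, modern, equal_var=f < f_critical)
print(round(f, 2), round(t, 3))
```

Converting t to a p-value needs the t distribution, which is left to a statistics package.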

Among the four loci with favored allelic frequencies higher than 50% in modern varieties, cfa2234-3A, barc56-5A and wmc17-7A were among the six loci with the highest effects on phenotype variation of TKW (Table 2). In addition, dramatic increases were also detected at wmc17 and cfa2257 on 7A (Table S2); these were also among the six loci (Table 2). The increased numbers and frequencies of favored alleles were accompanied by increased mean MTKW in modern varieties (Table 3). Therefore, we believe that the increase in favored allele frequencies at the 22 loci was mainly caused by selection for grain size over the five decades before 2000 (Table S2).

Table 3. Number, frequency and mean MTKW of landraces and modern varieties in the mini core collection.

Accumulation of Favored Alleles from Breeding

Positive selection of favored alleles at key loci was also clearly implicated by changes in their number and frequency (Table 3). The best modern variety (44.01 g) had 15 favored alleles at the 22 critical marker loci, whereas the best landrace (38.84 g) had 10. Almost 92% of the landraces had 0–5 favored alleles, whereas 85.2% of modern varieties had more than 5 favored alleles, ranging from 5–15. Modern breeding has significantly promoted the accumulation of favored alleles in varieties (Fig. 4). These results illustrate the reliability of the identification of favored alleles. Importantly, no modern cultivar has favored alleles at all 22 marker loci (Table 3, Fig. 4), indicating further capacity for improvement of TKW by marker-assisted selection.
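Counting favored alleles per variety is a simple tally across loci. The sketch below is illustrative only: it uses the two 7AL loci whose favored allele sizes are given in the text (182 bp and 184 bp at wmc17, 129 bp at cfa2257), assumes each inbred line carries one allele per locus, and uses made-up genotypes.

```python
# Favored allele sizes (bp) stated in the text for the two 7AL loci.
FAVORED = {"wmc17": {182, 184}, "cfa2257": {129}}


def count_favored(genotype, favored=FAVORED):
    """Number of loci at which a line carries a favored allele.

    genotype maps locus name -> allele size in bp; missing loci count as 0.
    """
    return sum(
        1 for locus, alleles in favored.items() if genotype.get(locus) in alleles
    )


# Hypothetical lines, for illustration only.
lines = {
    "line_A": {"wmc17": 182, "cfa2257": 129},
    "line_B": {"wmc17": 190, "cfa2257": 129},
    "line_C": {"wmc17": 190},
}
for name, genotype in lines.items():
    print(name, count_favored(genotype))
```

Extending the `FAVORED` table to all 22 loci would give the per-variety counts summarized in Table 3.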

Figure 4. Accumulation of favorable alleles in landraces and modern varieties from different regions of China.

Modern breeding promoted the accumulation of favored alleles.

Geographic Distribution of Favored Alleles at the Six Loci with the Highest Contributions to TKW

Closely located loci cfa2257 and wmc17 on chromosome 7AL with the highest contributions to TKW were chosen to analyze their distributions in different production regions in China (Figure 5). The favored alleles (182 bp and 184 bp) of wmc17 occurred in both landraces and modern varieties, but their frequencies were significantly higher in modern varieties than in landraces. Among landraces the highest frequency of the favored allele with high TKW was in region VI with region VII in second place. Both of the regions grow spring wheats with high TKW. For modern varieties, regions IV and VI had the highest frequency, with VII in third place. Other regions showed large variations in the frequencies of favored alleles. Regarding cfa2257, the highest frequency of the favored 129 bp allele was in region V with region VI in second place, a little lower than its frequency in landraces in region V. This allele was not present in landraces from 5 wheat regions (I, II, VII, VIII, and IX), a situation clearly different from the modern variety group where all modern lines, for example in region IX, carried the favored allele. This allele was also common in varieties from regions VI and VIII and occurred in the other regions. The geographic distributions of favored alleles at four other loci are included in Figure S1.

Figure 5. Favored alleles and their frequencies at the cfa2257 and wmc17 loci on chromosome 7AL in the Chinese wheat mini core collection in ten ecological regions in China.

A and B indicate wmc17 frequencies in landraces and modern varieties, respectively; C and D indicate cfa2257 frequencies in landraces and modern varieties, respectively. Zone I: Northern winter wheat region. Zone II: Yellow and Huai River valleys, winter wheat region. Zone III: Middle and Low Yangtze River valleys, winter wheat region. Zone IV: Southwestern winter wheat region. Zone V: Southern winter wheat region. Zone VI: Northeastern spring wheat region. Zone VII: Northern spring wheat region. Zone VIII: Northwestern spring wheat region. Zone IX: Qinghai-Tibetan Plateau, spring-winter wheat region. Zone X: Xinjiang winter-spring wheat region. Source: Zhuang QS [20].

Genetically Additive Effects of Favored Alleles on TKW

To determine whether additive effects occur among the favored alleles at the 22 loci, we estimated the mean TKW of varieties with different numbers of favored alleles. There was a high linear correlation (Y = 1.294X+29.33, R2 = 0.95) between MTKW and the number of favored alleles (Figure 6), clearly indicating additive effects of favored alleles. However, an obvious negative interaction among loci appeared once the number of favored alleles reached 10, resulting in larger differences between real and expected TKW, and this cannot be ignored (Fig. 6). A confounding factor was that some subgroups included only one or two varieties (Table 3).
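The fitted line gives an expected MTKW for any number of favored alleles, which can be compared with an observed value. This sketch uses only the slope and intercept reported above, plus the best modern variety mentioned earlier in the text (15 favored alleles, 44.01 g); the function name is illustrative.

```python
SLOPE = 1.294       # g per additional favored allele (from the fitted line)
INTERCEPT = 29.33   # g, expected MTKW with no favored alleles


def expected_mtkw(n_favored):
    """Expected MTKW (g) under the purely additive linear fit."""
    return SLOPE * n_favored + INTERCEPT


expected = expected_mtkw(15)
observed = 44.01  # best modern variety reported in the text
print(round(expected, 2), round(expected - observed, 2))
```

For 15 favored alleles the line predicts about 48.74 g, roughly 4.7 g above the observed 44.01 g, which is in line with the departure from additivity noted for varieties carrying more than 10 favored alleles.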

Discussion

SSR Loci Associated with TKW may Represent Major QTLs affecting Yield

According to Nordborg and Weigel [27], association mapping represents next-generation plant genetics. It uses ancestral gene associations and natural genetic diversity within a population to dissect quantitative traits, and is built upon the presence of linkage disequilibria. It offers a potentially powerful approach for mapping causal genes with modest effects [28], [29]. The association results and allelic effects are influenced by population type and size, and the breeding system of the species. Core collections are very suitable for association analysis of highly heritable and domestication traits [8]. In the Chinese wheat mini core collection, the mean LD decay distance for landraces at the whole genome level was <5 cM compared to 5–10 cM in modern varieties. Only 0.05% of marker pairs in significant (P<0.001) LD reached threshold levels of r2 = 0.2 [13]. The observed LD is much lower than for CIMMYT historical breeding materials, but is similar to a population of European varieties released since the beginning of the last century [9], [30]. The overall population structure is very weak, but the two sub-populations, landraces and modern varieties, were clearly distinguished [11], [13]. This separation makes the MCC population suitable for marker/trait association analysis. Earlier analyses revealed differences in regard to latitude distribution and changes over time in important genetic haplotypes, such as those of Pina and Pinb [14], Ppd-1 [15], GS2 (glutamine synthetase) [17], TaGW2 [18], TaSus2 [10], [16]. However, compared with the candidate core entries, the frequencies of predominant alleles declined to enable the maximum representation of allelic variation at each locus [6], [11]. This likely reduced the association power, allowing the major QTLs to be targeted [8], [10], [29]. This was supported by the data in Table 2, i.e. most of the associated loci were detected within QTL intervals controlling TKW. 

Comparative analysis of modern varieties and landraces reveals major loci that have been almost fixed in modern varieties because of positive selection in breeding. For example, in wheat, two haplotypes of an invertase gene on chromosome 5D were detected among 384 European wheat varieties released since the 1880s, with 382 carrying the same haplotype and only two carrying the other. The latter would obviously have a very low chance of being detected in general association mapping populations. However, in our MCC, 58 accessions carried the above minority haplotype (Jiang YM and Zhang XY unpublished data).

Integration of Association Mapping and QTL Mapping Generates More Reliable Results

Artificial selection (domestication and breeding) leaves strong foot-prints in plant genomes [4], [6], [10], [31]. Understanding the relationship between DNA sequence variation and variation in phenotypes for quantitative or complex traits will increase the speed of selection in breeding programs for predicting adaptive evolution [32]. Both linkage and association mapping aim to identify markers sufficiently closely linked to functional sequence variations (causal genes) encoding changes in phenotype, allowing breeders to select and manipulate these alleles routinely in diverse breeding populations [29].

Localization and interpretation of QTLs and associated loci provide confidence in results from association analysis [6], [27], [32]. In soybean, a high correlation (R2 = 0.83) between the distribution of SSR markers and genes suggested close association of SSRs with genes [33]. This leads us to believe that SSR markers are suitable for association analyses. Most of the associated markers were found in genomic regions where genes or quantitative trait loci (QTL) influencing the same traits were found previously. This provides an independent validation of the approach. Additionally, new chromosome regions for TKW were identified in the wheat genome through association analysis. Overall, 22 SSR loci on 11 chromosomes were associated with TKW with high confidence. This is much greater than the number of QTLs mapped in any bi-parental population, indicating the dissection power of this methodology in natural populations (Table 2) [34]–[36]. After genotyping 254 loci in 194 F7 recombinant inbred lines, Groos et al. [37] detected nine chromosome regions controlling TKW (chromosomes 1D, 2B, 2D, 3A, 5B, 6A, 6D, 7A, 7D). These are largely consistent with our association results (Table 2); three of these QTLs, on chromosomes 2B (Xgwm148 - Xgwm374 - Xgwm388), 5B (Xgwm639 - Xgwm271 - Xgwm604) and 7A (Xcfa2049 - Xbcd1930), were detected in six environments. The QTL on 7A mapped to the middle to terminal region of 7AL, and partially overlapped the region wmc17 - cfa2257 detected in the present study. QTL controlling TKW were also detected at a homoeologous region of 7DL [37]. Furthermore, the association mapping result for this region is much more precise than with QTL mapping, the genetic distance between the two nearest markers being only 2.72 cM (Table 2). This raises the question of whether a single causal gene is involved. The r2 value between the two markers is about 0.1 in the MCC. Thus there may be two linked causal genes, a possibility that is consistent with the obvious difference in the geographic distribution of favored alleles at the two loci (Fig. 5). Similarly, gwm312 and gwm372 on chromosome 2A, which were in weak LD (r2 = 0.23) in the MCC population, also appear to reflect the effects of two causal genes. These examples illustrate how haplotype and LD analyses enable dissection of yield QTLs in practice [10].

In another comprehensive QTL mapping report based on 12 data sets obtained over three years of trials with 2–5 environments/year, Snape et al. [38] detected seven relatively stable QTLs controlling TKW in 11 DH populations. These QTLs were distributed on chromosomes 2A (gwm445), 2B (gwm148), 2D (wmc41), 3A (gwm428 - psp3001), 5A (gwm293), and 6A (wmc32, gwm518). The gwm445-associated QTL was not detected in the MCC population, but was detected in the core collection (1,160 entries) with a 2.89 g increase in TKW (Zhang and You unpublished); gwm445 is very close to an almost orthologous region of chromosome 2D marked by wmc41. Both gwm148 on 2B and gwm275 on 2A mapped to orthologous regions detected in our study (Table 2). Loci gwm55-6D and gwm415-6B associated with TKW may be homoeologous to a QTL on 6A flanked by wmc32 and gwm518 in the pericentromeric region, in which TaGW2 is located [18]. In addition, distinct changes in frequencies of SSR alleles between the landraces and modern varieties at the 22 loci caused by hitchhiking effects provided positive evidence of selection for specifically favored alleles (Fig. 4; Table S2) [4], [6].

Linear Correlation between TKW and Favored Alleles Showing the Practical Value of Genome Selection in Breeding

Compared with QTL mapping, another attribute of association analysis is the validation of favored alleles in germplasm collections [8]. For example, Röder et al. [39] mapped a major TKW QTL to the interval Xgwm295 - Xgwm1002 located in the distal telomeric bin (7DS4-0.61-1.00) in the physical map of wheat chromosome 7DS. Zhang et al. [6] found that the 132-bp allele of Xgwm130 (Xgwm130_132) underwent very strong positive selection during modern breeding. Xgwm130 maps between Xgwm295 and Xgwm1002, with a genetic distance of 1.1 cM from Xgwm295. Thus the identification of favored alleles will help in choosing parents for crossing programs, to ensure maximum levels of favored alleles across sets of loci targeted for selection, and to promote fixation at these loci [40].

Whereas linear correlations between TKW and favored alleles indicate the additive effects of QTLs or genes, the possibility of other genetic effects should not be ignored in practice. Higher standard errors when the number of favored alleles exceeds 10 (Figure 6) reveal the possibility of threshold effects with excessive numbers of favored alleles. Another cause of the higher standard errors was that far fewer varieties carried more than 10 favored alleles (Fig. 4).

The concept of genome-wide selection (GWS) was recently introduced in plant breeding; this method uses information from all markers, as opposed to significant markers, to evaluate the breeding value of each line [41], [42]. Frisch et al. [43] used transcription data from 46,000 oligonucleotide arrays to develop a prediction model for the value of parental maize lines in relation to the grain yield performance of their hybrid progeny. They found that predictions based on 50 well chosen genes were as accurate as predictions based on 5,000 random genes. Therefore, the combination of GWA and GWS will in future enhance the practical application of GWS in crop improvement [29]. This work paves the way for further targeted diversity mining in landrace populations and wild relatives via comparative genomics analysis. The most interesting example is that genes on a Thinopyrum ponticum group 7L chromosome enhance grain yield by 13% in the genetic background of newly released varieties [44], [45]. The 7L gene may be orthologous to the TKW chromatin block flanked by wmc17 and cfa2257 on 7AL (Table 2) [46]. These examples indicate that increased grain weight in wheat is feasible using genomic selection.

Frequency and Geographical Distribution of Favored Alleles Indicate Potential for Yield Increases by Selection of Loci Associated with TKW

In wheat, some genes or SSR loci associated with yield vary across latitudes, such as TaSus2 on chromosome 2B [16], TaGW2 on 6A [18] and gpw7596 on 7B (EST-SSR) [47]. Favored alleles usually occur at relatively lower latitudes. This might indicate that the functional genes at these loci, including mapped alleles and those linked with markers, might be responsive to sunlight and temperature during the growing season [48], [49]. None of the 6 SSR loci with determination coefficients higher than 10% associated with favored MTKW alleles, cfa2234_142 (3AL), gwm156_311 (3BS), barc56_119 (5AS), wmc17_182,184 (7AL) and cfa2257_129 (7AL), had obvious correlations with latitude (Fig. 5, Fig. S1). They can therefore be used globally for increasing TKW. None of the 88 genotyped modern varieties and 17 introduced lines carried favored alleles at all 22 loci, and only one variety had 15 favored alleles (Table 3, Fig. 4). Therefore, there are still opportunities for marker-assisted selection for TKW in wheat breeding.

Materials and Methods

Phenotypic Assessment

A Chinese wheat mini core collection [6], [11], [13] was chosen for genome-wide association of 1,000-kernel weight (TKW) using SSR markers. The MCC contained 262 wheat lines including 157 landraces, 88 modern varieties, and 17 introduced lines, representing 1% of the national collection but more than 70% of its genetic diversity [11]. The phenotype data were collected in five environments, viz. 2002, 2005 and 2006 in Luoyang, Henan province, and 2010 in both Shunyi, Beijing, and Qingdao, Shandong. The field planting design and methods of TKW measurement were described in Su et al. [18] and Jiang et al. [16]. Mean values of TKW and standard errors were analyzed with SPSS 16.0. The mixed mean TKW (MTKW) was estimated by the best linear unbiased predictor (BLUP) method according to Bernardo [50]–[52].

SSR Genotyping

Genomic DNA was extracted from young leaves of 10 seedlings of each entry according to Sharp et al. [53] and fingerprinted by PCR amplifications that identified alleles at 531 SSR loci. Genetic map positions for most of the markers (512 loci) can be found in Hao et al. [13]. The loci were distributed evenly across all 21 wheat chromosomes. The primer sequences and genetic locations of the loci were obtained from [54], [55]. The annealing temperature for each primer pair was obtained from Röder et al. [54] and GrainGenes. After purification, the amplified PCR products were separated on an ABI3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Fragment sizes were determined using an internal size standard (LIZ500, ABI, USA), and the outputs were analyzed using GeneMapper software. The minor allele frequency (MAF) threshold was set at 0.05 for the subsequent statistical analyses.
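As an illustration of an allele-frequency threshold, the sketch below computes allele frequencies at one multi-allelic SSR locus and keeps alleles at or above 0.05. It assumes one fragment-size call per inbred line and assumes that rare alleles are simply excluded from downstream tests; the text does not say how alleles below the threshold were actually handled. The calls are made up.

```python
from collections import Counter


def allele_frequencies(calls):
    """Frequency of each allele size among non-missing calls at one locus."""
    observed = [c for c in calls if c is not None]
    total = len(observed)
    return {allele: n / total for allele, n in Counter(observed).items()}


def common_alleles(calls, min_freq=0.05):
    """Alleles whose frequency is at least min_freq."""
    return {a for a, f in allele_frequencies(calls).items() if f >= min_freq}


# Hypothetical fragment sizes (bp) for 40 lines at one locus.
calls = [182] * 20 + [184] * 19 + [190]
print(sorted(common_alleles(calls)))
```

Here the single 190-bp call has a frequency of 0.025 and is dropped, while the 182-bp and 184-bp alleles are retained.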

Association Analysis

To reduce the risk of false or spurious associations, population structure was estimated with STRUCTURE v2.2 software according to Pritchard and Rosenberg [56] and Pritchard et al. [57], based on 42 unlinked loci from both arms of each chromosome, with a burn-in period of 50,000 iterations followed by 500,000 Markov Chain Monte Carlo (MCMC) replications. A total of 80 independent runs were performed with the number of presumptive groups (K) varying from 1 to 10. To select the most appropriate number of sub-groups, the ΔK value, based on the average Ln probability of each run, was calculated, allowing the internal population structure of the sample set to be determined [58]; Q data were then obtained according to the corresponding K value.
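A Δk-style statistic can be sketched from the per-run log probabilities. The version below assumes the widely used form: the absolute second-order difference of the mean log probability across K, divided by the standard deviation of the log probability at K over runs. The log-probability values are invented for illustration and are not from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta-K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K -> list of log probabilities, one per independent run.
    """
    scores = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # the end points have no second-order difference
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        spread = stdev(lnp_by_k[k])
        if spread > 0:
            scores[k] = second_diff / spread
    return scores


# Invented log probabilities for three runs at each K.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -880.0, -890.0],
}
scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this toy input the largest value falls at K = 2, mirroring the kind of peak described for the mini core collection.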

To define the degree of genetic covariance between pairs of individuals, a kinship (K) analysis was conducted on the genotypic data using SPAGeDi software [59]. Pairwise kinship coefficients were calculated according to Loiselle et al. [60] with 10,000 permutation tests. Negative values between pairs of individuals were then set to 0, as these indicated that the individuals were less related than random individuals [21].
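The final step, setting negative kinship estimates to zero, can be written in a few lines. The matrix below is hypothetical, and the function name is illustrative rather than part of SPAGeDi or TASSEL.

```python
def clamp_negative_kinship(matrix):
    """Return a copy of a kinship matrix with negative coefficients set to 0."""
    return [[max(0.0, value) for value in row] for row in matrix]


# Hypothetical pairwise kinship coefficients for three lines.
kinship = [
    [1.00, 0.12, -0.03],
    [0.12, 1.00, 0.25],
    [-0.03, 0.25, 1.00],
]
print(clamp_negative_kinship(kinship))
```

The input matrix is left unchanged, so the raw estimates remain available for inspection.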

The mixed linear model (MLM) module with Q+K of the TASSEL 2.1 software package [61], [62] was used for genome-wide association analysis of MTKW and TKW in each trial. The relative value of the favored allele for TKW (R2) was calculated according to the equation $R^2 = (SS_A - f_A \times MSE)/SS_T$, where $SS_A$ is the sum of squares between groups of favorable alleles and others, $f_A$ is the degrees of freedom of the group with the favored alleles, $MSE$ is the error mean square, and $SS_T$ is the total sum of squares [62], [63].
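The R2 expression can be illustrated with a two-group one-way ANOVA decomposition. This sketch assumes that the degrees-of-freedom term equals the between-group degrees of freedom (number of groups minus one, so 1 here), which is one reading of the definition above, and that the total sum of squares is taken about the grand mean. The MTKW values are invented.

```python
def favored_allele_r2(favored_values, other_values, f_a=1):
    """(SS_between - f_a * MSE) / SS_total for favored vs. other lines.

    f_a defaults to 1, the between-group degrees of freedom for two groups
    (an assumption about how the formula's degrees-of-freedom term is defined).
    """
    groups = [list(favored_values), list(other_values)]
    all_values = groups[0] + groups[1]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    mse = (ss_total - ss_between) / (n - len(groups))  # error mean square
    return (ss_between - f_a * mse) / ss_total


# Invented MTKW values (g) for lines with and without a favored allele.
with_favored = [40.0, 42.0, 44.0]
without_favored = [30.0, 32.0, 34.0]
print(round(favored_allele_r2(with_favored, without_favored), 4))
```

For these toy values the between-group sum of squares is 150, the error mean square is 4 and the total is 166, giving (150 − 4)/166.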

Because modern varieties generally have fewer alleles than landraces, the frequencies of most alleles would be expected to increase in modern varieties [13]. To avoid circular reasoning in data interpretation, we randomly selected one locus on each arm of the 21 chromosomes, with PIC values higher than the global mean (0.65), to compare changes in major allele frequencies between the two sub-populations at loci significantly associated with MTKW and at loci probably not under selection during domestication and breeding [64] (Table S2, Table S3). We used F-tests and t-tests in SPSS 15.0 to estimate differences in allelic frequencies between the landrace and modern variety groups.

Supporting Information

Figure S1.

Favored allele frequencies (in blue) in landraces (left) and modern varieties (right) at the barc56, cfa2234, gwm156 and gwm234 loci.


Table S1.

SSR loci associated with MTKW and TKW in 5 environments identified with TASSEL 2.1 (P<0.05).


Table S2.

Frequency change of favored alleles at the 22 loci associated with MTKW in landraces and modern varieties.


Table S3.

Frequency change of major alleles at the 44 loci with higher PIC than the mean (0.54) in landraces and modern varieties.



Acknowledgments

The authors are grateful to Ms. HN Zhang, YH Tian, J Lin and YJ Wang for their excellent genotyping and phenotyping of the mini core collection. We appreciate constructive discussions with Prof. Jianbing Yan. We also gratefully acknowledge help from Prof. Robert A McIntosh, University of Sydney, with English editing.

Author Contributions

Conceived and designed the experiments: XZ YD. Performed the experiments: LW HG CH. Analyzed the data: LW HG CH. Wrote the paper: XZ. Maintained the materials: LW. Provided the analysis tools: HG.


References

  1. He ZH, Bonjean APA (2010) Cereals in China. D.F., Mexico: CIMMYT.
  2. Xiao SH, He ZH (2003) Wheat yield and end use quality improvement in China (Chapter 13). In: Zhuang QS, editor. published by China Agricultural Publish Press.
  3. Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. (2005) The effects of artificial selection on the maize genome. Science 308: 1310–1314.
  4. Andolfatto P (2001) Adaptive hitchhiking effects on genome variability. Curr Opin Genet Dev 11: 635–641.
  5. Schlötterer C (2003) Hitchhiking mapping – functional genomics from the population genetics perspective. Trends Genet 19: 32–38.
  6. Zhang XY, Tong YP, You GX, Hao CY, Ge HM, et al. (2007) Hitchhiking effect mapping: A new approach for discovering agronomically important genes. Agri Sci China 6: 255–264.
  7. Flint-Garcia SA, Thornsberry JM, Buckler ES (2003) Structure of linkage disequilibrium in plants. Ann Rev Plant Biol 54: 357–374.
  8. Breseghello F, Sorrells ME (2006) Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172: 1165–1177.
  9. Crossa J, Burgueño J, Dreisigacker S, Vargas M, Herrera-Foessel SA, et al. (2007) Association analysis of historical bread wheat germplasm using additive genetic covariance of relatives and population structure. Genetics 177: 1889–1913.
  10. Barrero RA, Bellgard M, Zhang XY (2011) Diverse approaches to achieving grain yield in wheat. Funct Integr Genomics 11: 27–48.
  11. Hao CY, Dong YC, Wang LF, You GX, Zhang HN, et al. (2008) Genetic diversity and construction of core collection in Chinese wheat genetic resources. Chinese Sci Bull 53: 1518–1526.
  12. Dong YS, Cao YS, Zhang XY, Wang LF, Li LH, et al. (2003) Establishment of candidate core collections in Chinese common wheat germplasm. J Plant Genet Res 4: 1–8.
  13. Hao CY, Wang LF, Ge HM, Dong YC, Zhang XY (2011) Genetic diversity and linkage disequilibrium in Chinese bread wheat (Triticum aestivum L.) revealed by SSR markers. PLoS ONE 6(2): e17279. Doi:
  14. Wang J, Sun JZ, Liu DC, Yang WL, Wang DW, et al. (2008) Analysis of Pina and Pinb in the mini core collections of Chinese wheat germplasm by ecotilling and identification of a novel Pinb allele. J Cereal Sci 48: 836–842.
  15. Guo ZA, Song YX, Zhou RH, Ren ZL, Jia JZ (2009) Discovery, evaluation and distribution of haplotypes of the wheat Ppd-D1 gene. New Phytol 185: 841–851.
  16. Jiang QY, Hou J, Hao CY, Wang LF, Ge HM, et al. (2011) The wheat (T. aestivum) sucrose synthase 2 gene (TaSus2) active in endosperm development is associated with yield traits. Funct Integr Genomics 11: 49–61.
  17. Li XP, Zhao XQ, He X, Zhao GY, Li B, et al. (2011) Haplotype analysis of the genes encoding glutamine synthetase chloroplast isoforms and their association with nitrogen-use- and yield-related traits in bread wheat. New Phytol 189: 449–458.
  18. Su ZQ, Hao CY, Wang LF, Dong YC, Zhang XY (2011) Identification and development of a functional marker of TaGW2 associated with grain weight in bread wheat (Triticum aestivum L.). Theor Appl Genet 122: 211–223.
  19. Ersoz E, Yu JM, Buckler ES (2009) Applications of linkage disequilibrium and association mapping in maize. In: Kriz AL, Larkins BA, editors. Biotechnology in Agriculture and Forestry Volume 63, III. Springer-Verlag Berlin Heidelberg, Germany. pp. 173–195.
  20. Zhuang QS (2003) Chinese wheat improvement and pedigree analysis. Agricultural Press, Beijing (in Chinese).
  21. Yu JM, Pressoir G, Briggs WH, Bi IV, Yamasaki M, et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38: 203–208.
  22. Huang XQ, Kempf H, Ganal MW, Röder MS (2004) Advanced backcross QTL analysis in progenies derived from a cross between a German elite winter wheat variety and a synthetic wheat (Triticum aestivum L.). Theor Appl Genet 109: 933–943.
  23. Quarrie SA, Quarrie SP, Radosevic R, Rancic D, Kaminska A, et al. (2006) Dissecting a wheat QTL for yield present in a range of environments: from the QTL to candidate genes. J Exp Bot 57: 2627–2637.
  24. Cuthbert JL, Somers DJ, Brule-Babel AL, Brown PD, Crow GH (2008) Molecular mapping of quantitative trait loci for yield and yield components in spring wheat (Triticum aestivum L.). Theor Appl Genet 117: 595–608.
  25. Sun XC, Marza F, Ma HX, Carver BF, Bai GH (2010) Mapping quantitative trait loci for quality factors in an inter-class cross of US and Chinese wheat. Theor Appl Genet 120: 1041–1051.
  26. Tsilo TJ, Hareland GA, Simsek S, Chao S, Anderson JA (2010) Genome mapping of kernel characteristics in hard red spring wheat breeding lines. Theor Appl Genet 121: 717–730.
  27. Nordborg M, Weigel D (2008) Next-generation genetics in plants. Nature 456: 720–723.
  28. Yan JB, Shah T, Warburton M, Buckler ES, McMullen MD, et al. (2009) Genetic characterization of a global maize collection using SNP markers. PLoS ONE 4(12): e8451.
  29. Yan JB, Warburton M, Crouch J (2011) Association mapping for enhancing maize (Zea mays L.) genetic improvement. Crop Sci 51: 433–449.
  30. Horvath A, Didier A, Koenig J, Exbrayat F, Charmet G, et al. (2009) Analysis of diversity and linkage disequilibrium along chromosome 3B of bread wheat (Triticum aestivum L.). Theor Appl Genet 119: 1523–1537.
  31. Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127: 1309–1321.
  32. Mackay TFC, Stone EA, Ayroles JF (2009) The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10: 565–577.
  33. Ott A, Trautschold B, Sandhu D (2011) Using microsatellites to understand the physical distribution of recombination on soybean chromosomes. PLoS ONE 6(7): e22306.
  34. Gupta PK, Rustgi S, Kumar N (2006) Genetic and molecular basis of grain size and grain number and its relevance to grain productivity in higher plants. Genome 49: 565–571.
  35. Sun XY, Wu K, Zhao Y, Kong FM, Han GZ, et al. (2008) QTL analysis of kernel shape and weight using recombinant inbred lines in wheat. Euphytica 165: 615–624.
  36. Wang RX, Hai L, Zhang XY, You GX, Yan CX, et al. (2009) QTL mapping for grain filling rate and yield-related traits in RILs of the Chinese winter wheat population Heshangmai 3× Yu8679. Theor Appl Genet 118: 313–325.
  37. Groos C, Robert N, Bervas E, Charmet G (2003) Genetic analysis of grain protein-content, grain yield and thousand-kernel weight in bread wheat. Theor Appl Genet 106: 1032–1040.
  38. Snape JW, Foulkes MJ, Simmonds J, Leverington M, Fish LJ, et al. (2007) Dissecting gene×environmental effects on wheat yields via QTL and physiological analysis. Euphytica 154: 401–408.
  39. Röder MS, Huang XQ, Börner A (2008) Fine mapping of the region on wheat chromosome 7D controlling grain weight. Funct Integr Genomics 8: 79–86.
  40. Koebner RMD, Summers RW (2003) 21st century wheat breeding: Plot selection or plate detection. Trends Biotechnol 21: 59–63.
  41. Heffner EL, Sorrells ME, Jannink JL (2009) Genomic selection for crop improvement. Crop Sci 49: 1–12.
  42. Tester M, Langridge P (2010) Breeding technologies to increase crop production in a changing world. Science 327: 818–822.
  43. Frisch M, Thiemann A, Fu J, Schrag TA, Scholten S, et al. (2010) Transcriptome-based distance measures for grouping of germplasm and prediction of hybrid performance in maize. Theor Appl Genet 120: 709–720.
  44. Reynolds MP, Calderini DF, Condon AG, Rajaram S (2001) Physiological basis of yield gains in wheat associated with the LR19 translocation from Agropyron elongatum. Euphytica 119: 137–141.
  45. Glaszmann JC, Kilian B, Upadhyaya HD, Varshney RK (2010) Accessing genetic diversity for crop improvement. Curr Opin Plant Biol 13: 167–173.
  46. Gennaro A, Koebner RMD, Ceoloni C (2009) A candidate for Lr19, an exotic gene conditioning leaf rust resistance in wheat. Funct Integr Genomics 9: 325–334.
  47. Wang LF, Balfourier F, Exbrayat-Vinson F, Hao CY, Dong YS, et al. (2007) Comparison of genetic diversity level between European and East-Asian wheat collections using SSR markers. Sci Agric Sin 40: 2667–2678.
  48. Song XJ, Huang W, Shi M, Zhu MZ, Lin HX (2007) A QTL for rice grain width and weight encodes a previously unknown RING-type E3 ubiquitin ligase. Nat Genet 39: 623–630.
  49. Xue WY, Xin YZ, Wen XY, Zhao Y, Tang WJ, et al. (2008) Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice. Nat Genet 40: 761–767.
  50. Bernardo R (1996) Test cross additive and dominance effects in best linear unbiased prediction of maize single-cross performance. Theor Appl Genet 93: 1098–1102.
  51. Bernardo R (1996) Marker-based estimate of identity by descent and alikeness in state among maize inbreds. Theor Appl Genet 93: 262–267.
  52. Bernardo R (1996) Best linear unbiased prediction of maize single-cross performance. Crop Sci 36: 50–56.
  53. Sharp PJ, Chao S, Desai S, Gale MD (1989) The isolation, characterization and application in Triticeae of a set of wheat RFLP probes identifying each homoeologous chromosome arm. Theor Appl Genet 78: 342–348.
  54. Röder MS, Korzun V, Wendehake K, Plaschke J, Tixier MH, et al. (1998) A microsatellite map of wheat. Genetics 149: 2007–2023.
  55. Somers DJ, Isaac P, Edwards K (2004) A high-density microsatellite consensus map for bread wheat (Triticum aestivum L.). Theor Appl Genet 109: 1105–1114.
  56. Pritchard JK, Rosenberg NA (1999) Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 65: 220–228.
  57. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
  58. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14: 2611–2620.
  59. Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyze spatial genetic structure at the individual or population levels. Mol Ecol Notes 2: 618–620.
  60. Loiselle BA, Sork VL, Nason J, Graham C (1995) Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). Am J Bot 82: 1420–1425.
  61. Bradbury PJ, Zhang ZW, Kroon DE, Casstevens TM, Ramdoss Y, et al. (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633–2635.
  62. Zhang ZW, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42: 355–360.
  63. Agrama HA, Eizenga GC, Yan W (2007) Association mapping of yield and its components in rice cultivars. Mol Breeding 19: 341–356.
  64. Ge HM, You GX, Wang LF, Hao CY, Dong YC, et al. (2011) Genome selection sweep and association analysis shed light on future breeding by design in wheat. Crop Sci. In press.
  65. Quarrie SA, Steed A, Calestani C, Semikhodskii A, Lebreton C, et al. (2005) A high-density genetic map of hexaploid wheat (Triticum aestivum L.) from the cross Chinese Spring×SQ1 and its use to compare QTLs for grain yield across a range of environments. Theor Appl Genet 110: 865–880.