Association and Validation of Yield-Favored Alleles in Chinese Cultivars of Common Wheat (Triticumaestivum L.)

Common wheat is one of the most important crops in China, which is the largest producer in the world. A set of 230 cultivars was used to identify yield-related loci by association mapping. This set was tested for seven yield-related traits, viz. plant height (PH), spike length (SL), spikelet number per spike (SNPS), kernel number per spike (KNPS), thousand-kernel weight (TKW), kernel weight per spike (KWPS), and sterile spikelet number (SSN) per plant in four environments. A total of 106 simple sequence repeat (SSR) markers distributed on all 21 chromosomes were used to screen the set. Twenty-one and 19 of them were associated with KNPS and TKW, respectively. Association mapping detected 73 significant associations across 50 SSRs, and the phenotypic variation explained (R2) by the associations ranged from 1.54 to 23.93%. The associated loci were distributed on all chromosomes except 4A, 7A, and 7D. Significant and potentially new alleles were present on 8 chromosomes, namely1A, 1D, 2A, 2D, 3D, 4B, 5B, and 6B. Further analysis showed that genetic effects of associated loci were greatly influenced by association panels, and the R2 of crucial loci were lower in modern cultivars than in the mini core collection, probably caused by strong selection in wheat breeding. In order to confirm the results of association analysis, yield-related favorable alleles Xgwm135-1A138, Xgwm337-1D186, Xgwm102-2D144, and Xgwm132-6B128 were evaluated in a double haploid (DH) population derived from Hanxuan10 xLumai14.These favorable alleles that were validated in various populations might be valuable in breeding for high-yield.


Introduction
Wheat is one the most important crops in the world with a total production of about 713 million tonnes in 2013 [1]. With an increasing world population it is necessary to continuously raise production mainly through higher yields. Identification of new yield-related lociis becoming increasingly important in all food crops.
Association analysis identifies trait-marker relationships based on linkage disequilibrium [23]. This method has several advantages compared to bi-parental populations, such as (1) materials used in association analysis can be existing germplasm ranging from landraces to modern varieties and advanced lines; (2) novel and superior (favorable) alleles associated with the best phenotypes can be identified and ranked for use in breeding; (3) association mapping is more efficient and cheaper than other methods [24]; and (4) the results of association mapping apply to a wider range of genetic backgrounds. For example, Sajjad et al. [25] identified six SSR loci associated with yield-related traits on chromosome 3A, explaining 10.7 to 17.3% of the yield-related phenotypic variation in 94 wheat cultivars using 39 SSRs. Among them, Xgwm155 and Xwmc527, Xcfa2134 and Xgwm369, Xgwm155, and Xgwm369 were associated with grain yield per plant, fertile florets per spikelet, plant height, and spike length, respectively. Wang et al. [26] genotyped 531 SSR markers in the Chinese mini core wheat collection; 22 SSR loci were associated with TKW, each explaining phenotypic variation ranging from 1.56 to 21.99%. Six loci, Xcfa2234-3A, Xgwm156-3B, Xbarc56-5A, Xgwm234-5B, Xwmc17-7A and Xcfa2257-7A accounted for more than 10% of the variation. Using the same association panel Zhang et al. [27] identified 23 SSR loci significantly associated with KNPS, and reported that favorable alleles combined with additive effects. They also identified favorable alleles at the Xwmc304-1A, Xgwm311-2A, Xcfa2234-3A, Xgwm2-3A, Xgwm131-3B, Xgwm156-3B, Xgwm2-3D, Xcfe273-6A and Xcfa2257-7A loci with positive effects on both TKW and KNPS. However, relatively few studies in wheat have involved mapping/analysis of multiple yield-related traits based on combined bi-parental populations and association panels.
In the present study 230 diverse common wheat cultivars were genotyped at106 SSR loci prior to association analysis of data for seven yield-related traits obtained in multiple environments with the aim of identifying favorable loci or alleles. The purposes of the study were to provide insights into utilization of association study and linkage analysis to dissect the genetic basis of traits, as well as information that may be useful for future molecular breeding in the Yangtze River Valley.

Phenotyping
The cultivar panel was planted in four environments, viz. 2008 and 2009 at the Sichuan Academy of Agricultural Sciences in Chengdu (designated 08CD and 09CD, respectively), and in 2008 and 2009 at the Lixiahe Agricultural Institute of Jiangsu Province in Yangzhou (08YZ and 09YZ, respectively).
The field experiment consisted of three randomized complete blocks. Each cultivar was planted in three 133 cm rows with 40 seeds per row, and a row spacing of 25 cm. The yieldrelated traits PH (cm), SL (cm), SNPS, KNPS, TKW (g), KWPS (g) and SSN were measured on an average 20 plants in the middle of each plot and expressed as means.
The 150 DH lines and parents were planted in two environments, viz. 2010 and 2011 at Changping, Beijing (DH10 and DH11, respectively). The field design was three randomized complete blocks. Each cultivar was planted in two-row plots with a length of 2 m and 30 cm spacing rows. Yield-related traits included PH (cm), SL (cm), SNPS, KNPS, TKW (g) and SSN measured on 20 plants in the middle of each plot.
Mean values of yield-related traits, standard deviations, standard errors, variation coefficients (CV) and broad sense heritabilities for each environment were analyzed by IBM SPSS Statistics 21.0.0 software (http://www.brothersoft.com/ibm-spss-statistics-469577.html). The best linear unbiased predictor (BLUP) method was used to estimate mixed means of the phenotypic traits as in the association analysis [29][30][31].

SSR genotyping
Genomic DNA from 10 seedling leaves of each cultivar was extracted by the CTAB method [32]. A total of 106 SSR markers distributed across all 21 chromosomes [33] were genotyped the association set and DH population. Among them, 21 and 19 markers were previously reported to be associated with KNPS [27] and TKW [26] (S3 Table), respectively. Primer sequences and annealing temperatures (S2 Table) were obtained from GrainGenes (http:// archive.gramene.org/markers/) and Somers et al. [33]. An ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) was used to separate amplified products after purification. Fragment sizes were determined using an internal size standard (GeneScanTM-500 LIZ, Applied Biosystems). GeneMapperV3.7 software (Applied Biosystems) was used to estimate fragment sizes (http://www.appliedbiosystems.com.cn/).

Data analysis
General parameters of genetic diversity of each SSR marker, including MAF (major allele frequency), allele number, genetic diversity and PIC (polymorphism information content) were evaluated using PowerMarker V3.25 software [34]. To reduce spurious associations, population structure of the 230 cultivars was analyzed using Structure V2.3.2 [35]. The number of presumed sub-populations (K) was set from 1 to 15 with an admixture model and correlated allelic frequencies. This process was repeated five times. For each run, burn-in and Markov Chain Monte Carlo iterations were set to 50,000 and 100,000, respectively. The number of sub-populations and the best output was determined following the ΔK method [36]. Kinship analysis was also performed using genotypic data with SPAGeDi software [37] to determine genetic covariance between individuals. Evaluation of pairwise kinship coefficients was based on Loiselle et al. [38] with 10,000 permutation tests. All negative values between individuals were then set to 0, indicating that they were less related than random individuals [39].
The MLM (mixed linear model) module with Q + K was used for association analysis between phenotypic traits and SSRs through TASSEL 2.1 software (http://www.maizegenetics.net/ ) [40,41]. The phenotypic variation explained (R 2 ) for each associated locus was calculated for alleles with frequencies>5% [26,27]. Based on phenotypic data and the kinship matrix using TASSEL 2.1 software, the heritability (h 2 ) of each trait in different environments, defined as the proportion of genetic variance over total variance, was calculated according to the formula h 2 =

Phenotypic assessment
Yield-related traits for the association panel were determined over environments 08CD, 09CD, 08YZ and 09YZ. Average values of yield-related traits were calculated according to the BLUP method and a summary of parameters for the seven traits is listed in Table 1. The CV of phenotypic traits in each environment were higher than 10% with the exception of SNPS, indicating that trait values differed between cultivars. Moreover, the average h 2 of PH and TKW were 72.3 and 51.1%, respectively, and higher than those of other traits in all environments.

Allelic diversity
A total of 907 alleles were detected in the association panel using 106 SSR markers. MAF ranged from 0.164 to 0.987 with a mean of 0.542. Numbers of alleles per locus varied from 2 to 25 with an average of 8.6. PIC values ranged from 0.026 to 0.903, with an average of 0.552 (S3 Table). These values indicated that the association panel had a relatively high level of molecular genetic diversity.

Genetic structure and relative kinship among cultivars
The population structure of the 230 cultivars was calculated based on 106 SSR markers with 907 alleles by Structure V2.3.2. The cultivars were basically divided into two sub-populations according to their geographic origins (Fig 1a). The number of presumed sub-populations (K) was set from 1 to 15 for calculating ΔK values, which reached the highest value at K = 2 (Fig 1b), confirming that the population should be divided into two sub-populations. Relative kinship coefficients between individuals were also calculated using data for 106 SSR markers (Fig 2). About 74.1% of the pairwise kinship coefficients ranged from 0 to 0.05, indicating that most cultivars had no, or only a weak, relationship with each other.

Genetic effects of favorable alleles
The genetic effects of favorable alleles were calculated as differences between alleles and mean values. A total of 50 associated favorable alleles were identified by comparing mean phenotypic data and different alleles using multiple comparisons for alleles with allelic frequencies>5% (Table 3). Of these, the frequencies of Xgwm135-1A 138 on KNPS, Xwmc361-2B 216 on TKW, Xgwm102-2D 144 on KNPS, KWPS and SSN, Xgwm540-5B 115 on KNPS, KWPS and PH, were higher than 50%, indicating these loci might have undergone strong selection pressure during modern breeding.
Genetic effects of favorable alleles among various loci were also evaluated in four environments (Table 3)   Favorable alleles at crucial loci were not always the major ones in breeding panels The 40 SSR loci associated with TKW and KNPS in Wang et al. [26] and Zhang et al. [27] were re-evaluated in the current population, and only three, viz. Xbarc56-5A with TKW, and Xwmc24-1A and Xgwm132-6B with KNPS, were significantly associated in this study. Failure to confirm most of the previously associated loci led us question the cause. We therefore used ANOVA to verify the allelic effects at these loci in the panel. Significant allelic differences were detected at 18 loci (Table 4). Favorable alleles assigned previously at 10 loci were the major alleles, including five loci associated with KNPS (Xgwm2-3A 116 , Xgwm108-3B 127 , Xcfd64-3D 239 , Xgwm2-3D 220 and Xcfe273-6A 306 ) and five loci associated with TKW (Xgwm312-2A 190 , Xgwm372-2A 331 , Xcfa2234-3A 142 , Xcfd266-5D 167 and Xcfa2257-7A 129 ). At the other eight loci, the favorable alleles assigned previously were not the major ones in the current panel. They were Xgwm259-1B 102 , Xgwm337-1D 168 , Xgwm609-4D 111 and Xgwm132-6B 112 , which were associated with KNPS, and Xgwm234-5B 227 , Xgwm174-5D 209 , Xwmc168-7A 305 , and Xwmc17-7A 180 associated with TKW. We also found that the R 2 for TKW at nine loci were lower. For example, the R 2 for Xcfa2234-3A and Xcfa2257-7A in the previous report were 18.20 and 21.99%, respectively [26,27], whereas in current population they were 4.09 and 2.05%, indicating lower genetic effects in a breeding population.

Validation of favorable alleles in the DH population
In order to validate the genetic effects of alleles detected in the association panel, SSR loci significantly associated with yield-related traits were investigated for polymorphism between cultivars Hanxuan10 and Lumai14. A total of 19 SSR were polymorphic between the two cultivars, and were used to genotype the DH population. Statistical comparisons of phenotypic data identified significant differences between alleles at Xgwm135-1A, Xgwm337-1D, Xgwm102-2D and Xgwm132-6B in at least one of the two environments in which the DH population was grown (Table 5, Figs 4e, 4f, 5d and 5e). Trait data for favorable alleles were higher than those for pooled 'other' categories, although the differences were not statistically significant (Table 5). Therefore, the genetic effects of favorable alleles at these loci were confirmed in both a cultivar association panel and a DH population. Xgwm132 was linked with a PH QTL in a previous report (Fig 4a) [7]. Xgwm132 was associated with PH in two environments, and the favorable 128 bp allele had the highest frequency among 10 alleles at this locus (Fig 4b and 4c). The PH effects of Xgwm132 128 were -2.76 and -2.56 cm in 08CD and 08YZ, respectively (Fig  4d). This was further verified in the DH population, i.e. -3.20 and -5.50 cm in environments DH10 and DH11, respectively (Fig 4e and 4f). Xgwm135 was also associated with KNPS in four environments (Fig 5a.) The favorable 138 bp allele occurred at the highest frequency among seven alleles (Fig 5b). The phenotypic effects of Xgwm135 138 were 1.14, 1.19, 1.84 and 0.89 in 08CD, 08YZ, 09CD and 09YZ, respectively (Fig 5c). Positive effects of 0.97 and 2.15 on KNPS were also confirmed in DH10 and DH11, respectively (Fig 5d and 5e).

Discussion
Association analysis is more effective than biparental crosses in identifying yield-related genes Association mapping and bi-parental population mapping utilize information about genetic recombination and the methods are complementary in identifying genes or QTLs [42]. However, association mapping is more powerful in detecting superior alleles from a large sample of germplasm collections.
In previous studies several SSR loci reported to be associated with yield-related traits ( Table 2). Xwmc24-1A and Xgwm132-6B were associated with QTLs for KNPS [7,41]; Xgwm484-2D with KWPS [19]; Xgwm155-3A, Xgwm186-5A, Xgwm132-6B and Xgwm46-7B with PH [7,10,25,43]; Xgwm182-5D for SL [44]; Xgwm297-7B and Xgwm186-5A with SSN [7,45]; and Xbarc56-5A and Xwmc361-2B with TKW [45][46][47][48][49]. By association analysis of a panel Association and Validation of Yield-Favored Alleles in Common Wheat of cultivar, we not only confirmed earlier marker/trait associations, but also found several new associations of markers and yield-related traits (Tables 2 and 5), such as an association of Xgwm135-1A with KNPS in four environments, Xgwm102-2D with KNPS, KWPS and SSN, and Xgwm337-1D with SNPS. Association mapping combined with bi-parental population analysis is even more powerful in identifying closely linked molecular markers involving yield- related genes [50,51]. For example, in a QTL analysis of drought tolerance in three RIL populations using SNP markers and 305 diverse inbred lines in maize, Lu et al. [52] found that joint linkage-LD mapping identified 18 QTLs additional to those detected in separate linkage and LD analyses. Korir et al. [53] detected five markers associated with aluminum tolerance in both an association panel of 188 cultivars as well as184 RILs from a bi-parental soybean cross, confirming that these loci should be the best candidate regions to target. Twenty two seed weight and silique length-related QTLs were detected in three bi-parental populations in rapeseed. Among them, uq.A09-1 and uq.A09-3 were identified in all four environments and fine mapped in a set of 576 inbred lines using association analysis [42]. Four associated SSR loci, Xgwm135-1A, Xgwm337-1D, Xgwm102-2D and Xgwm132-6B, were detected in our bi-parental population (Table 5, Figs 4e, 4f, 5d and 5e). They had effects on increasing spikelet and kernel numbers and decreasing plant height. These results demonstrate the power of combined association and bi-parental analyses in identifying closely linked molecular markers for economic traits.

Genetic effects of associated loci were panel-dependent
In previous studies Wang et al. [26] and Zhang et al. [27] used Chinese wheat mini core collection (MCC) to perform association analysis between TKW, KNPS and SSR markers. That collection represented 1% of the national germplasm collection, but more than 70% of the genetic diversity [54]. We genotyped40 SSR loci associated with TKW and KNPS from Wang et al. [26] and Zhang et al. [27]. Only three loci, Xbarc56-5A associated with TKW, and Xwmc24-1A and Xgwm132-6B associated with KNPS showed significant associations. A possible reason for the low number was that the present set comprised released cultivars, among which the allelic Association and Validation of Yield-Favored Alleles in Common Wheat profiles were very different. ANOVA showed that 18 of 40 SSR loci had genetic effects on either TKW or KNPS (Table 4). This indicated that genetic effects of loci were also influenced by the population entries. In addition, the R 2 values for TKW at nine loci were much lower than reported for the MCC panel. For example, the earlier R 2 values for Xcfa2234-3A 142 and Xcfa2257-7A 129 were 18.20 and 21.99%, respectively [26,27], but were only 4.09 and 2.05% in the present study (Table 4). This lower variation is likely due to the effects of long term selection in breeding programs because the frequencies of Xcfa2234-3A 142 and Xcfa2257-7A 129 in the earlier reports were 43.9 and 20.6%, respectively [26,27], but were 93.5 and 55.7% in the current cultivar panel (Table 4). Because released cultivars usually carry superior alleles at crucial loci the genetic effects of those loci were greatly reduced and the R 2 values were lower. For example, Qin et al. [55] detected four haplotypes, Hap-6B-1, Hap-6B-2, Hap-6B-3 and Hap-6B-4, at TaGW2-6B; but the frequencies of Hap-6B-1 and Hap-6B-2 showed increasing trends over time, and by the late 1980s Hap-6B-3 and Hap-6B-4 had disappeared. The allelic difference between Hap-6B-1 and Hap-6B-2 was much smaller than that involving either of them with Hap-6B-3 or Hap-6B-4. Therefore, the more important a locus is for an agronomic trait, the stronger it will be selected in breeding. Hence the R 2 value should decline from a random germplasm collection to a released cultivar population. Genetic effects were also affected by environment (G x E). Flowering time in maize is a complex trait affected by genes and the environments. ZmCCT is one of the most important genes affecting photoperiod response. Hung et al. [56] found that many maize inbred lines carried ZmCCT alleles with no sensitivity to day length, allowing breeders to produce more widely adapted maize varieties. In the current study, association signals were detected in multiple-environments, i.e. 08CD, 09CD, 08YZ and 09YZ. Average values of yield-related traits were also calculated according to the BLUP method and associated with SSRs. Based on association detection using BLUP mean values, seven SSR loci had significant association signals in two or more environments, such as Xgwm135-1A with KNPS, Xgwm515-2A and Xgwm132-6B with PH, Xgwm219-6B with SNPS, and Xgwm102-2D, Xgwm297-7B and Xgwm383-3D with SSN. Therefore, the influence of environments on genetic effects of the associated loci was indirectly reflected by comparison of their values in different environments. Hence, there was a higher influence if an associated locus had an effect in only one or few environments (Table 3).

Favorable alleles in past and future wheat breeding
Some loci have played important roles in wheat breeding. A good example is Xgwm261 that is 0.6 cM from Rht8 on chromosome 2D. The favorable Xgwm261 allele (192 bp) is associated with an approximate 10 cm reduction in plant height [3]. Rht8 is also closely linked with Ppd-D1, which affects varietal adaptability leading to increased grain yield in certain environments [4]. Zhou et al. [57] identified Rht8 in many Chinese wheat varieties widely grown in the last 30 years. About 40% of varieties contained Rht8 based on pedigree, but its frequency varied in different ecological zones. Italian cultivars Funo, Villa Glory, St1472/506 and St2422/464, widely used as founder genotypes in Chinese wheat breeding [58,59], are all carriers of Rht8. Grain weight also underwent strong selection during wheat breeding. For example, TaGW2-A1 significantly associated with TGW [60] was mapped to chromosome 6A. The superior haplotype Hap-6A-A increases TGW by more than 3 g. Among loci that were validated in the DH population in this study the frequencies of favorable alleles Xgwm135-1A 138 and Xgwm102-2D 144 with positive effects on KNPS exceeded 50%, suggesting that these loci have contributed to Chinese wheat breeding (Tables 3 and 5). On the other hand, the frequency of Xgwm337-1D 186 with a clear genetic effect verified in both populations was only 5.22%. Clearly that allele could be selected to increase SNPS in future marker-assisted selection (MAS) breeding. Thus, strong selection in the breeding of newly released cultivars has already focused on some favorable alleles [26,27,61]. Modern Chinese varieties produced over the last 60 years are based on 16 founder parents [59]. Some of these founder parents were included in the 230 wheat cultivars used in this study. Abbondanza, Funo, and St2422/464, for example, carried more favorable alleles than some other founder parents in our study as was also reported by Ge et al. [61].
Supporting Information S1