Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Favorable QTL Alleles for Yield and Its Components Identified by Association Mapping in Chinese Upland Cotton Cultivars

  • Hongxian Mei,

    Affiliations National Key Laboratory of Crop Genetics & Germplasm Enhancement, Cotton Research Institute, Nanjing Agricultural University, Nanjing, China, Sesame Research Center, Henan Academy of Agricultural Sciences, Zhengzhou, China

  • Xiefei Zhu,

    Affiliation National Key Laboratory of Crop Genetics & Germplasm Enhancement, Cotton Research Institute, Nanjing Agricultural University, Nanjing, China

  • Tianzhen Zhang

    cotton@njau.edu.cn

    Affiliation National Key Laboratory of Crop Genetics & Germplasm Enhancement, Cotton Research Institute, Nanjing Agricultural University, Nanjing, China

Favorable QTL Alleles for Yield and Its Components Identified by Association Mapping in Chinese Upland Cotton Cultivars

  • Hongxian Mei, 
  • Xiefei Zhu, 
  • Tianzhen Zhang
PLOS
x

Abstract

Linkage disequilibrium based association mapping is a powerful tool for dissecting the genetic basis underlying complex traits. In this study, an association mapping panel consisting of 356 representative Upland cotton cultivars was constructed, evaluated in three environments and genotyped using 381 SSRs to detect molecular markers associated with lint yield and its components. The results showed that abundant phenotypic and moderate genetic diversities existed within this germplasm panel. The population could be divided into two subpopulations, and weak relatedness was detected between pair-wise accessions. LD decayed to the background (r2 = 0.1182, P≤0.01), r2 = 0.1 and r2 = 0.2 level within 12–13 cM, 17–18 cM and 3–4 cM, respectively, providing the potential for association mapping of agronomically important traits in Chinese Upland cotton. A total of 55 marker-trait associations were detected between 26 SSRs and seven lint yield traits, based on a mixed linear model (MLM) and Bonferroni correction (P≤0.05/145, −log10P≥3.46). Of which 41 could be detected in more than one environment and 17 markers were simultaneously associated with two or more traits. Many associations were consistent with QTLs identified by linkage mapping in previous reports. Phenotypic values of alleles of each loci in 41 stably detected associations were compared, and 23 favorable alleles were identified. Population frequency of each favorable allele in historically released cultivar groups was also evaluated. The QTLs detected in this study will be helpful in further understanding the genetic basis of lint yield and its components, and the favorable alleles may facilitate future high-yield breeding by genomic selection in Upland cotton.

Introduction

Cotton is the most important natural textile fiber source globally. The worldwide economic impact of the cotton industry is estimated at approximately $500 billion per year with an annual utilization of about 27 million metric tons of cotton fiber. In recent years, demand for cotton fiber in the world market has dramatically increased, stock and use ratio dropped to 37% in 2010, compared to 55% in 2009. While cotton acreage has declined worldwide in the past few years, mainly due to strong competition from other crops as well as production costs (National Cotton Council, USA, http://www.cotton.org, 2012). The tetraploid species Gossypium hirsutum L. (n = 26, AD genome), commonly referred to as Upland cotton, accounts for 95% of the world's cotton production [1]. Thus, improving lint yield of Upland cotton cultivars will be critical for meeting worldwide demand, and maintaining profitability for cotton growers.

Lint yield is a complex trait in cotton, which is controlled by a large number of quantitative loci (QTLs). It is becoming progressively more difficult to improve lint yield using conventional breeding methods. Fortunately, the development in applied genomics research has provided alternative tools to improve efficiency in plant breeding programs. Molecular markers linked to the causal genes and QTLs can be used for marker-assisted selection (MAS) and/or genomic selection (GS) [2][3]. In the past two decades, a large number of QTLs for lint yield and fiber quality traits have been identified in Upland cotton [4][13]. However, approximately 80% of the previously reported QTLs could not be confirmed in subsequent studies, and few have actually been applied in breeding programs [14][16]. This may be because that most QTLs were population-specific, and the genetic variation detected in a unique bi-parental population might not be shared with other genetic populations, or shared but fixed in the parental lines. In addition, the limited genetic recombinations in most populations used for linkage mapping make it difficult to map QTLs with a high resolution, which severely limits their application in breeding programs. With the potential to exploit all recombination events that occurred in the evolutionary history of natural populations, linkage disequilibrium (LD) based association mapping (AM) has become a powerful approach for the dissection of complex traits and identification of causal variation with modest effects for target traits in many plant species [17][18] including cotton [19][24]. While the key constraints for the successful use of association mapping in plants are population structure and genetic relatedness, which can result in spurious marker-trait associations that may make it difficult to distinguish loci that truly affect the target traits [25][26]. Several statistical strategies have been developed to account for issues related to population structure and relatedness [27][29]. One powerful strategy is the unified mixed model approach (mixed linear model, MLM), which accounts for multiple levels of relatedness simultaneously, and can improve control of both type I and type II error rates [28]. In cotton, the first attempt of association mapping was reported by Kantartzi and Stewart in 2008. In that study, 30 marker and fiber trait associations were detected in 56 Gossypium arboreum accessions genotyped by 98 SSR markers [19]. Abdurakhmonov et al. performed an association mapping study, with the MLM model considering both kinship (K) and population structure (Q), of fiber quality traits by using a set of 95 core microsatellite markers in 285 exotic Gossypium hirsutum accessions and detected between 6% and 13% of SSR markers associated with the main fiber traits. Meanwhile, they indicated the genome-wide LD (r2≥0.1) declined at <10 cM in the landrace stocks and >30 cM in variety germplasm, but at r2≥0.2 which reduced to ∼1–2 cM and ∼6–8 cM, respectively [20]. Abdurakhmonov et al. performed another association mapping study of fiber quality traits using 202 microsatellite markers in a panel of 335 G. hirsutum varieties [21]. The result showed that the genome-wide LD extended up to 25 cM at r2≥0.1 and reduced to ∼5–6 cM at r2≥0.2 and an average of ∼20 SSR markers was associated with each main fiber quality trait in two environments. Zeng et al. carried out an association mapping study between 86 SSR markers and fiber traits using an exotic germplasm population of 260 lines derived from multiple crosses among Gossypium tetraploid species and found 59 markers were significantly associated with six fiber traits [22]. All the results mentioned above provided useful evidences of the potential for association mapping of agronomically important traits in cotton. But till now, association mapping study of lint yield traits has not been reported in cotton.

Although AM has been successfully used to detect the QTLs underlying quantitative traits in some crops, from a breeding standpoint, detecting associated loci is just the first step; analyzing the genetic effects of alleles and identifying favorable alleles will be more beneficial for target trait improvement. Breseghello & Sorrells identified several potentially beneficial alleles for kernel size and milling quality by comparing the average phenotypic value with specific alleles and null alleles in a soft winter wheat population [30]. Jia et al. identified some putative resistant alleles for Sheath Blight resistance in a rice panel composed of 217 accessions from the USDA core collection, and found that the number of putative resistant alleles presented in an entry was highly and significantly correlated with the decrease of ShB rating [31]. We performed a preliminary AM study in 81 Upland cotton cultivars and identified some elite alleles for yield and fiber quality traits [24]. China is the world's largest cotton-growing nation, but not a cotton domestication region. Most Upland cotton cultivars developed in China were derived from a few germplasm resources such as Deltapine (DPL), Stoneville (STV), Foster, King and Uganda, all of which were introduced from abroad [32]. Current and obsolete cultivars have been and continue to be the main resources for cotton breeding programs. Dissecting the genetic basis of lint yield and quality traits will be of great benefits to germplasm evaluation and future molecular breeding. In the present study, we aimed to detect QTLs underlying lint yield and its components, and to identify the favorable alleles in an AM panel composed of 356 accessions. We also analyzed genetic diversity, LD decay, population structure, genetic relatedness and favorable allele frequency in historically released cutivar groups. Our results should provide useful information for further understanding the genetic basis of lint yield and its components, and will facilitate future high-yield breeding by genomic selection in Upland cotton.

Materials and Methods

Association mapping panel construction

A total of 356 representative Upland cotton cultivars and breeding lines were selected from the cotton germplasm collections in our laboratory and the Cotton Research Institute, Chinese Academy of Agricultural Sciences (CRI-CAAS), and assembled to construct an AM panel. The population consisted of 348 cultivars developed in China, seven introduced from the U.S., including the genetic standard line TM-1, and one introduced from Uganda. According to their release year, the 348 Chinese cultivars could be divided into the following six groups: I (1930–1960, 26 lines); II (1961–1970, 26 lines); III (1971–1980, 39 lines), IV (1981–1990, 83 lines); V (1991–2000, 125 lines); and VI (2000–2005, 49 lines). The cultivars introduced from abroad, DPL 15, DPL 16, STV 2B, King, Foster 6 and Uganda 3 were used as a check group for genetic diversity and allele transmission evaluation, because they had been used as the main founder parents in China's Upland cotton breeding programs and are the progenitors of many cultivars [32]. All accessions have been self-pollinated for more than six generations and their detailed information are summarized in Table S1.

Trait phenotyping

All of the accessions were planted in the following three environments to evaluate phenotypic performance: (1) Jiangpu Breeding Station, Nanjing Agricultural University, Nanjing, in 2009 (designated as E1), (2) Dafeng Agronomy Farm, Yancheng, Jiangsu Province (E2), and (3) Zhengzhou Agricultural Research Institute, Zhengzhou, Henan in 2010 (E3). The first two locations were in the Yangtze River cotton-growing region, and the third was in the Yellow River region. A randomized complete block design with single row plot and two replications was used in all field trails. The sowing dates were from late March to early April in different years and locations, and seedlings having up to 3–4 leaves were transplanted from seedbeds to fields, with 20 plants per row, a 30 cm plant-to-plant spacing, and 80 cm between rows. For most of the accessions are non-BT cottons, chemical control were used for preventing from bollworm damage and field managements were adjusted to local practices.

Field planting has been approved by Nanjing Agric Univ.. No specific permissions were required for these locations/activities since they are pure-line cultivars and the field studies did not involve endangered or protected species.

Ten consecutive plants in the middle of each row were tagged for trait measurement. Yield traits evaluated included: lint yield (LY, g/plant), seed cotton yield (SY, g/plant), bolls per plant (BN), boll weight (BW, g), lint percentage (LP, %), lint index (LI, g/100 seeds) and seed index (SI, g/100 seeds).

SSR genotyping

Young leaves from each of the 356 accessions were collected and stored at −20°C. Total genomic DNA was extracted from the leaf samples as described by Guo et al. [33]. Based on the dense genetic linkage map constructed in our laboratory [33], 381 pairs of SSR primers that amplify loci evenly covering the tetraploid cotton genome (one marker per 10 cM, 186 on At and 195 on Dt subgenome with an average of 14.65 markers each chromosome) were selected to genotype the 356 accessions. The procedure for PCR-amplification and product analysis followed the published methods from our laboratory [34][35]. Since G. hirsutum is an allopolyploid species, SSR markers often yield complex band patterns and some of them had been located to more than one locus. To measure the complex band patterns of large scale genotypes, the band pattern in TM-1 (genetic standard line, one parent with which the reference linkage map was constructed) was treated as a check and the following criteria were used to assign the alleles to the corresponding loci: 1) when only one fragment was amplified in each accession, the fragments were regarded as alleles belonging to the single locus; 2) when multiple fragments were amplified in each line and the bands showed an obvious co-segregating relationship among different samples, they were regarded as alleles belonging to the same locus; and 3) when multiple bands produced in each line did not co-segregate among different accessions, the corresponding fragments in TM-1 that had been mapped to the reference map and co-segregated among different accessions were measured, other bands were discarded. According to the above criteria, the band pattern in TM-1 was designated as 1, the same patterns were also designated as 1, and the different ones were designated as 2, 3, 4, 5 and so on, thus the alleles from all accessions on each locus were measured. Markers with more than 10% missing data were not used in further analysis.

Genotypic data analysis

Summary statistics including the total number of alleles, the number of alleles per locus, and gene diversity values were calculated using the software PowerMarker 3.25 [36]. The Bayesian model-based program STRUCTURE 2.3 was used to infer the population structure using 66 unlinked or weakly linked SSR markers [37]. The length of the burn-in period and the number of Markov Chain Monte Carlo replications after burn-in were all assigned at 100,000 with an admixture and allele frequencies correlated model. Five independent run iterations were performed with the hypothetical number of subpopulations (k) ranging from 1 to 10. The correct estimation of k was provided by joining the log probability of data [LnP(D)] from the STRUCTURE output and an ad hoc statistic Δk [38]. Based on the correct k, each accession was assigned to a subpopulation for which the membership value (Q value) was >0.5 [39], and the population structure matrix (Q) was generated for further marker-trait association mapping. The software SPAGeDi was used to calculate the pair-wise relatedness coefficients (K, kinship matrix) in order to estimate the genetic relatedness among individuals, with the negative value of kinship set to zero [40]. To estimate LD pattern in Upland cotton genome, the weighted average of squared correlation coefficient r2 of each pair of SSR loci was calculated using the software package TASSEL 2.1 with rare alleles (allele frequency less than 0.05) treated as missing data [41]. The r2 was estimated for total, linked and unlinked markers both in the entire panel and each subpopulation, respectively. The 99th percentile of r2 distribution for unlinked markers, which determined whether LD is due to physical linkage [42], was treated as the background LD level [43].The r2 values of each pair of SSR loci were plotted against map distance (cM), and LD decay was estimated.

Phenotypic data analysis

Statistical analysis of all phenotypic data across three environments was performed with SAS 8.0 software (SAS Institute 1999). Analysis of variance (ANOVA) of all phenotypic data was calculated with PROC GLM, based on the trait means for each line across the three environments. Decomposition of variance components (genotype, environment, block, and the interactions among these factors) was evaluated using PROC VARCOMP, and the broad-sense heritability (hB2) of each trait was estimated with the variance components. Correlation coefficients between traits were calculated with PROC CORR.

Association mapping and favorable allele identification

Because the MLM model accounts for the effects both of population structure and genetic relatedness, and can significantly reduce spurious associations [28], the marker-trait AM was carried out with the MLM model as implemented in TASSEL software, and the P value and R2 for each marker-trait association were determined [41]. Based on the results of AM, QTL alleles of loci significantly associated with the target traits were further analyzed. The phenotypic allele effect was estimated through comparison between the average phenotypic value over accessions with the specific allele and that of all accessions:where ai is the phenotypic effect of the ith allele; xij is the phenotypic value over the jth accession with the ith allele; ni is the number of accessions with the ith allele; Nk is the phenotypic value over all accessions; nk is the number of accessions. If the value of ai>0, the allele is considered to have a positive effect, if it is <0, it corresponds to a negative allele. The favorable alleles were then identified according to the breeding objective of each target trait [24].

Results

Genetic diversity, population structure and genetic relatedness

Of the 381 SSR markers selected, only 145 amplified polymorphism (67 of 186 in At and 78 of 195 in Dt subgenome) in the present panel and a total of 415 alleles were detected (Table S2). The allele number, gene diversity and polymorphism information content (PIC) value of the 145 loci averaged 2.86, 0.32 and 0.27, respectively; with ranges of 2–9, 0.01–0.73 and 0.27–0.68, respectively. Approximately 80% of the polymorphic loci (115 of 145) had only two or three alleles. Among the 415 alleles detected, population frequencies of 131 alleles were rare (less than 0.05) and 34 were unique (detected in only one accession). The total number of alleles and the number of alleles per locus detected in the six historically released cultivar groups were much greater than that in the six founder parents ().

The model-based evaluation of the population structure of the 356 Upland cotton cultivars showed that the LnP(D) value corresponding to each hypothetical k kept increasing with k value and did not show any peak. The Δk value showed a much higher likelihood at k = 2 than at k = 3–10 (Figure 1), suggesting that the total panel could be divided into two major subpopulations [38], designated as P1 and P2, respectively. The P1 group contained 115 accessions including 63 cultivars from Yellow River cotton growing region, 46 lines from North and Northwest China regions, and six cultivars from Yangtze River region. The P2 group consisted of 241 accessions including 116 lines from Yellow River cotton growing region, 107 lines from Yangtze River region, 10 lines from the North and Northwest China regions, and eight lines intrduced from abroad (Table S1). Then, the corresponding Q matrix at k = 2 was used for the following association analysis.

thumbnail
Figure 1. Estimated LnP(D) and ΔK over five repeats of STRUCTURE analysis.

(a) LnP(D) for k from 1 to 10 for 356 accessions. LnP(D) value of each hypothetical k kept increasing with k value and did not show any peak. (b) ΔK for k from 2 to 9 for 356 accessions. The Δk value showed a much higher likelihood at k = 2 than at k = 3–10, suggesting that the total panel should be divided into two major subpopulations.

https://doi.org/10.1371/journal.pone.0082193.g001

For the kinship coefficient values, 86.85% was less than 0.05, 8.56% had a range of 0.05–0.10, and the remaining 4.59% showed various degrees of genetic relatedness (data not shown). Based on the results of the relatedness analysis, a K matrix was constructed for association mapping.

Pair-wise linkage disequilibrium across the whole genome

The r2 was calculated for total, linked and unlinked markers (Table 1), respectively, with SSR loci on the same chromosome considered as linked and those from different chromosomes as unlinked. In the entire panel, the average r2 of locus pairs was 0.0103, and 18.29% were significant (P≤0.01). Moreover, 21.03% of the linked locus pairs and 18.18% of the unlinked pairs showed significant LD (P≤0.01) with the average r2 of 0.0160 and 0.0101, respectively. In the subpopulation P1 and P2, the average r2 of locus pairs was 0.0151 and 0.0104, respectively, and the proportion of significant LD (P≤0.01) was 5.10% and 10.78%, respectively. In the entire panel and subpopulations, both average r2 and proportion of significant LD for linked loci were all higher than those for unlinked markers (Table 1).

The r2 value and genetic distance of each pair of SSR loci was plotted into a scatter diagram, and then a curve was drawn to describe the trend of LD decay using the nonlinear regression model [43]. The curve exhibited a clear decay of LD with increase in genetic distance (Figure 2). In this study, the 99th percentile of r2 distribution for unlinked markers, which determined the background level of LD, was 0.1182; and LD decayed to the background level within 12–13 cM. If the threshold of LD decay was set to r2 = 0.1 and r2 = 0.2, the genome-wide LD extended up to about 17–18 cM and 3–4 cM, respectively (Figure 2).

thumbnail
Figure 2. Scatter plots of r2 values against genetic distance among linked loci (P≤0.01) in 356 G. hirsutum L. accessions.

The trend line is a non-linear logarithmic regression curve of r2 on genetic distance. LD decayed to the background (r2 = 0.1182, P≤0.01), r2 = 0.1 and r2 = 0.2 level within about 12–13 cM, 17–18 cM and 3–4 cM, respectively.

https://doi.org/10.1371/journal.pone.0082193.g002

Variation of phenotypic traits

Seven traits for lint yield and its components were measured in 356 Upland cotton accessions across three different environments. Each trait varied widely (Table 2), and the ANOVA showed that the genotype (G) and the interactions between genotype and environmental factors (G×E) were both significant (P≤0.01) for all the seven traits. The mean coefficient of variance for FY, SY, BN, BW, LP, LI and SI was 29.09%, 23.19%, 19.36%, 9.11%, 9.33%, 12.48% and 9.02%, respectively, demonstrating that there was a high degree of diversity in lint yield traits of Chinese Upland cotton cultivars. The broad sense heritabilty (hB2) for the seven traits had a range of 27.34–75.77% in the reference population (Table 2). The highest hB2 value was for LP (75.77%), indicating that LP was less impacted by environmental factors than the other six traits.

thumbnail
Table 2. Descriptive statistics, ANOVA and broad-sense heritability (hB2) for yield and its components across three different environments.

https://doi.org/10.1371/journal.pone.0082193.t002

Phenotypic correlation analysis showed that there were significant positive correlations between lint yield and its most components, while the negative correlation between lint yield and SI was also significant (Table 3). The correlation coefficients for LY with SY, BN, BW, LP, LI and SI were 0.963, 0.869, 0.461, 0.704, 0.671, and −0.257, respectively.

thumbnail
Table 3. Phenotypic correlations among lint yield and its components based on trait means of 356 upland cotton accessions across three environments.

https://doi.org/10.1371/journal.pone.0082193.t003

Markers associated with lint yield and its components

The marker-trait AM was performed with the MLM model, considering both kinship (K) and population structure (Q), implemented in TASSEL software. At the α = 0.01 (−log10P = 2) level, a total of 195 significant associations were detected between 82 SSR markers and seven lint yield traits (Table S4). Among these, most of the associations (125 of 195) were detected in only one environment, and the proportion of phenotypic variation explained by markers ranged from 0.0152 to 0.0940, with an average of 0.0370 (Table S4).

In this study, 145 markers were used for detecting association, so the same statistical test was performed 145 times at a significance level of 0.01, and the experimental type I error rate would be much higher than 0.01. To overcome this problem, the Bonferroni correction (P≤0.05/145, −log10P≥3.46) was used to obtain an appropriate significance threshold [44]. After Bonferroni correction, 55 associations were found to be significant between 26 SSR markers and seven lint yield traits, and the results are shown in Table 4. Most (41 associations between 23 SSR markers and seven lint yield traits) of the associations could be detected in more than one environment, and the proportion of phenotypic variation explained by markers ranged from 0.0163 to 0.0940, with an average of 0.0451. The number of SSR markers associated with LY, SY, BN, BW, LP, LI and SI were 9, 4, 6, 4, 14, 17 and 1, respectively. Seventeen loci were co-associated with two or more different traits (Table 4). For example, NAU3269 (Chr. 5) and NAU3100 (Chr. 23) were simultaneously associated with FY, SY, BN, LP, and LI, and most of the lint yield-associated loci were associated with at least one of its components.

thumbnail
Table 4. SSR marker loci significantly associated with lint yield traits and their explained proportion of phenotypic variation in three different environments.

https://doi.org/10.1371/journal.pone.0082193.t004

Favorable QTL alleles and their transmission in Chinese Upland cotton cultivars

Phenotypic effects of each QTL allele for the 41 associated loci detected in more than one environment were measured according to the method mentioned above, and 5, 2, 3, 4, 12, 14 and 1 favorable alleles for FY, SY, BN, BW, LP, LI and SI were identified, respectively. Phenotypic effects and representative accessions for each favorable allele are shown in Table 5. Among the favorable alleles, NAU3100-2 had the most positive phenotypic effect for FY and SY, and increased FY and SY by 3.61 g and 7.27 g, respectively; NAU6584-2, NAU3398-2, NAU5166-2 and NAU3917-2 increased BN, BW, LP and LI by 0.89, 0.42 g, 4.93% and 0.94 g, respectively; while NAU493-1 deceased SI by 0.17 g.

thumbnail
Table 5. Favorable QTL alleles, their phenotypic effects (ai) and representative accessions.

https://doi.org/10.1371/journal.pone.0082193.t005

Allele frequencies of the 23 favorable alleles in the CK group and the six Chinese historically released cultivar groups are summarized in Table 6. Based on allele frequencies across the different groups, these favorable alleles could be categorized into three classes. The alleles in the first class, such as JESPR135-1, BNL1404-1 and Gh508-1, presented in the founder cultivars and with high frequency in all populations, might have been passed down stably from the original parents and were almost fixed in modern cultivars by selection. Alleles in the second class, such as BNL3269-2, BNL1414-2, NAU3100-2 and JESPR208-2, presented in the founder cultivars and with moderate to low frequency in most populations, should have been underutilized in modern breeding programs. Those in the third class, such as NAU5166-2, NAU980-3, Gh369-3 and CIR246-3, not presented in the founder cultivars and presented at low frequency in modern cultivars, might be from other original parents or could have been generated by mutations and/or recombinations. Favorable alleles, especially of the latter two classes, should have a great potential in future Upland cotton genetic improvement.

thumbnail
Table 6. Allele frequency for each favorable QTL allele in historically released Chinese Upland cotton cultivar groups.a

https://doi.org/10.1371/journal.pone.0082193.t006

Discussion

Genetic diversity and population structure of the association panel

A suitable association mapping panel should embrace as much phenotypic and genotypic diversity as can be reliably measured in common environments [45]. Most Upland cotton cultivars developed in China were derived from a few germplasm resources introduced from abroad and therefore the genetic base is narrow [46][47]. It is especially critical to select samples that encompass genetic diversity as much as possible. In this study, the 356 Upland cotton accessions, which can normally flower and ripen for target trait evaluation, were chosen from more than 1000 cultivars and breeding lines in CRI-CAAS and NAU germplasm collections. The phenotypic measurments in three different locations indicated that there was a high degree of diversity in lint yield and its component traits (Table 2). Of the 381 SSR markers, only 145 were found to be polymorphic in the 356 Upland cotton accessions, indicating that intraspecific genetic diversity is far less than interspecific diversity; for the reference linkage map of allotetraploid cotton was constructed with a BC1 mapping population derived from an interspecific cross (G. hirsutum TM-1×G. barbadense Hai7124) [33]. Eighty percent of the 145 polymorphic loci only generated two or three alleles, and the allele frequencies of 131 of the 415 alleles were <0.05 (Table S2), showing that the genetic diversity in this panel is relatively low, which might affect the QTL detection power of AM in Upland cotton.

Many crops have a long and complex history of domestication and breeding, such as Upland cotton, and complex population structures may confound AM [48]. It is important to consider the influence of population structure and relationships between individuals in the AM panel [27][28]. The model-based evaluation of the population structure of the 356 Upland cotton cultivars showed that the population could be divided into two major subpopulations (Figure 1). Of the 115 accessions in P1 group, 63, 46 and 6 cultivars were from the Yellow River, North/Northwest China and Yangtze River cotton growing regions, respectively. Out of the 241 accessions in P2 group, 116 were from Yellow River, 107 from Yangtze River, 10 from North/Northwest China cotton growing regions and eight from abroad (Table S1). The North/Northwest China, Yellow River and Yangtze River region represented short-, middle- and long-growth-period cotton cultivation area in China, respectively. The P1 group contained almost all cultivars with early maturity and part of cultivars with middle maturity, while the P2 group contained almost all cultivars with late maturity and part of cultivars with middle maturity [32].

Linkage disequilibrium in Upland cotton

The extent of LD can provide information for the needed marker density and mapping resolution in AM study [25]. LD decay had been repeatedly estimated in many plant species [26],[48], while that was limited in cotton. Abdurakhmonov et al. (2008) performed a pioneer estimation [20]. They reported that, in a panel contained 285 exotic Gossypium hirsutum accessions, the genome-wide LD (r2≥0.1) declined at <10 cM in the landrace stocks and >30 cM in variety germplasm, but at r2≥0.2 which reduced to about 1–2 cM and 6–8 cM, respectively. In another panel composed of 335 G. hirsutum variety germplasm, the genome-wide LD extended up to 25 cM at r2≥0.1 and reduced to about 5–6 cM at r2≥0.2 [21]. In the present panel, the average r2 of locus pairs was 0.0103, and 18.29% were significant (P≤0.01) (Table 1), which is higher than that (13% siganificant at P≤0.01) reported by Abdurakhmonov et al. [21]. In our panel, the LD decayed to genome background level (r2 = 0.1182) within 12–13 cM (Fig. 2). If the threshold of LD decay was set to r2 = 0.1 and r2 = 0.2, the genome-wide LD extended up to 17–18 cM and 3–4 cM (Figure 2), respectively, which is shorter than those previously reported [20][21].

In the entire panel and subpopulations, both average r2 and proportion of significant LD for linked loci were all higher than those for unlinked markers (Table 1), demonstrating that physical linkage is predominant in determining LD compared with random forces in the present association panel [48]. Therefore this Upland cotton panel is suitable for association analysis and has the potential to identify QTLs in an interval equivalent to the distance of LD decay of 3–4 cM. Based on the LD decay in the panel of 335 G. hirsutum varieties, it is suggested that about 1,000 polymorphic markers be required for successful association mapping with LD extending to 5–6 cM [21]. In our panel, the LD decayed faster, suggesting that more markers are probably needed for genome wide association analysis (GWAS) of complex traits. As is often the case in self-pollinated crops [26], the level of LD in the Upland cotton genome was moderately high, suggesting that the mapping resolution gained from LD is likely to be limited. Given that genomic selection is less challenging than map-based cloning, the level of LD in the present population would guarantee that the identified SSR markers would facilitate breeding for high-yield in Upland cotton.

QTLs for lint yield identified by association mapping

Association mapping can be affected by many factors, such as population structure, relatedness among accessions, small sample size, and low frequency of specific alleles; these may increase the detection of false positive associations [25],[28]. In this study, the AM was performed in a moderately large sized panel (356 accessions) with the optimal model of MLM, considering both population structure and relatedness, to detect SSR markers associated with lint yield and its components. A total of 195 significant associations were detected between 86 SSR markers and 7 lint yield and yield component traits at the α = 0.01 (−log10P = 2) level (Table S4). It is very difficult to say which significance level is acceptable in a given association study. The use of stringent probability thresholds will reduce the danger of false positives, but meanwhile has the risk of rejecting true positives caused by setting the thresholds too high [49]. Since the present study aimed at mining favorable alleles of main QTL for lint yield, a relatively stringent significance threshold (P≤0.05/145, −log10P≥3.46) for the Bonferroni correction was adopted to reduce the experimental type I error rate induced by multiple tests [44]. After Bonferroni correction (P≤0.05/145, −log10P≥3.46), 55 associations remained significant and 74.55% (41 of 55) could be detected in more than one environment (Table 4). Population size had been considered as a factor that severely affects the QTL detection power in AM [18],[26]. Many more associations were detected in our present panel than in another 81-accession panel between the same markers and target traits at the same significance level [24, unpublished data].

Although the markers used in different studies are different, and QTL mapping results are not easy to be compared, some of the marker associations detected in this study were consistent with QTLs for lint yield and its components that had been mapped previously by conventional linkage mapping. The locus JESPR204 associated with LY on chromosome 13 (detected in 2 environments) was located in the same region as a QTL identified by Wu et al. [50]; TMK19 (Chr. 25, detected in 3 environments) was consistent with the results of our prevenient study [51]; NAU3100 (Chr. 23, detected in 2 environments) associated with SY was consistent with that found in the study of Wang et al. [8]; NAU980 (Chr. 11, detected in 2 environments), BNL3590 (Chr. 17, detected in 2 environments) and TMK19 (Chr. 25, detected in 3 environments) associated with LP were consistent with several earlier reports [11],[51],[52]; CIR246 (Chr. 14, detected in 2 environments) and NAU3100 (Chr. 23, detected in 3 environments) associated with LI were also consistent with results from several recent QTL mapping studies [11],[51],[52].

Moreover, in our study, seventeen markers were co-associated with two or more different traits, and most of the lint yield-associated markers were associated with at least one of its components, which coincided with phenotypic correlations among these traits. This could result from pleiotropy of a single causal gene or tight linkage of multiple causal genes. We found that 10 of 14 markers associated with LP were detected in all three environments, which was consistent with the phenotypic statistical analysis that LP possessed the highest broad-sense heritability (hB2 = 75.77%). The phenotype of complex traits often results from the combined actions of multiple genes and environmental factors, all these can easily lead to lost heritability [18]; only those traits with high heritability can be stably detected. The resulting stably associated markers should be useful for cotton breeding with broad adaptability to different environments.

Favorable alleles and their potential application in future cotton breeding programs

Since most Upland cotton cultivars developed in China were derived from limited founder parents, there is great challenge in genetic improvement and high risk of vulnerability to changing climate. New variations that have emerged and accumulated during the long breeding history in China should be fully exploited and additional diversity should be introduced into breeding programs to broaden the genetic basis of Chinese Upland cotton. By comparing the average phenotypic value of each allele for target traits in the 41 stably detected associations, we identified 5, 2, 3, 4, 12, 14 and 1 favorable alleles for FY, SY, BN, BW, LP, LI and SI, respectively (Table 5). We suggest that a multi-parent population should be constructed using cultivars that possess most of the favorable alleles, and in the meantime, a ranking system for MAS or genomic selection should be developed based on the results of AM. Favorable alleles that were passed down from the founder parents and have been almost fixed in modern cultivars formed the basis of lint yield of Chinese Upland cotton, and should be treated as fundamental elements in order to reject deleterious alleles at the corresponding loci. Alleles either absent in the founder cultivars or present at moderate to low frequencies in most cultivar groups have been underutilized in modern breeding programs, and should be regarded as essential elements for increasing lint yield potential.

Lint yield of cotton is the result of series components and their interactions, such as boll number, boll weight, lint percentage, lint index, and seed index. Developing potentially high-yielding cultivars thus relies to some extent on selecting the appropriate yield components. As some of the QTLs were associated with more than one yield component, favorable alleles must be treated with caution. Positively co-associated genetic loci could simultaneously improve multiple target traits, while negative linkages must be broken. In summary, the favorable alleles indentified in this study have great potential for developing high-yielding Upland cotton cultivars in future breeding programs.

Supporting Information

Table S1.

Detailed information of the 356 accessions in the association panel.

https://doi.org/10.1371/journal.pone.0082193.s001

(DOC)

Table S2.

Detailed information of 145 polymorphic SSR markers.

https://doi.org/10.1371/journal.pone.0082193.s002

(DOC)

Table S3.

Genetic diversity in different historically released cultivar groups.

https://doi.org/10.1371/journal.pone.0082193.s003

(DOC)

Table S4.

SSR markers significantly (P≤0.01, −log10P≥2.0) associated with lint yield traits and their explained phenotypic variation across three different environments.

https://doi.org/10.1371/journal.pone.0082193.s004

(DOC)

Acknowledgments

We thank Dr. XM Du of the Cotton Research Institute, Chinese Academy of Agriculture Science for providing part of Upland cotton germplasm resources used in the present study.

Contributions

Conceived and designed the experiment: TZZ, performed the experiments: HXM and XFZ, analyzed the data: HXM, and wrote the manuscript: HXM and TZZ.

Author Contributions

Conceived and designed the experiment: TZZ, performed the experiments: HXM and XFZ, analyzed the data: HXM, and wrote the manuscript: HXM and TZZ.

References

  1. 1. Zhang HB, Li YN, Wang BH, Chee PW (2008) Recent advances in cotton genomics. Int J Plant Genomics ID 742304. doi:https://doi.org/10.1155/2008/742304.
  2. 2. Xu Y, Crouch J (2008) Marker-assisted selection in plant breeding: from publications to practice. Crop Sci 48: 391–407.
  3. 3. Jannink JL, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 9: 166–177.
  4. 4. Shen X, Guo W, Zhu X, Yuan Y, Yu JZ, et al. (2005) Molecular mapping of QTLs for fiber qualities in three diverse lines in Upland cotton using SSR markers. Mol Breed 15: 169–181.
  5. 5. Zhang ZS, Xiao Y, Luo M, Li X, Luo X, et al. (2005) Construction of a genetic linkage map and QTL analysis of fiber-related traits in upland cotton (Gossypium hirsutum L.). Euphytica 144: 91–99.
  6. 6. Wang B, Guo W, Zhu X, Wu Y, Huang N, et al. (2006) QTL mapping of fiber quality in an elite hybrid derived-RIL population of upland cotton. Euphytica 152: 367–378.
  7. 7. Shen X, Guo W, Lu Q, Zhu X, Yuan Y, et al. (2007) Genetic mapping of quantitative trait loci for fiber quality and yield trait by RIL approach in Upland cotton. Euphytica 155: 371–380.
  8. 8. Wang B, Guo W, Zhu X, Wu Y, Huang N, et al. (2007) QTL mapping of yield and yield components for elite hybrid derived-RILs in upland cotton. J Genet Genomics 34: 35–45.
  9. 9. Qin H, Guo W, Zhang Y, Zhang T (2008) QTL mapping of yield and fiber traits based on a four-way cross population in Gossypium hirsutum L. Theor Appl Genet 117: 883–894.
  10. 10. Zhang ZS, Hu MC, Zhang J, Liu DJ, Zheng J, et al. (2009) Construction of a comprehensive PCR-based marker linkage map and QTL mapping for fiber quality traits in upland cotton (Gossypium hirsutum L.). Mol Breed 24: 49–61.
  11. 11. Liu R, Wang B, Guo W, Qin Y, Wang L, et al. (2012) Quantitative trait loci mapping for yield and its components by using two immortalized populations of a heterotic hybrid in Gossypium hirsutum L. Mol Breed 29: 297–311.
  12. 12. Sun FD, Zhang JH, Wang SF, Wang SF, Gong WK, et al. (2012) QTL mapping for fiber quality traits across multiple generations and environments in upland cotton. Mol Breed 30: 569–582.
  13. 13. Zhang K, Zhang J, Ma J, Tang S, Liu D, et al. (2012) Genetic mapping and quantitative trait locus analysis of fiber quality traits using a three-parent composite population in upland cotton (Gossypium hirsutum L.). Mol Breed 29: 335–348.
  14. 14. Lacape JM, Nguyen TB, Courtois B, Belot JL, Giband M, et al. (2005) QTL analysis of cotton fiber quality using multiple Gossypium hirsutum×Gossypium barbadense backcross generations. Crop Sci 45: 123–140.
  15. 15. Rong J, Feltus FA, Waghmare VN, Pierce GJ, Chee P, et al. (2007) Meta-analysis of polyploid cotton QTL shows unequal contributions of subgenomes to a complex network of genes and gene clusters implicated in lint fiber development. Genetics 176: 2577–2588.
  16. 16. Lacape JM, Llewellyn D, Jacobs J, Arioli T, Becker D, et al. (2010) Meta-analysis of cotton fiber quality QTLs across diverse environments in a Gossypium hirsutum×G. barbadense RIL population. BMC Plant Biol 10: 132.
  17. 17. Zhu C, Gore M, Buckler ES, Yu J (2008) Status and prospects of association mapping in plants. Plant Genome 1: 5–20.
  18. 18. Mackay TFC, Stone EA, Ayroles JF (2009) The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10: 565–577.
  19. 19. Kantartzi SK, Stewart JMcD (2008) Association analysis of fibre traits in Gossypium arboreum accessions. Plant Breed 127: 173–179.
  20. 20. Abdurakhmonov IY, Kohel RJ, Yu JZ, Pepper AE, Abdullaev AA, et al. (2008) Molecular diversity and association mapping of fiber quality traits in exotic G. hirsutum L. germplasm. Genomics 92: 478–487.
  21. 21. Abdurakhmonov IY, Saha S, Jenkins JN, Buriev ZT, Shermatov SE, et al. (2009) Linkage disequilibrium based association mapping of fiber quality traits in G. hirsutum L. variety germplasm. Genetica 136: 401–417.
  22. 22. Zeng L, Meredith WR Jr, Gutierrez OA, Boykin DL (2009) Identification of associations between SSR markers and fibre traits in an exotic germplasm derived from multiple cross among Gossypium tetraploid species. Theor Appl Genet 119: 93–103.
  23. 23. Rakshit A, Rakshit S, Singh J, Chopra SK, Balyan HS, et al. (2010) Association of AFLP and SSR markers with agronomic and fibre quality traits in Gossypium hirsutum L. J Genetics 89: 155–162.
  24. 24. Zhang TZ, Qian N, Zhu XF, Chen H, Wang S, et al. (2013) Variations and transmission of QTL alleles for yield and fiber qualities in Upland cotton cultivars developed in China. PLoS ONE 8: e57220.
  25. 25. Gupta PK, Rustgi S, Kulwal PL (2005) Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Mol Biol 57: 461–485.
  26. 26. Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, et al. (2009) Association mapping: critical consideration shift from genotype to experimental design. Plant Cell 21: 2194–2202.
  27. 27. Price A, Patterson N, Plenge R, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909.
  28. 28. Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38: 203–208.
  29. 29. Yu J, Holland JB, McMullen MD, Buckler ES (2008) Genetic design and statistical power of nested association mapping in maize. Genetics 178: 539–551.
  30. 30. Breseghello F, Sorrells ME (2006) Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172: 1165–1177.
  31. 31. Jia L, Yan W, Zhu C, Agrama HA, Jackson A, et al. (2012) Allelic analysis of sheath blight resistance with association mapping in rice. PLoS ONE 7: e32703.
  32. 32. Huang ZK (2007) The Cultivars and Their Pedigree of Cotton in China. China Agriculture Press, Beijing.
  33. 33. Guo WZ, Cai C, Wang C, Han Z, Song X, et al. (2007) A microsatellite-based, gene-rich linkage map reveals genome structure, function and evolution in Gossypium. Genetics 176: 527–541.
  34. 34. Zhang J, Guo W, Zhang T (2002) Molecular linkage map of allotetraploid cotton (Gossypium hirsutum L.×Gossypium barbadense L.) with a haploid population. Theor Appl Genet 105: 1166–1174.
  35. 35. Zhao L, Cai CP, Mei HX, Guo WZ (2012) Screening of microsatellite loci for identifying genome barcoding of cotton cultivars. ACTA Agronomica Sinica 38: 1810–1817.
  36. 36. Liu KJ, Muse SV (2005) PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinfo Appl Note 9: 2128–2129.
  37. 37. Pritchard JK, Wen X, Falush D (2009) Documentation for structure software: Version 2.3 http://pritch.bsd.uchicago.edu/structure.html.
  38. 38. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14: 2611–2620.
  39. 39. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
  40. 40. Hardy O, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2: 618–620.
  41. 41. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, et al. (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633–2635.
  42. 42. Mather KA, Caicedo AL, Polato NR, Olsen KM, McCouch S, et al. (2007) The extent of linkage disequilibrium in rice (Oryza sativa L.). Genetics 177: 2223–2232.
  43. 43. Xiao Y, Cai D, Yang W, Ye W, Younas M, et al. (2012) Genetic structure and linkage disequilibrium pattern of a rapeseed (Brassica napus L.) association mapping panel revealed by microsatellites. Theor Appl Genet 125: 437–447.
  44. 44. Lander ES, Botstein D (1989) Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
  45. 45. Flint-Garcia SA, Thuillet AC, Yu J, Pressoir G, Romero SM, et al. (2005) Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J 44: 1054–1064.
  46. 46. Guo WZ, Zhang T, Pan J, Wang X (1997) A preliminary study on genetic diversity of Upland cotton cultivars in China. Acta Gossypii Sinica 9: 19–24.
  47. 47. Chen G, Du X (2006) Genetic diversity of source germplasm of Upland cotton in China as determined by SSR marker analysis. Acta Genetica Sinica 33: 733–745.
  48. 48. Flint-Garcia SA, Thornsberry JM, Bucker ES (2003) Structure of linkage disequilibrium in plants. Annu Rev Plant Biol 54: 357–374.
  49. 49. Yan J, Warburton M, Crouch J (2011) Association mapping for enhancing maize (Zea mays L.) genetic improvement. Crop Sci 51: 433–449.
  50. 50. Wu JX, Gutierrez OA, Jenkins JN, McCarty JC, Zhu J (2009) Quantitative analysis and QTL mapping for agronomic and fiber traits in an RI population of upland cotton. Euphytica 165: 231–245.
  51. 51. Qin YS, Liu RZ, Mei HX, Zhang TZ, Guo WZ (2009) QTL Mapping for yield traits in Upland cotton (Gossypium hirsutum L.). Acta Agronomica Sinica 35: 1812–1821.
  52. 52. Guo WZ, Ma GJ, Zhu YC, Yi CX, Zhang TZ (2006) Molecular tagging and mapping of quantitative trait loci for lint percentage and morphological marker genes in upland cotton. J Integr Plant Biol 48: 320–326.