Favorable QTL Alleles for Yield and Its Components Identified by Association Mapping in Chinese Upland Cotton Cultivars

Linkage disequilibrium based association mapping is a powerful tool for dissecting the genetic basis underlying complex traits. In this study, an association mapping panel consisting of 356 representative Upland cotton cultivars was constructed, evaluated in three environments and genotyped using 381 SSRs to detect molecular markers associated with lint yield and its components. The results showed that abundant phenotypic and moderate genetic diversities existed within this germplasm panel. The population could be divided into two subpopulations, and weak relatedness was detected between pair-wise accessions. LD decayed to the background (r 2 = 0.1182, P≤0.01), r 2 = 0.1 and r 2 = 0.2 level within 12–13 cM, 17–18 cM and 3–4 cM, respectively, providing the potential for association mapping of agronomically important traits in Chinese Upland cotton. A total of 55 marker-trait associations were detected between 26 SSRs and seven lint yield traits, based on a mixed linear model (MLM) and Bonferroni correction (P≤0.05/145, −log10 P≥3.46). Of which 41 could be detected in more than one environment and 17 markers were simultaneously associated with two or more traits. Many associations were consistent with QTLs identified by linkage mapping in previous reports. Phenotypic values of alleles of each loci in 41 stably detected associations were compared, and 23 favorable alleles were identified. Population frequency of each favorable allele in historically released cultivar groups was also evaluated. The QTLs detected in this study will be helpful in further understanding the genetic basis of lint yield and its components, and the favorable alleles may facilitate future high-yield breeding by genomic selection in Upland cotton.


Introduction
Cotton is the most important natural textile fiber source globally. The worldwide economic impact of the cotton industry is estimated at approximately $500 billion per year with an annual utilization of about 27 million metric tons of cotton fiber. In recent years, demand for cotton fiber in the world market has dramatically increased, stock and use ratio dropped to 37% in 2010, compared to 55% in 2009. While cotton acreage has declined worldwide in the past few years, mainly due to strong competition from other crops as well as production costs (National Cotton Council, USA, http://www.cotton.org, 2012). The tetraploid species Gossypium hirsutum L. (n = 26, AD genome), commonly referred to as Upland cotton, accounts for 95% of the world's cotton production [1]. Thus, improving lint yield of Upland cotton cultivars will be critical for meeting worldwide demand, and maintaining profitability for cotton growers.
Lint yield is a complex trait in cotton, which is controlled by a large number of quantitative loci (QTLs). It is becoming progressively more difficult to improve lint yield using conventional breeding methods. Fortunately, the development in applied genomics research has provided alternative tools to improve efficiency in plant breeding programs. Molecular markers linked to the causal genes and QTLs can be used for marker-assisted selection (MAS) and/or genomic selection (GS) [2][3]. In the past two decades, a large number of QTLs for lint yield and fiber quality traits have been identified in Upland cotton [4][5][6][7][8][9][10][11][12][13]. However, approximately 80% of the previously reported QTLs could not be confirmed in subsequent studies, and few have actually been applied in breeding programs [14][15][16]. This may be because that most QTLs were population-specific, and the genetic variation detected in a unique bi-parental population might not be shared with other genetic populations, or shared but fixed in the parental lines. In addition, the limited genetic recombinations in most populations used for linkage mapping make it difficult to map QTLs with a high resolution, which severely limits their application in breeding programs. With the potential to exploit all recombination events that occurred in the evolutionary history of natural populations, linkage disequilibrium (LD) based association mapping (AM) has become a powerful approach for the dissection of complex traits and identification of causal variation with modest effects for target traits in many plant species [17][18] including cotton [19][20][21][22][23][24]. While the key constraints for the successful use of association mapping in plants are population structure and genetic relatedness, which can result in spurious marker-trait associations that may make it difficult to distinguish loci that truly affect the target traits [25][26]. Several statistical strategies have been developed to account for issues related to population structure and relatedness [27][28][29]. One powerful strategy is the unified mixed model approach (mixed linear model, MLM), which accounts for multiple levels of relatedness simultaneously, and can improve control of both type I and type II error rates [28]. In cotton, the first attempt of association mapping was reported by Kantartzi and Stewart in 2008. In that study, 30 marker and fiber trait associations were detected in 56 Gossypium arboreum accessions genotyped by 98 SSR markers [19]. Abdurakhmonov et al. performed an association mapping study, with the MLM model considering both kinship (K) and population structure (Q), of fiber quality traits by using a set of 95 core microsatellite markers in 285 exotic Gossypium hirsutum accessions and detected between 6% and 13% of SSR markers associated with the main fiber traits. Meanwhile, they indicated the genomewide LD (r 2 $0.1) declined at ,10 cM in the landrace stocks and .30 cM in variety germplasm, but at r 2 $0.2 which reduced to ,1-2 cM and ,6-8 cM, respectively [20]. Abdurakhmonov et al. performed another association mapping study of fiber quality traits using 202 microsatellite markers in a panel of 335 G. hirsutum varieties [21]. The result showed that the genome-wide LD extended up to 25 cM at r 2 $0.1 and reduced to ,5-6 cM at r 2 $0.2 and an average of ,20 SSR markers was associated with each main fiber quality trait in two environments. Zeng et al. carried out an association mapping study between 86 SSR markers and fiber traits using an exotic germplasm population of 260 lines derived from multiple crosses among Gossypium tetraploid species and found 59 markers were significantly associated with six fiber traits [22]. All the results mentioned above provided useful evidences of the potential for association mapping of agronomically important traits in cotton. But till now, association mapping study of lint yield traits has not been reported in cotton.
Although AM has been successfully used to detect the QTLs underlying quantitative traits in some crops, from a breeding standpoint, detecting associated loci is just the first step; analyzing the genetic effects of alleles and identifying favorable alleles will be more beneficial for target trait improvement. Breseghello & Sorrells identified several potentially beneficial alleles for kernel size and milling quality by comparing the average phenotypic value with specific alleles and null alleles in a soft winter wheat population [30]. Jia et al. identified some putative resistant alleles for Sheath Blight resistance in a rice panel composed of 217 accessions from the USDA core collection, and found that the number of putative resistant alleles presented in an entry was highly and significantly correlated with the decrease of ShB rating [31]. We performed a preliminary AM study in 81 Upland cotton cultivars and identified some elite alleles for yield and fiber quality traits [24]. China is the world's largest cotton-growing nation, but not a cotton domestication region. Most Upland cotton cultivars developed in China were derived from a few germplasm resources such as Deltapine (DPL), Stoneville (STV), Foster, King and Uganda, all of which were introduced from abroad [32]. Current and obsolete cultivars have been and continue to be the main resources for cotton breeding programs. Dissecting the genetic basis of lint yield and quality traits will be of great benefits to germplasm evaluation and future molecular breeding. In the present study, we aimed to detect QTLs underlying lint yield and its components, and to identify the favorable alleles in an AM panel composed of 356 accessions. We also analyzed genetic diversity, LD decay, population structure, genetic relatedness and favorable allele frequency in historically released cutivar groups. Our results should provide useful information for further understanding the genetic basis of lint yield and its components, and will facilitate future high-yield breeding by genomic selection in Upland cotton.

Association mapping panel construction
A total of 356 representative Upland cotton cultivars and breeding lines were selected from the cotton germplasm collections in our laboratory and the Cotton Research Institute, Chinese Academy of Agricultural Sciences (CRI-CAAS), and assembled to construct an AM panel. The population consisted of 348 cultivars developed in China, seven introduced from the U.S., including the genetic standard line TM-1, and one introduced from Uganda. According to their release year, the 348 Chinese cultivars could be divided into the following six groups: I 26  The cultivars introduced from abroad, DPL 15, DPL 16, STV 2B, King, Foster 6 and Uganda 3 were used as a check group for genetic diversity and allele transmission evaluation, because they had been used as the main founder parents in China's Upland cotton breeding programs and are the progenitors of many cultivars [32]. All accessions have been self-pollinated for more than six generations and their detailed information are summarized in Table S1.

Trait phenotyping
All of the accessions were planted in the following three environments to evaluate phenotypic performance: (1) Jiangpu Breeding Station, Nanjing Agricultural University, Nanjing, in 2009 (designated as E1), (2) Dafeng Agronomy Farm, Yancheng, Jiangsu Province (E2), and (3) Zhengzhou Agricultural Research Institute, Zhengzhou, Henan in 2010 (E3). The first two locations were in the Yangtze River cotton-growing region, and the third was in the Yellow River region. A randomized complete block design with single row plot and two replications was used in all field trails. The sowing dates were from late March to early April in different years and locations, and seedlings having up to 3-4 leaves were transplanted from seedbeds to fields, with 20 plants per row, a 30 cm plant-to-plant spacing, and 80 cm between rows. For most of the accessions are non-BT cottons, chemical control were used for preventing from bollworm damage and field managements were adjusted to local practices.
Field planting has been approved by Nanjing Agric Univ.. No specific permissions were required for these locations/activities since they are pure-line cultivars and the field studies did not involve endangered or protected species.

SSR genotyping
Young leaves from each of the 356 accessions were collected and stored at 220uC. Total genomic DNA was extracted from the leaf samples as described by Guo et al. [33]. Based on the dense genetic linkage map constructed in our laboratory [33], 381 pairs of SSR primers that amplify loci evenly covering the tetraploid cotton genome (one marker per 10 cM, 186 on At and 195 on Dt subgenome with an average of 14.65 markers each chromosome) were selected to genotype the 356 accessions. The procedure for PCR-amplification and product analysis followed the published methods from our laboratory [34][35]. Since G. hirsutum is an allopolyploid species, SSR markers often yield complex band patterns and some of them had been located to more than one locus. To measure the complex band patterns of large scale genotypes, the band pattern in TM-1 (genetic standard line, one parent with which the reference linkage map was constructed) was treated as a check and the following criteria were used to assign the alleles to the corresponding loci: 1) when only one fragment was amplified in each accession, the fragments were regarded as alleles belonging to the single locus; 2) when multiple fragments were amplified in each line and the bands showed an obvious cosegregating relationship among different samples, they were regarded as alleles belonging to the same locus; and 3) when multiple bands produced in each line did not co-segregate among different accessions, the corresponding fragments in TM-1 that had been mapped to the reference map and co-segregated among different accessions were measured, other bands were discarded. According to the above criteria, the band pattern in TM-1 was designated as 1, the same patterns were also designated as 1, and the different ones were designated as 2, 3, 4, 5 and so on, thus the alleles from all accessions on each locus were measured. Markers with more than 10% missing data were not used in further analysis.

Genotypic data analysis
Summary statistics including the total number of alleles, the number of alleles per locus, and gene diversity values were calculated using the software PowerMarker 3.25 [36]. The Bayesian model-based program STRUCTURE 2.3 was used to infer the population structure using 66 unlinked or weakly linked SSR markers [37]. The length of the burn-in period and the number of Markov Chain Monte Carlo replications after burn-in were all assigned at 100,000 with an admixture and allele frequencies correlated model. Five independent run iterations were performed with the hypothetical number of subpopulations (k) ranging from 1 to 10. The correct estimation of k was provided by joining the log probability of data [LnP(D)] from the STRUCTURE output and an ad hoc statistic Dk [38]. Based on the correct k, each accession was assigned to a subpopulation for which the membership value (Q value) was .0.5 [39], and the population structure matrix (Q) was generated for further markertrait association mapping. The software SPAGeDi was used to calculate the pair-wise relatedness coefficients (K, kinship matrix) in order to estimate the genetic relatedness among individuals, with the negative value of kinship set to zero [40]. To estimate LD pattern in Upland cotton genome, the weighted average of squared correlation coefficient r 2 of each pair of SSR loci was calculated using the software package TASSEL 2.1 with rare alleles (allele frequency less than 0.05) treated as missing data [41]. The r 2 was estimated for total, linked and unlinked markers both in the entire panel and each subpopulation, respectively. The 99th percentile of r 2 distribution for unlinked markers, which determined whether LD is due to physical linkage [42], was treated as the background LD level [43].The r 2 values of each pair of SSR loci were plotted against map distance (cM), and LD decay was estimated.

Phenotypic data analysis
Statistical analysis of all phenotypic data across three environments was performed with SAS 8.0 software (SAS Institute 1999). Analysis of variance (ANOVA) of all phenotypic data was calculated with PROC GLM, based on the trait means for each line across the three environments. Decomposition of variance components (genotype, environment, block, and the interactions among these factors) was evaluated using PROC VARCOMP, and the broad-sense heritability (h B 2 ) of each trait was estimated with the variance components. Correlation coefficients between traits were calculated with PROC CORR.

Association mapping and favorable allele identification
Because the MLM model accounts for the effects both of population structure and genetic relatedness, and can significantly reduce spurious associations [28], the marker-trait AM was carried out with the MLM model as implemented in TASSEL software, and the P value and R 2 for each marker-trait association were determined [41]. Based on the results of AM, QTL alleles of loci significantly associated with the target traits were further analyzed. The phenotypic allele effect was estimated through comparison between the average phenotypic value over accessions with the specific allele and that of all accessions: where a i is the phenotypic effect of the ith allele; x ij is the phenotypic value over the jth accession with the ith allele; n i is the number of accessions with the ith allele; N k is the phenotypic value over all accessions; n k is the number of accessions. If the value of a i .0, the allele is considered to have a positive effect, if it is ,0, it corresponds to a negative allele. The favorable alleles were then identified according to the breeding objective of each target trait [24].

Genetic diversity, population structure and genetic relatedness
Of the 381 SSR markers selected, only 145 amplified polymorphism (67 of 186 in At and 78 of 195 in Dt subgenome) in the present panel and a total of 415 alleles were detected (Table  S2). The allele number, gene diversity and polymorphism information content (PIC) value of the 145 loci averaged 2.86, 0.32 and 0.27, respectively; with ranges of 2-9, 0.01-0.73 and 0.27-0.68, respectively. Approximately 80% of the polymorphic loci (115 of 145) had only two or three alleles. Among the 415 alleles detected, population frequencies of 131 alleles were rare (less than 0.05) and 34 were unique (detected in only one accession). The total number of alleles and the number of alleles per locus detected in the six historically released cultivar groups were much greater than that in the six founder parents (Table S3).
The model-based evaluation of the population structure of the 356 Upland cotton cultivars showed that the LnP(D) value corresponding to each hypothetical k kept increasing with k value and did not show any peak. The Dk value showed a much higher likelihood at k = 2 than at k = 3-10 ( Figure 1), suggesting that the total panel could be divided into two major subpopulations [38], designated as P1 and P2, respectively. The P1 group contained 115 accessions including 63 cultivars from Yellow River cotton growing region, 46 lines from North and Northwest China regions, and six cultivars from Yangtze River region. The P2 group consisted of 241 accessions including 116 lines from Yellow River cotton growing region, 107 lines from Yangtze River region, 10 lines from the North and Northwest China regions, and eight lines intrduced from abroad (Table S1). Then, the corresponding Q matrix at k = 2 was used for the following association analysis.
For the kinship coefficient values, 86.85% was less than 0.05, 8.56% had a range of 0.05-0.10, and the remaining 4.59% showed various degrees of genetic relatedness (data not shown).
Based on the results of the relatedness analysis, a K matrix was constructed for association mapping.

Pair-wise linkage disequilibrium across the whole genome
The r 2 was calculated for total, linked and unlinked markers ( Table 1), respectively, with SSR loci on the same chromosome considered as linked and those from different chromosomes as unlinked. In the entire panel, the average r 2 of locus pairs was 0.0103, and 18.29% were significant (P#0.01). Moreover, 21.03% of the linked locus pairs and 18.18% of the unlinked pairs showed significant LD (P#0.01) with the average r 2 of 0.0160 and 0.0101, respectively. In the subpopulation P1 and P2, the average r 2 of locus pairs was 0.0151 and 0.0104, respectively, and the proportion of significant LD (P#0.01) was 5.10% and 10.78%, respectively. In the entire panel and subpopulations, both average r 2 and proportion of significant LD for linked loci were all higher than those for unlinked markers ( Table 1).
The r 2 value and genetic distance of each pair of SSR loci was plotted into a scatter diagram, and then a curve was drawn to describe the trend of LD decay using the nonlinear regression model [43]. The curve exhibited a clear decay of LD with increase in genetic distance ( Figure 2). In this study, the 99th percentile of r 2 distribution for unlinked markers, which determined the background level of LD, was 0.1182; and LD decayed to the background level within 12-13 cM. If the threshold of LD decay was set to r 2 = 0.1 and r 2 = 0.2, the genome-wide LD extended up to about 17-18 cM and 3-4 cM, respectively ( Figure 2).

Variation of phenotypic traits
Seven traits for lint yield and its components were measured in 356 Upland cotton accessions across three different environments. Each trait varied widely ( Table 2), and the ANOVA showed that the genotype (G) and the interactions between genotype and environmental factors (G6E) were both significant (P#0.01) for all the seven traits. The mean coefficient of variance for FY, SY, BN, BW, LP, LI and SI was 29.09%, 23.19%, 19.36%, 9.11%, 9.33%, 12.48% and 9.02%, respectively, demonstrating that there was a high degree of diversity in lint yield traits of Chinese Upland cotton cultivars. The broad sense heritabilty (h B 2 ) for the seven traits had a range of 27.34-75.77% in the reference population ( Table 2). The highest h B 2 value was for LP (75.77%), indicating that LP was less impacted by environmental factors than the other six traits.
Phenotypic correlation analysis showed that there were significant positive correlations between lint yield and its most components, while the negative correlation between lint yield and SI was also significant ( Table 3)

Markers associated with lint yield and its components
The marker-trait AM was performed with the MLM model, considering both kinship (K) and population structure (Q), implemented in TASSEL software. At the a = 0.01 (2log 10 P = 2) level, a total of 195 significant associations were detected between 82 SSR markers and seven lint yield traits (Table S4). Among these, most of the associations (125 of 195) were detected in only one environment, and the proportion of phenotypic variation explained by markers ranged from 0.0152 to 0.0940, with an average of 0.0370 (Table S4).
In this study, 145 markers were used for detecting association, so the same statistical test was performed 145 times at a significance level of 0.01, and the experimental type I error rate would be much higher than 0.01. To overcome this problem, the Bonferroni correction (P#0.05/145, 2log 10 P$3.46) was used to obtain an appropriate significance threshold [44]. After Bonferroni correction, 55 associations were found to be significant between 26 SSR markers and seven lint yield traits, and the results are shown in Table 4. Most (41 associations between 23 SSR markers and seven lint yield traits) of the associations could be detected in more than one environment, and the proportion of phenotypic variation explained by markers ranged from 0.0163 to 0.0940, with an average of 0.0451. The number of SSR markers associated with LY, SY, BN, BW, LP, LI and SI were 9, 4, 6, 4, 14, 17 and 1, respectively. Seventeen loci were co-associated with two or more different traits ( Table 4). For example, NAU3269 (Chr. 5) and NAU3100 (Chr. 23) were simultaneously associated with FY, SY, BN, LP, and LI, and most of the lint yield-associated loci were associated with at least one of its components.

Favorable QTL alleles and their transmission in Chinese Upland cotton cultivars
Phenotypic effects of each QTL allele for the 41 associated loci detected in more than one environment were measured according to the method mentioned above, and 5, 2, 3, 4, 12, 14 and 1 favorable alleles for FY, SY, BN, BW, LP, LI and SI were identified, respectively. Phenotypic effects and representative accessions for each favorable allele are shown in Table 5. Among the favorable alleles, NAU3100-2 had the most positive phenotypic effect for FY and SY, and increased FY and SY by 3.61 g and 7.27 g, respectively; NAU6584-2, NAU3398-2, NAU5166-2 and NAU3917-2 increased BN, BW, LP and LI by 0.89, 0.42 g, 4.93% and 0.94 g, respectively; while NAU493-1 deceased SI by 0.17 g.
Allele frequencies of the 23 favorable alleles in the CK group and the six Chinese historically released cultivar groups are summarized in Table 6. Based on allele frequencies across the different groups, these favorable alleles could be categorized into three classes. The alleles in the first class, such as JESPR135-1, BNL1404-1 and Gh508-1, presented in the founder cultivars and with high frequency in all populations, might have been passed down stably from the original parents and were almost fixed in modern cultivars by selection. Alleles in the second class, such as BNL3269-2, BNL1414-2, NAU3100-2 and JESPR208-2, presented in the founder cultivars and with moderate to low frequency in most populations, should have been underutilized in modern breeding programs. Those in the third class, such as NAU5166-2, NAU980-3, Gh369-3 and CIR246-3, not presented in the founder cultivars and presented at low frequency in modern cultivars, might be from other original parents or could have been generated by mutations and/or recombinations. Favorable alleles, especially of the latter two classes, should have a great potential in future Upland cotton genetic improvement.

Genetic diversity and population structure of the association panel
A suitable association mapping panel should embrace as much phenotypic and genotypic diversity as can be reliably measured in common environments [45]. Most Upland cotton cultivars developed in China were derived from a few germplasm resources introduced from abroad and therefore the genetic base is narrow [46][47]. It is especially critical to select samples that encompass genetic diversity as much as possible. In this study, the 356 Upland cotton accessions, which can normally flower and ripen for target trait evaluation, were chosen from more than 1000 cultivars and breeding lines in CRI-CAAS and NAU germplasm collections. The phenotypic measurments in three different locations indicated that there was a high degree of diversity in lint yield and its component traits ( Table 2). Of the 381 SSR markers, only 145 were found to be polymorphic in the 356 Upland cotton accessions, indicating that intraspecific genetic diversity is far less than interspecific diversity; for the reference linkage map of allotetraploid cotton was constructed with a BC 1 mapping population derived from an interspecific cross (G. hirsutum TM-16G. barbadense Hai7124) [33]. Eighty percent of the 145 polymorphic loci only generated two or three alleles, and the allele frequencies of 131 of the 415 alleles were ,0.05 (Table S2), showing that the genetic diversity in this panel is relatively low, which might affect the QTL detection power of AM in Upland cotton.
Many crops have a long and complex history of domestication and breeding, such as Upland cotton, and complex population structures may confound AM [48]. It is important to consider the influence of population structure and relationships between individuals in the AM panel [27][28]. The model-based evaluation of the population structure of the 356 Upland cotton cultivars    showed that the population could be divided into two major subpopulations ( Figure 1). Of the 115 accessions in P1 group, 63, 46 and 6 cultivars were from the Yellow River, North/Northwest China and Yangtze River cotton growing regions, respectively. Out of the 241 accessions in P2 group, 116 were from Yellow River, 107 from Yangtze River, 10 from North/Northwest China cotton growing regions and eight from abroad (Table S1). The North/Northwest China, Yellow River and Yangtze River region represented short-, middle-and long-growth-period cotton cultivation area in China, respectively. The P1 group contained almost all cultivars with early maturity and part of cultivars with middle maturity, while the P2 group contained almost all cultivars with late maturity and part of cultivars with middle maturity [32].

Linkage disequilibrium in Upland cotton
The extent of LD can provide information for the needed marker density and mapping resolution in AM study [25]. LD decay had been repeatedly estimated in many plant species [26,48], while that was limited in cotton. Abdurakhmonov et al. (2008) performed a pioneer estimation [20]. They reported that, in a panel contained 285 exotic Gossypium hirsutum accessions, the genome-wide LD (r 2 $0.1) declined at ,10 cM in the landrace stocks and .30 cM in variety germplasm, but at r 2 $0.2 which reduced to about 1-2 cM and 6-8 cM, respectively. In another panel composed of 335 G. hirsutum variety germplasm, the genomewide LD extended up to 25 cM at r 2 $0.1 and reduced to about 5-6 cM at r 2 $0.2 [21]. In the present panel, the average r 2 of locus pairs was 0.0103, and 18.29% were significant (P#0.01) ( Table 1), which is higher than that (13% siganificant at P#0.01) reported by Abdurakhmonov et al. [21]. In our panel, the LD decayed to genome background level (r 2 = 0.1182) within 12-13 cM (Fig. 2). If the threshold of LD decay was set to r 2 = 0.1 and r 2 = 0.2, the genome-wide LD extended up to 17-18 cM and 3-4 cM (Figure 2), respectively, which is shorter than those previously reported [20][21].
In the entire panel and subpopulations, both average r 2 and proportion of significant LD for linked loci were all higher than those for unlinked markers ( Table 1), demonstrating that physical linkage is predominant in determining LD compared with random forces in the present association panel [48]. Therefore this Upland cotton panel is suitable for association analysis and has the potential to identify QTLs in an interval equivalent to the distance of LD decay of 3-4 cM. Based on the LD decay in the panel of 335 G. hirsutum varieties, it is suggested that about 1,000 polymorphic markers be required for successful association mapping with LD extending to 5-6 cM [21]. In our panel, the LD decayed faster, suggesting that more markers are probably needed for genome wide association analysis (GWAS) of complex traits. As is often the case in self-pollinated crops [26], the level of LD in the Upland cotton genome was moderately high, suggesting that the mapping resolution gained from LD is likely to be limited. Given that genomic selection is less challenging than map-based cloning, the level of LD in the present population would guarantee that the identified SSR markers would facilitate breeding for highyield in Upland cotton.

QTLs for lint yield identified by association mapping
Association mapping can be affected by many factors, such as population structure, relatedness among accessions, small sample size, and low frequency of specific alleles; these may increase the detection of false positive associations [25,28]. In this study, the AM was performed in a moderately large sized panel (356 accessions) with the optimal model of MLM, considering both population structure and relatedness, to detect SSR markers associated with lint yield and its components. A total of 195 significant associations were detected between 86 SSR markers and 7 lint yield and yield component traits at the a = 0.01 (2log 10 P = 2) level (Table S4). It is very difficult to say which significance level is acceptable in a given association study. The use of stringent probability thresholds will reduce the danger of false positives, but meanwhile has the risk of rejecting true positives caused by setting the thresholds too high [49]. Since the present study aimed at mining favorable alleles of main QTL for lint yield, a relatively stringent significance threshold (P#0.05/145, 2log 10 P$3.46) for the Bonferroni correction was adopted to reduce the experimental type I error rate induced by multiple tests [44]. After Bonferroni correction (P#0.05/145, 2log 10 P$3.46), 55 associations remained significant and 74.55% (41 of 55) could be detected in more than one environment ( Table 4). Population size had been considered as a factor that severely affects the QTL detection power in AM [18,26]. Many more associations were detected in our present panel than in another 81-accession panel between the same markers and target traits at the same significance level [24, unpublished data].
Although the markers used in different studies are different, and QTL mapping results are not easy to be compared, some of the  marker associations detected in this study were consistent with QTLs for lint yield and its components that had been mapped previously by conventional linkage mapping. The locus JESPR204 associated with LY on chromosome 13 (detected in 2 environments) was located in the same region as a QTL identified by Wu et al. [50]; TMK19 (Chr. 25, detected in 3 environments) was consistent with the results of our prevenient study [51]; NAU3100 (Chr. 23, detected in 2 environments) associated with SY was consistent with that found in the study of Wang et al. [8]; NAU980 (Chr. 11, detected in 2 environments), BNL3590 (Chr. 17, detected in 2 environments) and TMK19 (Chr. 25, detected in 3 environments) associated with LP were consistent with several earlier reports [11,51,52]; CIR246 (Chr. 14, detected in 2 environments) and NAU3100 (Chr. 23, detected in 3 environments) associated with LI were also consistent with results from several recent QTL mapping studies [11,51,52]. Moreover, in our study, seventeen markers were co-associated with two or more different traits, and most of the lint yieldassociated markers were associated with at least one of its components, which coincided with phenotypic correlations among these traits. This could result from pleiotropy of a single causal gene or tight linkage of multiple causal genes. We found that 10 of 14 markers associated with LP were detected in all three environments, which was consistent with the phenotypic statistical analysis that LP possessed the highest broad-sense heritability (h B 2 = 75.77%). The phenotype of complex traits often results from the combined actions of multiple genes and environmental factors, all these can easily lead to lost heritability [18]; only those traits with high heritability can be stably detected. The resulting stably associated markers should be useful for cotton breeding with broad adaptability to different environments.

Favorable alleles and their potential application in future cotton breeding programs
Since most Upland cotton cultivars developed in China were derived from limited founder parents, there is great challenge in genetic improvement and high risk of vulnerability to changing climate. New variations that have emerged and accumulated during the long breeding history in China should be fully exploited and additional diversity should be introduced into breeding programs to broaden the genetic basis of Chinese Upland cotton. By comparing the average phenotypic value of each allele for target traits in the 41 stably detected associations, we identified 5, 2, 3, 4, 12, 14 and 1 favorable alleles for FY, SY, BN, BW, LP, LI and SI, respectively ( Table 5). We suggest that a multi-parent population should be constructed using cultivars that possess most of the favorable alleles, and in the meantime, a ranking system for MAS or genomic selection should be developed based on the results of AM. Favorable alleles that were passed down from the founder parents and have been almost fixed in modern cultivars formed the basis of lint yield of Chinese Upland cotton, and should be treated as fundamental elements in order to reject deleterious alleles at the corresponding loci. Alleles either absent in the founder cultivars or present at moderate to low frequencies in most cultivar groups have been underutilized in modern breeding programs, and should be regarded as essential elements for increasing lint yield potential.
Lint yield of cotton is the result of series components and their interactions, such as boll number, boll weight, lint percentage, lint index, and seed index. Developing potentially high-yielding cultivars thus relies to some extent on selecting the appropriate yield components. As some of the QTLs were associated with more than one yield component, favorable alleles must be treated with caution. Positively co-associated genetic loci could simultaneously improve multiple target traits, while negative linkages must be broken. In summary, the favorable alleles indentified in  1930-1960, 1961-1970, 1971-1980, 1981-1990, 1991- this study have great potential for developing high-yielding Upland cotton cultivars in future breeding programs.