Variations and Transmission of QTL Alleles for Yield and Fiber Qualities in Upland Cotton Cultivars Developed in China

Cotton is the world's leading cash crop, and genetic improvement of fiber yield and quality is the primary objective of cotton breeding programs. In this study, we used several approaches to identify QTLs related to fiber yield and quality. First, we constructed a four-way cross (4WC) mapping population from four base core cultivars in Chinese cotton breeding history, Stoneville 2B, Foster 6, Deltapine 15 and Zhongmiansuo No.7 (CRI 7), and identified 83 QTLs for 11 agronomic and fiber quality traits. Second, we carried out association mapping of agronomic and fiber quality traits with 121 simple sequence repeat (SSR) markers using a general linear model (GLM). For this, 81 Gossypium hirsutum L. accessions, including the four core parents and their derived cultivars, were grown in seven diverse environments. Using these approaches, we identified 180 QTLs significantly associated with agronomic and fiber quality traits. Among them, 66 QTLs identified by linkage disequilibrium (LD) mapping were also detected by 4WC family-based linkage mapping or by previously published family-based linkage (FBL) mapping in modern Chinese cotton cultivars. Twenty-eight consistent QTLs were identified by both 4WC and LD mapping, and 44 by both FBL and LD mapping. Furthermore, analysis of the transmission and variation of QTL alleles mapped by LD association across the three breeding periods revealed that some alleles could be detected in almost all Chinese cotton cultivars, suggesting stable transmission, whereas others were identified only in the four base cultivars and not in the modern cultivars, suggesting that they were missed in conventional breeding. These results will be useful for conducting genomics-assisted breeding that uses these existing and novel QTL alleles to improve yield and fiber quality in cotton.


Introduction
Cotton is the most important natural textile fiber source globally. The worldwide economic impact of the cotton industry is estimated at approximately $500 billion per year, with an annual utilization of approximately 115 million bales, or 27 million metric tons, of cotton fiber. The tetraploid species Gossypium hirsutum L. (n = 26, AD genome), also referred to as 'Upland cotton', accounts for 95% of the world's cotton production (National Cotton Council, USA, http://www.cotton.org, 2006). Current and obsolete cultivars of Upland cotton have been the main germplasm sources for cotton breeding programs worldwide.
China is the largest cotton-growing nation, but it is not a center of Upland cotton domestication. Most cotton cultivars planted in China were derived from a few germplasm sources such as Deltapine (DPL), Stoneville (STV), Foster and King, all of which were introduced from America. These cultivars represent the foundation of Chinese cotton breeding programs and played a crucial role in the development of cultivars bred in China. Cotton breeding in China has passed through several periods, and cultivar replacement began in the 1920s. In 1919, the King cultivar was introduced, followed by the Trice and Lonestar cultivars in 1920. STV 4 and Delfos 531 were introduced in 1935-1936 and DPL in 1946. In 1950, large quantities of DPL 15 and STV 2B were introduced directly to replace all G. arboreum cultivars, which had been planted in China for several thousand years, and the deteriorated Upland cotton cultivars that had been introduced previously [1][2]. In 1959, several cultivars were developed from cotton introduced from Uganda. Following these introductions, Chinese breeders started to develop cultivars via pedigree selection (PSP), and later hybridization programs (HSP) were conducted to develop high-yielding cotton cultivars with resistance to Fusarium wilt [2]. Because only a limited number of sources were used, the genetic base was narrow and the genetic diversity of Upland cotton was low, especially in China [3][4].
Intra-specific genetic linkage maps of Upland cotton have been developed and used to identify quantitative trait loci (QTL) for agronomic and fiber quality traits [5][6][7][8][9][10][11][12][13][14][15]. Linkage disequilibrium (LD)-based association mapping has been suggested as a powerful genetic tool to identify DNA markers that are in LD with a locus controlling the trait of interest. This method is convenient because it avoids the need to screen large biparental mapping populations [16]. LD can be detected statistically, and has been used to map genes underlying complex genetic traits in humans [17][18]. Association mapping was introduced to plant genetics in 2001 [19] and was subsequently applied to many plant species [20][21]. Identification of QTL by association mapping is now widely used and has been employed in genetic studies of rice, corn, barley and other important agricultural crops [22][23][24][25]. Breseghello et al. (2006) used association mapping in 95 cultivars of soft winter wheat to identify alleles for kernel size and milling quality [26]. On the basis of the association of 62 SSR loci with kernel size and milling quality traits, the authors compared the average phenotypic value of accessions carrying specific alleles with that of accessions carrying null alleles, and were able to identify several alleles potentially beneficial for these traits. In that study, 'null allele' referred to markers that were no longer detected by PCR because of a mutation. Therefore, the phenotypic effect was judged against the marker's other allele [27]. However, not all markers have null alleles, and even where null alleles exist, they may be difficult to identify.
In the present study, four previously introduced Upland cotton cultivars (STV 2B, Foster 6, DPL 15, and CRI 7, a Ugandan germplasm-derived cultivar) were used to construct a four-way cross mapping population (4WC). This population was used to detect QTLs influencing agronomic and fiber quality traits. At the same time, we conducted LD-based association mapping using simple sequence repeat (SSR) markers. We measured important agronomic and fiber quality traits in 81 representative cultivars that were cultivated in China before transgenic cotton was introduced. Using this approach, a draft transmission table of QTLs and QTL alleles of breeding traits in Chinese Upland cotton was obtained. Some elite QTL alleles for yield and fiber quality traits were mined. The results provide preliminary insight into the genetic basis and diversity of Upland cotton cultivars, and offer useful information for cotton breeding and for further research.

4WC Mapping Population and Trait Evaluation
Seeds of STV 2B, Foster 6, DPL 15 and CRI 7 were provided by the Cotton Research Institute, Chinese Academy of Agricultural Sciences (CRI-CAAS). A mapping population consisting of 239 individuals was constructed from the 4WC (STV 2B/Foster 6//DPL 15/CRI 7), and was grown and evaluated for fiber quality and yield in 2007 at the Jiangpu Breeding Station of Nanjing Agricultural University (JBS/NAU), Nanjing, China. Owing to insufficient self-pollinated seed, 220 4WC families (F2:3 progeny families) were grown in 2008 at JBS/NAU in one-row plots in a randomized block design with three replications to evaluate their performance. Each plot was 0.8 m wide and 5 m long, and the plant density was approximately 37,500 plants per hectare. For each trait, fifteen individuals per replication were measured and averaged (n = 3) for the four parents and the 220 4WC families in 2008. The following seven agronomic traits were evaluated: plant height (PH, cm), number of fruit branches per plant (PB), number of bolls per plant (NB), boll weight (BW, g), lint percentage (LP), lint index (LI) and seed index (SI). Lint yield (LY) was determined by multiplying lint percentage by total seed cotton weight. The following fiber quality traits were evaluated by HVI spectrum: 2.5% fiber span length (FL, mm), strength (FS, cN/tex), elongation (FE), micronaire reading (FM), and uniformity ratio (FU).

Linkage Map Construction and QTL Mapping
DNA was extracted from the 239 4WC individuals, the two F1s and the four inbred parents as described previously by our laboratory [28]. To screen for polymorphisms among the inbred parents, 8,342 SSR primer pairs available in our laboratory were used. These SSRs included NAU, BNL, CIR, JESPR, STV, MUSS, MUCS, TM, CER, CGR, DC, DPL and SHIN, which were described previously in detail [29][30][31][32][33][34]. Primer sequences can be obtained from the Cotton Microsatellite Database (CMD, http://www.cottonmarker.org). Marker nomenclature consisted of letters specifying the origin of the marker, followed by the primer number. The procedure for SSR analysis followed our published method of Zhang et al. (2000) [35].
All SSR primer pairs were used to screen for polymorphisms among STV 2B, Foster 6, DPL 15 and CRI 7. If a locus was homozygous in both F1 parents (aa_bb), it was excluded from linkage analysis because its alleles would not segregate in the 4WC. The polymorphic markers identified between STV 2B and Foster 6, or between DPL 15 and CRI 7, were used to survey the 239 individuals of the 4WC. A chi-square test for goodness of fit was used to assess Mendelian segregation ratios, including 1:1, 1:2:1, 3:1 and 1:1:1:1 ratios, in the 4WC.
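The segregation test described above can be illustrated with a minimal Python sketch that computes a chi-square goodness-of-fit statistic for observed genotype counts against an expected Mendelian ratio. The function name, the example counts and the hard-coded 5% critical values are illustrative assumptions, not data or software from this study.

```python
# Chi-square goodness-of-fit test for a Mendelian segregation ratio.
# Critical values at alpha = 0.05 for 1-3 degrees of freedom (standard tables).
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815}

def segregation_chi2(observed, ratio):
    """Return (chi2, df, fits) for observed class counts vs. an expected ratio."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, chi2 <= CRITICAL_005[df]

# Hypothetical counts for 239 individuals at a locus expected to segregate 1:2:1.
chi2, df, fits = segregation_chi2([58, 123, 58], (1, 2, 1))
print(f"chi2 = {chi2:.3f}, df = {df}, fits 1:2:1 = {fits}")
```

Loci for which `fits` is false at the chosen threshold would be flagged as showing segregation distortion.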
JoinMap 3.0 [36] was employed to construct linkage maps, and linkage groups were assigned to chromosomes based on anchored markers in a high-density linkage map [28].
QTL analysis was carried out using MapQTL 5.0 [37]. LOD significance thresholds were calculated by permutation tests in MapQTL 5.0 (n = 1,000 permutations): QTLs exceeding the genome-wide threshold at α = 0.05 were declared significant, and QTLs exceeding the linkage group-wide threshold at α = 0.05 were declared suggestive [38]. The QTL position refers to the location of the LOD peak. QTL nomenclature followed the convention used in rice [39]: a name starts with 'q', followed by an abbreviation of the trait name (for example, FL for fiber length and FS for fiber strength), the chromosome name, and the serial number of the QTL affecting that trait on the chromosome.
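The idea behind a permutation-based genome-wide threshold can be sketched as follows. This is a simplified illustration that uses a single-marker LOD score on simulated two-class markers; it is not the interval mapping procedure implemented in the QTL software used in this study, and all function names and data are assumptions made for the example.

```python
import math
import random

def single_marker_lod(genotypes, phenotypes):
    """LOD of a single-marker model: (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for g in set(genotypes):
        ys = [y for gi, y in zip(genotypes, phenotypes) if gi == g]
        class_mean = sum(ys) / len(ys)
        rss_marker += sum((y - class_mean) ** 2 for y in ys)
    if rss_null == 0.0:
        return 0.0  # no phenotypic variation, nothing to explain
    if rss_marker == 0.0:
        return math.inf  # marker classes explain all variation
    return (n / 2) * math.log10(rss_null / rss_marker)

def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum LOD over all markers under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        max_lods.append(max(single_marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]

# Simulated data: 20 two-class markers scored on 100 individuals, no real QTL.
rng = random.Random(1)
markers = [[rng.randint(0, 1) for _ in range(100)] for _ in range(20)]
phenotypes = [rng.gauss(0, 1) for _ in range(100)]
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
print(f"genome-wide LOD threshold (alpha = 0.05): {threshold:.2f}")
```

Restricting `markers` to the loci of one linkage group gives the corresponding linkage group-wide threshold used for suggestive QTLs.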

Population-based Association Mapping
A total of 81 representative Upland cotton cultivars were used in this experiment (Table 1). These cultivars (excluding transgenic Bt cotton) were obtained from the cotton germplasm collection in our laboratory and from CRI-CAAS. They can be grouped into three types: the first type includes cultivars introduced directly from the USA and Uganda and planted in China; the second type includes improved cultivars developed from first-type cultivars by PSP or a single round of HSP; and the third type includes further improved cultivars developed by HSP or other breeding methods. The cultivars can also be classified by ecological area: the Yangtze River valley, the Yellow River valley, the Northern China area and America (Table 1).
The 81 cultivars were grown and evaluated in four locations: JBS/NAU in the Yangtze River valley cotton growing region from 2006 to 2008; Linqing/Shandong in the Yellow River valley cotton growing region in 2008; Kuerl/Xinjiang in the Northwestern cotton growing region in 2007; and Sanya/Hainan in the Southern cotton growing region in 2007 and 2008. A randomized complete block design with two replications was employed for the field trials. Field management followed local practice. The same 12 agronomic and fiber quality traits mentioned above (see 4WC Mapping Population and Trait Evaluation) were evaluated. Genome-wide LD between pairs of SSR marker loci was studied according to Witt and Buckler (2003) using the software package TASSEL ver. 2.0i (http://www.maizegenetics.net/tassel) [40]. LD was estimated as a weighted average of squared allele frequency correlations (r²) between SSR loci. The significance of pairwise LD (p-values ≤ 0.01) among all possible pairs of SSR loci was evaluated in TASSEL with the rapid permutation test using 10,000 random draws with replacement.
The LD values between all pairs of SSR loci were plotted as triangle LD plots using TASSEL to obtain a general view of genome-wide LD patterns and to evaluate LD structure. The r² values for pairs of SSR loci were plotted as a function of map distance (cM), and LD decay (at r² < 0.1) was estimated [40].
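To make the LD statistic concrete, the sketch below computes r² for a pair of biallelic loci and reads off a decay distance from binned r² values. It is a simplified stand-in under stated assumptions: the study used a weighted average of r² over multi-allelic SSR loci as implemented in TASSEL, whereas this example treats loci as biallelic (0/1 coded in homozygous lines), and the binning rule, function names and data are hypothetical.

```python
def r_squared(locus_a, locus_b):
    """Squared allele-frequency correlation (r^2) between two biallelic loci.

    Each locus is a list of 0/1 allele codes, one entry per homozygous line.
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")

def ld_decay_distance(pairs, bin_width=5.0, cutoff=0.1):
    """Lower edge (cM) of the first distance bin whose mean r^2 is below cutoff."""
    bins = {}
    for distance, r2 in pairs:
        bins.setdefault(int(distance // bin_width), []).append(r2)
    for b in sorted(bins):
        if sum(bins[b]) / len(bins[b]) < cutoff:
            return b * bin_width
    return None  # r^2 never dropped below the cutoff

# Hypothetical genotypes of eight lines at two loci.
locus_1 = [1, 1, 1, 0, 0, 0, 1, 0]
locus_2 = [1, 1, 0, 0, 0, 0, 1, 1]
print(r_squared(locus_1, locus_2))

# Hypothetical (map distance in cM, r^2) values for linked locus pairs.
pairs = [(2.0, 0.45), (4.0, 0.30), (7.0, 0.18), (9.0, 0.14),
         (12.0, 0.08), (14.0, 0.06), (22.0, 0.02)]
print(ld_decay_distance(pairs))
```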
To evaluate the population structure of the association mapping population, the software package STRUCTURE 2.2 [41][42][43] was employed to subdivide the cultivars into genetic subgroups. One hundred and thirty-one unlinked or distantly linked marker loci (hereafter referred to as "unlinked"), distributed over all the cotton chromosomes, were used to assess population structure. The number of subgroups (K) was set from 1 to 10. For each K, three independent runs were performed. The burn-in was set to 10,000 and the number of replications was set to 100,000. The general linear model (GLM) association test was performed according to Yu et al. (2006) [44] using the TASSEL software package [45].

DNA Extraction and Microsatellite Markers
An equal quantity of fresh, young leaves from each variety was collected and immediately brought to the laboratory, where total genomic DNA was extracted as described previously by our laboratory [28].
Our study is based on a genetic map that contains 3,147 loci in 26 linkage groups and was constructed in our laboratory [28,46]. We selected one pair of SSR primers every 10 cM on this map. This resulted in 402 primer pairs, which were used to screen the 81 cultivars, ensuring broad genome-wide coverage for genotyping and a representative estimation of genetic distances.
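One simple way to pick roughly one marker per 10 cM from a dense map is a greedy pass along each linkage group. The sketch below shows this idea only; the exact selection procedure used in the study is not described beyond the 10 cM spacing, and the marker names and positions are hypothetical.

```python
def thin_markers(markers, spacing=10.0):
    """Greedy thinning: keep a marker once it lies >= spacing cM past the last kept one.

    `markers` is a list of (name, position_cM) tuples for one linkage group.
    """
    kept = []
    last_position = None
    for name, position in sorted(markers, key=lambda m: m[1]):
        if last_position is None or position - last_position >= spacing:
            kept.append(name)
            last_position = position
    return kept

# Hypothetical markers on a single linkage group.
linkage_group = [("M1", 0.0), ("M2", 3.2), ("M3", 10.5), ("M4", 14.0),
                 ("M5", 21.0), ("M6", 29.9), ("M7", 31.5)]
print(thin_markers(linkage_group))
```

Applying the function to each of the 26 linkage groups in turn would yield a genome-wide panel with approximately even spacing.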

Mining of QTL Alleles
Based on the results of SSR association with the 12 traits, QTL alleles that were significantly associated with the traits were analyzed further. The phenotypic effect of each allele was estimated by comparing the average phenotypic value of accessions carrying the specific allele with that of all accessions:

$$a_i = \frac{\sum_j x_{ij}}{n_i} - \frac{\sum_k N_k}{n_k}$$

where $a_i$ is the phenotypic effect of the ith allele; $x_{ij}$ is the phenotypic value of the jth material carrying the ith allele; $n_i$ is the number of materials carrying the ith allele; $N_k$ is the phenotypic value of the kth accession among all accessions; and $n_k$ is the number of all accessions. If $a_i > 0$, the allele is considered positive; if $a_i < 0$, it is considered negative.
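The allele-effect estimate described above (mean phenotype of the carriers of an allele minus the mean phenotype of all accessions) can be sketched in a few lines of Python. The accession names, allele codes and lint percentage values below are hypothetical and serve only to show the calculation.

```python
def allele_effects(genotypes, phenotypes):
    """Effect of each allele = mean phenotype of its carriers - mean of all accessions.

    `genotypes` maps accession -> allele at one locus;
    `phenotypes` maps accession -> trait value.
    """
    overall_mean = sum(phenotypes.values()) / len(phenotypes)
    carriers = {}
    for accession, allele in genotypes.items():
        carriers.setdefault(allele, []).append(phenotypes[accession])
    return {allele: sum(values) / len(values) - overall_mean
            for allele, values in carriers.items()}

# Hypothetical lint percentage (%) of four accessions and their alleles at one SSR locus.
phenotypes = {"cv1": 38.0, "cv2": 40.0, "cv3": 36.0, "cv4": 42.0}
genotypes = {"cv1": 1, "cv2": 2, "cv3": 1, "cv4": 2}
effects = allele_effects(genotypes, phenotypes)
print(effects)  # allele 1 is negative, allele 2 is positive for this trait
```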

4WC and Family-based QTL Mapping for Yield and Fiber Qualities
Mean values, standard deviations, ranges, skewness, and kurtosis for the traits measured in the parents and 4WC families are shown in Table 2. All traits from these four data sets exhibited continuous distributions in the 4WC population. ANOVA showed significant differences (P < 0.05) for all 12 traits among the four parents and in the population tested here. Of the 8,342 SSR primer pairs, only 238 (2.85%) detected polymorphisms between STV 2B and Foster 6, or between DPL 15 and CRI 7, and these generated 246 loci. In this 4WC screening, three polymorphic types, comprising two, three and four alleles, can theoretically be identified. Of the 246 polymorphic loci, 240 (97.6%) produced two alleles and 6 (2.4%) produced three alleles (ab_ac), but none produced four alleles at one locus (ab_cd). A linkage map with 201 SSR loci in 58 linkage groups was constructed; it covered 1691.0 cM with an average interval of 8.4 cM between loci. Based on our microsatellite-based, gene-rich linkage map [28,46], 54 linkage groups were assigned to 25 chromosomes, that is, all chromosomes except chromosome D4 (chro.D4). Of these, 24 linkage groups were assigned to the A-subgenome (containing 86 loci and spanning 654.3 cM) and 30 linkage groups to the D-subgenome (containing 104 loci and spanning 885.4 cM) (Figure 1).
The yield and fiber quality data of the 239 4WC-F2 plants and their 220 F2:3 family lines were used to detect QTLs by interval mapping. In total, 83 QTLs were identified, which explained 2.6% to 73.9% of the total phenotypic variance (PV). The characteristics of the QTLs detected in each analysis, including position, confidence interval, LOD score, mean values of the four genotypes, PV, the additive effects a1 and a2 and the overall dominance effect (d), are shown in Figure 1 and Table S1. A total of 59 QTLs for yield components and 24 QTLs for the five fiber quality traits were detected in the two generations. Among the 59 QTLs for yield components, seven (qPH-A7-1, qPH-D1-1, qPH-D7-1, qBW-D2-1, qLP-A2-1, qLP-D3-1 and qLI-D3-1) were significant, and three were detected in both generations. Among the significant QTLs contributing to PH, the negative a1 of qPH-A7-1 indicated that the trait-increasing allele came from Foster 6. Similarly, the trait-increasing alleles of both qPH-D1-1 and qPH-D7-1 came from STV 2B (positive a1) and DPL 15 (positive a2), those of qBW-D2-1 from Foster 6 and DPL 15, and those of qLI-D3-1 from STV 2B and DPL 15. Accordingly, DPL 15 had a greater impact on agronomic traits than the other three parents. Among the 24 QTLs detected for the five fiber quality traits, STV 2B contributed six QTLs that increased FM and FL, Foster 6 contributed nine QTLs that increased FL, FS and FU, and DPL 15 and CRI 7 contributed eight and seven QTLs, respectively, that enhanced fiber quality.
Comparing 4WC QTL mapping with our previously published results obtained with the traditional family-based linkage (FBL) method in modern Chinese cotton cultivars or germplasm lines [5][7][8][10][11][47][48][49][50], we found 28 consistent QTLs (28/59, 47.5%) between these four base cultivars and modern Chinese cotton cultivars (Table 3). This result indicates that these QTLs have been stably transmitted or inherited and can be used in marker-assisted selection (MAS) breeding to improve cotton yield and fiber quality in the future.

Population-based Association QTL Mapping for Yield and Fiber Qualities
LD is the basis of association mapping. Analysis of genome-wide LD between SSR loci indicates the status of LD in the cotton genome. In this study, the proportion of locus pairs in significant LD (P < 0.01) was low, accounting for only 2.93% (624/21321) of all pairs, which indicates that the level of LD in the cotton genome was low. We also determined the structure of haplotypic LD, since a strong block-like LD structure simplifies LD mapping of complex traits. Triangle plots of pairwise LD between SSR markers demonstrated significant LD blocks in the genome-wide LD analysis. The r² values decayed very fast: the maximum distance of LD decay among the cotton cultivars in this study was approximately 13-14 cM (Figure S1). The results of STRUCTURE showed that the Chinese Upland cotton cultivars (Table 1) could best be divided into four subgroups (Figure S2).
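As a quick check of the proportion reported above, the denominator 21,321 appears to correspond to all pairwise combinations of 207 loci, which matches the number of polymorphic SSRs reported for the 81 cultivars in this study. The short calculation below reproduces both figures; the variable names are illustrative.

```python
from math import comb

n_loci = 207             # polymorphic SSRs reported for the 81 cultivars
n_pairs = comb(n_loci, 2)  # all unordered locus pairs
n_significant = 624      # locus pairs in significant LD (P < 0.01)

percent_significant = round(100 * n_significant / n_pairs, 2)
print(n_pairs, percent_significant)
```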
QTLs for yield and fiber quality detected by 4WC mapping, LD association mapping and FBL QTL mapping in modern Chinese cotton cultivars are summarized in Table 3. There were 66 population-based QTL associations for the 12 yield and fiber quality traits that were also detected by either 4WC mapping or FBL mapping in modern Chinese cotton cultivars. Comparing 4WC mapping and LD mapping, we found 28 consistent QTLs (28/180, 15.56%). Furthermore, 44 consistent QTLs (44/180, 24.44%) were shared between conventional FBL mapping in modern Chinese cotton cultivars and LD mapping (Table 3), which suggests that these are stably inherited QTLs that can be used in MAS breeding. We expect that the more cotton cultivars are used to tag QTLs, the more consistent QTLs will be detected.

Mining of Elite QTL Alleles to Improve Yield and Fiber Qualities in Cotton
Among the 402 amplified SSRs, 207 were polymorphic and produced a total of 541 alleles. The average number of alleles per locus was 2.61, ranging from 2 to 7. More than half of the polymorphic primer pairs (120 SSR primer pairs) generated two alleles. The wide range indicated that variation among cotton cultivars was present at the genome level, whereas the low mean value indicated that the genetic basis of variation in Upland cotton was limited.
Phenotypic effects of some elite QTL alleles significantly associated with agronomic and fiber quality traits, and their typical carrier materials, are shown in Table S3. Each QTL had positive and/or negative alleles to some extent. Among the alleles associated with LP, qNAU3398-3 in Simian 4 had the most positive phenotypic effect and increased LP by 8.26%, whereas NAU5166-3 in Shanmian1 had the most negative phenotypic effect (-11.49%). Among the alleles associated with PH, qNAU5091-2 had the most positive phenotypic effect (5.10 cm), whereas qJESPR232-2 and qJESPR227-2 had the most negative phenotypic effect (-18.3 cm). Among the alleles of loci associated with FS, qNAU2156-2 in CRI4133 had the most positive phenotypic effect and increased fiber strength by 1.80 cN/tex, while qNAU2156-3 in 52-128 had the most negative phenotypic effect (-0.94 cN/tex).

Transmission and Variations of QTL Alleles for Yield and Fiber Qualities among Chinese Cotton Cultivars
The transmission and variation of elite QTL alleles for each trait in the three breeding periods are summarized in Table 4. This table shows which QTL alleles were passed down from the four core cultivars, which ones were present in the four core cultivars but were not selected by breeders when developing modern Chinese cotton cultivars, and which ones were new and/or unreported QTL alleles associated with agronomic and fiber quality traits. It enabled us to classify the QTL alleles detected in the present study into three types, illustrated here using lint percentage as an example (Table 4). The first type of QTL alleles, such as qNAU3917-1 and qBNL3103-1, was detected in all four core cultivars and was transferred into most cultivars in the two subsequent breeding periods. These QTL alleles should be regarded as the basic genetic constitution for lint development. The second type, such as qNAU1302-1 and qNAU3700-1, was detected in three core cultivars and transferred into some of the cultivars during the two breeding periods. The third type, such as qNAU5166-2 and qNAU3398-3, which can greatly increase lint percentage by 6.48% and 8.26%, respectively, was found neither in the four core cultivars nor in most Chinese cultivars. These QTL alleles may have been introduced from other sources, perhaps by genetic recombination, and have great potential for increasing lint percentage and lint yield in MAS breeding.
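The three-type classification described above can be expressed as a simple rule on the presence of an allele in the core cultivars and in the derived cultivars. The sketch below is one possible formalization under stated assumptions: the 50% cutoff for "most cultivars", the handling of borderline cases (anything present in at least one core cultivar that is not type 1 is treated as type 2) and the derived cultivar names are hypothetical and are not taken from Table 4.

```python
CORE = ("STV 2B", "Foster 6", "DPL 15", "CRI 7")

def classify_allele(carriers, n_derived_total, majority=0.5):
    """Classify a QTL allele by where it is found.

    type 1: present in all core cultivars and in most derived cultivars
    type 2: present in at least one core cultivar but not meeting type 1
    type 3: absent from all core cultivars
    """
    in_core = sum(1 for cultivar in CORE if cultivar in carriers)
    in_derived = sum(1 for cultivar in carriers if cultivar not in CORE)
    if in_core == 0:
        return "type 3"
    if in_core == len(CORE) and in_derived / n_derived_total > majority:
        return "type 1"
    return "type 2"

# Hypothetical derived cultivars and carrier sets for three alleles.
derived = [f"cv{i}" for i in range(1, 11)]
allele_x = set(CORE) | set(derived[:8])
allele_y = {"STV 2B", "Foster 6", "DPL 15"} | set(derived[:3])
allele_z = {"cv9"}
for carriers in (allele_x, allele_y, allele_z):
    print(classify_allele(carriers, len(derived)))
```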

Discussion
In the present study, we identified 180 QTLs using 121 SSR markers, and these were significantly associated with 12 agronomic and fiber quality traits. Among them, 66 QTLs identified via LD mapping for the 12 yield and fiber quality traits were also detected by either 4WC or FBL mapping in some modern Chinese cotton cultivars. There were 28 consistent QTLs between our 4WC and LD association mapping, and 44 consistent QTLs between conventional FBL mapping in modern Chinese cotton cultivars and LD mapping. Comparison of 4WC, LD association and FBL QTL mapping suggested that some of these QTLs were transmitted and/or retained during conventional breeding selection from the four introduced core cultivars and may be very important for agronomic performance and fiber quality in cotton. Our results show that LD-based association mapping using diverse sets of cultivated cotton germplasm is a useful tool for detecting QTLs efficiently.

Association Mapping Based on LD is an Alternative Powerful Tool to Exploit the Natural Genetic Diversity in Cotton
LD-based association mapping is an alternative powerful molecular tool to exploit the natural genetic diversity conserved within crop germplasm collections. The resolution of association mapping depends on the extent and distribution of LD across the genome within a given population [51]. The extent of LD has been quantified, and association mapping has been used successfully, in many plant species [21]. In sugar beet (Beta vulgaris L.), genome-wide LD extended up to 3 cM [52], but in some Arabidopsis populations, LD exceeded 50 cM [53]. In barley (Hordeum vulgare L.), genome-wide LD is very common for distances <10 cM [54], whereas the situation is very different in maize (Zea mays L.), in which LD diminished after 2,000 bp [51].
Though association mapping based on LD was successfully used in some crops, it is important to consider the influence of mixed population structure and relationship of individuals in association mapping [42,44,55]. Many crops have a long and complex history of domestication and breeding, and complex population structures may confound association mapping [56][57].
Overall, the small extent of LD in the cotton genome illustrates the significant potential of LD-based association mapping for agronomic and fiber quality traits in cotton, provided that a relatively large number of markers of various types is available. However, the limited polymorphism between Upland cotton cultivars may reduce the mapping resolution, particularly in breeding germplasm. As cross-pollination is common in cotton, the LD level in the cotton genome was low, and only 2.93% of locus pairs were in significant LD. LD decay was measured at 13-14 cM in cotton. Considering that the tetraploid cotton genome has a total recombination length of about 5,200 cM and an average of 400 kb per cM [58], the LD blocks are still small, so that association mapping of complex traits would require nearly 1,000 polymorphic markers. It is difficult to reach such a high density using only SSR markers, which highlights the need for new molecular markers. As next generation sequencing techniques develop, any progress in sequencing tetraploid cotton will advance association mapping of complex traits based on single nucleotide polymorphisms.
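The marker-density argument above can be made explicit with a short calculation that uses only the figures quoted in the text (genome length of about 5,200 cM, about 400 kb per cM, LD decay at 13-14 cM). The assumption of evenly spaced markers and the function names are simplifications made for this sketch.

```python
import math

GENOME_LENGTH_CM = 5200  # total recombination length quoted in the text
KB_PER_CM = 400          # average physical size per cM quoted in the text

def markers_for_spacing(spacing_cm):
    """Evenly spaced markers needed to place one marker every `spacing_cm` cM."""
    return math.ceil(GENOME_LENGTH_CM / spacing_cm)

def spacing_for_markers(n_markers):
    """Average spacing (cM) obtained with `n_markers` evenly spaced markers."""
    return GENOME_LENGTH_CM / n_markers

for decay_cm in (13, 14):
    block_mb = decay_cm * KB_PER_CM / 1000  # physical extent of LD decay in Mb
    print(decay_cm, markers_for_spacing(decay_cm), block_mb)
print(spacing_for_markers(1000))
```

One marker per LD decay interval would need roughly 370 to 400 evenly spaced markers, whereas 1,000 markers correspond to an average spacing of about 5.2 cM, that is, two to three markers per decay interval. In practice markers are neither evenly spaced nor all polymorphic, which is consistent with the higher number suggested in the text.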

Potential Uses of the Identified QTL Alleles in Future Genomics-assisted Cotton Breeding
LD-based association mapping using a GLM approach with 81 Upland cotton cultivars laid the foundation for a potential genomics-assisted breeding program in cotton. We analyzed the association between SSR marker genotypes and the phenotypes of the cultivars, averaged over environments for every trait, and detected a number of elite alleles associated with 12 agronomic and fiber quality traits in Upland cotton. These will be useful for MAS breeding programs to develop cultivars with high yield and superior fiber quality. We suggest that a genomics-assisted ranking system for QTL alleles should be developed based on LD association mapping. First, great attention should be paid to those QTL alleles that are not found in the four core cultivars or in most other Chinese cultivars. They may have been introduced from other sources via genetic recombination and may hold great potential for increasing lint yield and fiber quality. For example, qNAU5166-2 and qNAU3398-3 increased lint percentage by 6.48% and 8.26%, respectively. The more cotton germplasm lines are surveyed, the more elite QTL alleles may be mined.
Second, in MAS breeding it would be prudent to select those QTL alleles that can be detected in all four core cultivars and in most other Chinese cultivars, since they may represent a basic genetic requirement. Examples are qNAU3917-1 and qBNL3103-1. In addition, QTL alleles that were detected in the core cultivars but not in most Chinese cultivars (such as qNAU1302-1 and qNAU3700-1) may represent desirable traits.
Third, a genomics-assisted breeding program to pyramid QTL alleles could be developed on the basis of LD association mapping. For example, in our study qBNL3792-3 was associated with an increase in the number of fruit branches, qNAU5166-2 was associated with enhanced lint percentage, qNAU4921-2 contributed to increased fiber length, and qNAU2156-2 was associated with fiber strength. In view of these specific links between phenotype and genotype, both phenotype and genotype should be considered when selecting mating parents, to achieve maximum complementarity between materials.
To improve fiber quality, for example, one could cross material carrying the alleles qNAU4921-2 and qNAU1048-1, which enhance fiber length efficiently, with material carrying the allele qNAU2156-2, which increases fiber strength. It would then be possible to select cultivars with superior fiber quality from their offspring through MAS programs.
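The parent-selection idea described above can be sketched as a ranking of candidate crosses by how many target alleles the two parents carry between them. The allele names are those mentioned in the text, whereas the parent lines, their allele sets and the scoring rule are hypothetical assumptions made for this illustration.

```python
from itertools import combinations

# Target alleles named in the text for fiber length and fiber strength.
TARGETS = {"qNAU4921-2", "qNAU1048-1", "qNAU2156-2"}

def rank_crosses(parents, targets):
    """Rank parent pairs by the number of target alleles covered by the pair."""
    scored = []
    for (name_a, alleles_a), (name_b, alleles_b) in combinations(sorted(parents.items()), 2):
        covered = (alleles_a | alleles_b) & targets
        scored.append((len(covered), name_a, name_b))
    # Highest coverage first; ties broken alphabetically for reproducibility.
    return sorted(scored, key=lambda s: (-s[0], s[1], s[2]))

# Hypothetical candidate parents and the target alleles each one carries.
parents = {
    "Line A": {"qNAU4921-2", "qNAU1048-1"},
    "Line B": {"qNAU2156-2"},
    "Line C": {"qNAU4921-2"},
}
ranking = rank_crosses(parents, TARGETS)
print(ranking[0])
```

A fuller scheme could weight each allele by its estimated phenotypic effect rather than counting alleles equally.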