QTL Mapping for Fiber and Yield Traits in Upland Cotton under Multiple Environments

A population of 178 recombinant inbred lines (RILs) was developed using a single seed descendant from a cross between G. hirsutum. acc DH962 and G. hirsutum. cv Jimian5, was used to construct a genetic map and to map QTL for fiber and yield traits. A total of 644 polymorphic loci were used to construct a final genetic map, containing 616 loci and spanning 2016.44 cM, with an average of 3.27 cM between adjacent markers. Statistical analysis revealed that segregation distortion in the intraspecific population was more serious than that in the interspecific population. The RIL population and the two parents were phenotyped under 8 environments (two locations, six years), revealing a total of 134 QTL, including 64 for fiber qualities and 70 for yield components, independently detected in seven environments, explaining 4.40–15.28% of phenotypic variation (PV). Among the 134 QTL, 9 common QTL were detected in more than one environment, and 22 QTL and 19 new QTL were detected in combined analysis (E9). A total of 26 QTL hotspot regions were observed on 13 chromosomes and 2 larger linkage groups, and some QTL clusters related to fiber qualities or yield components were also observed. The results obtained in the present study suggested that to map accurate QTL in crops with larger plant types, such as cotton, phenotyping under multiple environments is necessary to effectively apply the obtained results in molecular marker-assisted selection breeding and QTL cloning.


Introduction
Cotton (Gossypium L.) is an important economic crop, providing most of the natural textile fiber utilized worldwide. Upland cotton (G. hirsutum) is widely cultivated, and planted in more than 70 countries, contributing to over 95% of the total cotton yield worldwide [1]. Because of the softness and comfort of cotton fiber, cotton products are very popular. In recent decades, improvements in cotton fiber quality and yield have been stagnant and unable to meet the demands of the modern textile industry. However, yield is often negatively correlated with fiber quality in upland cotton [2,3]. Conventional cultivar breeding programs, primarily selecting novel allele combinations based on phenotypic selection [4], is difficult to break the linkage of negatively correlated traits. Fortunately, the development of genetic linkage maps facilitate the dissection of quantitative trait loci (QTL) that control fiber qualities and yield components, which make it possible to pyramid elite genes of fiber quality and yield traits.
Because of the low genetic polymorphism between the intraspecific hybridization of upland cotton, several high-density interspecific linkage maps between G. hirsutum and G. barbadense have been constructed to study QTL for fiber quality and yield traits [5][6][7]. However, due to the sterility and segregation distortion of interspecific progeny, intraspecific hybridization has become the primary method in breeding programs, contributing to the recent development of upland cotton intraspecific genetic maps [8][9][10][11][12][13][14][15].
As the most important cotton cultivar, much research attention has been paid to the improvement of the fiber quality and yield of upland cotton. The QTL mapping of fiber quality traits could provide a solid foundation for future studies concerning marker-assisted selection breeding and map-based cloning. Hundreds of QTL associated with fiber quality and yield components have been obtained from F 2 , F 2:3 , and RIL populations in upland cotton [8,9,[12][13][14][15][16][17][18][19][20][21][22][23][24][25]. The F 2 population is a common population used in genetic map construction and QTL mapping, but these studies cannot be replicated; thus, stable and available QTL could not be identified using the F 2 population. The construction of immortalized mapping populations is an effective approach to obtain stable QTL, such as recombination inbred lines (RILs). However, the complex allotetraploid genome and agronomic traits in crops are inherited in a complex manner, suggesting that cotton traits are highly affected by environmental and climatic conditions, obtaining stable QTL in allotetraploid cotton is difficult. Tang et al. (2015) constructed a genetic map, containing 1,540 loci spanning 2,842.06 cM, and a total of 62 QTL were identified using combined analysis and single environment analysis; seventeen QTL were detected in more than one environment. Ning et al. (2014) identified 86 QTL for yield components and fiber qualities from an RIL population. In addition, a stable fiber strength QTL (qFS-D3-1), explaining 4.51-17.55% of the phenotypic variation (PV), and a stable fiber length QTL (qFL-D11-1), explaining 10.02-25.34% of the PV, were obtained.
In the present study, two upland cottons, DH962 and Jimian5, with different fiber qualities and yield component traits [21], were used as parents to establish a recombinant inbred line (RIL) population. The objectives of this study were to construct an intraspecific upland cotton map using SSRs, InDels and SNPs based on this RIL population, which was used to detect QTL associated with fiber quality and yield traits under multiple environmental conditions.

Mapping population and DNA isolation
The G. hirsutum acc. DH962 and G. hirsutum cv. Jimian5 were used as the mapping parents. DH962 was derived from the [(Jinmian6 × G. thurberi) F 4 × Jinmian6] F 3 population, showing good performance in fiber quality as a female parent and continuous self-pollination since 2001. Jimian5 is a cultivar with high yield as a male parent. DH962 and Jimian5 were crossed to obtain F 1 plants on the farm at Huazhong Agricultural University (HAU), Wuhan, China, in the summer of 2002 [10]. The F 1 plants were planted during the winter in Hainan Province and self-pollinated to produce the F 2 generation. The F 2 seeds were planted and self-pollinated to produce F 2:3 seeds on the farm of HAU in 2003. An RIL population was developed using the single seed descendant method to generate F 2:7 plants, which were subsequently planted at HAU for propagation in 2007. The F 7:8 generations of 178 RIL families were used in the present study. Genomic DNA was isolated from the fresh leaves of 178 RIL plants and parents using a CTAB procedure [26]. . These fields are only used for research purposes, and the field studies did not involve endangered or protected species. The fiber quality data in E7 and yield components in E1 were lost, and combined analysis (E9) was conducted after determining the mean values in seven environments. The lines were planted in single-row plots of 5 m in length with 0.8 m row spacing. All the lines were planted in the field using a randomized block. Twenty bolls from each line were simultaneously harvested for fiber quality and yield investigation. Six yield components and 5 fiber quality trait components were analyzed, including boll number per plant (BN), seed cotton weight per boll (SCW), lint weight per boll (LW), lint percentage (LP), lint index (LI), seed index (SI), fiber length (FL, mm), fiber strength (FS, cN/tex), fiber length uniformity ratio (FU), fiber elongation (FE), and micronaire (MIC).

DNA marker analysis
A total of 634 SSRs, InDels, and SNPs, selected according to Wang et al. (2015) [15], were used to genotype each RIL plant. PCR amplification and silver staining were performed according to Lin et al. (2005) [27]. The SRAP markers were not genotyped in this RIL population because these polymorphisms are difficult to crosstalk with those identified in previous studies. The PCR products of SSRs were separated on 6% denaturing polyacrylamide gels [27] or 8% native polyacrylamide gels using single-strand conformation polymorphism (SSCP) technology [28]. The PCR products of InDels and SNPs were separated on 8% native polyacrylamide gels using SSCP technology [28].

Data analysis, genetic map construction and QTL analysis
The difference between the two parents for each trait was detected using a t-test. The broadsense heritabilities of measured traits were calculated according to the method of Knapp et al. (1985) [29]. The coefficients of genetic correlation between measured traits were computed according to the method of Kwon and Torrie (1964) [30]. The phenotype data were analyzed using SPSS version 21.0 (SPSS, Chicago, IL, USA).
A chi-square test was performed to determine whether the genotypic frequencies at each locus deviated from the expected 1:1 segregation ratio in the RIL population. The genetic linkage map of the RIL population was constructed using JoinMap 3.0 [31] with a logarithm of odds (LOD) threshold of 4.0 and a maximum recombination fraction of 0.4. Map distances in centi-Morgans (cM) were calculated using the Kosambi mapping function [32]. Linkage groups were assigned to chromosomes based on BC 1 [33] and F 2 [10] linkage maps and marker mapping information on the CottonGen database (http://www.cottongen.org/). The linkage groups were named as "ChrΧ" based on the linkage group length from long to short. QTL were identified using Windows QTL Cartographer version 2.5 (http://statgen.ncsu.edu/qtlcart/ WQTLCart.htm) based on composite interval mapping (CIM). The statistical significance of the LOD threshold value was determined using a permutation procedure (1,000 times for all traits). The QTL nomenclature was adapted according to the method of McCouch et al. (1997) [34]. The resulting linkage map and QTL were drawn using MapChart V2.2 software [35].

Meta-analysis of the co-located QTL
The meta-analysis of the co-located QTL was conducted using Biomercator V3 software (http://moulon.inra.fr/index.php/fr/component/docman/cat_view/21-logiciels/101-abiproject-and-software/104-biomercator-v21). A map file and a QTL file were required to import into the Biomercator V3 software [36]. The map file contains map name, marker name, and distance between adjacent makers, etc. And the QTL file contains map name, QTL name, chromosome name, trait, LOD score, phenotypic variance (PV), position of the QTL, etc. The order 'meta-analysis' was used to integrate the QTL to detect the hotspot region.

Trait performance and correlation analysis in the RIL population
The traits of fiber qualities and yield components are summarized in S1 Table. Overall, the trait values of DH962 were higher than those of Jimian5 in fiber qualities, and Jimian5 demonstrated higher trait values than DH962 in yield components, except that no significant difference was observed for SI and LI. The RIL population performed transgressive segregation on all traits. The results of ANOVA, shown in S2 Table revealed that most of fiber quality and yield component traits presented significant genetic and environmental effects (P < 0.01), except that SI showed no significantly environmental effect. And the broad-sense heritabilities of the fiber quality and yield component traits were showed in Table 1. The genetic potential of fiber quality and yield component traits were general low in cotton [37][38][39]. Boll number had the lowest broad-sense heritability (16.19%), fiber length had the highest broad-sense heritability (70.82%).
Genetic correlation analysis between fiber quality and yield component traits was calculated based on covariance (Table 2). FL was significantly and positively correlated with FS, FU, and significantly and negatively correlated with MIC, FE, LW, LP and BN. FS was significantly and positively correlated with FU and SI, and significantly and negatively correlated with MIC, FE, LW, LP, BN and LI. Among the yield component traits, most traits were positively correlated between two traits, except that SI was significantly and negatively correlated with LP.

Genetic map construction
A total of 644 loci were obtained from the selected 634 primer pairs from Wang et al. (2015) after genotyping the RIL population. After linkage analysis, 616 of 644 loci were mapped on 59 linkage groups; the total length of the linkage map was 2016.44 cM, with a mean distance of 3.27 cM between adjacent markers (Fig 1). The map included 538 SSR loci, 32 InDel loci and 46 SNP loci. Among the 59 linkage groups, there were 2-58 loci on each linkage group with 1.88-104.57 cM long. Fifty-three linkage groups were assigned to 24 chromosomes and 4 ones were unanchored; and most of the loci from two larger linkage groups (LG1-Chr1/15 and LG2-Chr9/23) were mapped on two pairs of homologous chromosomes (Chr1 and Chr15, Chr9 and Chr23, respectively).
Fiber length uniformity ratio. Seven QTL were detected on 7 chromosomes (Chr9, Chr14, Chr17, Chr20, Chr21, Chr24, and Chr26), explaining 5.09-8.67% of the PV, with LOD scores ranging from 2.52 to 4.13 (S3 Table). Only one QTL was located in the A T genome, and six QTL were located in the D T genome. Among which, 5 QTL showed positive additive effects originating from 'DH962', and 2 QTL showed negative effects originating from 'Jimian5'.
Seed index and lint index. Two and two QTL were identified for SI and LI, respectively (S3 Table). Two QTL of SI explained 6.90 and 14.38% of the PV, with LOD scores of 3.49 and 6.64, respectively. Two QTL for LI explained 5.02 and 7.30% of the PV, with LOD scores of 2.69 and 3.71, respectively. The QTL qLI-c17 was detected in E6 and combined analysis (E9).

Discussion
In the present study, 616 loci were mapped on 59 linkage groups, and the total length of the linkage map was 2016.44 cM, covering 40.33% of the upland cotton genome. Compared with high-density interspecific genetic maps [40][41][42][43], the intraspecific genetic map generated in the present study showed low coverage throughout the cotton genome. This phenomenon was also observed for the construction of other genetic maps in upland cotton [11,13,14]. Due to the narrow genetic base of upland cotton, we screened most of the SSRs in the CottonGen database (http://www.cottongen.org) and some SNPs and InDels developed in the laboratory [15], but the results were not satisfactory, and with the increase in map density, the efficiency of QTL detection was greatly improved [7,15]. There is no high-density and coverage genetic map for upland cotton, which is the huge obstacle for QTL mapping in upland cotton. In previous studies [14,15], the effect of SSRs was lower in the construction of the upland cotton genetic map, and the development of lots of SNPs and InDels using next-generation sequencing technology could provide a new outlet for future studies.
Segregation distortion widely exists in the study of population genetics. In the present study, 144 (22.36%) of the 644 loci showed segregation distortion. In previous studies of cotton [3,8,11,13,14,19,27,42,[44][45][46][47][48][49][50], many reports have focused on segregation distortion, we summarized these results in S5 Table. The percentages of distorted loci in interspecific populations ranged from 7.65% to 25.50%, while they ranged from 22.36% to 81.25% in intraspecific populations. Although the genetic difference is bigger, the segregation distortion is smaller in interspecific populations, and this phenomenon presents an interesting topic for future studies. S5 Table also showed that most of the distorted loci skewed toward upland cotton in interspecific populations, and most of the distorted loci skewed toward one of the parents in upland cotton in intraspecific populations. Lacape et al. (2010) showed that the average fiber characteristics of the interspecific RILs were closer to the G. hirsutum parent than to the G. barbadense parent, and 71% and 29% of the observed distorted allelic skewed toward G. hirsutum and G. barbadense, respectively. As the important cultivated cotton, most of the varieties of upland cotton were improved, and some good genes from other Gossypium species were introgressed into upland cotton through hybridization and backcross methods. The introgression of good genes generated the genetic differences observed between the mapping parents of upland cotton, resulting in segregation distortion [8,11,13,14,19]. The number of distorted loci skewed toward 'DH962' and 'Jimian5' were similar, and the percentage of distorted loci was lower than that in other upland cotton populations in the present study, suggesting that the ability of selection and the combination of gametophyte mapping parents are similar. In addition, the fiber quality and yield component traits were significantly different (S1 Table), and all the results showed that the two parents were suitable for the development of the RIL population.
The clustering of QTL in tetraploid cotton has been reported in some studies [3,8,11,13,14,19,20,47,51,52]. The present study also identified 26 QTL hotspot regions, among which 17 QTL hotspot regions affected two or more different fiber quality or yield component traits. The phenomenon of QTL clustering might represent the linkage of genes and QTL or result from pleiotropic effects of a single QTL in the same genomic region [3]. For example, NAU5107, detected on E5, was the nearest marker of qFS-c1/15-1 and qFE-c1/15-4; and FS was significantly and negatively correlated with FE in the present study (Table 2). These QTL hotspot regions revealed that the linkage drag of QTL hindered the improvement of fiber quality [20]. In addition, 11 larger QTL clusters were also obtained in the present study. As shown in Fig 1, we observed that most of the clusters showed the enrichment of fiber quality or yield component traits. On LG1-Chr1/15, most of the QTL were correlated with FS and FE in the cluster. On Chr13, most of the QTL were correlated with LP and BN in the cluster. On LG2-Chr9/23, 18 QTL were distributed on the QTL clusters, 8 QTL were associated with SCW, and 5 QTL were associated with LW. Previous studies [51,52] generated multiple QTL clusters and hotspots of fiber quality or yield component traits in the cotton genome through the analysis of most of the publications on cotton. The QTL clusters provided valuable information to determine genome regions with different traits.
S2 Table showed that all the fiber quality and yield component traits presented significant environmental effects, and the present study revealed that cotton traits were highly affected by environments and climate conditions. An example of meteorological information of E5, E6, E7 and E8 is shown in S6 Table and S2 Fig. S6 Table showed that the number of rainy day during the four years were different. For example, 14 rainy days were reported in E7, but 6 rainy days were reported in E8 in August. And S7 Table showed that the temperature presented significant environmental effects (P < 0.01) in July and August, and these two months represented a critical period for blossom and fiber development in cotton. The different climate conditions during the two months might have seriously affected the fiber quality and yield of cotton.
Among the 134 QTL detected in the present study, 9 common QTL were obtained in more than one environment. The difficulty of obtaining stable QTL has been reported in the previous studies [8,11,12,14]. Two QTL, qFL-c10-1 and qMIC-c25, were detected in three environments and combined analysis (E9), and the results revealed that fiber quality traits were more stable than yield traits under multiple environmental conditions. Notably, two stable QTL were not clustered with any other QTL associated with fiber quality and yield traits, providing an opportunity to identify candidate genes for improving the fiber quality of upland cotton.

Author Contributions
Conceived and designed the experiments: ZL. Performed the experiments: HW CH HG XL WZ BD ZY. Analyzed the data: HW. Wrote the paper: HW ZL.