Dissecting Genetic Network of Fruit Branch Traits in Upland Cotton by Association Mapping Using SSR Markers

Genetic architecture of branch traits has large influences on the morphological structure, photosynthetic capacity, planting density, and yield of Upland cotton (Gossypium hirsutum L.). This research aims to reveal the genetic effects of six branch traits, including bottom fruit branch node number (BFBNN), bottom fruit branch length (BFBL), middle fruit branch node number (MFBNN), middle fruit branch length (MFBL), upper fruit branch node number (UFBNN), and upper fruit branch length (UFBL). Association mapping was conducted for these traits of 39 lines and their 178 F1 hybrids in three environments. There were 20 highly significant Quantitative Trait SSRs (QTSs) detected by mixed linear model approach analyzing a full genetic model with genetic effects of additive, dominance, epistasis and their environment interaction. The phenotypic variation explained by genetic effects ranged from 32.64 ~ 91.61%, suggesting these branch traits largely influenced by genetic factors.


Introduction
Gossypium hirsutum is one of the commercially grown species of cotton. In the past decades, thousands of analyses have been carried out to find what factors are important in cotton growth. Previous studies have revealed that branch traits were related to plant density, canopy structure, and photosynthetic capacity, thus influencing fiber quality and yield [1,2]. Association mapping aims to discover quantitative trait loci (QTLs) by evaluating the marker-trait associates, which influences the strength of linkage disequilibrium between genotypes and phenotypes across a population [3]. Since association mapping has been used in many researches related to complex disease and agronomic traits, it becomes an effective way to dissect the genetic basis of complex traits. Compared with linkage analysis, association mapping has three advantages: no need to construct mapping population; detect multiple loci at one time; high resolution. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 The first research that utilized association mapping on plant genetics is in 2001 [4]. Several works have also been done on rice [5], maize [6], sorghum [7], wheat [8], barley [9], and potato [10]. As for cotton, a few studies have been carried out to understand its agronomic traits such as fiber traits and yield [11,12]. Some studies also are involved with branch traits [13][14][15].
In this study, the association mapping was carried out for 872 SSR markers of 39 lines and their 179 F 1 hybrids in three environments. Different with previous studies on branch traits, the branches of cotton plant divided into three parts: upper, middle and bottom. This may help to find whether the branch traits of different parts have different genetic characteristics. PCR-based molecular markers SSRs (simple sequence repeats) from the cotton markers database (CMD) were extracted. Software QTXNetwork (http://ibi.zju.edu.cn/software/ QTXNetwork/) was used in this study, which is based on mixed linear model approach [16]. And the association mapping was implemented by using a full genetic model with genetic effects of additive, dominance, epistasis and their environment interactions.

Parent selection and experimental design
A tested mapping population is representative for part of Upland cotton cultivars in China's cotton-growing regions. These cultivars or lines (S1 Table) were provided by the Institute of Cotton Research, Chinese Academy of Agricultural Sciences and Tarimu University. The 39 cotton cultivars or lines (S1 Table) and their 178 F 1 hybrids were analyzed by association mapping because of their genetic diversities. These materials were planted and evaluated for six branch traits in three environments (e 1 : Anyang in 2012; e 2 : Anyang in 2013; e 3 : Alar in 2012). A randomized complete block design with three replications was employed in the field trials. Each block was settled with two rows, and each row was kept in a 5.00 m long and 0.70 m wide plot. The distance between plants was 0.25 m wide (approximately 40 plants for each material in each replication) in Anyang; the distance between plants was 0.10 m wide (approximately 40 plants for each material in each replication) in Alar. Six fruit branch architecture traits were analyzed from random selected 10 plants for each material in each replication, these traits are bottom fruit branch node number (BFBNN), bottom fruit branch length (BFBL), middle fruit branch node number (MFBNN), middle branch length (MFBL), upper fruit branch node number (UFBNN), and upper fruit branch length (UFBL).

DNA extraction and SSR markers screening
The leaves of cotton were collected in the pots in March 2012. Two cotyledons were taken from two plants of each variety (or line). DNA samples were extracted from 39 parents as described by Paterson et al. [17]. The downloaded sequences of SSR primes are mostly from the Cotton Marker Database (CMD, http://www.cottonmarker.org/cgi-bin/panel.cgi). This database contains all publicly available cotton SSR markers, which provides a more efficient utilization of molecular marker resources and will help accelerate basic and applied research in molecular breeding and genetic mapping in Gossypium spp. Polymorphic information is acquired from a standard screening panel including Upland cotton cultivars and other tetraploid species. A total of 5,052 SSR primer pairs were examined to screen for polymorphisms among thirty-nine inbred lines parents. These SSR primer pairs included CGR, GH, HAU, NAU, BNL, DPL, C and MGHES primer series. Marker nomenclature consisted of a letter indicating the origin of the marker, followed by the primer number. SSR analysis was conducted following the procedure described by Yao et al. [18]. For a difference stripe, if parent 1, 2 had no stripe, coded "-1" for the genotypes of SSR of parent 1, 2 (or QQ genotype); if parent 3, 4 had stripe, coded "1" for the genotypes of SSR of parent 3, 4 (or genotype qq), F 1 from parent 1 and parent 2 is coded to "0", F 1 from parent 3 and parent 4 is coded to "1", (or genotype Qq). For a couple of primer, only a stripe is coded for the same stripe to conducted association analysis. A total of 351 couple primers (872 difference stripes) showed polymorphic among the 39 Upland cotton varieties (or lines) in current study.

Statistical analysis
The full genetic model for the phenotypic value of the k-th individual in the h-th environment (y hk ) can be expressed by the following mixed linear model, where μ is the population mean; e h is the fixed effect of the h-th environment; a i is the additive effect of the i-th locus with coefficient u A ik (1 for QQ, coded no stripe parent; −1 for qq stripe coded having stripe parent); d i is the dominance effect of the i-th locus with coefficient u D ik (1 for Qq, 0 for QQ and qq, coded having stripe hybrid F 1 ); aa ij , ad ij , da ij and dd ij are the digenic epistasis effects between the i-th locus and j-th locus with coefficients u AA ijk (1 for QQ × QQ and qq × qq, −1 for QQ × qq and qq × QQ), u AD ijk (1 for QQ × Qq, −1 for qq × Qq), u DA ijk (1 for Qq × QQ, −1 for Qq × qq) and u DD ijk (1 for Qq × Qq), respectively; ae ih is the additive × environment interaction effect of the i-th locus in the h-th environment with coefficient u AE ikh ; de ih is the dominance × environment interaction effect of the i-th locus in the h-th environment with coefficient u DE ikh ; aae ijh , ade ijh , dae ijh and dde ijh are the digenic epistasis × environment interaction effects between the i-th locus and j-th locus in the h-th environment with coefficient u AAE ijkh , u ADE ijkh , u DAE ijkh and u DDE ijkh , respectively; and ε hk is the residual effect of the k-th line or hybrid in the h-th environment.

Association analysis
The SSR-trait association analyses were performed based on the above mixed linear model, and genetic effects were estimated by newly developed software QTXNetwork based on GPU parallel computation (http://ibi.zju.edu.cn/software/QTXNetwork/). The genotypic data (S1 Data CottonMeiYJ.Gen) and phenotypic data (S2 Data CottonMeiYJ.Phe) are included as a zip file (CottonMeiYJ.zip). All the detected SSR-trait association loci were fitted by a full model to estimate genetic effects of additive, dominance, epistasis as well as their environment interaction. The mixed linear model framework with Henderson method III [19] was used to construct the F-statistic test for the association analysis. 2,000 permutations of phenotypes were carried out to generate a null distribution of the extreme P-values. The most extreme Pvalue from each of the 2,000 scans was obtained and the 5% experiment-wise error rate was set for the 95% most extreme of these P-values. The QTS effects were predicted by using the MCMC (Markov Chain Monte Carlo) algorithm with 20,000 Gibbs sample iterations [20]. The distribution-based outliers were conducted for detecting and removing the outlier of phenotype. Based on the information of genetics effects for detected QTSs of six branch traits, the breeding values (û þê þĜ þ c GE) were predicted for the superior lines and superior hybrids of the mapping population [21].

Estimated heritability and predicted genetic effects
Cotton has a within-canopy structure, and the fruit branches of plants can be divided into three parts: bottom, middle, and upper. The within-canopy structure of cotton is adjusted and affected by genetic basis and environment, and different part has different genetic inheritance rules. A total of 20 significant QTSs (P EW -value < 0.05) were detected for six branch traits in cotton. The number of QTSs for each trait were 2~6, including DPL0061-2 for BFBL and MFBL, HAU2273-1 for MFBNN, BFBNN and BFBL. For the six fruit branch traits, estimated heritability for genetic effects was listed in Table 1 (Table 2) with highly significant QTSs (P EWvalue < 1×10 −8 ) ( Table 2). CGR6795-1 had large positive dominance effect (d1 1:165, h 2 d1 35:13%) but negative dominance × additive epistasis effect (da1 À 1:341, h 2 da1 46:54). Increasing the BFBNN could be achieved by selecting genotype Qq of CGR6795-1 along with Qq of NAU879-1. HAU2273-1 also had positive effects of dominance (d1 0:572, h 2 d1 8:48%) and additive effect (a10:205, h 2 a1 1:09). A total of six QTSs were detected for BFBL (S1 Table), and five of these QTSs (except for DPL0061-2) had P EW -value lower than 1×10 −5 ( Table 2). Unlike other traits, additive effects were close to dominance effects, and environment-specific additive effects were also close to environment-specific dominance effects for BFBL. Positive additive effects were highly significant in four QTSs, and positive environment-specific dominance effects were also found for CGR6902-1 and HAU1951-2 in e 1 . In addition, HAU2273-1 was shared for increasing both BFBL and MFBNN by additive and dominance effects. In order to increase the BFBL, it was suggested to select genotype QQ of CGR5534-3, HAU1951-2 and HAU2469-1 with additive effect. Large positive environment-specific dominance of CGR6902-1 and HAU2273-1 can be selected for increasing BFBL of F 1 hybrid in e 1 .
For two middle fruit branch traits, dominance effects were the main genetic effects, indicating that we can select high-performance offspring via hybrid especially in F 1 . MFBNN is found associated with two QTSs (S1 Table), and both of them were detected with additive, dominance and environment-specific additive effects in e 1 . HAU1385-2 had the environment-specific dominance effect and environment-specific additive effect both in e 1 and e 3 .  HAU2781-1 was shared by BFBL and MFBNN both with additive, dominance and environment-specific additive effects in e 1 . During the breeding process, improving this trait could be obtained by selecting genotype F 1 of HAU2273-1 with high-heritability dominance effect. Besides, selecting genotype QQ of HAU1385-2 could also increase the MFBNN especially in e 2 and e 3 .
Six QTSs were significantly associated with MFBL ( Table 2). Among them, four of these QTSs were involved in dominance effects. GH638-3 was detected with dominance effect, accounting for 24.99% of phenotypic variance. One pair of epistasis interaction DPL0061-2 × GH638-3 was identified with negative additive-additive and positive dominance-additive epistasis effects. Besides, two QTSs (HAU2119-1 and DPL0061-2) were shared by different traits. Selecting genotype QQ of CIR246-1 and DPL0061-2 could increase the MFBL by positive effects. Large positive dominance effects of two QTSs (GH638-3 and HAU2119-1) suggested that heterozygote Qq of these two QTSs could have heterosis of MFBL in F 1 hybrids.
Topping is a vital step in cotton cultivating because there is apical dominance, which causes imbalance growth. From Tables 2 and 3 additive and dominance effects were main genetic components with not too large positive and negative effects. This showed that natural selection tended to control upper branch shorter than middle and bottom branches.
Four QTSs were found ( Table 2) for UFBNN and two with highly significant QTSs (P EWvalue < 10 −8 ) ( Table 2). CGR5876-2 and HAU2781-1 were both found with additive effects. HAU2781-1 also had dominance effect with considerable heritability of 42.85%. HAU1434-1 was shared by UFBNN and UFBL with additive, dominance and environment-specific dominance effects in e 1 , respectively. The UFBNN will grow when the breeders select genotype QQ of CGR5876-2 with positive additive effect. Furthermore, negative dominance of HAU2781-1 can be selected for decreasing UFBNN of F 1 hybrid.
A total of four QTSs were detected for UFBL (S2 Table), and all of them were highly significant ( Table 2). All these QTSs were found with additive effect. Except for CGR6848-1, other three QTSs had also dominance effects. And HAU1434-1 had environment-specific dominance effect in e 1 and e 2 . No epistasis effects were detected in these two traits. Increasing UFBL of F 1 hybrid could be selected by large positive environment-specific dominance of HAU1434-1 in e 1 or BNL4023-1 for different environments. Selecting genotypes QQ of CGR6848-1 or qq of HAU1081-3 could increase UFBL.

Predicted genetic effects for different genotypes
Besides the genetic effects of each QTS, the maximum positive and minimum negative genotypic effects of the superior lines and superior hybrids were also predicted in three environments (Table 3). For all the six fruit branch traits, predicted breeding values of the superior hybrids (+) were larger than the superior lines (+), which suggested that breeders could modify these traits through selecting F 1 hybrids.
In the meantime, the genotypic effects of homozygotes (QQ, qq), and heterozygote (Qq) also were predicted for four traits in three environments. For MFBNN, MFBL and BFBL, the predicted genotypic effects were positive for major-allele homozygote QQ, but negative for minor-allele homozygote qq. It indicated that the natural selection tends to increase the middle branch node number and length, and also bottom branch length. This tower-like pattern is an ideal plant type for manipulating the plant density, enhancing the total light energy efficiency of a population, and increasing the yield [22]. As for UFBNN, no obvious pattern was discovered. Besides, the genotypic effects of QQ of BFBNN, BFBL, MFBNN and UFBL in e 3 are significantly larger than them in e 1 and e 2 . It may suggest that these traits would change in different environments and the environment in Alar is inclined to improve these traits. In Anyang, the high-yield variety is more tower-like. However, under the high-density condition in Alar, the middle and bottom fruit branch length need to be shorter and node number need to be less, and the upper fruit branch length needs to be longer and node number to be more. Thus, only by selecting different genotypes can breeders get high yield in different environments.

Discussion
In this study, software QTXNetwork is applied to perform association mapping. This software has been successfully utilized in the previous study in cotton yield traits [23]. A full genetic model is used for estimating genetic effects of additive, dominance, epistasis and their environment interaction. Compared with a reduced model, a full model could retrieve the missing heritability and increase the total heritability. It could also help the researchers for better understanding the genetic basis and network in cotton branch traits. In this study, dominance, epistasis and their environment interaction were the major contribution of total heritability in each trait. The strong heterosis of cotton fruit branch traits was mostly due to these non-additive effects. The calculated correlation coefficient between phenotype values and predicted genotypic effects of six traits (RŶ ¼ 0:53 $ 0:93) were close to the total heritability (Table 1). It was suggested that the full genetic model could predict the phenotypic variation very well. Furthermore, the heritability of BFBNN, MFBNN, and UFBNN was in descending order, which was also applicable to BFBL, MFBL and UFBL. This order indicated that the bottom branches were more stabilized during the inheritance process.
Several loci associated with the fruit branch traits were also detected by other researches on genetic mapping: CGR5876 [24]; GH220 [25]; HAU1385 [23]; GH638 [26]. These evidence verified that the mixed linear model approach was effective to reveal related loci of complex traits in cotton. Besides, seven QTSs involved in controlling more than one trait were detected. DPL0061-2 had positive additive and environment-specific dominance effects in e 1 both for BFBL and MFBL, which suggested that common genes could improve the branch length by selecting the genotype QQ of DPL0061-2 in e 1 . The HAU2273-1 had positive additive and dominance effects for MFBNN, BFBNN and BFBL, indicating that improvement of branch node number and length could be obtained by selecting the genotype QQ or Qq of this QTS. Moreover, HAU1434-1 was shared by UFBNN and UFBL, including negative additive and positive dominance effects. This phenomenon indicated that some QTSs associated with upper fruit branch traits tend to maintain the tower-like plant shape.

Conclusion
Association mapping is a statistically powerful method, which is utilized to dissect the genetic architecture of complex traits with high resolution. This study indicated that full genetic model, which included additive, dominance, epistasis and their environment interaction, could successfully predict the performance of causal loci. In total, 20 QTSs were significantly associated with six fruit branch architecture traits. And the results showed that dominance and epistasis effects played a significant role in the cotton fruit branch architecture traits. Marker assistant selection (MAS) breeding in different environments could obtain further improvements.