Genetic Mapping and Comparative Expression Analysis of Transcription Factors in Cotton

Transcription factors (TFs) play an important role in the regulation of plant growth and development. The study of the structure and function of TFs represents a research frontier in plant molecular biology, and its findings provide significant information for the genetic improvement of crop traits. Currently, a large number of TFs have been cloned, and their functions have been verified. However, relatively few studies have genetically mapped TFs in cotton. To genetically map TFs in cotton, specific primers were designed for TF genes published in the Plant Transcription Factor Database. A total of 977 TF primer pairs were obtained, and 31 polymorphic TF loci were mapped on 15 cotton chromosomes. These polymorphic loci were preferentially distributed on chromosomes 5, 11, 19, and 20, and some TFs from the same family mapped to homologous cotton chromosomes. In-silico mapping verified that many mapped TFs were located on their corresponding chromosomes, or on the chromosomes corresponding to their homologous chromosomes, in the diploid genomes. QTL mapping for fiber quality revealed that TF-Ghi005602-2, mapped on Chr19, was associated with fiber length. Eighty-five TF genes were selected for RT-PCR analysis, and 4 TFs were selected for qRT-PCR analysis, revealing unique expression patterns across different stages of fiber development between the mapping parents. Our data offer an overview of the chromosomal distribution of TFs in cotton, and the comparative expression analysis between Gossypium hirsutum and G. barbadense provides a preliminary understanding of the regulation of TFs during cotton fiber development.


Introduction
Transcription factors (TFs) are among the most widely studied and important trans-acting factors, regulating gene expression at the transcriptional level. A typical TF from a higher plant contains a DNA-binding domain, a transcription regulation domain, an oligomerization site, and a nuclear localization domain [1]. TFs play an important role in the regulation of plant growth and development, organ morphogenesis, secondary metabolism, hormonal signal transduction, and plant responses to various environmental stresses [2][3][4][5][6][7]. Therefore, investigating and exploiting the function of TFs could provide an alternative approach for improving cotton fiber quality and production.
To date, a large number of TFs have been reported, and their functions have been successively investigated and verified in cotton. The majority of this research has focused on the ethylene-response factor (ERF) family, the myeloblastosis (MYB) family, the WRKY family, and the basic helix-loop-helix (bHLH) family. A large body of research has confirmed that the MYB family plays a functional role in cotton fiber differentiation and development [8][9][10][11][12][13]. ERF genes are induced by biotic and abiotic stresses in cotton and are involved in regulating plant disease resistance pathways [14,15]. The bHLH family plays an important role in regulating plant secondary metabolism [16,17] and morphogenesis [6,7,18]. Plant WRKY TFs have important functions in the transcriptional regulation of a variety of biological processes that are related to growth and development [19][20][21], responses to various environmental stimuli [22][23][24], and disease resistance pathways [25][26][27].
As of April 2012, approximately 1,116 TFs from 50 families were annotated in Gossypium hirsutum (http://planttfdb.cbi.edu.cn/index.php?sp=Ghi). However, previous research has focused mainly on identifying and verifying the biological functions of these TFs rather than their genomic distributions. In this study, the TFs were mapped to reveal their genomic distribution in cotton using specific primers designed from the transcription factor sequences available in the Plant Transcription Factor Database (http://planttfdb.cbi.edu.cn/). We also checked their chromosomal locations by in-silico mapping the experimentally mapped TFs in two sequenced diploid genomes, the A2 genome of G. arboreum and the D5 genome of G. raimondii. QTL mapping was conducted to identify TFs related to fiber quality. RT-PCR and qRT-PCR analyses were also conducted on selected TFs from each family to detect differences in expression during fiber development between G. hirsutum and G. barbadense.

Plant materials
Polymorphisms of the designed TF primers were detected between G. hirsutum cv. Emian22 and G. barbadense acc. 3-79, the parents of the BC1 mapping population [(Emian22 × 3-79) × Emian22] [28,29], by single-strand conformation polymorphism (SSCP) analysis with minor modifications [30]. The BC1 population, which consisted of 141 plants, was used as the mapping population for all of the polymorphic TF markers.

Marker development
Cotton TFs were obtained from the Plant Transcription Factor Database V2.0 (http://planttfdb.cbi.edu.cn/), which contains 1,116 G. hirsutum TFs classified into 50 families (Table 1). The primers were designed based on the sequences surrounding specific motifs of the TF genes using Primer 3.0 (http://frodo.wi.mit.edu/primer3/). For those TFs with multiple motifs, if the sequence interval between motifs was too long, individual primers were designed for each motif (examples are presented in Fig 1). The criteria for the primer design were as follows: a primer length of 18 to 25 bp (20 bp is optimal), a GC content of 35 to 70% (50% is optimal), an annealing temperature of 50 to 65°C (55°C is optimal), and a PCR product size ranging from 100 to 1,000 bp. The primers were named "TF-Ghi ××××××". For TFs with multiple motifs, the primers were named "TF-Ghi ××××××-1" and "TF-Ghi ××××××-2".
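As an illustration only, the design ranges listed above can be written as a simple acceptance check. The Python sketch below is not the Primer 3.0 implementation; the function names and the example sequence are hypothetical, and the annealing temperature is taken as an input because no temperature formula is given in the text.

```python
def gc_content(primer: str) -> float:
    """Return the GC percentage of a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def meets_design_criteria(primer: str, annealing_temp_c: float, product_size_bp: int) -> bool:
    """Check one primer against the design ranges stated in the text.

    annealing_temp_c is supplied by the caller (for example, as reported by
    the primer design tool); no melting-temperature formula is assumed here.
    """
    return (
        18 <= len(primer) <= 25                      # primer length, bp
        and 35.0 <= gc_content(primer) <= 70.0       # GC content, %
        and 50.0 <= annealing_temp_c <= 65.0         # annealing temperature, deg C
        and 100 <= product_size_bp <= 1000           # PCR product size, bp
    )


# Hypothetical 20-mer used only for illustration (not a primer from this study).
example_primer = "ATGCGTACCTGAAGCTTGCA"
example_ok = meets_design_criteria(example_primer, annealing_temp_c=55.0, product_size_bp=250)
```

A check of this kind only filters candidates; choosing among acceptable primers (for example, preferring those closest to the stated optima) is left to the design tool.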

Experimental mapping and identification of TFs related to fiber quality
The polymorphic loci were integrated into the interspecific BC1 genetic linkage map [29] using JoinMap V3.0 [31]. The logarithm of odds (LOD) threshold was 5.0. Map distances were measured in centiMorgans (cM), which were calculated using the Kosambi mapping function [32]. The linkage map was drawn using MapChart V2.2 software [33]. QTL mapping of TFs related to fiber quality was performed using the genetic linkage map integrated with the TF markers.
The fiber quality phenotype data and the QTL mapping methods were the same as those described by Li et al. [30].
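For readers unfamiliar with the quantities named above, the following Python sketch shows the Kosambi mapping function and a textbook two-point LOD score for a backcross. It is a minimal illustration under simplifying assumptions (complete genotype data, no segregation distortion) and does not reproduce the internal calculations of JoinMap; the function names and the example counts are hypothetical.

```python
import math


def kosambi_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to centiMorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def two_point_lod(recombinants: int, total: int) -> float:
    """Two-point LOD for a backcross: log10 of L(r_hat) / L(r = 0.5)."""
    # The maximum-likelihood estimate is capped at 0.5 (no linkage).
    r_hat = min(recombinants / total, 0.5)
    non_recombinants = total - recombinants
    log_l_hat = 0.0
    if recombinants:
        log_l_hat += recombinants * math.log10(r_hat)
    if non_recombinants:
        log_l_hat += non_recombinants * math.log10(1.0 - r_hat)
    return log_l_hat - total * math.log10(0.5)


# Hypothetical example: 10 recombinants among 141 BC1 plants.
example_r = 10 / 141
example_distance_cm = kosambi_cm(example_r)
example_lod = two_point_lod(10, 141)
linked_at_threshold = example_lod >= 5.0  # LOD threshold stated in the text
```

The Kosambi function allows for crossover interference, so for small recombination fractions the distance in cM is close to 100 × r and grows faster than r as r approaches 0.5.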

In-silico mapping of TFs in two diploid genomes
To verify our genetic mapping results, we used BLAST to align the mapped TF sequences against the diploid A2 genome of G. arboreum and the D5 genome of G. raimondii (http://www.phytozome.net/cotton.php) with an E-value cut-off of 1e-10. The best match for each TF sequence was then retained; thus, each TF was assigned to either the A2 or the D5 genome, not both.
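The filtering step described above can be sketched as follows. This Python example assumes 12-column tab-separated BLAST output (query, subject, ..., E-value, bit score) and defines the "best match" as the hit with the highest bit score; both are assumptions made for illustration, since the text does not specify the output format or the ranking criterion. The function name and the example records are hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Keep the highest-bit-score hit per query among hits passing the cut-off.

    Assumes 12 tab-separated columns with the query in column 1, the subject
    in column 2, the E-value in column 11, and the bit score in column 12.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > E_VALUE_CUTOFF:
            continue  # hit is too weak
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical records for illustration only (not results from this study).
example_lines = [
    "TF_A\tGa10\t98.5\t200\t3\t0\t1\t200\t1000\t1199\t1e-90\t350",
    "TF_A\tGr07\t95.0\t200\t10\t0\t1\t200\t500\t699\t1e-70\t300",
    "TF_B\tGr01\t80.0\t60\t12\t0\t1\t60\t10\t69\t1e-5\t50",
]
hits = best_hits(example_lines)
```

Because the A2 and D5 hits are pooled before the best hit is chosen, each query ends up on one diploid genome only, which matches the one-genome-per-TF outcome described in the text.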

Primer design and polymorphisms
Primers were designed for the 1,116 TF sequences, and redundant primers were then eliminated by a BLAST analysis. In total, 977 primer pairs were obtained (S1 Table). In the present study, the TF primers were screened for SSCPs. Thirty-four polymorphic primer pairs were obtained, and these produced 37 polymorphic loci, giving a primer polymorphism rate of 3.48%. These 37 polymorphic loci were from 16 TF families. Seven loci were from the ERF family, and four loci were from the bHLH family. Each of the remaining TF families contained 1-2 polymorphic loci (S1 Table).
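The reported rate can be reproduced from the counts given above. The short Python calculation below uses only numbers stated in the text; the variable names are illustrative.

```python
tf_sequences = 1116            # TF sequences used for primer design
screened_primer_pairs = 977    # primer pairs remaining after removing redundancy
polymorphic_primer_pairs = 34
polymorphic_loci = 37

removed_as_redundant = tf_sequences - screened_primer_pairs
rate_percent = 100.0 * polymorphic_primer_pairs / screened_primer_pairs

print(f"Primer polymorphism rate: {rate_percent:.2f}%")  # prints 3.48%
```

Note that the rate is computed per primer pair (34 of 977), not per locus; 37 loci arise because a few primer pairs amplified more than one polymorphic locus.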

Distribution of TFs in the cotton genome and fiber-related QTLs
After linkage analysis, 31 of the 37 polymorphic TF loci were mapped to 15 cotton chromosomes (Chr05, Chr06, Chr07, Chr09, Chr11, Chr12, Chr13, Chr17, Chr19, Chr20, Chr21, Chr22, Chr24, Chr25, and Chr26). Of these loci, 14 were mapped to 7 chromosomes of the AT genome, and 17 were mapped to 8 chromosomes of the DT genome (Fig 2). The 31 TF markers were not equally distributed among the 15 chromosomes: 15 loci were mapped to Chr05, Chr11, Chr19, and Chr20. These mapped TF markers belonged to 16 TF families, among which 7 markers were from the ERF family, 4 were from the bHLH family, and 3 were from the WRKY family (S1 Table). As shown in Fig 2, TF-Ghi005868 and TF-Ghi005905 of the ERF family were located on the same chromosome (Chr20) and separated by 0.3 cM. S1 Table indicates that TF-Ghi012239 of the C2H2 family (located on Chr05) and TF-Ghi000600 of the C2H2 family (located on Chr19) were present on homologous chromosomes. In addition, TF-Ghi010629 of the WRKY family has two loci, TF-Ghi010629a and TF-Ghi010629b, which were located on the homologous chromosomes Chr25 and Chr06, respectively. TF-Ghi006349 and TF-Ghi001350 of the bHLH family were located on the homologous chromosomes Chr11 and Chr21, respectively. QTL mapping of the TF markers for fiber quality revealed that only one TF marker, TF-Ghi005602-2, mapped on Chr19, was tightly linked with fiber length, with a LOD value of 8.70; it explained 12.23% of the phenotypic variance with an additive effect of -0.35. This result may imply that TFs in cotton are involved in the development of many traits rather than in fiber development specifically.
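The chromosome counts above can be checked with a short tally. The Python sketch below assumes the conventional numbering of tetraploid cotton chromosomes, in which Chr01 to Chr13 belong to the A subgenome and Chr14 to Chr26 to the D subgenome; this convention is not stated in the text but is consistent with the 7 and 8 chromosomes reported. The locus counts are taken from the text, and the names are illustrative.

```python
from collections import Counter

# Chromosomes carrying mapped TF loci, as listed in the text.
mapped_chromosomes = [5, 6, 7, 9, 11, 12, 13, 17, 19, 20, 21, 22, 24, 25, 26]


def subgenome(chromosome: int) -> str:
    """Assumed convention: Chr01-Chr13 = A subgenome, Chr14-Chr26 = D subgenome."""
    if not 1 <= chromosome <= 26:
        raise ValueError("tetraploid cotton has chromosomes 1 to 26")
    return "A" if chromosome <= 13 else "D"


chromosomes_per_subgenome = Counter(subgenome(c) for c in mapped_chromosomes)

# Locus counts reported in the text.
loci_per_subgenome = {"A": 14, "D": 17}
total_loci = sum(loci_per_subgenome.values())
loci_on_four_chromosomes = 15  # Chr05, Chr11, Chr19, and Chr20 combined
share_percent = 100.0 * loci_on_four_chromosomes / total_loci
```

Under these assumptions the tally gives 7 A-subgenome and 8 D-subgenome chromosomes, and the four loci-rich chromosomes carry about 48.4% of the 31 mapped loci.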

Comparison of the genetically mapped TFs with in-silico mapping in two sequenced diploid genomes
With the availability of cotton genome sequences, the consistency of the genetic mapping results can be checked by in-silico mapping the marker sequences to the genome sequences. By uniquely matching the genetically mapped TF sequences to the A2 and D5 genomes, 15 TFs were mapped to 7 chromosomes of the A2 genome and 14 to 10 chromosomes of the D5 genome (Fig 3). The in-silico mapping results differed from the genetic mapping results in several cases. TF-Ghi018559b, mapped on Chr05, was not mapped on the corresponding Ga10, but on Gr11; Gr10 is the corresponding chromosome of Chr20, which is not the homologous chromosome of Chr19. The two loci on Chr06 were not mapped on the corresponding Ga12 but on Gr10; Gr10 is the corresponding chromosome of Chr25, which is the homologous chromosome of Chr06. TF-Ghi019589 on Chr07 was mapped on Gr01, the corresponding chromosome of Chr16, which is the homologous chromosome of Chr07. TF-Ghi001068 on Chr09 was mapped on Gr06, the corresponding chromosome of Chr23, which is the homologous chromosome of Chr09. TF-Ghi006349 on Chr11 was not mapped on the corresponding Ga04, but on Ga06; TF-Ghi012730 was mapped on Gr07, the corresponding chromosome of Chr21, which is the homologous chromosome of Chr11. The two loci on Chr13 were mapped on Gr13, the corresponding chromosome of Chr18, which is the homologous chromosome of Chr13.
TF-Ghi015538 on Chr17 was mapped on Ga13, the corresponding chromosome of Chr13, which is not the homologous chromosome of Chr17. TF-Ghi005602-2 and TF-Ghi019046 on Chr19 were mapped on Ga10, the corresponding chromosome of Chr05, which is the homologous chromosome of Chr19. TF-Ghi005868 and TF-Ghi016105 on Chr20 were mapped on Ga09, the corresponding chromosome of Chr10, which is the homologous chromosome of Chr20; TF-Ghi001019 was mapped on Ga07, the corresponding chromosome of Chr01, which is not the homologous chromosome of Chr20. TF-Ghi013545 on Chr24 was mapped on Ga08, the corresponding chromosome of Chr06, which is not the homologous chromosome of Chr24. TF-Ghi005927 on Chr26 was mapped on Ga06, the corresponding chromosome of Chr12, which is the homologous chromosome of Chr26.
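The comparison logic used above (same chromosome, homologous chromosome, or neither) can be sketched in Python. The example uses only a subset of the placements and homologous pairs stated in the text; the function and variable names are hypothetical, and the sketch is not the procedure used to produce Fig 3.

```python
# Homologous chromosome pairs of the tetraploid map, as stated in the text.
HOMOLOGOUS_PAIRS = {
    frozenset(pair)
    for pair in [(5, 19), (6, 25), (7, 16), (9, 23), (10, 20), (11, 21), (12, 26), (13, 18)]
}


def classify(genetic_chr: int, corresponding_chr: int) -> str:
    """Compare the genetic-map chromosome with the tetraploid chromosome
    that corresponds to the diploid chromosome hit in silico."""
    if genetic_chr == corresponding_chr:
        return "corresponding"
    if frozenset((genetic_chr, corresponding_chr)) in HOMOLOGOUS_PAIRS:
        return "homologous"
    return "other"


# (marker, chromosome on the genetic map, diploid chromosome hit,
#  tetraploid chromosome that the text gives as corresponding to that hit)
PLACEMENTS = [
    ("TF-Ghi019589", 7, "Gr01", 16),
    ("TF-Ghi001068", 9, "Gr06", 23),
    ("TF-Ghi015538", 17, "Ga13", 13),
    ("TF-Ghi001019", 20, "Ga07", 1),
    ("TF-Ghi005927", 26, "Ga06", 12),
]

results = {
    marker: classify(genetic, corresponding)
    for marker, genetic, _hit, corresponding in PLACEMENTS
}
```

Markers classified as "other" are the cases in which the in-silico position agrees with neither the genetic-map chromosome nor its homologous chromosome.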

RT-PCR and qRT-PCR analysis between mapping parents
One or two markers were randomly selected from each family, and a total of 85 primer pairs from 45 TF families were used for the RT-PCR analysis. Thirty-six TF primer pairs (42.4%) from 31 families yielded expression products during cotton fiber development (Fig 4, Table 2). Among them, 27 displayed clear differences in expression, 4 exhibited minor differences, and 5 showed no differences. Almost all expressed TFs clearly differed between Emian22 and 3-79 at various stages of fiber development (0, 5, 10, 15, 20, and 25 DPA). Five TFs were weakly expressed or not expressed at any stage in either Emian22 or 3-79, and these TFs were defined as having no differences in expression in this study. Furthermore, 17 TFs displayed similar expression patterns, and 14 had different expression patterns. To further confirm the RT-PCR results, four randomly chosen genes belonging to different categories were analyzed by qRT-PCR (S1 Fig). Consistent results were observed in both the RT-PCR and the qRT-PCR analyses.
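The proportions reported above follow directly from the counts. The short Python tally below uses only numbers from the text and checks that the three expression categories add up to the 36 expressed primer pairs; the variable names are illustrative.

```python
tested_primer_pairs = 85
expressed_primer_pairs = 36

# Expression categories between Emian22 and 3-79, as counted in the text.
categories = {"clear difference": 27, "minor difference": 4, "no difference": 5}
category_total = sum(categories.values())

expressed_percent = 100.0 * expressed_primer_pairs / tested_primer_pairs
clear_percent = 100.0 * categories["clear difference"] / expressed_primer_pairs
with_any_difference = categories["clear difference"] + categories["minor difference"]
```

The 31 TFs with any difference in expression match the sum of the 17 TFs with similar patterns and the 14 TFs with different patterns reported in the same paragraph.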

Low level of polymorphisms of the TF markers
TFs, as an important type of trans-acting factor, are extensively involved in the development of plants and animals. Various TFs are related to cotton fiber development [3,16,35,36]. The functions of a large number of TFs have been verified. However, few genetic mapping studies of TFs have been performed in cotton [37]. Therefore, we downloaded the TF sequences reported in the Plant Transcription Factor Database V2.0 (http://planttfdb.cbi.edu.cn/) (Table 1) and designed specific primers based on the motifs of the TF genes. Thus, the primers developed in the present study were TF-specific. In contrast, the simple sequence repeat (SSR) primers derived from the TF sequences [37] may not have been TF-specific.
To detect additional polymorphisms in the TF markers, SSCP was applied. We discovered a very low rate of primer polymorphisms in the TF markers (3.48%), which may be attributable to the highly conserved nature of TFs. The results also indicated that the genes that were compared between G. hirsutum and G. barbadense are highly conserved. Although the primer polymorphism rate was low, the TF markers examined in this study exhibited more polymorphisms than previously reported TF markers. Li et al. [37] used SSRs designed from 1,116 G. hirsutum TFs in an analysis of polymorphisms and reported polymorphism ratios of 1.6%, 2.1%, and 2.3% in the (Yumian 1×CCRI35) F2:6, (Yumian 1×T586) F2:7, and (Yumian 1×7235) F2:6 populations, respectively. The higher rate of polymorphisms detected in the present study may have been caused by the use of SSCP, which can detect minor differences between sequences [38]. It may also reflect the difference between interspecific and intraspecific populations.

Genetic mapping of TFs in cotton
To date, some genetic mapping analyses of TFs have been reported. SNP primers of the MYB family were mapped on cotton chromosomes by An et al. [39]: Myb1Gbmt_238 is located on Chr13, Myb1Gb_500 is located on Chr18, and Myb2Gb_204 is located on Chr8. Guo et al. [40] also mapped TFs of the MYB family on cotton chromosomes; for example, MYB38 is located on Chr16. However, in the present study, TF markers of the MYB family were not mapped on the interspecific linkage map. In 2012, SSR primers designed from the same 1,116 G. hirsutum TFs were mapped on cotton chromosomes by Li et al. [37]. Unfortunately, because the marker development strategies differed, no TFs were mapped in common between the two studies.
In this study, 31 polymorphic TF loci were mapped to 15 chromosomes. Among them, 14 TF loci were mapped to 7 chromosomes of the AT genome and 17 loci to 8 chromosomes of the DT genome (Fig 2). These loci were thus approximately evenly distributed between the AT and the DT genomes. However, Chr05, Chr11, Chr19, and Chr20 contained 15 loci, which accounted for 48.4% of the total mapped TF loci. Therefore, genetic mapping revealed a preferential distribution of TF loci on certain cotton chromosomes (Fig 2). Comparison of the chromosomal locations of the TF loci revealed that some TFs from the same family mapped to homologous cotton chromosomes.
Comparison of the genetically mapped TFs with the in-silico mapping results in two sequenced diploid genomes showed that some TFs were not mapped on their corresponding chromosomes in the diploid genome, but on the chromosomes corresponding to their homologous chromosomes in the other diploid genome. This is reasonable, because many genes are duplicated on the homologous chromosomes in the tetraploid genome. However, some TFs were mapped neither on their corresponding chromosomes nor on the chromosomes corresponding to their homologous chromosomes in the diploid genomes; a possible reason is that these genes were translocated after polyploidization.

Differences in the expression of TFs between G. hirsutum and G. barbadense during fiber development
Cotton is an important cash crop. Therefore, improvements in yield, fiber quality, and disease resistance are areas of focus in cotton genetics and breeding. A number of studies have indicated that various TFs are involved in fiber development. Therefore, in the present study, we used RT-PCR analysis to compare the expression of TFs between G. hirsutum and G. barbadense during fiber development.
We discovered dynamic expression of TFs during various stages of fiber development in G. hirsutum and G. barbadense. Most of the expressed TFs (75%), from 25 families, exhibited clearly different expression levels between the parents at different stages. Further studies of the TFs showing different expression patterns between G. hirsutum and G. barbadense may be very helpful for understanding the differences in fiber quality between the two species. Unique expression patterns may be associated with particular functions. Additional studies are required to determine the functions and mechanisms of action of these TFs.