Molecular Evolution and Phylogenetic Analysis of Eight COL Superfamily Genes in Group I Related to Photoperiodic Regulation of Flowering Time in Wild and Domesticated Cotton (Gossypium) Species

Flowering time is an important ecological trait that determines the transition from vegetative to reproductive growth. Flowering in cotton is controlled by short-day photoperiods and shows strict photoperiod sensitivity. Because the CO-FT (CONSTANS-FLOWERING LOCUS T) module regulates photoperiodic flowering in several plants, we selected eight group I CONSTANS-like (COL) genes and examined their expression patterns under long-day and short-day conditions. We then individually cloned and sequenced their homologs from 25 cotton accessions and one outgroup. Finally, we studied their structures, phylogenetic relationships, and molecular evolution in both the coding region and three characteristic domains. All eight group I COLs show diurnal expression. At orthologous and homeologous loci, the structure of each gene is highly conserved across cotton species, although length variation has arisen from insertions/deletions in intron and/or exon regions. Six genes (COL2 to COL5, COL7 and COL8) exhibit higher nucleotide diversity in the D-subgenome than in the A-subgenome. Across all allotetraploid cotton species examined, 98.37% of comparisons had higher Ks values in the A-D and At-Dt comparisons than in the A-At and D-Dt comparisons, and the Pearson's correlation coefficients (r) of Ks between A vs. D and At vs. Dt were positive and high, at least 0.797. Nucleotide polymorphism is significantly higher in wild species than in G. hirsutum and G. barbadense, indicating a genetic bottleneck associated with the domesticated cotton species. The three characteristic domains of the eight COLs evolve at different rates: the CCT domain is highly conserved, whereas the B-box and Var domains are much more variable in allotetraploid species. Taken together, these results suggest that COL1, COL2 and COL8 underwent greater selective pressure during domestication.
This study improves our understanding of domestication-related genes and traits during the evolution of cotton.

Introduction
D-genome species were chosen, which represent the best living models of the A- and D-genome donors, respectively. A total of 23 allopolyploid accessions representing five cotton species were examined: three wild allopolyploid species, 11 accessions of G. hirsutum (seven semi-domesticated races and four cultivated accessions), and nine accessions of G. barbadense (three semi-domesticated races and six cultivated accessions). Thespesia populneoides (Roxb.) Kostel was chosen as the outgroup. Cultivated G. hirsutum and G. barbadense accessions were sampled from the Jiangpu Experimental Station at Nanjing Agricultural University, Nanjing, Jiangsu, China. The other wild and semi-domesticated cotton species and the outgroup were collected from the National Wild Cotton Plantation at Hainan Island, China. All necessary permits for collecting the wild and semi-domesticated cotton species and the outgroup were obtained from the National Wild Cotton Plantation at Hainan Island, Cotton Research Institute, Chinese Academy of Agricultural Sciences, China. Genomic DNA was isolated from young leaves using previously reported methods [37].

Identification of new CO family genes in cotton
COL protein sequences from Arabidopsis and rice were taken from Griffiths et al. 2003 [22] and from the Plant Transcription Factor Database (http://plntfdb.bio.uni-potsdam.de/v3.0/), as reported by Wu et al. 2013 [30]. To identify COL transcription factor genes in the cotton genome, Arabidopsis CO genes were used as queries to screen the Pfam database. The Pfam seed files for the B-box (PF00643) and CCT (PF06203) domains were then used to search the diploid cotton (Gossypium raimondii)_221_protein transcript database. Protein-coding genes containing both B-box (PF00643) and CCT (PF06203) domains were manually examined as putative new CO family members in cotton (designated GrCOLs). Multiple alignments were then performed with Clustal 1.83 and checked manually to confirm their accuracy. Finally, BLASTP was used to search the diploid cotton (G. raimondii) genome database with an E-value cutoff of 1 × 10^-15, and the genomic sequences of the GrCOLs were obtained.
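The domain-based screen described above reduces to keeping proteins that carry both a B-box and a CCT hit. The sketch below is a minimal illustration, assuming the domain search output has already been parsed into (protein ID, Pfam accession, E-value) tuples; the function name, the tuple layout and the default E-value threshold are hypothetical and are not taken from the original pipeline (the 1 × 10^-15 cutoff above applies only to the BLASTP step).

```python
from collections import defaultdict

BBOX = "PF00643"  # B-box zinc finger (Pfam)
CCT = "PF06203"   # CCT motif (Pfam)


def candidate_cols(domain_hits, max_evalue=1e-5):
    """Return sorted IDs of proteins that have both a B-box and a CCT hit.

    domain_hits: iterable of (protein_id, pfam_accession, evalue) tuples.
    Version suffixes on accessions (e.g. "PF00643.24") are ignored.
    max_evalue is a hypothetical threshold, not one stated in the study.
    """
    domains_by_protein = defaultdict(set)
    for protein_id, accession, evalue in domain_hits:
        if evalue <= max_evalue:
            domains_by_protein[protein_id].add(accession.split(".")[0])
    return sorted(p for p, accs in domains_by_protein.items() if {BBOX, CCT} <= accs)


# Hypothetical hits: only protA carries both domains
hits = [
    ("protA", "PF00643.24", 1e-12),
    ("protA", "PF06203.14", 1e-20),
    ("protB", "PF00643.24", 1e-9),
]
print(candidate_cols(hits))  # ['protA']
```

Candidates returned this way would still need the manual inspection and alignment checks described in the text.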

PCR amplification, cloning, and sequencing
Gene-specific PCR primers for the eight group I COLs were designed with Primer 5.0 based on the predicted sequences in the diploid cotton (G. raimondii) genome database (S2 Table). All PCR products were purified with an Axyprep DNA Gel Extraction Kit, cloned into the pMD19-T Vector (TaKaRa) according to the manufacturer's instructions, and sequenced. Because PCR-mediated recombination has been detected in allopolyploid cotton [38], at least 10 clones per gene were randomly sequenced, with at least three clones per subgenome, and recombinant clones were discarded to ensure a correct sequence for each duplicated copy. The A- and D-subgenome homeologs were identified by comparing the clone sequences with those of their diploid relatives using the Neighbor-Joining method in Clustal 1.83. Genomic sequences were compared with the cDNA sequences to determine the sizes of exons and introns.
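The study assigned clones to subgenomes and removed recombinants using Neighbor-Joining comparisons in Clustal 1.83. As a simplified illustration only, and not the authors' procedure, the sketch below assigns an aligned clone to whichever diploid reference it is closer to by p-distance and flags clones whose 5′ and 3′ halves point to different references, a typical signature of PCR-mediated recombination. The function names and the two-halves split are hypothetical choices.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def classify_clone(clone, ref_a, ref_d):
    """Call an aligned clone as 'A', 'D', 'ambiguous' or 'recombinant'.

    ref_a and ref_d stand for the diploid A- and D-genome sequences of the
    same gene; all three sequences must be aligned to the same length.
    """
    if not len(clone) == len(ref_a) == len(ref_d):
        raise ValueError("sequences must be aligned to the same length")
    mid = len(clone) // 2
    calls = []
    for start, end in ((0, mid), (mid, len(clone))):
        d_a = p_distance(clone[start:end], ref_a[start:end])
        d_d = p_distance(clone[start:end], ref_d[start:end])
        # Ties and NaN distances (all-gap halves) give "?"
        calls.append("A" if d_a < d_d else "D" if d_d < d_a else "?")
    if "?" in calls:
        return "ambiguous"
    return calls[0] if calls[0] == calls[1] else "recombinant"


# Toy example: the clone switches from the A-like to the D-like reference midway
print(classify_clone("AAAATTTT", "AAAAAAAA", "TTTTTTTT"))  # recombinant
```

Real data would need more than two windows to localize switch points, but the principle of comparing each segment with both diploid references is the same.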

RNA isolation and qRT-PCR analysis
For diurnal expression analysis, G. hirsutum acc. TM-1 plants were grown under LD (16 h light/8 h dark) or SD (8 h light/16 h dark) conditions. At the seedling stage, when the third leaf was fully expanded, fully expanded young leaves were harvested every 4 h for 48 h and immediately frozen in liquid nitrogen. Total RNA was extracted from leaves according to the method of Jiang and Zhang [39], and 10 μl of cDNA was synthesized from 500 ng of RNA using HiScript Q RT SuperMix for qPCR (+gDNA wiper) (Vazyme). Gene-specific primers for qRT-PCR were designed with Primer 5.0 and are listed in S2 Table. qRT-PCR was performed on an ABI 7500 real-time PCR system in 20 μl reactions containing 1 μl of cDNA, 0.5 μM of each gene-specific primer, and 10 μl of FastStart Universal SYBR Green Master (ROX) (Roche). The Histone3 gene (AF024716; forward primer 5′-GAAGCCTCATCGATACCGTC-3′, reverse primer 5′-CTACCACTACCATCATGG-3′) was used as the internal control. Relative expression levels of the COL genes were calculated using the relative quantification method [40].
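The text cites a relative quantification method [40] without giving the formula. Assuming it is the widely used comparative Ct (2^-ΔΔCt) method, which presumes near-100% amplification efficiency for both target and reference genes, the calculation with Histone3 as the reference can be sketched as follows. The function name and the choice of calibrator sample (for example, the first time point) are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    ct_target, ct_ref: Ct replicates of the COL gene and of Histone3 in the sample.
    ct_target_cal, ct_ref_cal: the same two genes in the calibrator sample.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)      # normalize sample to Histone3
    d_ct_cal = mean(ct_target_cal) - mean(ct_ref_cal)  # normalize calibrator to Histone3
    dd_ct = d_ct_sample - d_ct_cal                     # express relative to calibrator
    return 2 ** -dd_ct


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier than in
# the calibrator at equal Histone3 levels, so the fold change is 4
print(relative_expression([24.0], [18.0], [26.0], [18.0]))  # 4.0
```

If primer efficiencies differ substantially from 100%, an efficiency-corrected model would be more appropriate than this simple form.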

Data analysis
For each accession, the sequences of the eight genes from the same subgenome were concatenated in the same gene order, and the concatenated data were used for phylogenetic analysis with the Maximum Likelihood (ML) method in MEGA5.1, with 1,000 bootstrap replicates. An ML tree of CONSTANS-like proteins from Arabidopsis, rice, and cotton was also constructed to determine the evolutionary relationships between GrCOL family members and those of Arabidopsis and rice. DnaSP 5.0 was used to estimate the total nucleotide diversity (π) of the genomic sequence in each data set, and the nucleotide diversity at all sites (π_total), synonymous sites (π_s), and nonsynonymous sites (π_a) was calculated for the entire coding region and separately for the B-box, Var, and CCT domains of each gene. Synonymous (Ks) and nonsynonymous (Ka) substitution rates for the entire coding region of each gene among the 25 cotton accessions and the outgroup, together with the neutrality tests Tajima's D and Fu and Li's D and F, were also estimated with DnaSP 5.0 [41].
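For reference, the nucleotide diversity π reported by DnaSP is the average number of nucleotide differences per site between two sequences drawn from the sample. A minimal sketch, assuming pre-aligned sequences and excluding any column that contains a gap or ambiguous base (DnaSP's own gap-handling options may differ); the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei's pi: mean pairwise differences per site for aligned DNA sequences.

    Columns with a gap or a non-ACGT character in any sequence are dropped
    (complete deletion).
    """
    seqs = [s.upper() for s in seqs]
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    columns = [c for c in zip(*seqs) if all(b in "ACGT" for b in c)]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(c[i] != c[j] for c in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


# Three toy sequences: 4 differences over 3 pairs and 4 sites -> pi = 1/3
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

Applying the same function separately to the A-subgenome and D-subgenome alignments gives the kind of per-subgenome π values compared in the results.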

Isolation and characterization of COL family genes in cotton
To identify genes encoding COLs in the cotton genome, the HMM profiles of the B-box (PF00643) and CCT (PF06203) domains, retrieved from the Pfam database [42], were used as seeds to search the diploid cotton (Gossypium raimondii)_221_protein transcript database (http://www.phytozome.net), and proteins containing both domains (i.e., candidate COL genes in G. raimondii) were selected manually.
We identified 23 genes encoding both B-box and CCT domains, which were designated COL1-23 (S1 Table). To clarify the phylogenetic relationships among COL family genes, we constructed an ML phylogenetic tree from multiple alignments of the amino acid sequences of COLs from Arabidopsis, rice, and cotton. As shown in Fig. 1, the tree contains three major clades, and the cotton COLs were classified into groups I, II, and III, similar to the groups identified in the dicot Arabidopsis and the monocot rice. These results indicate that the COL groups diverged before the divergence of monocots and dicots. The eight genes in group I are predicted to encode two B-box domains and one CCT domain, except for COL8, which encodes a protein with one intact B-box, one incomplete B-box, and one CCT domain. The three genes in group II are predicted to encode proteins containing one B-box and one CCT domain. The remaining 12 genes, in group III, encode one B-box, a second, diverged zinc finger, and one CCT domain. We then cloned the eight group I genes, designated COL1 to COL8, and subjected them to phylogenetic and evolutionary analyses.
Using gene-specific primers (S2 Table), we performed full-length PCR cloning and sequencing of the eight genes in 25 cotton accessions and one outgroup (Table 1), which showed that each of the eight COL genes is present as a single copy in the diploid cotton species and as two copies in the allotetraploids. The structural characteristics of the eight genes in the 25 cotton accessions are summarized in Table 2. The eight genes are highly conserved, with full-length genomic DNA sequences ranging from 1,030 bp (COL6) to 1,611 bp (COL1); the exception is a 1-bp deletion causing a frameshift in a few species. Multiple alignments of the genomic and cDNA sequences showed that all genes share the same one-intron structure. The intron ranges from 77 to 680 bp in length and is longer in COL1, COL2, and COL8 than in the other family members. Within the same subgenome across different cotton accessions, insertion/deletion events in the intron or exon II of COL2, COL6, and COL8 led to length variation, whereas the remaining five genes had the same length in the same subgenome of all cotton accessions. We further analyzed the structures of the A- and D-homeologs of each gene. Length differences between the homeologs of COL4 and COL7 were caused by insertions/deletions in exon I or II. All genes have two distinct homeologs in each allotetraploid cotton accession, while there is a single type of COL3 in the outgroup Thespesia populneoides (Roxb.) Kostel. Sequence information for these eight genes in the 25 cotton accessions and one outgroup has been submitted to GenBank (accession numbers: KM201660-KM202059).

Diurnal expression pattern in light/dark cycles of the eight COL genes
To examine the diurnal rhythm of the candidate COL genes in cotton, we designed gene-specific qRT-PCR primers based on the D-genome sequences (S2 Table) and measured expression levels in seedling leaves, sampled when the third leaf was fully expanded, under long-day (LD; 16 h light/8 h dark) or short-day (SD; 8 h light/16 h dark) conditions. All eight COL genes showed diurnal expression patterns (Fig. 2). COL1, COL3 and COL5 exhibited similar diurnal expression patterns under LD and SD conditions: expression peaked at dawn, decreased rapidly to its lowest level at the end of the light period, and then accumulated again until the next dawn. COL6 and COL7 also cycled with the light/dark treatment, but their expression peaked 4 h after dawn and reached its lowest level 8 h later. COL8 expression began to accumulate after dawn, peaked 4 h later, and then declined quickly under both photoperiodic conditions. COL2 and COL4 showed different expression patterns under the two photoperiodic conditions. Under SD conditions, the expression pattern of COL4 was similar to those of COL6 and COL7, whereas under LD conditions it showed no obvious diurnal pattern. Under LD conditions, COL2 expression began to accumulate 4 h after dawn, peaked at dusk, and then declined during the dark period. Under SD conditions, however, COL2 peaked twice: first at dusk and again 8 h later. The diurnal expression patterns of the eight COLs suggest that they have conserved functions in the light signaling pathway in cotton.

Eight COL homeologs from allotetraploid species showed independent evolution after polyploid formation
The sequences of the eight genes from the same subgenome were concatenated in the same order for the 25 cotton accessions and the outgroup, and a phylogenetic tree was constructed using the ML method (Fig. 3). The outgroup Thespesia populneoides (Roxb.) Kostel was the most divergent member and formed its own clade, while the other members were divided into two principal clades: the A-genome and the A-subgenomes of the tetraploid accessions formed one monophyletic clade, and the D-genome and the D-subgenomes formed another. The A-genome clade was further divided into two main subgroups; one included the A-subgenomes of semi-domesticated and cultivated G. hirsutum accessions and G. tomentosum, and the other included the A-subgenomes of semi-domesticated and cultivated G. barbadense accessions and G. mustelinum. G. darwinii and G. hirsutum race richmondii clustered below these two subgroups in the A-genome clade. Similarly, the D-genome clade was divided into two main subgroups corresponding to G. barbadense and G. hirsutum, with G. hirsutum most closely related to G. tomentosum and G. barbadense most closely related to G. mustelinum. The exception was G. darwinii, which stood alone in the D-genome clade.
Using G. herbaceum and G. raimondii as the diploid references for ortholog comparisons, we calculated the synonymous substitution rate (Ks) of each gene between orthologs (A vs. D, A vs. At, and D vs. Dt) and between homeologs (At vs. Dt) in the 25 accessions based on their coding regions (S3 Table). Of the 184 comparisons in allotetraploid species (eight genes × 23 allotetraploid accessions), 98.37% (181/184) had higher Ks values in the A vs. D and At vs. Dt comparisons than in the A vs. At and D vs. Dt comparisons. Furthermore, the Pearson's correlation coefficients (r) of Ks between the A vs. D and At vs. Dt comparisons were positive and high, with values of at least 0.797 (Table 3).
Taken together, these results suggest that the A-D divergence of the eight COLs occurred well before polyploid formation, and that the duplicated A- and D-subgenome genes of the allotetraploid species have evolved independently since polyploid formation.
Table 3. Correlation analysis between A-D and At-Dt comparisons for each allotetraploid accession.

Nucleotide diversity of the eight COLs showed different homeologous evolutionary rates in allotetraploid species
Nucleotide diversity (π) was compared between subgenomes within each allotetraploid accession, both for the combined sequence of the eight COL genes and for each gene separately (Table 4). The average π of the combined sequence in the D vs. Dt comparison (0.01051) was significantly greater than that in the A vs. At comparison (0.00586) (P = 4.9 × 10^-21). Among the 184 pairwise comparisons, 141 (76.63%) showed greater nucleotide diversity in the D-subgenome than in the A-subgenome. Specifically, six genes (COL2 to COL5, COL7 and COL8) showed significantly higher nucleotide diversity in the D-subgenome than in the A-subgenome of the allotetraploid accessions examined, whereas COL6 showed significantly higher nucleotide diversity in the A-subgenome than in the D-subgenome. For COL1, there was no significant difference between the A vs. At and D vs. Dt comparisons. These results indicate that the eight group I COLs evolve at different rates in the two homeologs of the allotetraploid accessions, and that most genes in the D-subgenome have evolved more rapidly than those in the A-subgenome.

Nucleotide diversity of the eight COLs showed different evolutionary rates in different cotton species and domains
To further explore the effects of domestication on the allotetraploid species, we divided the allotetraploid accessions into three groups: wild tetraploid species; semi-domesticated and cultivated accessions of G. hirsutum; and semi-domesticated and cultivated accessions of G. barbadense. Nucleotide diversity (π) was estimated separately for synonymous, nonsynonymous and all sites of the ORF of each gene in each data set (Table 5). Overall, nucleotide diversity at synonymous sites (π_s) was significantly higher than that at nonsynonymous sites (π_a) (0.00451 vs. 0.00179; P = 0.0001), and the three wild allotetraploid species had significantly higher π_total than G. hirsutum … [30], the B-box and CCT domains are two conserved domains of CO proteins that are required for the promotion of flowering. To further explore the evolutionary rates of the three characteristic domains, we analyzed the nucleotide diversity of the three domains in each data set for each of the eight COL genes (Table 6).
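The per-domain comparison amounts to computing π only on the alignment columns that fall within each domain. A self-contained sketch follows; the domain boundaries are hypothetical placeholders (actual B-box, Var and CCT coordinates depend on each gene's alignment), columns with gaps or ambiguous bases are excluded, and the function names are illustrative rather than part of DnaSP.

```python
from itertools import combinations


def pi_per_site(seqs):
    """Mean pairwise differences per site, skipping columns with gaps/ambiguous bases."""
    cols = [c for c in zip(*(s.upper() for s in seqs)) if all(b in "ACGT" for b in c)]
    pairs = list(combinations(range(len(seqs)), 2))
    if not cols or not pairs:
        return 0.0
    return sum(c[i] != c[j] for c in cols for i, j in pairs) / (len(pairs) * len(cols))


def domain_diversity(seqs, domains):
    """Return {domain name: pi} for 0-based, end-exclusive alignment column ranges."""
    return {name: pi_per_site([s[start:end] for s in seqs])
            for name, (start, end) in domains.items()}


# Hypothetical toy alignment and domain coordinates
alignment = ["ATGAAACCC", "ATGAATCCC", "ATGTATCCC"]
domains = {"B-box": (0, 3), "Var": (3, 6), "CCT": (6, 9)}
print(domain_diversity(alignment, domains))
```

In this toy case all variation sits in the "Var" columns, mirroring the kind of contrast between conserved and variable domains that the analysis looks for.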

Neutrality tests of the eight COL genes reveal three genes under selection during domestication
To test for departures from neutrality, Tajima's D (1989) and Fu and Li's D and F (1993) [43][44] were calculated to determine whether the nucleotide polymorphism data of the eight COL genes fit the neutral model (Table 5). The COL1 A-subgenome and the COL2 D-subgenome in G. hirsutum both deviated significantly from neutral expectations with negative values, indicating an excess of low-frequency alleles; these negative values are consistent with recent positive selection in G. hirsutum. Fu and Li's D and F were significantly positive for the COL8 D-subgenome of G. hirsutum (P < 0.1), suggesting that COL8 maintains high-frequency variants and may have experienced balancing selection [45][46]. Taken together, these results suggest that COL1, COL2 and COL8 were subject to greater selective pressure during the domestication process.
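Tajima's D compares the mean number of pairwise differences k with the number of segregating sites S scaled by a1 = Σ 1/i (i = 1 … n−1): D = (k − S/a1) / sqrt(e1·S + e2·S·(S − 1)), so negative values indicate an excess of low-frequency variants. A minimal sketch of Tajima's (1989) statistic, assuming aligned sequences and complete deletion of gap or ambiguous columns (DnaSP may treat such sites differently); Fu and Li's D and F are not reproduced here, and the function name is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's (1989) D for aligned DNA sequences; NaN if no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    seqs = [s.upper() for s in seqs]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    cols = [c for c in zip(*seqs) if all(b in "ACGT" for b in c)]
    S = sum(len(set(c)) > 1 for c in cols)  # number of segregating sites
    if S == 0:
        return float("nan")
    pairs = list(combinations(range(n), 2))
    k = sum(c[i] != c[j] for c in cols for i, j in pairs) / len(pairs)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Singletons only (one divergent sequence) push D below zero
print(tajimas_d(["AAAA", "AAAA", "AAAA", "TTTT"]))
```

By contrast, variants split evenly between sequences (for example two "AAAA" and two "TTTT") give a positive D, the direction associated with balancing selection.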

Discussion
The homeologs of eight COL genes are evolving independently at the allopolyploid level
Allotetraploid cotton originated from an interspecific hybridization event between diploid A- and D-genome species. Here, we performed ML analysis of the eight genes in 26 accessions, including 25 cotton accessions and one outgroup, to elucidate the relationships between homeologs at the allopolyploid level. The phylogenetic analysis showed that the outgroup Thespesia populneoides (Roxb.) Kostel was distant from the cotton accessions and formed its own clade, while the cotton accessions were divided into two major clades, each containing the At or Dt subgenomes together with the corresponding diploid progenitor species. These results show that the homeologs of the eight genes are evolving independently in the tetraploid accessions examined, including wild, semi-domesticated, and cultivated species. Furthermore, 98.37% of the comparisons had higher Ks values in the A vs. D and At vs. Dt comparisons than in the A vs. At and D vs. Dt comparisons, and the Ks values of the A vs. D and At vs. Dt comparisons of the eight genes were significantly and positively correlated across the diploid and allotetraploid accessions (r ≥ 0.797). This observation indicates that the A-D divergence occurred well before polyploid formation, and that the duplicated At and Dt copies of the eight COL genes in allotetraploid species have evolved independently since polyploid formation. These results agree with previous reports [47][48][49]. In this study, G. tomentosum (from the Hawaiian Islands) was more closely related to G. hirsutum, while G. mustelinum was closer to G. barbadense than to G. darwinii. The D and Dt clade yielded results similar to those of the A and At clade. These results also largely agree with those of previous studies [2,5].

COL transcription factors have conserved functions among different plant species
COL transcription factors play important roles in regulating flowering time in the photoperiod signaling pathway, which integrates light and circadian clock inputs (primarily in leaves) to induce expression of the florigen gene FLOWERING LOCUS T (FT) [50][51]. These proteins are widespread, occurring in algae (which exhibit strong photoperiod responses [54][55]) and lower plants such as mosses [52][53] as well as in higher flowering plants, including monocots and dicots. These transcription factor genes include CO in Arabidopsis, Hd1 in rice, and their homologs in barley, ryegrass, sugar beet, and soybean [25,26,56-59]. The CO-FT module is widely conserved in plants, although its mode of action differs among species. CO promotes FT expression under LD conditions in Arabidopsis thaliana [25,33], whereas Hd1, the rice ortholog of CO, promotes expression of Hd3a (the FT ortholog) under SD conditions and acts as a repressor under non-inductive LD conditions [50,60]. CO is a central regulator of the photoperiod pathway, triggering production of the mobile florigen FT, which induces floral differentiation. Homologs of COL3 and COL5 have previously been cloned in cotton, and qRT-PCR analysis showed that expression of the COL5 homolog is controlled by daily oscillations and exhibits a diurnal rhythm, with higher expression in the dark than in the light [61]; this pattern is similar to that of CO in Arabidopsis and of COLs in other plants [22,31,56,62,63]. Like their counterparts in other plants, the group I COL genes of cotton harbor two B-box domains (one of them incomplete in COL8) and a conserved CCT domain, and expression analysis showed that all eight genes have diurnal expression patterns in TM-1.
COL1, COL3, and COL5-COL7 showed similar diurnal expression patterns under both LD and SD conditions, with expression peaking at dawn or 4 h later and declining rapidly to its lowest level by dusk, similar to AtCOL1 and AtCOL2 in Arabidopsis, GmCOL1 and GmCOL2 in soybean, and OsB, OsE and OsD in rice [31,33,56]. COL8 showed a diurnal expression pattern similar to that of OsG, which likewise has an internal deletion in the second B-box domain [31]. COL4 was unique in that its diurnal expression pattern was more evident under SD than under LD conditions, like ZCN8 in maize [64]; COL4 may therefore perceive SD signals in TM-1 but not respond to LD regulation. COL2 expression peaked once per 24-h period under LD conditions and twice under SD conditions. Together, the diurnal expression analysis suggests that, as in other plants, group I COL genes may be involved in the light signaling pathway or photoperiodic flowering in cotton, although detailed functional analyses are needed to confirm this.

Selection signatures of eight COL genes in coding region and domains in the allotetraploid species
Nucleotide polymorphism data for the eight COL genes show that most COL genes have significantly higher nucleotide diversity in wild allopolyploid species than in G. hirsutum and G. barbadense. This reduction in diversity could result from genetic bottlenecks at various stages of domestication, and limited genetic diversity in cultivated G. hirsutum has been observed in previous studies [65][66]. The neutrality tests showed that the COL1 A-subgenome and the COL2 D-subgenome of G. hirsutum had significantly negative values, implying an excess of low-frequency alleles, which is consistent with recent positive selection in G. hirsutum [3,11]. COL1 displays a diurnal expression pattern similar to those of AtCOL1 and AtCOL2 in Arabidopsis, suggesting that cotton COL1 may have a conserved function in the light input pathway without affecting flowering time [62]. By contrast, COL2, the ortholog of rice Hd1, exhibits distinct diurnal expression patterns under LD and SD conditions, suggesting that COL2 may regulate photoperiodic flowering in cotton with a function similar to that of Hd1 [26]. Thus, COL1 and COL2 are potential targets of positive selection in the light signaling or photoperiodic flowering pathway of G. hirsutum. Notably, nucleotide diversity in the COL2 D-subgenome of G. hirsutum is approximately sixfold lower than in the wild allopolyploid species, making COL2 a particularly strong candidate for selection among the cotton CO-like genes. Interestingly, COL8, which has one intact B-box domain like OsB/OsCO3 and OsG [30], evolved faster than the other genes tested. OsB/OsCO3 has been reported to negatively regulate photoperiodic flowering in rice [67], and COL8 showed a diurnal expression pattern similar to that of OsG [31], so COL8 might also be involved in photoperiodic flowering in cotton. The neutrality tests for the COL8 D-subgenome of G. hirsutum were significantly positive (P < 0.1), indicating an excess of higher-frequency alleles and suggesting that balancing selection may act on COL8 [45]. These higher-frequency variants in the COL8 D-subgenome may help cotton adapt to multiple environments and may have promoted the evolution of photoperiod sensitivity in G. hirsutum. Taken together, selection acting on these three candidate COL genes may be responsible for the wide adaptation of G. hirsutum [2]. In other plants, such as rice and maize, selection on COL homologs appears to be common during parallel adaptation [12,22,68-70]. CO is a transcription factor with three characteristic domains (the B-box, Var, and CCT domains), a combination that makes it a unique type of transcriptional regulator found only in the plant kingdom [25]. COL genes in the Brassicaceae are evolving rapidly, and different domains of these genes evolve at heterogeneous rates [34]. We analyzed the nucleotide diversity of the B-box, Var, and CCT domains in cotton separately. For most genes, nucleotide diversity was significantly lower in the CCT domain, indicating that this domain is highly conserved, possibly because of strong functional constraints. Natural variation in genes encoding CCT domains, including COL, PRR (PSEUDO-RESPONSE REGULATOR), and CMF genes, which are critical for the control of plant flowering, has been reported previously [24-26,32,71-73]. The CCT domain shows homology to the NF-YA1/2 domains of HAP2; it helps form the trimeric CO/AtHAP3/AtHAP5 complex that binds CCAAT boxes in eukaryotic promoters and regulates flowering in Arabidopsis through FT expression [74], mediates interaction with the ubiquitin ligase COP1 [75], and contains a nuclear localization signal [30,33].
Therefore, strong conservation of the CCT domain is likely necessary for its role in the control of photoperiodic flowering. The B-box domain is involved in DNA binding and protein-protein interactions, and plants with mutations in this region display severe late-flowering phenotypes [33,76]. Most genes with B-box domains display divergent diurnal expression patterns, suggesting that this domain functions in the light signaling pathway [31,61,62,77]. The Var domain, whose amino acid sequence is less conserved among COLs, activates transcription, as demonstrated by yeast two-hybrid assays [78], although its fixed residues are significantly conserved. It was recently shown that DTH2, which encodes a COL protein in rice, carries two functional nucleotide polymorphisms (FNPs), one in the B-box and one in the Var domain, that are associated with changes in flowering time and increased reproductive fitness during the northward expansion of rice cultivation [22]. In this study, the B-box and Var domains evolved significantly faster than the CCT domain in the eight COL genes, indicating that these two domains are under relaxed evolutionary constraint and may be associated with changes in flowering time in cotton. Higher nucleotide diversity in these two domains may have enabled cotton to adapt to diverse habitats and variable environments and to expand its cultivation area.

Conclusions
The CO-FT module is conserved and plays important roles in the photoperiodic regulation of flowering time in plants. We showed that the eight COL homeologs in allotetraploid accessions have evolved independently since polyploid formation. COL1, COL2 and COL8 are potential targets of selection during domestication, and across the eight genes the CCT domain is strongly conserved, whereas the B-box and Var domains are highly diverse. This study increases our understanding of the dynamic evolution of the COL gene family in cotton and identifies potential target COL genes during the domestication and adaptation of cotton.
Supporting Information S1