Genome-wide classification, evolutionary analysis and gene expression patterns of the kinome in Gossypium

The protein kinase (PK, kinome) family is one of the largest families in plants and regulates almost all aspects of plant processes, including plant development and stress responses. Despite their important functions, comprehensive functional classification, evolutionary analysis and expression patterns of the cotton PK gene family has yet to be performed on PK genes. In this study, we identified the cotton kinomes in the Gossypium raimondii, Gossypium arboretum, Gossypium hirsutum and Gossypium barbadense genomes and classified them into 7 groups and 122–24 subfamilies using software HMMER v3.0 scanning and neighbor-joining (NJ) phylogenetic analysis. Some conserved exon-intron structures were identified not only in cotton species but also in primitive plants, ferns and moss, suggesting the significant function and ancient origination of these PK genes. Collinearity analysis revealed that 16.6 million years ago (Mya) cotton-specific whole genome duplication (WGD) events may have played a partial role in the expansion of the cotton kinomes, whereas tandem duplication (TD) events mainly contributed to the expansion of the cotton RLK group. Synteny analysis revealed that tetraploidization of G. hirsutum and G. barbadense contributed to the expansion of G. hirsutum and G. barbadense PKs. Global expression analysis of cotton PKs revealed stress-specific and fiber development-related expression patterns, suggesting that many cotton PKs might be involved in the regulation of the stress response and fiber development processes. This study provides foundational information for further studies on the evolution and molecular function of cotton PKs.

Numerous studies demonstrated that the expansion of plant PKs is mainly due to the expansions of certain RLK subfamilies [9][10][11][12].Lehti-Shiu and Shiu also demonstrated that each flower plant kinome is significantly larger than other eukaryotes' kinomes mainly due to the expansion of certain RLK subfamilies [5].In 2001, Shiu and Bleecker performed a genomewide identification and phylogenetic analysis of Arabidopsis RLKs [13].Then, they analyzed the expansion mechanism of Arabidopsis RLKs and found that the RLKs ancient origin could be attributed to a more recent plant-specific expansion [9].By comparing rice (Oryza sativa) RLKs with Arabidopsis RLKs, they found that OsRLKs involved in development have rarely undergone duplications after the rice-Arabidopsis split, whereas OsRLKs involved in defense resistance have undergone more duplications [10].In 2008, Vij et al. performed identification and phylogenetic analysis of OsRLCKs (receptor-like cytoplasmic kinase, subfamily of RLKs) and analyzed the expression patterns of certain OsRLCKs involved in development and stress [14].The expression analysis of rice RLKs was studied in different tissues, with the emphasis on seed development and abiotic stress response [11].Evolutionary analysis indicated that the expansion of RLK family coincided with the establishment of land plants, and expression analysis of Arabidopsis RLKs supported the importance of RLKs in biotic stress response [15].In addition, phylogenetic analysis and expression profile analysis of the LRRs (leucine-rich repeat, subfamily of RLKs) were performed in Populus trichocarpa [16], Platanus × acerifolia [17], Brassica rapa [18], Arabidopsis [19], tomato [20] and cotton [21].Based on phylogenetic analysis of the RLK-LRR genes from 31 sequenced flower plants, Fischer et al. found that subgroupand species-specific expansion rates differ significantly due to the complex history of expansion-retention-loss cycles and whole genome multiplication [22].
Functional characterization studies of cotton PKs indicated that cotton mitogen-activated protein kinases (MAPKs) are involved in abiotic and biotic stress responses [23][24][25][26], and cotton RLKs function in fiber development [27], anther development [28], disease resistance [29] and drought-stress tolerance [30].The completion of cotton genome sequencing provides an opportunity to perform genome-wide research on cotton PKs.In this study, we identified the putative PK genes in Gossypium raimondii, Gossypium arboretum, Gossypium hirsutum and Gossypium barbadense and classified them into groups, families and subfamilies.Some conserved exon-intron structures were identified not only in cotton species but also in primitive plants fern and moss, suggesting that these subfamilies may play important and conserved roles during plant evolutionary processes.Analyses of chromosome location and collinearity events were combined to explore the relationship between expansion and duplication events (WGD, TD and synteny analysis).Expression profiles in developing fibers and under abiotic and biotic stresses were assessed in G. hirsutum and G. barbadense PKs.This work will help to understand the cotton PK evolution mechanism and provide a starting point for further experimental research.

Identification and classification of cotton PKs
The genome and proteome sequences of G. raimondii were downloaded from Phytozome V10.0 (http://phytozome.jgi.doe.gov/pz/portal.html).The genome and proteome sequences of G. arboretum (V2.0) and G. hirsutum (V1.0) were downloaded from the Cotton Genome Project (CGP: http://cgp.genomics.org.cn/).The genome and proteome sequences of G. barbadense were downloaded from the website CHGC (http://database.chgc.sh.cn/cotton/index.html).HMMER v3.0 [31] in our local server and Pfam 28.0 in batch (http://pfam.xfam.org/)with an E-value of less than 0.01 were performed against all proteomes.The Pfam profiles PF00069 (Pkinase domain) and PF07714 (Pkinase_Tyr domain) were used in HMMER.After the initial screen, putative typical PKs were retained only if Pkinase or Pkinase_Tyr domain alignments covered greater than 50%, as previously described by Legti-Shiu and Shui [5].Classifications of "typical" cotton PKs were performed using the HMM models developed by Legti-Shiu and Shui [5].The classifications were further confirmed by phylogenetic analysis using the neighbor-joining (NJ) method.The truncated sequences of the Pkinase or Pkinase_Tyr domain were aligned using ClustalW (v2.0) [32].The NJ phylogenetic analysis with the p-distance model and 1000 bootstrap repetitions was constructed by MEGAcc 7.0 [33] in our local server.The classification information of Vitis vinifera, Arabidopsis thaliana, Carica papaya, Zea mays and Oryza sativa PKs was extracted from the supplemental files of Legti-Shiu and Shui [5].

Chromosome location and synteny analysis of cotton PKs
The chromosome location information of G. raimondii, G. arboretum, G. hirsutum and G. barbadense was extracted from GFF files of Phytozome, CGP and CHGC using our perl script.A BLAST search was performed against all the cotton PKs with an E-value of 1e-100.Based on the BLAST and GFF results, the segmental/tandem duplication events of G. raimondii, G. arboretum, G. hirsutum and G. barbadense were determined using MCScanX software [34].The "add_ka_and_ks_to_collinearity.pl" of MCScanX was used to calculate the synonymous (K s ) and non-synonymous substitution (K a ) rates.The collinearity visualization of duplicated PKs was performed using GenomePixelizer [35].The chromosomal location of identified tandem PKs was mapped on each chromosome with Mapchart v2.3 (http://www.wageningenur.nl/en/show/mapchart.htm).The synteny blocks between G. arboretum and G. hirsutum At-subgenome, G. raimondii and G. hirsutum Dt-subgenome; G. arboretum and G. barbadense At-subgenome; G. raimondii and G. barbadense Dt-subgenome; and G. raimondii and G. arboretum were identified by MCScanX [34].All paralogous and orthologous PK pairs were visualized using the Circos program [36].

Domain and exon-intron structure search of cotton PKs
The classifications of V. vinifera, A. thaliana, O. sativa, Selaginella moellendorffii, Physcomitrella patens and Chlamydomonas reinhardtii PKs were extracted from Legti-Shiu and Shui [5].Exon-intron structure diagrams of these six plants and four cotton species were generated by our perl and R scripts based on extracting information from Phytozome, CGP and CHGC GFF files.The domain information was obtained from pfam 28.0 in batch (http://pfam.xfam.org/).

RNA-seq expression data analysis
Public cotton expression datasets were retrieved from the Sequence Read Archive (SRA) of NCBI.The RNA-seq data of two commercial cotton species, G. barbadense and G. hirsutum, in fiber developmental stages (10,15,18,21, and 28 dpa) were used (PRJNA263926).The RNAseq data of G. hirsutum Li1 mutant and wild type in leaf tissue and ovules of different fiber developmental stages (1, 3, and 8dpa) were used (PRJNA301536).Quality control assessment of raw data was performed using FastQC.High-quality RNA-seq reads were aligned to reference cotton genomes by software Hisat2.The counts of expression genes were performed using Samtools and HTseq software.The reads per kilobase of exon model per million mapped reads (RPKM) algorithm was used to normalize the data.The expression levels (PRJNA301536, Li1 mutant vs. wild type, log2 value of fold change) of cotton PKs for fiber development was calculated using R software and R package DESeq2.Heat maps of cotton PK expression levels were generated using Mev4.9 [43].

Genome-wide identification and classification of cotton PKs
We identified 1517, 1407, 2508 and 2745 PK genes with typical kinase domains (S1  ).The classifications of these cotton PKs were performed against HMM models developed by Legti-Shiu and Shui [5].These genes were classified into 122-124 subfamilies (S1 Table ).To validate the classification by HMMER, we selected 1-3 random genes from each subfamily as representatives to construct a NJ phylogenetic tree using the truncated kinase domain sequences with the p-distance model and 1000 bootstraps ( To comprehensively study the cotton PK classifications, we compared the PK distributions of four cotton species with three eudicots (V.vinifera, A. thaliana, and C. papaya) and two monocots (Z.mays and O. sativa) (S3 Table ).First, the "one-two-four" pattern was found to be existed in cotton PK subfamily member size comparing with grape.For instance, the RLK/Pelle_RLCK-XI subfamily of V. vinifera, G. raimondii, G. arboretum, G. hirsutum and G. barbadense contained 2, 4, 4, 8 and 6 members, respectively.Similar examples were also found in other PK subfamilies, such as AGC_Pl, AGC_RSK-2, CAMK_CAMKL-CHK1 and CAMK_CDPK.This expansion pattern could be explained by the report indicating that cotton has experienced only one cotton-specific WGD event after it split from the resemblegrape ancestor [44].Moreover, tetraploidization of G. hirsutum and G. barbadense also contributed to the "one-two-four" expansion pattern [45][46].Second, some species-specific expansions were identified in some PK subfamilies.For example, cotton RLK/Pelle_ CrRLK1L-1 (G.raimondii 55, G. arboretum 56, G. hirsutum 93 and G. barbadense 103) contained more members compared with V. vinifera (9), suggesting that it potentially experienced cotton-specific expansion during the evolutionary process.Similar examples were also identified in RLK/Pelle_L-LEC, RLK/Pelle_LRK10L-2 and RLK/Pelle_WAK.One monocotspecific PK subfamily, CMGC_CDKL-Os, was absent in all the six investigated eudicots but was present in the two monocots (Z.mays and O. sativa).RLK/Pelle_URK-4 had only one member in V. vinifera, A. thaliana, and C. papaya, separately, but was absent in four cottons and two monocots (Z.mays and O. sativa).These results suggest that it might arise in an eudicot ancestor after the monocot-eudicot split but was lost in cotton evolution.RLK/Pelle_ LRR-I-1 contained more members in in O. sativa (34) and A. thaliana (48) compared with other investigated plants (8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22), implying that it might experience rice-and Arabidopsis-specific expansions.RLK/Pelle_DLSV contained the largest members among PK subfamilies in all the nine investigated plants.

Conserved domains and intron-exon structures of cotton PKs
Based on the Pfam information, we diagrammed the domain distributions of all the putative cotton PKs (G.raimondii 1517, G. arboretum 1407, G. hirsutum 2508 and G. barbadense 2745) (S2 Fig) .Our results indicated that approximately half of cotton PKs exclusively contained the  ).Most of these genes belonged to the AGC_NDR, AGC_RSK-2, and CMGC_SRPK subfamilies and RLK group.
To gain further insights into the cotton PKs, we generated the exon-intron structures within the kinase domain in four cotton species (S3 Fig) .Interestingly, cotton PKs belonging to the same subfamily contained similar exon-intron structures within the conserved exon phases, especially in the kinase domain.For example, AGC_ NDR sequences in G. raimondii (Gorai.011G175100.1),G. arboretum (Cotton_A_39141), G. hirsutum (CotAD_71296) and G. barbadense (GOBAR_DD23594) shared the same exon-intron structure within "0020-200" exon phases in the kinase domain.
Our other research on wheat PKs showed that these exon-intron structures were conserved across the plant evolution [47].To test it in cotton PKs, we also diagramed the PK exon-intron structures in V. vinifera (grape), A. thaliana, O. sativa (rice), S. moellendorffii (a fern), P. patens (moss) and C. reinhardtii (a green alga) (S3 Fig) .Similarly, our results revealed that these conserved exon-intron structures were also identified in investigated plants, especially in the primitive plants S. moellendorffii and P. patens but not in C. reinhardtii.These findings suggest that these conserved exon-intron structures may have occurred during the emergence of Pteridophytes or Bryophytes.For instance, the CAMK_CDPK subfamily sequences from P. patens (Pp1s370_37V6), S. moellendorffii (164119), O. sativa (LOC_Os02g03410.1),V. vinifera (GSVIVT01018778001) A. thaliana (AT5G66210.2),G. raimondii (Gorai.003G009500.1),G. arboretum (Cotton_A_19365), G. hirsutum (CotAD_33891) and G. barbadense (GOBAR_ DD26903) shared the same exon-intron structure of the kinase domain within the exon phases "022-0101" (Fig 2A).Similarly, RLK_Pelle_LysM members from moss to cotton also shared the conserved exon-intron structure with exon phases "001-0200" in the kinase domain (Fig 2B).Based on the above analysis, we summarized all the conserved exon-intron structures of some PK subfamilies in P. patens, S. moellendorffii, V. vinifera and G. raimondii (S4 Fig) .These results suggested that these conserved exon-intron structures might have important biological functions for plants during the evolutionary process.

Chromosome location and duplication events of cotton PKs
We mapped the 1501 G. raimondii PKs (  ).The distributions of cotton PKs were assessed in G. raimondii, G. arboretum, G. hirsutum and G. barbadense (Table 1).
The results showed that cotton PKs were unequally distributed across 13 or 26 chromosomes.G. raimondii chromosome 9 contained the most PKs (205), whereas G. arboretum chromosome 10, G. hirsutum chromosome 9Dt and G. barbadense chromosome 5At/5Dt contained the most PKs (145, 207, and 175, respectively).Gossypium lineages experienced two WGD events, which occurred 16.6 and 130.8 million years ago (Mya) [48][49][50].Therefore, we decided to investigate the contributions of WGD events to cotton PK expansions.We identified 681, 643, 510 and 1391 collinearity events in G. raimondii, G. arboretum, G. hirsutum and G. barbadense, respectively (S6 Table ).These results suggest that segmental duplication events might play important roles in the cotton PK expansions.The WGD event appeared after cotton speciation 16.6 Mya, and its synonymous distance (Ks value) peak ranged from 0.4 to 0.6 [48].Our results were consistent with a paper suggesting that cotton PK collinearity events also formed peaks of Ks 0.4-0.6 (Fig 3A -3D).We detected the chromosome positions of these collinearity events with Ks 0.   ).Our results demonstrated that approximately half of collinearity events exhibited single gene correspondence (S9 Table ), indicating that these single gene pairs might exist in the common cotton ancestor and did not experience expansion after the split of cotton A-and D-subgenomes.We also noticed that some gene pairs from chromosome fragments did not locate on the corresponding chromosomes between subgenomes, suggesting that they might experience chromosome rearrangement after the split of cotton A-and D-subgenomes.For instance, eight gene pairs (from Cotton_A_10476-Gorai.013G272100.1 to Cotton_A_10556-Gorai.013G263500.1)are located in the A8 chromosome fragment of G. arboretum and D13 chromosome fragment of G. raimondii, separately (S8 Table ).

Expression pattern of cotton PKs in abiotic and biotic stresses
We analyzed the gene expression profiles of G. hirsutum PKs using 11 public datasets of the Affymetrix microarray GPL8672 platform.As a result, 564 of 2508 G. hirsutum PKs have probes in GPL8672.Based on the RLE and NUSE diagrams (S8 Fig), we performed quality control assessment and excluded 5 CEL files in the following analysis (S10 Table ).The expression patterns of 564 G. hirsutum PKs were assessed under various abiotic stresses (S9 Fig and S11 Table).To identify PK genes differentially expressed under abiotic stresses, the PKs with |FC|>1.5 (fold change) and p<0.01 were retained (S12 Table ).( 1) Five stress conditions: We assessed genes in response to five abiotic stress conditions: ABA, cold, drought, salinity and alkalinity stresses.Some PKs, such as CotAD_10936 (CAMK_CAMKL-CHK1), CotAD_29572 (CMGC_MAPK) and CotAD_24426 (AGC-Pl), exhibited up-regulation in response to these five stresses.These results suggest that several common components of signaling pathways might be shared by these five abiotic stress response.In addition, certain PKs responded to particular stress treatments.For example, CotAD_51610 (CMGC_CDK-CRK7-CDK9) only exhibited up-regulation in response to ABA treatment but down-regulation to other four abiotic stress treatments.(2) Heat stress: Most PKs exhibited the same expression trend in both varieties "Sicala 45" and "Sicala 53" under high temperature treatment.However, some PKs exhibited two opposing expression trends between variety "Sicala 45" and "Sicala 53".For instance, CotAD_74347 (CAMK_CDPK) exhibited slight down-regulation in "Sicala 45" but up-regulation in "Sicala 53".(3) Waterlog and drought stresses (S10 Fig, S13 and S14 Tables): We observed that some PKs, such as CotAD_65513 (RLK-Pel-le_RLCK-VIIa-2), CotAD_27962 (AGC-Pl) and CotAD_57424 (RLK-Pelle_DLSV), exhibited up-regulation in root under waterlog stress, whereas these PKs exhibited no change or down-regulation in leaf.The reason for these findings might be that the low oxygen of the cotton root influenced some energy metabolism.Under drought stress, some PKs, such as CotAD_09571 (CMGC_MAPK), CotAD_06862 (CAMK_CAMKL-CHK1) and CotAD_ 43021 (CAMK_OST1L), exhibited up-regulation at 5, 15 and 20 dpa (days post anthesis).Similarly, several PKs, such as CotAD_70308 (RLK-Pelle_LRK10L-2), CotAD_62253 (TKL-Pl-3) and CotAD_62987 (RLK-Pelle_DLSV), exhibited down-regulation at 5, 15 and 20 dpa.We also selected PKs of some known subfamilies, such as CMGC_MAPK, CAMK_CDPK and CAMK_AMPK, to investigate their expression patterns under abiotic stresses (Fig 5A and 5B).

Expression pattern of cotton PKs during fiber development stages
We assessed the expression pattern of cotton PK genes in five G. hirsutum varieties during fiber development stages (6,9,12,19   investigated five varieties.Similarly, several PKs, such as CotAD_06790 (RLK-Pelle_RLCK-VIIa-2), CotAD_54619 (CAMK_OST1L) and CotAD_17222 (RLK-Pelle_LRR-XI-1), exhibited sustained down-regulation at 6-25 dpa among the five varieties.We also noticed that some PKs exhibited different expression patterns among these five cotton varieties.For example, a few PKs, including CotAD_22808 (CAMK_CDPK), CotAD_15427 (CMGC_GSK) and CotAD_61960 (TKL-Pl-4), exhibited peak up-regulation at 19 and 25 dpa in varieties "JKC 725" and "JKC 703", respectively, but remained at the same expression level or exhibited downregulation in the other three cotton varieties.We also assessed PKs exhibiting differential expression (|FC|>1.5 and p<0.01) (S18 Table) and selected the PKs of known subfamilies to investigate their expression patterns during fiber development stages (Fig 7A ).
The cotton transcription factor GhHD-1 plays roles in cotton fiber initiation and is expressed in trichomes and early fibers [41].The expression patterns of cotton PKs between GhHD-1 silenced and over-expressed lines were assessed during the early fiber development stage (S13 Fig and S19 Table ).Our result revealed that a few PKs exhibited reverse expression trends between GhHD-1 silenced and over-expressed lines, hinting that they might be the downstream of GhHD-1 in the signaling pathway.For instance, CotAD_03134 (RLK-Pel-le_LRR-XI-1) and CotAD_34129 (RLK-Pelle_CR4L) exhibited down-regulation in a GhHD-1silenced line but up-regulation in a line exhibiting GhHD-1 over-expression.
We also analyzed two public RNA-seq data of NCBI SRA about the fiber development of G. hirsutum and G. barbadense.Based on the results of FastQC, we performed quality control assessment and excluded 15 RUN files in the following analysis (S21 Table ).1619 G. hirsutum PKs and 963 G. barbadense PKs were identified in all the five runs of 10, 15, 18, 21 and 28 dpa (S14 and S15 Figs, and S22 Table ).We noticed that some G. hirsutum PKs, such as CotAD_12382 (NEK) and CotAD_16078 (CAMK_CAMKL-CHK1), exhibited sustained high expression from 10 to 28 dpa.Similarly, some G. hirsutum PKs, such as CotAD_31836 (RLK-Pelle_LRR-XI-1) and CotAD_21804 (RLK-Pelle_DLSV), exhibited sustained low expression from 10 to 28 dpa.We also selected PKs of the known subfamily, CMGC_MAPK, to investigate their expression patterns of fiber development (Fig 8A and 8B).
1749 G. hirsutum PKs of two genotypes (Li1 mutant and wild type) were identified in all the eight RUN files of leaf tissue and ovule tissues in fiber development stages (1, 3 and 8 dpa) (S16 Fig and S23 Table).Most PKs exhibited the similar expression levels between genotypes Li1 mutant and wild type.We also found that some PKs, such as CotAD_21607 (RLK-Pel-le_DLSV) and CotAD_58349 (RLK-Pelle_RLCK-XII-1), exhibited different expression levels in leaf tissue between the two genotypes.In order to investigate the differential expression PKs between genotypes Li1 mutant and wild type, we also caculated the log2 Fold Change (Li1 mutant vs. wild type) and padj values (S24 Table ).For instance, the log2 value of CotAD_ 27047 (CMGC_MAPK) in leaf tissue is 5.885973619 and its padj value is 0.000776909, suggesting that Li1 mutant might greatly influence the expression of CotAD_27047 (CMGC_MAPK) in leaf tissue.We also selected PKs of the known subfamily, CMGC_MAPK, to investigate their expression patterns of fiber development between the two genotypes (Fig 8C).

Phylogeny of cotton PK family
The identification and analyses of A. thaliana, wheat, soybean and maize kinomes has been reported in recent years [7-8, 12, 47].Diversity, classification and functions of 25 plant kinomes were also assessed to determine the evolutionary history of PK subfamilies [5].In this study, we also identified, classified, and performed phylogenetic and expression pattern analyses of the PK family in four cotton species G. raimondii, G. arboretum, G. hirsutum and G. barbadense.As reported in the other plant kinomes, cotton PKs were also classified into seven groups: RLK, AGC, CAMK, CMGC, STE, TKL and others.
Our previous research on wheat PKs revealed that some conserved exon-intron structures were present from primitive plant moss to flowering plant wheat [47].In this study, we also revealed that some conserved exon-intron structures with conserved exon phases in the kinase domain existed in cotton PKs (S4 Fig) .For instance, the CAMK_AMPK subfamily in cotton (Gorai.007G080200.1),grape (GSVIVT01011467001), fern (80443) and moss (Pp1s3_550V6) shared the same exon-intron structure within the exon phase "1000" in the kinase domain.This finding suggested that these conserved exon-intron structures might play important roles in plant development and evolution.Indeed, previous functional studies of plant PKs indicated that these PK genes within conserved exon-intron structures function as core components of various signaling pathways that control multiple plant cellular processes.(1) Mitosis: Aur [51] and ULK_ULK4 [52] function in cell division in A. thaliana.The rice and Arabidopsis CMGC_CDK−CDK7 homologs phosphorylate both CDKs (cyclin-dependent protein kinases) and the CTD (carboxy-terminal domain) of RNA polymerase II [53][54].(2) Microtubule: Arabidopsis CK1_CK1 affects cortical microtubules organization [55].Arabidopsis NEKs form homo-or heterodimers to regulate microtubule organization during epidermal cell expansion [56].(3) Metabolism: The Arabidopsis CAMK_AMPK homolog SnRK1 forms the SnRK1-ADK complex and plays significant roles in energy homeostasis [57].The Arabidopsis CAMK_CAMKL−LKB homologs GRIK1 and GRIK2 specifically activate SnRK1, and the GRIK-SnRK1 cascade may coordinate metabolic requirements of rapidly growing cells [58].The Arabidopsis CAMK_CAMKL−CHK1 homologs SOS2 (Salt Overly Sensitive2) and SOS3 maintain ion homeostasis and confer salt tolerance [59].(4) MAPK signaling network: Plant MAPK cascades regulate several processes, including stress response, immunity and development [60].STE_STE11 functions as a MAPK3K [60].Similar to animal RSK−2, the plant AGC_RSK−2 homolog is also activated by PDK1 [61], which is involved in the MAPK signaling cascade [62].(5) Stress response and development: The plant CMGC_GSK homolog GSK3 functions in floral organs, cell expansion, and abiotic and biotic stress responses [63].Plant RLK/Pelle members play various roles in cell specification [64], pathogen recognition [65], stress response and development [66].The soybean WNK_NRBP member GmWNK1 regulates root system architecture via ABA and osmotic signals [67] and modulates the osmotic stress response [68].
We observed that some cotton PKs contained two or more kinase domains (S4 Table ).Interestingly, soybean and wheat also contained similar special PKs within two or three kinase domains [8,47].Comparing these special PKs within 2-4 kinase domains of cotton, soybean and wheat, we identified four overlapping PK subfamilies: AGC_NDR, AGC_ RSK-2, CMGC_SRPK and RLK-Pelle_DLSV.Notably, the structure of AGC_RSK-2 and CMGC_SRPK within two kinase domains can be found in moss, fern, grape and cotton (S4 Fig) , suggesting that the duplication event of the kinase domain in these PKs occurred during the emergence of land plants and these structures were subsequently retained during the evolutionary process.Given that some PKs form homo-or hetero-dimers [56,69], we hypothesized that these structures within two or three kinase domains might be required for specific substrates to form PK dimers.This hypothesis is consistent with soybean PKs [8] and requires more functional research.

Evolution, duplication and expansion of cotton PKs
The cotton genome experienced two WGD events: a cotton-specific WGD event 16.6 Mya and a WGD that occurred 130.8 Mya and was shared by eudicots [48].Our results revealed that the cotton-specific WGD that occurred 16.6 Mya partly contributed to the expansion of cotton PKs (Fig 3).Similar to the Ks peak (0.4-0.6) of the G. raimondii whole genome [48], G. raimondii PKs also formed a Ks peak (0.4-0.6) (Fig 3A).The paleo-hexaploidy of ancient eudicot occurred after the split of Monocotyledons and Dicotyledons [70].Considering that the WGD that occurred 130.8 Mya is shared by eudicots [48], we proposed that most of the cotton PK collinearity events (Ks 0.6-3) potentially attributed to the 130.8 Mya WGD.However, we did not identify the second obvious cotton PK Ks peak, which corresponds to the WGD that occurred 130.8 Mya.We were potentially unable to identify this peak because these duplicated PKs might exhibit low retention after the WGD that occurred 130.8 Mya.Indeed, the TD after the paleo 130.8 Mya WGD caused dosage imbalance as the gene balance hypothesis claimed [71].Tetraploidization of G. hirsutum occurred 1.5 Mya and corresponds to a Ks peak [50].In our study, we also identified G. hirsutum and G. barbadense PK Ks peaks (0-0.Most tandem cotton PKs belonged to the RLK group, suggesting that TD partly contributed to the cotton PKs expansion (S7 Table ).Our result was consistent with reports demonstrating that most soybean and wheat TD PKs belonged to the RLK group [8,47].An A-genome ancestor resembling G. arboretum and a D-genome ancestor resembling G. raimondii diverged from a common ancestor 5-10 Mya and subsequently reunited to produce an allotetraploid Gossypium species [50].Our result demonstrated that the TD PK clusters of G. raimondii, G. arboretum, G. hirsutum and G. barbadense exhibited different chromosome locations and related subfamilies, suggesting that they might experience independent PK TDs or different chromosome rearrangements after the split of cotton A-and D-subgenomes.Indeed, the G. raimondii genome underwent large-scale rearrangement on chromosomes 2 and 3 compared with G. arboretum [49].
Similar to A. thaliana, rice and soybean [8][9][10], cotton RLK groups were also remarkably large, ranging from 913 to 1,855.The expansion of cotton RLKs was associated with specific subfamilies or subgroups, including LRR, RLCK and DLSV.

Expression pattern of cotton PKs
Our previous studies demonstrated that cotton PKs GhMKK1 [24], GhMKK3 [26] and GhMKK5 [23] are involved in drought stress resistance.In this study, we provided some candidate cotton PKs involved in drought resistance.For example, expression pattern analysis revealed that CotAD_10106 (STE_STE7) of G. hirsutum exhibited slight up-regulation (log2 value 0.3230955 in leaf) under drought stress (S13 Table ), suggesting that it exerts a positive effect in drought tolerance.Using BLAST in NCBI, we found that cotton PK CotAD_10106 is a homolog of A. thaliana MKK3.This prediction is consistent with our laboratory research demonstrating that GhMKK3 positively regulates the drought stress response [26].In addition, a cotton RLK GbRLK from Gossypium barbadense is involved in drought and high salinity stress pathways by participating or activating the ABA signaling pathway [30].In this study, we provided some candidate cotton RLKs for drought resistance.For instance, CotAD_53902 (RLK-Pelle_SD-2b) and CotAD_15239 (RLK-Pelle_LRR-Xb-1) exhibited up-regulation at 5 and 10 dpa under drought stress (S13 Table ).
Our previous research also demonstrated that several cotton PKs are involved in disease defense.GhMPK2 mediates defense responses to oxidative stress and pathogen infection [77].GhMPK6a interacts with the upstream MAPK kinase GhMKK4 and negatively regulates bacterial infection and osmotic stress [78].GhMPK7 plays a role in SA-regulated resistance to pathogen infection [79].GhMPK16 displays significant resistance to fungi (Alternaria alternata and Colletotrichum nicotianae) and bacteria (Pseudomonas solanacearum) [80].In this study, we also observed that some cotton MAPKs, such as CotAD_66205, CotAD_27603 and CotAD_29572, exhibited up-regulation in response to bollworm infection at 2-10 DAI (S11 Fig) .Interestingly, CotAD_53696 (CMGC_MAPK, a homolog of A. thaliana MPK16) of G. hirsutum exhibited slight up-regulation at 3 and 6 DAI (log2 values 0.213498 and 0.296303333, respectively) after A. alternate infection without chilling pre-treatment (S15 Table ), suggesting that it positively regulates the A. alternate defense response.This prediction is consistent with our laboratory research demonstrating that GhMPK16 regulates A. alternata resistance [80].In addition, cotton RLKs are also involved in disease defense.Two cotton RLK genes, Gbve1 [29] and GbRLK [81], improve tolerance to Verticillium dahliae infection.In this study, we also found that some cotton RLKs, such as CotAD_75319 (RLK-Pelle_WAK), CotAD_53902 (RLK-Pelle_SD-2b) and CotAD_03142 (RLK-Pelle_WAK_LRK10L-1), exhibited up-regulation at 2-10 DAI after bollworm infection (S11 Fig).
Cotton PKs are also involved in fiber development.According to expression analysis, some cotton CrRLK1L proteins were predicted to be related to fiber development [21].A cotton CDPK gene GhCPK1 [82] and a cotton CDK gene GhCDKA [83] were cloned and characterized to be associated with fiber development.In this study, we also found that some cotton RLKs

Conclusion
In this study, we systematically identified cotton PKs and analyzed their classification, evolution and expression patterns.Some conserved exon-intron kinase domain structures were identified during plant evolution.WGD, TD and synteny PKs of three cotton species G. raimondii, G. arboretum and G. hirsutum were analyzed by MCScanX.These results suggest that cotton-specific WGD, TD and tetraploidization of G. hirsutu that occurred 16.6 Mya partially contributed to the expansion of cotton PKs.We also performed global expression pattern analysis under abiotic and biotic stress conditions and during fiber development, providing candidate PKs for further experimental research.Our results will provide clues for further research on the evolution and physiology of the cotton kinome.

Fig 1 .
Fig 1. Classification and phylogenetic relationships of the cotton PK subfamilies.The Neighbor-Joining tree was built by the amino acid sequences of the kinase domain using the MEGAcc 7.0 with the p-distance model.Random representative PKs of each subfamily are selected by following criteria: members 6, 1 PK; 6 < members 30, 2 PKs; members > 30, 3 PKs.Six major groups are labeled with different colors, including AGC, CAMK, CMGC, RLK, STE, TKL.Detailed information is provided in S1A Fig. https://doi.org/10.1371/journal.pone.0197392.g001 Figure A in S5 File, excluding 16 genes from scaffold), 1407 G. arboretum PKs (Figure B in S5 File), 1893 G. hirsutum PKs (Figure C in S5 File, excluding 615 genes from scaffolds) and 2477 G. barbadense PKs (Figure D in S5 File, excluding 268 genes from scaffolds) to chromosome positions (S5 Table

Fig 2 .
Fig 2. Two examples of conserved exon-intron structure.This diagram indicated that conserved exon-intron structure with conserved exon phases existed in kinase domain.Filled boxes: red represents Pkinase or Pkinase_Tyr domain; black boxes: untranslated regions (UTRs); white boxes: other exon regions; lines: introns.Numbers 0, 1, and 2: exon phases.The lengths of the boxes and lines are scaled based on the lengths of genes.The long introns are shorted by "//".(A) non-RLK example: CAMK_CDPK.(B) RLK example: RLK_Pelle_LysM.https://doi.org/10.1371/journal.pone.0197392.g002 4-0.6 in G. raimondii, G. arboretum, G. hirsutum and G. barbadense, respectively (Figure A-N in S6 File).The results indicated that 271, 158, 72 and 264 PKs of G. raimondii, G. arboretum, G. hirsutum and G. barbadense, respectively, were involved in the collinearity events (Ks 0.4-0.6).These results suggest that the cotton PK expansions might be due in part to WGD 16.6 Mya.We noticed that 440 G. hirsutum PKs and 1333 G. barbadense PKs were involved in the collinearity events (Ks 0.0-0.1)(Fig 3C and 3D, Figure G and K in S6 File), suggesting that tetraploidization of G. hirsutum and G. barbadense might play important roles in the PK expansion.We identified 291, 246, 264 and 455 tandem cotton PKs in G. raimondii, G. arboretum, G. hirsutum and G. barbadense, respectively.The chromosome positions of these cotton PKs were detected in G. raimondii, G. arboretum, G. hirsutum and G. barbadense (S7 Table).They formed several tandem PK clusters among 13 or 26 cotton chromosomes.The G. raimondii tandem PKs formed 83 clusters that were distributed among the 13 chromosomes (Figure A in S7 File).The largest G. raimondii cluster contained 13 RLK-Pelle_LRR-XII-1 genes located on chromosome 11.The G. arboretum tandem PKs formed 87 clusters among the 13 chromosomes (Figure B in S7 File).The largest G. arboretum cluster was identified on chromosome 1

Fig 5 .Fig 6 .
Fig 5. Heat map of the expression patterns of G. hirsutum PK genes from known subfamilies under various abiotic stress conditions.The expression patterns of PKs of known subfamilies, including CAMK_CDPK, CAMK_AMPK, CMGC_GSK and CMGC_MAPK.(A) The expression patterns under abiotic stress treatments.(B) The expression patterns under waterlog and drought stress treatments.https://doi.org/10.1371/journal.pone.0197392.g005