Genome-Wide Comparative Analysis of the Phospholipase D Gene Families among Allotetraploid Cotton and Its Diploid Progenitors

In this study, 40 phospholipase D (PLD) genes were identified from allotetraploid cotton Gossypium hirsutum, and 20 PLD genes were examined in diploid cotton Gossypium raimondii. Combining with 19 previously identified Gossypium arboreum PLD genes, a comparative analysis was performed among the PLD gene families among allotetraploid and two diploid cottons. Based on the orthologous relationships, we found that almost each G. hirsutum PLD had a corresponding homolog in the G. arboreum and G. raimondii genomes, except for GhPLDβ3A, whose homolog GaPLDβ3 may have been lost during the evolution of G. arboreum after the interspecific hybridization. Phylogenetic analysis showed that all of the cotton PLDs were unevenly classified into six numbered subgroups: α, β/γ, δ, ε, ζ and φ. An N-terminal C2 domain was found in the α, β/γ, δ and ε subgroups, while phox homology (PX) and pleckstrin homology (PH) domains were identified in the ζ subgroup. The subgroup φ possessed a single peptide instead of a functional domain. In each phylogenetic subgroup, the PLDs showed high conservation in gene structure and amino acid sequences in functional domains. The expansion of GhPLD and GrPLD gene families were mainly attributed to segmental duplication and partly attributed to tandem duplication. Furthermore, purifying selection played a critical role in the evolution of PLD genes in cotton. Quantitative RT-PCR documented that allotetraploid cotton PLD genes were broadly expressed and each had a unique spatial and developmental expression pattern, indicating their functional diversification in cotton growth and development. Further analysis of cis-regulatory elements elucidated transcriptional regulations and potential functions. Our comparative analysis provided valuable information for understanding the putative functions of the PLD genes in cotton fiber.


Introduction
Upland cotton Gossypium hirsutum is the world's most valuable fiber crop [1,2]. It has been widely grown in over 80 countries and accounts for more than 90% of commercial cotton production worldwide [3]. G. hirsutum is also studied as a model polyploid plant. It was a classic natural allotetraploid (AADD, 2n = 4x = 52) that arose from interspecific hybridization approximately 1 to 2 million years ago (mya) between the A genome diploid species Gossypium arboreum (AA, 2n = 2x = 26) and the D genome diploid species Gossypium raimondii (DD, 2n = 2x = 26) [2,4]. In general, allopolyploid plants grow more vigorously and adaptively than their parents, and thereby allotetraploid cotton produces a higher yield and superior quality of fibers than their diploid progenitors under similar conditions [5]. Cotton fiber (commonly known as cotton lint) is the most important natural and renewable material for the textile industry and profoundly affects the world economy and human daily life [6].
Phospholipase D (PLD) is a major type of phospholipase in plants and catalyzes the hydrolysis of phospholipids at the terminal phosphodiester bond to produce a free head group and phosphatidic acid (PA) [7]. Two highly conserved HxKxxxxD (HKD) domains are the common feature of PLD proteins. The functional domains near the N-terminus are diverse. Some phylogenetic relationship, sequence characteristics, functional divergence and selective pressure analyses. In addition, the expression profiles of GhPLD genes in different allotetraploid cotton tissues were examined, and the cis-regulatory elements were also analyzed to account for the expression specificity and transcriptional regulation. This study will provide a better understanding of the expansion and diversification of the allotetraploid cotton PLD gene family and advance PLD functional research to enhance our ability to manipulate fiber and agronomic production of cotton.

Plant materials
Allotetraploid cotton (G. hirsutum cultivar 'CRI 35') was grown in the field under standard conditions at the Tsinghua University in China. When cotton plants were in full bloom (approximately 90 days after planting), we collected different cotton tissues, including roots, stems, leaves, petals, and stamens. Cotton fibers were harvested at 0, 5, 10, 15, 20 and 25 days post anthesis (dpa). All of these samples were immediately frozen in liquid nitrogen and then stored at -80°C until RNA extraction. Each sample was collected from 40 individual cotton plants.

Database search for cotton PLD genes
G. hirsutum genome data were downloaded from the CottonGen database (http://www. cottongen.org) [30], and G. raimondii genome data were obtained from the Phytozome v9.1 database (http://www.phytozome.net/) [32]. To identify the G. hirsutum and G. raimondii PLD genes, the PLD protein sequences from G. arboreum, Arabidopsis and rice were used as queries in BLASTP search [33]. Moreover, the HMMER search was performed in the annotation database using the HKD domain (PF00614) as a keyword. All of the candidate PLD genes were conformed for two HKD domains by Pfam (http://pfam.sanger.ac.uk/search) and InterproScan databases (http://www.ebi.ac.uk/interpro/search/sequence-search).

Sequence analysis methods
Multiple sequence alignment of PLD proteins was performed using Clustal W with standard settings [34]. The program MUSCLE was also used to perform multiple sequence alignments to confirm the ClustalW results [35].
Sequence identities of cotton PLDs at both the nucleotide and amino acid levels were calculated with the program DNASTAR Lasergene (http://www.dnastar.com/). The exon-intron structures of PLD genes were generated online by the Gene Structure Display Server (http:// gsds.cbi.pku.edu.cn/). The theoretical molecular weight (Mw) and the isoelectric point (pI) of the the deduced cotton PLD proteins were calculated by ExPASy (http://cn.expasy.org/tools). Subcellular localization was analyzed using the CELLO v2.5 server (http://cello.life.nctu.edu.tw/).

Phylogenetic tree construction
The program MEGA 6.0 was used to align the deduced amino acid sequences of cotton PLD genes to construct a phylogenetic tree [36]. A neighbor joining (NJ) consensus tree was constructed with the following parameters: P-distance, pairwise gap deletion, and bootstrap (1,000 replicates). Then, Maximum likelihood and Minimal Evolution methods of MEGA 6.0 were applied to validate the results from the NJ method. Meanwhile, Maximum Parsimony method of PHYLIP software [37] was also employed to create a new phylogenetic tree to validate the results from the NJ method. The tree topologies from different methods were very similar.

Chromosomal location and gene duplication
The detailed physical positions of PLD genes were obtained from the CottonGen database and the Phytozome v9.1 database. Chromosomal localization and collinear relationships of genes were visualized using the program Circos [38].
Paralogous PLD genes were chosen from the results of sequence identity calculations and the phylogenetic tree. Gene duplication events were indicated by shared aligned sequence covering >70% of the longer gene and similarity of the aligned regions of >70%. A tandem duplication event has been defined as paralogous genes that are physically close to each other on the chromosomes [39]. A segmental duplication event has been defined as paralogous genes that result from large-scale events such as whole genome duplication or duplications of large chromosomal regions [40].
The Ka (nonsynonymous substitution rate) and Ks (synonymous substitution rate) values of the paralogous genes were estimated by the KaKs_Calculator [41]. Mean Ks could be used as the proxy for time and the conserved flanking protein-coding genes were also used to estimate dates of the segmental duplication events [42]. Based on the λ of synonymous substitution 1.5×10 −8 substitutions/ synonymous site/year for cotton [43], the approximate age of duplicated events of the duplicate PLD gene pairs was estimated (Time = Ks/2λ). Moreover, the Ka/ Ks ratio was used to show the selection pressure for the duplicate PLD genes. A Ka/Ks ratio >1, <1 or = 1 indicates positive, negative (purifying selection) and neutral evolution, respectively [44].

RNA extraction and real-time quantitative RT-PCR
Total RNAs were extracted using the RNAprep pure plant kit according to the manufacturer's instructions (TIANGEN, Beijing, China) from different cotton tissues. The first strand cDNAs were synthesized using the Takara Reverse Transcription System (TaKaRa, Shuzo, Otsu, Japan).
Quantitative real-time PCR (qRT-PCR) reactions were performed in Mini Opticon Real-Time PCR System (Bio-Rad, Hercules, California, USA) using the SYBR Green Master Mix Reagent (TaKaRa, Shuzo, Otsu, Japan) under standardized thermal cycling conditions according to the manufacturer's protocol. Each PCR reaction (20 μL) contained 10 μL real-time PCR Mix, 0.5 μL of each gene-specific primer (S1 Table), and appropriately diluted cDNA. GhUBQ7 (Accession number: DQ116441) was used as an internal reference gene. Each PCR reaction was repeated three times independently. The specificity of the reactions was verified by melting curve analysis, and products were further confirmed by agarose gel electrophoresis. The comparative 2 -ΔΔCT method was used to calculate the relative expression levels [45]. Heat maps were generated with the MultiExperiment Viewer (MeV) [46].

Results
PLD genes in G. hirsutum and G. raimondii genomes The BLASTP and HMMER searches against G. hirsutum and G. raimondii genomes were performed to identify PLD genes. Then, the candidate PLD genes were confirmed through similarity searches against Pfam and InterproScan databases. Finally, a total of 40 G. hirsutum PLDs (GhPLDs) and 20 G. raimondii PLDs (GrPLDs) were identified (Table 1 and S2 Table). The properties of newly found cotton PLDs were analyzed by ExPASy and CELLO v2.5 servers. The ORF lengths of these genes ranged from 1,545 bp to 3,693 bp, which encoded polypeptides of 514 aa to 1,230 aa with predicted molecular weights ranging from 57.91 kD to 140.56 kD ( Table 1). The theoretical pIs ranged from 5.37 to 9.13. Most of the PLDs were predicted to be localized in the cytoplasm ( Table 1).
Among these cotton PLD subgroups, the PLDδ constituted the largest, containing 20 members. The second largest subgroups (α and z) both consisted of 16 members. The subgroup β/γ was a unique bi-type subgroup which was comprised of 11 PLDβs and four PLDγs. In the subgroups φ and ε, eight and four members were identified, respectively ( Fig 1A). Notablely, subgroups z and φ were far from the other four subgroups in evolutionary distance. Based on the functional domain prediction by searching the Pfam database, subgroups z and φ belonged to PX/PH-PLD and SP-PLD subfamilies, respectively, whereas the other subgroups fell into the subfamily C2-PLD. Overall, according to the phylogenetic relationships of cotton PLDs, it was speculated that these multiple subgroups might play specialized roles in the adaptive evolution of cotton.

Sequence characteristics of cotton PLD genes
To understand phylogenetic relationships, we calculated the sequence identities of pairwise cotton PLDs at both the nucleotide and amino acid level and also compared the exon-intron structures of individual PLDs. The genes that belong to the same subgroup had high identity to each other, especially for the ones with an orthologous relationship (S1 Fig). For different subgroups, the nearer evolutionary distance they were in, the higher sequence identities they had, such as the PLDβ/γ and PLDδ subgroups ( Fig 1A and S1 Fig). Significantly, four gene clusters (PLDβ1s-PLDβ2s-PLDβ3s, PLDδ1s-PLDδ2s, PLDδ3s-PLDδ4s, and PLDφ1s-PLDφ2s) with more than 90% identity existed in three subgroups, indicating that they might originate from gene duplication events (S1 Fig). The detailed illustration showed the distribution and position of introns within each of the cotton PLD genes (Fig 1B). A total of 680 introns were found in the cotton PLD genes, with an average intron number of 8.6 per gene and an average intron length of 258.9 bp (S3 Table). The members of the individual subgroups shared similar intron numbers, consistent with the phylogenetic classification of the cotton PLDs (Fig 1 and S3  Table). For example, both subgroups PLDβ/γ and PLDδ included members with approximately nine introns.
To reveal the typical domain characteristics of cotton PLD subgroups, comparative analyses for the conservation of amino acid residues in functional domains were performed on the basis of the alignments of these PLDs. The HKD1 domains were relatively more diverse than the HKD2 domains (Fig 2A and 2B). Both of the HKD1 and HKD2 domains contained three highly conserved amino acids (6H, 8K and 13D), which might have a key functional importance within these two domains (Fig 2A and 2B). The members of the individual subgroups possessed some marker amino acids in the HKD1 and HKD2 domains, for instance, "TMFT"  (Fig 2A and 2B). Similarly, the members of individual cotton PLD subfamilies also contained some specific amino acids, for example, "HHQK" in C2-PLDs, "HHEK" in PX/PH-PLDs, and "VHAK" in SP-PLDs (Fig 2A). For C2 domain sequences, five amino acids (3Y, 28W, 32F, 38H and 59G) are highly conserved, while some other amino acids were extremely specific for their own PLD subgroup, such as 4A, 7D, 12R, 35Y, 46T, 53I, 60R, 62Y and 67D for PLDα (Fig 2C).

Gene duplication events in GhPLD and GrPLD gene families
After mapping all cotton PLD genes to their corresponding chromosomes, we found that they were unevenly distributed on the chromosomes of three cotton species (Fig 3 and S2 Fig). For the GhPLD gene family, 39 out of 40 PLD genes were assigned to 19 of the 26 G. hirsutum chromosomes (none were assigned to A03, A04, A09, A13, D04, D09 or D13), while one gene GhPLDz1D was assigned to an as of yet unmapped scaffold (Fig 3). For the GrPLD gene family, all 20 PLD genes were assigned to 10 of the 13 G. raimondii chromosomes (none were assigned to Gr6, Gr12 or Gr13) (S2 Fig). We further assessed the contribution of gene duplication to the expansion of these two cotton PLD gene families. In G. hirsutum, eight segmental duplications (GhPLDβ1A/β2A, GhPLDδ1A/δ2A, GhPLDδ3A/δ4A, GhPLDφ1A/φ2A, GhPLDβ1D/β2D, GhPLDδ1D/δ2D, GhPLDδ3D/δ4D and GhPLDφ1D/φ2D) and two tandem duplications (GhPLDβ1A/β3A and GhPLDβ1D/β3D) occurred from 17.88 to 28.01 mya (Fig 3 and S4  Table). In G. raimondii, four segmental duplication (GrPLDβ1/β2, GrPLDδ1/δ2, GrPLDδ3/δ4 and GrPLDφ1/φ2) and one tandem duplication (GrPLDβ1/β3) jointly took place during the time of 18.06~21.93 mya (Fig 3 and S4 Table). All identified gene duplication events in the three cotton PLD families happened before the split of two diploid cottons. Thus, we speculate that an unidentified tandem duplication event also happened previously in G. arboreum, followed by a specific gene loss event of GaPLDβ3.
To investigate which type of selection pressure had been involved in the divergence after gene duplication events, we calculated the Ka/Ks ratios (synonymous substitutions to non-synonymous substitutions) for the duplicated cotton PLD gene pairs on the basis of coding sequences. The resulting pairwise comparison data showed that all of the paralogous genes had Ka/Ks ratios of < 1, suggesting that cotton duplicated PLD genes had experienced strong purifying selection pressure. Strong functional constraints had a bearing on the evolution of these gene families, reflecting the essential roles of PLDs in cotton.

Spatial expression profiles of GhPLD genes
To better reveal the potential functions of PLDs in allotetraploid cotton, the expression profiles of GhPLDs were investigated by both quantitative RT-PCR and RNA-seq data analysis. Because of the extremely high similarity between the mRNAs of the GhPLDxA-GhPLDxD gene pairs (such as GhPLDα1A and GhPLDα1D) and their nearly identical transcript sizes, we could not distinguish them using quantitative RT-PCR. Therefore, we regarded GhPLDxA-GhPLDxD as one combination named GhPLDx and investigated its expression level by quantitative RT-PCR, and then distinguished the portion of GhPLDα1A and that of GhPLDα1D by analyzing RNA-seq data, which included both gene expression levels and the information of gene location. Alternatively, the expression profiles of GhPLDs obtained from qRT-PCR and semiquantitative PCR, were confirmed by RNA-seq data analysis. We found that GhPLD genes have rather broad expression patterns across a variety of cotton tissues under normal growth conditions including roots, stems, leaves, petals, stamens and fibers. The gene combination from the PLDα subgroup, GhPLDα1, had the most preferential expression levels in roots, stems, leaves and fibers and maintained high levels of expression in petals and stamens (Fig 4A). GhPLDδ2, with the highest expression abundance only in stamens, belongs to the largest subgroup (PLDδ). Another gene combination from this subgroup, GhPLDδ1, showed the second highest expression level in almost all cotton tissues, and GhPLDβ3, which belongs to the PLDβ/γ subgroup, expressed the third highest levels (Fig 4A). Similarly, GhPLDz2 and GhPLDφ1, which were from the PLDz and PLDφ subgroups, respectively, were expressed at moderate levels, while the only gene combination of the remaining subgroup, GhPLDε, showed low expression values in all tissues (Fig 4A). Other low expression genes included GhPLDα2, GhPLDβ1, GhPLDγ, GhPLDδ3, GhPLDδ4, GhPLDz1, GhPLDz3, GhPLDz4 and GhPLDφ2. However, GhPLDα3, GhPLDα4 and GhPLDβ2 had only weak expression or even undetected expression (Fig 4A). Additionally, the duplicated GhPLD gene pairs displayed variant expression patterns. For instance, GhPLDβ2 had almost undetected expression in all tissues, while GhPLDβ1 maintained obvious expression levels in leaves and stamens. Examining when and where a gene is expressed in the plant tissues/organs can often lead to clues about gene function. Therefore, these diverse expression patterns of GhPLDs allude to functional diversification of this gene family in allotetraploid cotton.
To gain deeper insights into the expression patterns, we analyzed a RNA-seq dataset that encompassed results from six studied cotton tissues. In these surveyed gene combinations, we successfully distinguished the contribution of GhPLDxAs and that of GhPLDxDs. Most GhPLDxAs were more preferentially expressed than GhPLDxDs (Fig 4B). For instance, GhPLDα1A was expressed more preferentially than GhPLDα1D in all studied cotton tissues. However, the opposite correlation was observed for GhPLDφ1D, which maintained higher expression levels than GhPLDφ1A. Overall, the results of RNA-seq expression data were in close agreement with that of quantitative RT-PCR (Fig 4B).

Expression patterns of GhPLD genes in developing fibers
It is worth noting that the GhPLDs that were expressed in fibers, which is the most valuable economic tissue in cotton, were GhPLDα1A/D, GhPLDβ3A/D, GhPLDδ1A/D, GhPLDz2A/D and GhPLDφ1A/D. To survey the expression profiles of these fiber expressed PLD genes, quantitative RT-PCR and RNA-seq data analysis were performed at the representative stages of fiber development (0~25 DPA). Strikingly, among these selected genes, GhPLDα1 had high expression in elongating fibers. Specifically, expression increased from 0 to 20 DPA, peaking at 20 DPA, and then decreased substantially (Fig 5A). Furthermore, GhPLDα1A made more of a contribution than GhPLDα1D did at every studied stages of fiber elongation (Fig 5B). These results implied that GhPLDα1A and GhPLDα1D might jointly play an important role in fiber development, most likely in the initiation stage of secondary cell wall thickening in fiber. In addition, GhPLDδ1A and GhPLDδ1D may also have an essential role in fiber development around the time of 25 DPA ( Fig  5). However, GhPLDβ3A, GhPLDβ3D, GhPLDz2A, GhPLDz2D, GhPLDφ1A and GhPLDφ1D had relatively low or even undetectable expression levels in elongating fibers (Fig 5).

Cis-regulatory elements in the promoters of GhPLD genes
To obtain more insights into the expression patterns and putative functions, the cis-regulatory elements were scanned in the promoter regions of GhPLDs. According to the expression levels, we divided 40 GhPLDs into four groups, including high, middle, low and weak expressed gene groups. Putative promoter regions in the 1500 bp sequences upstream of the translational start site were used to identify putative cis-regulatory elements. Moreover, the PLD genes of two diploid cottons were also sent to analyze the distributions and positions of the cis-regulatory elements. In addition to two core cis-elements (TATA box and CAAT box), one high transcription related element, six phytohormone response elements and nine stress response elements were found (S6 Table and Fig 6).
Most of the gene combinations (GhPLDxA-GhPLDxD) shared the similar cis-regulatory element organization, except for GhPLDz2A-GhPLDz2D, GhPLDβ2A-GhPLDβ2D and GhPLDδ3A-GhPLDδ3D (Fig 6). It was also shown that most of the identified cis-elements were found to be preserved in three studied cotton species (Fig 6). The conservation of cis-regulatory elements reflected the importance of meaningful transcriptional regulation of cotton PLDs. Remarkably, the element Py-rich stretch (F) existed in the 5`UTR regions of most of the highly expressed GhPLDs. The most preferentially expressed GhPLDα1A and GhPLDα1D had even two copies, suggesting that this regulatory element might direct the high-level expression ( Fig 6A). Moreover, in all of the studied cotton tissues, GhPLDα1A and GhPLDφ1D expressed more preferentially than GhPLDα1D and GhPLDφ1A, respectively. For these two gene pairs, the differences in the promoters were the GA response elements P-box and GARE, which might be the key factors of expression level (Figs 4 and 6).

Expansion of the allotetraploid cotton PLD gene family
In this study, the analyses of phylogeny, sequence characteristics and gene duplication were integrated to estimate an expansion pattern of allotetraploid cotton PLDs (S3 Fig). Our results have illustrated that the expansion was primarily due to a recent cotton specific large-scale genome duplication event, with tandem duplication and segmental duplication jointly taking place at some locations. These allotetraploid cotton PLDs were divided into six different subgroups with distinct sequence characteristics. Interestingly, the positions of introns in each subgroup were well conserved, indicating that the members of the same subgroup might have a common ancestor (Fig 1). PLDβ/γs and PLDδs had the closest evolutionary relationship and a similar distribution of exon-intron organization and could form a larger clade, implying that Comparative Analysis of PLD Gene Families in Cotton they might originate from the common ancestor I (Fig 1 and S3 Fig). PLDαs and PLDεs possessed 2 or 3 introns, suggesting that they originated from the common ancestor II (Fig 1 and S3  Fig). In addition, of these PLDs, PLDzs and PLDφs belonged to the PX/PH-PLD and SP-PLD subfamilies, respectively, and had dissimilar intron numbers, suggesting the convergent evolution via two independent evolutionary paths (Fig 1 and S3 Fig). Despite specific features of the functional domains near the N-terminus, the sequence identities of the C-terminus were relatively conserved, especially near the two characteristic HKD domains [47] (Fig 2). Thus, we reasonably speculated that all plant PLDs might have evolved from one original ancestor followed by some unknown changes that took place near the N-terminus, dividing the plant PLD gene family into three distinct PLD subfamilies: C2-PLD, PX/PH-PLD and SP-PLD (S3 Fig). In the long history of evolution, selection pressure after gene duplication events shaped gene families, resulting in distinct evolutionary patterns among different gene families and even different subgroups in one gene family [48]. The members of the subgroup PLDφ were believed to be more conserved than those of the subgroups PLDβ/γ and PLDδ, as they had higher degrees of sequence identity and more similar exon-intron structures than PLDβ/γs and PLDδs (Fig 1 and S1 Fig). This was consistent with our analysis on the rate of molecular evolution, in which the mean values of Ka/Ks ratios in the subgroup PLDφ were smaller than those in the PLDβ/γ and PLDδ subgroups (S5 Table).

Functional diversification of allotetraploid cotton PLD genes
Increasing evidence suggests the PLDs are involved in many plant processes including plant development, responses to biotic and abiotic stresses. Cis-regulatory elements in gene promoter regions have essential roles in determining tissue-specific and stress-responsive expression patterns of genes [49], which might help to elucidate transcriptional regulation and potential functions of GhPLDs. Phytohormones are important regulators of plant growth and development. Cotton fiber development is known to be regulated by gibberellic acid (GA) [50], jasmonic acid (JA) [51], abscisic acid (ABA) [52], ethylene [53] and auxin [54]. Among the 10 GhPLD genes expressed in elongating allotetraploid cotton fibers, 5, 4, 4, 2 and 1 genes were found to have cisregulatory elements (P-box, GARE, TGACG-motif, ABRE, ERE and TGA-element) specific for GA, JA, ABA, ethylene and auxin responses in their respective promoters (S6 Table and Fig 6). Three preferentially expressed genes in fiber, including GhPLDα1A, GhPLDδ1A and GhPLDδ1D, all had more than three types of hormone response elements, strongly suggesting that the functions of these GhPLDs in elongating cotton fibers were synergistically regulated by different phytohormones. Other tissue-specific GhPLDs, such as GhPLDγA, GhPLDγD, GhPLDδ4A and GhPLDδ4D, possessed 2~4 types of hormone response elements, indicating that multiple phytohormones also regulated the transcription of GhPLDs in other cotton tissues.
Stress response elements related to diverse environmental stimuli were found in the promoter regions of allotetraploid cotton PLD genes (S6 Table and Fig 6). For the vast majority of GhPLDs, except for the preferentially expressed genes, the distribution of elements for stress responsiveness was more extensive, explaining why most of the GhPLDs exhibited the relatively low or weak expression levels under normal growth conditions (Figs 4 and 6). Adverse environmental stimulus, such as drought, defense, and heat shock, might trigger high levels of transcription of most low or weak expressed GhPLDs, as a result of that a number of the specific elements (MBS, TC-rich and HSE) were located in the promoters of these genes (Fig 6C and  6D). Therefore, we hypothesize that GhPLDs might also have important functions in the adaptation to adverse environmental stimuli. Overall, complex and diverse transcriptional regulation significantly broadened our view and understanding of the functional diversification of allotetraploid cotton PLDs.
To date, the biological and cellular functions of GhPLD genes remain largely unknown. The current investigation demonstrates some of GhPLD genes that might be involved in cotton development and stress response, and provides important clues for the selection of candidate genes, especially GhPLDα1A/D and GhPLDδ1A/D for the further studies.

Conclusions
In conclusion, a total of 40 and 20 PLD genes were identified in G. hirsutum and G. raimondii, respectively. Our comparative analyses provided valuable insight into the understanding of phylogenetic relationships, sequence characteristics, molecular evolution of PLD genes in allotetraploid cotton and its two diploid progenitors. Moreover, we also characterized the broad spatial and fiber developmental expression profiles of allotetraploid cotton PLDs and expanded the view of transcriptional regulation of these genes. Unveiling the roles of PLD genes in cotton growth, development and stress adaptation processes may facilitate advances in crop variety development and utilization.  Table. The Ka/Ks ratios for duplicate PLD genes in G. arboreum, G. raimondii, and G. hirsutum. (XLSX) S6 Table. Putative cis-element sequences in promoter regions of cotton PLD genes. Ã N-A, C, G or T; K-G or T; M-A or C; R-A or G; W-A or T; Y-C or T (XLSX)