Genome-wide identification of WRKY genes and their expression profiles under different abiotic stresses in Elaeis guineensis

African oil palm (Elaeis guineensis) is an important oil crop grown in tropical region and sensitive to low temperature along with high tolerance to salt and drought stresses. Since the WRKY transcription factor family plays central roles in the regulation of plant stress tolerance, 95 genes belonging to the WRKY family were identified and characterized in oil palm genome. Gene structure analysis showed that EgWRKY genes have considerable variation in intron number (0 to 12) and gene length (477bp to 89,167 bp). Duplicated genes identification indicated 32 EgWRKY genes originated from segmental duplication and two from tandem duplication. Based on transcriptome data, most EgWRKY genes showed tissue-specific expression patterns and their expression could be induced under cold stress. Furthermore, six EgWRKY genes with more than two-folded increased expression level under cold stress were validated by RT-qPCR, which has higher expression level in cold, drought and high salinity treatment. The identification and characterization of WRKY gene family showed that EgWRKY were associated with a wide range of abiotic stress responses in Elaeis guineensis and some EgWRKY members with high expression levels could be selected for further research in analyzing their functions in the stress response in African oil palm.


Introduction
African oil palm (2n = 32, Elaeis guineensis), belonging to the family Arecaceae, is a major tropical oil crop worldwide, which has the highest oil production per unit area. Presently, this tropical crop is cultivated in Southeast Asia, Africa, Central America and Brazil. In 2012, the total yield of African oil palm was approximately 50 million tons of palm oil from 17 million hectares of plantation (http://faostat3.fao.org/home/E). The whole-genome sequence of Elaeis guineensis was completed and released in2013 and a total of 1.53 Gb of sequence data was assembled with 34,802 genes annotated [1]. As a tropical oil crop, Elaeis guineensis is a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 particularly sensitive to low temperature but has high tolerance to salt and drought stresses. Identifying and validating genes related with stress response processes in African oil palm especially for cold responsive process will assist the further molecular breeding of stress tolerance cultivars.
WRKY genes have been identified and functionally analysed in various species. A large proportion of WRKY genes were found to be involved in various biotic [2][3][4][5][6][7] and abiotic stresses [8][9][10][11][12][13][14]. The expression of WRKY gene can also be induced by abscisic acid [15], gibberellins (GA) [16], salicylic acid [17], jasmonic acid [18], and ethylene treatment [19]. In Arabidopsis, the expression of AtWRKY30can be induced by abiotic stress and methyl viologen (MV), H 2 O 2 , arsenic, drought, NaCl, and mannitol treatment. Over-expression ofAtWRKY30can significantly increase the tolerance of Arabidopsis to MV and salinity stress [20]. Meanwhile, some WRKYs in rice were also found to be associated with abiotic stress, including OsWRKY11 [11], OsWRKY30 [14], OsWRKY45 [21], and OsWRKY47 [22]. Mutation of WRKY46 can decrease the tolerance to osmotic and salt stress in Arabidopsis [23]. The expression levels of WRKY25 and WRKY26 can be up-regulated by heat stress [24]. Moreover, in Arabidopsis, WRKY53 was validated in negative regulation of leaf senescence [25]. Some WRKY proteins were also shown to be phosphorylated by MAPKs and to contain a conserved motif for this phosphorylation [14]. Identification and characterization of WRKY genes in African oil palm could show a clue for genes involved in its stress responsive processes.
The WRKY transcription factors generally contain one or two conserved WRKY domains and a zinc finger motif and could be divided into three large clusters based on the variations in WRKY and zinc finger motifs [26]. Members of cluster I contain two conserved WRKY motifs with a C 2 H 2 motif. ClustersII and III are characterized by a single conserved WRKY motif. The functional amino acid sequence of a WRKY motif is WRKYGQK and existed different variations, including WRKYGKK, WRKYGEK, WRKYGRK, WKKYGQK, WKRYGQK, and WSKYEQK. Meanwhile, there are two types of zinc finger motifs in WRKY transcription factors: the C 2 H 2 motif (C-X 4-5 -C-X 22-23 -H-X 1 -H) and the C 2 HC motif (C-X 5-7 -C-X 2-3 -H-X 1 -C) [26].
To date, many WRKY genes in different species have been identified, including 74 WRKY genes in Arabidopsis [27], 103 in rice [28], 45 in barley [29], 55 in cucumber [30], 119 in maize [31], 182 in soybean [32], 109 in cotton [33], and 85 in cassava [34]. However, there is still no systematic identification and characterization of WRKYs conducted in African oil palm to find a clue for candidate WRKY genes involved in the stress responsive processes. In this study, genome-wide scanning for WRKY genes in African oil palm were conducted based on its genome sequence and the gene structures, phylogenetic relationships, expression profiles for all WRKY genes were also been performed. This research will provide insight into the functionality of EgWRKYs in this tropical oil crop.

Plant materials and stress treatments
The seedlings of African oil palm (pisifera, thin-shelled) were grown in nurseries before stress treatments. Thirty-six oil palm plants germinated in the same week and grown in the same nursery were selected for subsequent cold, drought and salt treatment. The control seedlings for the three kinds of treatment were grown under 16 hours light/8 hours dark photoperiod at 26˚C and three biological replicates were set in each stress treatment. For the cold treatment, a set of 12 seedling was placed in a growth chamber at 26˚C for one day. Three oil palm plants were used as controls and the rest nine oil palm plants were kept at 8˚C for 4, 24, and 48 hours under 16 hours light/8 hours dark photoperiod. Spear leaves were sampled from control and cold-treated seedlings and immediately frozen in liquid nitrogen. For the drought treatment, when soil water content reached 23% (2.3 g water per 10 g soil), spear leaves were sampled after 0, 4, 24 and 48 hours and immediately frozen in liquid nitrogen. For the salt treatment, the roots of oil palm seedlings were soaked in a NaCl solution (400 mmol/L)under 16 hours light/8 hours dark photoperiod at 26˚C, and spear leaves were sampled 0, 4, 24, and 48 hours after salt treatment and immediately frozen in liquid nitrogen for RNA isolation. The MRIP method was used to extract RNA from these leaves [35].

Identification and phylogenetic analysis of EgWRKYs
The whole-genome sequence of Elaeis guineensis was downloaded from the National Center for Biotechnology Information (NCBI). Sequences of AtWRKY genes were downloaded from the Arabidopsis Information Resource (TAIR), available at http://www.arabidopsis.org. To identify EgWRKYs in Elaeis guineensis, AtWRKY sequences were used as queries to blast against all gene models in African oil palm, using a cutoff E-value of 1e -10 . Multiple protein sequences alignment was used to confirm the conserved domains of EgWRKY sequences. Meanwhile, the conserved motifs were also predicted by alignment with CDD (http://www. ncbi.nlm.nih.gov/cdd) and PFAM databases (http://pfam.sanger.ac.uk). The MEME program (http://meme-suite.org) was used to identify the conserved motifs of EgWRKYs. MEGA 5.0 software was used to construct a neighbour-joining phylogenetic tree based on amino acid sequences of conserved WRKY domain with 1000 bootstrap replicates [36,37].

Characterization of gene structures and gene duplication events of EgWRKY
The gene annotation summaries, CDSs, and mRNA sequences of Elaeis guineensis was downloaded from NCBI. Gene structures of EgWRKYs in Elaeis guineensis were confirmed by aligning mRNA sequences with the whole-genome sequence of Elaeis guineensis. The gene structures were identified by the Gene Structure Display Server programme. The duplication events of all EgWRKY identified in this study were analysed using an algorithm that can scan multiple genomes or subgenomes to identify putative homologous chromosomal regions and align these regions using genes as anchors [38].

The expression profiles analysis of EgWRKYs based on transcriptome datasets
The raw data of nine transcriptomes from different tissues, including mesocarp (five different developmental stages), leaf, fruit, flower, root, and shoot, were downloaded from the SRA (Sequence Read Archive) database of the NCBI website. RPKM (reads per kb per million reads)values were used to calculate gene expression levels using the following formula [39]: where C is the number of reads that aligned exclusively with one expressed sequence; N is the total number of reads that aligned with all expressed sequences; and L is the number of bases in the CDS of the corresponding sequence.

Real-time qPCR assays
Quantitative real-time PCR was carried out using a standard SYBR Premix Ex Taq™ kit (TaKaRa) protocol in 96-well optical plates (Axygen) with a final reaction volume of 10 μl. The real-time qPCRs were incubated in 0.2-ml tubes in a Mastercycler ep realplex4 (Eppendorf) machine as follows: 95˚C for 5 seconds, 55˚C for 15 seconds and 68˚C for 20 seconds. The procedure ended with a melt curve ramping from 60 to 95˚C for 20 minutes to verify the PCR specificity. All qPCRs were carried out in biological and technical triplicate. The final Ct values were the means of nine measurements. The expression levels of the selected EgWRKYs were normalized to those of ELF, which was previously found to be a stable reference gene under abiotic stress [40]. One-way ANOVA was used to test significant difference (LSD, p< 0.05) of expression level between different time periods after cold, drought and salt treatment. All expression data was analyzed using SPSS software.

Identification of EgWRKY transcription factors in the genome of Elaeis guineensis
The identification of EgWRKY genes in African oil palm were using the amino acid sequences of 72 AtWRKY genes were used as query sequences with a cutoff E-value < 10 -5 via BLAST analysis. The blast results showed that 97orthologous genes of AtWRKY were found in homemade database containing all protein coding genes from African oil palm and these genes could be classified into the EgWRKY gene family. The conserved WRKY domain sequences analysis in these EgWRKY genes excluded two genes from further analysis due to a lack of the conserved WRKY domain. These remaining EgWRKYs were designated from EgWRKY01 to EgWRKY95 according to their genomic location in African oil palm. Detailed information about the 95 EgWRKY genes was listed in Table 1 (S1 and S2 Text).

Chromosomal locations, gene structures, and gene duplication of EgWRKY
The distribution of EgWRKY in African oil palm was uneven: 83EgWRKY genes were unevenly distributed among13 chromosomes; while no EgWRKY genes were detected in chromo-somes12, 13 and 16 (Fig 1). Chromosome 1 harboured 14EgWRKY genes, the largest number of any chromosome. Moreover, 12EgWRKYgenes were located on undetermined chromosomes. Analysis of EgWRKY gene structures indicated a notably variation exist in these genes (Fig 2). The intron numbers of EgWRKY genes identified in this study varied from 0 to 12, with an average of 2.99 introns per EgWRKY (S1 Table). The largest fraction of EgWRKYs (42,44.21%) contained two introns, followed by EgWRKYs containing three introns (20, 21.05%). Large gene size variation were detected in EgWRKY which varied from 477 bp (EgWRKY08) to 89,167 bp (EgWRKY71), with an average size of 5992 bp. The large variations in structure in the members of the EgWRKY gene family suggest that the genome of Elaeis guineensis has undergone significant divergence during a long evolutionary history.
The segmental duplication in Elaeis guineensis defined this species may have originated from a palaeotetrapolyploid. In this study, we investigated the segmental duplication event shappened in blocks containing EgWRKYs genes. The gene distribution in oil palm homologous blocks showed that 32 EgWRKYs were produced by segmental duplication, and two EgWRKYs were produced by tandem duplication. According to whole-genome duplication analysis of Elaeis guineensis, four large synteny blocks were detected in chromosomes 1, 3, 6, and 7. The 32 EgWRKYs produced by segmental duplication were also found to be located in the four large synteny blocks indicated by Singh et al [1] (Fig 3). Nine pairs of EgWRKYs in

Classification of EgWRKYs according to conserved WRKY domains
Of the 95 EgWRKYs identified, the majority(67, 70.52%)contained only one conserved WRKY domain, 19 (20%) contained two conserved WRKY domains, and the remaining 9 (9.48%)contained one conserved C 2 H 2 -type zinc finger domain and one conserved WRKY domain (Fig 4). Phylogenetic analyses were performed to evaluate the relationship between the identified EgWRKYs by the neighbour-joining analysis. The phylogenetic tree was constructed based on the WRKY motif, and 95 EgWRKY genes were classified into eight groups (Fig 5).

Expression levels of EgWRKYs in different tissues based on transcriptome data
To assess EgWRKY expression levels in different tissues of Elaeis guineensis, nine transcriptomes were downloaded from the SRA database of NCBI. These transcriptomes covered six oil palm tissues, including mesocarp (five different developmental stages), leaf, fruit, flower, root and shoot tissue. Almost every EgWRKY gene expressed (RPKM value > 0) in at least one transcriptome data set, except for EgWRKY08, EgWRKY09, EgWRKY37, EgWRKY38, EgWRKY40, EgWRKY46and EgWRKY85 (Fig 6). However, a large proportion of EgWRKYs     Meanwhile, we also investigated the expression patterns of the duplicated gene pairs in different tissues. As shown in Fig 7, the expression levels of most duplicated gene pairs showed different patterns across different tissues, except for WRKY24_Chr3/WRKY56_Chr7 in flower tissue andWRKY26_Chr3/WRKY59_Chr7 in flower and root tissue. These two duplicated gene pairs both showed high expression levels in flower and root tissue. These results suggested that subfunctional processes have conducted in these duplicate gene pairs of EgWRKY during a long evolutionary history.

Validation of expression of some EgWRKYs under cold, drought and salt stress
Based on trasncriptome data under cold treatment, 17 EgWRKY genes had increased expression levels at least two fold, six (EgWRKY06, 11, 25, 61, 72, and 88) of which were validated using real-time qRT-PCR. As was observed in the transcriptome data under cold stress, the expression of the six EgWRKYs could be induced under cold stress (Fig 8) which in accordance with the result from transcriptome data. Meanwhile, these six EgWRKYs could also be induced under drought and salt treatment. These data seem to indicate that these six EgWRKYs were associated with a wide range of abiotic stress responses in Elaeis guineensis.

Discussion
African oil palm (Elaeis guineensis, 2n = 32) is an important tropical oil crop and has high tolerance to drought and salt stress. Currently, African oil palm is mainly cultivated in the tropics with a minimal growth temperature of 15˚C and the cultivation area is limited. Low temperature causes cold damage such as yellowing and withering of young leaves and flowers in oil palm. To provide clues for molecular basis of oil palm stress response, the WRKY transcript factor family, which plays a critical role in the regulation of plant stress tolerance, was systematically analyzed in this study. A total of 95 EgWRKY genes were identified in African oil palm genome and characterized in four aspects-gene structure variation (477 bp in EgWRKY08 to 89,167 bp in EgWRKY71), phylogenetic relationship, gene expansion and expression profiles.
According to the amino acid sequences of the conserved domains, phylogenetic and evolutionary relationships among the 95 EgWRKYs were established. Previous studies had shown that WRKY transcription factors can be grouped into three clusters. Cluster I proteins contain two conserved WRKY motifs with a C 2 H 2 motif. Clusters 2 and 3 are characterized by a single conserved WRKY motif [26]. A few studies also showed that in some species, the WRKY family may include cluster IV, which contains an incomplete WRKY domain [41]. In this study, 95 EgWRKYs were classified into nine clusters. For the N-and C-terminal WRKY conserved domains, the phylogenetic analysis showed that they were classified into different classes: Nterminal conserved domains were grouped into cluster I, and C-terminal domains were grouped into clusters V and VI. WRKY genes with conserved WRKY and bZIP motifs were grouped into cluster VII. WRKYs with single conserved motif were mainly grouped into clusters II, III, IV, VIII, and IX. EgWRKY13, EgWRKY40, and EgWRKY74, which were found to contain incomplete conserved motifs, were grouped into clusters II, VIII, and IX, respectively.
Gene structure diversity was thought to be able to reflect the evolutionary relationships of gene families, providing additional information for phylogenetic classification [42]. In this study, the numbers of introns found in the identified EgWRKY genes varied from 0 to 12, with an average of 2.99 introns per EgWRKY. Some previous studies have shown the diversity of gene structure in various species. The number of introns of cassava varied from 1 to 5 [34]. In rice and rubber tree, the number of introns varied from 0 to 8 and 1 to 7, respectively [43,44]. These results indicated that oil palm had more gene structure diversity than did cassava, rice and rubber tree. Meanwhile, the largest fraction of EgWRKYs (42, 44.21%) contained two introns. A similar phenomenon was also detected in cassava (42 of 85 MeWRKY genes contained two introns), rice (42 of 92) and rubber tree (40 of 81). Cluster I members contained 1 to 12 introns; cluster II, 3 to 6; cluster III, 2 to 4; cluster IV, 0 to 3; cluster V, 1 to 12; cluster VI, 0 to 6; cluster VII, 2 to 3 with the exception ofEgWRKY71; and clustersVIIIandIX, 2 to 3. Previous research showed that intron loss occurs more rapidly than intron gain after segmental duplication in rice. Consequently, some EgWRKYs from clusters I and V may be original genes compared to other EgWRKYs in Elaeis guineensis.
Some previous studies showed that the expression of WRKY transcription factors can be induced by abiotic stress and play crucial roles in regulating plant responses to abiotic stress [8][9][10][11][12][13][14]. In Arabidopsis, WRKY34 was found to be involved in cold stress and is a negative regulator of cold response [45]. AtWRKY34 showed high similarity toEgWRKY18 andEgWRKY64. Based on transcriptomic data, expression of EgWRKY18 andEgWRKY64 can be induced under cold stress. In Arabidopsis, AtWRKY30expression can be induced by abiotic stress and MV, H 2 O 2 , arsenic, drought, NaCl, and mannitol treatment. Overexpressing AtWRKY30can significantly increase the tolerance of Arabidopsis to MV and salinity stress [14]. AtWRKY30 showed high similarity to EgWRKY07 and EgWRKY52. According to transcriptome data, expression of these two EgWRKYs was strongly induced, up-regulated 3.367-and 11.883-fold, respectively. The C-terminal domain of AtWRKY33 can interact with multiple VQ proteins to regulate multiple abiotic stresses [10]. AtWRKY33has high similaritytoEgWRKY34and EgWRKY42, and these two EgWRKYs were up-regulated 2.72-and 1.15-fold under cold treatment, respectively. Based on transcriptome data and qPCR results, the expression of almost all the EgWRKYs was induced under abiotic stresses. This indicated that EgWRKYs may play important roles in oil palm responses to abiotic stresses.