Involvement of an ABI-like protein and a Ca2+-ATPase in drought tolerance as revealed by transcript profiling of a sweetpotato somatic hybrid and its parents Ipomoea batatas (L.) Lam. and I. triloba L.

Previously, we obtained the sweetpotato somatic hybrid KT1 from a cross between sweetpotato (Ipomoea batatas (L.) Lam.) cv. Kokei No. 14 and its drought-tolerant wild relative I. triloba L. KT1 not only inherited the thick storage root characteristic of Kokei No. 14 but also the drought-tolerance trait of I. triloba L. The aim of this study was to explore the molecular mechanism of the drought tolerance of KT1. Four-week-old in vitro-grown plants of KT1, Kokei No. 14, and I. triloba L. were subjected to a simulated drought stress treatment (30% PEG6000) for 0, 6, 12 and 24 h. Total RNA was extracted from samples at each time point, and then used for transcriptome sequencing. The gene transcript profiles of KT1 and its parents were compared to identify differentially expressed genes, and drought-related modules were screened by a weighted gene co-expression network analysis. The functions of ABI-like protein and Ca2+-ATPase, two proteins screened from the cyan and light yellow modules, were analyzed in terms of their potential roles in drought tolerance in KT1 and its parents. These analyses of the drought responses of KT1 and its somatic donors at the transcriptional level provide new annotations for the molecular mechanism of drought tolerance in the somatic hybrid KT1 and its parents.


Introduction
Sweetpotato (Ipomoea batatas (L.) Lam.) is an important food, fodder, industrial raw material, and bio-energy resource crop, and plays important roles in food security and energy security worldwide, especially in China [1]. In the context of the increasing world population and climate change, the global water shortage has become one of the major challenges in agricultural production. Agricultural activities account for about 75% of global water consumption; therefore, the impact of drought stress on the productivity of field crops is an important issue. PLOS

RNA extraction and Illumina sequencing
Total RNA was extracted from whole plants using a Quick RNA Isolation Kit (Huayueyang Biotech Co., Ltd., Beijing, China) according to the manufacturer's instructions. Residual DNA was removed by RNase-free Dnase I (TaKaRa Biotech Co., Ltd., Dalian, China), and RNA integrity was checked by electrophoresis on a 1.2% agarose gel. The RNA concentration was quantified by an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA). High-quality RNA samples were sent to the Biomarker Technologies Corporation (Beijing, China) for cDNA library construction and sequencing, which was performed using the Illumina HiSeq 2500 platform with 125-bp paired-end reads.

Data filtering and de novo assembly
High-quality clean reads were obtained after quality control by removing adaptor sequences, duplicated sequences, ambiguous reads ('N'), and low-quality reads (that had N bases of more than 10% and that bears more than 50% of bases that have a Q-value<10). Trinity (version: r20131110; http://trinityrnaseq.sourceforge.net/) was used for transcriptome de novo assembly after data normalization of the clean reads. Clean reads with a certain overlap length were initially combined to form long fragments without N (contigs). Related contigs were clustered using TGICL (2.1) software [20] to yield unigenes (without N) that could not be extended at either end. Redundancies were removed to acquire non-redundant unigenes. It should be noted that the unigene mentioned in this article is based on Trinity assembly, not equvalent to that based on EST sequences.

Screening of differentially expressed genes
The transcript levels of unigenes in KT1 and its parents under drought stress for 6, 12 and 24 h were compared with those in the control (0 h). Gene expression data were filtered by removing genes with low transcript levels in all 24 samples. The standard for screening was that the FPKM value in at least one group of two biological replicates was ! 5. A total of 22,412 unigenes remained after screening. The remaining genes were used to identify the differentially expressed genes (DEGs), which were then analyzed by a weighted gene co-expression network analysis (WGCNA). The EBSeq (1.6.0) package was used to obtain the "base mean" value to identify DEGs. The absolute value of log 2 (FPKM+1) ratio ! 1 and p value < 0.05 were set as the thresholds for a significant difference in gene expression between two biological repeats.

Weighted gene co-expression network analysis
The FPKM values of the 22,412 unigenes filtered from the above FPKM screening were added by 0.001 and then normalized by a Log 10 transformation (Lg (FPKM+0.001)). The WGCNA R software package (v1.41.1) was used to identify modules containing genes that were coexpressed and correlated with drought tolerance. Unsigned, weighted correlation network construction and module detection were performed using the automatic one-step function (for blockwise modules). The resulting Pearson correlation matrix was transformed into a matrix of connection strengths using a power of 10.
The resulting gene modules were assigned colors by R software and Module-Trait relationships were calculated by Pearson's correlation analyses. Each module was represented by module Eigengenes, which were calculated from the first principal component capturing the maximum amount of variation of the module. Then, the topological overlap was calculated to measure network interconnectedness.

Homologous cloning of candidate genes and evolutionary analysis
Two candidate genes screened from the WGCNA modules were further analyzed by Sanger sequencing. Primers were designed using Primer Premier 6 (http://www.premierbiosoft.com/) software, and homologous genes were cloned from K14, KT1 and Tri. Protein sequences of the cloned genes were compared with their homologs from other plant species to build a phylogenetic tree with MEGA 6.0 software using the neighbor-joining method [28].

Accession numbers
The accession numbers for phylogenetic analysis in this article from the GenBank/EMBL databases are as follows:

Summary of transcriptome sequencing data
After quality control, 677,222,430 clean reads were obtained. The average clean reads per sample were 28,217,601.25, and the lowest was 24,146,171. The average Q30 base percentage was 90.91% and the lowest was 88.22% (S1 Table). A "total gene number vs total read number plot" was made to assess the quality of assembly and the depth of sequencing, which showed that sequencing volume was already saturated (S3 Appendix). All clean data were uploaded in the

Functional annotation of unigenes
The BLAST parameter E-value was set to < 1e-5 and the HMMER parameter E-value was set to < 1e-10, resulting in 36,767 unigenes with annotated information. The statistical results of gene annotations are listed in S2 Table.

Gene expression analysis
The gene expression data were screened to remove unigenes with low transcript levels in all 24 samples (12 groups). Finally, a total of 22, 412 unigenes were retained and used to construct the heatmap. The heatmap clustering indicated that the gene expression profiles were similar in the two biological repeats in each group, confirming high consistency between the two biological replicates (Fig 1A). At 0 h, the gene expression profiles of K14, KT1 and Tri were similar. After drought stress (6 h, 12 h, 24 h), the expression profiles differed among the three lines, with the greatest differences between K14 and Tri ( Fig 1A). In KT1, the expression profiles of some unigene clusters were similar to those in K14, and the expression profiles of other unigene clusters were similar to those in Tri ( Fig 1A).

Drought-related modules obtained by WGCNA
The 22,412 unigenes filtered from the above FPKM screening were further analyzed by WGCNA. When constructing the modules, the β parameter was adjusted to 10, yielding a total of 30 modules (Figs 2 and 3, Table 3). Since Tri and KT1 are more drought tolerant than K14, we first screened the modules from Tri with correlation coefficients higher than 0.6. Seven modules were obtained after filtering (Table 4). We further screened the seven modules according to the criterion that the correlation coefficient of a module in KT1 had the same positive or negative property as the same module in Tri, but opposite to that of the same module in K14. In addition, the absolute value of correlation coefficients should not be small. On the  basis of these criteria, we chose the cyan and light yellow modules for candidate gene selection ( Table 4). The Hubgenes heatmap clustering (Fig 2C and 2D) and the Eigengene expression patterns (Fig 2E and 2F) showed that the expression levels of genes in the cyan module were relatively high in K14, but relatively low in KT1 and Tri. Thus, these drought-associated candidate genes screened from the cyan module might be involved in the negative regulation of drought tolerance. In the light yellow module, gene expression was relatively low in K14 and relatively high in KT1 and Tri. Therefore, the drought-associated candidate genes screened from the light yellow module may be involved in the positive regulation of drought tolerance. The candidate genes enriched in each module were then ranked according to the level of gene-gene connectivity. The top 20 genes in the cyan and light yellow modules are listed in Tables 5 and 6, respectively. The top 20 genes enriched in the cyan module included those encoding ABA signaling-related (c73612.graph_c0), flowering-related (c71165.graph_c0), disease-related (c60381.graph_c0), and light signal-responsive proteins (c94723.graph_c0, c95174.graph_c0), etc. (Table 5). Among them, c73612.graph_c0 encodes an ABI-like protein involved in reverse regulation of ABA signaling [29-33], and c71165.graph_c0 encodes EMBRYONIC FLOWER 2 (EMF2), which is homologous to the polycomb proteins PRC1 and PRC2. The PRC1 and PRC2 proteins are essential for long-term epigenetic chromatin silencing and for stem cell differentiation and early embryonic development [34,35].

Cloning of candidate genes and evolutionary analysis
Two candidate genes identified in the WGCNA, ABI-like protein (c73612.graph_c0) and Ca 2 + -ATPase (c91322.graph_c0), were further analyzed. The open reading frame (ORF) of these two genes were amplified and their sequences could be found in S2 Appendix. Both the ABIlike protein and Ca 2+ -ATPase belong to gene families and previous studies have demonstrated they were major players in drought tolerance [29-33, 36,37]. The high ranking of these two genes in the gene-gene connectivity analysis also indicated that they might play important roles in drought tolerance. Their evolutionary relationships with homologous genes from different plant species are shown in Fig 4, which showed that ABI-like genes of K14, KT1 and Tri had higher homology with AtABIL1. Moreover, they are more closely related to homologs from dicots rather than to homologs from monocots ( Fig 4A). In addition, Ca 2+ -ATPase genes of K14, KT1 and Tri were more homologous to Ca 2+ -ATPase 12 and 13 from Arabidopsis ( Fig  4B).

High-quality transcriptome data lay a foundation for further research
The lack of a reference genome for sweetpotato seriously restricts research on the molecular mechanisms of this species and its close relatives. Although previous studies have reported transcriptome sequencing of sweetpotato, the numbers of transcripts and unigenes were relatively small due to the small size of sequencing samples. In addition, the splicing length was relatively short [38,39]. In this study, RNA from 24 whole plants was sequenced, with highquality sequence splicing and many more genes obtained than in previous studies (Table 1). Finally, 322,803 transcripts and 105,959 unigenes were assembled (N50 of 1,965 and 1,403, respectively). Of the transcripts and unigenes, 148,263 (83,496 + 64,767) transcripts and 23,083 (13,328 + 9,755) unigenes were longer than 1 kb (Table 1). These sequencing results lay the foundation for further research on the molecular mechanisms of sweetpotato and its close relatives.

Cyan and light yellow modules are most related to drought tolerance
Several studies have demonstrated that WGCNA is a powerful tool to screen transcriptome sequencing and chip detection data for key candidate genes [18, 40,41]. Firstly, WGCNA provides various modules, which are groups of genes possibly related to phenotypic differences. Modules can then be further screened to find candidate genes. Reasonable selection combined with phenotypic trait data can identify the most closely related modules. Our previous work showed that Tri and KT1 (which inherited some genetic material from Tri) are strongly drought tolerant, whereas K14 is not [9,10]. Based on these phenotypic characteristics, we first chose the modules with a high correlation coefficient in Tri. There were seven modules with absolute values of ! 0.6. Secondly, we compared the correlation coefficients of these modules in K14 and KT1 with those in Tri. We expected that the correlation coefficients related to KT1 and Tri would be both positive or negative, and the correlation coefficient related to K14 should be the opposite. In addition, the absolute value of the correlation coefficients should not be small. The cyan and light yellow modules met these requirements (Table 4).

ABI-like protein is an important negative regulator responsible for drought tolerance in KT1 and Tri
The Hubgenes heatmap clustering and the Eigengenes expression patterns showed that, in the cyan module, the overall gene expression level was high in K14 but low in KT1 and Tri (Fig 2C  and 2E). This suggests that negative regulators associated with drought tolerance could be screened from this module. Genes encoding proteins related to ABA signaling (c73612. graph_c0), flowering (c71165.graph_c0), disease (c60381.graph_c0), and light signal responsiveness (c94723.graph_c0, c95174.graph_c0) were enriched in the top 20 candidate genes in this module (Table 5).
Several studies have shown that ABA signaling is related to drought tolerance [42][43][44][45]. In plants, ABA is an endogenous hormone that plays important roles in growth and development, and in the responses to abiotic stresses such as high salinity, drought, and low temperature. Under stress conditions, plants increase ABA content by regulating its biosynthesis and transport. The changes in ABA content induce stomatal closure, promote the accumulation of certain substances, and regulate the expression of stress-related genes [29][30][31][32][33]. The ABA content in plants is also associated with ABA insensitive (ABI) factors. In Arabidopsis, ABI1-5 and other PP2C proteins are accessory proteins that diminish ABA signaling by inhibiting downstream protein kinases. ABA binds to its receptor protein PYR/PYL/RCAR to inhibit PP2Cs protein activity, thereby activating ABA signaling [46][47][48][49][50][51][52]. Evolutionally, ABI-like proteins diverged from ABIs. However, Arabidopsis ABI1-like 1 (ABIL1) appears to mediate similar interactions as ABI1 that assembles DIS3/SCAR2 into a SCAR/WAVE complex [53][54]. Components of SCAR/WAVE complex are revealed to mediate epidermal cell morphogenesis [55], stomatal response [56] and water loss [57]. In the cyan module, c73612.graph_c0, encoding an ABI-like protein, was highly ranked in the gene-gene connection analysis ( Table 5). The c73612.graph_c0 transcript levels were low in Tri and KT1 but high in K14, consistent with the differences in their drought-tolerance phenotypes (S1 Fig). We speculate that the low expression level may contribute to drought tolerance in Tri and KT1 by releasing the inhibition of ABA signaling. Therefore, we suggest that the ABI-like protein acts as an important negative regulator in the drought tolerance of Tri and KT1.

Ca 2+ -ATPase is an important positive regulator of drought tolerance in KT1 and Tri
The Hubgenes heatmap clustering and the Eigengenes expression patterns showed that in the light yellow module, the overall gene expression level was low in K14 but high in KT1 and Tri (Fig 2D and 2F). This result indicated that positive regulators of drought tolerance could be screened from this module. Among the top 20 genes enriched in the light yellow module, CCCH-type zinc finger proteins are associated with mRNA destabilization [58]. PPR is a 35-amino acid motif and some proteins containing PPR locate in organelles such as mitochondria and express in guard cells and seeds [59,60]. Arabidopsis CRR4 containing PPR participates in RNA editing [61]. ROOT PRIMORDIUM DEFECTIVE 1 (RPD1) is a plant-specific gene that is required for the proliferation of active cells. It has been shown to play a role in generating adventitious roots in response to exogenous auxins in Arabidopsis hypocotyls. Arabidopsis rpd1 mutants are susceptible to temperature variations, and high temperature (28˚C) was shown to hinder root primordium development [62].
While Ca 2+ -ATPase is involved in transporting Ca 2+ , which is an important second messenger in plants and an important component of plants' responses to various stresses. The Ca 2+ signaling pathway plays an important role in responses to biotic and abiotic stresses including pathogen and pest attacks, salt, drought, and low or high temperatures [63]. An increase in the cytoplasmic Ca 2+ concentration is an important step in early ABA signaling [48]. Silencing of the Ca 2+ -binding protein SCaBP5 and its interacting protein PKS3 in Arabidopsis rendered seed germination, seedling growth, stomatal closure, and gene expression sensitive to ABA [64]. Ca 2+ -ATPases participate in Ca 2+ signal regulation. For example, the plasma membrane Ca 2+ -ATPase (PMCA) is a cytoplasmic membrane transporter that removes Ca 2+ from cells [65]. Therefore, proteins in the Ca 2+ -ATPase family play an important role in plants' environmental stress responses. Overexpression of GsACA1 in Glycine soja significantly improved its tolerance to carbonic acid and alkaline salt stresses [36,37].
In the light yellow module, the Ca 2+ -ATPase gene c91322.graph_c0 ranked highly (fifth) in the gene-gene connection analysis (Table 6). Its transcript abundance was highest in Tri, followed by KT1, but was very low in K14. In addition, the changes in the transcript abundance of c91322.graph_c0 during drought stress were very similar in Tri and KT1 (significant decrease at 6h, followed by a slow increase; S2 Fig). Based on the above analyses, we speculate that Ca 2+ -ATPase is an important positive regulator of drought tolerance in Tri and KT1.

Conclusions
The drought-stress responses of the sweetpotato somatic hybrid KT1 and its parents K14 and Tri were compared by transcriptome sequencing analyses. Gene transcript levels were analyzed and DEGs were screened. A total of 22,412 unigenes were analyzed by WGCNA to select negatively or positively regulatory modules, and key candidate genes were identified. Further research should focus on functional analyses of these candidate genes during the drought response.