Characterization of the Transcriptome of the Xerophyte Ammopiptanthus mongolicus Leaves under Drought Stress by 454 Pyrosequencing

Background Ammopiptanthus mongolicus (Maxim. Ex Kom.) Cheng f., an endangered ancient legume species, endemic to the Gobi desert in north-western China. As the only evergreen broadleaf shrub in this area, A. mongolicus plays an important role in the region’s ecological-environmental stability. Despite the strong potential of A. mongolicus in providing new insights on drought tolerance, sequence information on the species in public databases remains scarce. To both learn about the role of gene expression in drought stress tolerance in A. mongolicus and to expand genomic resources for the species, transcriptome sequencing of stress-treated A. mongolicus plants was performed. Results Using 454 pyrosequencing technology, 8,480 and 7,474 contigs were generated after de novo assembly of RNA sequences from leaves of untreated and drought-treated plants, respectively. After clustering using TGICL and CAP3 programs, a combined assembly of all reads produced a total of 11,357 putative unique transcripts (PUTs). Functional annotation and classification of the transcripts were conducted by aligning the 11,357 PUTs against the public protein databases and nucleotide database (Nt). Between control and drought-treated plants, 1,620 differentially expressed genes (DEGs) were identified, of which 1,106 were up-regulated and 514 were down-regulated. The differential expression of twenty candidate genes in metabolic pathways and transcription factors families related to stress-response were confirmed by quantitative real-time PCR. Representatives of several large gene families, such as WRKY and P5CS, were identified and verified in A. mongolicus for the first time. Conclusions The additional transcriptome resources, gene expression profiles, functional annotations, and candidate genes provide a more comprehensive understanding of the stress response pathways in xeric-adapted plant species such as A. mongolicus.

Introduction technology [28]. Also, Liu et al. [29] reported 5,282 ESTs from a cDNA library prepared from A. mongolicus plants under cold and drought stresses. Of the 1,594 putatively unique transcripts assembled from the 5,282 ESTs, 528 were specific to stress-response. Recently, Wu et al. [30] reported the transcriptome analysis of A. mongolicus using dehydration treatment on filter paper, identifying 2,028 DEGs in common across three time points (2, 8, 24 h).
Here, we describe the analysis of gene expression patterns in seedlings of the xerophyte A. mongolicus subjected to drought stress by extended water deprivation simulating natural drought conditions. Two cDNA libraries constructed from RNA of leaves from droughttreated seedlings and untreated seedlings were sequenced on a 454 pyrosequencing platform. Insights into the functions of expressed genes were obtained from COG annotations, GO classifications and KEGG metabolic pathway analysis. The putative functions of transcripts from leaves observed in this study represent a different set of genes from the previous reports from transcriptomes of root tissues and seedlings under drought and non-drought conditions. This report focuses on discovery of differentially expressed transcription factor genes and transcripts classified as 'response to stimulus', and their potential roles in regulating important stress-response pathways in A. mongolicus.

Results
Water potential of A. mongolicus subjected to drought stress After four weeks exposure to drought stress conditions, the water potential of control samples (CK) and drought-treated samples (DT) from A. mongolicus plants measured in triplicate using the PSYPRO water potential system were -0.936 ± 0.040 Mpa and -1.618 ± 0.082 Mpa, respectively. 454 pyrosequencing read and de novo assembly metrics RNA sequencing was performed on total RNA isolated from leaves of the CK and DT groups, using the Roche 454 pyrosequencing platform (GS-FLX Ti). A total of 261,419 and 272,339 cleaned (filtered and trimmed) reads were generated from control and drought treated samples, respectively. Most of the sequence reads were distributed between 300 bp and 500 bp, peaking in length between 400 and 450 bp ( Fig 1A). The minimum and maximum read lengths were 40 and 707 bp, respectively, with a median length of 387 bp and a mean length of 358 bp. The cleaned sequence data was deposited in the NCBI Sequence Read Archive (SRA, http://www. ncbi.nlm.nih.gov/sra) under accession number SRP026002. The cleaned reads from the two samples were assembled into PUTs separately, resulting in 8,480 contigs for CK and 7,474 contigs for DT, most of which ranged between 400 bp and 800 bp in length (Fig 1B). The average contig lengths were 526 bp (CK) and 579 bp (DT). When all of the sequences for the two groups were combined using TIGR Gene Indices clustering tools (TGICL) and the Contig Assembly Program (CAP3), 11,357 PUTs were obtained, with an average length of 618 bp ( Table 1). The high quality assembled PUTs were deposited in the NCBI Transcriptome Shotgun Assembly Sequence Database (http://www.ncbi.nlm.nih.gov/genbank/tsa) under accession number SUB932387.

Assessment of sequence read distribution
In general, similar distributions of depth (number of reads per transcript contig) and coverage (number of genes identified amongst the reads) were observed when the previously published root RNA sequence reads (Zhou et al. [28]) and the leaf RNA sequence reads obtained in this study were mapped to the 35,982 Glycine max reference unigenes (Figs 2 and 3). However, some interesting contrasts emerged.
A generally univariate distribution of depth was observed overall for the mapping of root and leaf mRNA sequence reads to the G. max unigenes (Fig 2A). However, in spite of the highly similar overall distribution of depth of reads mapped to the G. max unigenes, there was little similarity in the number of root and leaf mRNA reads which mapped to specific G. max unigenes (each dot represents in Fig 2B represents a different unigene). We observed that of the 6,190 total G. max unigenes to which the average depth of reads mapped was greater than 0.5x, only approximately 12.5% (771) of the unigenes were represented in common among the reads from both tissues (Fig 2C). The extent of gene coverage in the two EST data sets was also examined. The sequence reads from leaf tissue in this study mapped to 38.13% (13,720) of the G. max unigenes, while root tissue reads (from Zhou et al. [28]) mapped to 41.53% (14,943) of the G. max unigenes. In terms of the total number of A. mongolicus RNA sequence reads that mapped to the G. max unigenes, leaf-derived reads and root-derived were equally represented, at 48.4% and 49%, respectively. In contrast, 96.7% of the leaf-derived reads mapped to the A. mongolicus PUTs assembled in this study, while only 28.6% of root-derived reads mapped to the A. mongolicus PUTs.
Drastic differences were noted between the fraction of each reference unigene that the sequence reads from leaf and root tissues covered by mapping (Fig 3B). The fraction of unigene sequence coverage was simply calculated as the length of the combined consensus sequence of all reads mapped to a specific unigene, divided by the length of the respective unigene. Four distinct clusters of sequences emerged when the coverage by root vs. leaf sequence reads were compared for all cases in which at least 10% of the unigenes length was covered ( Fig 3B). The unigenes clearly fell into all four possible clusters of root vs. leaf read coverage: unigenes for which the length of coverage was high by both root and leaf reads, unigenes for which the length of coverage was high by root reads but low by leaf reads, unigenes for which the length of coverage was low by root reads but high by leaf reads, and cases in which unigene length of coverage was low for both root and leaf reads. The numbers of unigenes in each of the clusters differed greatly however. The factor most affecting the difference in total counts of unigenes in each of the four classes was the disproportionate number of unigenes for which the length of coverage was high in leaf tissue mRNA reads, in contrast to the more uniform distribution of coverage of unigenes with reads from root tissue. The total numbers of unigenes covered at a minimum of 10% of their length by root vs. leaf reads is shown in Fig 3C. Of the 35,982 soybean reference unigenes, 12,817 (35.6%) had coverage length greater than 10% by either root and/or leaf reads. Of the 12,817 mapped unigenes, 2,753 (20.7%) were covered to 10% or greater length by mRNA from both root and from leaf tissues, while 3,865 (22.4%) were covered 10% length only by mRNA root reads, and a disproportionately large 7,379 (57.6%) of the unigenes were only mapped to 10% length with leaf reads.

Functional annotation and classification of PUTs
The A. mongolicus PUTs were annotated by alignment to gene and protein sequences in the public databases, as described in Methods. Homologous sequences for 9,375 (82.55%) of the PUTs were identified in at least one of the protein databases, and 2,546 (22.41%) of the PUTs  4A). For the 1,291 PUTs that did not return any sequence matches, these may represent either non protein-coding sequences or genes encoding proteins with A. mongolicus-specific sequences (novel proteins). In addition, some of the A. mongolicus PUTs may have been too short to produce strong matches to sequences in the databases; given that 1,137 of the 1,291 un-matched PUTs (88.07%) were less than 500 bp in length.
To identify possible functions of the PUTs during cellular metabolic processes, we performed a KEGG pathway analysis. The analysis resulted in 5,257 PUTs mapping onto 127 KEGG pathways. Among those mapped, 5,194 (98.80%) PUTs were related to metabolism, 1,182 (22.48%) PUTs corresponded to genetic information processing, 283 (5.38%) PUTs mapped to environmental information processing, 305 (5.80%) PUTs were classified as cellular processes and 231 (4.39%) PUTs belonged to organism systems.
The Nr annotations showed that a total of 85.79% of the PUTs aligned with best BLAST scores to the three legume species with genome sequences-G. max (49.41%), C. arietinum (20.32%) and Medicago truncatula (16.06%) (Fig 5A). This conforms with the taxonomic classification of A. mongolicus in the Leguminosae. In addition, nearly one half of the PUTs had sequence alignment lengths to Nr database entries of over 600 bp (Fig 5B).

Analysis of differentially expressed genes (DEGs)
To identify differentially expressed genes in response to drought stress, and to compare expression levels, the RPKM method (Reads Per Kilobase per Million reads; [31]) was used. The Gene Expression software package in the CLCbio Genomics Workbench bioinformatics platform identified 1,620 PUTs as reliable DEGs (using a cutoff greater than 2 fold change and less than 0.01 of Kal's z-test FDR p-value). Among the DEGs, 1,106 were significantly up-regulated in expression, while 514 showed down-regulation in the drought-stress treated samples (Fig 6). It is also noteworthy that 114 of the up-regulated genes and 202 down-regulated genes followed the same expression trends previously reported for cold-treated A. mongolicus plants [11]. The Nr annotations for these up-regulated genes included 'CBL-interacting protein kinase 2-like', 'early light-induced protein', 'mitogen-activated protein kinase 19-like', 'WRKY transcription factor 23-like', and 'delta-1-pyrroline-5-carboxylate synthase-like isoform'. Annotations for the down-regulated genes included 'protein early responsive to dehydration', 'mitogen-activated protein kinase 1', 'oxygen-evolving enhancer protein 1', 'iron-superoxide dismutase', and 'auxin-binding protein ABP19a-like precursor' (S1 Table). A total of 1065 (65.74%) DEGs were successfully assigned to entries in the MIPS (Munich Information Center for Protein Sequence) Functional Catalogue database, of which 408 DEGs with strong functional annotations were classified into 19 main functional categories (S2 Fig).
GO annotations were used to classify the DEGs as to the biological processes, cellular components, and molecular functions represented (Fig 7). For biological processes, the five largest categories of up-regulated and down-regulated DEGs (respectively in parentheses) were: 'cellu-  Because this study was oriented towards understanding drought stress response in A. mongolicus, the genes identified as 'response to stimulus' in the third GO term levels were examined in more detail (Fig 7, small box). Among the 176 up-regulated genes and 96 down-regulated genes in the 'response to stimulus' category, most of the DEGs fell into the 'response to abiotic stimulus' category (92:53), followed by 'response to temperature stimulus' (43:35), 'response to cold' (31:32), 'response to osmotic stress' (25:9), 'response to salt stress' (18:7), 'response to heat' (17:4), and 'response to water deprivation' (16:4), respectively.

Transcription factor analysis
Transcription factors are widely involved in various biological processes, and play important roles in regulating genes expression in response to abiotic stresses. The domains of plant transcription factors were identified amongst the A. mongolicus PUTs, and classified according to gene families. A total of 144 PUTs were identified as encoding putative transcription factors in this study (Fig 8A). The largest transcription factor gene family observed was the HD-ZIP family (9.03%), followed by the C3H (8.33%), bHLH (7.64%), ERF (7.64%) and AP2 (7.64%) families. These five families accounted for a total of 40.28% of PUTs which were transcription factors. To demonstrate the dimensions of differential gene expression levels of transcription factors, a heat map of differential expression of transcription factors was constructed, for 23 transcription factors that were highly expressed under drought conditions, and 11 transcription factors that were down-regulated ( Fig 8B). For TFs of interest to stress response, genes in the MYB, CO-like, C3H, NF-YA, Trilhelix, BES1, WRKY, DBB, and TCP families were specifically induced under drought conditions, whereas two TF groups, BBR-BPC and HB-other, were down-regulated.

Quantitative real-time PCR
The transcriptome analysis results were validated by quantitative real-time PCR. A total of twenty candidate DEGs with specific annotations were selected, including genes encoding proteins of interest such as dehydrin, mitogen-activated protein kinase (MAPK), delta-1-pyrroline-5-carboxylate synthetase (P5CS), serine/threonine-protein kinase, temperature-induced lipocalin, RD22-likedehydration-responsive protein, salt-tolerance protein, transcription factor bZIP131, sodium/hydrogen exchanger, and auxin-binding protein ABP19a-like precursor. The qRT-PCR results were highly consistent with the expression profiles obtained by in silico digital expression analysis for these candidate genes ( Table 2).

Discussion
A. mongolicus is an ancient evergreen broadleaf shrub endemic to the semi-arid region of North-western China. A. mongolicus is able to survive in deserts where the average annual rainfall can be as low as 140 mm [32]. A. mongolicus' drought adaptabilities include a strong root system to ensure absorption of any available water, and leaf structure with trichomes and sunken stomata that are believed to constrain transpiration. A. mongolicus should thus be an ideal species for the discovery of genes and physiological adaptations related to drought tolerance. Using the Roche-454 next generation sequencing technology, we successfully identified 11,357 unique transcripts in A. mongolicus, most of which were annotated by homologies to genes in other legumes, such as G. max, C. arietinum, and M. truncatula. Moreover, 1,620 (14.26%) transcripts showed significant expression differences between control and droughtstressed plants, with 1,106 being up-regulated and 514 being down-regulated. The DEG results were validated by qRT-PCR assay of the expression levels of 20 drought-responsive genes, selected randomly from the numerous DEGs. The DEGs were classified as being involved in drought-stress response and signal transduction.

Sequencing strategy and data assessment
Zhou et al. [28] reported the analysis of the root transcriptome in A. mongolicus seedlings treated by irrigation with 20% PEG-6000 for 72 hours to induce water-stress. In contrast, we used a drought stress treatment as similar as possible to conditions experienced in nature, i.e. a long-term treatment (4 weeks) of plants in soil stressed to specific water potential values. Short vs. long term desiccation durations can produce different results in physiological and adaptive responses. Generally, plants respond to rapid dehydration by minimizing water loss or altering expression of certain genes to prevent dehydration damage. Under slowly developing water deficits, plants can survive against dehydration by optimizing their long term acclimation responses, including changes to shoot physiology and resource allocation [33]. As soils become dry, signals from the roots are transported via the xylem to leaves, resulting in reduced water loss and decreased leaf growth, followed by changes in abscisic acid (ABA), pH, cytokinins, ethylene precursors and other factors related to stress signaling processes [34][35]. Thus, while it is appropriate to study responses in root tissues under short-term water-stress conditions, above ground tissues are valuable in research on longer-term responses to drought stress. The treatment length was chosen from results of preliminary experiments in which physiological status and expression of candidate drought-response genes were assessed by qRT-PCR at 2, 3 and 4 weeks intervals. The 4 week period produced most consistent results, without affecting plant growth or phenotype (S3 Fig). Zhou et al. [28] assembled 29,056 unique transcripts (including both 15,713 transcript contigs and 13,883 unique unassembled singlet reads) from root RNA sequences after short term osmotic-stress, from which they selected 27 candidate genes from specific GO categories such as 'response to osmotic stress', 'response to oxidative stress', 'response to hormone stimulus', and 'response to light stimulus'. We considered it important to compare expression of a broader set of functional categories in the shoot and root transcriptomes. Thus we selected candidate genes from several general response to stimuli categories including 'response to osmotic stress', 'response to salt stress', 'response to temperature stimulus', 'response to heat', 'response to cold', 'response to water deprivation' and 'detection of abiotic stimulus'. In addition, we studied transcription factors to identify potential regulators of drought response. Only two of the 27 candidate genes identified by Zhou et al. (sdq_isotig00259 and sdq_isotig02931) overlapped with the 1,106 up-regulated and 514 down-regulated DEGs identified in this study.
Our PUT set of 11,357 contigs is similar in number to the 15,713 contigs assembled by Zhou et al. [28], given that singlets were not included in our PUT totals. We did not include singlets to avoid over-representation of genes represented as multiple small fragments among the contigs, which can bias differential gene expression results. Also individual reads can be more difficult to assign to specific members of gene families than contigs, complicating transcriptome analysis.
More PUTs may have been expected from the transcriptome assembly, however, given the large number of high quality sequence reads obtained (over 190M) and the fact that many genes are normally expressed in leaves. Strong abiotic stresses, such as drought, may transiently suppress transcription, however, as well as result in degradation of the existing transcriptome. This could result in a smaller, less complex transcriptome, if de novo transcription of stressresponse genes are then predominant, at least transiently. Perhaps differences in complexity of transcriptomes among abiotic stress studies may relate in part to stress levels, for example if shorter, less intense stress treatments have less effect on suppression of gene expression. Furthermore, at the time that the stress treatment commenced, the leaves that we sampled were already fully developed, at which time only house keeping genes may have been active. These factors are difficult to assess with our A. mongolicus data however, without a genome or complete gene set for A. mongolicus to serve as a reference.
The sequencing was conducted in this study with RNA isolated from leaves pooled from among the replicate plants within each treatment. Pooling of replicates was chosen to maximize the depth of sequencing possible within each treatment, and to capture general drought responses in A. mongolicus. However, the approach did not capture variation that may have existed among the A. mongolicus seedling genotypes, nor the response of other above ground tissues in the seedlings. Our analyses from this study can only infer general differences in gene expression levels in leaves of control vs. treated plants. Thus further investigation into geneticand tissue-based differences in gene expression under stress conditions is warranted for future studies.

Comparative genomics analysis
The MIPS functional catalogue assigned 25 DEGs as being involved in 'transcription', of which 18 were strongly up-regulated. Also, 26 DEGs were identified to be related to 'signal transduction', of which 18 were up-regulated and 8 down-regulated. Several of these DEGs have homologues in Arabidopsis, such as AtCIPK9 (CL736.contig1), AtCIPK11 (Contig7006_DT) and AtCIPK12 (CL1014.contig1), which are known to be stress related regulators of downstream target genes [36,37]. We also noticed that 584 DEGs were annotated as 'Classification not yet clear-cut' or 'Unclassified protein' when aligning to Arabidopsis database. However most of these genes have homologues in other species, including G. max, M. truncatula and C. arietinum. It is noteworthy that three significantly up regulated DEGs were annotated as genes associated with asparagine synthesis-AtASN1 (CL828.Contig1 and CL759.Contig1) and AtASN3 (Contig6402_DT). As is known, osmotic stresses lead to Asn accumulation and/or Asn-synthetase induction [38]. ASN is a small gene family encoding the synthetase of asparagine, which was reported to be induced during periods of carbon starvation in maize and rice [39]. Also, TaASN1 from wheat was reported to be up-regulated by osmotic stress and ABA treatment [40]. The enhancement of asparagine synthesis and accumulation suggest the nitrogen metabolism and transport may be induced to ameliorate the deleterious effects of longer-term drought stress in A. mongolicus. On the other hand, some genes not previously reported only in association with ABA-or salt-tolerance [41,42] but not in drought stress were identified in A. mongolicus, such as FAD6 (Fatty acid desaturase-6) and CPN20 (chaperonin 20). Further studies with these genes ae required to determine if they represent drought-specific mechanisms however.

A. mongolicus response to drought stress
Plants respond to drought stress in various physiological and morphological ways, which includes the accumulation of proline, expression of Aquaporins (AQPs), and biochemical alterations in the cell wall [43]. Among the DEGs identified in this study, 145 were classified in the GO category of 'response to abiotic stimulus' (Fig 7), including genes which encode enzymes for the synthesis of osmoregulatory substances, such as proline which is known to increase in response to osmotic stress [44]. It has been previously reported that delta 1-pyroline-5-carboxylate synthetase (P5CS) for proline synthesis is upregulated by drought stress [45]. Similarly, homologs of genes encoding P5CS (CL47.Contig1, CL449.Contig1) were observed to be highly expressed in A. mongolicus under drought stress in this study. In addition, genes encoding membrane transport proteins such as AQPs and plasma membrane ATPase (CL73.Contig1) were also significantly induced in drought-treated samples. Five DEGs related to AQPs were identified, among which three (Contig762_DT, Contig3396_DT, CL677.Contig1) showed significant up-regulation and two (CL679.Contig1, Contig3627_CK) were down-regulated between control and drought conditions. Aquaporins belong to a large protein family, with members regulated differentially during drought stress, which is consistent with important roles in water homeostasis [46]. Furthermore, numerous DEGs with high expression levels were observed which may be related to specific structural adaptations that enable A. mongolicus to survive severe drought in deserts, such as cell wall-associated hydrolases (CL1.Contig77, CL1.Contig411, CL1.Contig510, et al.). Cell wall-associated hydrolase genes were reported to function in altering cell wall extensibility and other cell wall modifications, such as the accumulation of compatible extra-cellular solutes [47,48].

Photosynthesis
As a complex metabolic process, photosynthesis is well studied and known to be sensitive to drought stress [49]. The result of our metabolic pathway analysis showed 29 DEGs involved in photosynthesis, including 13 subunits (PsaB, PsaD, PsaE, PsaF, PsaG, PsaH, PsaJ, PsaK, PsaL, PsaN, PsaO) in the photosystem I (PSI) complex. Most of the differentially expressed Psa subunit genes were down-regulated in response to drought stress, with the exception of PsaB, PsaD and PsaJ. There was also one DEG (Contig1238_DT) related to 'Ribulose bisphosphate carboxylase small chain', which was found to be up-regulated under drought conditions. Even though Rubisco expression has been previously observed to be down-regulated under osmotic stress in many plant species [50,51], studies with sugarcane and cucumber [52,53] reports upregulation of Rubisco, similar to our finding with A. mongolicus. Furthermore, change in expression of Rubisco protein has been identified in the halophyte Suaeda aegyptiaca under salt stress [54]. Overall, these results suggest that drought stress response in A. mongolicus may have similar mechanisms as salt stress response, in keeping with previous observations on similarities among responses to abiotic stresses in plants (1,12,35,53,63).
In addition, 10 DEGs in A. mongolicus identified in this study were associated with the KEGG term of 'Photosynthesis-antenna proteins', including genes encoding the Light-harvesting complexes (LHCs). It is noteworthy that all ten A. mongolicus DEGs related to LHCs were down-regulated under drought stress in this study. It has been reported that some members of the CAB superfamily (Lhca and Lhcb genes) are down-regulated by high light stress, and that overall about 70% of genes observed to be down-regulated in high-light were also down-regulated by drought [55]. Studies in A. thaliana indicated that photosynthetic response to drought stress is highly complex, involving alterations of a multitude of genes, resulting in the improvement of stress tolerance [33]. Therefore, A. mongolicus may have specific photosynthesis responses to drought related to being a xerophyte shrub with thick evergreen leaves.

Protein phosphorylation and dephosphorylation
The complex process of signal transduction is an important component in the response to drought stress. Protein phosphorylation and dephosphorylation play an important role in signal transduction cascades, involving protein kinases and phosphatases [56]. In this study, a total of four DEGs (two up-regulated and two down-regulated) were found to be involved in mitogen activated protein kinases (MAPK) cascades. Evidence from animal and yeast studies suggests that the MAPK superfamily performs critical roles in adaptive responses to osmotic stress [57]. Kiegerl et al. reported the isolation and characterization of alfalfa MAPK kinase, and that it was activated by salt stress [58]. Recently, a series of tobacco MAPK genes have been identified including six which were shown by qRT-PCR analysis to be induced by drought treatment [59]. One DEG (CL1037.Contig1, 7.755 fold) observed in this study encoded a serine/threonine-protein kinase TOR-like isoform 1.TOR kinase expression can be regulated by both sugar abundance and starvation, connected to many energy-consuming cellular outputs [60]. Therefore, TOR is believed to play important roles in plant developmental and metabolic processes [61]. Sucrose non-fermenting1-related protein kinase 2 (SnRK2) is known to be related to plant sugar signaling via phosphorylation [62]. Wheat SnRK2 members, like TaSnRK2.4, TaSnRK2.7 and TaSnRK2.8 were identified as being involved in abiotic stress responses [63]. Arabidopsis transformed with TaSnRK2.4 was reported to have significant improvement in drought, high salinity and cold stress resistances, as supported by physiological experiment data [64]. In this study, two DEGs (Contig2630_DT and Con-tig5982_DT) associated with SnRK2s were found to be significantly up regulated in A. mongolicus.

ABA signaling pathway
Considering the crucial role that ABA plays in abiotic stress responses, another focus of this study was transcription factors and genes related to the ABA signaling pathway. It has been reported that dehydration activates both ABA-dependent and ABA-independent stress signaling pathways [65]. In the ABA-dependent signaling pathway, the bZIP domain in basic leucine zipper family transcription factors has been shown to interact with a conserved ABA-responsive, cis-acting element known as ABRE (ABA-responsive element) in promoter regions [66]. In this study, two DEGs in A. mongolicus (Contig3471_DT and CL511.Contig1) homologous to G. max bZIP protein encoding genes were identified. Interestingly, the Contig3471_DT gene was observed to be up-regulated, while the CL511.Contig1 gene was down-regulated in response to drought. It is also known that transcription factors such as MYB and MYC may be activated, and interact with certain cis-elements, thus inducing the expression of drought responsive genes [67]. We observed two DEGs related to MYB transcription factors that were induced by drought stress. However, no MYC-related DEGs were identified in A. mongolicus in this study. In the alternate, ABA-independent signaling pathway, gene expression under drought stress is not related to ABA levels [68]. The dehydration-responsive element (DRE) is a 9-bp conserved sequence, TACCGACAT, that has been observed in the promoter region of drought-responsive genes [69]. Commonly, the transcription factors of DRE-binding proteins contain the AP2/EREBP domain, such as DREB1/CBF and DREB2 (DRE binding protein) [70,71]. In this study, a number of A. mongolicus PUTs were observed that match the AP2/DREBP domain, 11 of which showed significant differential expression under drought. The soybean GmDREB2 protein has been reported to promote the expression of downstream genes to enhance drought and high salinity tolerance in transgenic Arabidopsis [72]. The WRKY gene (Contig3894_DT) identified in our study, which was highly expressed under drought stress, had not been previously reported in A. mongolicus. Thus further functional characterization of this gene is warranted, similar to studies of overexpression of soybean GmWRKY genes in Arabidopsis in which plants were reported to be more tolerant to cold (GmWRKY21), salt (GmWRKY54 and GmWRKY21) or drought (GmWRKY54) stress than wild-type plants [73,74]. WRKY TFs were suggested to play a key role in ABA and drought-responsive signalling networks [75]. Even though WRKY23 was originally identified as a plant-parasitic nematodeinducible gene, current studies also indicated that PtWRKY23 is involved in redox homeostasis and cell wall-related metabolism [76]. The function of the WRKY23 gene in A. mongolicus under drought treatment remains to be determined through further research, including transgenic studies in Arabidopsis or a leguminous model such as M. truncatula.

Conclusion
In this report, we provided a comprehensive transcriptome analysis of A. mongolicus under drought stress, incorporating data and findings from previous reports as well as new experimental findings. All of the putatively unique transcripts were annotated, and drought-responsive DEGs were classified as to functional categories and metabolic pathways. Many candidate genes potentially involved in drought stress in A. mongolicus were identified.
The one month water deprivation regime applied in this study generated a relatively small decrease in the leaf water potential in the drought-tolerant A. mongolicus plants indicating that the plants did not experience the degree of physiological stress that might be expected in drought-sensitive plants under the same conditions. To create greater physiological stress levels, a longer and more severe water-deprivation regime would have been required, which may have elicited a stronger gene expression response in the A. mongolicus plants as well. However, in this study we were interested in learning how A. mongolicus plants tolerate water deprivation conditions that would damage most other species, and if there is a gene expression component associated with that, rather than determining what levels of water stress would produce the same physiological and gene expression responses in A. mongolicus as in drought-sensitive species exhibit under lower stress levels. All species, even desert plants, will be damaged or perish at some point after prolonged drought conditions. However response to extreme, damaging stress conditions may not reflect inherent tolerance mechanisms in desert plants. The fact that A. mongolicus plants in this study did demonstrate many of the typical changes in gene expression in response to water stress conditions, relative to controls, documented that the treated A. mongolicus plants did perceive the stress conditions, even though the plants were not yet showing many signs of physiological stress. Perhaps, in part, A. mongolicus adaptation to xeric conditions involves a more rapid gene expression response, or the ability to respond to lower levels of physiological stress, than in more drought-sensitive species. It would be interesting in future research to investigate if the upstream transcription factor DEGs which we uncovered in this study play a role in the priming of expression of defence-response gene expression pathway genes in A. mongolicus. Drought-tolerance mechanisms in A. mongolicus may also include structural and constitutive adaptations of equal or greater importance than temporal gene expression changes, which would also be worthy of additional study.
All data in this study have been publically archived for future research. We hope that this report will lead to more research on the functional identification of drought-responsive genes and mechanisms in A. mongolicus, to enrich our knowledge of the basis of stress tolerance in desert-adapted plants and facilitate the improvement of drought resistance in other plants.

Plant material
This study was approved by the National Engineering Laboratory for Tree Breeding. Seeds of A. mongolicus were identified and collected by Dr. Chao Liu (Beijing Forestry University) from the Sand Control Station of Wuhai (39.62°N, 106.76°E), in Inner Mongolia Autonomous Region, China. After sterilization using 70% ethanol for 30s and 10% sodium hypochlorite for 20min, the A. mongolicus seeds were planted in MS (Murashige-Skoog) medium. After two weeks of growth, the seedlings were transplanted to plastic pots (10 cm ×10 cm × 10 cm) containing peat/sand/vermiculite (7:2:1), in the nursery of Beijing Forestry University (BJFU) (40.0°N, 116.6°E). The transplanted seedlings were separated into two groups: control group (CK) and drought-treated group (DT). Controls were watered to field capacity every day. For the DT group, the water stress treatment was implemented by stopping watering on the first day and withholding water for the next two to four weeks, which simulated a natural drought process (S3 Fig). We performed the qRT-PCR with four drought related genes identified by our previous study [27] to show the best regimen for drought treatment. The leaf water potential (LWP) of the A. mongolicus seedlings was measured with the PSYPRO water potential system (WESCOR, INC, USA). Values of LWP were measured as MPa. Three biological replications were performed. Photographs were taken to document general plant condition, which did not decline noticeably, at 2, 3, or 4 weeks treatment (S3 Fig). Finally, leaves of both groups were collected at the same time point for RNA extraction.

RNA isolation
Leaves collected from each group of seedlings were pooled to form the control (CK) sample and four weeks drought (DT) sample for RNA isolation. Total RNA was isolated using a CTAB procedure [77]. Tissues of CK and DT samples were separately ground in liquid nitrogen and the powder dispersed in CTAB buffer. After two extractions using chloroform-isoamyl alcohol mix (v/v 24:1), the total RNA was precipitated with LiCl 2 . To ensure the quality and concentration, the precipitate was suspended in SSTE buffer (0.5%SDS, 10mM Tris-HCl, 1mM EDTA and 1M NaCl), and extracted twice again with chloroform [20]. Finally, the RNA was precipitated with 3 M sodium acetate and dissolved in 100 μl diethypyrocarbonate (DEPC)-treated water. For each sample, we conducted three extractions of total RNA. The total RNA yield and purity were measured using a NanoDrop 2000 (Thermo Fisher Scientific). The A260/A280 ratios for both samples ranged from 1.9 to 2.1. The quality and concentration of all RNA samples was examined with an Agilent 2100 Bioanalyzer (Agilent Technologies) which showed no sign of degradation. Finally, mRNA was purified from the total RNA using the Oligotex-dT30<Super> mRNA Purification Kit (Takara, China).

cDNA preparation and 454 pyrosequencing
First-strand cDNA synthesis was performed with 5 ug of mRNA for each sample, using the Just cDNA Double-Stranded cDNA Synthesis Kit (Agilent Technologies, USA, catalog #200453). Double stranded cDNA was synthesized following the manufacturer's recommendations. The cDNA was then dissolved in sterile distilled water, and the concentration and quality was determined using an Agilent 2100 Bioanalyzer. The cDNA libraries for sequencing were constructed according to the manufacturer's manual (Roche, 454), and one half plate of FLX sequencing was performed for each of the cDNA libraries. Both the cDNA library construction and 454 pyrosequencing were conducted in the Genomics Center at the Pennsylvania State University.

Sequence read processing and de novo assembly
The 454 raw sequencing reads were processed before assembly. Roche library adaptors, ribosomal RNA, low quality scored bases, and contaminants that may negatively affect clustering and assembly were removed. The high quality, filtered reads from each sample were then assembled using SeqManNgen sequence assembler v2.1 (DNASTAR, Inc.) software in cDNA mode with default parameters. De novo assembly was necessary due to lack of genome information on A. mongolicus. Finally, the contigs from the two libraries were combined to form a single set of non-redundant transcripts. The contigs were clustered based on pairwise sequence similarity, using TGICL and CAP3 programs [78]. This provided a set of non-redundant consensus sequences subsequently referred to Putative Unique Transcripts (PUTs), used for annotations and gene expression analyses. In addition, previously published RNA sequence data from A. mongolicus drought-treated and control root tissues was downloaded from the SRA database at NCBI (Zhou et al., accession number SRX142053) [28]) totalling over 671,000 reads. The reads were filtered to remove low quality score bases and adapter remnants.

Sequence read mapping
The cleaned sequence reads were mapped to a set of 35,982 G. max unigenes retrieved from the NCBI UniGene repository (ftp://ftp.ncbi.nih.gov/repository/UniGene/). Both the droughttreated and control root tissue sequence reads downloaded from NCBI (Zhou et al. [28]) and the drought-treated and control leaf tissue sequence reads from this study were mapped against the G. max unigenes. Read mapping was performed using CLCbio Genomics Workbench 7.5 with the affine gap scores and unrestrictive local alignment parameters set at the same value for the mapping of the root and leaf reads. Similarly, reads from root and leaf data sets were mapped against the A. mongolicus PUTs generated in this study. The resulting distributions of read depth and unigene coverage from mapping were plotted as R.Venn diagrams using an R script with the following settings: leafDepth<-subset(GmaxMerge, Average.coverage.y> 0.5); rootDepth<-subset(GmaxMerge, Average.coverage.x> 0.5); depth.venn<-venn.diagram(list (Leaf = leafDepth$shortName, Root = rootDepth$shortName), fontface = "bold", fill = "grey", alpha = 0.5, scaled = T, "DepthVenn.tiff"); leafCover<-subset(GmaxMerge, coverage.y> 0.1); rootCover<-subset(GmaxMerge, coverage.x> 0.1); cover.venn<-venn.diagram(list (Leaf = leafCover$shortName, Root = rootCover$shortName), fontface = "bold", fill = "grey", alpha = 0.5, scaled = T, "CoverageVenn.tiff"). For Figs 2C and 3C, read depth was calculated as the value of total reads divided mapped by the number of reads in the sample. Read coverage was calculated as the total basepairs of a unigene contig covered by at least one read divided by the length of that contig.

Functional annotation and classification
The 11,357 PUTs from A. mongolicus leaf sequences were annotated by alignment to the following protein databases: [Nr (NCBI non-redundant protein database), Swiss-Prot, KEGG (Kyoto Encyclopaedia of Genes and Genomes database), and COG (Clusters of Orthologous Groups of proteins)] using the BLASTX method (E-value < 1E-5). The PUTs were also annotated by alignment to the nucleotide sequence database Nt (NCBI non-redundant nucleotide sequence database) by BLASTN (E-value < 1E-5). The annotations with the highest sequence similarity (BLAST scores) were chosen for functional categorization of the PUTs. To gain a better understanding of the distribution of gene functions at the macro-level represented in the set of PUTs, we performed GO annotation using the Blast2GO program [79], and GO classifications using the WEGO software [80], using the best Nr-based annotations. From the KEGGbased alignments, pathway annotations were constructed, to provide a better understanding of the metabolic pathways involved in drought stress-response in A. mongolicus [81]. Also, the protein-coding regions of the PUTs were predicted and translated into amino acid sequences, based on the best BLAST alignments. For those PUTs that could not be aligned to any of the above databases, we used ESTScan [82] to predict possible coding regions and direction of translation.

Analysis of differentially expressed genes (DEGs)
Differentially expressed drought-responsive genes were identified and quantified using CLCbio Genomics Workbench software. The RPKM (reads per kb per million reads) method was used to calculate the fold-change of the expression levels of each PUT between control and treated samples [31]. For statistical analysis, Kal's Z-test was used to conduct pair-wise comparisons of gene expression proportions of specific tags in the sequence reads between individual samples. PUTs with the absolute log-ratio value of 2 and with p-value cutoff of less than 0.01 of FDR p-value correction were selected as reliable DEGs. DEGs were assigned to Arabidopsis gene models (TAIR ver. 10, http://www.arabidopsis.org/), using BLASTx (cutoff E-10). The gene ids were then entered into the "FunCat" Functional Catalogue Database at the Munich Information Center for Protein Sequence (MIPS) classification system (http://www.helmholtzmuenchen.de/en/ibis/resourcesservices/services/funcat-the-functional-catalogue/index.html).

Metabolic pathways and transcription factors analysis
DEGs were assigned to the primary cellular biochemical pathways and signal transduction pathways by comparison of the BLAST results to the KEGG database. The predicted protein sequences were also aligned to the Plant Transcription Factor Database (http://plntfdb.bio.unipotsdam.de/v3.0/) by BLASTx, and the PUTs classified according to TF gene families. Candidate genes associated with drought stress were selected from among the KEGG and TF family assignments for validation.

Quantitative real-time PCR analysis
The expression levels of the selected candidate DEGs were confirmed by qRT-PCR analysis. The cDNA for the CK and DT samples was produced from approximately 1 μg of total RNA from each group using M-MLV Reverse Transcriptase (TIANGEN, China). The PCR reaction was performed on a STEP ONE PLUS Real-Time PCR System (Applied Biosystems, USA) following the manufacturer's instructions. The 20 μl reactions were composed of 10 μl 2×RealM-asterMix (TIANGEN, China), 1 μl forward primer, 1 μl reverse primers, 1μlcDNA template (diluted 100-fold with deionized water), and 7 μl sterile ddH 2 O. The PCR conditions were as follows: 94°C for 2 min, followed by 45 cycles of 94°C for 20 s, 60°C for 35 s and 68°C for 1 min. For all samples, three independent biological replicates were performed. Expression levels of the selected DEGs were calculated using the 2 -ΔΔCt method [83], with the 18S rRNA from A. mongolicus serving as the internal reference gene. The PCR primer information for this study is available in S2 Table. Supporting Information