Insights into soybean transcriptome reconfiguration under hypoxic stress: Functional, regulatory, structural, and compositional characterization

Soybean (Glycine max) is one of the major crops worldwide and flooding stress affects the production and expansion of cultivated areas. Oxygen is essential for mitochondrial aerobic respiration to supply the energy demand of plant cells. Because oxygen diffusion in water is 10,000 times lower than in air, partial (hypoxic) or total (anoxic) oxygen deficiency is important component of flooding. Even when oxygen is externally available, oxygen deficiency frequently occurs in bulky, dense or metabolically active tissues such as phloem, meristems, seeds, and fruits. In this study, we analyzed conserved and divergent root transcriptional responses between flood-tolerant Embrapa 45 and flood-sensitive BR 4 soybean cultivars under hypoxic stress conditions with RNA-seq. To understand how soybean genes evolve and respond to hypoxia, stable and differentially expressed genes were characterized structurally and compositionally comparing its mechanistic relationship. Between cultivars, Embrapa 45 showed less up- and more down-regulated genes, and stronger induction of phosphoglucomutase (Glyma05g34790), unknown protein related to N-terminal protein myristoylation (Glyma06g03430), protein suppressor of phyA-105 (Glyma06g37080), and fibrillin (Glyma10g32620). RNA-seq and qRT-PCR analysis of non-symbiotic hemoglobin (Glyma11g12980) indicated divergence in gene structure between cultivars. Transcriptional changes for genes in amino acids and derivative metabolic process suggest involvement of amino acids metabolism in tRNA modifications, translation accuracy/efficiency, and endoplasmic reticulum stress in both cultivars under hypoxia. Gene groups differed in promoter TATA box, ABREs (ABA-responsive elements), and CRT/DREs (C-repeat/dehydration-responsive elements) frequency. Gene groups also differed in structure, composition, and codon usage, indicating biological significances. Additional data suggests that cis-acting ABRE elements can mediate gene expression independent of ABA in soybean roots under hypoxia.

Introduction experiment was carried out in randomized blocks, with four replicates (75 plants per plot, at a density of 200,000 plants/ha). The seed emergence occurred six days after sowing. The first waterlogging occurred 18 days after sowing due to heavy rain, lasting five days. We observed mild symptoms of yellowing leaves. On February 15th, 2012, we waterlogged the soil beds, maintaining water level 2 cm above the soil surface for 10 days. The plants developed typical reaction to waterlogging. Harvest was done on May 23th, 2012. All seeds were collected and corrected to 13% moisture content. Data met assumptions of the analysis of variance (ANOVA). Thus, means were compared by the Tukey test 5%.
Plant material for RNA-seq and qRT-PCR analysis Previously described [17], using a hydroponic system under greenhouse conditions, the experiment was set in a randomized block design composed of twelve treatments: two cultivars (Embrapa 45 and BR 4), two oxygen conditions [fully aerobic state (normoxy) and hypoxic], and three treatment sampling times (0.5h, 4h, and 28h). Each treatment has three biological replicates (four plantlets per replicate in order to reduce biological variation). At each time point, root tissues were collected and immediately frozen in liquid nitrogen before being stored at -80˚C. We compared stressed and unstressed samples at the same time point to remove putative additive effects, such as gene-intrinsic effects (e.g., circadian rhythm [18]), differences in developmental stages among individuals, or any unknown variation between the time points [17].
Total RNA was extracted using Trizol reagent (Invitrogen) and treated with DNase I (Invitrogen) according to manufacturer instructions. RNA concentration and purity were measured using a spectrophotometer (NanoDrop, ND-1000), and the integrity of the molecules was analyzed on 1% agarose gels stained with ethidium bromide.

Transcriptome library construction, deep sequencing, and mapping of reads
For each of twelve treatments, equimolar quantities of purified total RNA from roots of twelve plants were pooled to result one library. Then, the twelve libraries were sent to Fasteris SA (Plan-les-Ouates, Switzerland) for processing and sequencing.
The RNA quality and integrity was checked using an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA), of which only samples with a RIN ! 8.0 were used. The twelve libraries were processed (poly-A purification, fractionation, cDNA synthesis using random primers, and ligation to bar-coded adapters), fragments of 150-250 bp were isolated and multiplexed, resulting one sequencing library. The sequencing library is a pool of equimolar quantities from twelve initial libraries, each library with a specific barcode for further bioinformatic discrimination. Single end reads were generated by the Illumina HiSeq 2000 (read length 1 × 100 base; one lane of the flow-cell; Illumina, Inc. San Diego, CA). The raw data, deposited in the ArrayExpress database (http://www.ebi.ac.uk/ arrayexpress) under accession number E-MTAB-5709, was uploaded to the GeneSifter platform (Geospiza, Seattle, WA) for alignment with the soybean genome (assembly Glyma 1.1) [19]. The mapping of reads and transcripts analysis was done as described previously [18].

Transcriptomic analysis
For each time point (0.5, 4, and 28h), the expression ratio (fold-change, fc) of genes was performed by dividing transcript abundance values (in RPM = Reads per Mapped Million) from plants under hypoxic and normoxic conditions. The statistical significance of DEGs were obtained by using Bioconductor package edgeR [22], corrected by Benjamini and Hochberg method [23]. We only considered as DEGs those showing fold-change ! 2 (up), -2 (down), adj. p-value 0.01, and with more than 20 mapped reads (RPM ! 9) in at least one of the two compared libraries.
Gene set enrichment analysis was performed using Singular Enrichment Analysis (SEA) provided by agriGO (

qRT-PCR analysis
For quantitative real-time PCR (qRT-PCR) analysis, the synthesis of cDNA, design of primers, and expression analysis of genes used to verify reliability of RNA-seq expression data were done as described previously [17] (Table D in S1 Supporting Information).

Results and discussion
In the present study, we analyzed root RNA-seq data from flood-tolerant Embrapa 45 and flood-sensitive BR 4 soybean cultivars that showed contrasting grain yields when cultivated in waterlogged soil (Fig 1). In order to evaluate pairwise RNA-seq data, the relative expression of six common hypoxia-responsive genes [Trehalose-6-Phosphate Synthase (Glyma17g07530), Ascorbate Peroxidase (Glyma12g07780), Sucrose Synthase (Glyma13g17420), Alternative Oxidase (Glyma04g14800), non-symbiotic Hemoglobin (nsHB; Glyma11g12980), and Nitrate Reductase (Glyma06g11430)] were determined by qRT-PCR. The non-log-transformed qRT-PCR and RNA-seq expression data were consistent for all these genes (Fig 2) showing a strong positive Pearson correlation (r = 0.95; P < 0.001), indicating reliability in our transcriptome analysis.

Transcriptome reconfiguration
Induction of signaling genes and down-regulation of genes related with energy-consuming processes under hypoxia. From 54,174 predicted protein-coding genes in the soybean genome (assembly Glyma 1.1) [19], 2,656 up-regulated (URGs) and 4,970 down-regulated genes (DRGs) were found in both cultivars under hypoxic stress. Of these total, after 0.5, 4, and 28h of root hypoxia, 1,144, 5,687, and 3,761 genes were differentially expressed, of which 89, 28, and 46% were URGs, respectively (Fig 3). In Arabidopsis thaliana, another flood-sensitive species, more URGs were also observed in roots after 0.5h of hypoxic stress, and URGs were more prevalent deeper into stress conditions [27]. In contrast to our findings in soybean data, where DRGs number decreased after 28h, DRGs remained after 168h of waterlogging in roots of flood-tolerant gray poplar (Populus × canescens) [28]. Even so, Embrapa 45 showed  fewer URGs and more DRGs than BR 4, of which the greatest difference in DEGs between cultivars was after 28h where Embrapa 45 (4,216 DRGs) had more DRGs than BR 4 (2,582 DRGs) (Fig 3).
Gene function was determined by identifying Gene Ontology (GO) categories for DEGs and SGs. The most enrichment for GO categories was found after 0.5h in URGs from both cultivars and after 4-28h in DRGs mainly from Embrapa 45 (Fig 3). Noteworthy GO categories in URGs were gene expression regulation (GO:0010468), more specifically for transcription (GO:0006350) and protein modification (GO:0006464) involving transcription factors (GO:0003700) and kinases (GO:0016301) activities. Overrepresented in DRGs were energydemanding processes, including transport (GO:0005215) and biosynthesis (GO:0009058), as well as translation (GO:0006412), most of which encode ribosomal proteins (GO:0005198; structural molecule activity). Further, transcription factors (GO:0003700), kinases (GO:0016301) and transporters (GO:0005215) were enriched in DEGs, while more general functions such as binding (GO:0005488) in SGs. Our results show that hypoxia induces controlling/signaling genes and suppresses genes related with energy-consuming processes in soybean. Therefore, both induction and repression of genes may be important for flooding tolerance. Hypoxic soybean roots experience changes in amino acids and amino acid-related metabolism. After 4h of hypoxia, URGs and DRGs were enriched for reorganization of cellular amino acid and derivative metabolic processes (GO:0006519) (Fig 3). Alterations in amino acid metabolism have been previously observed in hypoxic roots of soybean [29], Lotus japonicus [30], and gray poplar [28], including high accumulation of alanine and GABA (Gamma-Amino Butyric Acid) as well as reduction of aspartate level. In agreement, we observed up-regulation of alanine aminotransferase (Glyma07g05130) and glutamate decarboxylase (Gly-ma08g09670) at all-time points in both cultivars (S2 File).
While expression of aspartate kinase, aspartate semialdehyde dehydrogenase and homocysteine S-methyltransferase increases under hypoxia, we observed repression of polyamines and phenylpropanoids biosynthesis-related genes, including upstream genes from shikimate pathway (Fig A in S1 Supporting Information, S2 File). Among DRGs was found S-adenosyl-Lmethionine (SAM) synthase (EC 2.5.1.6). SAM connects to ethylene, polyamines, and phenylpropanoid-derived lignin pathways (Fig A in S1 Supporting Information) as well as histone and nucleic acid methylation for gene expression regulation [32,33]. Studies involving exogenous application or endogenous production of polyamines via genetic manipulation have shown increased tolerance to a broad spectrum of abiotic stresses [34], opening opportunities for improvement of soybean flooding tolerance by genetic engineering approaches.
Same gene, different regulation between cultivars: Identification of candidate genes for flooding-tolerance. The phosphoglucomutase (Glyma05g34790) gene was up-regulated in Embrapa 45 and down-regulated in BR 4 soybean cultivar after 4h of hypoxic stress (Fig 4). Its up-regulation in the flood-tolerant cultivar is in accordance with a shift in sucrose catabolism from ATP-dependent invertase-hexokinase to energy-saving SuSy-UGPase pathway [5]. In both cultivars, SuSy (Glyma19g40041, Glyma09g08550, and Glyma15g20180) and UGPase (Glyma11g33160) genes were observed up-regulated, while genes down-regulation were invertase (Glyma10g35890) and hexokinase (Glyma05g35890, Glyma07g12190, and Gly-ma17g37720) genes.
Higher induction of an unknown protein related to N-terminal protein myristoylation (Glyma06g03430), suppressor of phyA-105 (Glyma06g37080), and fibrillin (Glyma10g32620) were observed in flood-tolerant Embrapa 45. The A. thaliana ortholog of Glyma10g32620 (AT3G23400) is required for resistance to multiple stresses [35]. Moreover, differential expression of fibrillin genes correspond to plastoglobule number in leaves of contrasting soybean genotypes under drought and waterlogging stresses [36].
Although the nsHB gene (Glyma11g12980) exhibited similar expression ratio (in foldchange) between the two cultivars, the expression level (in RPM) in Embrapa 45 was lower under normoxic and hypoxic conditions (Fig 4). The last 284 nucleotides of the nsHB 3'UTR (3'Untranslated Region) are absent only in Embrapa 45 (Fig B in S1 Supporting Information). This explains the absence of qRT-PCR amplification in Embrapa 45 (Fig 2) and the similar transcriptional profile (qRT-PCR and RNA-seq) in both cultivars (Fig 2). Considering the important role of nsHB to protect plants under hypoxic stress [37], further study is required to understand the alternate 3'UTR structures influence transcription, transcript stability, and protein abundance.
Function, structure, and composition of gene groups: Comparing its mechanistic relationship with expression regulation To further understanding how soybean genes evolve and respond to hypoxia, the top 40 transcripts (all time points in both cultivars) and top 500 transcripts (at least one time point in both cultivars, except for S500 keeping all time points as criterion) stable and DEGs were structurally/compositionally characterized, comparing its mechanistic relationship with expression regulation.
Top gene groups have different TATA box, ABRE, and DRE motif usage in promoter sequences. Does ABRE mediate gene expression independent of ABA in hypoxic soybean roots?. Transcription of protein-coding genes in eukaryotes requires numerous protein factors to recognize specific DNA loci. The core promoter region, proximal to the transcription start-site (TSS), recruits general transcription factors involved in basal transcription [41] and cis-regulatory elements from extended promoter recruits DNA-bound transcription factors (activators or repressors) to fine-tune the transcriptional control [42].
The general regulator TATA-box binding protein (TBP) is required for transcription initiation by all three eukaryotic RNA polymerases [43]. TBP can bind to various DNA sequences but has higher affinity for the consensus TATA-box [44]. Based on previous work [45][46][47], we scanned for the core sequence TATA extending 4 bp in the 3' direction within the 50 bp region upstream of the predicted TSS (between -50 and -1). We found more DEGs (from 21% in U500 to 40% in D40) with the consensus TATA box sequence TATA(T/A)ATA than SGs (at most 6% in S500) (Fig 5, Table B in S1 Supporting Information). Our results are in agreement with hexamer sequences TATA(T/A)A over-represented in A. thaliana, Oryza sativa, and Glycine max genomes [48]. Similarly, Tirosh et al. [45] observed a correlation between consensus TATA-containing genes being differentially expressed and TATA less-containing genes stable expressed in yeast, metazoans, and plants.
Higher TBP turnover at consensus TATA-compared to TATA-less promoters is associated with specific coactivators [46,49]. Coactivators are multisubunit complexes represented by SAGA (Spt-Ada-Gcn5-Acetyltransferase), TFIID (transcription Factor II D), related with TBP binding on TATA and TATA-less promoters, respectively [49], and Mediator [50]. The latter is organized into head, middle, tail, and Cdk8 kinase modules to converge and transmit signals from sequence-specific transcription factors to RNA polymerase [51,52]. Here, mediator components Glyma13g16910 (head MED20a) and Glyma13g31480 (tail MED16) were at least  three times down-and nine times up-regulated after 4 and 28h of hypoxia, respectively, in both cultivars. The head module MED20a subunit participates in transcription regulation of miRNA and protein-coding genes involved with plant development, time flowering, and fruit size in A. thaliana [53]. In contrast, the tail module MED16 component regulates several biotic [54,55] and abiotic [56-58] stress responses in plants. Med16 is required for transcriptional activation of cold-and dehydration-inducible genes that have C-repeat/dehydration-responsive elements (CRT/DRE) promoter motifs controlled by CRT/DRE-binding transcription factors (CBF/DREB) [57,58].
Top gene groups differed in structure, composition, and codon usage. How may hypoxia alter translation?. Compared to SGs, DEGs are smaller, with shorter CDS (coding sequence) length and fewer introns (Fig 6, Table C in S1 Supporting Information). Similar results were observed in A. thaliana [75], yeasts, and mice [76] for genes responsive to other stresses. Shorter genes with fewer introns demand less energy [77] and can have faster expression dynamics [78]. Interestingly, SGs also seem to have improvement of energetic and time costs. We compared soybean SGs with cognate proteins [79], and analyzed A. thaliana data sets of immunopurified polysomal mRNAs [80] and translationally inactive mRNAs [81]. The results suggest that energy is saved from translation by down-regulation of cognate soybean proteins under hypoxia (Fig C in S1 Supporting Information). In this context, stable mRNAs from A. thaliana are sequestered into stress granules and poorly associated with translating ribosomes under hypoxia. Upon reoxygenation, they are rapidly released from stress granules forming new polyribosomes, minimizing the need for de novo transcription.
The higher intron number and number of splicing variants in soybean SGs (Fig 6, Tables B and C in S1 Supporting Information) are in agreement with the higher gene body (i.e., transcript region) methylation in SGs [75,82,83], involved in splicing regulation [84][85][86]. Here, whereas SGs exhibited steeper 5' to 3' negative G+C and CpG gradient, no decrement from start to middle region of DEGs were observed (Fig D and Table C in S1 Supporting Information). The low strength but diverged G+C and CpG patterns among gene groups can be influenced by gene structure, recombination, and DNA methylation [87]. Moreover, C3pG1 (C at third position of a codon binding G at first position of a neighbor codon) and G1 content were higher among di-and mono-nucleotides, respectively, and C3 was the mono-nucleotide with more diverged content among gene groups, mainly at CDS middle region.
The relative synonymous codon usage (RSCU) analysis for all 59 synonymous codons showed that C3 content divergences highlighted in 2-fold degenerate pyrimidine ending codons between CDS edges as well as among gene groups at CDS middle region (Fig 7). It is noteworthy that 2-fold degenerate codons ending in pyrimidines seem to be in majority [maybe exclusively, including in soybean (Fig E in S1 Supporting Information)] decoded by G34 tRNAs (G at position 34 of the tRNA, the first anticodon position) in all three domains of life (archaea, bacteria, and eukarya) [88,89]. This suggests strong positive selection to discriminate correct cognate C3 and wobble U3 codons from the incorrect near-cognate codons G3 and A3 (e.g., CAC/U histidine versus CAG/A glutamine). C3 over U3 2-fold degenerate ending codon bias also occurs at evolutionarily conserved amino acids sites across 12 fly drosophilid species and correspond to higher levels of G34-to-Q34 substitution [90]. This opens a question if Q34 tRNA influences the compositional divergence among soybean gene groups. Remarkably, queuine (q), the free base of Q (queuosine), is only synthesized by bacteria and salvaged by most eukaryotes [91]. Example is the legume model Medicago truncatula, in which rhizobium Q synthesis is required for effective nitrogen-fixing symbiosis [92]. In contrast, this is not observed in Brassicaceae, including A. thaliana, given the absence of genes encoding transglycosylases that catalyze q insertion in target G34U35N36 tRNAs (N = any base), found in Medicago [91] and soybean (Glyma04g10706, Glyma01g40041, Glyma13g10281, Gly-ma11g05250, Glyma06g10555, and Glyma08g48310). Besides Q, wybutosine (yW) is another hypermodified nucleoside derived from G, but yW is found exclusively at position 37 (neighboring anticodon sequence) of tRNAs that decode phenylalanine (codons UUU and UUC; Transcriptome reconfiguration under hypoxic stress biased here among soybean gene groups), important to reduce polyuridine translational frameshift errors [93].
Differential expression of aminoacyl-tRNA synthetases (aaRS) also occur in soybean leaf under drought stress, with more aaRS DEGs in wild-type than in transgenic lines overexpressing BiP chaperone [94]. The BiP (Binding Protein) major regulates the endoplasmic reticulum (ER) stress [95]. Here, many ER stress related genes [96] were differentially expressed (S5 File), including down-regulation of BiP (Glyma05g36600, Glyma05g36620, Glyma08g02940, and Glyma08g02960; from -2 to -10 fc after 4 and 28h in both cultivars under hypoxia). Among these aaRS differentially expressed, Glyma14g11711 (aspartyl-tRNA synthetase) was up-regulated at all-time points in the two cultivars (from 5 to 16 fc). Curiously, transgenic plants overexpressing an aspartyl tRNA synthase (AspRS) orthologue (At4g31180) improve tolerance to biotic stress [97]. The At4g31180 is target of a synthetic isomer of GABA, called BABA (β-Amino Butyric Acid) [97]. BABA primes plants to enhance tolerance against broad-spectrum of biotic and abiotic stresses [98]. Inhibition of AspRS activity by BABA accumulates uncharged tRNA Asp, which as others uncharged tRNAs is signaling molecule to attenuate translation by Gcn2-eIF2α system, and subsequently alleviate ER stress [99]. Based on this, further studies may help to elucidate involvement of amino acids metabolism in tRNA modifications, translation accuracy/efficiency, and ER stress under hypoxia. In addition, BiP and aaRS are candidates for biotechnology applications for improvement in flooding tolerance. Table A in S1 Supporting Information. Samples sizes used for structural and compositional analysis among groups of genes. Table B in S1 Supporting Information. Fisher's pairwise comparisons between gene groups. Table C in S1 Supporting Information. Mann-Whitney pairwise comparisons. Table D