Transcriptomic Analysis of Petunia hybrida in Response to Salt Stress Using High Throughput RNA Sequencing

Salinity and drought stress are the primary causes of crop losses worldwide. In sodic saline soils, sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome-level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatic inferences in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transduction as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h, inducing genotoxicity and affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.


Introduction
Abiotic stress is the negative effect on living organisms of nonliving factors such as high temperature, drought and salinity. Abiotic stress affects normal plant growth and development and severely reduces agricultural productivity. Abiotic stressors, especially salinity and drought, are the primary cause of crop loss worldwide, leading to 50% average yield reductions per year for major crops [1,2].
Due to the important role of the Solanaceae family in agronomic and ornamental crops, holistic-scale approaches have been used to examine salt tolerance in this family. Root proteomic profiling in four tomato (Solanum lycopersicum) accessions (Roma, Super Marmande, Cervil and Levovil) was conducted in response to short-term stress by exposing hydroponically grown plants to 100 mM NaCl [3], and a cDNA microarray was used on two cultivated tomato genotypes (LA2711 and ZS-5) growing hydroponically under 150 mM NaCl to study gene expression in early stages of development in tomato plants [4].
RNA-seq offers several advantages over existing technologies; it requires neither previous genome annotation nor pre-synthesized nucleotide probes and it is not limited by Expressed Sequence Tag (EST) availability [5]. Transcriptome sequences can be reconstructed by de novo assembly of millions of short DNA sequences (reads) [6], enabling downstream analyses such as novel gene discovery or expression profile analysis [7,8]. The assembly of DNA reads into a meaningful transcriptome can be performed with different de novo assemblers such as Trinity [9], Trans-ABySS [10], and SOAPdenovo-trans [11]. Thus, RNA-seq has become the method of choice to carry out transcriptomic analysis in both model and non-model organisms [12].
De novo transcriptomes have been successfully assembled from Illumina data in a variety of non-model species, including Lupinus albus (lupin) [13], Cicer arietinum (chickpea) [14], Ipomoea batatas (sweetpotato) [15] and Medicago sativa (alfalfa) [16], to name a few. Zenoni et al. (2011) used 454 sequencing to generate de novo assembled transcriptomes separately for Petunia axillaris and Petunia inflata, parental species of Petunia hybrida, to develop microarray chips for transcriptomic analyses to study seed coat defects in a P. hybrida mutant [17]. Paired-end read sequencing libraries are widely used in transcriptomic studies to reduce the occurrence of de novo mis-assembled reads into artificial contig sequences and chimeras [18], and strand-specific libraries improve RNA-seq by accurately identifying antisense transcripts and boundaries of closely situated genes [19].
The objective of this study was to generate the first, to our knowledge, whole-transcriptome expression profiles through RNA-seq in any Solanaceae plant grown under salinity conditions. Utilizing our newly developed gene index and expression patterns, we identified new candidate genes whose expression is highly induced as a response to NaCl. We hypothesized that the plant response would parallel drought stress in the short term (6 h) and that in the longer term (24 h) the plant response would be directed at controlling ion uptake and eliminating toxic ion concentrations in the cytoplasm. We further hypothesized that short-term responses would show the up-regulation of heat shock proteins, stress hormones (ABA, ethylene) and signal transduction components. In this work we also present the most in-depth Petunia hybrida reference transcriptome, obtained by paired-end sequencing of cDNA libraries. The novel transcriptome, available at the SOL Genomics Network (SGN) http://solgenomics.net [20], can be used as an excellent tool for biological and bioinformatic inferences in the absence of an available Petunia genome.
Transcriptomic gene expression has shed light on novel salt stress mechanisms and on previously undescribed differentially expressed genes related to salt stress. While the predominant focus of our work is on transcriptomic analyses for salt stress, a secondary objective was to test the utility of a cost-saving modification for RNA-seq library construction with non-HPLC purified primers, which has the potential to greatly reduce the cost of library preparation for future RNA-seq-based experiments.

Plant material and salt treatments
Petunia x hybrida cv. 'Mitchell Diploid' seeds were germinated in a soilless substrate (Metromix 280, Sun Gro Horticulture LTD., Vancouver, Canada) for 3 weeks. After seedlings were ca. 8 cm tall and well rooted, 60 seedlings were selected for uniformity. Roots were washed to remove substrate and seedlings were secured in rockwool around the stem base and placed into 4 L containers in solution culture (one plant per container). The nutrient solution used was a modified Hoagland's solution prepared in reverse osmosis filtered water. The solution was kept aerated by continuously bubbling air into each container using an aquarium pump to maintain oxygen saturation. After 1 week of establishment in the hydroponic systems, 20 containers were selected for uniformity and transferred to a growth chamber (200 µmol light 12 h/d, 22 °C day/night and 45% relative humidity). The 20 plants were selected based on phenotype (similar size, number of branches, height, and absence of nutritional or biotic disorders), and developmental stage (first flower initiation). After one week of growth chamber acclimation, the two least representative plants were discarded from the experiment. The remaining eighteen plants were randomly divided into two groups of nine containers. The control group received the Hoagland's solution with no added NaCl, and the salt treatment group received Hoagland's solution amended with 150 mM NaCl. Containers were distributed randomly throughout the growth chamber.

Tissue sample and RNA isolation
To reduce plant-to-plant variability, we established three groups of three randomly selected plants within each treatment condition.
Tissue samples from the three plants per group were pooled together to create one biological replicate. At each time point, the most recently expanded leaf (the fourth or fifth leaf from the lateral meristem) from a lateral branch was selected. Plant leaves were sampled at 0, 6, and 24 h after salt treatment was applied. Therefore, for each time point six biological replicates were collected (3 from control and 3 from salt treatment), resulting in 18 samples total. To reduce the number of samples for RNA-seq, only the control samples were used at time point 0 (just prior to initiation of salt stress), which yielded 15 samples for the experiment. Samples were immediately frozen in liquid nitrogen and stored at −80 °C prior to RNA isolation. Total RNA was isolated using Trizol Reagent (Invitrogen, USA) and purified through a Qiagen RNeasy Column (Qiagen, Germany) according to the manufacturer's instructions. A 1% agarose gel buffered by Tris-acetate-EDTA was run to check the integrity of the RNA. Seven samples were further quantified in an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) at the Core Laboratories Center Genomics, Institute of Biotechnology, Cornell University (http://www.biotech.cornell.edu/biotechnology-resource-center-brc) to verify total RNA quality. RNA Integrity Number (RIN) values for the samples analyzed were 8.5, 9.1, 8.9, 8.5, 8.5, 8.7 and 6.7.

Library preparation and sequencing
Libraries corresponding to three biological replicates from each time point plus treatment combination (control time 0 h, control and NaCl time 6 h and 24 h) were constructed following a High-Throughput Illumina Strand-Specific RNA Sequencing Library protocol [21]. Briefly, 2-5 µg of total RNA was used for polyA RNA capture with magnetic oligo(dT) beads (Invitrogen, USA), fragmented at 95 °C for 5 min and eluted from beads. Cleaved RNA fragments were primed with random hexamer primers to synthesize the first cDNA strand using reverse transcriptase SuperScript III (Invitrogen, USA) with dNTP. The second cDNA strand was generated by DNA polymerase I (Enzymatics, USA) with dUTP mix. Following end-repair (Enzymatics, USA), dA-tailing (Klenow 3'-5', Enzymatics, USA) and adapter ligation (T4 DNA Ligase HC, Enzymatics, USA), the second dUTP-strand was digested by uracil DNA glycosylase (Uracil DNA Glycosylase, Enzymatics, USA). The resulting paired-end adaptor-ligated cDNA tags at the 3' end were amplified for 15 cycles using PCR indexed primers (IP) annealing in the adaptor sequence, enriching the final libraries (see Table S1 for all 6-nt tags/index). Libraries one through fifteen were indexed with non-HPLC purified IP 1-15 and the remaining fifteen libraries (technical replicates) were indexed with HPLC purified IP 16-30 utilizing the same cDNA sample (i.e., cDNA library 1 with IP 1 and IP 16).
The standard desalted non-HPLC (NH) primers were ordered in a 96 well plate (Integrated DNA Technologies, Coralville, Iowa, USA) designed with two empty wells between every well containing primer to allow the dispensing needle to be rinsed out twice before making a new primer. The HPLC purified primers (HP) were ordered individually (Integrated DNA Technologies, Coralville, Iowa, USA). All double stranded cDNA libraries had the expected size (~250 bp) when run on a 2% agarose gel except library 5 (third bioreplicate from control at time point 06 h) indexed with NH primer, which failed.

Bioinformatics analysis - reads processing
A thorough quality control on the raw data was performed using FastQC, a Java program that provides summary statistics for FASTQ files (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [22] and reports problems, thus ensuring the detection of biases in the data. For all 29 libraries the Phred-like quality scores (Qscores) were >20. Sequencing adapters and primers, poor-quality bases at the ends of reads, limited skewing at the ends of reads and N's were then detected and filtered out with the Ea-Utils software (http://code.google.com/p/ea-utils/wiki/FastqMcf) [23], increasing the Qscore to >30 for all the libraries and the read length to >50 bp (Q30L50).
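As a rough illustration of what a Q30L50-style filter does, the sketch below trims low-quality bases from the 3' end of a read and then discards reads that are too short or contain N's. It is not the Ea-Utils (fastq-mcf) algorithm: the function names, the Phred+33 encoding, the simple end-trimming rule and the inclusive length threshold are all assumptions made for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_q=30):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_string[:end]


def passes_q30l50(sequence, quality_string, min_q=30, min_len=50):
    """Trim the 3' end, then keep the read only if it is at least
    min_len bases long and contains no ambiguous bases (N).
    Returns the trimmed (sequence, quality) pair, or None if rejected."""
    seq, qual = trim_3prime(sequence, quality_string, min_q)
    if len(seq) < min_len or "N" in seq:
        return None
    return seq, qual
```

A read of 60 bases whose last five qualities are low would be returned as a 55-base read, whereas a 40-base read would be rejected regardless of quality.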
In order to assess the quality of each assembly we compared the major outcomes: contig mean size, number of sequences (N50) and length (L50). We also compared the mean size distribution of assembled transcripts with ITAG2.3 tomato gene models [25]. All plots were generated using free and open-source 'R software' (R Development Core Team, 2010; http://www.R-project.org).
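The assembly summary statistics can be sketched as follows. Contigs are walked from longest to shortest until half of the assembled bases are covered; the function returns both the length of the contig at that point and the number of contigs needed to reach it. Note that this text labels the number of sequences as N50 and the length as L50, whereas much of the literature uses the opposite labels, so the sketch returns both values under neutral names. The function names are hypothetical and the input is simply a list of contig lengths.

```python
def half_assembly_stats(contig_lengths):
    """Return (length, count): walking contigs from longest to shortest,
    `count` is how many contigs are needed to cover at least half of the
    total assembled bases and `length` is the size of the last one added."""
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs supplied")


def mean_contig_size(contig_lengths):
    """Arithmetic mean of the contig lengths."""
    return sum(contig_lengths) / len(contig_lengths)
```

For contigs of 100, 200, 300, 400 and 500 bp (1,500 bp in total), the two longest contigs already cover 900 bp, so the function returns a length of 400 and a count of 2.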

Mapping and error estimation
All the reads from both technical replicates (non-HPLC and HPLC) were separately mapped against a Trinity HP de novo assembly using 'Bowtie2' (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) to screen for total error number and errors per read. The error percentage was calculated with the 'Error Correction Evaluation Toolkit software' [26] as (Error Number/Mapped Bases) × 100 and the mapping percentage as (Mapped Reads/Total Reads) × 100 against a Trinity HP reference.
Since, as expected, no significant differences were found in mean error per read, a final de novo assembly was performed with all the reads combined to increase the coverage of the transcripts, building a final reference using Trinity with default settings. Gene expression was estimated with the 'RNA-Seq by Expectation-Maximization (RSEM)' software (http://deweylab.biostat.wisc.edu/rsem/README.html) [27] bundled with the Trinity package. Differentially expressed transcripts across the time points for both control and salt-treated plants were identified and clustered according to expression profiles using the 'EdgeR Bioconductor' package (http://www.bioconductor.org/packages/2.11/bioc/html/edgeR.html) [28] in 'R statistical software' (R Development Core Team, 2010; http://www.R-project.org).
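The two percentages can be written out as a small sketch. The error percentage follows the ratio stated above; the mapping percentage is taken here as mapped reads over total reads, which is an assumption about the intended formula. The function names are hypothetical and the counts in the usage note are invented for illustration only.

```python
def error_percentage(error_number, mapped_bases):
    """Mismatched bases as a percentage of all mapped bases."""
    return error_number / mapped_bases * 100


def mapping_percentage(mapped_reads, total_reads):
    """Share of reads that aligned to the reference, in percent
    (assumed definition: mapped reads / total reads)."""
    return mapped_reads / total_reads * 100


def errors_per_read(error_number, mapped_reads):
    """Mean number of mismatches per mapped read."""
    return error_number / mapped_reads
```

With made-up counts of 5 errors in 1,000 mapped bases, the error percentage is 0.5%; 90 mapped reads out of 100 give a mapping percentage of 90%.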

Statistical analysis
Multivariate comparisons of transcriptional expression profiles between HP and NH samples were conducted using 'R statistical software' (R Development Core Team, 2010; http://www.R-project.org) including a permutational multivariate analysis of variance (ADONIS) with a Bray-Curtis distance matrix in the Vegan package. Fixed effects in the model included primer type, time point, and interactions.
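To make the idea behind this test concrete, the following is a minimal pure-Python sketch of a permutational multivariate analysis of variance on Bray-Curtis distances. It handles a single grouping factor only (for example primer type), whereas the model described above also included time point and interactions, and it is not the Vegan implementation. All function names, the default of 999 permutations and the fixed random seed are assumptions for illustration.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative profiles."""
    total = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / total if total else 0.0


def pseudo_f(dist, groups):
    """One-factor pseudo-F and R^2 computed from a distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))
    return f_stat, ss_among / ss_total


def permanova(profiles, groups, n_perm=999, seed=0):
    """Return (pseudo-F, R^2, permutation p-value) for one grouping factor."""
    n = len(profiles)
    dist = [[bray_curtis(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    f_obs, r2 = pseudo_f(dist, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)
```

The R^2 value returned is the fraction of the total variation in the distance matrix attributed to the grouping factor, which is the kind of quantity reported when a factor is said to explain a given percentage of the variation in expression profiles.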

Validation of technical replicates
Many RNA-seq experiments include both biological (RNA from different samples) and technical (same source of RNA) replicates [28]. In our work, technical replicates corresponded to transcript isoforms barcoded with both non-HPLC (NH) and HPLC (HP) purified index primers. Prior to data analysis, we evaluated whether library construction with these two types of oligonucleotides resulted in significant differences by separately analyzing and comparing the output of both datasets (NH vs. HP) using different bioinformatics statistical analyses. Variance partitioning through permutational multivariate analysis of variance indicates that the primer choice (NH vs. HP) in the statistical model explained less than 2% of the variation in expression profiles whereas the overall model explained greater than 85% (Table S2A-E). The specific effect of primer choice varied with the cut-off of the most expressed transcripts at 10, 100, 1,000, 10,000, and 100,000 RPKM (P-value = 0.310, P-value = 0.066, P-value = 0.049, P-value = 0.055, and P-value = 0.038, respectively). It should be noted that low significance in expression profiles (Table S2B-E) might be due to experimental and biological noise, rather than technical effects of primer purification. Slight variation between technical replicates without affecting datasets has also been found and described by Marioni et al. (2008) [30]. A dendrogram of differentially expressed transcripts was created to visualize the relationship between technical and biological replicates, showing that the difference between technical replicates is smaller than that between biological replicates (Fig. S1). Lower variability in technical replicates than in biological replicates is in accordance with Robinson et al. (2010) [28].
These findings validate our technical replicates, increase the robustness and accuracy of the transcriptome (i.e., more depth in the de novo assembled transcripts from both biological and technical replicates) and suggest that the use of NH index primers can be adopted, greatly reducing the cost of the indexing step for future RNA-seq experiments. Even in the case that one library fails due to the use of a non-HPLC primer (low probability, 6% in our case), it is still worth building libraries with cheaper primers, as the quality and quantity of data are not affected.
Moreover, a failed library can be easily detected at early stages of library construction and thus barcoded with a new index primer and checked for the expected size on an agarose gel (see Library preparation and sequencing in Materials and Methods).

Reads processing
The high-throughput and powerful RNA-seq technology has allowed scientists to reconstruct a transcriptome from species with no genomic information available, recovering most of the expressed genes in a given cell or tissue. For example, 454 GS FLX Titanium pyrosequencing has been used in olive tree (Olea europaea) [31] and the Illumina Genome Analyzer in Chinese cabbage (Brassica rapa) [32]. To do this, a suggested number of reads (>30 million paired-end reads of >30 nucleotides for experiments whose purpose is to compare transcriptional profiles) should be generated with either the 454 or Illumina platform to produce a meaningful assembled transcriptome [33]. One lane in an Illumina HiSeq2000 flow-cell will generate more than 100 million reads. To obtain a global view of the transcriptome of Petunia x hybrida from both control and salt-treated leaf samples we generated 196 million reads per lane (raw data), ranging from 10 to 23 million reads across the 29 libraries (Table S1), in accordance with the yield suggested by Goldfeder et al. (2011) [34].

Transcriptome de novo assembly and evaluation
Comparison of the software used in our study showed that Trinity outperformed the others (Trans-ABySS and SOAPdenovo-trans) across the entire range of conditions and that Trans-ABySS produced the lowest-quality assembly (Fig. 1). K-mer length was adjusted to include every odd number from 23 to 63 (i.e., k-mers 23, 25, …, up to 63) for Trans-ABySS (T.ABySS hereafter) and SOAPdenovo-trans (SOAP hereafter) to optimize transcriptome de novo assembly into contigs and scaffolds. The best results with SOAP were obtained with k-mer length 47, which yielded larger contigs and scaffolds (data not shown) that had higher N50 and L50 than other k-mer lengths (Table 1) [36]. In their results SOAP outperformed all three assemblers (T.ABySS, SOAP and Trinity). This shows the importance of optimizing a methodology for a particular dataset, as all datasets are different. A summary of results including contig mean size, N50 and L50 for all the assemblers is found in Table 1.
To evaluate sequence length of the recovered Petunia transcriptome, we compared the apparent total mRNA length to the fully annotated tomato transcriptome. Tomato was utilized as the most closely related species (both in family Solanaceae) with a fully annotated transcriptome (34,727 CDS, N50 7,000 sequences with 1,400 bp average length) [25]. The comparison was made using the three aforementioned assemblers looking at mRNA size distribution; we observed that Trinity showed the closest distribution to the tomato transcriptome followed by SOAP k-mer 47 and lastly by T.ABySS trans k-mer (Fig. 1). Thus, according to our data, Trinity is the most accurate assembler, leading to a transcript mean size closer to tomato's.   , growing under fresh and seawater (~500 mM NaCl), which were assembled into 108,598 unigenes [37]. Of these, 50.3% (54,596) showed significant similarities with protein databases and 1% were annotated with sequences from non-plant sources. The three species with the most BLAST hits in our work were Vitis vinifera, Solanum lycopersicum and Glycine max. A graph with species distribution and their BLAST hits is found in Fig. S2. Gene Ontology (GO) was used to classify functions of the assembled transcripts, from which we obtained a total of 69,277 GO term annotations in our proposed transcriptome. The large majority of unigenes corresponded to metabolic process (9,611), cellular process (9,443) and response to stimulus (3,330) (Fig. 2). Transcriptome GO terms and gene descriptions are found in Table S3 and DNA sequences are deposited in the SOL Genomics Network (SGN) database http://solgenomics.net for others to use. This all-reads assembly performed with Trinity was used for further analysis. In our work we used the Bowtie mapper bundled with the Trinity package, which mapped ~18 million reads back to the final reference transcriptome (data not shown).

Gene expression and differentially expressed genes
The top five most highly expressed transcripts (highest RPKM) were the same for each of the 29 libraries regardless of the presence of salt stress. These five genes are involved in photosynthesis, as expected for leaf samples (Table 2). The most highly expressed gene (highest RPKM) across all the samples was the small chain of ribulose-bisphosphate carboxylase (EC 4.1.1.39). The high expression of rubisco is consistent with observations in maize B73 seedlings exposed to low night temperature (4 °C) as determined by real-time PCR [38]. Transcript abundance and functional annotation for the top five most expressed genes with their respective RPKM expression levels are shown in Table 2.
Differentially expressed genes. When comparing the total number of differentially expressed genes and transcripts across the three time points in a pair-wise fashion, we observed that differential expression was higher in salt-treated plants compared to a control at a particular time point. For example, the large majority of differentially expressed genes (1,064) and transcripts (1,494) were found between salt-treated plants at 24 h vs. control at 06 h (Table 3). The number of genes differentially expressed in the control (00 h, 06 h and 24 h) is likely due to transcripts involved in plant circadian rhythm and mechanical damage induced while sampling.
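For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) can be computed as in the sketch below, which also ranks transcripts by that value. The function names and the transcript identifiers in the usage note are hypothetical; the expression estimates in this work came from RSEM, not from this code.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def top_expressed(rpkm_by_transcript, n=5):
    """Return the n transcript IDs with the highest RPKM, highest first."""
    ranked = sorted(rpkm_by_transcript.items(),
                    key=lambda item: item[1], reverse=True)
    return [transcript_id for transcript_id, _ in ranked[:n]]
```

For instance, 500 reads mapped to a 1,000 bp transcript in a library of 10 million mapped reads correspond to an RPKM of 50, and `top_expressed({"t1": 3.0, "t2": 90.0, "t3": 12.5}, n=2)` would list `t2` before `t3`.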
To represent differentially expressed genes under salt stress we created a heatmap of RPKM-normalized transcript isoforms through hierarchical clustering. A False Discovery Rate (FDR) ≤0.001 and a maximum value of |log2(ratio of stress/control)| ≥1 were used as cut-offs to evaluate significant differences in expression (Fig. 3). We found 1,216 up-regulated transcripts (grouped in 3 subclusters) and 49 down-regulated transcripts (grouped in 1 subcluster) whose expression was significantly induced and reduced by NaCl treatment, respectively (Fig. 4). Three isoforms of heat shock protein (HSP) were the most up-regulated transcripts, increasing their expression by over 90-fold (Fig. 4A). The high expression level of HSP under abiotic stress is in accordance with the DNA microarray analysis in Arabidopsis by Seki et al. (2002) [39]. The large majority of up-regulated
Table 3. Pair-wise matrix comparison of differentially expressed transcripts and genes (genes in parenthesis) of leaves exposed to 0 and 150 mM NaCl across three different times (0, 6 and 24 h).
In contrast to NaCl treatment, most of the up- and down-regulated transcripts between control treatments were involved in oxidation-reduction processes, photosynthetic electron transport in photosystem II, electron carrier activity, response to cyclopentenone, coenzyme binding, cytochrome P450 regulation and transferase activity. Detailed lists of all up-regulated transcripts (subclusters 12, 2 and 7) and down-regulated transcripts (subcluster 4), including gene descriptions, changes in expression and their GO annotation, are found in Table S4A-D.
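The two cut-offs used to call a transcript differentially expressed (FDR of at most 0.001 and an absolute log2 ratio of stress over control of at least 1, i.e. a two-fold change) can be expressed as a small filter. This is a sketch only: the function name and return labels are hypothetical, and it assumes both expression values are strictly positive, so no pseudocount handling is shown.

```python
import math


def classify_transcript(stress, control, fdr,
                        fdr_cutoff=0.001, min_abs_log2=1.0):
    """Return 'up', 'down' or None for one transcript.

    stress and control are normalized expression values (assumed > 0);
    fdr is the false discovery rate reported for the comparison.
    """
    log2_ratio = math.log2(stress / control)
    if fdr > fdr_cutoff or abs(log2_ratio) < min_abs_log2:
        return None
    return "up" if log2_ratio > 0 else "down"
```

A transcript with expression 40 under stress and 10 in the control (log2 ratio of 2) and an FDR of 0.0005 would be labeled 'up', while the same fold change with an FDR of 0.01 would not pass the filter.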
Candidate genes to enhance salt tolerance. Based on our analysis we suggest eight salt-induced genes that could be further studied. Functional analysis for these candidate genes may be useful for genetic engineering or marker-assisted selection to enhance salt tolerance in Solanaceae. We divide the candidate genes into two major groups: those induced at both 06 and 24 h of salt stress (Fig. 5) and those induced at 24 h of stress but not at 6 h (Fig. 6). Of the eight suggested candidate genes, one ('comp32475_c0_seq1') retrieved no homology (unknown protein) when BLASTX was performed against the tomato genome (ITAG release 2.31). The 'unknown' transcript maps to tomato chromosome 3 between 61,095,606-61,097,016 base pairs and it is induced 17-fold when comparing control 06 h vs. salt 06 h and 59-fold when comparing control 06 h vs. salt 24 h. Gene IDs, annotation, P-values, FDR and fold induction for the suggested candidate genes are shown in Table 4. Partial DNA sequences can be found at the SOL Genomics Network (SGN).
Gene Ontology analysis. To better characterize the effects of NaCl on biological processes we conducted GO enrichment analysis using Fisher's Exact Test (Bonferroni-corrected, FDR ≤0.05), with differentially expressed genes and the whole transcriptome set as a background reference. With the exception of 'regulation of biological quality', all the statistically significant overrepresented GO terms in salt-treated leaves from 6 h were the same as those from 24 h. The most overrepresented GO terms in response to NaCl stress were 'response to abscisic acid stimulus', 'response to jasmonic acid stimulus', 'response to ethylene stimulus', 'response to salt stress' and 'G-protein coupled photoreceptor activity', indicating that most induced genes at this early stage of the stress are not salt-induced genes but genes involved in osmotic adjustment, hormonal changes and stress signaling (Table S5A-C). These results are in accordance with previous reports on salt stress studies [41]. More interestingly, 72 significantly enriched GO terms were associated exclusively with samples at 24 h of salt stress (i.e., not found at 6 h). From these results, we observe that salt induces the activation of a distinct group of genes not activated previously, suggesting that the concentration of Na+ or Cl− ions may interfere with cellular functions and biological processes such as the DNA replication process (i.e. GO terms: 'DNA replication', 'DNA conformation change', 'DNA replication initiation', 'DNA-dependent DNA replication'), metabolic processes ('nucleic acid metabolic process', 'glycerolipid metabolic process', 'RNA metabolic process'), transport ('nuclear transport', 'oligopeptide transmembrane transport', 'nucleocytoplasmic transport', 'nitrogen compound transport') and development ('post-embryonic development', 'developmental process'). The 72 GO terms are listed in Table 5.
Ulm (2004) reported that Na+ accumulation may also cause genotoxicity in which DNA alteration/damage can arise as a consequence of errors in DNA replication and DNA repair [42]; Katsuhara and Kawasaki (1996) showed nuclear deformation and genotoxicity in the meristematic
The compartmentalization of Na+ into the vacuole by the Na+/H+ tonoplast antiporter is a mechanism employed by some plants to cope with salt [44-46]. Tomato plants overexpressing an Arabidopsis vacuolar Na+/H+ antiporter (AtNHX1) were able to grow in the presence of 200 mM sodium chloride, accumulating high sodium concentrations in leaves but not in fruits [46]. However, we did not observe this mechanism in our experiment. We believe that after 24 h of salt stress, while initial cellular damage can be evident, a longer-term response may be required to observe genes involved in exclusion and/or compartmentalization of ions. Future work with RNA-seq should seek to understand the longer-term detrimental consequences of salt in Solanaceae plants.
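The enrichment test for a single GO term can be sketched as a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution, followed by a Bonferroni adjustment across all terms tested. The function and argument names are hypothetical, the one-sided form is an assumption about how over-representation was assessed, and the counts in the usage note are invented.

```python
from math import comb


def enrichment_p_value(de_with_term, de_total, bg_with_term, bg_total):
    """One-sided Fisher's exact test: probability of seeing at least
    de_with_term annotated genes among de_total differentially expressed
    genes, given bg_with_term annotated genes in a background of bg_total."""
    upper = min(de_total, bg_with_term)
    tail = sum(comb(bg_with_term, i) * comb(bg_total - bg_with_term,
                                            de_total - i)
               for i in range(de_with_term, upper + 1))
    return tail / comb(bg_total, de_total)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {term: min(1.0, p * m) for term, p in p_values.items()}
```

As a toy case, if all 5 differentially expressed genes carry a term that annotates 5 of 10 background genes, the p-value is 1/252; a term would then be reported as enriched when its corrected value falls at or below the chosen threshold (0.05 here).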
In this work we carried out the first in-depth transcriptomic analysis in Petunia under salt stress through RNA-seq. We quantified the expression of more than 7,000 genes across 24 h of acute NaCl stress. The large number of up- and down-regulated transcripts in response to salt stress is consistent with previous research and the underlying physiological responses to NaCl treatment. Stress response genes related to reactive oxygen species, transport, and signal transduction as well as novel and undescribed genes were identified. The candidate genes identified in this study can be applied as markers for breeding efforts or as candidates to genetically engineer plants to enhance salt tolerance. GO term analyses indicated that most of the NaCl damage happened at 24 h, inducing genotoxicity and affecting transport and organelles due to the high concentration of Na+ ions. We suggest that future RNA-seq studies with members of the Solanaceae incorporate more time points (i.e., longer exposure to NaCl) to assess detrimental effects of sodium chloride in plants. In this work we also propose a novel Petunia transcriptome assembled out of 196 million Illumina reads with Trinity software that can be used as an excellent tool for biological and bioinformatic inferences in the absence of an available genome. Additionally, we introduced a slight modification in the library preparation, barcoding samples with non-HPLC primers. The methodological improvement presented could benefit work with different next generation sequencing technologies, where the use of HPLC purified primers is an important contribution to the cost of sample preparation, thereby reducing a barrier to researchers of limited means to use high-throughput RNA sequencing.

Supporting Information
Figure S1 Clustering of differentially expressed transcripts when comparing dispersion between biological (r1, r2, r3) vs. technical replicates (NH and HP). Control00h, Control06h and Control24h indicate control leaf samples taken at time 0 h, 6 h and 24 h respectively, after treatment commenced. SaltStress00h, SaltStress06h, SaltStress24h indicate salt-treated leaf samples at the same time points. The distance between NH and HP datasets is smaller than that between the biological replicates. (TIFF)
Figure S2 Species distribution and their BLAST hits. (TIF)
Table S1 Million reads per library before (raw data, column 3) and after data cleaning (filtered data, column 4). The 6 unique nucleotide tags used to index each library are shown in the first column and yield (GB) per library in column 2. Note that library 5 indexed with non-HPLC primer failed. (DOCX)