De novo Assembly and Characterization of the Barnyardgrass (Echinochloa crus-galli) Transcriptome Using Next-Generation Pyrosequencing

Background Barnyardgrass (Echinochloa crus-galli) is an important weed that is a menace to rice cultivation and production. Rapid evolution of herbicide resistance in this weed makes it one of the most difficult to manage using herbicides. Since genome-wide sequence data for barnyardgrass is limited, we sequenced the transcriptomes of susceptible and resistant barnyardgrass biotypes using the 454 GS-FLX platform. Results 454 pyrosequencing generated 371,281 raw reads with an average length of 341.8 bp, which made a total length of 126.89 Mb (SRX160526). De novo assembly produced 10,142 contigs (∼5.92 Mb) with an average length of 583 bp and 68,940 singletons (∼22.13 Mb) with an average length of 321 bp. About 244,653 GO term assignments to the biological process, cellular component and molecular function categories were obtained. A total of 6,092 contigs and singletons with 2,515 enzyme commission numbers were assigned to 151 predicted KEGG metabolic pathways. Digital abundance analysis using Illumina sequencing identified 78,124 transcripts among susceptible, resistant, herbicide-treated susceptible and herbicide-treated resistant barnyardgrass biotypes. From these analyses, eight herbicide target-site gene groups and four non-target-site gene groups were identified in the resistant biotype. These could be potential candidate genes involved in the herbicide resistance of barnyardgrass and could be used for further functional genomics research. C4 photosynthesis genes including RbcS, RbcL, NADP-me and MDH with complete CDS were identified using PCR and RACE technology. Conclusions This is the first large-scale transcriptome sequencing of E. crus-galli performed using the 454 GS-FLX platform. Potential candidate genes involved in the evolution of herbicide resistance were identified from the assembled sequences. This transcriptome data may serve as a reference for further gene expression and functional genomics studies, and will facilitate the study of herbicide resistance at the molecular level in this species as well as other weeds.


Introduction
Barnyardgrass (Echinochloa crus-galli [L.] Beauv.) is one of the main problematic grass weeds that grows along with important staple crops such as rice [1]. During cultivation even when the ratio of rice plants to barnyardgrass is 10:1 rice biomass is reduced by 75% and yield by about 50% [2]. Many herbicides are being used to destroy barnyardgrass which in turn would improve rice production. However, persistent use of herbicides results in rapid development of herbicide resistance [3][4]. In the last two decades, it has been reported that E. crus-galli worldwide has developed resistance to nine herbicide groups: ALS inhibitors (e.g. penoxsulam, bispyribac-sodium), ACCase inhibitors (e.g. cyhalofop-butyl), synthetic auxin (e.g. quinclorac), photosystem II (e.g. atrazine), ureas and amides (inhibition of photosynthesis at photosystem II, e.g. propanil), dinitroanilines (microtubule assembly inhibition, e.g. pendimethalin), thiocarbamates (inhibition of lipid synthesis, e.g. thiobencarb), chloroacetamides (inhibition of cell division, e.g. butachlor) and isoxazolidinoes (inhibition of carotenoid biosynthesis, e.g. clomazone) [4][5][6][7][8]. The increasing resistance of E. crusgalli to herbicides has drastically threatened rice production and alternate weed management strategies should be considered during the cultivation of direct-seeded rice. Therefore, an understanding of the fundamental molecular mechanisms behind development of herbicide resistance is necessary to minimize and manage resistance development, and increase crop yield [3][4].
Recent advances in genomic sequencing technologies are radically changing biological research, and will also have a major impact on crop improvement [9][10][11]. However, genomics and bioinformatics studies on the rapidly evolving weeds of modern agriculture are limited [12]. Therefore, developing new genomics resources is necessary to study the troublesome weeds in crop fields.
In recent years, the next generation sequencing (NGS) technologies have enabled inexpensive and quick generation of large-scale sequence data when compared to conventional Sanger sequencing [9,13]. The 454 pyrosequencing technology has been used to analyze the transcriptome of grass species such as Brachypodium distachyon, waterhemp (A. tuberculatus), horseweed (C. canadenisis), grain amaranth (A. hypochondriacus), 11 composite weeds and switchgrass (Panicum virgatum) [14][15][16][17][18][19]. Besides obtaining the transcriptomic data from these species, some candidate genes potentially involved in herbicide resistance have also been analyzed. The preliminary sequence data for all of the major target-site genes such as 4-hydroxyphenylpyruvate dioxygenase and glutamine synthetase were obtained in A. tuberculatus [15]. The functions of ABC transporters involved in non-target glyphosate resistance in C. canadenisis were demonstrated by real-time PCR experiments [16]. These data indicate that 454 GS-FLX pyrosequencing is a powerful tool for the development of genomic resources for functional genomics on grasses. They would also potentially benefit future research efforts in weed science.
To increase the genomic resources for weeds, we sequenced the trancriptomes of the susceptible and resistant biotypes of barnyardgrass (E. crus-galli) using the 454 GS-FLX platform. The data were assembled de novo followed by sequence annotation and clustering into putative functional categories using the Gene Ontology (GO) framework and grouping into pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, transcript abundance analysis was performed using Illumina sequencing and the observed differences between the resistant and susceptible E. crus-galli biotypes are presented here. Then, potential candidate sequences involved in the development of herbicide resistance were also analyzed. Finally, C 4 photosynthesis genes including RbcS, RbcL, NADP-me and MDH with complete CDS were identified using PCR and RACE technology.
De novo sequence assembly was then performed, which involved comparison of sequences, finding overlapping fragment pairs and merging as many fragments as possible to create a consensus sequence [27]. This assembly resulted in 10,142 contigs (,5.92 Mb) with an average length of 583 bp and 68,940 singletons (,22.13 Mb) with an average length of 321 bp ( Table 1). The length distribution of all assembled contigs and singletons is shown in Fig. 3. About 9,432 contigs and 64,272 singletons were .100 bp, 5,103 (50.32%) contigs were longer than 500 bp and 2,935 (28.94%) were longer than 700 bp. The longest contig and singleton was 6,708 and 878 bp respectively. The N50 value for the contigs was 900 bp and represents the contig length of half of the total contigs.

Materials and Methods
We promise that no specific permissions were required for the barnyardgrass species in the described locations in this manuscript, and the field studies did not involve endangered or protected species.

Plant Materials and Growth
Barnyardgrass (E. crus-galli) seeds from susceptible and resistant biotypes were collected from the rice field at Lu-jiang county in An-hui province, China. Before the experiments, this resistant biotype was first confirmed to be resistant to quinclorac (inhibitor of synthetic auxin and also inhibitor of cell wall), penoxsulam (ALS inhibitor) and bispyribac-sodium (ALS inhibitor) group of herbicides ( Fig. 1). For this, the seeds were sterilized in 3% NaClO for 10 min, washed three times with tap water and allowed to germinate at 30uC. These germinating seeds were then planted in plastic pots, filled with a 2:1:1 mixture of soil: peat: sand in a climate chamber at 28uC/25uC (day/night) with a 16 h photoperiod. Three weeks later, leaves from the susceptible and resistant biotype seedlings were treated with 1.143 mg/ml quinclorac (BASF AG, German), 4.114 mg/ml penoxsulam (Dow AgroSciences, USA) and 3.429 mg/ml bispyribac-sodium (Jiangsu Institute of Ecomones Co., LTD, China). After 24 h of spraying, both treated and untreated seedlings were harvested and immediately frozen in liquid nitrogen, and stored at 280uC until RNA extraction.

RNA Isolation and cDNA Library Construction
Total RNA from the seedlings was extracted using Trizol reagent (Invitrogen, USA) according to the manufacturer's protocol. RNA quantity and quality were analyzed using gel

Sequencing, Assembly and Annotation
The cDNA pool representing untreated susceptible barnyardgrass, untreated resistant barnyardgrass and herbicide-treated resistant barnyardgrass was used for high throughput sequencing using the 454 GS-FLX platform (Roche, USA). A half-plate run was performed at the Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) following the manufacturer's protocol. GS de novo assembler (Newbler v2.3, Roche) was used for 454 sequencing assembly. For similarity searches, all assembled contigs and singletons were compared with the proteins in the NCBI non-redundant (nr) protein database using BLASTx [20]. The BLAST E-value threshold was set at 1610 25 , with a minimum alignment length of 60 bp. To determine gene functions of the sequences, these results were imported to Blast2GO, a software that retrieves gene ontology (GO) terms [21]. GO terms were determined with respect to Cellular Component, Biological Process and Molecular Function. Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways were assigned to the assembled sequences and Enzyme Commission (EC) numbers were used to automatically color and retrieve KEGG pathway maps [22]. The assembled sequences were mapped to the KEGG biochemical pathways according to the enzyme distribution in the pathway database. InterPro annotations were performed using InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/).

Illumina Sequencing and Digital Expression Analysis
Oligo-dT beads were used to yield poly(A + ) mRNA from a total RNA pool consisting of equal quantities of total RNA from four sample types: untreated susceptible barnyardgrass, herbicidetreated susceptible barnyardgrass, untreated resistant barnyard-  grass and herbicide-treated resistant barnyardgrass. Then, mRNA bound to the beads was converted into first-strand cDNA using oligo-dT and second-strand cDNA using random primers. Pairedended (PE) library with approximate 200 bp fragment was synthesized using the Genomic DNA Sample Prep Kit (Illumina, USA) according to the manufacuter's instructions. PE library products were sequenced on the Illumina Genome Analyzer II platform equipped with a paired-end module according to the manufacturer's instruction. Then the PE reads were assembled using ABySS assembler tool [23]. The sequences were aligned to the assembled contigs and singletons using the bowtie software [24]. The expressions of each reads between sample pairs (untreated susceptible barnyardgrass VS herbicide-treated susceptible barnyardgrass, untreated resistant barnyardgrass VS herbicide-treated resistant barnyardgrass) were calculated using the numbers of reads with a specific match. Among the four samples, a minimum of a two-fold difference in log2 expression were considered as expression differences. Genes with different expression were identified using R package DEGseq with MARS (MAplot-based method with random sampling Model) [25].

Data Availability
All the reads generated in 454 pyrosequencing and Illumina sequencing were deposited in the NCBI Sequence Read Archive (SRA) under the accession number SRX160526.

Quantitative Real-time PCR Analysis
Quantitative real-time PCR analysis of two candidate genes, ALS and ACCase, was performed using gene specific primers.   was used as an internal control. The primers used to amplify bactin were 59-CACACTGGTGTTATGGTAGG-39 and 59-AGAAGGTGTGATGCCAAAT-39. Real-time PCR was performed using SYBRH Premix Ex Taq TM kit (TaKaRa, Japan) on a ABI-7500 Fast Real-Time PCR System (ABI, USA). About 2ml cDNA from each of the four samples (untreated and herbicidetreated susceptible barnyardgrass, untreated and herbicide-treated resistant barnyardgrass) was mixed with 0.2 mM primers, 16ROX Reference Dye II and SYBRH Premix Ex Taq TM (including TaKaRa Ex Taq HS, dNTP mixture, Mg 2+ and SYBR Green I) in a final volume of 20 ml. The amplification program was as follows: 30 s at 95uC for denaturation followed by 40 cycles of 3 s at 95uC and 30 s at 60uC. Each experiment had three independent repetitions. Values were presented as mean 6 SD of three independent RT-PCR analysis. Error bars were calculated based on values from three replicates. Relative expression of the target genes to the b-actin control was calculated using the 2 2ggCT method [26].

Sequencing and de novo Assembly
We performed 454 pyrosequencing of the barnyardgrass (E. crus-galli) cDNA to identify target genes for functional genome studies. Our choice to sequence the transcriptome was based on the difficulty in sequencing the polyploid barnyardgrass genome by whole genome sequencing. A pooled cDNA library from three samples (untreated susceptible biotype, untreated resistant biotype and herbicide-treated resistant biotype) was run on a half-plate using the 454 GS-FLX Titanium Chemistry to obtain sequences of low-and high-abundance transcripts. This biotype-treatment combo could produce the transcripts related to herbicide resistance, herbicide-treated susceptible biotype was therefore absent in this 454 pyrosequencing,. This generated 371,281 raw reads with an average length of 341.8 bp and a total length of 126.89 Mb (Table 1). Sequence length distribution of raw reads is shown in Fig. 2.

Sequence Annotation
In order to identify new E. crus-galli specific sequences, all 79,082 assembled sequences (10,142 contigs and 68,940 singletons) were used as queries to search the NCBI nr protein database using BLASTx followed by analysis using Blast2GO. In total 45,438 assembled sequences including 7,407 (73.03%) contigs and 38,031 (55.17%) singletons were annotated. The remaining 44.83% singletons and 26.97% contigs were not annotated presumably due to their short average lengths. Average length of the annotated contigs was 690 bp, while that of non-annotated contigs was only 260 bp. The average length of annotated singletons was 360 bp, while that of non-annotated singletons was 272 bp.
In these annotations, only 24 sequences were hits to proteins from the Echinochloa species while 23,657 (52.06%) were hits to proteins from Sorghum bicolor; 9,584 (21.09%) to Zea mays; 7,718 (16.99%) to Oryza sativa and 1,764 (3.88%) to Hordeum vulgare. Such low numbers of proteins from barnyardgrass is most likely due to the absence of a well annotated reference genome for the species. Almost half of the sequences (42.54%) assembled in this study could not be annotated by searching the NCBI nr protein database. Absence of sufficient sequences from species phylogenetically close to barnyardgrass in the public databases could have also contributed to the high percentage of non-annotated sequences. These non-annotated sequences could be resourceful for pursuing taxon-guided searches of specific genes of interest [15].
GO terms were assigned to the annotated EST sequences using Blast2GO. One or more GO terms were assigned to 40,399   (Fig. 4). Representation of 56 GO terms in the transcriptome provided an indication of the diversity of genes expressed in E. crus-galli. Similar results were reported for the biological process category in Conyza canadensis, while the predominant subgroups in molecular function (enzyme activity and transferase activity) and cellular component (intracellular components and cytoplasmic components) are different from our data [16]. To our interests were the GO terms related to herbicidal mechanisms: response to auxin stimulus (152), photosystem II (114), cellulose biosynthetic process (106), lipid catabolic process synthesis (95), acetyl-CoA biosynthetic process (65), carotenoid biosynthetic process (37) and acetolactate synthase activity (6). Quinclorac used in our study is an inhibitor of synthetic auxin (acting like indolylacetic acid) and also an inhibitor of cell wall (cellulose) biosynthesis [4]. About 51 sequences under auxin response factor and 28 sequences under cellulose synthase were obtained. These sequences could be further investigated to determine their specific function in the evolution of herbicide resistance in E. crus-galli.

Functional Classification by KEGG
In addition, KEGG pathways were assigned to all 79,082 assembled sequences by comparing to the KEGG database using BLASTx with an E-value cutoff of ,1610 25 . A total of 6,092 contigs and singletons with 2,515 enzyme commission (EC) numbers were assigned to 151 predicted KEGG metabolic pathways and the numbers of barnyardgrass sequences in the different pathways ranged from 1 to 4,424. This analysis showed that 4,424 sequences were present in the major metabolic pathways and 1,865 in the biosynthesis of secondary metabolites. The sequences represented in the metabolic pathways of major bio-molecules included purine, starch and sucrose, terpenoids and steroids, methane, pyrimidine, phenylalanine, amino sugar and nucleotide sugar, glycerophospholipid, glycerolipid, etc. About 894 sequences with 109 EC numbers were mapped to the biosynthesis of plant hormones pathway. In the metabolism pathway, we found seven sequences assigned to acetolactate synthase (ALS, EC 2.2.1.6), which is the target for herbicides, penoxsulam and bispyribac-sodium. When these sequences were compared in the NCBI nr database using BLASTx, only two (contig05544 and  G81ESPA01CWAJO) sequences were found to be highly similar (99% and 99%) to ALS genes from Echinochloa phyllopogon [28], while the remaining five (contig03089, contig06883, contig08547, G81ESPA01BLGZC and G81ESPA01ECFXF) were annotated to hypothetical proteins of unknown function.

Illumina Sequencing and Digital Expression Analysis
To identify large scale differences in gene expression of susceptible and resistant barnyardgrass, quantitative transcriptome analysis using Illumina Genome Analyzer II was performed. It produced quantitative expression scores, which should be accurate and highly consistent with other quantitative methods such as quantitative RT-PCR [29]. A mixed cDNA pool from four samples (untreated and herbicide-treated susceptible barnyardgrass, untreated and herbicide-treated resistant barnyardgrass) was sequenced using Illumina. This yielded 2.8 GB data with 14,769,229, 14,610,811, 12,341,146 and 15,099,517 assembled paired-ended reads respectively. These transcripts were mapped to the assembled sequences from 454 pyrosequencing and those that were mapped successfully were further used to determine and compare the expression profiles of the transcripts in the four samples. Digital expression analysis identified a total of 78,124 sequences from 454 pyrosequencing including 9,737 contigs and 68,387 singletons showing significant abundance differences between sample pairs (untreated susceptible barnyardgrass VS herbicide-treated susceptible barnyardgrass, untreated resistant barnyardgrass VS herbicide-treated resistant barnyardgrass) (Fig. 5). The expression of many non-annotated genes with unknown function was revealed in this digital gene expression analysis. Thus this transcriptome could be a potentially rich source of genetic material for the systematic discovery of genes involved in the mechanisms of herbicide resistance [17].
In the digital gene expression analysis, we found that asparagines synthetase 2 had the highest expression in the herbicide-treated resistant barnyardgrass. The top 10 annotated sequences with highest expression level in the barnyardgrass transcriptome are listed in Table 2. Except sucrose synthase, the other nine genes had higher expression in herbicide-treated resistant barnyardgrass than in herbicide-treated susceptible barnyardgrass biotype. Metallothionein-like protein, elongation factor, acetyl-coenzyme A carboxylase, light-induced protein 1 and pyruvate orthophosphate dikinase had higher expression levels in the herbicide-treated resistant barnyardgrass than the untreated resistant barnyardgrass biotype. In addition, the expression levels of these genes were lower in the herbicidetreated susceptible barnyardgrass and lowest in the untreated susceptible barnyardgrass biotype. Functional analysis of these genes will be conducted in future to determine the relationship between these genes and herbicide resistance.

Comparison of the Sequences Involved in Herbicide Resistance
The candidate genes involved in herbicide resistance were obtained by comparing our transcriptome (454) with the trascriptomes of waterhemp [15], horseweed [16] and grain amaranth [17]. The comparison of eleven herbicide target gene families and four non-target gene families is presented in Table 3. More glutamine synthetase (16) and phytoene desaturase (7) transcripts from the herbicide target gene families were identified in barnyardgrass than in the other three species. Notable absence in the barnyardgrass transcriptome were 4-hydroxyphenylpyruvate dioxygenase, protophenylpyruvate dioxygenase and D1 protein. The reason of such absence could be due to similar expression patterns between the susceptible and resistant biotypes. From our analysis we could conclude that 215 assembled sequences could be involved in the evolution of herbicide resistance. Since there is no reference genome to assemble these short reads into scaffolds and these 215 sequences could be redundant, the actual number of genes could be less than 215. In future work, we will focus on the functional analysis of these candidate genes.
Microtubules are polymers of a-tubulin and b-tubulin dimers and play an important role in many essential cellular processes including mitosis, cytokinesis and vesicular transport [3]. The digital expression of 24 tubulin genes including a-tubulin and btubulin was much higher in the herbicide-treated resistant biotype (RT) than the herbicide-treated susceptible biotype (ST) of E. crusgalli, except one a-tubulin gene (G81ESPA01BDRNJ) and three btubulin genes (G81ESPA01CF9IL, G81ESPA01BZVDB and G81ESPA01BQRNR) with lower expression (Table 4). After treatment with herbicides, the digital expression levels of tubulin genes in the susceptible biotype (ST) declined when compared with untreated susceptible biotype (SC). However, the digital expressions of tubulin genes in the resistant biotype remained the same or declined below the susceptible biotype. The microtubule growth was more disrupted in the susceptible biotype than in the resistant biotype after the herbicide treatment. Tubulin genes play an important role in herbicide induced growth inhibition. For instance, in goosegrass (Eleusine indica), nucleotide differences were observed in the three a-tubulin genes between resistant (R) and susceptible (S) biotypes, although missense mutations were not present in any of the four b-tubulin genes in the two R-biotype lines [30,31]. This indicated the possible association between these missense mutations in the a-tubulin genes and herbicide resistance in goosegrass. Similar results were found between the resistant and sensitive biotypes of green foxtail (Setaria viridis L. Beauv.) [32]. Therefore, the isolation and functional analysis of the full-length atubulin and b-tubulin genes from barnyardgrass is necessary to understand the mechanisms of herbicide resistance. Acetyl-CoA carboxylase (ACCase, EC 6.4.1.2) is a key enzyme in lipid biosynthesis that catalyzes the formation of malonyl-CoA from acetyl-CoA [3]. Similar to tubulin genes, the digital expression of eight ACCase genes also showed higher expression in the resistant biotype than in the susceptible biotype after herbicide treatment (Table 5). Two ACCase belonging to E. crus-galli biotype Geqiushan-S (HQ395758) and Geqiushan-R (HQ395759) were found in NCBI. These genes shared 96% identity with contig04470 and no significant similarity with contig04540. Further studies on the ACCase genes in barnyardgrass are needed to compare the differences among these genes, and determine their role in herbicide resistance. Penoxsulam and bispyribacsodium herbicides, which were used in our study, were ALSinhibiting herbicides [33]. In the barnyardgrass transcriptome and digital expression analysis, higher levels of ALS sequences (contig05544 and G81ESPA01CWAJO) were present in the resistant biotype than in the susceptible biotype after treatment with herbicides (Table 6). Based on real-time PCR experiments, the relative expression of ACCase and ALS genes were analyzed in each untreated and herbicide-treated biotype. Compared to herbicide treated susceptible plants, ALS and ACCase were highly expressed in herbicide treated resistant plants. The expressions of ALS and ACCase genes decreased when the susceptible biotype was treated with herbicides. However, the expression of ACCase declined slightly and ALS increased slightly in the herbicidetreated resistant biotype (Fig. 6). This trend was consistent with the expression profiles of ALS and ACCase sequences in the transcriptome. These genes are therefore regarded as good preliminary targets for further functional studies of herbicide resistance.

Conclusions
The susceptible and resistant biotypes of E. crus-galli, with and without treatment with three herbicides, were used to generate the first large-scale transcriptome sequencing data using the 454 GS-FLX platform. De novo assembly and functional annotation of the assembled contigs and singletons were performed using Newbler. Interestingly, eight herbicide target-site gene groups and four nontarget-site gene groups were identified in the resistant biotype of E. crus-galli that would enable isolation of the full-length genes, which could be involved in herbicide resistance. These genes could also be used for further functional genomics research. Compared to the previously Echinochloa reported genes, four additional genes including RbcS, RbcL, NADP-me and MDH with complete CDS were identified using PCR and RACE technology. This study provides a valuable genetic resource for grasses in general and should be useful for weed control which should result in an improvement in agriculture.

Author Contributions
Conceived and designed the experiments: YFL. Performed the experiments: XY XYY. Analyzed the data: XY. Contributed reagents/materials/ analysis tools: XYY YFL. Wrote the paper: XY.