A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumoniae, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, in which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen.


Introduction
During the past decade, a large number of bacterial complete genomes have been finished, providing the sequence basis for transcriptomics and proteomics. The genome sequence contains all the information of the functional elements including genes, non-coding RNAs, and UTRs, which are required for a system-level understanding of the survival mechanisms and pathogenesis of microorganism [1]. However, due to the inherent limitations of the computational methods for prokaryotic gene annotation, most of the present genome sequences are still at the preliminary stage to define accurate gene boundaries, to predict gene functions and regulatory elements, especially small non-coding RNAs (sRNAs) and UTRs. Therefore, to further take advantage of the genome sequences, the original genome annotations need to be revised for accurate identification and demarcation of all the functional elements in a genome.
Transcriptome of bacteria represents the entire transcripts expressed in a certain condition. Genome-wide transcriptome analysis enables researchers to improve the genome structural annotation and reveal the functional and regulatory architecture of the genome [2]. Microarrays have been widely applied to study gene expression patterns. The recently developed RNA sequencing (RNA-seq) is another important technique for transcriptome profiling, as several advantages compared with microarrays, such as low background noise, high throughput and high resolution [3][4][5]. Several recent studies have proven that RNA-Seq can comprehensively re-annotate the bacteria genomes and is useful for understanding the complexity of bacterial transcriptome and identifying previously unknown functional elements [6][7][8]. For example, RNA-Seq based analysis of the transcriptome of the S. Typhi has identified 40 novel non-coding RNAs and re-annotated a number of hypothetical genes [9]. Thirty-eight novel genes, 94 sRNAs and 278 operon structures in the H. somni genome have been characterized by using a transcriptional map generated by RNA-Seq [10].
Actinobacillus pleuropneumoniae, a Gram-negative pathogen of the Pasteurellaceae family, is the primary etiologic agent of porcine contagious pleuropneumonia, a severe respiratory disease causing great economic losses to the worldwide swine industry [11]. The genome of A. pleuropneumoniae JL03 was sequenced 7 years ago [12]. After that, a comparative genomic analysis has been done based on the 12 genomes of serovar reference strains, depicting genomic features associated with serovar diversity [13]. These A. pleuropneumoniae genome annotations were made by computational analysis based on gene prediction algorithms [12]. Thus, the existing annotations of the A. pleuropneumoniae genomes are far from complete. The precise gene structures, transcription units, expressed genes and sRNAs are required to be identified to explore more gene functions and regulatory modes to study A. pleuropneumoniae survival and infection mechanisms. Therefore, in this study, we used deep RNA-seq technology to characterize transcriptome structure of A. pleuropneumoniae. The single nucleotide resolution map generated helped to not only correct the previous annotation errors, but also identify novel protein coding regions, operon structures as well as sRNAs.

Bacterial strain and culture conditions
The bacterial strain used in this study is Actinobacillus pleuropneumoniae JL03 (a clinical isolate of serovar 3 [12]). It was grown at 37°C in tryptic soy broth (TSB) or on tryptic soy agar (TSA) supplemented with 10% (v/v) filtered cattle serum and 10 μg/ml nicotinamide adenine dinucleotide (NAD).

RNA isolation, library construction and sequencing
In preparation for RNA isolation, A. pleuropneumoniae JL03 was cultured with rotation at 180 rpm to mid-log phase (optical density at 600 nm = 1.0). Total RNA was extracted using RNA Midiprep kit (Promega, Madison, USA) according to the manufacturer's protocol. Total RNA was treated with RNase-free DNase (Invitrogen, Carlsbad, CA) to remove traces of genomic DNA. By using Agilient Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA), we checked the RNA integrity number (RIN) of total RNA to be greater than 8. Bacterial 16S and 23S ribosomal RNAs were removed with a MICROBExpress ™ kit (Ambion, TX, USA). Small RNAs (i.e., tRNA and 5S rRNA) were not removed with this enrichment step. The enriched mRNA was processed to a RNA-Seq library using the mRNA-Seq sample preparation kit (Illumina, San Diego, CA) following the manufacturer's instructions. The constructed sequencing library was sequenced using the Illumina HiSeq 2000 platform (Illumina, San Diego, CA) at the Shenzhen Genome Institute (BGI, Shenzhen, China).

RNA reads mapping and visualization
Before mapping of reads to the genome, the raw data were analyzed using FASTQC tool [14] (S1 Fig) and pre-processed to obtain high quality reads. FastQC reports revealed that our samples were successfully sequenced. Raw reads were quality-filtered using a reads trimming tool Trimmomatic with default settings [15]. We checked and cleaned all Illumina raw reads by removing the following types of sequences: those corresponding to adapters, those with low read quality (more than 40% of the reads with a base quality value lower than 20), and those containing more than 10% ''N" bases. All clean reads were mapped to the A. pleuropneumoniae JL03 genome sequence (GenBank Accession number. CP000687) using the Burrows-Wheeler alignment tool (BWA) [16], with up to two base mismatches allowed. The reads that mapped to more than one location were discarded [17]. Only uniquely mapped reads were used for further analysis. The resulting data was transformed into SAM/BAM format with Samtools [18] to generate alignment results in a pileup format, which provided the signal map file and contained the count of reads per base. In-house Perl scripts were written to calculate the background expression. The sequence coverage per base was plotted and visualized using the genome browser Artemis [19].
Analysis of expressed intergenic regions of A.pleuropneumoniae JL03 genome. Prior to the analysis of expressed regions in the genome, we calculated the background expression with two criteria as previously reported [10,20]. In brief, one criterion is that at least half of the intergenic region is assumed as not expressed; the other one criterion is that the coverage depth (reads per base) of expressed reads should be greater than the lower tenth percentile of all reads. Identification of UTR regions was performed by searching for the point of rapid coverage reduction (drop down to 1/10 of the mean coverage value of the gene) in the regions upstream of the start codon and downstream of the stop codon. Expressed intergenic regions (EIRs) within predicted operons [21] are difficult to determine the UTR regions and can be mis-classified as sRNAs. Hence these regions were discarded to minimize the number of false positives. To identify novel protein coding regions and correct start codon locations, sequences of EIRs were searched with BLASTX against sequences in NCBI non-redundant (nr) protein database followed by manual analysis and interpretation. If protein coding region was found in the EIRs, the ORF search was performed using the web program ORF Finder [22]. If no protein coding regions were found and a promoter or a Rho-independent terminator was present, the EIRs were classified as sRNA [10]. The Prokaryotic Promoter Prediction (PPP) program [23] and Transterm HP [24] were used to predict promoters and Rho-independent terminators. For functional annotation, all identified sRNA sequences were aligned with the sequences in the Rfam database [25]. To study the conservation of identified sRNAs among others genomes, the sRNA sequences were searched with blastn against the non-redundant, nucleotide database at NCBI.
Analysis of expressed annotated regions of A.pleuropneumoniae JL03 genome. Expressed reads with coverage above background were mapped onto the annotated genes of A. pleuropneumoniae JL03. Using the annotation information (gene loci) of A. pleuropneumoniae JL03 and the pileup file, the genes that had a significantly higher proportion of their length (at least 60%) covered by expressed reads were considered as to be expressed. Annotated regions below 60% coverage were considered as "not expressed" under the current experimental conditions. Similar measures have been used in other transcriptome profiling studies [26,27]. The functions of genes expressed in the present study were classified according to the Clusters of Orthologous Groups (COG) databases [28]. For the identification of operons, we used the following rules according to previous studies [10,29,30]: (a) All the genes in an operon are expressed; (b) Genes in an operon are transcribed in the same orientation; (c) The intergenic regions between the genes in the operon are expressed. Operon structures identified in this study were compared with the computationally-predicted operon structures described by the Database for prokaryotic OpeRons (DOOR) [21] for verification and comparison.

RNA-Seq data accession number
The RNA-Seq data used in this work have been submitted to the NCBI GEO (Gene Expression Omnibus) database with the accession number GSE70153. (For reviewer link: http://www.ncbi. nlm.nih.gov/geo/query/acc.cgi?token=itctauugrzqtxkp&acc=GSE70153)

Reads mapping to the A. pleuropneumoniae JL03 genome
The complete genome sequence of the A. pleuropneumoniae JL03 was published in 2008 [Gen-Bank: CP000687] [12]. The 2.24 megabase circular genome carries 2,147 computationally predicted genes of which 2,036 are protein coding with a 41.23% G+C content. To obtain a global view of the A. pleuropneumoniae transcriptome at single-nucleotide resolution, we sequenced the transcriptome of the wild type A. pleuropneumoniae JL03 strain using RNA-Seq. RNA was isolated from bacteria harvested at the mid-exponential growth phase (S2 Fig). After removing the low-quality reads, a total of 38,568,297 reads with an average length of 90 bp were obtained. The total length of the reads was about 3.5 Gb, representing about a more than 1700-fold coverage of the A. pleuropneumoniae genome. Totally, 93.22% reads were mapped onto the reference DNA sequence. After estimation of the background expression level using the criteria described in methods, the cutoff value was calculated as 71 reads/bp. Based on this cutoff, we identified 1933 (90.03%) out of the 2147 predicted genes and 1845 out of the 2036 protein-coding genes which were expressed. The functions of the expressed genes were distributed across all categories of COG database [28] (Fig 1). We also identified 1221 previously un-annotated expressed intergenic regions (EIRs) with a minimum length of 30 bp. The transcriptome analysis workflow is represented in

Identification and characterization of novel genes
Overall, 32 novel protein coding regions were identified according to the transcriptome map ( Table 1). The average length of proteins encoded by these regions was around 47 amino acids (ranged from 26 to 90 amino acids). 81.3% of the proteins encoded by the novel genes were annotated as hypothetical proteins, with only six proteins having predicted functions, a DNA methylase, a Leucyl-tRNA synthetase, a phosphoribosylformylglycinamidine cyclo-ligase, two transposases and a naphthoate synthase (Table 1). Fig 3 shows an example of a novel gene named ''APP-P04" encoding a protein that is identical to (100% identities and 100% coverage) a hypothetical protein already annotated in other A. pleuropneumoniae genomes.

Refinement of the existing genome annotation
Based on the RNA-seq data, we obtained the transcriptional evidence of the existing genome annotation. When visualized in Artemis, the transcriptome map helped to identify incorrect annotation of start codons and real length of the annotated proteins. Some annotated proteins had the same termination codons, but the transcriptional start sites identified in this study are different from the annotated start codons, which suggest that the previously annotated start codons are incorrect. This enabled us to correct the start site for 35 genes based on the current genome annotation (Table 2). Then, a comparison with the similar proteins in related species and ORF search were performed to finally confirm the gene boundaries (Table 2). An example is presented for the gene ''APJL_1947", annotated as ''putative formate-nitrite transporter, YrhG" (Fig 4). The actual start codon site was found to be 84 bp upstream of the previous annotated site.

Identification of small non-coding RNAs and UTRs
Bacterial small RNAs (sRNAs) are short (30-500 bp) untranslated transcripts and often act as regulators of gene expression through a variety of mechanisms [31][32][33]. The identification of sRNA in the genome is the foundation to understand their role in modulating bacterial physiology and virulence [34]. In this study, we identified 51 putative sRNAs in the A. pleuropneumoniae JL03 genome ( Table 3). The average length of the identified novel sRNA was approximate 100 bp and ranged from 30 to 303 bp. All identified sRNA had a promoter or terminator adjacent to their locus which can increase the confidence of the sRNA [35,36]. By comparing with the sRNA sequences in the Rfam database, 11 sRNAs of A. pleuropneumoniae were homologous to well characterized sRNAs in other species, such as t44 RNA and Bacter-ia_small_SRP (Bacterial small signal recognition particle RNA) (Fig 5A and 5B). The other identified functional sRNAs categories included 6S, cspA mRNA 5' UTR, GcvB, MOCORNA, Glycine riboswitch, Alpha_RBS and His_leader. On the other hand, 40 sRNAs with no matches in the Rfam database could be predicted as novel sRNAs. Fig 5C shows the depth of coverage for one of the identified novel sRNA ''APP-S4".
To study the conservation of identified sRNAs of A. pleuropneumoniae, the sequences of all predicted sRNA were searched against the non-redundant, nucleotide database at NCBI. Thirty of the sRNAs were highly conserved (>95% identity with 100% coverage) only in A. pleuropneumoniae. A set of 20 sRNAs were conserved in the related Pasteurellaceae family including M. varigena, M. haemolytica, A. suis, B. trehalosi and A. lignieresii. Only 1 sRNA was also present in the species far from A. pleuropneumoniae, such as in Mycoplasma pulmonis and Vibrio.
The bacterial untranslated regions (UTRs) contain regulatory elements to control gene expression [37,38]. The 3'-UTRs can be a source of regulatory RNAs in bacteria [39]. RNAseq allows the analysis of 5'and 3'-UTR regions. We identified 5'UTR for 715 annotated genes and 3'UTR for 384 annotated genes in A. pleuropneumoniae JL03 (S1 Table). One 5'UTR in the transcriptome map is shown in Fig 5D. With further analysis of the identified sRNAs,  11 predicted sRNA were found to be located in the immediate vicinity of a protein-coding region which might be processed from the 5'-or 3'-UTR of the corresponding mRNA.

Characterization of Operon
By using the transcriptome map, the operon structures can be determined at the genome scale [20,40]. To evaluate the accuracy of the operon based on RNA-seq data, we also compared our identified co-expressed genes and operon structures with computationally predicted operons in DOOR. Based on RNA-seq data, 840 co-expressed pairs of genes that could be organized into 351 operons were identified. DOOR has predicted 860 co-expressed genes forming 460 multi-gene operons in A. pleuropneumoniae JL03 (S2 and S3 Tables). Through the comparative analysis, 213 operons found in this study are identical to the predicted operons in DOOR and 104 operons are different. Thirty-four operons based on RNA-seq data are not previously predicted by DOOR (S3 Table). Three typical operon structures visualized on the transcriptome map are displayed in Fig 6. Operon "APP-O24" consisting of four genes, the nrfA, nrfB, nrfC and nrfD encoding nitrite reductase subunits, was predicted by DOOR and also confirmed by RNA-seq (Fig 6A). A new operon "APP-O28" comprised of three genes (APJL_0120, APJL_0121 and APJL_0122) was identified based on RNA-seq data but not predicted in DOOR (Fig 6B). Fig 6C shows the operon "APP-O206" which contains five genes (APJL_1195, APJL_1196, APJL_1197, APJL_1198 and APJL_1199) and is different with the structure in the DOOR database. The overlap number of co-expressed gene pairs between RNA-Seq based and DOOR-based was 658 which validated the computational gene-pair predictions (S2 Table).
The 182 new gene pairs that are co-expressed but not predicted by DOOR well complemented the operon structures of A. pleuropneumoniae (S2 Table).

Discussion
Genome annotations can bridge the gap between sequence and biology, providing a framework to understand the biological process of various bacteria, archaea and eukaryotic species. Comprehensive and accurate annotation of the genomic sequence is crucial for analysis and investigations of functional genes and regulatory elements at systems level. Previous genome sequencing has the limitations that it can not display real transcription regions and non-coding transcripts, which leads to the mis-annotations of genes and neglected numerous functional elements in the intergenic regions, especially small non-coding RNAs (sRNAs). With the accumulation of sequence information and the improvements in bioinformatic techniques, original genome annotations need to be revised to make them more accurate and useful to explore gene functions and regulations. To acquire complete and well described genome annotation, plenty of studies have worked on genome re-annotation with various methods in eukaryotes and prokaryotes [41][42][43]. Both computational and experimental approaches (such as computational methods using various algorithms [44,45], comparative genomics [46,47] and whole genome microarrays [48,49]) have corrected many predicted genes and revealed numerous novel genes. However, identification and demarcation of the boundaries of all the functional elements such as untranslated regions (UTRs), regulatory regions, and operon structures are still big challenges. To overcome these difficulties, whole transcriptome RNA sequencing (RNAseq) has been used to improve the genome annotations [50,51]. In comparison to microarray, RNA-Seq has the advantages on increased dynamic range and low background noise, which has been demonstrated to be effective in the correction of gene annotation [10,36], discovery of novel transcripts and non-coding RNAs [52], lowly expressed transcripts [53] and accurate operon structures [54]. For example, RNA-Seq based transcriptome analysis identified 14 novel protein coding regions, 44 sRNAs and 518 expressed operons in M. haemolytica PHL213 [36]. The A. pleuropneumoniae JL03 genome was sequenced and published in 2008 which provided the foundation for all subsequent analysis of the A. pleuropneumoniae genome and gene functions [55][56][57][58]. But the functional and structural annotations of genomes are inaccurate based on the DNA level. The regulatory elements in the genome have not yet been fully elucidated. In this study, we combined RNA-seq with bioinformatical methods to re-annotate and analyze the A. pleuropneumoniae JL03 genome at the transcriptome level. A single resolution transcription map of the A. pleuropneumoniae JL03 genome was generated which enabled us to re-annotate the existing genome. Ninety-four percent of the RNA-seq reads mapped to the A. pleuropneumoniae JL03 genome, which is comparable with the other RNA-seq studies. Ninety-four percent cDNA reads were assignable to the genome of Bacillus anthracis [9], 95% in Burkholderia cenocepacia [59] and 94% in Histophilus somni [10], which illuminates the appropriateness of using RNA-seq for bacterial transcriptomic studies. The data analysis revealed 32 novel protein coding genes, with the average length around 47 amino acids. These genes were missed in previous annotations possibly due to their relatively short size. Most of the proteins have not been reported or predicted functions. During the genomic sequencebased gene annotation, it is difficult to predict the precise start codon of gene. Through the transcriptome map, we identified 35 genes with incorrect annotation of start codons in the genome. Based on BLAST searching against homologous genes in other bacterial species, we confirmed the actual start codon and defined the actual boundaries. Consequently, the proteins encoded by these genes are longer than previously predicted. Identification of the actual start codon is crucial for understanding and characterizing the proteome. Future work can be done on the protein level to further validate the new annotations and investigate the function of the newly defined proteins.
In bacteria, small regulatory RNAs (sRNAs) play a pivotal role in regulation of the gene expression, controlling a variety of adaptation processes in response to changes in the environment. Although the genome has been sequenced several years ago, no information about sRNAs is available in A. pleuropneumoniae. Because of the small size and low expression level, sRNAs are hard to detect using traditional screening strategy. On account of lack of defined sequence features, computational approach for sRNA prediction is not powerful enough. Previous studies have shown the power of RNA-seq in bacterial sRNA identification [9,60]. In this work, we identified 51 putative sRNAs that were transcribed in mid-log phase of A. pleuropneumoniae. We found that 10 of the 51 sRNAs were homologous to characterized sRNAs. For instance, APP-S17 was classified as T44 RNA, which has been identified in several species of enteric bacteria [48]. The function of this RNA is still unknown. APP-S25 was classified as bacterial signal recognition particle RNA, which is the RNA component of the signal recognition particle (SRP) ribonucleoprotein complex. These results validate our approach. Interestingly,   the 6S RNA, a global regulator of transcription in bacteria, was found in our study with high transcription levels (APP-S6). In E. coli, 6S RNA alters the transcription of most genes in stationary phase. During this phase, 6S RNA accumulates in cells; loss of 6S RNA leads to decreased survival compared to wild-type cells [61]. However, 6S RNA accumulation profiles also vary, suggesting diversity in the time and the location at which it exert effects [62]. Recent studies in E. coli suggest 6S RNA may contribute significantly to transcriptional regulation even when levels are not maximal, such as in exponential phase when 6S RNA is only onetenth as abundant as in stationary phase [62]. In addition to down-regulating some housekeeping genes, 6S RNA has been shown to up-regulate expression of some σ S -dependent genes in Gram-negative bacteria [62]. Taken together, APP-S6 might play roles in regulating the expression of an abundance of genes in exponential phase in A. pleuropneumoniae. We also identified APP-S9 which was classified as GcvB RNA. The GcvB RNA has been found in a range of bacteria such as E. coli [63], Y. pestis [64], H. somni [10] and Salmonella [65]. In E. coli and Salmonella species, the GcvB RNA has been shown to regulate a large number of genes which are involved in the amino acid transport systems and amino acid biosynthesis [63,65,66]. In   E. coli, transcription of the GcvB RNA is activated by the adjacent gcvA gene and regulates genes by acting as an antisense binding partner of the mRNAs for each regulated gene [63]. APP-S9 was located in the upstream of gcvA (APJL_0132) which was annotated as DNA-binding transcriptional activator. Taken together, we hypothesize that APP-S9 may be activated by gcvA (APJL_0132) and have regulatory activity in the amino acid transport systems and amino acid biosynthetic genes in A. pleuropneumoniae. The other 41 novel identified sRNAs may play regulatory roles in basic life and/or virulence as described in other bacteria [67,68]. Among the identified sRNAs, 30 of which were highly conserved (>95% identity with 100% coverage) only in A. pleuropneumoniae implying these sRNA are A. pleuropneumoniae-specific. Only 1 sRNA is present in the species far from Pasteurellceae family. This sRNA may be responsible for the basic survival of many species. The sRNAs present in different species may be functional in adaptation to different environmental conditions. RNA-Seq data provides information to identify the UTRs of annotated genes. Until now, the mapping of UTRs has not been performed on A. pleuropneumoniae which can discover regulatory elements in these regions such as riboswitches, RNA thermometers or small leader peptides. We identified 5'-UTR of 715 annotated genes and 3'-UTR of 384 annotated genes in A. pleuropneumoniae JL03. Most of the identified 5'-UTRs (72%) were very short with the size less than 100 bp, which is similar to other bacterial, such as 75% of the 5'-UTRs for annotated genes were shorter than 100 bp in C. glutamicum ATCC 13032 [54,69]. With further analysis, we found that 9 sRNAs were located in the UTR regions. APP-S20 has a MOCO RNA motif which is presumed to be a riboswitch that binds to molybdenum cofactor or related tungsten cofactor [70]. In E. coli, the riboswitches upstream of moaA gene regulate translation of the moaABCDE operon by blocking ribosome-binding sites [70]. APP-S20 was found 155 bp upstream of moaA (APJL_0688) which was annotated as molybdenum cofactor biosynthesis protein A, implying its regulatory role in molybdenum cofactor biosynthesis. Additionally, one RNA thermometer (APP-S8) was observed within the 5'-UTR of cspC (APJL_0119). The gene cspC encodes a cold shock-like protein. Cold shock proteins are stress induced regulators which is functional in response to low temperature and also in normal conditions [71,72]. The cspC was repressed after acute infection of A. pleuropneumoniae [73] and after exposure to host hormones [74], suggesting that they are involved in infection of A. pleuropneumoniae. In E. coli, the cspA thermosensor acts as an RNA thermometer, regulating the expression of cspA in response to temperature [75]. Hence, APP-S8 is possible a regulator of cspC. A recent study has documented that Gram-negative bacteria contain a conserved motif in their 5'-leader sequences [76]. In A. pleuropneumoniae, we found a small leader peptide (APP-S51) within 5'-UTRs of the operon involved in histidine synthesis (APJL_2069/hisG, APJL_2070/hisD, APJL_2071/hisC2, APJL_2072/-and APJL_2073/hisB). These leader peptides are possibly involved in transcription attenuation of the target genes dependent on the amino acid level [77]. In conclusion, these results imply that UTRs may play regulatory roles in the complicated process of gene expression of A. pleuropneumoniae.
For prokaryotic organisms, identification of operon structures at the genome scale is significant in studying the gene function and the regulatory networks. Based on the RNA-seq data, we identified 351 expressed operons covering 1,230 genes of A. pleuropneumoniae. The majority of the identified operons contain 2-9 genes, which is a general trend in other previously reported bacteria [6,54]. Furthermore, we compared our prediction results with 460 predicted operons in DOOR, and found 213 operons in common which validate our analysis approach. For example, the operon APP-O279 which contains four genes (pgaA, pgaB, pgaC, pgaD) has been characterized as the operon involved in biofilm formation of A. pleuropneumoniae [78]. Operons encoding known virulence genes of A. pleuropneumoniae were also identified, such as cps3ABCD (APJL1611-1614) encoding CPS biosynthetic enzymes, cpxDCBA (APJL1615-1618) encoding proteins involved in the export of CPS and apxIIICABD (APJL1344-1347) encoding the ApxIII structure, activation and secretion proteins [79]. Remarkably, we found that the uncharacterized gene APJL0965 was located in apxII operon which has been reported to consist of apxIICA and a partial apxIIB' without apxIID [79]. The product encoded by APJL0965 may be functional as a secretion protein of ApxII. We also identified 34 operons not predicted by the computational operon prediction method. A new operon APP-O25 containing two genes, putA and putP, annotated as bifunctional proline dehydrogenase/pyrroline-5-carboxylate dehydrogenase and proline symporter/permease was not predicted by DOOR. The functions of putAP gene products are quite conserved among different bacteria [80,81]. In E. coli and S. typhimurium, the orthologs of these genes are well known to form the proline utilization operon [82,83]. Our results presented the first operon map of A. pleuropneumoniae and also demonstrated RNA-Seq can improve operon identification in bacterial genomes.

Conclusion
In summary, a global transcriptome profiling of A. pleuropneumoniae at single nucleotide resolution using deep RNA-seq was presented in this work. This study confirmed previously annotated open reading frames and provided a more accurate map of gene boundaries. The analysis also identified 32 novel protein-coding genes and 51 sRNAs, defined UTRs regions and generated complete operon structures throughout the genome. The analysis results indicated high complexity of the A. pleuropneumoniae transcriptome. The functional elements on a whole genome level identified in this study provide a useful reference transcriptome to comprehensively understand the A. pleuropneumoniae survival and infection process.  Table. Comparison of co-expressed gene pairs identified from RNA-Seq data and DOOR program. Line A has a list of the co-expressed gene pairs predicted by DOOR program. Line B has a list of the co-expressed gene pairs predicted by RNA-seq. Line C has a list of the co-expressed gene pairs common to both DOOR and RNA-Seq. Line D has a list of the coexpressed gene pairs unique in RNA-Seq experiment. (XLS) S3 Table. Comparison of operons predicted from RNA-Seq data and DOOR program. Operons identified by RNA-seq were compared with the operons predicted by DOOR database. The operons along with their ID and genes are listed. Operon type: A-operons common to both DOOR and RNA-Seq; B-operons based on RNA-seq data and not predicted by DOOR; C-operon having different compositions. (XLS)