RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

  • Ranjit Kumar,

    Affiliations College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, United States of America, Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America, Center for Clinical and Translational Science, University of Alabama at Birmingham, Birmingham, Alabama, United States of America

  • Mark L. Lawrence,

    Affiliations College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, United States of America, Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America

  • James Watt,

    Affiliation Eagle Applied Sciences LLC, San Antonio, Texas, United States of America

  • Amanda M. Cooksey,

    Affiliation Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America

  • Shane C. Burgess,

    Affiliation College of Agriculture and Life Sciences, The University of Arizona, Tucson, Arizona, United States of America

  • Bindu Nanduri

    Affiliations College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, United States of America, Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America


Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 obtained using RNA-Seq.

The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome, of which 82 had never been predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent from the current genome annotation. The transcriptome map allowed the identification of 278 operon structures (730 genes in total) in the genome. When compared with the genome sequence of the non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic regions unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.


Systems biology approaches are designed to facilitate the study of complex interactions among genes, proteins, and other genomic elements [1], [2], [3]. In the context of infectious disease, systems biology has the potential to complement reductionist approaches to resolve the complex interactions between host and pathogen that determine disease outcome. However, a prerequisite for systems biology is the description of the system's components. Therefore, genome structural annotation, i.e., the identification and demarcation of boundaries of functional elements in a genome (e.g., genes, non-coding RNAs, proteins, and regulatory elements), is a critical element of infectious disease systems biology.

Bovine Respiratory Disease (BRD) costs the cattle industry in the United States as much as $3 billion annually [4], [5]. BRD is the outcome of complex interactions among host, environment, bacterial, and viral pathogens [6]. Histophilus somni, a gram-negative, pleomorphic species, is one of the important causative agents of BRD [6]. H. somni also causes bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis [7]. H. somni strain 2336, the strain used in this study, was isolated from pneumonic calf lung and has a 2.2 Mbp genome and 2044 predicted open reading frames (ORFs), of which 1569 (76%) have an assigned biological function.

Genome structural annotation is a multi-level process that includes prediction of coding genes, pseudogenes, promoter regions, repeat elements, regulatory elements in intergenic regions such as small non-coding RNAs (sRNAs), and other genomic features of biological significance. Computational gene prediction methods such as Glimmer [8] or GeneMark [9] use hidden Markov models that are trained on a set of well annotated genes. Although these methods are quite efficient, they often miss genes with anomalous nucleotide composition and have several well-described shortcomings: because bacterial genomes do not have introns, detecting gene boundaries is comparatively difficult; due to the usage of more than one start codon, computational genome annotation methods may predict overlapping ORFs [10]; and prediction programs use arbitrary minimum cutoff lengths to filter short ORFs, which may lead to under-representation of small genes. In the case of sRNA prediction, the lack of DNA sequence conservation, the lack of a protein coding frame, and the limited accuracy of transcriptional signal prediction programs (promoter/Rho terminator prediction) confound computational prediction [11], [12].

Computational prediction methods are a “first pass” genome structural annotation. Whole genome transcriptome studies (such as whole genome tiling arrays [13], [14], [15] and high throughput sequencing [16], [17]) are complementary experimental approaches for bacterial genome annotation and can identify “novel” genes, gene boundaries, regulatory regions, intergenic regions, and operon structures. For example, a transcriptomic analysis of Mycoplasma pneumoniae identified 117 previously unknown transcripts, many of which were non-coding RNAs, and two novel genes [18]. Transcriptome analyses identified novel, non-coding regions in other species, including 27 sRNAs in Caulobacter crescentus [15], 64 sRNAs in Salmonella Typhimurium [17], and a large number of putative sRNAs in Vibrio cholerae [16]. sRNAs found in pathogen genomes are known to be involved in various housekeeping activities and virulence [19].

In this study we used RNA-Seq for the experimental annotation of the H. somni strain 2336 genome and to construct a single nucleotide resolution transcriptome map. Novel expressed elements were identified, and where appropriate, computational predictions of previously described gene boundaries were corrected.


Mapping of reads onto the H. somni genome

In 2008 the complete genome sequence of H. somni strain 2336 became available (GenBank CP000947). The 2,263,857 bp circular genome has a GC content of 37.4%, and 87% of the sequence is annotated as coding regions. The genome has 2065 computationally predicted genes, of which 1980 are protein coding. We sequenced the transcriptome of H. somni using Illumina RNA-Seq methodology, and obtained 9,015,318 reads, with an average read length of approximately 36 bp. We mapped approximately 9.4% of the reads onto the reference DNA sequence of H. somni strain 2336 using the alignment program Bowtie [20]. To determine expressed regions in the genome, we estimated the average coverage depth of mapped reads per base. We used the pileup format, a signal map file for the whole genome in which alignment results (coverage depth) are reported per base. Regions where coverage depth was greater than the lower tenth percentile of expressed genes were considered significantly expressed [21]; in the current study, this corresponded to a coverage depth of 7 reads/bp in pileup format.
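The percentile-based cutoff described above can be illustrated with a minimal Python sketch. It is not the authors' script (the study used custom Perl scripts); the function names, the nearest-rank percentile definition, and the choice to summarise each gene by its mean depth are assumptions made here for illustration.

```python
import math

def mean_depth(depth, start, end):
    """Mean per-base depth over the 1-based inclusive interval [start, end]."""
    window = depth[start - 1:end]
    return sum(window) / len(window)

def lower_percentile_cutoff(gene_depths, percentile=10.0):
    """Nearest-rank percentile of per-gene mean depths.

    Assumption: each gene is summarised by its mean depth, and the
    cutoff is the value at the requested percentile of those means.
    """
    ranked = sorted(gene_depths)
    rank = max(1, math.ceil(percentile * len(ranked) / 100.0))
    return ranked[rank - 1]

# Toy data: twenty hypothetical genes with mean depths 1..20.
toy_gene_depths = list(range(1, 21))
cutoff = lower_percentile_cutoff(toy_gene_depths)
```

With real data, `gene_depths` would hold one value per annotated gene, computed with `mean_depth` from the per-base coverage; in the study this procedure gave a cutoff of 7 reads/bp.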

As another measure for estimating the background expression level, we analyzed the coverage in the intergenic regions of the genome. We assumed that at least half of the intergenic region is not expressed (considering the presence of known expressed regions, such as the 3′ and 5′ UTRs of genes, intergenic regions within operons, and sRNAs) and calculated the corresponding coverage, which was ≤6 reads per base, lower than our first cutoff estimate. We retained the more conservative cutoff for expression, i.e., 7 reads per base, for describing the expression map of H. somni. Nucleotides in the genome sequence with coverage depth above this threshold were considered to be expressed. This resulted in a whole genome transcriptome profile of H. somni 2336 at single nucleotide resolution. Figure 1 shows the steps involved in the analysis of expressed intergenic regions.
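The two steps in this paragraph, estimating background from intergenic coverage and then calling expressed bases, can be sketched as follows. This is an illustrative reading rather than the published code: taking the median intergenic depth is one way to encode the assumption that at least half of the intergenic sequence is not expressed, and treating a base at exactly the cutoff as expressed is also an assumption.

```python
from statistics import median

def intergenic_background(depth, intergenic):
    """Median depth over all intergenic bases.

    `intergenic` holds 1-based inclusive (start, end) intervals. If at
    least half of these bases are unexpressed, the median is an upper
    bound on background depth (an interpretation, not the paper's code).
    """
    values = []
    for start, end in intergenic:
        values.extend(depth[start - 1:end])
    return median(values)

def expressed_regions(depth, cutoff=7):
    """Return 1-based inclusive runs of bases with depth >= cutoff."""
    regions, run_start = [], None
    for pos, d in enumerate(depth, start=1):
        if d >= cutoff:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(depth)))
    return regions

# Toy per-base depth for a ten-base "genome".
toy_depth = [0, 2, 7, 9, 8, 3, 0, 10, 12, 1]
toy_regions = expressed_regions(toy_depth)
toy_background = intergenic_background(toy_depth, [(1, 2), (6, 7)])
```

Each run returned by `expressed_regions` corresponds to a transcriptionally active region whose boundaries can then be compared with the genome annotation.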

Figure 1. RNA-Seq data analysis workflow for intergenic expression analysis.

Analysis workflow includes identification of novel protein coding genes and sRNAs in the intergenic region of H. somni 2336 genome.

Expression in the intergenic region of the genome

We compared the RNA-Seq based transcriptome map with the available genome annotation to identify novel expressed intergenic regions in the genome. Promoters and terminators were predicted across the genome to add confidence to the identified novel elements. For the first time, we report the identification of 94 sRNAs (Table 1) in the H. somni genome. The start and end for each sRNA in Table 1 refer to the boundaries of transcriptionally active regions (TARs, putative sRNAs). Of these, twelve were similar to well-characterized sRNA families that are described in many bacterial species, such as tmRNA, 6S, and FMN (Figure 2). The remaining 82 sRNAs have not been reported previously. The majority of the identified sRNAs (>75%) were shorter than 200 nucleotides (length range 70–695 nucleotides). The average GC content of the sRNAs (39.3%) was slightly higher than the 37.4% GC content of the genome. Promoters were predicted within 50 nt upstream/downstream of the TAR boundaries for 68 sRNAs. Similarly, Rho-independent transcription terminators were predicted within 50 bp upstream/downstream of the boundaries of 40 sRNAs. Figure 3 shows the depth of coverage for one of the identified novel sRNAs, “HS46”, viewed in the Artemis genome browser [22].

Figure 2. Identification of sRNA annotated to Rfam.

The figure shows identification of the well conserved sRNA “tmRNA” using the RNA-Seq based method. “tmRNA” was computationally predicted as an sRNA by Rfam using sequence similarity across other bacterial families.

Figure 3. Identification of a novel sRNA.

A highly expressed sRNA “HS46” found in the intergenic region of H. somni 2336 genome.

Table 1. H. somni 2336 sRNAs, their genome location, additional features and comparative genomics.

BLAST analysis of the sRNA sequences against the non-redundant nucleotide database at NCBI revealed that 31 of the sRNA sequences were unique to the H. somni 2336 genome. Another 41 were highly conserved (>95% identity with >95% coverage) only in H. somni strain 129Pt, which is a commensal, preputial isolate. A set of 11 sRNAs were conserved in the related Pasteurellaceae family, which includes genomes such as P. multocida, H. influenzae, H. parainfluenzae, and H. ovis. Only 11 sRNAs were conserved in distant bacterial genomes from genera Streptococcus, Clostridium, Actinobacillus, Vibrio, and others. This lack of sRNA sequence conservation beyond the species could indicate that sRNA sequences are under strong selection pressure, and that they could be responsible for the adaptation of many species to different environmental niches.

We searched all H. somni sRNA sequences against the Rfam database [23] to determine their putative functions. We found that 12 sRNAs were homologous to well characterized sRNAs in other genomes. The identified functional categories included FMN riboswitches, gcvB, glycine, intron_gpII, lysine, alpha_RBS, LR-PK1, isrK, MOCORNA, RNaseP_bact_a, tmRNA, and 6S. sRNAs for which no Rfam function could be predicted represent a completely novel set of non-coding sRNAs. The functions of these novel sRNAs need to be determined by further experiments.

Identification and characterization of novel genes

We evaluated the coding potential of all expressed intergenic regions by conducting BLASTX based sequence searches against the non-redundant protein database at NCBI, followed by manual analysis and interpretation. We identified 38 novel protein coding regions (Table 2). The average length of the identified novel proteins was around 60 amino acids (range: 19 to 135 amino acids). The majority of the novel proteins (30) were conserved hypothetical proteins present in related species such as H. somni 129Pt, M. haemolytica, and H. influenzae. Some of the novel proteins had predicted functions, such as DnaK suppressor protein, toxic membrane protein TnaC, and predicted toxic peptide ibsB3 (Table 2). Figure 4 shows an example of a novel protein, “HSP7”, that is similar (74% similarity and 100% coverage) to a putative, phage-related DNA-binding protein of Neisseria polysaccharea.

Figure 4. Identification of a novel protein coding gene.

Novel protein coding gene “HSP7” identified using transcriptome analysis shows homology (similarity 74%, sequence coverage 100%) to a phage related DNA binding protein from Neisseria polysaccharea.

Table 2. Novel proteins identified in the H. somni 2336 genome along with closest matching homolog and its annotation.

Corrections made to the existing genome annotation

The single nucleotide resolution map described in this study enabled us to correct the annotated start sites of five genes in the current genome annotation (Table 3). These genes were annotated as phospholipid synthesis protein, ribosomal protein S2, aconitate hydratase 2, peptide chain release factor 2, and DUF411, a protein of unknown function. Guided by the RNA-Seq evidence, we performed BLAST comparisons with phylogenetically similar proteins to confirm the new gene boundaries (Table 3).

Table 3. Genes with revised coordinate information based on transcriptome map.

Non-functional start codons and frameshifts

The comparison of the transcriptome map of the H. somni genome with predicted proteins revealed genes with non-functional start codons and genes with frameshift mutations. Four genes have non-functional start codons, resulting in a predicted protein that is truncated at the amino terminus (based on BLAST comparison with homologous proteins in other species), although full length mRNA was present. An example is presented for the gene “HSM_0748”, annotated as “Alpha-L-fucosidase” (Figure S1). The other three genes, HSM_0603, HSM_1666 and HSM_1668, encode a hypothetical protein, type III restriction protein res subunit, and CTP synthase, respectively. Two genes with frameshifts causing protein truncations (based on BLAST comparison with homologous proteins) are HSM_1385 (beta-hydroxyacyl dehydratase, FabA) and HSM_1744 (alcohol dehydrogenase zinc-binding domain protein). The transcriptome map revealed a full length mRNA for these two genes, which code for truncated proteins.

Gene expression and operon structures

Our transcriptome map of H. somni identified expression from 1636 (approximately 80%) of the predicted genes. The expressed genes were distributed evenly across all TIGRFAM functional categories (Table S1). The transcriptome map allowed identification of operon structures at a genome scale, critical for identifying co-expressed genes and for understanding coordinated regulation of the bacterial transcriptome. We identified co-expression for 452 pairs (total 730 genes) of H. somni genes (Table S2) that were transcribed together and constituted a minimal operon. By joining consecutive overlapping pairs of co-expressed genes, we identified 278 distinct transcription units (Table S3).

We compared our experimentally identified co-expressed genes with computationally predicted operons. The overlap between computational prediction of co-expressed genes using DOOR [24] and this study was 87% (394 of 452 gene pairs) (Table S4). Thus, our dataset validates expression of 394 computational gene-pair predictions. We identified 58 new gene pairs that are co-expressed and were not predicted by DOOR, which could be part of unidentified, new operon structures. For example, further in-depth analysis indicated a new operon consisting of three genes, HSM_1354, HSM_1355 and HSM_1356, annotated as ribosomal protein L20, ribosomal protein L35, and translation initiation factor IF-3, respectively, which were not predicted computationally (Figure 5). The orthologs of these genes are well known to form a functional operon of ribosomal proteins (IF3-L35-L20) in Escherichia coli [25].
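A comparison of this kind reduces to set operations on gene pairs. The sketch below is illustrative only: the gene identifiers are hypothetical, and treating a pair as unordered (so that the order in which two genes are listed does not matter) is an assumption about how the two pair lists would be reconciled.

```python
def as_unordered(pairs):
    """Represent each gene pair as a frozenset so that order is ignored."""
    return {frozenset(pair) for pair in pairs}

def compare_pairs(observed, predicted):
    """Count observed pairs that are shared with, or absent from, predictions."""
    obs, pred = as_unordered(observed), as_unordered(predicted)
    shared = obs & pred
    return {
        "shared": len(shared),
        "observed_only": len(obs - pred),
        "fraction_shared": len(shared) / len(obs),
    }

# Hypothetical gene identifiers, for illustration only.
rnaseq_pairs = [("g1", "g2"), ("g2", "g3"), ("g5", "g6"), ("g8", "g9")]
predicted_pairs = [("g2", "g1"), ("g2", "g3"), ("g5", "g6"), ("g6", "g7")]
summary = compare_pairs(rnaseq_pairs, predicted_pairs)
```

Here `observed_only` counts pairs supported by expression data but missing from the computational prediction, which are the candidates for new operon structures.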

Figure 5. Identification of a novel operon structure comprised of three genes: HSM_1354, HSM_1355, and HSM_1356.

The RNA-Seq coverage shows three genes annotated as ribosomal proteins (IF3, L35, and L20) being expressed as a transcription unit.


In this study, we used RNA-Seq to describe the whole genome transcriptome profile of H. somni 2336, a bovine respiratory disease pathogen. The single nucleotide resolution map helped uncover the structure and complexity of this pathogen's transcriptome and led to the identification of novel small RNAs and protein coding genes as well as gene co-expression. Prokaryotic genome annotation is often performed using computational gene prediction programs [8], [9]. However, these prediction algorithms are not able to identify non-coding sRNAs, antisense transcripts, and some small proteins. To overcome the shortcomings of computational genome structural annotation, various experimental methods are used for identification of novel expressed elements [13], [14], [15], [16], [17], [18], [26], [27], [28]. Deep transcriptome sequencing (RNA-Seq) has emerged recently as a method that enables the study of RNA-based structural and regulatory regions at the genome scale. RNA-Seq technology has many advantages compared with existing array based methods for transcriptome analysis. In particular, RNA-Seq does not require probes, so the process is free from probe design issues and hybridization bias. Also, the transcriptome coverage from RNA-Seq is very high [29], [30]. RNA-Seq was demonstrated to be effective for the discovery of bacterial non-coding RNAs, accurate operon definition, and correction of gene annotation [27], [31], [32]. Therefore, in the current study, we used RNA-Seq for profiling the H. somni 2336 transcriptome.

Mapping of RNA-Seq reads onto the H. somni genome sequence resulted in more than 94% coverage with at least one read per base. This observation is consistent with the reported 94% genome expression in Bacillus anthracis, 89.5% in Sulfolobus solfataricus, and 95% in Burkholderia cenocepacia, studied under one or more experimental growth conditions using RNA-Seq [32], [33], [34]. These results indicate that most of the bacterial genome sequence is expressed at some basal level. To identify significantly expressed regions above this baseline, we used two alternative methods (described in the Results section) to estimate the background expression. Both methods yielded similar results (6–7 reads per base). We selected the more stringent cutoff of 7 reads per base to minimize the number of false positives.

We identified a total of 94 sRNAs in the H. somni genome. Twelve of these were predicted by Rfam [23] and are similar to conserved sRNAs (e.g., 6S, tmRNA, FMN) in other bacterial species, which helps validate our approach. The 82 novel H. somni sRNAs may have housekeeping functions, regulatory activity, or participate in virulence as described in other pathogenic bacteria [19], [35], [36]. The identified sRNAs did not show any location specific bias across the genome. Similarly, genes known to be associated with virulence are known to be scattered across bacterial genomes [37], [38]. However, a tendency to form clusters was observed among the sRNAs, which could indicate that functionally related sRNAs tend to be located in close proximity.

The RNA-Seq based transcriptome map of H. somni identified 38 novel protein coding genes that were missed by the initial annotation. The average length of the proteins coded by these genes is approximately 60 amino acids, suggesting that a length-based cutoff was not the main reason that these genes were missed by computational gene prediction programs. The novel protein coding genes identified in the current study could serve as a training set to improve gene prediction algorithms.

The transcriptome map helped to identify incorrect annotation of start codons in the genome. Transcriptional mapping does not provide direct evidence of translational start sites. However, the location of the identified transcriptional start sites suggests that the annotated start codons are incorrect, an observation that is confirmed by BLAST comparisons against homologous genes in other bacterial species. Transcriptional mapping also revealed genes where the 5′ untranslated sequence extended well beyond the translational start. BLAST comparisons indicated that these genes have either nonsense or missense base changes relative to homologous genes in other bacterial species, causing apparent “truncated” proteins compared with those in other species. Further work is needed to determine whether these 5′ untranslated regions serve regulatory functions or are vestigial.

RNA-Seq data enabled us to determine operon structures at a genome scale and allowed identification of some operons not predicted by the computational operon prediction method. Operon structures that include genes not expressed under the experimental growth condition used in the current study could not be identified. Our results support the notion that combining experimental operon identification by RNA-Seq with computational prediction can improve operon identification in bacterial genomes [39].

For the first time, we report the RNA-Seq based transcriptome map of H. somni 2336 and describe novel expressed regions in the genome. Although the results are interesting, we are aware of the limitations of the study. Because the RNA-Seq protocol was not strand specific, we could not determine the strand of the novel expressed transcripts. Therefore, Table 1 lacks information about sRNA orientation in the genome, and we could not describe antisense expression in the genome. For protein coding genes, we inferred the strand from the alignment of the BLAST hit. Despite this shortcoming, we identified novel expressed regions and transcriptional patterns across the whole genome at a high coverage, which is not possible by other transcriptome analysis methods.

Overall, this study describes an RNA-Seq based transcriptome map of H. somni for identification of functional elements in a pathogen of importance to agriculture. Our genome-wide survey predicts numerous novel expressed regions that need biological characterization for understanding disease pathogenesis. Description of all functional elements in the H. somni system is a prerequisite for conducting holistic systems approaches to understand the complex pathogenesis of bovine respiratory disease.


RNA isolation and sequencing

We propagated H. somni 2336 on three TSA-blood plates (with 5% sheep red blood cells) for 16 hr or until a fresh lawn of cells was visible. IBC approval was not required for acquiring the plates, as they were purchased through a commercial vendor, Fisher Scientific (Pittsburgh, PA), and manufactured by Becton Dickinson Diagnostic Systems (Franklin Lakes, NJ). We washed the plates with brain heart infusion (BHI) broth, adjusted the culture to an OD620 nm = 0.8, and supplemented it with RNAprotect reagent. The cells were harvested by centrifugation and stored at −80°C. We extracted total RNA using the RNeasy mini kit (Qiagen, Valencia, CA) following the manufacturer's protocol. Total RNA was treated with RNase-free DNase (Invitrogen, Carlsbad, CA). Using the Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA), we determined the RNA integrity number (RIN) of total RNA to be greater than 8. The MICROBExpress™ Kit (Ambion, TX, USA), which specifically removes rRNAs, was used for mRNA enrichment. Small RNAs (i.e., tRNA and 5S rRNA) are not removed by this enrichment step (confirmed by Bioanalyzer).

We used 100 ng of enriched mRNA with the Illumina mRNA-Seq sample preparation kit (Illumina, San Diego, CA) for library construction following the manufacturer's protocols. Briefly, mRNA was fragmented chemically by divalent zinc cations and randomly primed for cDNA synthesis. After ligating paired-end sequence adaptors to the cDNA, we isolated fragments of approximately 200 bp by gel electrophoresis and amplified them. We sequenced 1 nM of the mRNA-Seq library on the Illumina GAII (Illumina, San Diego, CA), according to the manufacturer's protocol. Single read sequencing (36 bp) of the clustered flow cell was performed with Illumina's SBS chemistry (v3) and SCS data analysis pipeline v2.4. We used Illumina Real Time Analysis (RTA v1.4.15.0) software for flow-cell image analysis and cluster intensity. Subsequent base-calling was performed using the Illumina GA Pipeline v1.5.1 software.

Mapping and analysis of Illumina reads

We checked all Illumina reads for quality, and removed sequence reads containing “Ns”. A custom Perl script was written to convert Illumina reads into fastq format. The script “” from MAQ [40] converted fastq format to Sanger fastq format. Reads in Sanger fastq format were mapped onto the Histophilus somni 2336 genome sequence (GenBank accession number CP000947) using the alignment tool Bowtie [41], allowing for a maximum of two mismatches. Reads that mapped to more than one location were discarded. We used Samtools [42] to convert data into SAM/BAM format, and to generate alignment results in pileup format, which provides the signal map file with per-base coverage. Custom Perl scripts were written to calculate the background expression. Processed data were deposited in GEO under the accession number GSE29578.
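Two of the steps above, discarding reads that contain “N” and turning pileup output into a per-base coverage array, can be sketched in a few lines of Python. This is an illustration, not the authors' Perl code: it assumes tab-separated pileup lines with the 1-based position in the second column and the read depth in the fourth, and it assumes that positions without coverage may be absent from the file.

```python
def has_ambiguous_base(sequence):
    """True if the read contains an uncalled base ('N')."""
    return "N" in sequence.upper()

def depth_from_pileup(lines, genome_length):
    """Build a dense per-base depth list from pileup-style lines.

    Assumed columns: sequence name, 1-based position, reference base,
    depth, followed by any further fields. Positions that do not
    appear in the input keep a depth of zero.
    """
    depth = [0] * genome_length
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        position, coverage = int(fields[1]), int(fields[3])
        depth[position - 1] = coverage
    return depth

# Toy input: two covered positions on a four-base sequence.
toy_lines = [
    "seq\t1\tA\t3\t...\tIII\n",
    "seq\t3\tG\t8\t........\tIIIIIIII\n",
]
toy_reads = ["ACGTACGT", "ACGNACGT"]
kept_reads = [r for r in toy_reads if not has_ambiguous_base(r)]
toy_depth = depth_from_pileup(toy_lines, genome_length=4)
```

The resulting list can be indexed directly by genome position (minus one), which is convenient for the thresholding and interval calculations used in the rest of the analysis.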

Analysis of intergenic regions of H. somni genome

We used in-house Perl scripts to extract novel expressed intergenic regions to identify novel small RNAs, riboswitches, and putative novel proteins. sRNAs <70 bp in length were discarded to minimize the number of false positives. For each novel expressed region, BLAST sequence searches were performed against the non-redundant protein database at NCBI to identify potential protein coding regions. Intergenic regions within predicted operons [24] represent expressed regions and can be mis-classified as sRNAs. Therefore, these regions were excluded. We analyzed BLAST results manually to identify novel protein coding regions and start codon corrections. If no protein coding region was found in an expressed intergenic region, the presence of a promoter or a Rho-independent terminator allowed us to classify the region as an sRNA. Bacterial promoter sequences were predicted by the Neural Network Promoter Prediction program [43]. Rho-independent transcription terminators were identified using the program TransTermHP [44]. For functional annotation, all identified sRNA sequences were searched against the Rfam database [23]. sRNA sequence conservation among other genomes was determined by blastn searches against the non-redundant nucleotide database at NCBI. We mapped sRNAs, along with additional features, onto genome browsers such as IGV [45] and Artemis [46] for further visualization, manual analysis, and interpretation.
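The decision logic in this paragraph can be summarised as a small classifier. The sketch below is a simplified illustration under stated assumptions: the order of the checks, the label strings, and the `Region` fields are choices made here, the BLASTX result is reduced to a single manually curated flag, and the 50 nt window around region boundaries is taken from the Results.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Region:
    start: int                 # 1-based start of the expressed intergenic region
    end: int                   # 1-based inclusive end
    in_predicted_operon: bool  # lies between genes of a predicted operon
    has_protein_hit: bool      # manual review accepted a BLASTX hit

def near_boundary(region: Region, positions: List[int], window: int = 50) -> bool:
    """True if any predicted feature lies within `window` nt of either boundary."""
    return any(
        abs(p - region.start) <= window or abs(p - region.end) <= window
        for p in positions
    )

def classify(region: Region, promoters: List[int], terminators: List[int],
             min_length: int = 70) -> str:
    """Assign an expressed intergenic region to one illustrative category."""
    if region.in_predicted_operon:
        return "excluded"            # spacer inside a predicted operon
    if region.has_protein_hit:
        return "candidate protein"   # followed up as a possible novel gene
    if region.end - region.start + 1 < min_length:
        return "discarded"           # too short to be called an sRNA
    if near_boundary(region, promoters) or near_boundary(region, terminators):
        return "sRNA"
    return "unclassified"

# Toy example: a 121 nt region with a predicted promoter 40 nt upstream.
label = classify(Region(1000, 1120, False, False), promoters=[960], terminators=[])
```

Because the library was not strand specific, the sketch checks both boundaries for promoters and terminators rather than assuming an orientation.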

Analysis of annotated regions of H. somni genome

Gene expression: bases with coverage above background were mapped onto the annotated genes of H. somni 2336. Genes with more than 60% of their length covered by expressed bases were considered to be expressed.
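The gene-level criterion can be written as a short function. This is a minimal sketch assuming a per-base depth list and the 7 reads/bp cutoff from the Results; counting a base at exactly the cutoff as expressed is an assumption.

```python
def covered_fraction(depth, start, end, cutoff=7):
    """Fraction of bases in [start, end] (1-based, inclusive) at or above cutoff."""
    window = depth[start - 1:end]
    return sum(1 for d in window if d >= cutoff) / len(window)

def is_expressed(depth, start, end, cutoff=7, min_fraction=0.6):
    """A gene is called expressed if more than `min_fraction` of it is covered."""
    return covered_fraction(depth, start, end, cutoff) > min_fraction

# Toy gene of ten bases, seven of which reach the cutoff.
toy_depth = [9, 9, 8, 7, 7, 12, 10, 2, 0, 1]
toy_call = is_expressed(toy_depth, 1, 10)
```

Requiring a covered fraction rather than a mean depth keeps a single highly covered segment from making an otherwise silent gene appear expressed.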

Operons: RNA-Seq can identify and predict operon structures in bacteria. We considered two or more consecutive genes to be part of an operon if they fulfilled the following criteria: (a) they are expressed; (b) they are transcribed in the same direction; and (c) the intergenic region between the genes is expressed. Overlapping pairs of such genes were joined together to identify larger operon structures. We used in-house Perl scripts for the analyses.
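Criteria (a) to (c) and the joining of overlapping pairs can be sketched as follows. This is an illustration rather than the in-house scripts: the gene names and coordinates are hypothetical, and requiring every base between two genes to reach the cutoff is one possible reading of "the intergenic region between the genes is expressed".

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    expressed: bool

def coexpressed_pairs(genes, depth, cutoff=7):
    """Adjacent gene pairs satisfying criteria (a), (b) and (c)."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if not (left.expressed and right.expressed):    # (a)
            continue
        if left.strand != right.strand:                 # (b)
            continue
        gap = depth[left.end:right.start - 1]           # bases between the genes
        if all(d >= cutoff for d in gap):               # (c); empty gap passes
            pairs.append((left.name, right.name))
    return pairs

def transcription_units(pairs):
    """Join consecutive pairs that share a gene into longer units."""
    units = []
    for first, second in pairs:
        if units and units[-1][-1] == first:
            units[-1].append(second)
        else:
            units.append([first, second])
    return units

# Hypothetical genes on a 75-base toy sequence.
toy_depth = [9] * 40 + [0] * 9 + [9] * 11 + [0] * 4 + [9] * 11
toy_genes = [
    Gene("A", 1, 10, "+", True), Gene("B", 15, 25, "+", True),
    Gene("C", 30, 40, "+", True), Gene("D", 50, 60, "-", True),
    Gene("E", 65, 75, "-", True),
]
toy_pairs = coexpressed_pairs(toy_genes, toy_depth)
toy_units = transcription_units(toy_pairs)
```

In the toy data, genes A, B and C form one unit, C and D are split by strand, and D and E are split by the unexpressed gap between them.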

Supporting Information

Figure S1.

Mutated start codon. The figure shows that the predicted protein coding frame (HSM_0748) is shorter at the 5′ end than the corresponding transcript shown by the RNA-Seq coverage. Although the transcript extends further at the 5′ end, no start codon is found in that region, which might be the result of a mutation in the start codon. This was further validated using homology searches of the full length transcript, which shows high homology (95% identity and >95% coverage) to an alpha-L-fucosidase protein from M. haemolytica PHL213.


Table S1.

H. somni genes expressed in the present study according to the TIGRFAM categories.


Table S2.

Pairs of co-expressed genes identified in H. somni 2336 genome by RNA-Seq data analysis.


Table S3.

Transcription units identified by joining co-expressed genes in H. somni 2336.


Table S4.

Comparison of co-expressed gene pairs identified from RNA-Seq data and operon prediction program “DOOR”.



We thank Dr. John Harkness and Dr. Stephen B. Pruett for editing the final version of the manuscript.

Author Contributions

Conceived and designed the experiments: SB ML BN. Performed the experiments: JW AMC. Analyzed the data: RK SB ML BN. Contributed reagents/materials/analysis tools: RK BN AMC JW ML. Wrote the paper: RK BN SB ML. Wrote the script used for data analysis: RK.


  1. Forst CV (2006) Host-pathogen systems biology. Drug Discov Today 11: 220–227.
  2. Aderem A, Adkins JN, Ansong C, Galagan J, Kaiser S, et al. (2011) A systems biology approach to infectious disease research: innovating the pathogen-host research paradigm. MBio 2:
  3. Peng X, Chan EY, Li Y, Diamond DL, Korth MJ, et al. (2009) Virus-host interactions: from systems biology to translational research. Curr Opin Microbiol 12: 432–438.
  4. Kapil S, Basaraba RJ (1997) Infectious bovine rhinotracheitis, parainfluenza-3 and bovine respiratory coronavirus. Veterinary Clinics of North America: Food Animal Practice 13: 455–469.
  5. Griffin D (1997) Economic impact associated with respiratory disease in beef cattle. Vet Clin North Am Food Anim Pract 3: 367–377.
  6. Ellis JA (2001) The immunology of the bovine respiratory disease complex. Vet Clin North Am Food Anim Pract 17: 535–550, vi–vii.
  7. Kuckleburg CJ, Sylte MJ, Inzana TJ, Corbeil LB, Darien BJ, et al. (2005) Bovine platelets activated by Haemophilus somnus and its LOS induce apoptosis in bovine endothelial cells. Microb Pathog 38: 23–32.
  8. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26: 544–548.
  9. Besemer J, Lomsadze A, Borodovsky M (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29: 2607–2618.
  10. Palleja A, Harrington ED, Bork P (2008) Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 9: 335.
  11. Kulkarni RV, Kulkarni PR (2007) Computational approaches for the discovery of bacterial small RNAs. Methods 43: 131–139.
  12. Backofen R, Hess WR (2010) Computational prediction of sRNAs and their targets in bacteria. RNA Biol 7:
  13. Tjaden B, Saxena RM, Stolyar S, Haynor DR, Kolker E, et al. (2002) Transcriptome analysis of Escherichia coli using high-density oligonucleotide probe arrays. Nucleic Acids Res 30: 3732–3738.
  14. Akama T, Suzuki K, Tanigawa K, Kawashima A, Wu H, et al. (2009) Whole-genome tiling array analysis of Mycobacterium leprae RNA reveals high expression of pseudogenes and noncoding regions. J Bacteriol 191: 3321–3327.
  15. Landt SG, Abeliuk E, McGrath PT, Lesley JA, McAdams HH, et al. (2008) Small non-coding RNAs in Caulobacter crescentus. Mol Microbiol 68: 600–614.
  16. Liu JM, Livny J, Lawrence MS, Kimball MD, Waldor MK, et al. (2009) Experimental discovery of sRNAs in Vibrio cholerae by direct cloning, 5S/tRNA depletion and parallel sequencing. Nucleic Acids Res 37: e46.
  17. Sittka A, Lucchini S, Papenfort K, Sharma CM, Rolle K, et al. (2008) Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS Genet 4: e1000163.
  18. Guell M, van Noort V, Yus E, Chen WH, Leigh-Bell J, et al. (2009) Transcriptome complexity in a genome-reduced bacterium. Science 326: 1268–1271.
  19. Livny J, Waldor MK (2007) Identification of small RNAs in diverse bacterial species. Curr Opin Microbiol 10: 96–101.
  20. Norrby SR, Nord CE, Finch R (2005) Lack of development of new antimicrobial drugs: a potential serious threat to public health. Lancet Infect Dis 5: 115–119.
  21. Bumann D (2010) Pathogen proteomes during infection: A basis for infection research and novel control strategies. J Proteomics 73: 2267–2276.
  22. 22. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, et al. (2000) Artemis: sequence visualization and annotation. Bioinformatics 16: 944–945.
  23. 23. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, et al. (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33: D121–124.
  24. 24. Mao F, Dam P, Chou J, Olman V, Xu Y (2008) DOOR: a database for prokaryotic operons. Nucleic Acids Res.
  25. 25. Nannini E, Murray BE, Arias CA (2010) Resistance or decreased susceptibility to glycopeptides, daptomycin, and linezolid in methicillin-resistant Staphylococcus aureus. Curr Opin Pharmacol 10: 516–521.
  26. 26. Tisserant E, Da Silva C, Kohler A, Morin E, Wincker P, et al. (2011) Deep RNA sequencing improved the structural annotation of the Tuber melanosporum transcriptome. New Phytol 189: 883–891.
  27. 27. Martin J, Zhu W, Passalacqua KD, Bergman N, Borodovsky M (2010) Bacillus anthracis genome organization in light of whole transcriptome sequencing. BMC Bioinformatics 11: Suppl 3S10.
  28. 28. Kumar R, Shah P, Swiatlo E, Burgess SC, Lawrence ML, et al. (2010) Identification of novel non-coding small RNAs from Streptococcus pneumoniae TIGR4 using high-resolution genome tiling arrays. BMC Genomics 11: 350.
  29. 29. Croucher NJ, Thomson NR (2010) Studying bacterial transcriptomes using RNA-seq. Curr Opin Microbiol 13: 619–624.
  30. 30. van Vliet AH (2010) Next generation sequencing of microbial transcriptomes: challenges and opportunities. FEMS Microbiol Lett 302: 1–7.
  31. 31. Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, et al. (2009) A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 5: e1000569.
  32. 32. Yoder-Himes DR, Chain PS, Zhu Y, Wurtzel O, Rubin EM, et al. (2009) Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A 106: 3976–3981.
  33. 33. Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, et al. (2009) Structure and complexity of a bacterial transcriptome. J Bacteriol 191: 3203–3211.
  34. 34. Wurtzel O, Sapra R, Chen F, Zhu Y, Simmons BA, et al. (2010) A single-base resolution map of an archaeal transcriptome. Genome Res 20: 133–141.
  35. 35. Toledo-Arana A, Repoila F, Cossart P (2007) Small noncoding RNAs controlling pathogenesis. Curr Opin Microbiol 10: 182–188.
  36. 36. Papenfort K, Vogel J (2010) Regulatory RNA in bacterial pathogens. Cell Host Microbe 8: 116–127.
  37. 37. Chen L, Yang J, Yu J, Yao Z, Sun L, et al. (2005) VFDB: a reference database for bacterial virulence factors. Nucleic Acids Res 33: D325–328.
  38. 38. Sandal I, Inzana TJ (2010) A genomic window into the virulence of Histophilus somni. Trends Microbiol 18: 90–99.
  39. 39. Brouwer RW, Kuipers OP, van Hijum SA (2008) The relative value of operon predictions. Brief Bioinform 9: 367–375.
  40. 40. Pitout JD, Laupland KB (2008) Extended-spectrum beta-lactamase-producing Enterobacteriaceae: an emerging public-health concern. Lancet Infect Dis 8: 159–166.
  41. 41. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.
  42. 42. Boucher HW, Talbot GH, Bradley JS, Edwards JE, Gilbert D, et al. (2009) Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin Infect Dis 48: 1–12.
  43. 43. Reese MG (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem 26: 51–56.
  44. 44. Kingsford CL, Ayanbule K, Salzberg SL (2007) Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol 8: R22.
  45. 45. Kozhenkov S, Sedova M, Dubinina Y, Gupta A, Ray A, et al. (2011) BiologicalNetworks–tools enabling the integration of multi-scale data for the host-pathogen studies. BMC Syst Biol 5: 7.
  46. 46. Sturdevant DE, Virtaneva K, Martens C, Bozinov D, Ogundare O, et al. (2010) Host-microbe interaction systems biology: lifecycle transcriptomics and comparative genomics. Future Microbiol 5: 205–219.