Complete Genome Analysis of a Haemophilus parasuis Serovar 12 Strain from China

Haemophilus parasuis is the etiological agent of Glässer's disease in pigs and 15 standard serovars were identified. The widespread disease causes great economic loss in the swine industry worldwide. Aiming to investigate the differences in genome composition and functions among various strains, a highly virulent strain ZJ0906 of H. parasuis serovar 12 from China was analyzed and compared with serovar 5 SH0165. Strain ZJ0906 genome is 2,324,740 base pairs with 40.06% genomic GC content. It contains 2,484 open reading frames (ORF) predicted by Glimmer 3.02, of which 2,352 (∼94.7%) were annotated by NCBI nr blast, 1,745 by COG database and 1,829 by KEGG database. 109 potential virulence factors were annotated in strain ZJ0906 and 3 of which are potentially related to antibiotic resistance. Strain ZJ0906 genome is ∼55 kilobases longer than SH0165 genome, with an extra 211 predicted ORFs. VFDB, ARDB, and PAIDB blast searches showed that ZJ0906 and SH0165 shared a nearly identical panel of potential virulence factors, drug resistant genes and four PAI-like regions which showed high homology to Enterococcus, Escherichia and Salmonella. Synteny analysis showed that gene rearrangements are frequent between the two strains, which may lead to variations in pathogenicity and cross-protection among serovars. KEGG pathway analyses showed strain ZJ0906 shared similar metabolic pathways to strain SH0165. Molecular identification of these genomic elements and potential virulence factors pave the way to the better understanding of mechanisms underlying metabolic capabilities and pathogenicity of H. parasuis and prospective vaccine targets besides the widely used method of inactivated bacteria.


Introduction
Haemophilus parasuis (H. parasuis) is an important respiratory-tract pathogen in pigs and the etiological agent of porcine polyserositis, polyarthritis and meningitis, known as Glä sser's disease [1].To date, 15 H. parasuis serovars with apparent differences in virulence have been described [2]. However, a large number of non-typable isolates are frequently reported [3,4]. Inconsistent cross-protection among serovars is one of the major problems for the control of Glä sser's disease by means of bacterins [5]. It is essential to know the prevalent serovars in a given area to effectively control Glä sser's disease, as vaccine immunity confers only limited crossserovar protection [6]. Serovars 4 and 5 were the most prevalent serovars, followed by serovars 13, 14 and 12 in China [7,8]. While in Brazil, serovar 4 was the most prevalent serovar, followed by serovars 5, 14, 13 and 2 [9]. Genomic characterization of H. parasuis serovar 5 was completed recently [10,11], but the differences in genomic structure and composition among different serovars and whether these differences associated with pathogenesis and antigenic variations are not clear.
In China, Glä sser's disease is common among finishing pigs and causes great economic loss. Many type strains have been isolated by our laboratory from pigs with clinical symptoms of high fever. In this study, the genome of the field strain ZJ0906 of H. parasuis, previously confirmed as serovar 12, was sequenced and compared with serovar 5, with focus on the investigation of potential virulence factors, drug resistant genes and pathogenicity-like islands in H. parasuis.

Bacterial strain and genome DNA extraction
The H. parasuis field strain serovar 12 ZJ0906 were cultured in tryptic soy agar (TSA) or tryptic soy broth (TSB; OXOID, Hampshire, ENGLAND) supplemented with 10 mg/ml nicotinamide adenine dinucleotide (NAD) and 5% bovine serum. Bacterial genomic DNA was purified with the QIAGEN Blood & Cell Culture DNA Kit, following manufacturer's instruction.
Pyrosequencing and complete genome assembly of H. parasuis ZJ0906 To confirm the purity of the genomic DNA of H. parasuis ZJ0906, 16S rDNA-specific region was amplified and 20 individual positive clones were sequenced by Genetic Analyzer 3130 (Invitrogen, Grand Island, US). BLASTn analysis [12] revealed that rDNA sequences have high similarity to those from various serovars of Haemophilus parasuis publicly accessible. The quality and quantity of genomic DNA were evaluated by 0.7% agarose gel electrophoresis and Nanodrop2000 (Thermo Scientific, Waltham, US), and using the Quant-iT Picogreen dsDNA kit (Invitrogen), respectively.
A whole genome shotgun library was generated with 500 ng of ZJ0906 genomic DNA. The shotgun sequencing procedure was performed using 454 GS Junior General Library Preparation Kit, following the manufacturer's instruction (Roche, Basel, Switzerland). In addition, an 8 kb-span paired end library was generated with 15mg of genomic DNA. The paired end sequencing procedure was performed using 454 GS Junior Paired end Library Preparation Kit, following the manufacturer's instruction (Roche). Paired end reads were used as orientation guide for assembling the contigs into scaffolds. The DNA libraries were amplified by emPCR and sequenced by FLX Titanium sequencing chemistry (Roche). One shotgun run and one paired end run were performed on individual libraries prepared with the same genomic DNA sample. After sequencing, the raw data were assembled by Newbler 2.7 (Roche) with default parameters. Primer pairs were designed along the sequences flanking the gap regions for PCR gap filling. The complete genome was submitted to NCBI Genbank and is publicly accessible (accession no: CP005384).
Genome annotation of H. parasuis ZJ0906 Glimmer 3.02 [13] was used for gene prediction in H. parasuis ZJ0906 complete genome. All predicted ORF sequences were translated into amino acid sequences by in-house Perl scripts. BLASTp [14] was applied to align the amino acid sequences against the NCBI non-redundant (nr) database (January, 2013). Amino acid sequences with alignment length over 90% of its own length and over 40% match identity were chosen and the description of the best hit (with highest alignment length percentage and match identity) was assigned as the annotation of predicted gene. Intergenic regions were annotated by RepeatMasker (http://www.repeatmasker.org) with default parameters.  [15] using Glimmer-predicted H. parasuis ZL0906 genes as queries against each of the 8 complete, annotated genomes as database. Predicted genes of H. parasuis ZJ0906 which were found as a single copy, and with 90% minimum alignment length against the other 8 bacteria were designated as the core genes. All 26 core genes were then aligned by MUSCLE [16] and concatenated. A Bayesian phylogenetic tree was constructed using MrBayes [17] using the consequent concatenated genes as the dataset and GTR+G+I as the substitution model. The chain length was set to 10,000,000 (1 sample/1000 generations) whilst the burn-in was set as 2000 after checking on the trace files of two independent runs with Tracer v1.4 (http://tree.bio.ed.ac.uk/software/tracer/).
For comparison within the species of H. parasuis, reciprocal BLAT was performed between the 3 strains, and numbers of orthologs (single copies in each strain) shared between them were calculated by in-house Perl scripts.

COG analysis of H. parasuis ZL0906
BLASTp [14] was applied to align the amino acid sequences against the COG database [18]. Amino acid sequences with alignment length over 90% of its own length and over 20% match identity were chosen and the description of the best hit (with highest alignment length percentage and match identity) was assigned as the annotation of predicted gene. All annotated genes were then classified based on their COG classes. COG-annotated genes of strain ZJ0906 were compared to that of strain SH0165.
Virulence gene and pathogenicity island analysis of H. parasuis ZJ0906 BLASTp [19] was applied to align the amino acid sequences against the VFDB database [20]. Amino acid sequences with alignment length over 90% of its own length and over 20% match identity were chosen and the description of the best hit (with highest alignment length percentage and match identity) was assigned as the annotation of predicted gene.
Strain ZJ0906 virulence factors and PAI-like genes were compared to that of strain SH0165.

Drug resistant gene analysis of H. parasuis ZJ0906
BLASTp [12] was applied to align the amino acid sequences against the ARDB database [22]. Amino acid sequences with alignment length over 90% of its own length and over 40% match identity were chosen and the description of the best hit (with highest alignment length percentage and match identity) was assigned as the annotation of predicted gene. All annotated genes were designated by the antibiotics to which they render the bacteria resistance. Comparison of antibiotics resistance genes were carried out between strain ZJ0906 and the other 8 bacteria chosen.

Potential horizontal transferring gene analysis of H. parasuis ZJ0906
BLASTp [12] was applied to align the amino acid sequences against the ACLAME database [23].Amino acid sequences with alignment length over 90% of its own length and over 40% match identity were chosen and the description of the best hit (with highest alignment length percentage and match identity) was assigned as the annotation of predicted gene. All annotated genes were classified according to their corresponding potential horizontal transferring vectors (''virus'' or phages in bacteria, plasmid or prophage). Horizontal transferring genes of strain ZJ0906 were compared to that of strain SH0165.

Pathway analysis of H. parasuis ZJ0906
Glimmer-predicted ORF sequences of strain ZJ0906 were translated into amino acid sequences by in-house Perl scripts. All sequences were submitted to KEGG database [24] for automatic pathway annotation (http://www.genome.jp/kaas-bin/kaas_main).
All annotated pathways were manually downloaded and curated by in-house Perl scripts.

Results and Discussion
Complete genome sequencing and assembly of H. parasuis ZJ0906 Haemophilus parasuis ZJ0906 genome was sequenced and its complete de novo assembly was achieved by one shotgun run and one 8 kb-span paired end run via Roche GS Junior, and the follow-up PCR gap filling and Sanger sequencing. A total of 137,580 raw shotgun reads (59,446,391 bases) and 91,425 raw paired end reads (28,335,685 bases) were generated by respective pyrosequencing runs, in which ,99.60% and ,97.10%, respectively, were aligned into 258 contigs and 10 scaffolds, resulting in an average sequencing depth of ,34-fold. The average read lengths for the shotgun and paired end run are ,432 base pair (bp) and ,310 bp, respectively. The size of the largest scaffold is 2,326,318 bp which contains 125 large contigs and the N50 contig is 32,107 bp long, suggesting that this raw assembly is highly continuous. The complete circular genome of H. parasuis ZJ0906 was found to be 2,324,740 bp in length, with genomic GC content of 40.06% after PCR gap-filling by Sanger sequencing, highly similar to the previously published complete genome of H. parasuis SH0165 [11].

Genome annotation of H. parasuis ZJ0906
The H. parasuis ZJ0906 chromosome encodes 2,484 predicted genes (Glimmer 3.02), in which 2,325 (,93.60%) was annotated by BLASTp search via NCBI non-redundant (nr) database ( Table 1). The full annotation result was attached as File S1. 54 tRNA genes and 18 rRNA genes were found in H. parasuis ZJ0906 genome. Majority of them were arranged as large RNA islands -5 rRNA islands (loci located on nucleotide positions of 419,486 to 423,366 bp, 1,550,077 to 1,554,037bp, 1,618,551 to 1,622,580bp, 1,702,915 to 1,706,912bp and 1,950,682 to 1,954,711bp respectively) and 4 tRNA islands (located on 25,369 to 25,537bp, 1,864,190 to 1,864,824bp, 1,950,075 to 1,953,830bp, and 2,245,963 to 2,246,216bp respectively) (Fig. 1). Strain ZJ0906 genome contains the same number of tRNA and rRNA genes as strain SH0165 genome. Full annotation of repetitive sequences, such as low-complexity repeats, interspersed repeats and RNA regions is attached as File S2.
As shown in File S3, synteny between the two complete genomes of strains ZJ0906 and SH0165 is not very conserved. Chromosomal rearrangement was commonly observed along the whole stretch of the two genomes, especially a large proportion of the ,1250-1425 kb region in strain SH0165 genome could not be matched to strain ZJ0906 genome (File S3).
On the other hand, amongst the other members within the family Pasteurellaceae, only modest numbers (,100) of orthologs to H. parasuis ZJ0906 were found, with the exception of Actinobacillus pleuropneumoniae serovar 3 JL03 which have 142 orthologs, even more than that found in species under the same Genus of Haemophilus (Table 2). In fact, only 45 genes are shared between the Haemophilus spp yet fully sequenced. The 26 core genes shared among H. parasuis ZJ0906 and the 8 closely related species with complete genomes were aligned and randomly concatenated before phylogenetic tree construction by MrBayes. Phylogenetic analysis showed that H. parasuis ZJ0906 shares the closest evolutionary origin to strain SH0165 as expected (Fig. 3). Interestingly, similar to number of orthologs (Table 2), H. parasuis ZJ0906 displays closer evolutionary relationship to A. pleuropneumoniae serovar 3 JL03 than to other Haemophilus species sequenced (Table 2), as previously mentioned for H. parasuis serovar 5 SH0165 by Xu's group [11]. Similar to H. parasuis, A. pleuropneumoniae is the etiological agent of a widespread severe pig disease, pleuropneumonia, which is commonly transmitted via COG analysis of H. parasuis ZJ0906 Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Thus, identification of orthologs is critical for reliable prediction of gene functions in a newly sequenced genome. NCBI COG database contains clusters of orthologous groups which provides genome-scale analysis of protein function prediction. 1,745 (,70.25%) out of 2,484 Glimmer-predicted genes of H. parasuis ZJ0906 can be found in NCBI COG database (Table 1). COG-annotated class distribution for H. parasuis ZJ0906 was illustrated and the top 10 COG classes were annotated in Fig. 4. Majority of the genes, as expected, were involved in basic cellular functions, such as replication, transcription and metabolism, however, up to ,19.68% of them only have predicted or unknown functions in COG database ( Table 3). The single-letter COG class distribution of H. parasuis strains ZJ0906 and SH0165 was compared in Table 3. Differences in numbers of COG-annotated genes associated with the posttranslational modification, protein turnover, chaperones (COG class [O]), the defense mechanisms (COG class [V]), and the replication, recombination and repair (COG class [L]) were noted between the two strains. Difference in posttranslational modification and protein turnover may be associated with antigenic variations and cross protections across different strains, while discrepancy over the number of genes involved in defense mechanisms, DNA replication, recombination and repair may be attributed to their different adaptation to hosts. For the full COG functional annotation, please refer to File S4.

Virulence gene and pathogenicity island analysis
Virulence genes of pathogenic bacteria, which code for toxins, adhesins, invasins or other virulence factors, may be located on transmissible genetic elements such as transposons, plasmids or bacteriophages [25]. Fifteen serovars of H. parasuis have been identified, serovars 1, 5, 10, and 12-14 may lead to the death of pigs and are considered to be highly virulent; serovars 2, 4, 8 and 15 are virulent, causing lesions in pigs, but serovars 3, 6, 7, 9 and 11 are considered to be avirulent [2]. To date, the relationship between serovars and virulence is not clear, especially the differences in mechanisms related to pathogenicity among highly virulent strains in H. parasuis, thus virulence genes and pathogenicity islands of strain ZJ0906 were analyzed and compared with strain SH0165.
In addition to the VFDB search, a list of potential virulence factors previously mentioned in the literature was compiled and  identified in strain ZJ0906 genome. These include genes involved in bacterial adherence such as the type IV fimbriae-like structureencoding gene cluster, pilA/B/C/D, and pilF, and genes associated with surface LPS biosynthesis including the putative gene clusters related to major surface-exposed O-specific antigen biosynthesis previously hypothesized in SH0165 genome (Table 4). Pathogenicity-associated islands (PAIs) are distinct class of genomic islands where virulence genes have accumulated on the bacterial chromosome. PAIs, and their associated virulence genes, have spread among bacterial populations by horizontal gene transfer [25]. Four PAI-like regions were annotated by PAI finder (https://www.gem.re.kr/paidb/about_paidb.php?m = h) in strain ZJ0906 genome (Fig. 1), in which two of them contained considerable number of potential homologs of previously identified PAI-virulence genes ( Table 5). In particular, homologs to all 4 putative virulence ORFs in E. coli pathogenicity island 1 (PAI 1) and Salmonella pathogenicity island 1 (SPI-1), which encode for potential chelated iron ABC transporter periplasmic-binding protein, chelated iron ABC transporter permease, chelated iron ABC transporter ATP-binding protein and ribosome-associated GTPase, were identified in PAI-L01 ( Fig. 1 and Table 4). PAIs I to IV from E.coli strain 536 (I 536 to IV 536 ) encode a range of virulence factors, including P fimbriae, P-related fimbriae, a-hemolysin, S fimbriae, and the yersiniabactin siderophore system [29]. Similarly, as an indispensable virulence determinant, Salmonella pathogenicity island 1 (SPI-1) has gained much attention in host-pathogen interactions. It not only affects sophisticated activities during infection, including invasion, replication, and  Table 3. Percentages of the top ten classes are labeled for easy reference. doi:10.1371/journal.pone.0068350.g004 host responses, but also extends to other virulence-related aspects like biofilm formation [30]. The PAI 1/SPI-1-like region is also present in strain SH0165 genome, suggesting H. parasuis may manifest its pathogenicity by iron acquisition and persistent infection in hosts (Table 5). Since match identities of PAIvirulence genes to previously studied potential homologs only range from ,44-77% (data not shown), further studies e.g. geneknockout studies are necessary in elucidating the contribution of these PAI-like regions to pathogenicity in H. parasuis, especially in the investigation of correlation between the differences in PAIs and pathogenicity among highly virulent, virulent and avirulent strains of H. parasuis.
Details of the common virulence genes found between the 3 strains and information on annotated ORFs in PAI-L01 and PAI-L04 with potential homologs of virulence genes were included as File S6 and 7, respectively.

Potential drug resistant gene annotation
Drug resistance is an evolutionary strategy in bacteria, which is often associated with horizontal gene transfer, such as plasmids and expression of some enzymes. H. parasuis is one of the most important swine pathogens, antimicrobial treatment is usually the most common, economical and effective means of disease control in husbandry. Although antimicrobial therapy is widely available for the prevention and control of clinical infections, the rising number of antibiotic resistant isolates is a growing global health concern for both human and animal populations [31]. Antimicrobial susceptibility studies of H. parasuis showed that most of them were resistant to penicillin, ciprofloxacin and trimetho-prim+sulfamethoxazole [31,32]. Our genomic analysis showed that 3 (0.12%) out of 2,484 Glimmer-predicted genes can be identified in ARDB database. Comparative study of H. parasuis strains ZJ0906 and SH0165 shows that both serovars have potential resistance against three kinds of antibiotics, i.e. cipro-   floxacin, trimethopimand and penicillin (Table 6 and Figure 6). These results are in line with the previous experiments in H. parasuis species [31]. Surprisingly, A. pleuropneumoniae has the same drug resistant genes with H. ducreyi and shares a similar panel (3 out of 5) of drug resistant genes with the two serovars of H. parasuis. This may be explained by the close ancestral origins between A. pleuropneumoniae and the 2 Haemophilus spp. -H. ducreyi and H. parasuis, as observed in our phylogenetic tree (Fig. 3). Given their common habitat in the pig respiratory tracts, horizontal gene transfer between A. pleuropneumoniae and H. parasuis, especially in pig farms where antibiotics are widely and commonly applied may act as a selective pressure for antibiotic tolerance strains and pose potential risks to global husbandry.

Potential horizontal transferring genes analysis
It has been suggested that the integration of phage elements, as a strategy of horizontal gene transfer, play a potentially important role in genetic diversity and virulence variations in many bacteria [33]. The phage-related genes found in the H. parasuis ZJ0906 genome may also be a putative contributor to virulence and inheritance differences. For H. parasuis ZJ0906, 615 (24.76%) out of 2,484 Glimmer-predicted genes were annotated via Blastp search against the ACLAME database. Amongst these, 31 genes are potentially derived from phages, 391 genes from plasmids and 193 genes from prophages (Table 1), which are comparatively more than predicted in strain SH0165 genome. As previously noted in the synteny illustration (File S3), although the numbers were similar, the localization of prophage islands in the two genomes varies. Instead of an organization of three major islands (sizes over 9 kb) as described previously in strain SH0165 genome [11], phage-related genes are comparatively more scattered throughout strain ZJ0906 genome except from the three regions designated as the ,5.8-kb Proph-I01 (from nucleotide positions of 728,879bp to 734, 713bp), ,7.1-kb Proph-I02 (from 1,637,525bp to 1,644,584bp) and ,47.77-kb Proph-I03 (from 1,768,270bp to  1,816,039bp). As strains ZJ0906 and SH0165 include ,9.02% and ,8.76% (of total Glimmer-predicted genes) of phage-related genes and some of the potential virulence factors annotated via VFDB database show overlapping with potential horizontal transferring genes (Fig. 1), implying that horizontal gene transfer may contribute to genetic variations and virulence variations among different strains.

Pathway analysis of H. parasuis ZJ0906
KEGG-annotated genes found only in H. parasuis strains ZJ0906 and SH0165 respectively were listed in Table 7 and the full list of KEGG database-annotated genes were attached as File S8.
1,829 genes were annotated and the metabolic pathways including glycolysis and gluconeogenesis, the tricarboxylic acid (TCA) cycle and pentose phosphate pathway were analyzed. The general metabolic pathways identified in strain ZJ0906 were found to be highly similar to that in strain SH0165. As previously noted by Xu's group [11], the Entner-Doudoroff pathway was also not encoded in H. parasuis strain ZJ0906. sgbE, sgbH and sgbU genes encoded by strain SH0165 in the pentose and glucuronate interconversion pathway were found missing in strain ZJ0906 genome, which suggest that the inter-conversion of L-ribulose-5P, Lxylulose-5P and 3-dehydro-L-gulonate-6P is either absent, or a novel pathway or enzymes might be involved. Carbon source utilization is highly conserved between the two strains, except the identification of ebgA gene in strain ZJ0906 that may offer it an additional carbon source of lactose. Strain SH0165 genome contains ascorbate-specific PTS system IIA and IIC components, which were not identified in stain ZJ0906 genome. Since homologs to IIB component were not found in SH0165 genome, the functionality of this PTS in ascorbate metabolism remains questionable. Unlike strain SH0165, ORFs encoding two enzymes involved in pyruvate metabolism and amino acid metabolismoxaloacetate decarboxylase, serine dehydratase -were found in strain ZJ0906, which enables important link formation between amino acid metabolism and pyruvate conversion, and subsequent TCA cycle and energy release. As a facultative anaerobe, it is not surprising that similar to strain SH0165, strain ZJ0906 also contains the napF/D/A/G/H/B/C operon encoding putative periplasmic nitrate reductase for anaerobic respiration. In addition, the same 3 two-component regulatory systems -cpxA/ R, arcA/B and qseB/C -were annotated in strain ZJ0906, in which arcA/B genes potentially encode an anoxic redox control regulatory system. Likewise, heme biosynthetic pathway is fully conserved between the two strains while nicotinamide adenine dinucleotide (NAD) biosynthetic pathway is absent in both, as expected from their growth requirement with NAD (factor V) and without iron porphyrin (factor X) supplement. Interestingly, gamma-glutamyltranspeptidase gene (ggt) involved in glutathione metabolism was only found in strain ZJ0906. This key enzyme was previously reported to act as an important regulator for intracellular homeostasis of oxidative stress, osmotic stress, and utilizing nutrients and facilitating growth in cysteinelimited habitats in other bacteria [34][35][36], hence its presence may influence capability of the strains in host survival. On the other hand, SH0165 genome encodes two enzymes involved in LPS biosynthesis that are absent in ZJ0906 -waaB and waaU. Previous studies have suggested minimal correlation between LPS and pathogenicity and host immunological responses in H. parasuis [6], yet LPS was shown to be an important contributor to pathogenicity as an endotoxin causing thrombosis in host blood circulation [37], hence difference in LPS may influence pathogenicity instead of protection against host.

Conclusion
In present study, the complete genome of Haemophilus parasuis serotype 12 strain ZJ0906 from China, a highly virulent field strain of the etiological agent of swine Glä sser's disease, was sequenced. The length of the genome is ,2.3 million base pairs with genomic GC content of 40.06%. It contains 2,484 Glimmer-predicted ORF, of which 2,352 (,94.7%) were annotated by NCBI nr blast, 1,745 by COG database and 1,829 by KEGG database. 109 potential virulence factors were annotated and 3 of which are potentially related to antibiotic resistance. This strain shared a nearly identical panel of potential virulence factors, drug resistant genes and PAI-like regions as serotype 5 strain SH0165. It was also found that gene rearrangements are frequent between the two strains, which may lead to variations in pathogenicity and crossprotection among serovars. These information should be useful to understand the mechanisms of the metabolic capabilities and pathogenicity of H. parasuis and develop a new vaccine to control this disease in the future.

Supporting Information
File S1 NCBI nr annotation for ZJ0906 genome.