Comparative genomic and methylome analysis of non-virulent D74 and virulent Nagasaki Haemophilus parasuis isolates

Haemophilus parasuis is a respiratory pathogen of swine and the etiological agent of Glässer's disease. H. parasuis isolates can exhibit different virulence capabilities ranging from lethal systemic disease to subclinical carriage. To identify genomic differences between phenotypically distinct strains, we obtained the closed whole-genome sequence annotation and genome-wide methylation patterns for the highly virulent Nagasaki strain and for the non-virulent D74 strain. Evaluation of the virulence-associated genes contained within the genomes of D74 and Nagasaki led to the discovery of a large number of toxin-antitoxin (TA) systems within both genomes. Five predicted hemolysins were identified as unique to Nagasaki and seven putative contact-dependent growth inhibition toxin proteins were identified only in strain D74. Assessment of all potential vtaA genes revealed thirteen present in the Nagasaki genome and three in the D74 genome. Subsequent evaluation of the predicted protein structure revealed that none of the D74 VtaA proteins contain a collagen triple helix repeat domain. Additionally, the predicted protein sequence for two D74 VtaA proteins is substantially longer than any predicted Nagasaki VtaA proteins. Fifteen methylation sequence motifs were identified in D74 and fourteen methylation sequence motifs were identified in Nagasaki using SMRT sequencing analysis. Only one of the methylation sequence motifs was observed in both strains indicative of the diversity between D74 and Nagasaki. Subsequent analysis also revealed diversity in the restriction-modification systems harbored by D74 and Nagasaki. The collective information reported in this study will aid in the development of vaccines and intervention strategies to decrease the prevalence and disease burden caused by H. parasuis.


Introduction
Haemophilus parasuis is a small, Gram negative, non-motile, pleomorphic rod-shaped, and nicotinamide adenine dinucleotide (NAD)-dependent bacterium of the Pasteurellaceae family [1,2]. H. parasuis is a respiratory pathogen affecting swine and is the etiological agent of Glässer's disease, a systemic infection resulting in arthritis, polyserositis (inflammation of serous membranes), and meningitis [2][3][4]. Additionally, H. parasuis infections can lead to pneumonia without signs of systemic disease in swine [5][6][7]. The morbidity and mortality caused by H. parasuis is a significant source of economic loss to the swine industry worldwide. Serotyping based on the production of heat-stable antigens, in which capsular polysaccharide is presumed to be the dominant component of the serotyping antigen, is routinely used for isolate classification and epidemiological purposes as well as for guidance in regards to vaccination strategies. Fifteen serovars of H. parasuis have been defined, however, a substantial percentage of clinical isolates are identified as nontypeable (NT) using conventional indirect hemagglutination (IHA) methods [8,9]. Progress to alleviate this problem has been made with the determination of the nucleotide sequence of the capsule locus from fifteen serovar reference isolates, which has been used to develop molecular serotyping methods [10][11][12].
H. parasuis isolates can exhibit different virulence capabilities ranging from lethal systemic disease to subclinical carriage. Numerous studies have focused on the identification of virulence factors that enable some isolates to cause systemic disease, distinguishing them from isolates that remain colonizers of the upper respiratory tract. Examples of potential virulence factors that have been evaluated to date include capsule production, outer membrane proteins (OMPs), trimeric autotransporters, and regulatory proteins QseC and OxyR [13][14][15][16][17][18][19][20][21][22]. Despite the advancement in our understanding of the pathogenic mechanisms used by H. parasuis, a direct link between specific virulence factors and the ability to cause systemic disease has not been demonstrated. Accordingly, virulence is thought to be multifactorial [2][3][4]. Data directly linking specific genes to disease outcomes has been hindered by several substantial complications, the most notable being difficulties in genetic modification of the chromosome due to low transformation efficiencies attributed to strain specific restriction modification barriers, as well as difficulties in consistently reproducing Glässer's disease in conventionally raised pigs due to confounding factors such as age, health status, differences in maternal antibody titers towards H. parasuis, and coinfection with other respiratory pathogens [23][24][25][26].
There are no effective approaches to eradicate H. parasuis from pig herds and controlling outbreaks has proven difficult [2,27]. Although vaccines have been developed, most are comprised of bacterins, resulting in poor heterologous protection. Consequently no broadly protective vaccines or intervention strategies exist [28][29][30]. The current treatment for H. parasuis is broad spectrum antibiotics, which are expensive and are believed to increase the risk of resistant strain development [29,[31][32][33]. Additionally, with increased pressure to limit antibiotic use in agriculture, alternative approaches are desperately needed to reduce disease burden and economic losses caused by H. parasuis.
In a previous effort to link genomic differences to disease outcome, draft genome sequence data was obtained for ten genetically distinct isolates along with the evaluation of virulence in Caesarean-derived, colostrum deprived (CDCD) pigs [34]. These results demonstrated that strain D74 is a non-virulent colonizer of the upper respiratory tract, while in contrast, strain Nagasaki was highly virulent and capable of causing systemic disease [34]. Many genomic differences, including gene content and/or nucleotide variation, were identified that could account for the phenotypic difference between the strains [34]. Unfortunately many genes or regions of interest within each strain were incomplete, preventing a reliable one-to-one assignment and subsequent comparison of any predicted protein structure. In order to definitively identify and characterize the genomic differences between the highly virulent Nagasaki strain and the non-virulent D74 strain, the goal of our report was to obtain the closed whole-genome sequence and genome-wide methylation patterns between these phenotypically distinct strains.

Phenotypic analysis
Phenotypic antibiotic resistance was determined using the broth microdilution method by Iowa State University Veterinary Diagnostic Laboratory following standard operating procedures. Each isolate was tested using the Trek BOPO6F plate (Thermo Fisher Scientific Inc., Oakwood Village, OH) and minimum inhibitory concentrations (MICs) were determined. MICs were evaluated in accordance with Clinical Laboratory Standards Institute (CLSI) recommendations for resistance interpretations.

Genomic antimicrobial resistance (AMR) analysis
ResFinder 2.1 from the Center for Genomic Epidemiology (http://www.genomicepidemiology. org/) and the Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster. ca/home) were employed for AMR determinant identification. Genomes submitted to ResFinder 2.1 were evaluated for AMR determinants using starting parameters of a threshold ID of 90% and a minimum length of 60% and final parameters of a threshold ID of 30% and a minimum length of 20%. Genomes submitted to CARD were evaluated for AMR determinants using the criteria "default-perfect and strict hits only".

Capsular loci analysis
The nucleotide sequences of genes within the capsule loci of Nagasaki and D74 reported by Howell et al. [10] were obtained from NCBI. Alignments of individual gene and protein sequences, as well as calculation of percent identities, were performed using the Geneious Alignment tool in Geneious 10.1.3 (Biomatters Ltd., Auckland, New Zealand). Global alignment with free end gaps parameters were used for both nucleotide and protein sequence alignments followed by determining the percent identity relative to Howell et al. [10]. Nucleotide and amino acid insertions are reported using HGVS nomenclature [44].

cdiA gene analysis
The nucleotide sequences of genes within the cdiA region of D74 were evaluated using BLASTX. Translated coding sequences were extracted and evaluated for the occurrence of any domain using the Pfam 31.0 database (https://pfam.xfam.org/) [45].

vtaA gene analysis
To ensure that all potential vtaA genes were identified in both Nagasaki and D74 genome annotations, the translated coding sequences were extracted and batch queried using the Pfam 31.0 database (https://pfam.xfam.org/) [45] for all occurrences of a YadA_anchor domain (PF03895) [15]. Geneious 10.1.3 (Biomatters Ltd., Auckland, New Zealand) was employed to evaluate and compare the genome location of vtaA genes in both Nagasaki and D74, including flanking upstream and downstream genes. The domain architecture and content of each vtaA gene from both Nagasaki and D74 was further evaluated for the occurrence of any domains using the Pfam 31.0 database (https://pfam.xfam.org/) [45]. BLASTN was employed to search sequences upstream of vtaA genes in both Nagasaki and D74 for the occurrence of the Nagasaki vtaA4 promoter sequence identified by Pina et al. [15].

Methylation analysis
Detection of modified bases (m6A, m4C, m5C) and clustering of modified sites to identify methylation associated motifs was performed using the RS_Modification_and_Motif_analysis.1 tool from the SMRT analysis package version 2.3.0. Briefly, raw reads were aligned to the complete genomes of D74 and Nagasaki and interpulse duration (IPD) ratios were measured for all pulses aligned to each position in the reference sequence (http://www.pacb.com/pdf/ TN_Detecting_DNA_Base_Modifications.pdf). [46]

SSR analysis
The nucleotide sequences of all putative RM genes identified in D74 and Nagasaki were evaluated for the occurrence of SSRs, tandem repeats or homopolymeric tracts of consisting of five or more bases, within the coding region and in the region encompassing 150 bp upstream of the putative start codon using Geneious 10.1.3 (Biomatters Ltd., Auckland, New Zealand).

Accession number(s)
The whole-genome sequences for these isolates were deposited in DDBJ/ENA/GenBank with the accession numbers CP018034 for Nagasaki and CP018032 for D74 genome and CP18033 for the plasmid sequence. The sequence data, target sequences and associated details for methylation enzymes, used for analyses in this report have been deposited in the REBASE database (www.rebase.neb.com) [47]. RM-system and methylation motifs for both strains can be accessed via the index of the REBASE database (http://tools.neb.com/genomes/) or directly via this link: http://rebase.neb.com/cgi-bin/pacbioget?20940 for D74 or http://rebase.neb.com/ cgi-bin/pacbioget?20939 for Nagasaki.

Genome features of H. parasuis strains D74 and Nagasaki
The complete genome assembly and annotation of H. parasuis strain Nagasaki contains a single circular chromosome 2,348,962 base pairs (bp) in length, encodes a total of 2,268 predicted protein coding sequences (CDSs), and a G+C content of 40.0% (Table 1). Of the 2,268 CDSs, 95 were predicted to be pseudogenes. The complete genome assembly and annotation of H. parasuis strain D74 encompasses a single circular chromosome 2,467,568 base pairs (bp) in length, a total of 2,252 total predicted protein coding sequences (CDSs), and a G+C content of 39.7% (Table 1). Similar to Nagasaki, 95 out of the 2,252 CDSs were predicted to be pseudogenes. Despite the equivalent number, none of the predicted pseudogenes were similar between the stains (S1 Table and S2 Table). The difference in rRNA numbers between D74 and Nagasaki is consistent with other closed H. parasuis genomes available in GenBank ( Table 1). The genomes of H. parasuis Nagasaki and D74 were assessed using PHASTER (http://phaster.ca/) [48] for the occurrence of phage regions. Nine phage regions, seven intact and two incomplete, were identified along the Nagasaki chromosome (Table 2). Six phage regions, one intact and five incomplete, were also identified along the D74 chromosome ( Table 1). The phage regions identified are unique to each genome. The chromosomal location, including start and end nucleotide positions, length of phage region, and classification of phage regions for both strains are summarized in Table 2. An 11,595-bp circular plasmid was additionally identified in the complete genome assembly and annotation of H. parasuis strain D74. With the addition of the plasmid, the total nucleotides, including both the circular chromosome and plasmid, is 2,479,163 bp for strain D74. The plasmid harbored by strain D74, designated as pD74, contains seven CDSs with predicted functions based on sequence homology, including parA, rec, and repB, which have predicted functions in plasmid replication. Additionally, pD74 harbors six CDSs of unknown function and one predicted pseudogene (Fig 1). These CDSs lacked a reciprocal match in Nagasaki. A region containing four copies of a 22-bp tandem repeat sequence was identified upstream of the repB CDS, which could potentially comprise the origin of replication (Fig 1). The sequence spanning from 3,328 to 1,243 bp comprises the entire 9,462 bp sequence of H. parasuis plasmid pHS-Rec (accession no. AY862436) with 99.2% sequence identity (Fig 1) [49]. The sequence spanning from 5,847 to 1,025 bp shares 99.4% sequence identity with Pasteurella trehalosi plasmid pCCK13698 (accession no. AM183225) (Fig 1) [50]. pD74 sequencing from 1,246 to 5,350 bp shares 73.9% sequence identity with a chromosomal region of Gallibacterium anantis strain UMN179 (accession no. CP002667) (Fig 1) [51].

Comparison of the genome sequences of H. parasuis strains D74 and Nagasaki
A reciprocal or one-to-one BLASTP comparison of the protein coding sequences in Nagasaki and D74 identified 1,705 shared CDSs between the strains (Fig 2). Of the 2,173 functional CDSs in strain Nagasaki, 366 CDSs lacked a reciprocal match in D74 and were designated unique or Nagasaki-specific (Fig 2A). Conversely, 324 CDSs, out of the 2,157 functional CDSs in strain D74, lacked a reciprocal match in Nagasaki and were designated unique or D74-specific (Fig 2A). Comparison of the linear organization of the genomes of Nagasaki and D74 revealed many genome re-arrangements and inversions ( Fig 2B). The alignment additionally resulted in 73 locally collinear blocks (LCBs), the largest of which is 520,607 bp ( Fig 2B).
The genomes of H. parasuis Nagasaki and D74 were assessed for the presence of insertion sequence (IS) elements using ISfinder (https://www-is.biotoul.fr/) [42]. IS elements found previously in other H. parasuis and Pasteurellaceae genomes were identified in both D74 and Nagasaki genomes and these IS elements contained the greatest similarity to sequences from the IS1595 family. Specifically, four ISHps3 and three ISHps4 IS elements were identified in D74, while one ISHps3 and four ISHps4 IS elements were identified in Nagasaki. In silico analysis revealed the occurrence of additional transposases annotated as belonging to other families, such as IS256 and IS110, in both D74 and Nagasaki genomes that were not identified by . Arrows indicate annotated CDSs; blue arrows represent CDSs with predicted functions based on sequence homology, pink arrows represent predicted CDSs of unknown function. The MFS transporter gene, labeled with a blue-grey arrow, is predicted to be a pseudogene. Dark grey box indicates region containing 4 copies of a 22-bp tandem repeat sequence identified using the Tandem Repeats Finder tool [41]. Arcs inside map indicate sequence with similarity to other sources; green arc represents sequence similar to H. parasuis plasmid pHS-Rec [49], purple arc represents sequence similar to a portion of the Pasteurella trehalosi plasmid pCCK13698 [50], and yellow arc represents sequence similar to a region from the genome sequence of Gallibacterium anantis strain UMN179 [51]. ISfinder analysis, given that it typically does not identify frameshifted or truncated IS units. These transposases were annotated as pseudogenes and therefore are likely nonfunctional. Of the five complete IS units identified in Nagasaki, two are within regions of conserved sequence order and have homologs in D74. Three of the intact IS units identified in Nagasaki are located at the border of sequence rearrangement, with one located at the border of one of the phage regions. Six complete IS units were identified in D74 with two located at the border of rearranged regions. Of the four IS units that are present within regions of conserved sequence order, two are within regions of conserved sequence order and have homologs in Nagasaki and two are small sequence insertions located within a conserved region in which only the IS unit is the only sequence not conserved within the region. Clusters of regularly interspaced short palindromic repeats (CRISPR) and spacer sequences in the genomes of both D74 and . Venn diagram demonstrating the unique 2,173 protein-encoding genes in Nagasaki (blue), the 2,157 protein-encoding genes D74 (yellow), and the shared coding sequences defined as bi-directional best hits (center circle, green) via 1-to-1 reciprocal BLASTP comparison using RAST [38] (excluding frame-shifted genes). (B) Comparison of the linear organization of the H. parasuis strains Nagasaki and D74 chromosomes. Mauve [40] was used to compare the chromosomes of Nagasaki and D74. Locally collinear blocks (LCBs) representing regions of sequence that align in each genome are illustrated as colored rectangles connected by lines. Nagasaki was used as the reference sequence. In D74, LCBs placed above the center line are in the same orientation as in Nagasaki and LCBs placed below the center line are in the reverse orientation relative to Nagasaki. Blank sections are regions that did not align and are likely to contain unique or strain-specific sequence. Many of the larger blank regions not in LCBs contain loci predicted to encode products of phage origin. https://doi.org/10.1371/journal.pone.0205700.g002 Nagasaki were screened using CRISPRFinder (http://crispr.i2bc.paris-saclay.fr) [43]. No evidence for CRISPR sequences were found in either genome.
Phenotypic antibiotic resistance was determined for H. parasuis D74 and Nagasaki and results are summarized in S3 Table. Nagasaki exhibited phenotypic resistance to clindamycin while D74 exhibited "intermediate" limited phenotypic resistance to clindamycin; no other resistance was identified. The genomes of H. parasuis Nagasaki and D74 were screened for the presence of acquired resistance genes and known chromosomal mutations conferring clindamycin resistance. No erm, lnu, or other known resistance determinants were identified in either genome.

Capsule loci in H. parasuis strains D74 and Nagasaki
Due to the importance of capsular polysaccharide in serotyping or strain classification, virulence, and vaccine related research areas, the CDSs located within the capsule loci of Nagasaki and D74 were compared to nucleotide sequences reported by Howell et al. [10]. Several nucleotide and amino acid differences were identified and are summarized in S4 Table and S5  Table. Eleven out of sixteen predicted proteins within the D74 capsule locus were found to contain amino acid differences compared to the sequences reported by Howell et al. [10]. The genes harboring the most or more noteworthy changes for D74 include wzx2, astA, gltL, and wzs (S4 Table). The predicted amino acid changes in AstA relative to the Howell et al. [10] sequence included a two amino acid insertion and are: N98_H99insHId, Q134R, S157A, V177I, V183I, I189V, S195N, and V2056M. In contrast, the predicted amino acid sequence for GltL is shorter than that reported in Howell et al. [10] due to a different predicted start codon. The predicted amino acid changes in GltL are M1_K8del and V9M. The predicted amino acid changes in Wzx2 relative to the Howell et al. [10] sequence are: V176A, I180V, A187V, S192N, R278H, P291S, and R373S (S4 Table). The predicted amino acid changes in Wzs relative to the Howell et al. [10] sequence are: E3K, L27I, V230A, V613A, and V622A (S4 Table).
In H. parasuis Nagasaki, twelve out of fourteen predicted proteins within the capsule locus were found to contain amino acid differences compared to the sequences reported by Howell et al. [10]. The genes harboring the more notable changes for Nagasaki include funA, wcfQ, and wbgX (S5 Table). FunA is predicted to result in a longer protein sequence compared to Howell et al. [10] due to a different predicted start codon and five amino acid changes within the shared region, along with 37 additional amino acids on the N-terminus. The predicted amino acid changes in FunA are M1ext-37, A2V, A6V, D27V, S28N, and V138I (S5 Table). The predicted amino acid changes in WcfQ relative to the Howell et al. [10] sequence are: E43K, I58V, S61T, I64T, F106S, D131G, F132S, N194K, Y262C, R269N, F270N, and L271F (S5 Table). WbgX is predicted to encode a shorter protein sequence compared to Howell et al. [10] due to a different predicted start codon. The predicted amino acid changes in WbgX are: M1del, K2M, E3N, F4Y, T177A, S343N, P344A, and A349I (S5 Table).

Virulence-associated genes identified in H. parasuis strains D74 and Nagasaki
To identify factors that could support host colonization and/ or virulence, the D74 and Nagasaki annotations were searched for CDSs encoding predicted functions in adhesion, hemolysis, secretion, toxin production, or other virulence-associated roles. Twenty-one CDSs encoding predicted adhesins were identified in D74 including five outer membrane genes (ompA, ompP1, ompP2, ompP5, and ompD15), two fimD fimbrial usher genes located adjacent to each other along the chromosome, pertactin family virulence factor aidA, filamentous hemagglutinin transporter fhaC, adhesin autotransporter bmaC, and two type IV pili genes (Table 3). Corresponding orthologs for pilT and bmaC were not found in Nagasaki. Gene leuC (A2U21_06460) in Nagasaki was the uni-directional best match to pilT (A2U20_03770) in D74 with 56% global nucleotide sequence identity and gene aidA2 (A2U21_04065) in a The Hit column contains a '-' (no hit), 'uni' or 'bi' RAST server results from a one-to-one BLASTP comparison of the protein coding sequences in the Nagasaki genome using the D74 genome as the reference. "bi" represents a bidirectional best hit in which the reverse hit from the Nagasaki comparison genome to the D74 reference genome was also the best hit. "uni" indicates a uni-directional hit in which the reverse hit from the comparison genome to the reference genome was not also the best hit. "-"indicates no hit or match was found. b Global pairwise nucleotide percent sequence identity. https://doi.org/10.1371/journal.pone.0205700.t003 Nagasaki was the uni-directional best match to bmaC (A2U20_08910) in D74 with 51% global nucleotide sequence identity (Table 3). Eighteen CDSs encoding predicted adhesins were identified in Nagasaki including five outer membrane genes (ompA, ompP1, ompP2, ompP5, and ompD15), one fimD fimbrial usher gene (chromosomally adjacent to a predicted pseudogene fimB), two pertactin family virulence factor genes aidA and aidA2, and two type IV pili (Table 4). In silico analysis of the region in D74 containing the two fimD genes compared to the fimD gene in Nagasaki indicated that the D74 fimD2 (A2U20_08925) aligns with the 5' end of the Nagasaki fimD (A2U21_03995) and the D74 fimD (A2U20_08920) aligns with the 3' end of the Nagasaki fimD gene. This suggests that the two fimD genes in D74 could have potentially arisen from a frameshift within a single gene. A 51% sequence identity was observed between corresponding orthologs esiB (A2U20_01865) in D74 and esiB2 (A2U21_08055) in Nagasaki (Table 3 and Table 4). Eight CDSs encoding predicted hemolysins were identified in D74 including two (locus tags A2U20_06885 and A2U20_07380) unique to strain D74 (Table 3). D74 harbors two putative Serralysin B genes, prtB and prtB2, and a putative Serralysin C gene prtC. In contrast, only putative Serralysin B gene prtB was identified in strain Nagasaki (Table 4). A notable size difference appears to exist between the proteins encoded by both prtB, prtB2, and prtC in D74, 1,954, 1,703, and 1,911 amino acids respectively, compared to the putative Serralysin B 910 amino acid protein encoded by prtB in Nagasaki. Nine CDSs encoding predicted hemolysins were identified in Nagasaki, five of which were identified as unique to Nagasaki. These include hpmA (A2U21_11150) and locus-tags A2U21_00465, A2U21_11140, and A2U210_11145. Nagasaki harbors two putative hemolysin transporter genes shlB1 and shlB2 (chromosomally located next to each other), while D74 contains only shlB1 gene (Tables 3 and 4). A notable size difference was also observed for the shlB gene in D74 encoding a putative 582 amino acid protein compared to the putative 329 amino acid protein encoded by the shlB in Nagasaki.
Two CDSs encoding genes predicted to function in secretion were identified in D74, A2U20_09675 and esiB, a putative secretory immunoglobulin A-binding encoding gene, while three were identified in Nagasaki. These include A2U21_02375 and two putative secretory immunoglobulin A-binding encoding genes esiB and esiB2, chromosomally located next to each other.
Thirty-nine CDSs encoding predicted toxins were identified in D74 and forty-six CDSs encoding predicted toxins were identified in Nagasaki. A noticeable difference between the CDSs encoding predicted toxins identified in D74 and Nagasaki was the seven cdiA genes encoding putative contact-dependent growth inhibition toxin A harbored only in strain D74 (Table 3). These cdiA genes harbored by D74 are discussed in more detail below.
Both strains harbored a large number of toxin-antitoxin (TA) systems. TA systems are small genetic elements comprised of two components, a stable protein toxin and its more labile antagonistic antitoxin, which can be a protein or non-coding RNA [52]. TA systems were originally identified as plasmid-borne loci, which functioned to promote plasmid maintenance by killing daughter cells that lacked the TA encoded plasmid [52]. TA loci were subsequently discovered in numerous bacterial and archaeal chromosomes and provide several functions, such as stabilization of genomic regions, anti-addiction against similar plasmid-borne toxins, defense against phage infection, biofilm formation, control of the stress response, and bacterial persistence [52][53][54]. Six types of TA systems (types I to VI) have been described to date based on the type (either RNA or protein) and mode of action of the antitoxin [53]. The Type II TA system is highly abundant among prokaryotes and has been extensively studied [52,[54][55][56]. In type II TA systems, both toxin and antitoxin are small proteins encoded by genes in a bicistronic operon [52,[54][55][56]. The antitoxin blocks the toxicity of the toxin by forming a complex with it [52,[54][55][56]. Both D74 and Nagasaki contain several Type II TA families including relBE, mazEF, vapBC, and higBA, which was the most abundant with 10 putative higBA loci identified in D74 and 6 putative higBA loci identified in Nagasaki, with a varying degree of similarity (Table 3 and Table 4). Eight orthologs of other virulence-associated proteins were identified in both strains (Table 3 and Table 4). A 75% sequence identity was observed between corresponding orthologs espP (A2U20_01700) in D74 and espP2 (A2U21_09655) in Nagasaki and a 78% sequence identity was observed between corresponding orthologs espP2 (A2U20_01715) in D74 and espP (A2U21_09640) in Nagasaki (Table 3 and Table 4). Additionally a notable size difference a The Hit column contains a '-' (no hit), 'uni' or 'bi' RAST server results from a one-to-one BLASTP comparison of the protein coding sequences in the Nagasaki genome using the D74 genome as the reference. "bi" represents a bidirectional best hit in which the reverse hit from the Nagasaki comparison genome to the D74 reference genome was also the best hit. "uni" indicates a uni-directional hit in which the reverse hit from the comparison genome to the reference genome was not also the best hit. "-"indicates no hit or match was found. b Global pairwise nucleotide percent sequence identity. https://doi.org/10.1371/journal.pone.0205700.t004 was also observed for the putative proteins encoded by espP (A2U20_01700), 1,071 amino acids, and espP2 (A2U20_01715), 985 amino acids, in D74, compared to the putative proteins encoded by espP (A2U21_09640), 781 amino acids, and espP2 (A2U21_09655), 772 amino acids, in Nagasaki.

cdiA genes encoding putative contact-dependent growth inhibition proteins identified in H. parasuis D74
Contact-dependent growth inhibition (CDI) is a process used by Gram-negative bacteria to deliver diverse growth inhibiting nuclease toxins into the cytoplasm of neighboring cells upon cell-cell contact [57][58][59][60][61]. CDI is mediated by the CdiB and CdiA proteins, which are members of the TpsB and TpsA two-partner secretion (TPS) group of proteins [61][62][63]. CdiB facilitates secretion of the CdiA "exoprotein" onto the cell surface [61,64]. CdiA then binds to specific outer-membrane receptors on susceptible bacteria and transfers its C-terminal toxin domain (CdiA-CT) into the target cell [58][59][60][61]. CDI+ bacteria also produce small immunity proteins (CdiI) that protect them from toxin delivered by neighboring cells of closely related species or sibling cells by binding to the CdiA-CT and neutralizing its toxin activity [58][59][60][61].
CdiA proteins share a number of characteristics in common with the TpsA family protein FHA [61,67]. CdiA proteins typically contain an N-terminal region homologous to FHA containing the TPS domain required for interaction with the TpsB partner, CdiB or FhaC respectively, and haemagglutinin repeats that are predicted to form a β-helical structure [61,67]. In addition, most cdiA homologues encode the VENN peptide motif, which delineates the beginning of C-terminal toxin domain as well as the conserved and variable regions [61,67]. The predicted proteins encoded by the cdiA genes in D74 were evaluated for the presence of these domains. Domains identified in CdiA include a ESPR signal peptide domain (PF13018) required for export and multiple haemaglutination activity domains, specifically one Haemag-g_act domain (PF05860), seven Fil_haemagg domains (PF05594), two Fil_haemagg_2 domains (PF13332) (Fig 3B). CdiA2 is similar to CdiA and contains a ESPR signal peptide domain (PF13018) required for export and multiple haemaglutination activity domains, including one Haemagg_act domain (PF05860), ten Fil_haemagg domains (PF05594), two Fil_haemagg_2 domains (PF13332) (Fig 3B). CdiA3 and CdiA6 are similar to each other and contain a pre-toxin domain with VENN motif that marks the beginning of the C-terminal toxin domain similar to other previously reported CdiA proteins (Fig 3B). CdiA4 and CdiA5 are also similar to each other and both contain a Fil_haemagg_2 domain (PF13332) (Fig 3B). CdiA7 is the largest CdiA protein in D74 and contains a ESPR signal peptide domain (PF13018) required for export and multiple haemaglutination activity domains, including one Haemagg_act domain (PF05860), thirteen Fil_haemagg domains (PF05594), two Fil_hae-magg_2 domains (PF13332), a pre-toxin VENN domain (PF04829), and a EndoU_bacterial nuclease domain (PF14436) (Fig 3B). Overall, cdiA3, cdiA4, cdiA5, and cdiA6 are similar to previously reported "orphan" cdiA genes as they encode much smaller proteins that lack a conserved export signal and other functional domains. The region containing the orphan cdiA genes was further evaluated and no additional cdi-related protein domains were found beyond The three genomic regions containing putative cdiA genes are depicted. Gene function was assigned based on the results of BLASTX searches. Each arrow represents a gene within the locus; direction of the arrow indicates orientation within the closed genome sequence. Dark blue arrows represent the cdiA genes, red arrows represent cdiB homologues, light blue arrows represent potential "orphan" cdiA genes, yellow arrows represent putative cdiI candidates, and grey arrows represent genes predicted to have functions relating to horizontal gene transmission, black arrows represent genes whose predicted function is unrelated to contact-dependent inhibition. Genomic regions are not shown to scale. (B) Domain architecture of the predicted CdiA proteins. Domain content of the CdiA proteins was determined using a pfam database search. Grey boxes represent ESPR signal peptide domains (PF13018), green boxes represent Haemagg_act domains (PF05860), light blue boxes represent Fil_haemagg domains (PF05594), orange boxes represent Fil_haemagg_2 domains (PF13332), red boxes represent PT-VENN domains (PF04829), and purple boxes represent EndoU_bacteria domains (PF14436).
https://doi.org/10.1371/journal.pone.0205700.g003 those depicted in Fig 3B for CdiA3, CdiA4, CdiA5, and CdiA6. This indicates that the orphan cdiA genes are not the result of frameshift or indel mutations causing disruption of a larger intact cdiA gene(s). This genomic organization of a cdi locus containing one or more orphan cdiA genes downstream of full-length cdiA is common in many species of bacteria [65]. In contrast, CdiA, CdiA2, and CdiA7 are larger and contain most of the functional domains harbored by previously characterized CdiA-CT proteins; however, CdiA7 shares more similarity to other CdiA-CT proteins in that it additionally contains a PT-VENN motif and a recognizable nuclease domain at C-terminus.

vtaA family of trimeric autotransporter genes identified in H. parasuis strains D74 and Nagasaki
Pina et al. first identified thirteen proteins encoded by the vtaA family of trimeric autotransporter genes in strain Nagasaki based on the occurrence of a C-terminal YadA anchor domain, which defines this family of proteins [15]. That study also identified 17 homologues harbored by other H. parasuis strains with relatively conserved sequence within the passenger domain among vtaA homologues from pathogenic isolates and a high degree of divergence among non-virulent isolates [15]. The authors subsequently named these genes vtaA or virulence-associated trimeric autotransporter genes and classified the proteins encoded by vtaA genes into three groups based on sequence comparison of the C-terminal YadA anchor domain, with groups 1 and 2 being strongly associated with virulent H. parasuis isolates [15]. Previous studies have demonstrated that VtaA proteins are involved in virulence as well as being immunogenic, are produced during an infection, and are capable of conferring protection [68][69][70]. Recently, a comparison between the D74 and Nagasaki draft genomes identified only three vtaA genes harbored by D74 compared to thirteen harbored by Nagasaki [34]. Unfortunately, many vtaA genes identified within each strain were incomplete in the draft sequences, preventing a reliable one-to-one assignment of the vtaA-like ORFs to specific vtaA genes and subsequent evaluation of the predicted protein structure.
To ensure identification of all potential vtaA genes, all protein coding sequences in Nagasaki and D74 were searched for the occurrence of a YadA anchor domain and no additional YadA anchor domain containing proteins were identified. The 13 vtaA genes identified in Nagasaki have been named according to their location along the chromosome and are listed in Table 5 along with their respective name and group originally assigned by Pina et al. [15]. Similarly, the 3 vtaA genes identified in D74 have been named according to their location along the chromosome and are also listed in Table 5. The predicted protein structure of all VtaA proteins for both Nagasaki and D74 were evaluated for known domain content (Fig 4). All Nagasaki VtaA proteins contain an N-terminal extended signal peptide or ESPR domain (PF13018) for Type V secretion, followed by 1-4 YadA head domains (PF05658), and 3-5 YadA anchor domains (PF03895). This region containing the head and stalk domains is referred to as the passenger domain region [71]. Following the passenger domain region, all Nagasaki VtaA proteins contain 2-8 collagen triple helix repeat domains (PF01391) followed by the C-terminal YadA anchor domain (PF03895) (Fig 4A). In contrast, none of the D74 VtaA proteins contain a collagen triple helix repeat domain (PF01391) and the predicted size for both VtaA_D1 (4054 AA) and VtaA_D2 (6778 AA) proteins is substantially larger than any predicted Nagasaki VtaA protein (Fig 4B). Additionally, VtaA_D3 differs from VtaA_D1 and VtaA_D2 proteins. VtaA_D3 contains a tryptophan-ring motif domain or TAA-Trp-ring (PF15401) and does not contain an N-terminal ESPR domain (PF13018) (Fig 4B). The absence of the ESPR domain suggests that VtaA_D3 may not be exported across the inner membrane. When we compared our VtaA predicted protein sequences to those reported by Pina et al. [15] two differences were identified and the other 11 out of 13 sequences were found to be 100% identical. The two differences identified were an amino acid change in VtaA_N9 (S1191G) relative to the previously reported sequence and an 18 amino acid insertion (AGPTGPQGPAGPTGQ DP) after amino acid 737 of VtaA8 that is not present in our VtaA_N4 sequence. The absence or presence of the insertion is located within the third collagen repeat domain and does not affect the presence or absence of this domain nor the total number of collagen repeat domains predicted for the two proteins.
When the genome regions in Nagasaki and D74 were compared, the sequence and gene content up and downstream of the vtaA gene locations in Nagasaki were similar to the reciprocal locations in D74. No evidence of extensive genome rearrangement at these locations was observed. A different gene or genes were identified in D74 at the same location of a reciprocal vtaA gene in Nagasaki, which could have arisen from small insertion events. Further evaluation of the vtaA genes in both strains suggested that they are not in an operon configuration. Pina et al. identified an active promoter upstream of vtaA_N3 [15]. In silico analysis indicated that the upstream region of all 13 Nagasaki vtaA genes contained highly similar promoter sequences to the vtaA_N3 promoter. A highly similar promoter sequence to the vtaA_N3 was additionally observed upstream of vtaA_D2 in D74. This sequence conservation implies similar expression and/or regulation mechanisms among these genes.

Methylation motifs and RM-systems in H. parasuis strains D74 and Nagasaki
In bacteria, the most common post-replicative modification of DNA is methylation by methyltransferase (MTase) enzymes resulting in three types of epigenetic markers: N6-methyladenine (m6A), N4-methylcytosine (m4C) and 5-methylcytosine (m5C) [47,72]. DNA methylation The Hit column contains a '-' (no hit), 'uni' or 'bi' RAST server results from a one-to-one BLASTP comparison of the protein coding sequences in the D74 genome using the Nagasaki genome as the reference. "bi" represents a bidirectional best hit in which the reverse hit from the D74 comparison genome to the Nagasaki reference genome was also the best hit. "uni" represents uni-directional indicates a hit in which the reverse hit from the comparison genome to the reference genome was not also the best hit. "-"indicates no hit or match was found. serves several key roles in bacterial processes, including mismatch repair, the timing of DNA replication, conferring protection against bacteriophages, and regulating gene expression. [73][74][75][76][77][78]. Analysis of the SMRT DNA sequencing kinetics was used to identify total base modifications in the genomes of H. parasuis D74 and Nagasaki, and the modified sequence motifs for each strain are summarized in Tables 6 and 7. A total of 15 sequence motifs were identified in strain D74, and N6-methyladenine (m6A) was the most prevalent type of modification detected (Table 6). Focusing on strain Nagasaki, a total of 14 recognition sites for methylation or sequence motifs were identified and, similar to D74, N6-methyladenine (m6A) was the most prevalent type of modification detected (Table 7). This analysis revealed a surprising degree of diversity in motifs observed between these closely related strains given that the methylation motif 5'-GA m6 TC-3' was the only motif shared or observed in both D74 and Nagasaki (Table 6 and Table 7).
The genomes of H. parasuis Nagasaki and D74 were assessed using the Restriction Enzyme Database REBASE (www.rebase.neb.com) [47] for determination of putative MTases involved with each motif and for comparisons with known modification systems. A total of 26 genes associated with restriction-modification systems were identified in H. parasuis D74, including 11 genes associated with Type 1 restriction-modification (RM) systems and 15 genes associated with Type II RM systems (Table 8). Genes associated with Type III RM systems were not identified in D74 (Table 8). REBASE predicted three recognition sequences corresponding to a specific motif detected by the SMRT sequencing analysis. REBASE analysis indicated that the putative Type I RM enzymes S.Hpa74III, Hpa74III, and M.Hpa74III were predicted to be responsible for the 5'-GTA m6 NNNNNNNCTTG-3' modification ( Table 8). The putative Type II RM enzymes M.Hpa74I and M.Hpa74IP were indicated by REBSE to be responsible for the motif 5'-A m6 A GCTT-3'modification ( Table 8). The putative Type II RM enzyme dam or Hpa74II was predicted to be responsible for the 5'-GA m6 TC-3'modification for D74 (Table 8).
The remaining two motifs detected by the SMRT sequencing analysis did not correspond with a REBASE predicted recognition sequence. Two putative Type II RM enzymes Hpa74ORFHP and M.Hpa74ORFHP were predicted by REBASE to recognize the sequence motif 5'-GGC C-3, which was not a motif detected by the SMRT sequencing analysis. Three putative RM enzyme genes identified in D74 by REBSAE analysis are predicted pseudogenes. These include S2.Hpa74ORFDP, Hpa74ORFJP, and M.Hpa74IP, associated with the Type II RM system indicated by REBSE to be responsible for the motif 5'-A m6 A GCTT-3'modification (Table 8). In contrast, none of the putative RM enzymes genes identified in Nagasaki by REBASE analysis are predicted pseudogenes. Focusing on H. parasuis Nagasaki, 34 genes associated with restriction-modification systems were identified, including 14 genes associated with Type 1, 15 genes associated with Type II, and 5 genes associated with Type III RM systems (Table 9). Only two of the REBASE predicted recognition sequences corresponded to a specific motif detected by the SMRT sequencing analysis. REBASE analysis indicated that the putative Type I RM enzymes M.HpaNNII, HpaNNIIP, and S.HapNNII were predicted to be responsible for the 5'-GTA m6 NNNNNNN CTTG-3' modification (Table 9). The putative Type II RM enzyme dam or M.HpaNNI was predicted to be responsible for the 5'-GA m6 TC-3'modification ( Table 9). The remaining five motifs identified by the SMRT sequencing analysis represent yet unknown recognition sequences.
Expression of MTases can undergo phase variation by slipped-strand mispairing due to the presence of simple sequence repeats (SSRs), such as homopolymeric tracts [79][80][81][82][83]. All of the putative RM genes identified in D74 (Table 8) and Nagasaki (Table 9) were search for the presence of SSRs within the coding region and in the region encompassing 150 bp upstream of the putative start codon. No SSRs were observed in the upstream region for five RM genes identified in D74 (A2U20_00805c, A2U20_10480, A2U20_10215, A2U20_11315, and A2U20_10200), while six homopolymeric tracts of consisting of five or more bases were observed in the upstream region of hsdR2 (A2U20_05260) (S6 Table). SSRs were observed within the coding region of all of the D74 RM genes and the numbers of SSRs ranged from two (A2U20_00580 and A2U20_11330) to 38 (A2U20_07465) (S6 Table). No SSRs were observed in the upstream region for six RM genes identified in Nagasaki (A2U21_00115, A2U21_01085, A2U21_04210, A2U21_03345, A2U21_03850, A2U21_06885), while six homopolymeric tracts of consisting of five or more bases were observed in the upstream region of A2U21_09985 (S7 Table). SSRs were observed within the coding region of all of the Nagasaki RM genes and the numbers of SSRs ranged from two (A2U21_10520) to 31 (A2U21_04210 and A2U21_00110) (S7 Table). While further studies are warranted, the occurrence of these homopolymeric tracts within these regions indicates the potential of these genes to undergo phase variation by slip strand mispairing.

Conclusions
This report provides the closed whole-genome sequence annotation and genome-wide methylation patterns for the H. parasuis non-virulent D74 strain and for the highly virulent Nagasaki strain. This collective information will enable reliable one-to-one assignment of specific genes of interest and subsequent evaluation of predicted protein structures. Highlights of the information gained from this study include the sequence and annotation of a plasmid harbored by strain D74 that shares a high degree of similarity to other plasmids harbored by members of the Pasteurellaceae family, which could prove useful in future allelic replacement and/or functional genomic studies. Evaluation of the virulence-associated genes contained within the genomes of D74 and Nagasaki led to the discovery of a large number of TA systems, primarily Type II TA families, within both genomes. Five predicted hemolysins were identified as unique to Nagasaki and seven putative contact-dependent growth inhibition toxin proteins were identified only in strain D74. Assessment of all potential vtaA genes revealed thirteen present in the Nagasaki genome and three in the D74 genome. Subsequent evaluation of the predicted protein structure revealed that none of the D74 VtaA proteins contain a collagen triple helix repeat domain and a much larger predicted amino acid size for two D74 VtaA proteins compared to any predicted Nagasaki VtaA protein. Fifteen methylation sequence motifs were identified in D74 and fourteen methylation sequence motifs were identified in Nagasaki using SMRT sequencing analysis. Only one of the methylation sequence motif was observed in both strains highlighting the diversity between D74 and Nagasaki. Subsequent analysis also revealed diversity in the restriction-modification systems harbored by D74 and Nagasaki. Our hope is that the assembly and annotation of these genomes, coupled with the comparative genomic analyses reported in this study, will aid in the identification of genetic elements that underlie and influence phenotypic differences between these isolates. Together, this information can support future research and the development of vaccines with improved efficacy towards H. parasuis in swine to decrease the prevalence and disease burden caused by this pathogen.
Supporting information S1