Identification and Characterization of msf, a Novel Virulence Factor in Haemophilus influenzae

Haemophilus influenzae is an opportunistic pathogen. The emergence of virulent, non-typeable strains (NTHi) emphasizes the importance of developing new interventional targets. We screened the NTHi supragenome for genes encoding surface-exposed proteins suggestive of immune evasion, identifying a large family containing Sel1-like repeats (SLRs). Clustering identified ten SLR-containing gene subfamilies, each with various numbers of SLRs per gene. Individual strains also had varying numbers of SLR-containing genes from one or more of the subfamilies. Statistical genetic analyses of gene possession among 210 NTHi strains typed as either disease or carriage found a significant association between possession of the SlrVA subfamily (which we have termed macrophage survival factor, msf) and the disease isolates. The PittII strain contains four chromosomally contiguous msf genes. Deleting all four of these genes (msfA1-4) to create a knockout (KO) resulted in a highly significant decrease in phagocytosis and survival in macrophages, which was fully complemented by a single copy of the msfA1 gene. In the chinchilla model of otitis media and invasive disease, the KO strain displayed a significant decrease in fitness compared to the wild type (WT) in co-infections, and in single infections the KO lost its ability to invade the brain. The singly complemented strain showed only a partial ability to compete with the WT, suggesting that gene dosage is important in vivo. The transcriptional profiles of the KO and WT in planktonic growth were compared using the NTHi supragenome array, which revealed highly significant changes in the expression of operons involved in virulence and anaerobiosis. These findings demonstrate that the msfA1-4 genes are virulence factors for phagocytosis, persistence, and trafficking to non-mucosal sites.


Introduction
The impact of the gram-negative coccobacillus Haemophilus influenzae on human health has changed significantly over the past several decades, and may continue to do so in the future. Prior to routine immunization against the highly virulent serotype b form (Hib), H. influenzae was a leading cause of pediatric bacterial meningitis and epiglottitis in the United States [1]. In the post-vaccine era, the NTHi are opportunistic pathogens causing and exacerbating multiple upper and lower respiratory tract illnesses including otitis media (OM) [2][3][4], otorrhea [5,6], sinusitis [7], bronchitis and chronic obstructive pulmonary disease (COPD) [8], pneumonia [9], and conjunctivitis [10]. Furthermore, the NTHi are early colonizers of the lungs of children with cystic fibrosis, suggesting they may play a critical role in the bacterial pathogenesis of this disease by causing damage that allows for infection by more virulent pathogens such as Pseudomonas aeruginosa. Despite a marked reduction in overall H. influenzae invasive disease post-vaccine, non-typeable strains continue to cause invasive complications such as meningitis and bacteremia, albeit with low incidence. Worryingly, several studies in various post-vaccine populations have observed steadily increasing NTHi invasive incidence rates, highlighting the importance of investigating the mechanisms of invasiveness in the absence of capsule [11][12][13][14][15][16].
Comparison of the whole genome sequences (WGS) of multiple H. influenzae isolates has revealed enormous genomic diversity among strains [17][18][19]. In addition to extensive single-nucleotide polymorphisms, the species as a whole contains many more distributed/accessory genes than core genes, i.e. genes present in only a subset of isolates versus those present in all isolates. Thus the species-level supragenome (or "pan-genome") is several times the size of the genome of a single strain. On average, each strain-pair differs by the possession of ~400 genes [17]. These extensive differences in gene possession among strains lead to profoundly different phenotypes with respect to disease causation [20]. This diversity in genotype and phenotype is consistent with a role for individual genes in virulence, and suggests that effective prevention and treatment strategies will develop from specifically targeting distributed/accessory virulence determinants [21,22]. Rational vaccine design currently focuses on highly expressed, surface-exposed core gene products, ensuring broad coverage of entire bacterial species [23]. While this is desirable for human pathogens that are not part of the normal microflora, targeting core genes of opportunistic pathogens results in the eradication of the commensal populations as well as the disease-causing populations of the target species. Thus, conceptually, this is akin to treatment with antibiotics, which results in major disruptions to the host's normal microbiota leading to further acute and chronic conditions [24][25][26][27]. In these cases we propose that alternative "microbiome-sparing" approaches should be investigated. In this way, carriage strains lacking virulence genes could be spared, leaving the host's commensal ecosystem intact.
Multiple surface molecules have been associated with pathogenesis and immunity in both acute and chronic NTHi disease. The ability to bind to various cell types depends on adhesins such as Hif, Hmw1/2, Hap, Hia/Hsf, OMP-2,5, oapA and PCP [28,29]. These gene products are frequently under phase variable control due to tandem repeats within or slightly upstream of the coding sequences. The coding sequences are also highly variable from strain to strain and rife with repetitive sequences, which accumulate as a result of immune pressure on these surface exposed molecules [30][31][32][33]. Other surface molecules play an important role in Haemophilus virulence, in particular, the lipo-oligosaccharide (LOS). Secondary modification of the LOS results in considerable antigenic heterogeneity among and within strains and is driven again by phase variable genes such as lic, lgt, lsg, and sia genes [34,35]. Individual strains often do not possess all of these virulence factors; they possess only a subset of them. Thus it has been proposed that possession of certain subsets provides fitness advantages in different settings (middle ear, lung, nasopharynx etc.) [22].
NTHi have been observed within host cells in in vitro and in vivo assays as well as in clinical tissue samples, suggesting that survival and persistence within host cells play a role in chronic disease [36][37][38][39][40][41][42][43][44][45][46][47][48][49]. There is also a correlation between the ability of bacteria to survive in macrophages and disease outcome or severity. In the rat model, strains able to survive in macrophages in vitro had an increased ability to cause systemic disease [50]. Similar observations have been made in several other bacterial species [51][52][53][54].
Here we present an initial characterization of Msf, a novel distributed NTHi virulence factor with a role in macrophage survival and disease.

Results
Identification of a novel protein family in H. influenzae, characterized by variable numbers of SLR motifs per protein, and variable numbers of genes per strain
High genetic diversity, variation in repetitive motifs as well as multiple gene copies with allelic differences are all associated with immune evasion, and these are also common characteristics of virulence determinants such as adhesins, autotransporters and other host-interacting proteins [22][30][31][32][33][34][35][55][56][57]. We analyzed the H. influenzae supragenome developed from whole genome sequencing (WGS) of 24 geographically and clinically diverse isolates (Table 1) for proteins that fit these criteria. A total of 47,997 open reading frames (ORFs) were identified and were grouped into 3100 orthologous gene clusters by virtue of their sharing at least 70% amino acid sequence identity over 70% of the length [17,19]. Manual curation of these gene clusters revealed a large set of genes, distributed among several clusters, that all contained the Sel1 Pfam motif (PF08238). From an initial list of genes identified to contain this motif, we performed multiple iterations of MEME/MAST on the H. influenzae supragenome [58][59][60]. MEME analysis identified the consensus Sel1-like repeat (SLR) motif shared among these ORFs. MAST was then used to iteratively search the entire 24-strain supragenome for new instances of this repeat in the already identified ORFs, as well as new ORFs containing it. This analysis combined with manual curation identified a total of 79 ORFs, which were represented by 10 supragenome gene clusters (S1 Table). The common SLR motif identified by MEME/MAST is 36 amino acid residues long and is characterized by conserved alanine and glycine amino acids, as well as a 100% conserved tyrosine residue (Fig 1A).
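As a rough illustration of the clustering criterion above (at least 70% amino acid identity over at least 70% of the length), the following Python sketch groups ORFs with a union-find structure. The single-linkage rule, the function name `cluster_orfs`, and the toy pairwise values are assumptions made for illustration only; the pipeline actually used is the one described in [17,19].

```python
def cluster_orfs(orf_ids, pairwise_hits, min_identity=0.70, min_coverage=0.70):
    """Group ORFs by single linkage (an assumed linkage rule).

    pairwise_hits: iterable of (orf_a, orf_b, identity, coverage), where
    identity and coverage are fractions between 0 and 1.
    """
    parent = {orf: orf for orf in orf_ids}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairwise_hits:
        # Link two ORFs only if both thresholds are met.
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    clusters = {}
    for orf in orf_ids:
        clusters.setdefault(find(orf), []).append(orf)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy input, not data from the study.
hits = [
    ("orf1", "orf2", 0.95, 0.98),
    ("orf2", "orf3", 0.72, 0.80),
    ("orf3", "orf4", 0.65, 0.90),  # identity below threshold: not linked
]
print(cluster_orfs(["orf1", "orf2", "orf3", "orf4"], hits))
# [['orf1', 'orf2', 'orf3'], ['orf4']]
```

Under single linkage, orf1 and orf3 end up in the same cluster even without a direct hit between them, which is one reason a single Pfam motif can be spread across several clusters when sequences diverge beyond the threshold.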
Sequences in one of the ten SLR-containing gene clusters are highly similar to each other (at least 95% amino acid identity over 95% of their length), and are present at exactly one copy in each of the 24 strains at a common genomic locus (Table 2). This single, core SLR gene contains an N-terminal signal peptide sequence and two tandem SLR motifs that differ in sequence (Fig 1B). We refer to this gene cluster as the core SLR (SlrC) subfamily.
The remaining 55 SLR-containing ORFs represented by nine distributed gene clusters were collectively named the variable SLR (SlrV) subfamily (Fig 1C shows the consensus SLR motif found in the SlrV proteins). The individual SlrV subfamilies are referred to as SlrVA-I and are almost always found in tandem at a second non-core genomic locus described below. Similar to the SlrC, all SlrV proteins also contain an N-terminal signal peptide predicting that they are secreted or membrane associated. Each SlrV subfamily has unique variations of the SLR motif sequence (S1 Fig, S2 Table). However, conservation of several key residues can be seen within all SlrV, including a 100% conserved tyrosine at position 27. In addition, almost all SlrV (but not SlrC) contain a conserved C-terminus with two equidistant cysteine residues (Fig 1D). Individual strains vary with respect to the total number of slrV genes, ranging from 0 to 5 (Table 2), and as to whether or not they contain genes from multiple SlrV subfamilies (Fig 2). Strains with multiple copies from the same SlrV subfamily can also display heterogeneity with respect to the number of motif repeats per gene (Fig 2, S3 Table).
Due to the original clustering requirements we expect that each SLR gene family has a distinct biological function that may or may not be related. This is supported by the fact that the different SLR subfamilies can be easily distinguished by examining the sequences of their signal peptides and C-termini. Phylogenetic trees generated from entire SLR gene sequences (Fig 3A) closely resemble those based on just the signal peptides (Fig 3B) or C-termini (Fig 3C) from those genes. However, individual motifs do not cluster in the same manner. Motifs located within the same protein (and thus of the same SLR subfamily) are dispersed around the tree, often clustering more closely with motifs found in other proteins (often in other SLR subfamilies) (Fig 3D). This is illustrated best by the five SLR motifs found in tandem in each slrVB gene, two of which are closely related by sequence, one of which is more closely related to slrC motifs and two of which are more closely related to slrVA motifs. This reveals a complicated evolutionary history in the SLR-containing genes, likely driven by gene and motif duplication/contraction, as well as horizontal gene transfer in this naturally transformable organism.

Organization of slrV genes on the chromosome
The vast majority (50/55) of the slrV genes are found at a common chromosomal locus (SlrV locus 1). It is flanked by a putative transporting ATPase (COG3101) and peptide chain release factor 1 (prfA) (Fig 2). The SlrVA subfamily accounts for over 60% of the total slrV genes and is always found at this locus. When present, slrVB, slrVC and slrVD are also found at this locus. In most of the genomes that do not have any genes present at SlrV locus 1 (22421, 3655, 6P18H1, PittAA, R3021, B10810, and RdKW20), remnants of slrVC or slrVD can be found, suggesting that these genes are ancestral and have been lost in these strains. We also observed four non-SLR-containing ORFs at the SlrV locus 1: (a) a hypothetical protein with a predicted hydrolase/metallo-beta-lactamase domain (COG2333) which is present in 6 strains (cluster417 in Fig 2); (b) a second hypothetical protein also with a metal-dependent hydrolase/beta-lactamase domain (COG1234) which is only present in the PittHH strain (cluster1434 in Fig 2); (c) a hypothetical protein with a conserved uncharacterized domain (COG3883) and two transmembrane domains which is present in PittGG and R1838 (cluster754 in Fig 2); and (d) a hypothetical protein only found in strain R1838 (cluster1770 in Fig 2). Members of the much less prevalent SlrV subfamilies were found at three other genomic loci. In strain PittGG, slrVE and slrVF are found inserted between a hypothetical ycbL homolog and a 2-oxoglutarate dehydrogenase E1 component (SlrV locus 2). In strain R1838 slrVG is found inserted between an Undecaprenyl-phosphate N-acetylglucosaminyl 1-phosphate transferase and (Protein-PII) uridylyltransferase (SlrV locus 3). And finally in strain 3655 slrVH and slrVI are found in tandem between a methionyl-tRNA synthetase and an Apb scaffold protein gene.

Orthologues with similar motifs
BLAST searches of the SlrV sequences against the non-redundant NCBI database revealed that this family is highly conserved across many bacterial species, including multiple genera and […] [61][62][63][64][65][66], Helicobacter pylori (Hcp family) [67][68][69][70][71][72], Francisella tularensis (DipA) [73,74] and Escherichia coli (EsiB) [75,76] where SlrV homologues have all been shown to be critical for host interactions, in particular, intracellular interactions. In all of these species the consensus SLR motif is 36 amino acid residues long and contains the characteristic conserved alanine and glycine residues (Fig 1).

Fig 2. SlrV locus 1 is located between core genes encoding a putative transporting ATPase and peptide chain release factor 1 (prfA). Strain PittII contains four slrVA genes in tandem; two with 2 SLR and two with 4 SLR. These genes correspond to msfA1-4 in this manuscript. Only genes predicted to encode full-length products are illustrated. 7 genomes do not contain any full-length slrV gene at this locus. * denotes genomes in which SlrV locus 1 is located on the edges of contig breaks. Therefore it is possible that there are more slrV genes or SLR motifs located in the assembly gaps.

Fig 3. Phylogenetic inference from SLR genes and protein domains. Maximum-likelihood trees were calculated using RAxML [111] and visualized with the interactive Tree of Life web server (http://itol.embl.de) [112,113]. Colored nodes represent the SLR subfamily from which the particular sequence was extracted (see Legend […])

Distribution of SLR-containing genes among 210 H. influenzae strains, and correlation with virulence
Because slrC is core to H. influenzae, no simple association can be made between the presence of SLRs and virulence; however, we considered whether the possession of a particular slrV or subset of the accessory slrV subfamilies might be associated with colonization outside of the nasopharynx. In addition to the 24 sequenced strains, we mined gene possession data for another 186 H. influenzae strains from a dataset generated from a custom-designed supragenome-based genomic hybridization (SGH) array [19]. This array contains 31,307 probes that collectively cover all known alleles of 2890 of the gene clusters identified from the 24 WGS strains. This includes 299 probes specific to the nine SlrV subfamilies. The H. influenzae SGH array was used to determine gene presence/absence profiles for each subfamily. However, this method was unable to capture gene copy number or SLR motif copy number. Therefore, statistical associations with copy number were not evaluated below. For the WGS strains, the array analysis accurately identified the presence and correctly assigned the identity of all of the previously detected SlrV, providing confidence that the application of this technology to unsequenced strains would provide robust and accurate data with respect to distribution. The SGH analysis confirmed that slrC is a core gene, being identified in 209/210 of the strains. A follow-up PCR confirmed that the one slrC-negative strain was a false negative (S3 Table). SGH analysis also confirmed that the SlrV family is widespread within the species, such that 92% (193/210) of strains contain at least one member (Fig 4, S3 Table).
The most prevalent SlrV subfamily was confirmed to be SlrVA (which we have named msf) which was identified in 153/210 strains (Fig 4, msf possession is indicated by a red block in the outer track) and is the subject of all of the detailed characterizations reported in this study. SlrVB (43/210, dark orange blocks in Fig 4), SlrVC (21/210, orange blocks in Fig 4) and SlrVD (98/210, light orange blocks in Fig 4) are usually only present in strains that also contain SlrVA. SlrVG is found in 12/210 strains interspersed in the population (green blocks in Fig 4). SlrVE and SlrVF are rare and present in only three closely-related strains (yellow and lime green blocks in Fig 4). SlrVH (20/210) and SlrVI (21/210) are highly correlated and both are present in 21/210 strains; 16 of which are grouped together into a distinct lineage (blue blocks in Fig 4).
We considered the hypothesis that isolates collected from the site of infection of diseased individuals have greater virulence potential than nasopharyngeal carriage isolates taken from healthy individuals due to the presence of accessory virulence factors or a lack of commensal factors [19,77,78]. To test whether the slrV genes fit this hypothesis, we determined their gene frequencies within the carriage and disease isolate subgroups. Indeed, we found that the fraction of isolates containing either slrVA (msf) or slrVB was significantly higher among disease isolates than carriage isolates (p-values of 0.027 and 0.009 respectively, Fisher-exact test) (Table 3). This analysis also found that slrVH and slrVI are highly correlated with carriage isolates, as supported by the gene possession tree (Fig 4, Table 3). This difference supports the notion that individual SlrV subfamilies have different biological functions.

Fig 4. […] Colored strain names indicate whether the strain is a commensal isolate (blue) or disease isolate (red). Gene data was obtained by whole genome sequencing (24 strains) and by genome hybridization using the custom-designed H. influenzae SGH array (186 strains) [19]. Binary data (gene presence or absence) was used to build a distance matrix and the phylogenetic tree was calculated using the neighbor joining method [114]. The interactive Tree of Life web server (http://itol.embl.de) was used to visualize the un-rooted tree [112,113]. doi:10.1371/journal.pone.0149891.g004
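The Fig 4 legend states that binary gene presence/absence data were converted into a distance matrix before neighbor joining. A minimal Python sketch of that first step follows. The distance measure used here (fraction of genes whose presence call differs between two strains), the function name `possession_distance_matrix`, and the toy profiles are assumptions; the source does not state which distance was used.

```python
def possession_distance_matrix(profiles):
    """Pairwise distances between strains from binary gene possession calls.

    profiles: dict mapping strain name -> list of 0/1 calls, one per gene,
    all lists having the same length and gene order.
    Distance (assumed): fraction of genes whose calls differ.
    """
    strains = sorted(profiles)
    n_genes = len(profiles[strains[0]])
    matrix = {}
    for a in strains:
        for b in strains:
            mismatches = sum(x != y for x, y in zip(profiles[a], profiles[b]))
            matrix[(a, b)] = mismatches / n_genes
    return strains, matrix


# Hypothetical possession profiles for three strains and four genes.
toy_profiles = {
    "strainA": [1, 1, 0, 0],
    "strainB": [1, 0, 0, 0],
    "strainC": [0, 0, 1, 1],
}
strains, dist = possession_distance_matrix(toy_profiles)
for a in strains:
    print(a, [dist[(a, b)] for b in strains])
```

The resulting symmetric matrix is the kind of input a neighbor-joining implementation expects; tree building itself is left out of this sketch.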

Selection of PittII as a model strain for in vitro and in vivo studies
We elected to investigate the SlrVA (msf) subfamily due to its prevalence within H. influenzae, its over-representation among disease isolates, and the fact that slrVB is only present when slrVA is also present. To characterize the contribution of the SlrVA subfamily in NTHi virulence we selected strain PittII, isolated from a child with perforating otorrhea. This strain was chosen because it: (a) provides a good baseline to observe decreases in virulence, since it induces rapid and severe local and systemic disease in the chinchilla model of OM and invasive disease (OMID) [20]; (b) is more easily transformed than other strains; and (c) codes for only the slrVA (msf) genes. The SlrV locus 1 in PittII codes for four sequential msf genes (Fig 2). These genes have been named msfA1 (447 base-pairs [bp]), msfA2 (660 bp), msfA3 (447 bp), and msfA4 (660 bp). Although exactly the same size, msfA1 and msfA3 share 90% identity, whereas msfA2 and msfA4 share only 82% identity. This leads to significant differences in predicted amino acid sequences with 80% identity between msfA1/msfA3 and only 73% between msfA2/msfA4 (S4 Table). Each coding sequence is separated by 187 bp, but no promoters or termination sequences were detected in these regions, suggesting that the whole locus acts as a single operon. To assess the role of the msf in NTHi disease, we constructed a knockout (KO) with all four msf copies deleted in PittII and replaced with a kanamycin resistance cassette producing the strain PittII Msf-KO (Table 4).

PittII msf genes are transcribed in vivo in the chinchilla OMID model
To determine if msfA1-4 are transcribed under various growth conditions, we examined planktonic cultures, in vitro biofilms, and bacteria recovered from the tympanic bullae of PittII-infected chinchillas. We detected transcripts under all three conditions, consistent with the predicted 447 and 660 bp ORFs, confirming that at least two separate msf genes were transcribed (Fig 5).

Msf is important in in vitro macrophage uptake and survival
The L. pneumophila and F. tularensis Msf homologues (LpnE and EnhC, and DipA, respectively) all play roles in macrophage survival [62][63][64][65]73]. Furthermore, it has been proposed that an ability to survive within human host cells contributes to H. influenzae persistence and/or trafficking to new infection sites [36][37][38][39][40][41][42][43][44][45][46][47][48][49]. Thus, we compared the ability of WT and KO PittII strains to invade and survive in human macrophages to determine the role of Msf in phagocytosis and intracellular persistence. We inoculated differentiated THP-1 macrophages at an MOI of 100:1 and incubated for 1 h to allow for adherence and phagocytosis. Extracellular bacteria were then eliminated with polymyxin B, the macrophages were lysed, and the viable intracellular bacteria were enumerated using dilution and plate counts at 2, 24, 48 and 72 hours after inoculation (Fig 6A). At the 2 hour time point, there were ~15X more WT than KO bacteria within the macrophages. Furthermore, the WT was able to survive up to 72 hours, whereas the mutant strain was completely killed within 48 hours (Fig 6A). The survival defect of the KO was rescued by complementation with a single msf gene, msfA1, demonstrating that a single allele is sufficient for extended survival within macrophages (strain PittII Msf-COMP). The complemented strain was created by inserting msfA1 into the ompP1 site, but PittII survival in macrophages was not affected by the deletion of the ompP1 gene as demonstrated with control strain PittII OMPP1-KO (Table 4). We confirmed that the bacteria were intracellular by using a combination of a fluorescent reporter strain and inside-out staining (Fig 6B).
Fig 5. […] Effusions from the middle-ears were harvested immediately and RNA was extracted. Both panels: RNA samples were reverse transcribed (+) or had reverse transcriptase (RT) omitted from the reactions (-). PCR was then performed on + and - RT samples with a primer pair specific to the msfA genes. Due to sequence similarity multiple alleles are amplified. The two different sizes of the four msfA genes make them easily discernible in the gels (white arrows).

The effect of Msf loss on macrophage survival is not strain-specific, since the presence of the SlrVA subfamily is also critical for macrophage survival of strain 86-028NP. Like PittII, the 86-028NP strain contains four adjacent msf genes (Fig 2). In this strain, a deletion mutant of the entire SlrV locus 1 (86-028NP Msf-KO) is unable to survive 48 hours after invasion, while the WT survives more than 72 hours (Fig 6C).
Furthermore, the Rd KW20 strain, which lacks any slrV gene (Fig 2), has been previously shown to be quickly phagocytosed and killed within 24 hours by macrophages [50]. Yet, when msfA1 is inserted into Rd KW20 (Rd Msf-INS), it survives substantially and significantly better than WT, lasting up to 48 hours within macrophages (Fig 6D).
Phagocytosis by human macrophages requires rearrangement of the actin cytoskeleton, which is inhibited by cytochalasin D. In the absence of this inhibitor, after one hour 16.46 ±0.12% of the PittII were found within the PMA-differentiated THP-1 cells, while in the presence of cytochalasin D the vast majority of the PittII cells remained in the extracellular compartment (invasion rate of 1.48±0.11%). This effect suggests that PittII primarily enters macrophages via an actin-dependent process.

Msf confers PittII with an in vivo fitness advantage
To determine whether Msf provides a fitness advantage, we competed WT and Msf-KO PittII strains in: 1) planktonic culture; 2) in vitro biofilms; and 3) in vivo using the chinchilla OMID model. In each assay, the WT and kanamycin-resistant mutant strains were mixed in a 1:1 ratio; for the in vivo assay, the mixture was then inoculated bilaterally through the tympanic bullae and allowed to infect for three days. In each assay, end point samples were serially diluted and plated on two sets of agar plates: non-antibiotic-containing plates to enumerate the total amount of bacteria present and kanamycin-containing plates to enumerate the KO only. The competitive index (CI) was calculated as the ratio of the colony-forming units (CFU) of KO to WT recovered, adjusted for initial input: $\mathrm{CI} = (\mathrm{KO}_{\mathrm{end}} / \mathrm{WT}_{\mathrm{end}}) / (\mathrm{KO}_{t=0} / \mathrm{WT}_{t=0})$.
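The competitive index defined above can be sketched in a few lines of Python. The function name `competitive_index` and the example CFU values are hypothetical. The sketch also assumes that WT counts are obtained by subtracting the kanamycin-plate counts (KO only) from the non-selective-plate counts (total), which the text implies but does not state explicitly.

```python
def competitive_index(total_end, ko_end, total_start, ko_start):
    """CI = (KO_end / WT_end) / (KO_start / WT_start).

    total_*: CFU on non-selective plates (WT + KO).
    ko_*:    CFU on kanamycin plates (KO only).
    WT counts are derived as total - KO (an assumption of this sketch).
    """
    wt_end = total_end - ko_end
    wt_start = total_start - ko_start
    if wt_end <= 0 or wt_start <= 0 or ko_start <= 0:
        raise ValueError("WT counts and the KO input must be positive")
    return (ko_end / wt_end) / (ko_start / wt_start)


# Hypothetical counts: a 1:1 inoculum, with the KO at 10% of the final population.
ci = competitive_index(total_end=1000, ko_end=100, total_start=200, ko_start=100)
print(round(ci, 3))  # 0.111
```

A CI near 1 indicates no fitness difference, a CI below 1 indicates that the WT outcompeted the KO, and a CI of 0 corresponds to samples in which no KO colonies were recovered.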
In planktonic cultures we detected a slight difference in growth rate between the PittII WT and Msf-KO strains; however, both grew to the same maximum OD A600 (S2 Fig). Despite this, there was no significant difference in the fitness of the WT and KO strains during co-culture: for 3 independent experiments, CI ≈ 1 (Fig 7A). In contrast, when bacteria were grown as biofilms, the WT strain displayed a strong advantage starting on day 2 and dominated the cultures by day 4 (Fig 7B), as indicated by decreasing CI. The WT's advantage was observed both in the biofilm itself and in the supernatant, where detached bacteria are found, and was statistically significant from day 3 onwards (p<0.05, one-sample two-tailed t-test). This suggests that the difference does not reflect variability in attachment between the strains.
Equal numbers of WT and KO were used to bilaterally infect the middle ears of six chinchillas. After 3 days, the animals were euthanized and the bacteria were collected from: left and right ear effusions; the adherent biofilm layers attached to the middle-ear mucosa (left and right bullar membranes); and the brains. In all 6 animals and all 5 collections, the WT strain displaced the KO strain almost completely (Fig 7C).
The single msfA1 complement strain (PittII Msf-COMP) did not fully restore the WT phenotype during in vivo infections, since it was still outcompeted by the WT strain, albeit to a lesser degree (Fig 7D). This is in contrast to the macrophage phagocytosis/survival experiments where the phenotype was fully complemented (indicating that the slight growth defect is irrelevant in these assays) (Fig 6A). As shown above, in the WT:KO competition experiment, very few tissue sites had any detectable KO bacteria after three days (3/12 bullar membranes and 4/12 bullar effusions). In contrast, in the WT:COMP competition, both the WT and the Msf-COMP strains were recovered from almost all tissue-sites (Fig 7E). Of particular note, the MsfA1 complementation restored trafficking to the brain, as we detected both WT and Msf-COMP bacteria in 3/6 brains.

Fig 6. […] independent means. (D) Bacterial uptake and survival of: Rd KW20 (non-encapsulated variant of a type D strain that lacks any Msf gene) and Rd Msf-INS (a mutant with the PittII msfA1 gene inserted at the ompP1 locus). Dotted line indicates the limit of detection for the Rd KW20 strain. * p<0.05 by two-tailed t-test for two independent means. doi:10.1371/journal.pone.0149891.g006

Fig 7. […] CFU were enumerated from both the adherent biofilms as well as the overlying supernatant and the CI was determined from each fraction. *p<0.05 by one-sample two-tailed t-test (μ0 = 0) on the log CI for both the biofilm and planktonic fractions. (C-E) In two separate experiments six chinchillas were inoculated with 1:1 mixtures of strains and five tissue sites (brain, right and left bullar membranes and right and left bullar effusions) were harvested three days after inoculation. Thus n = 6 for brains and n = 12 for bullar sites. (C) In vivo competition between PittII WT and Msf-KO inoculated 1:1 into 6 animals. Each data point represents a single tissue-site CI value. Points on the X-axis (CI of 0) indicate that no KO bacteria were […]

Msf plays a role in NTHi dissemination to the brain in the chinchilla OMID model
WT and Msf-KO strains were separately evaluated to ascertain the effect of msf on NTHi virulence and disease progression. Strains were inoculated bilaterally through the tympanic bullae and animals were monitored daily for up to 12 days for signs and severity of local (otologic) and systemic disease (see S5 Table for scoring criteria). All animals developed bilateral OM, though we detected no significant difference between the WT and Msf-KO strains with respect to local middle ear disease (data not shown). However, the mortality between the two groups was noticeably different (although not statistically significant due to the small number of animals infected). Only two out of the ten WT infected animals survived until the end of the experiment, whereas six of nine animals infected with the KO strain survived (p = 0.0698, Fisher-exact test) (Fig 8A). In addition to observing clinical signs during disease progression, upon death the left and right bullar effusions, the brain, and the lungs were collected and analyzed for the presence of WT and Msf-KO bacteria (Fig 8B). Consistent with the in vivo competition experiment, the WT strain was recovered from the brain in 8 out of 10 animals, while the Msf-KO strain was not detected in the brain of any animals (0/10) (p-value = 0.0007, Fisher-exact test). This difference, along with the competition data, suggests an important role for msf in dissemination to the CNS and/or blood in this model.
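The two Fisher-exact p-values quoted above can be reproduced from the counts given in the text with a short, dependency-free Python sketch. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is an assumption about how the test was run; it does agree with the reported values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Survival: 2 of 10 WT-infected vs 6 of 9 KO-infected animals survived.
print(f"{fisher_exact_two_sided(2, 8, 6, 3):.4f}")   # 0.0698
# Brain recovery: 8 of 10 WT vs 0 of 10 KO, as stated in the text.
print(f"{fisher_exact_two_sided(8, 2, 0, 10):.4f}")  # 0.0007
```

Both tables use the group sizes exactly as written in the paragraph above (nine KO animals for survival, ten for brain recovery).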

Msf is important in anaerobiosis
We compared the transcriptional profiles of the PittII WT and Msf-KO strains during late exponential/early stationary phase planktonic culture (OD A600 of 0.7) using the H. influenzae SGH Array and the methods outlined by Janto et al [79]. The threshold for differentially regulated genes was set as an absolute change of at least 2-fold with a Bonferroni-corrected p-value of 0.05 or less. The Msf-KO strain had 75 up-regulated and 75 down-regulated genes compared to the WT (S6 Table). Raw and processed transcriptional data for this experiment have been deposited in NCBI's Gene Expression Omnibus (GEO) [80] and are accessible through GEO Series accession number GSE70172 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70172).
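The threshold described above (at least a 2-fold absolute change and a Bonferroni-corrected p-value of 0.05 or less) can be sketched as a simple filter. This is an illustrative sketch only: the gene names, expression values, raw p-values and the number of tests below are hypothetical placeholders, not data from this study, and the function name is ours.

```python
import math


def differentially_regulated(genes, n_tests, min_fold=2.0, alpha=0.05):
    """Return genes passing a fold-change and Bonferroni cutoff.

    genes: iterable of (name, wt_mean, ko_mean, raw_p) tuples.
    n_tests: number of genes tested, used for the Bonferroni correction.
    """
    hits = []
    for name, wt_mean, ko_mean, raw_p in genes:
        log2_fc = math.log2(ko_mean / wt_mean)   # KO relative to WT
        p_bonf = min(1.0, raw_p * n_tests)       # Bonferroni correction
        if abs(log2_fc) >= math.log2(min_fold) and p_bonf <= alpha:
            direction = "up" if log2_fc > 0 else "down"
            hits.append((name, direction, log2_fc, p_bonf))
    return hits


# Hypothetical toy input (NOT values from the study).
toy = [
    ("geneA", 1000.0, 250.0, 1e-6),   # 4-fold down, significant
    ("geneB", 500.0, 2000.0, 2e-5),   # 4-fold up, significant
    ("geneC", 800.0, 700.0, 1e-7),    # significant but below 2-fold
    ("geneD", 300.0, 900.0, 1e-3),    # 3-fold up, fails Bonferroni
]
hits = differentially_regulated(toy, n_tests=1000)
for name, direction, log2_fc, p_bonf in hits:
    print(name, direction, round(log2_fc, 2), p_bonf)
```

Both conditions must hold at once, so a gene with a very small p-value but a modest change (geneC) and a gene with a large change but weak statistical support (geneD) are both excluded.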
Many genes involved in anaerobic respiration were down-regulated in the KO strain (Table 5), including arcA, which encodes part of the two-component system ArcAB, an oxygen-sensitive master regulator, and the nrfABCD operon, which encodes a periplasmic nitrite reductase. Under anaerobic conditions, the gene products of the nrf operon reduce nitrite to ammonia in the bacterial periplasm [81,82]. In addition, the Nrf complex has been implicated in virulence through the detoxification of nitric oxide (NO), which is important in macrophage survival [81,83].
Fig 7 (caption, continued). (D) In vivo competition between PittII WT and Msf-COMP inoculated 1:1 into 6 animals, compared with the previous data obtained from competition between PittII WT and Msf-KO in vivo. Each data point represents a single tissue-site CI value. Points on the X-axis (CI of 0) indicate that no KO bacteria were observed and thus have an infinitely low CI value. *p<0.05 by two-tailed t-test for two independent means of log cfu data. **p<0.05 by Mann-Whitney U test. n.s. (not significant). (E) Percentage of tissue sites that were positive for bacteria in each of the two in vivo competition experiments (WT vs Msf-KO, and WT vs Msf-COMP). * p<0.05 by two-tailed Fisher-exact test. n.s. (not significant). doi:10.1371/journal.pone.0149891.g007

The ccmABCDEFGHL operon, of which all components were found down-regulated in the KO, encodes a type 1 cytochrome c biogenesis system including a heme exporter that is required for cytochrome c maturation [84,85]. Thus, the Ccm complex plays an essential role in electron transfer and respiration.
Also down-regulated in the KO strain were the dmsABC/torD/napF and torYZ operons. The dms operon encodes a dimethyl sulfoxide (DMSO) reductase, and torD and napF encode components of the trimethylamine N-oxide (TMAO) and nitrate reducing complexes, respectively. DMSO reductase is a membrane-associated anaerobic electron transfer enzyme that contains molybdenum and iron-sulfur cofactors [86,87]. In addition to DMSO, this complex can act on various other methyl-sulfoxides including TMAO and is transcriptionally activated in the absence of nitrate and oxygen [86][87][88]. The tor gene products make up a TMAO reductase that is functionally related to the Dms DMSO reductase due to overlapping substrates [89]. Finding both the dms and tor operons down-regulated strengthens the theme of anaerobic and nitrogen-regulated genes involved in electron transfer. Like Nrf and Dms, Tor gene products require molybdenum cofactors [90]. In keeping with this theme, the moaA-D operon, which encodes molybdenum cofactor biosynthesis genes, was also down-regulated in the KO strain.
Finally, in addition to anaerobiosis genes, several virulence factors were found differentially regulated between the WT and KO strain. Highly down-regulated in the KO strain were the LrgAB virulence factors which encode an anti-holin like complex that increases penicillin tolerance and inhibits murein hydrolase channels in Staphylococcus aureus [91]. Highly up-regulated in the KO strain was a Hop effector protein associated with Type III secretion as well as manganese superoxide dismutase (sodA).
Ten of these genes (each from a different operon) were chosen for confirmation by quantitative real-time PCR (qRT-PCR). Primers were designed for each of these genes based on the PittII WGS data (S7 Table). The microarray results were confirmed in all cases by the qRT-PCR (Table 6).

Discussion
We have reported and characterized a large and heterogeneous set of genes in H. influenzae that contain Sel1-like repeats (SLR). The SLR acronym is derived from the first characterized member, the C. elegans suppressor-enhancer of lin-12 (sel) gene, but refers specifically to the motif found among bacteria (Pfam #PF08238) [92]. Proteins with SLR domains are a subgroup within the solenoid protein superfamily, which includes tetratricopeptide repeat (TPR, 34 aa repeats) proteins, pentatricopeptide repeat (PPR, 35 aa repeats) proteins and transcription activator-like (TAL) effectors (30-42 aa repeats). Tandem arrays of amino acid repeats in these proteins lead to the formation of modular secondary structures such as sets of anti-parallel α-helices and result in a superhelical macromolecule. Functionally, some TPRs have been implicated in protein-protein interactions [93], some PPRs show RNA-binding capability [94] and some TAL effectors have been demonstrated and exploited to bind DNA [95,96]. SLRs contain 36-44 aa repeats and are characterized by conserved glycine residues that support sharp turns in the superhelices as well as conserved alanine residues [76,97]. Consistent with this, the 36-residue SLR motif we found in H. influenzae contains four highly conserved glycine and alanine residues (positions 4, 11, 12, 15, 19, 24, 30, and 32: Fig 1). The highly conserved residues allow for identification of the motif, yet there is considerable heterogeneity associated with SlrV genes, which occurs at multiple levels including: the presence of at least ten SLR-containing gene subfamilies based on sequence homology; variation in the number of motif repeats within a gene subfamily; variation in the number of gene copies per strain; variation in the number of different gene subfamilies per strain; and variation with respect to chromosomal location based on gene subfamily type (Figs 1-4).
The modularity of SLR-containing genes allows for rearrangement of the modular units, as well as expansion and contraction of tandemly repeated SLR domains. In this manner, SLR-containing genes have the potential to rapidly evolve. We hypothesize multiple adaptive values for the changes. First, they could affect protein function by changing the binding properties of the SLR-containing protein and its partners. Alternatively, they could misdirect the immune response by focusing it on decoy peptide that is highly variable yet functionally irrelevant [98].
One SLR subfamily, SlrC, is found in all strains and highly conserved, whereas the remaining SlrV subfamilies have variable distributions. Although slrV genes are found in the majority of H. influenzae strains, different SLR subfamilies have distinct distributions suggesting that  (Table 3). We focused our initial studies on SlrVA, since it is the most common SlrV subfamily and is the sole SlrV locus in some strains, allowing us to investigate its function in isolation of the others. This is the case in NTHi strain PittII, a highly virulent strain in the chinchilla OMID model originally isolated from a child with perforating otorrhea, which we used to characterize the SlrVA subfamily (of which msf is a member).
In investigating the function of the slrVA (msf) genes in PittII, we considered functional studies of SLR-containing proteins in other species of bacteria, many of which interact with host proteins. At least nine genes with SLR domains have been identified in H. pylori; these are also known as the Helicobacter cysteine-rich protein (Hcp) family, due to the presence of conserved pairs of cysteine residues within each SLR repeat. These cysteines are separated by seven residues and preceded by alanine, glycine or serine [71]. Phylogenetic analyses in H. pylori found strong positive selection of residues on the SLR surface of Hcps in a gene- and lineage-specific manner (which for this species is also correlated with geographic location) [68]. These observations suggest that the mutations are adaptations to host responses. In H. influenzae we found a highly conserved pair of cysteine residues matching the H. pylori motif, not in the SLRs themselves, but as part of a conserved SlrV C-terminus (Fig 1D). This raises the possibility that the H. influenzae SlrV proteins also cross-link via disulfide bridges, similar to the Hcps [71].
Many previously characterized SLR-containing genes in other species are involved in host-pathogen or host-symbiont protein-protein interactions. Hcps are recognized by the host's immune system, as indicated by anti-Hcp antibodies in sera from H. pylori patients [99]. HcpC has been shown to interact with the host proteins Nek9, Hsp90 and Hsc71 [70]. HcpA is a potent pro-inflammatory and Th1-promoting protein, and can trigger the differentiation of human myeloid monocytes into macrophages [67,69]. Six SLR-containing genes have been identified in L. pneumophila; three of them (lpnE, enhC and lidL) have been implicated in host interactions, specifically cell entry and/or trafficking of the L. pneumophila-containing vacuole [61][62][63][64][65][66]. Consistent with direct host interaction, LpnE is found in culture supernatants [63], is required for invasion of human epithelial and macrophage cell lines [62,63], is localized to the Legionella-containing vacuole membrane [65], and can interact with the human proteins OBSL1 [63] as well as OCRL1 and the glycolipid PtdIns(3)P [65]. The intracellular pathogen F. tularensis also produces an SLR-containing protein, DipA, which has been shown to be membrane-associated and localized to the bacterial surface. Deletion of dipA results in a defect in intracellular replication and survival in macrophages as well as dissemination and lethality in mice [73].
Here we report similar findings in H. influenzae. Deletion of all four SlrVA genes from H. influenzae strain PittII revealed a defect in survival within macrophages (Fig 6A). We therefore renamed these genes macrophage survival factors (msf). We also observed this survival defect in OM strain 86-028NP upon deletion of its four msf genes (whose copies have slightly different numbers of motif repeats: Fig 2). Furthermore, insertion of a single copy of the PittII msfA1 gene into the avirulent strain Rd KW20 led to an increase in its survival time within macrophages (Fig 6D). All of these data support the hypothesis that msf (and therefore the SlrVA subfamily) plays a role in intracellular survival. Competition studies in the chinchilla OMID model showed that the presence of msf provided a significant fitness advantage in vivo, and vastly increased trafficking to the brain (Figs 7 and 8). Together these two traits probably account for much of the difference in mortality seen between the PittII WT and Msf-KO strains (Fig 8A). Notably, complementation of the PittII Msf-KO mutant with a single msfA1 gene rescued the macrophage survival defect (Fig 6A), but only partially complemented the mutant defect in causing systemic disease in the chinchilla (Fig 7D and 7E). This suggests a gene dosage effect in vivo that is not observed in macrophage survival in vitro. Alternatively, there may be slightly different functions for the various msf alleles.
Protein-protein interactions can influence signaling events and there is some evidence for involvement of SLR-containing proteins in signal transduction. The alpha-proteobacterium Sinorhizobium meliloti utilizes the two-component system (TCS) ExoS/ChvI to regulate the switch from its free-living to invasive form within its alfalfa host (Medicago sativa) by modulating biofilm formation and lipopolysaccharide modification [100][101][102][103]. ExoR, which is an SLR-containing protein, represses ExoS/ChvI signaling by direct binding to ExoS [103]. In this context, it is notable that the H. influenzae SLRs share a 100% conserved tyrosine residue, which is not common to SLRs in general. We therefore hypothesize that this residue is important for the H. influenzae-specific functions of its SLR-containing proteins. Future work will focus on establishing whether this residue is a kinase target involved in bacterial signaling.
We investigated a role for msf in bacterial signaling by performing a microarray analysis. We observed down-regulation of multiple operons that encode periplasmic proteins for utilizing alternative electron acceptors such as nitrate (nap), nitrite (nrf), and methyl sulfoxides (dms, tor), as well as genes associated with required cofactors (moa). While these operons are under the control of oxygen-sensitive master regulators like Fnr and the TCS ArcAB, we note a much more significant overlap with the regulon that is controlled by the nitrate- and nitrite-sensitive TCS NarPQ [104][105][106][107]. Because no transcriptional changes were observed in narPQ and due to the propensity of SLR-containing molecules to be involved in protein-protein interactions, we hypothesize that Msf plays a role in NarPQ signaling at the protein level. Alternatively, Msf might be a host-interacting protein that affects the NarPQ regulon indirectly via an unidentified intermediate. Regardless, the transcriptomic differences observed between the WT and KO PittII strains suggest that Msf proteins play an important role in the regulation of genes under anaerobic conditions (Table 5). H. influenzae forms robust biofilms during chronic infections [3,4,8], and it is known that dissolved oxygen levels drop precipitously within biofilms [108]. Thus, the fitness advantage of the WT over the KO in the in vitro biofilm competition assays and the in vivo competition assays may exist, in part, because of the WT's ability to sense and respond to a lack of O2 as a terminal electron acceptor. The same pathway may also be involved in the macrophage survival phenotype due to oxygen limitation in an intracellular environment. Additionally, it is known that NrfA (which is down-regulated in the Msf-KO) consumes NO, thereby minimizing the formation of reactive oxygen species by macrophages [81,82].
We hypothesize that the importance of Msf in intracellular macrophage persistence explains the reduced invasiveness and inability of the Msf-KO to infect the chinchilla brain in vivo. Future work will focus on determining whether this is by direct trafficking within macrophages or whether the intracellular persistence phenotype is relevant to other cell-types as well.
Our data demonstrate that the SLR-containing Msf proteins are virulence factors in H. influenzae infections, where they likely play a role in both chronicity of disease by providing a fitness advantage in biofilms and increased survival in macrophages, as well as in invasive disease as shown by increased trafficking to the brain in the chinchilla disease model. We propose that other SlrV family members are also likely to be involved in the virulence potential of H. influenzae. Chronic H. influenzae infections are usually polyclonal, and on average, strains differ by approximately 20% of their genic content. Further, many strains are not virulent, and eliminating all H. influenzae strains may lead to adverse changes in the host's microbiome. The SlrVA (Msf) represent a potential target to eliminate large subsets of highly virulent strains, while allowing strains with less pathogenic potential to remain intact, thus setting the stage for a microbiome-friendly treatment strategy.

Ethics Statement
All animal work was conducted with the approval of the Allegheny-Singer Research Institute's Institutional Animal Care and Use Committee (IACUC) and Research Facilities Department (RFD). Working closely with the IACUC, the RFD provides the highest standards of humane care and use of laboratory animals and assures compliance with institutional and federal regulations. They share responsibility to assure that the use of animals in research projects are necessary, that the investigator has included in the protocol measures to eliminate any unnecessary pain and discomfort to the animals, and that alternatives to the use of live animals have been considered.
In silico analysis for domain identification. 47,997 coding sequences (CDS) identified in 24 strains of H. influenzae were interrogated using the Multiple EM for Motif Elicitation (MEME) program [58,59] (http://meme.nbcr.net/meme/tools/meme). This program is designed to discover domains conserved among sequences by creating a position-dependent probability matrix. Once the 36 amino acid SLR motif had been identified, the consensus sequence from MEME was submitted to the Motif Alignment and Search Tool (MAST) program [60] to search for new instances and variants of the initially identified SLR-containing ORFs. Multiple iterations of MEME/MAST were performed to maximize identification of SLR-containing proteins. Sequence identities and similarities were determined using the BLAST programs and the GenBank non-redundant database on the NCBI web server. Motifs were drawn in R with the help of the motifStack package (http://www.bioconductor.org/packages/release/bioc/html/motifStack.html) (Fig 1 and S1 Fig). Amino acid colors are a modification of the WebLogo default, with Tyr and Cys having unique colors (Y = orange and C = turquoise).
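To make the idea of a position-dependent probability matrix concrete, the sketch below builds one from a set of already-aligned motif instances and uses it to score windows of a sequence with log-odds against a uniform background. It is not MEME or MAST: MEME discovers the motif by expectation maximization and MAST uses its own scoring statistics. The 4-residue instances, the test sequence, the pseudocount and the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probability_matrix(instances, pseudocount=1.0):
    """Per-position amino acid probabilities from aligned motif instances."""
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all motif instances must have the same length")
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for i in range(width):
        counts = Counter(s[i] for s in instances)
        matrix.append({aa: (counts.get(aa, 0) + pseudocount) / total
                       for aa in AMINO_ACIDS})
    return matrix


def best_window(sequence, matrix, background=1.0 / len(AMINO_ACIDS)):
    """Return (start, log-odds score) of the best-scoring window."""
    width = len(matrix)
    best_start, best_score = -1, -math.inf
    for start in range(len(sequence) - width + 1):
        score = sum(math.log2(matrix[i][sequence[start + i]] / background)
                    for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy motif instances (NOT the 36-residue SLR motif).
toy_instances = ["GAYC", "GAYC", "GSYC", "GAFC"]
ppm = position_probability_matrix(toy_instances)
start, score = best_window("MKTGAYCLL", ppm)
print(start, round(score, 2))
```

Iterating between motif estimation and database search, as described above for MEME/MAST, amounts to re-estimating such a matrix from the newly found instances and searching again.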
Phylogenetic Tree based on SLR gene sequences and domains. We generated multiple sequence alignments (MSA) using Clustal Omega [109] for 1) 79 SLR-containing full-length CDS and 2) the signal peptide sequences identified from these CDS using SignalP 4.1 (n = 78; one sequence was located on a contig break and missing the N-terminus) [110]. Output from MEME/MAST analysis was used as an MSA for the 36 amino acid (aa) SLR motifs as well as a 22 aa motif identified in the C-terminus of 54/55 SlrV CDS. Maximum likelihood trees were built with the MSA as input to the RAxML 8.1.2 software using rapid bootstrapping with convergence test, thorough maximum likelihood search, Gamma distribution, and the WAG aa substitution matrix [111]. The interactive Tree of Life web server (http://itol.embl.de) was used for visualization and to generate Fig 3 [112,113].
Phylogenetic Tree based on gene possession. The 'ape' package in the 'R' environment was used to 1) build a distance matrix based on the SGH gene presence/absence data using the binary setting and 2) generate a phylogenetic tree based on this distance matrix using the neighbor joining method [114]. The interactive Tree of Life web server (http://itol.embl.de) was used for visualization and to generate Fig 4 [112,113].
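As an illustration of the first step, the sketch below computes a pairwise distance matrix from gene presence/absence vectors. We assume here that "the binary setting" corresponds to the asymmetric binary distance as defined for R's dist function (the proportion of genes present in only one strain among genes present in at least one of the two); that reading is an assumption, and the strain labels and vectors are hypothetical. The neighbor joining step is not reproduced.

```python
def binary_distance(x, y):
    """Asymmetric binary distance between two presence/absence vectors.

    Positions where both strains lack the gene are ignored; the distance is
    the fraction of the remaining positions where exactly one strain has it.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        return 0.0  # convention chosen here for two all-absent vectors
    differ = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    return differ / either


def distance_matrix(profiles):
    """Symmetric dict-of-dicts distance matrix keyed by strain name."""
    names = list(profiles)
    return {p: {q: binary_distance(profiles[p], profiles[q]) for q in names}
            for p in names}


# Hypothetical gene possession profiles (1 = present, 0 = absent).
toy_profiles = {
    "strain1": [1, 1, 0, 1],
    "strain2": [1, 0, 0, 1],
    "strain3": [0, 1, 1, 0],
}
dm = distance_matrix(toy_profiles)
for name, row in dm.items():
    print(name, [round(row[other], 3) for other in toy_profiles])
```

Ignoring shared absences means that two strains are not counted as similar merely because both lack a rare distributed gene, which suits supragenome data in which most distributed genes are absent from most strains.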
PCR and Sanger Sequencing of SlrV locus 1 in H. influenzae. Genomic DNA extractions were performed on 210 H. influenzae isolates using the QIAamp DNA Mini Kit (Qiagen, CA) according to the manufacturer's instructions for Gram-negative bacteria. Primers located within core genes prfA (CGSHiII_02915) and a putative transporting ATPase (CGSHiII_00679) were designed for the PCR amplification of SlrV Locus 1 based on the PittII genome (Bioproject #PRJNA16404). Sanger sequencing on an ABI 3730xl DNA analyzer was performed to determine gene sequences in PittII.
Bacterial strains and culture conditions. The bacterial strains used in this study are listed in Table 4. The NTHi strain PittII was recovered from a spontaneous pediatric otorrhea case [20]. The PittII mutant strains include the msfA1-4 deletion mutant and an msfA1 complement of the deletion mutant. The NTHi strain 86-028NP, recovered from the nasopharynx of a child with acute OM was also used [115]. We constructed an msfA1-4 deletion mutant on the 86-028NP background as well. Further, we used the laboratory strain RdKW20 and a mutant that was engineered to produce msfA1 PittII . All strains were cultivated in brain heart infusion (BHI) medium (Difco) supplemented with 10 μg mL-1 of hemin (ICN biochemicals) and 2 μg mL-1 NAD (Sigma); we refer to this medium as supplemented BHI (sBHI). For the inside-out staining in Fig 6B, we used PittII GFP, where GFP is transcribed from the prsm2211 plasmid which was obtained as a gift from Drs. Robert Munson and Lauren Bakaletz.
Strain Construction. DNA flanking the SlrV1 locus was amplified from PittII using primers 1/2 (flank1) and 3/4 (flank2) (S8 Table) and from 86-028NP using primers 5/6 (flank1) and 7/8 (flank2). Unique 5' restriction sites were designed in primers to facilitate directional ligation. A kanamycin resistance cassette (kmR) was amplified from the plasmid pHP1 using primers 9/10. Purified PCR products were digested using NotI and SalI restriction enzymes to generate non-complementary overhangs. Equimolar amounts of purified flank1, flank2 and kmR digests were ligated in a single reaction using T4 DNA ligase (New England Biolabs) with incubation overnight at 16°C. The ligation reactions were run on 0.6% TAE agarose gels and bands of the expected size were excised and purified. This purified gel-cut ligation was used as template DNA in a PCR reaction using nested flanking primers; the resulting products were then purified, quantified using a Nanodrop 1000 Spectrophotometer and used as the transforming DNA. The H. influenzae complementation vector pASK5 described in Saeed-Kothe et al [116] was used to insert PittII msfA1 into the ompP1 locus of the SLR-KO strain as well as the Rd KW20 WT strain. This plasmid contains a multiple cloning site and a chloramphenicol resistance cassette (cmR) flanked by 5' and 3' regions of the nonessential ompP1 gene. Amplification from constructs using this plasmid generates transforming DNA that inserts a gene of interest at the ompP1 locus driven by the strong ompP1 promoter. MsfA1 was PCR amplified from PittII chromosomal DNA using primers 11/12 (S8 Table), purified, and digested with BamHI and SalI restriction enzymes. The msfA1 fragment was ligated into BamHI/SalI digested pASK5 vector using T4 ligase. The empty vector was used to generate the PittII OMPP1-KO strain.
Plasmid constructs were linearized and transformed into the PittII, PittII Msf-KO strain and Rd KW20 strain and transformants were selected by plating on sBHI plates containing chloramphenicol as described below.
Transformation Procedure. H. influenzae were grown at 37°C, shaking at 200 rpm in 5 mL of sBHI to log phase (OD A600 0.4). 500 μL was transferred to a separate tube containing 1 μg of transforming DNA and mixed gently. The tube was incubated at 37°C for 10 minutes without shaking. 1 mL of pre-warmed sBHI was then added to each tube and incubated at 37°C for an additional 1.5 hours with shaking. 100 μL was then spread on multiple sBHI antibiotic plates. KmR strains were selected by including kanamycin at 40 μg mL-1 and CmR strains were selected by including chloramphenicol at 2 μg mL-1. Plates were incubated at 37°C with 5% CO2 for 24 hours. Isolated colonies were picked into 5 mL sBHI with the appropriate antibiotic and incubated overnight at 37°C with shaking at 200 rpm. PCR reactions were performed using different combinations of primers listed in S8 Table from transformant and WT cultures to confirm the correct mutation had occurred. Positive cultures were frozen in 25% glycerol at -80°C.
Bacterial growth Assays. Starter cultures were grown to mid-log phase and used to inoculate 1 mL sBHI cultures in 24-well BD Falcon tissue-culture plates at an initial OD A600 of 0.02. Three wells in each plate containing media without bacterial inoculation were used to calculate background absorbance readings which were subtracted from each experimental well. Measurements were made on a Tecan Infinite M200 Pro plate reader set at 37°C with shaking at 200 rpm. A script was programmed to take absorbance readings at 600 nm every 15 minutes. Data are representative of 3 biological replicates with n = 3 for each strain.
RNA isolation. Bacterial pellets stored in RNAProtect were resuspended in 100 μL of 1X Tris-EDTA (TE) + 1 mg mL-1 proteinase K (Qiagen). Tissues stored in RNALater were homogenized in RLT+ buffer. RNA was then extracted using a Qiagen RNeasy Mini Plus kit with the standard protocol including steps with genomic DNA (gDNA) eliminator columns. The eluted RNA (~85 μL) was DNased by adding 10 μL 10X TurboDNase buffer and 5 μL TurboDNase (2 units μL-1) (Ambion) and incubating at 37°C for 1.5 hours. 2 μL more TurboDNase was added and incubation continued for an additional 1.5 hours. The DNased RNA samples were cleaned by passing them through the RNeasy protocol a second time (including the gDNA eliminator column steps). Samples were eluted in nuclease-free water, quantitated on a Nanodrop 1000 spectrophotometer and stored at -80°C. Each RNA sample was also run on an Agilent 2100 Bioanalyzer using RNA Nano6000 chips to check for RNA degradation.
Reverse transcription for gDNA check, microarrays and qRT-PCR. We performed paired reverse transcription reactions on every RNA sample where one reaction received reverse transcriptase (+RT, Promega) and the other did not (-RT). Both reactions were PCR amplified using primers directed against a housekeeping gene (gapA) and observation of amplification in the +RT reaction as well as lack of amplification in the -RT reaction verified removal of gDNA from each RNA sample. RNA for microarray analysis was reverse transcribed using a SuperScript One-Cycle cDNA Kit (Invitrogen) as outlined in the NimbleGen Microarray Experienced User's Guide including RNaseA and cDNA precipitation steps. RNA for qRT-PCR was reverse transcribed using a Roche Transcriptor First Strand cDNA Synthesis kit with random hexamers.
HI Supragenome Hybridization (SGH) Array. A complete description of the design and methods associated with the HI SGH array for assessment of genic content are described by Eutsey et al [19]. Methods for performing microarray analysis with the HI SGH array are described in full by Janto et al [79]. Briefly, 1 μg of genomic DNA or cDNA was Cy3-labeled using a NimbleGen One-Color DNA Labeling Kit. NimbleGen Hybridization Kits and Sample Tracking Control Kits were used to hybridize the labeled cDNA to the custom-designed 4x72 H. influenzae SGH arrays as well as for array washing. Images were acquired on an Axon Instruments GenePix 4200AL array scanner. Images were processed and data was normalized within chips using a Robust Multichip Average (RMA) algorithm and quantile normalization via the NimbleScan software v2.5 [117,118]. Raw data was converted into gene possession or absence by applying a combination of an expression threshold (1.5X the median background value in log 2 scale) and a measure of probe variance [19]. Subclusters producing a signal above this value were set to a value of 1 (present) and subclusters with values below this value were set to a value of 0 (absent).
For microarray analysis raw expression data was merged with a reference list of genes/ probes that had been determined to be present in the PittII genome (from SGH data) in order to remove non-relevant gene/probe data. Parsed data was then normalized within and across chips as described above. For comparison of PittII and Msf-KO expression data the web-based tool CyberT was used to obtain Bayesian corrected p-values, Bonferroni corrected p-values and Benjamini-Hochberg values [119]. Significance Analysis of Microarrays (SAM) in the 'R' environment was used to obtain lists of genes with associated permutation-based false discovery rates (FDR) [120]. These data were combined and filtered using the following cutoffs: Bayesian p-values < .05, Benjamini-Hochberg FDR < 10%, SAM local FDR < 0.1, SAM q-value < 0.1, Bonferroni corrected p-value < .05, average raw expression values in at least one of the two conditions being compared > 256. Only genes that passed all of these filters are presented in Table 5 and S6 Table. Data is representative of two biological replicates and two technical replicates. Raw and processed transcriptional data for this experiment has been deposited in NCBI's Gene Expression Omnibus (GEO) [80] and are accessible through GEO Series accession number GSE70172 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70172).
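Two of the multiple-testing corrections named above can be written out in a few lines. The sketch below gives textbook versions of the Bonferroni correction and the Benjamini-Hochberg adjusted p-value; it does not reproduce the Bayesian p-values from CyberT or the permutation-based FDR from SAM, and the input p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-corrected p-values: multiply by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values (NOT values from the study).
raw_p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw_p)
bh = benjamini_hochberg(raw_p)

# A gene is kept only if it passes every cutoff that is applied.
keep = [b <= 0.05 and q <= 0.10 for b, q in zip(bonf, bh)]
print(bonf, bh, keep)
```

Requiring every filter to pass, as described above, makes the final gene list at least as conservative as the strictest single criterion, which in this toy example is the Bonferroni cutoff.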
Quantitative real-time polymerase chain reaction (qRT-PCR). Gene-specific primers were designed using Roche Probe Finder online software to generate ~75 bp amplicons (S7 Table). Amplification and quantitation were performed with a Roche Light Cycler 480 and SYBR green master mix. 20 μl reaction volumes were used containing 2 μl cDNA (1:5 dilution) and primers at 0.5 μM each. Relative expression levels of the tested genes were obtained by normalizing to the gapA reference gene as an internal standard. Each of two biological replicate RNA samples was assayed in duplicate. Data analysis was carried out using the Roche Light Cycler software. Data is representative of two biological replicates and three technical replicates.
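Normalization to a reference gene is often expressed with the 2^-ΔΔCt calculation. The text states only that expression was normalized to gapA using the Roche software, so the sketch below is an assumption about one common way to do this, not a description of the software's algorithm. It assumes roughly 100% amplification efficiency for both primer pairs, and the Ct values shown are hypothetical.

```python
def relative_expression(ct_target_ko, ct_ref_ko, ct_target_wt, ct_ref_wt):
    """Fold change of a target gene in KO relative to WT (2^-ddCt).

    Each Ct is first normalized to the reference gene (here gapA) within
    the same sample; the WT sample then serves as the calibrator.
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    delta_ko = ct_target_ko - ct_ref_ko
    delta_wt = ct_target_wt - ct_ref_wt
    return 2.0 ** -(delta_ko - delta_wt)


# Hypothetical Ct values (NOT values from the study).
fold = relative_expression(ct_target_ko=24.0, ct_ref_ko=18.0,
                           ct_target_wt=22.0, ct_ref_wt=18.0)
print(fold)  # prints: 0.25
```

In this toy case the target crosses threshold two cycles later in the KO than in the WT after normalization, which corresponds to a 4-fold lower transcript level.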
Chinchilla Model of Otitis Media and Invasive Disease (OMID). The comparisons of virulence between the WT and PittII Msf-KO strains were assessed as previously described [20]. All experiments were conducted with the approval of the Allegheny Singer Research Institute's Institutional Animal Care and Use Committee (IACUC). Young adult chinchillas (Chinchilla laniger, 400-600 gm; McClenahan Chinchilla Ranch, New Wilmington, PA) were obtained free of middle-ear disease as culls from the fur industry. After at least a 7-day acclimation period, the animals were anesthetized on experimental day 0 by intramuscular injection of 0.1 mL of a cocktail of ketamine hydrochloride 100 mg mL-1, xylazine hydrochloride 30 mg mL-1 and acepromazine 5 mg mL-1. After anesthesia was confirmed (abolishment of eye-blink reflex), bacteria were injected bilaterally through the tympanic bullae into the middle ear space using a 0.5 inch, 27-gauge needle on a 1 mL tuberculin syringe as described [20]. Three experiments were performed with differing numbers of animals, inoculum amounts and/or inoculum preparations. 1) To determine whether msfA was transcribed in vivo, one animal was inoculated bilaterally (into each ear) with 10^8 CFU of PittII WT and was euthanized 3 hours later. One animal was inoculated bilaterally with 10^6 CFU of PittII WT and euthanized after 24 hours. The middle-ear mucosa with the adherent bacterial biofilm as well as lavages were harvested into RNALater. RNA was extracted as described above. 2) For the in vivo competition experiment, 10^3 CFU of a mixed culture (WT:Msf-KO or WT:Msf-COMP) was injected bilaterally into the middle ears of six chinchillas. Animals were euthanized on day three for tissue collection. The right and left middle-ear mucosa with the adherent bacterial biofilm (bullar membrane), lavages from both ears (bullar effusion), brains and lungs were collected, homogenized and plated for bacterial counts.
3) For the in vivo virulence experiment comparing separate PittII and Msf-KO infections, 10^3 CFU of PittII were injected bilaterally into the middle ears of 10 chinchillas and 10^3 CFU of Msf-KO were injected into the middle ears of 9 chinchillas (1 animal was lost prior to the beginning of the experiment). The animals were monitored daily for twelve days for signs and severity of local (otologic) and systemic disease using the criteria in S5 Table. Any animal that was determined to have symptoms corresponding to a systemic score of 4 was euthanized immediately. Animals that did not succumb to infection were euthanized at day 12 for tissue collection. All evaluations were performed by an observer who was blinded with regard to the inoculating strains. Local disease was evaluated by a single validated otoscopist (a practicing board-certified, fellowship-trained, otolaryngological surgeon) to ensure uniformity. Hence for each animal three scores were recorded daily: otoscopic score for right ear, otoscopic score for left ear and systemic score. From the collected data we also evaluated measures relating to the severity of local disease: maximum otologic score, days to first significant otologic score, and days to maximum otologic score. We also determined measures relating to systemic evaluations including rapidity of onset, maximum severity of disease, and mortality. As soon as possible after death, animals were dissected. The right and left middle-ear mucosa as well as lavages of each middle ear, lungs and brains were harvested and homogenized. The homogenates were serially diluted and plated to determine the presence of infecting strains.
In vitro competition experiments. Mixed-culture experiments were performed (a) in planktonic culture and (b) in in vitro biofilms. Broth starter cultures were grown to mid-log phase and adjusted such that equal numbers of bacteria were mixed together to a final concentration of 10³ CFU mL⁻¹. For planktonic competition experiments (n = 3), mixed cultures were grown in 15 mL culture tubes at 37°C, shaking at 200 rpm. At 0 and 24 hours, cultures were serially diluted and plated on two sets of plates: sBHI (for total bacterial count) and sBHI + km 40 μg mL⁻¹ (for PittII Msf-KO count). A competitive index (CI) was calculated with the formula $\mathrm{CI} = (\mathrm{KO}_{t=24} / \mathrm{WT}_{t=24}) / (\mathrm{KO}_{t=0} / \mathrm{WT}_{t=0})$. For in vitro static biofilm competition experiments (n = 3), mixed cultures were seeded into 6-well culture plates and initially incubated at 37°C with 5% CO₂ (v/v) without shaking. After 2 hours the plates were set to rotate slowly at 50 rpm. At each time point (0, 24, 48, and 72 hours) three replicate wells were harvested by first collecting the supernatant (media in the wells plus two PBS washes). Following washing, biofilms were mechanically disrupted in PBS using a cell scraper, collected, and washed/disrupted a second time. Both collection samples (supernatant and biofilm) were vortexed vigorously, serially diluted, and plated on two sets of plates as described above.
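As an illustration only, the competitive index can be computed from the two sets of plate counts as sketched below. The sketch assumes that the WT count is obtained by subtracting the kanamycin-resistant (Msf-KO) count from the total count on sBHI, which the plating scheme implies but the text does not state. The function names, the base-10 logarithm and the example counts are hypothetical and are not taken from this study.

```python
import math

def wt_count(total_cfu, ko_cfu):
    """WT CFU inferred as total (sBHI) minus KO (sBHI + km). This is an assumption."""
    wt = total_cfu - ko_cfu
    if wt <= 0:
        raise ValueError("KO count must be smaller than the total count")
    return wt

def competitive_index(total_t0, ko_t0, total_t24, ko_t24):
    """CI = (KO_t24 / WT_t24) / (KO_t0 / WT_t0)."""
    ratio_t0 = ko_t0 / wt_count(total_t0, ko_t0)
    ratio_t24 = ko_t24 / wt_count(total_t24, ko_t24)
    return ratio_t24 / ratio_t0

# Hypothetical counts (CFU per mL), not data from this study.
ci = competitive_index(total_t0=1000, ko_t0=500, total_t24=3.0e9, ko_t24=1.0e9)
log_ci = math.log10(ci)  # log base is an assumption; the text says only "log transformed"
print(ci, log_ci)
```

A CI below 1 (log CI below 0) indicates that the KO was outcompeted by the WT over the interval; a CI of 1 indicates equal fitness.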
Macrophage survival assays. The human monocyte cell line THP-1 (ATCC TIB-202) was maintained in RPMI media (ATCC) supplemented with 10% (v/v) fetal bovine serum (FBS) (ATCC) and 0.05 mM 2-mercaptoethanol (Sigma). The cells were maintained as monocyte-like, non-adherent cells at 37°C with 5% (v/v) CO₂. For macrophage infection, cells were seeded at 5 × 10⁵ cells per well in 24-well tissue culture plates and were differentiated by addition of phorbol 12-myristate 13-acetate (PMA) (1 μg mL⁻¹) for 24 hours. After 24 hours, fresh media containing PMA was added. After another 24 hours, the medium was removed and the macrophages were infected with stationary-phase cultures of bacterial strains that had been diluted in RPMI + 10% FBS to achieve a multiplicity of infection (MOI) of 100 bacteria per macrophage. Plates were centrifuged for 15 minutes at 200 × g at room temperature and then incubated at 37°C with 5% (v/v) CO₂ for 1 hour. Next, the macrophages were washed twice and fresh media with 10 μg mL⁻¹ of polymyxin B was added. At 2, 24, 48 and 72 hours post inoculation, wells of infected macrophages were washed twice and then lysed with 1% saponin (MP Biomedicals, LLC) in PBS. Serial dilutions of the resulting macrophage lysates were plated onto sBHI plates for CFU counts. Data are representative of 3 experiments with n = 3 for each strain. In the inhibition experiments, prior to the addition of PMA, cells were pretreated with cytochalasin D (1 μM; Sigma Chemical Co., St. Louis, MO, USA) for 1 hour.
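The inoculum arithmetic implied by the seeding density and MOI can be sketched as follows. This is a minimal illustration that assumes every seeded cell is present as a macrophage at the time of infection; the culture density of 2 × 10⁹ CFU mL⁻¹ and the function names are hypothetical.

```python
def bacteria_per_well(macrophages_per_well, moi):
    """Number of bacteria needed per well for a given multiplicity of infection."""
    return macrophages_per_well * moi

def culture_volume_ml(bacteria_needed, culture_density_cfu_per_ml):
    """Volume of undiluted culture that contains the required number of bacteria."""
    return bacteria_needed / culture_density_cfu_per_ml

needed = bacteria_per_well(5e5, 100)     # seeding density and MOI from the text
volume = culture_volume_ml(needed, 2e9)  # 2e9 CFU/mL is a hypothetical culture density
print(f"{needed:.1e} bacteria per well; {volume * 1000:.0f} uL of undiluted culture")
```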
Inside Out Staining for Confocal Imaging. THP-1 monocytes were seeded into plates, differentiated, and then infected with PittII GFP as described above. After one hour, polymyxin B was added to each well to kill extracellular bacteria (see survival protocol). At designated time points (2, 24, 48 and 72 hours) the macrophage monolayers were washed twice with PBS and then fixed with 4% paraformaldehyde (PFA) for 30 minutes. The PFA was removed and the cells were again washed twice with PBS. For storage, the fixed cells were kept in 50% ethanol/50% PBS at 4°C. For staining, samples were blocked with 10% FBS for 1 hour at room temperature. Any remaining extracellular bacteria were stained using a rabbit anti-NTHi antiserum (a gift from Dr. Ed Swords, Wake Forest University) and Alexa Fluor 594-conjugated anti-rabbit antibodies (Biotium). Confocal images were obtained and analyzed on a Leica TCS SP2 AOBS filter-free spectral confocal microscopy system. Bacteria outside macrophages appeared either red (dead) or yellow (alive and expressing GFP, mixed with signal from the antibody stain), whereas bacteria inside macrophages appeared green (alive and expressing GFP with no antibody stain).
Statistical Testing. All statistical tests were carried out in the R statistical environment (version 3.1.1).
For in vitro competition data (Fig 7A and 7B) the CI values were log-transformed and evaluated by one-sample two-tailed t-tests using μ₀ = 0 and α = 0.05.
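The one-sample test on log-transformed CI values can be sketched as below. The analyses in this study were run in R; this standard-library Python version is only an illustration of the calculation. Instead of a p-value it compares the t statistic against tabulated two-tailed critical values for α = 0.05. The CI values and the base-10 logarithm are hypothetical choices.

```python
import math
import statistics

# Two-tailed critical values of Student's t at alpha = 0.05, keyed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

cis = [0.8, 1.1, 1.25]                  # hypothetical CI values, n = 3
log_cis = [math.log10(c) for c in cis]  # log CI = 0 corresponds to equal fitness
t_stat, df = one_sample_t(log_cis, mu0=0.0)
reject_null = abs(t_stat) > T_CRIT_05[df]
print(t_stat, df, reject_null)
```

Testing against μ₀ = 0 on the log scale is equivalent to asking whether the CI differs from 1 on the original scale.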
For in vitro macrophage survival data the percentage of surviving bacteria was logit-transformed for subsequent statistical testing. The PittII experiment (Fig 6A) with four strains (WT, Msf-KO, Msf-COMP and OMPP1-KO) was evaluated by a weighted one-way analysis of variance (ANOVA) for independent samples using α = 0.05 at each time point. Tests that rejected the null hypothesis (p < 0.05) were followed by the Tukey HSD post-hoc test using α = 0.05. For the 86-028NP and Rd KW20 experiments (Fig 6C and 6D), WT and mutant pairs were evaluated with unpaired two-tailed t-tests using α = 0.05 at each time point.
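The logit transform applied to the survival percentages can be sketched as follows. The clamping constant eps is an assumption introduced here so that values of exactly 0% or 100% remain finite; the text does not say how such values were handled, and the example percentages are hypothetical.

```python
import math

def logit_percent(percent, eps=1e-6):
    """Logit of a percentage: ln(p / (1 - p)) with p = percent / 100."""
    p = percent / 100.0
    p = min(max(p, eps), 1.0 - eps)  # clamp to (0, 1); an assumption, not from the text
    return math.log(p / (1.0 - p))

survival_percent = [0.5, 2.0, 10.0]  # hypothetical percentages of surviving bacteria
transformed = [logit_percent(x) for x in survival_percent]
print(transformed)
```

The transform maps proportions bounded between 0 and 1 onto the whole real line, which makes them better suited to tests that assume approximately normal residuals, such as ANOVA and t-tests.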
For in vivo competition data (Fig 7D) raw CFU values were log transformed and evaluated by unpaired two-tailed t-tests using α = 0.05 for each strain pair from each tissue. WT:Msf-KO and WT:Msf-COMP data sets from each tissue site were further evaluated by the Mann-Whitney U test using α = 0.05.
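The rank-based comparison can be illustrated by computing the Mann-Whitney U statistic directly from two samples. This sketch returns only the statistic; obtaining a p-value requires the exact or approximate null distribution provided by a statistics package. The CFU values and function name are hypothetical.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: number of (xi, yj) pairs with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

wt_cfu = [4.1e6, 2.3e6, 8.0e5, 1.2e6]  # hypothetical WT counts from one tissue
ko_cfu = [3.0e4, 1.1e5, 9.0e5, 2.0e4]  # hypothetical KO counts from the same tissue
u_wt = mann_whitney_u(wt_cfu, ko_cfu)
u_ko = mann_whitney_u(ko_cfu, wt_cfu)
print(u_wt, u_ko)  # the two statistics always sum to len(wt_cfu) * len(ko_cfu)
```

Because the statistic depends only on the ordering of the values, it gives the same result for raw and log-transformed CFU counts.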
For evaluation of data involving presence/absence of bacteria, including the in vivo competition infection experiment (Fig 7E) and the in vivo single-strain infection experiments (Fig 8B), two-tailed Fisher's exact tests using α = 0.05 were performed.
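A two-tailed Fisher's exact test on a 2 × 2 presence/absence table can be sketched with the standard library as shown below. The two-tailed p-value is taken as the sum of the probabilities of all tables with the same margins that are no more likely than the observed table, a common convention assumed here. The counts in the example are hypothetical and are not results from this study.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables as likely as, or less likely than, the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical table: rows = strain (WT, KO); columns = tissue culture (positive, negative).
p_value = fisher_exact_two_tailed(5, 5, 0, 9)
print(p_value, p_value < 0.05)
```

The exact test is appropriate here because the group sizes are small, so the large-sample approximation behind a chi-squared test would be unreliable.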
Supporting Information
S1 Fig. SlrV motifs found in H. influenzae. Sequence logos based on MEME/MAST analysis that represent the 9 slrV genes. Motifs were generated using the R Bioconductor package motifStack. Amino acid colors are a modification of the WebLogo default, with Tyr and Cys having unique colors (Y = orange and C = turquoise). The number of sequences each logo is based on is indicated above each panel as n = #. Arrow indicates the location of the 100% conserved tyrosine residue.
36 amino acid SLR motifs extracted from the 79 genes containing them (n = 256). Motifs found within the same protein do not cluster together. Node labels indicate the strain, strain-specific SGH cluster ID number, and location of the motif within the CDS (amino acid position). This tree corresponds to Fig 3D. (TIFF)
S1 Table. List of SLR-containing H. influenzae ORFs identified from MEME/MAST analysis. (XLSX)
S2 Table. Consensus motifs found in H. influenzae SLR-containing protein subfamilies. Each consensus motif was defined by the most prevalent amino acid found at each position in the motif. n = number of sequences used to generate each motif. (XLSX)
S3 Table. Distribution of SLR-containing genes for 210 H. influenzae strains. Columns B-K are gene possession data from the supragenome hybridization array; "1" indicates presence and "0" absence of each listed gene. Columns L-M are data from PCR analysis of each strain. # based on PCR using primers to genes upstream and downstream of slrC. * based on PCR with primers to the putative transporting ATPase and prfA; size in the absence of slrV genes is 1300 bp. (XLSX)
S4