Resolution and Characterization of Distinct cpn60-Based Subgroups of Gardnerella vaginalis in the Vaginal Microbiota

Bacterial vaginosis (BV), characterized by a shift of the vaginal microbiota from a Lactobacillus-dominated community to a dense biofilm containing a complex mixture of organisms, is an important risk factor in poor reproductive health outcomes. The Nugent score, based on Gram stain, is used to diagnose BV and Gardnerella vaginalis abundance in the sample is one factor determining Nugent score. A high Nugent score is indicative of BV but does not always correspond to the presence of clinical symptoms. G. vaginalis is recognized as a heterogeneous group of organisms, which can also be part of the normal, healthy vaginal microbiome. In addition, asymptomatic BV and non-Gardnerella types of BV are being recognized. In an attempt to resolve the heterogeneous group of G. vaginalis, a phylogenetic tree of cpn60 universal target sequences from G. vaginalis isolates was constructed that indicates the existence of four subgroups of G. vaginalis. This subdivision, supported by whole genome similarity calculation of representative strains using JSpecies, demonstrates that these subgroups may represent different species. The cpn60 subgroupings did not correspond with the Piot biotyping scheme, but did show consistency with ARDRA genotyping and sialidase gene presence. Isolates from all four subgroups produced biofilm in vitro. We also investigated the distribution of G. vaginalis subgroups in vaginal samples from Kenyan women with Nugent scores consistent with BV, Intermediate and Normal microbiota (n = 44). All subgroups of G. vaginalis were detected in these women, with a significant difference (z = −3.372, n = 39, p = 0.001) in frequency of G. vaginalis subgroup B between BV and Normal groups. Establishment of a quantifiable relationship between G. vaginalis subgroup distribution and clinical status could have significant diagnostic implications.


Introduction
Gardnerella vaginalis, first isolated by Leopold in 1953 [1], has long been recognized in vaginal samples and has been identified by several names, including Haemophilus vaginalis by Gardner and Dukes in 1955 [2]. Further characterization based on metabolic requirements and Gram staining led to its reclassification as Corynebacterium vaginale [3]. The proposal to create the genus Gardnerella and allocation of Corynebacterium vaginale and Haemophilus vaginalis to this new taxon as Gardnerella vaginalis was put forward by Greenwood and Pickett [4], based on a taxonomic study that utilized DNA-DNA hybridization, biochemical analysis of the cell wall, and electron microscopy.
G. vaginalis is strongly associated with bacterial vaginosis (BV), and is one of the most frequently isolated bacteria from women with symptoms of BV [5][6][7]. Abundance of G. vaginalis in vaginal samples has also been associated with infertility and preterm labour [8]. G. vaginalis has also been isolated from urine and blood and is associated with bacteremia, osteomyelitis and cervical cancer [9][10][11]. However, recent studies of vaginal microbiota indicate that G. vaginalis can also be a part of the vaginal microbiota in clinically healthy women [6,12,13].
G. vaginalis is recognized as a diverse taxon, both phenotypically and genotypically [13][14][15]. Eight biotypes of G. vaginalis have been identified by Piot et al. based on the presence of b-galactosidase, lipase and hippurate hydrolysis activities [14], whereas Benito et al. identified seventeen biotypes based on these characteristics in addition to fermentation of xylose, arabinose and galactose [15]. Phenotypic diversity within G. vaginalis has also been described in terms of virulence factors, particularly production of sialidase [16] and formation of biofilms [17]. Genetic heterogeneity within G. vaginalis has been demonstrated using amplified ribosomal DNA restriction analysis (ARDRA) [18]. Santiago et al. [16], identified three ARDRA genotypes of G. vaginalis, of which only two genotypes (genotypes 1 and 3) produced sialidase. However, like biotyping schemes, ARDRA can only be performed on isolates.
Genotype diversity is apparent in whole genome studies and in metagenomic studies of the human vaginal microbiome based on 16S rRNA [19] or cpn60 [6,12]. Hummelen et al. [19] reported the presence of four types of G. vaginalis sequences differing by a single nucleotide within the 16S rRNA V6 region. Four clusters of G. vaginalis sequences, ranging between 89 and 100% sequence identity to the type strain (ATCC 14018 T ), were observed in a cpn60-based study of clinically healthy women by Hill et al. [12] and followed up in a larger study by Schellenberg et al. [6].
Given the observed phenotypic diversity (especially virulence factors), genotypic diversity, and the presence of G. vaginalis in women regardless of clinical status, it is critical to improve our understanding of the clinical significance of these different strains. In order to accomplish this most effectively and to lay the foundation for the development of more informative diagnostic tools for women's health, direct culture-independent analysis of vaginal samples, exploiting a genetic target that facilitates robust resolution is required.
The objective of the current study was to investigate if previously observed cpn60 based subgroups of G. vaginalis are consistent with other (phenotypic) classification systems and/or available whole genome sequences, and to investigate the distribution of cpn60 defined subgroups of G. vaginalis in women with and without BV. Our results demonstrate that the cpn60 universal target sequence differentiates distinct subgroups within G. vaginalis and that only one of these subgroups (Subgroup B: Piot biotype 5, sialidase positive and ARDRA genotype 1) was found to be significantly more abundant in women with BV (high Nugent score) than women with normal vaginal microbiota in a retrospective analysis of metagenomic profiles of Kenyan women.

DNA Extraction and PCR
DNA was extracted from isolates using a phenol-chloroform extraction method and was stored at 220uC. Primers used in the study are listed in Table 1. cpn60 UT PCR amplicons were produced for direct sequencing, using universal primers H729 and H730 as described previously [33]. Primers JH0315 and JH0316 were designed based on the 16S rRNA sequence from G. vaginalis ATCC 14018. Amplification with these primers was carried out by incubating the reactions at 94uC for 3 minutes, followed by 40 cycles of 94uC for 30 sec, 52uC for 1 min and 72uC for 90 sec, and completed with a final extension of 10 min at 72uC. Sialidase gene presence was assessed by amplifying the sialidase gene using primers GVSI forward and GVSI reverse [16]. Vaginolysin gene sequences were amplified using primers V1 and V2 as previously described [44].

Phenotyping of G. vaginalis Isolates
Representative G. vaginalis isolates with unique cpn60 sequences were phenotyped using the Piot typing scheme using assays for hippurate hydrolysis, b-galactosidase and lipase activity as described previously [14]. Lactobacillus crispatus was used as a negative control for the lipase assay. G. vaginalis ATCC 14018 T was used as a positive control for all biochemical assays.

Biofilm Formation
Isolates of G. vaginalis were cultured from 280uC stocks for 72 hrs, and subcultured for 48 hrs on Columbia sheep blood agar plates anaerobically at 37uC. A loopful of culture for each isolate was used to inoculate 4 ml of either ATCC broth #1685 with 1% (w/v) glucose or Brain Heart Infusion with 1% (w/v) glucose (BHIG) and incubated anaerobically for 48 hrs at 37uC. Broth cultures were diluted 1:100 in media and 200 mL of diluted culture was added to individual wells of a 96-well tissue culture plate and incubated anaerobically for 48 hrs at 37uC. Qualitative assessment of biofilm formation was done by washing off the planktonic cells  vaginalis isolates with whole genome sequence information available in Genbank (Accession numbers AEJD00000000, AFDI00000000, AEJE00000000, CP001849, ADAN00000000, ADAM00000000, ADNB00000000, CP002104 and CP002725 respectively). Isolates with names starting with ''N'' are isolates from Kenyan women from Schellenberg et al. [6]. W11 was isolated from a Canadian woman (Schellenberg,  and staining the wells with 1% crystal violet solution to visualize any biofilm.

ARDRA Genotyping
G. vaginalis isolates were genotyped using amplified rDNA restriction analysis (ARDRA) [16,18]. Full-length 16S rRNA gene sequences were amplified using primers GV10F and vMB (Table 1). PCR products were purified (QIAquick PCR Purification Kit, Qiagen, Inc., Toronto, ON) and subjected to overnight restriction digestion using TaqI (Life Technologies, Inc., Burlington, ON). The digestion products were resolved on a 1.5% agarose gel at 140 volts for 2 hrs. In silico ARDRA was performed for some strains for which only published genome sequence information was available (not the isolates themselves) by extracting full length 16S rRNA gene sequences from published whole genome sequences and then restricting the sequence using the program remap within the EMBOSS software suite [45].

Phylogenetic Analysis
cpn60 UT sequences obtained from whole genome sequences of G. vaginalis reference strains or amplified from cultured clinical isolates [6] were used to construct a phylogenetic tree, using Alloscardovia omnicolens CCUG 34444 as a root. Sequences were aligned using ClustalW (gap opening penalty = 10, gap extension penalty = 0.10) [46], followed by utilization of the Phylip software package [47] to calculate a distance matrix using dnadist and construct a tree using neighbor. The final tree was obtained from the bootstrapped consensus of 100 trees and was visualized using Dendroscope [48].
Unpublished). Sequences highlighted in red were used as representatives of the subgroups in the distribution analysis of metagenomic sequence data. B. Pairwise distances for the 26 G. vaginalis cpn60 UT sequences included in the phylogenetic analysis. Distances for both inter-subgroup comparisons (white bars) and intra-subgroup comparisons (black bars) are indicated. doi:10.1371/journal.pone.0043009.g001 Table 2. ANIm (first row), cpn60 UT sequence identity (second row) and 16S rRNA sequence identity (third row) for representatives of G. vaginalis subgroups A, C and D, for which whole genome sequence data was available.

Statistical Analysis
All statistical analyses were done using SPSS Statistics, version 19.0. For the analysis of assignment of assembled reads of Nairobi metagenomic data set to G. vaginalis subgroups, one-way ANOVA was done, followed by Post-hoc analysis by Tukey's test. Statistical analysis for the distribution of G. vaginalis subgroups in Nairobi women was done by Kruskal-Wallis H test, followed by Mann-Whitney U test.

Biotyping, Sialidase, ARDRA Genotyping and Biofilm Production
Results for the biotyping assays and genotyping are shown in Table 4. Piot biotypes 1, 2, 5, 7 and 8 were identified among the isolate collection, but no consistent pattern of biotype distribution and cpn60 subgroup was observed. All subgroup C isolates were lipase positive. Subgroup B and C isolates were sialidase gene positive and ARDRA genotype 1, whereas subgroup A isolates were genotype 2 with no detection of the sialidase gene. Subgroup D isolates differed in sialidase gene presence (N160 was negative, strain 101 positive) and ARDRA genotype (N160 was genotype 2, strain 101 was predicted to be genotype 1 based on in silico restriction analysis). All isolates produced biofilm in BHIG by 48 hrs, and substantial variability in biofilm production was observed in the two media tested (BHIG and ATCC broth #1685 with 1% glucose) (Figure 2). Both subgroup B isolates (N144 and N153) formed biofilm in both media, although the biofilm formed in BHIG was more extensive, completely coating the well. In subgoups A and C, at least one of the isolates failed to produce any visible biofilm in ATCC broth #1685.
We also attempted to detect vaginolysin gene sequences in the isolates selected for phenotypic analysis. A PCR product of the expected size of 1,551 bp was obtained for only three isolates (ATCC 14018 T , ATCC 49145 and N153). An amplicon of 1,200 bp was amplified from three others (N134, N137 and N158), but sequence analysis indicated a mixture of products, suggesting that this product was the result of non-target sequence amplification. Isolates N165 and N144 did yield any product after repeated attempts. The vaginolysin sequence from ATCC 49145 was 99% identical to ATCC 14018 and only 91% identical to N153.

Distribution of G. vaginalis Subgroups in Kenyan Women
A previously published cpn60 metagenomic dataset was used to investigate distribution of cpn60-based G. vaginalis subgroups in vaginal microbiome profiles derived from samples classified as BV, Intermediate or Normal based on Nugent score [6]. All unique sequences assembled from the study data (n = 831 OTU) were compared using watered-BLAST [30] to a reference database of cpn60 sequences containing one representative of each species in cpnDB (cpnDB_nr; www.cpndb.ca) and two representatives of each G. vaginalis subgroup as indicated in Figure 1. All assembled sequences with any of the G. vaginalis reference sequences as their best match, and meeting the minimum requirement for identification as a cpn60 sequence (60% identity over $100 nucleotides) were included in the analysis of distribution (n = 93). For 84/93 of the assembled metagenomic sequences, the top two hits were to the same G. vaginalis subgroup. Identities for each query and its top two hits (medians for 93 queries were 96.6% and 92.9% respectively) were significantly (p,0.0001) higher than identity to the third (median 89.4%) through eighth best hits ( Figure 3). Thirty-one queries had sequence identities ,95% to their best match.
Once metagenomic sequences were assigned to a subgroup (based on the best watered-BLAST match) the distribution of the subgroups among the vaginal microbiomes of women diagnosed as BV (n = 20), Intermediate (n = 5) and Normal (n = 19), based on Nugent score was determined (Figure 4). The sequence read frequencies used in this analysis were normalized to the median library size of 15,000 reads [6]. All vaginal microbiota libraries contained sequences corresponding to more than one subgroup of Study isolates N156 (subgroup B) and N164 (subgroup C) were not included in the biotyping analysis since they were not reliably cultured as pure isolates after revival from frozen stocks following their original isolation and cpn60-based characterization. 3 In cases where no isolates were available to us for culture, ARDRA genotypes for some strains (41V, ATCC 14019, 101, AMD, 409-05, and 5-1) were obtained by in silico analysis as described in the text.

Discussion
Bacterial vaginosis is the most commonly reported vaginal infection [49,50]. BV can be diagnosed clinically using Amsel's criteria, which include presence of homogenous vaginal discharge, a vaginal pH of greater than 4.5, positive whiff test (production of a fishy odour on addition of 10% KOH to vaginal sample), and also presence of clue cells in at least 20% of the total cell count [51]. The Nugent score is another commonly used diagnostic tool for BV. To calculate the Nugent score, a Gram stained vaginal smear is assessed for the relative abundance of various bacterial morphotypes including Gram-positive large rods, Gram-negative/ Gram-variable rods and curved Gram-variable rods. With increasing numbers of survey studies in which Nugent scores of clinically normal women are determined, the phenomenon of ''asymptomatic BV'' has been widely observed [52][53][54]. These women have high Nugent scores, but do not have symptoms of BV. The clinical significance of asymptomatic BV is unknown. Since the presence of G. vaginalis is one of the key determinants of Nugent score, one possible explanation for asymptomatic BV is the presence of large numbers of non-pathogenic G. vaginalis or other species with similar Gram stain morphology. If this is true, then the detection of G. vaginalis in general may be of questionable diagnostic value. Resolution of this important issue requires the investigation of distribution of G. vaginalis lineages among women in a variety of clinical cohorts. Tackling this on a large scale requires culture-independent tools that provide resolution of phenotypically distinct G. vaginalis subgroups when applied directly to clinical samples. Sequence diversity within G. vaginalis has been reported in metagenomic studies of the vaginal microbiome based on 16S rRNA and cpn60 gene targets, and four subdivisions have been reported in several recent studies [6,12,19]. The subdivision observed by Hummelen et al. [19] was based on single nucleotide differences within the V6 region of 16S rRNA. The much lower cpn60 sequence identity between these subgroups (#93% versus $98% identity for 16S rRNA) facilitated the identification of vaginal isolates corresponding to these subgroups, demonstrating that the metagenomic studies had revealed real biological diversity and not artifactual diversity resulting from PCR, sequencing and or data assembly (Figure 1). Overestimation of microbial diversity in metagenomic sequencing studies is an ongoing concern [55], so the identification and characterization of actual isolates corresponding to metagenomic sequences is reassuring and further supports the value of the cpn60 universal target for resolution of diversity at species-and strain-level [20].
Whole genome DNA-DNA hybridization persists as the gold standard method for defining bacterial species [56] although it remains unpopular due to its technical demands [57]. DNA sequence data is increasingly relied upon to support species definition and resolution, and recently whole genome sequence comparison has been suggested as a new gold standard [58].
Richter & Rosselló-Mora [58] demonstrated that average nucleotide identity (ANI) values correlate well with DNA-DNA hybridization results and suggest that an ANI values greater than <95-96%, calculated by either BLAST or the MUMmer rapid aligning tool, were indicative of bacteria of the same species. Another alternative was proposed by Ziegler, who developed a computational algorithm based on sequence of three genes (recN, rpoA and thdF) that corresponds well to the conclusions of DNA-DNA hybridization data [59]. In that study, the 16S rRNA gene, widely used for identifying bacterial species and metagenomic studies, was found to have the lowest correlation between sequence identity and genome sequence identity. Subsequently, Verbeke et al. [21] demonstrated that a cpn60 UT sequence alone could predict whole genome identity as well as the three gene model. The ease of amplifying and sequencing the cpn60 UT from bacteria, the curated reference database of chaperonin sequences (cpnDB, www.cpndb.ca) [43], and the ability of the cpn60 UT to predict whole genome sequence similarity make it the ideal target for studies of Gardnerella, or any other bacterial taxon for which subspecies resolution is of interest.
Our results show a strong relationship between cpn60 UT sequence identity and whole genome comparison with the ANIm algorithm in JSpecies (Table 2). In fact, our results suggest that subgroups A, C and D of G. vaginalis meet the whole genome sequence-based criteria for designation as different species. Although no whole genome sequence is available for a subgroup B isolate, the cpn60 sequence data for isolates in this group certainly support a similar species level status for this group (Table 3). Complete genomes of nine G. vaginalis strains determined at the time of writing, and the fact that none of them belongs to subgroup B is interesting. Our experience with culturing of subgroup B isolates suggest that their conspicuous absence from the genome sequence database is most likely due to the fact that members of this subgroup, unlike the others, only grow in anaerobic conditions and do not grow in 7% CO 2 , which is the atmosphere recommended for routine isolation of G. vaginalis [60].
The heterogeneity of the G. vaginalis taxon is well documented based on application of biotyping schemes. Some of the biotypes of G. vaginalis from both the Piot and Benito biotyping schemes have been associated with BV [61,62]. Piot biotypes 1, 4 and 5 are the most frequently isolated regardless of BV status [13] and biotype 5 has been reported to be predominantly associated with healthy vaginal ecosystems [61]. Piot biotypes 7 and 8 have been reported as the most frequently isolated from BV patients with isolation rate of 32% and 20%, respectively [61]. Of the seven isolates characterized in this study, four were Piot biotype 5, supporting previous observations of the prevalence of this biotype (Table 4). Otherwise, biotyping results were not consistent with sialidase gene presence, ARDRA or observations of association with BV in the Kenyan cohort. Although we did not provide evidence of sialidase enzymatic activity in our isolates, we observed a consistent relationship between the presence or absence of the sialidase gene and ARDRA genotype in that we detected the sialidase gene in all genotype 1 isolates examined and did not detect the sialidase gene in any of the genotype 2 isolates examined. Sialidase activity is recognized as a virulence factor in G. vaginalis and is the basis for a chromogenic, BV Blue Kit for BV diagnosis [63], but the results of reports examining the correlation between sialidase gene presence and detection of sialidase enzymatic activity are variable, making it inadvisable to draw conclusions about sialidase activity based solely on gene presence [16,64].
Vaginolysin is a protein toxin belonging to the cholesteroldependent cytolysin family of toxins that has been previously identified in G. vaginalis [44]. To detect this purported virulence factor in the study isolates, we employed previously published PCR primers designed based on the type strain, ATCC 14018. The primers failed to amplify the target sequence from most study isolates, and among the isolates for which we did generate sequence (ATCC 14018, ATCC 49145 and N153), we observed only 91% nucleotide sequence identity between some isolates (N153 vs. either the type strain or ATCC 49145). These results suggest that these primers may be too specific for general application in G. vaginalis, rather than indicating the absence of a vaginolysin gene in the other strains included in the study.
Subgroups of G. vaginalis were not evenly distributed among vaginal microbiomes diagnosed as BV, intermediate or normal based on Nugent score (Figure 4). Although G. vaginalis sequences were ubiquitous in the study group, and most women hosted multiple subgroups of G. vaginalis, only subgroup B was significantly more abundant in BV than normal samples. Analysis of pH and clue cells in these samples showed, as expected, a negative correlation of pH and Nugent score and a positive correlation of clue cells and Nugent score (data not show). An obvious and immediate question is whether subgroup B or any other subgroup is differentially associated with symptomatic and asymptomatic BV. Although we were unable to stratify our current data by symptoms (discharge, odour), relatively low cpn60 sequence identities facilitate robust differentiation of G. vaginalis subgroups (Figure 1 and 3) making it an ideal target to exploit in cultureindependent approaches to addressing these questions in future studies. High throughput sequencing of cpn60 amplicons [30], bead-based hybridization assays [65] and quantitative real-time PCR methods [66] have all been developed based on cpn60 UT sequences and offer powerful tools for investigation of microbial diversity at, and below, the species level.
The results of our work support previous observations of genotypic and phenotypic diversity within G. vaginalis and we have been successful in using cpn60 UT sequences for robust classification of available G. vaginalis strains into four subgroups. We have also provided evidence that supports the eventual reclassification of subgroups as different species of Gardnerella. The degree of cpn60 UT and whole genome sequence diversity within this taxon is beyond that associated with ''ecotypes'' [67] or strains and suggests that reclassification may be warranted. However, additional genotypic and phenotypic analysis of additional isolates will be required to make this case. The cpn60 UT sequence offers a robust tool for identification of subgroups within G. vaginalis that may not be discernable using other targets. This feature of the cpn60 target will facilitate future efforts to expand diagnostic panels for rapid, high throughput characterization and improved resolution of species and strain distribution in the vaginal microbiome.