A Study of the Vaginal Microbiome in Healthy Canadian Women Utilizing cpn60-Based Molecular Profiling Reveals Distinct Gardnerella Subgroup Community State Types

The vaginal microbiota is important in women’s reproductive and overall health. However, the relationships between the structure, function and dynamics of this complex microbial community and health outcomes remain elusive. The objective of this study was to determine the phylogenetic range and abundance of prokaryotes in the vaginal microbiota of healthy, non-pregnant, ethnically diverse, reproductive-aged Canadian women. Socio-demographic, behavioural and clinical data were collected and vaginal swabs were analyzed from 310 women. Detailed profiles of their vaginal microbiomes were generated by pyrosequencing of the chaperonin-60 universal target. Six community state types (CST) were delineated by hierarchical clustering, including three Lactobacillus-dominated CST (L. crispatus, L. iners, L. jensenii), two Gardnerella-dominated (subgroups A and C) and an “intermediate” CST which included a small number of women with microbiomes dominated by seven other species or with no dominant species but minority populations of Streptococcus, Staphylococcus, Peptoniphilus, E. coli and various Proteobacteria in co-dominant communities. The striking correspondence between Nugent score and deep sequencing CST continues to reinforce the basic premise provided by the simpler Gram stain method, while additional analyses reveal detailed cpn60-based phylogeny and estimated abundance in microbial communities from vaginal samples. Ethnicity was the only demographic or clinical characteristic predicting CST, with differences in Asian and White women (p = 0.05). In conclusion, this study confirms previous work describing four cpn60-based subgroups of Gardnerella, revealing previously undescribed CST. The data describe the range of bacterial communities seen in Canadian women presenting with no specific vaginal health concerns, and provides an important baseline for future investigations of clinically important cohorts.


Introduction
The microorganisms that inhabit the human body, the microbiota, are an integral component of an individual's health. Within the female reproductive tract, the resident vaginal microbiota play an important protectiv e role by interfering with the proliferation of organisms that cause vulvo-vaginal, urinary tract and sexually transmitted infections [1][2][3][4][5]. A healthy vaginal microbial community is generally defined as being dominated by Lactobacillus species, many of which produce acid, hydrogen peroxide, biosurfactants and bacteriocins antagonistic to pathogens [6][7][8][9][10][11]. Although the relationship between vaginal microbiology and adverse clinical symptoms (such as odour and discharge) is not well understood, women with a high diversity of anaerobic bacteria in communities dominated by organisms other than Lactobacillus, are generally considered to have a condition called bacterial vaginosis (BV). However, research especially on microbiota of asymptomatic women who meet the microbiological criteria of BV has expanded our understanding of the microbial constituents of the vagina [12][13][14].
In addition to a growing consensus regarding the most commonly observed CST, an expanding list of individual factors and behaviours have been associated with particular CST or with women changing from a Lactobacillus-dominated CST to one associated with BV or vice versa. Profiles of vaginal microbiota have been reported to be associated with race/ethnicity [15,18], level of education [34], use of hormonal contraceptives [35,36], use of feminine hygiene products [37], gender of sexual partners [35,38], number of sexual partners [39], condom use [35,39], sexual behaviours [39] and smoking [38]. Some factors, such as level of education, most likely represent a composite variable of numerous factors associated with differences in the vaginal microbiota, including race/ethnicity, sexual behaviour and socioeconomic class [34]. Given the complex and interconnected nature of many of these factors, more research is needed with diverse cohorts of women to help clarify and understand how they contribute to the composition and dynamics of the vaginal microbiota.
In this study, phylogenetic profiles of vaginal microbiota were generated by massively parallel sequencing of the universal target (UT) from the cpn60 gene. This target has been shown to provide comparable information to the 16S rRNA gene in terms of community coverage [21] but provide better discrimination of species and subspecies [40][41][42][43], increased coverage of Bifidobacteriales [44,45] and detection of eukaryotic microbes [46]. This resolution is critical, particularly for the vaginal microbiome, given recent evidence that the species Gardnerella vaginalis is actually comprised of four genotypically and phenotypically distinct subgroups, easily distinguished by variation in cpn60 UT sequences [47][48][49]. However, it is a recognized limitation of cpn60 that some Mollicutes, notably some species of Mycoplasma and Ureaplasma, lack this gene target. Given the established importance of these species in the vaginal microbiome, specific PCR assays were included in this study to estimate prevalence of these organisms. Finally, an estimate of total bacterial population density was generated using a 16S rRNA targeted quantitative PCR. The combination of cpn60 microbial profiling with targeted additional assays ensured complete microbiome profiling consistent with the largest 16S rRNA gene studies, but at a finer resolution with consequently increased phylogenetic detail.
The primary objective for this study was to define the range of vaginal microbiota profiles common in non-pregnant Canadian women of reproductive age without specific vaginal health concerns, in relation to socio-demographic, behavioural and clinical characteristics. In addition to our results broadly confirming previous studies in terms of the most commonly observed Lactobacillus-dominated CST in healthy women, they provide important phylogenetic insight into non-Lactobacillus-dominated CST based on distinct cpn60-based Gardnerella subgroups.

Ethics statement
This study received ethical approval from the University of British Columbia Children's & Women's Research Ethics Board (certificate no. H10-02535).

Participants and study design
Healthy, reproductive-aged women aged 18-49 years were recruited from the greater Vancouver area of British Columbia, Canada, through research clinics, primary care offices, and online and print advertisements. Women were eligible to participate if the following inclusion criteria were met: between the ages of 18-49 and premenopausal (e.g. reported having a menstrual cycle over the previous 12 months), not pregnant, HIV negative, and had not used any oral or intravaginal antibiotic or antifungal treatments in the four weeks prior to the study visit (antiviral medications, e.g. valacyclovir for suppression of HSV infection, were permitted). After obtaining written informed consent, research staff collected demographic and clinical data from the participants via interview and by reviewing medical charts. Two vaginal swab samples were collected for Gram stain assessment and sequencing analysis, either by a clinician during an indicated speculum examination (i.e. routine pap smear screening), or by a study nurse conducting a speculum examination for study purposes, or using a validated self-collection method [28]. Amies gel transport swabs without charcoal (Stevens Company Ltd., Brampton, ON) were used to collect samples for Gram stain assessment, and dry Dacron swabs (Copan Diagnostics Inc., Murrieta, CA) were used to collect samples for sequencing analyses.
Samples collected for Gram stain assessment were processed within 48 h at the clinical laboratory for BC Women's Hospital and Health Centre utilizing validated Nugent's scoring methodology [50].
Samples collected for sequencing analyses were transferred to -80°C storage within 30 minutes, with the exception of those collected in community-based clinics, in which case, they were stored at 4°C for a maximum of 12 hours prior to transfer to -80°C. Total nucleic acid was extracted from batches of swabs using the MagMAX Total Nucleic Acid Isolation Kit (Applied Biosystems, Life Technologies, Burlington, ON, Canada) as per manufacturer's instructions. To minimize opportunities for cross contamination between samples, small batches were processed. In addition, a negative control sample (sterile water) was run with each kit and tested with the subsequent cpn60 PCR to ensure no PCR product was generated.
Quantitative PCR (qPCR) and conventional PCR Samples were quantified for total bacterial DNA with a SYBR Green assay targeting the 16S rRNA gene (V3 region) as described previously [51]. Mollicutes (Mycoplasma and/or Ureaplasma) were detected by genus-specific, conventional semi-nested PCR targeting the 16S rRNA gene [52]. The primary PCR targeted a 700 bp portion of the 16S rRNA gene using primers GPO-1 and MGSO [52]. PCR was performed under the following conditions: 40 cycles of 94°C for 30 s, 64°C for 30 s, and 72°C for 60 s. The secondary PCR used primers My-ins [53] and MGSO, and 2 μl of the primary PCR product as template. Thermocycling parameters included 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 60 s. Ureaplasma species (U. parvum and U. urealyticum) were detected using a conventional PCR based on the multiple-banded antigen gene with primers UMS-125 and UMA226, which yield products of two different sizes depending on the target species: 403 bp (Ureaplasma parvum) or 443 bp (Ureaplasma urealyticum) [54,55]. Representative PCR products (several products of each size generated) were sequenced and confirmed to the expected target sequence. Every PCR assay contained a no template control (NTC) reaction containing all the PCR components without added template DNA and had to generate no signal (qPCR) or product (conventional PCR) for the assay to be valid.
cpn60 universal target (UT) PCR and pyrosequencing Generation of cpn60 PCR amplicon libraries for microbial profiling of vaginal samples was carried out as previously described [28]. cpn60 UT-specific primer sets [44,56] were modified at the 5' end with a unique decamer multiplexing identification (MID) sequences. Amplicons were pooled in equimolar concentrations to create libraries for sequencing on the GS Junior platform as per the manufacturer's recommendations (Roche/454, Brandford, CT, USA).

Analysis of operational taxonomic units (OTU)
Sequence data were processed as previously described with minor modifications [28], using the default on-rig procedures from 454/Roche, which include confirmation of key sequence and trimming for quality. To classify experimental data, MID-partitioned reads were mapped using Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/) on to a previously created, manually curated reference set of 1,561 OTU sequences generated by assembly of cpn60 sequence reads from each of 546 vaginal microbiome samples from non-pregnant and pregnant Canadian women of reproductive age. The reference assembly was created by the microbial Profiling Using Metagenomic Assembly pipeline (mPUMA, http://mpuma.sourceforge.net) [57] with Trinity as the assembly tool [58]. Assembled OTU (> 150 bp) were identified and labeled according to their nearest reference sequence determined by watered-Blast comparison [21] to the cpn60 reference database, cpnDB_nr_vag (downloaded on September 18, 2014 from http:// www.cpndb.ca, [59]). OTU having less than 55% identity to any reference sequence were considered to be non-cpn60 sequences and removed from the data set [60]. This reference assembly strategy facilitates comparison of microbiome profiles from various cohorts under study, including the 310 women described in this sub-study. Raw sequence data files for the 310 samples described in this study were deposited to the NCBI Sequence Read Archive (Accession SRP056439, BioProject PRJNA278895).

Statistical analysis
All analyses were carried out in R v3.1.1 (R Core Team 2014). We restricted the pool of OTU to those having a total of at least 20 reads across all samples. A Jensen-Shannon distance matrix was calculated using the 'vegdist' function in the vegan package [61], with a custom distance function that calculates the square root of the Jensen-Shannon divergence [62]. This distance matrix was used for hierarchical clustering using the 'hclust' function in R, with Ward linkage.
Several internal cluster validation metrics were assessed [63], including average silhouette width [64], average Pearson gamma [65] and Dunn index [66], as calculated by the 'cluster. stats' function in the fpc package [67]. The Pearson gamma index measures the correlation between distances and a vector of 0s and 1s where 0 indicates the same cluster, 1 indicates a different cluster and higher values indicate better fit to the data [65]. The optimal cluster number was validated using a bootstrapping technique, as implemented by the 'clusterboot' function in the fpc package, with 500 bootstrap replicates for each of 5, 6, 7, 8, 9 and 10 clusters. Once the optimal number of clusters was determined, taxonomic composition was compared to previously published CST [18,27]. Relationships between cluster membership and the variables listed in Table 1 were assessed using Fisher exact tests. All p-values were corrected for multiple testing using the Benjamini-Hochberg method with a false discovery rate of 0.05 [68]. Principal coordinates analysis from all the OTU in the dataset were calculated from Bray-Curtis dissimilarity values as the mean value of 100 subsampling of 1000 reads (or all reads available when less than 1000) in QIIME with the jackknifed_beta_diversity.py command [69].
Variation in OTU abundance according to Nugent score and clinical/demographic variables was assessed using center log ratio transformations in ALDEx2 (v2.0.7.2), an algorithm for compositional analysis that uses a Dirichlet-multinomial model to infer relative abundances from counts [70]. Expected p-values for Kruskal-Wallis tests of differences among clinical data categories were determined using 128 Monte Carlo runs. A false discovery rate of 0.05 was applied using the Benjamini-Hochberg correction and adjusted p-values are reported [71]. Shannon diversity and Chao1 estimated species richness were calculated as the mean of 100 subsampling of 1000 reads (or all reads available when less than 1000) in QIIME with the mul-tiple_rarefactions.py, alpha_diversity.py and collate_alpha.py commands [69]. These were compared to CST and clinical/demographic variables using linear models for Shannon diversity and Poisson regressions for Chao1 richness estimates. Significance was assessed using likelihood-ratio tests with p-values corrected for multiple testing using the Benjamini-Hochberg method as above.

Phylogenetic analysis
For abundance tree analysis, 164 full-length cpn60 UT sequences that are nearest neighbours for all OTU with abundance exceeding 1% of total reads in at least a single individual were aligned using ClustalW with gap opening/gap extension penalties of 10/0.2 and phylogenetic trees constructed by neighbour-joining with 100 bootstrap iterations in MEGA v6 (www. megasoftware.net) [72]. Proportions of total reads for subgroups of women were represented Number of sexual partners in past year (Mean ± SD, range) Number of sexual partners in past 2 months (Mean ± SD, range) In past 48 hours 24 (7.7%)  as circles whose combined area is equal to 100% of all reads for that subgroup, as previously described [22]. Each subgroup of women is represented as a different colour based on Nugent score or CST, with all individual libraries normalized to the same total read count.

Cohort description
Five hundred and thirty-four women were initially recruited, referred, or contacted the study coordinating centre to participate in the study. Potential participants were provided with detailed information about the study and screened to ensure they met eligibility criteria, resulting in 320 women who attended study visits and consented to participate. Ten women were excluded post-enrollment due to erroneous assessment of exclusion criteria, inability to provide vaginal samples, or laboratory errors (e.g. lost/missing samples, improper storage of samples), resulting in total sample size of 310 women. The average age of participants was 30 ( Table 1). The majority of women had been sexually active and reported a history of predominantly heterosexual relationships, but some had bisexual and same sex partnerships. Of those who had been sexually active, most had been sexually active during the past year, with the number of partners ranging from 1-25. The ethnicity distribution of our study population was diverse and representative of Canadian populations from the White, Asian, Black, Aboriginal and Hispanic communities according to 2006 census data [73]. A subgroup of women (n = 24 or 7.7%) who self-identified as not having specific vaginal health issues reported vaginal symptoms (odour, abnormal discharge, and/or irritation) within 48 hours prior to of sample collection (

Hierarchical clustering and principal components analysis of community state types (CST)
Hierarchical clustering of vaginal microbiome profiles resulted in six clusters with good support (S1 Fig). All Jaccard indices from bootstrapping for six clusters were greater than 0.6, with two clusters greater than 0.65, one cluster greater than 0.8 and two clusters greater than 0.9. When the number of clusters was reduced or increased, the Jaccard indices fell below 0.6, and in some cases, below 0.5. Average silhouette width and the Pearson gamma index were also highest for six clusters, and the Dunn index at five and six clusters, indicating the most overall support for six clusters (S1 Fig).
Lactobacillus-dominated CST identified correspond to those previously described in 16S rRNA deep sequencing studies (Fig 1). Consistent with previous studies of healthy   The L. gasseri-dominated CST II described by Ravel et al. [18] did not form its own supported cluster in this study, but L. gasseri was dominant in 7/36 or 19% of women in CST IVA (S2 Fig, panel D). Two other women in this cluster had profiles dominated by Lactobacillus, including one with L. delbrueckii and another with L. acidophilus. As described previously, CST IVA is remarkably heterogeneous relative to other CST, with profiles dominated by organisms besides Lactobacillus, including Gardnerella subgroup B (n = 4), Bifidobacterium breve (n = 3), B. dentium (n = 2) or Atopobium vaginae (n = 1), or with no single dominant OTU but even mixtures of Stapylococcus, Streptococcus, Prevotella, Alloscardovia, Gardnerella and Lactobacillus (n = 17). Although most of the women in this CST had BV-Nugent scores, 25% (8/36) had BVI Nugent scores.
For the remaining CST, IVC (S2 Fig, panel E) and IVD (panel F), both have dominant Gardnerella and sub-populations of Clostridiales and Bacteroidales. However, two distinct clusters based on OTU from different cpn60-defined G. vaginalis subgroups [47] were observed: one characterized by Gardnerella subgroup A (OTU 1670, 99.6% identity, n = 22) and the other by Gardnerella subgroup C (OTU 1668, 99.3% identity, n = 24). For women in CST IVC, Gardnerella subgroup A was dominant in most profiles, with sub-dominant populations of Megasphaera sp. and Prevotella timonensis in several women. Only about half of the women in CST IVD were dominated by Gardnerella subgroup C, while the other half had lower levels of this organism and co-dominant mixtures of Megasphaera sp., Prevotella timonensis, P. amnii and Atopobium vaginae. As expected, most of the women in CST IVC and IVD had BV+ Nugent scores (10/22 or 45% for IVC, 16/24 or 67% for IVD).

Abundance of reads for each CST in phylogenetic context
In order to better visualize sub-dominant populations in relation to CST, microbiome sequence data was viewed in phylogenetic context (Fig 2). CST I, III and V are clearly dominated by L. crispatus, L. iners and L. jensenii respectively. CST IVC and IVD are clearly dominated by Gardnerella subgroup A and C respectively. However these organisms represent a smaller proportion of total reads for these CST compared to the Lactobacillus-dominated CST. The most commonly observed organisms are found in all CST, confirming that fluctuations in relative abundance rather than presence or absence of specific organisms characterize CST.
Several Many organisms were observed in relatively high proportions in CST IVA, reflecting the mixed dominant phenotype. L. gasseri, Gardnerella subgroup B, Bifidobacterium sp., Atopobium vaginae, and E. coli sequences were all most abundant in IVA, which was expected since all individuals with profiles dominated by these organisms are included in this CST. Interestingly, abundance of several sub-dominant organisms from the phyla Firmicutes, Actinobacteria and Proteobacteria was associated with CST IVA, to a lesser extent with other, Lactobacillus-dominated CST, and virtually absent in CST IVC and IVD. These observations indicate that broad phylogenetic groups of organisms are more likely to be present at lowabundance in women with Lactobacillus-dominated CST or CST IVA compared to IVC or IVD.

CST in relation to socio-behavioural variables
Despite extensive investigation of factors previously reported to influence the vaginal microbiota profile, ethnicity was the only variable significantly related to CST when considering White and Asian women only, perhaps due to low numbers of women representing other ethnic groups (Table 1, S3 Fig). CST membership was associated with Asian vs. White ethnicity (Benjamini-Hochberg adjusted p = 0.049) with greater than expected numbers of CST III in Asian women (S3 Fig). There were no other significant relationships between CST membership and clinical or demographic variables (Table 1).

CST and specific OTU in relation to Nugent score
The relationship between CST and Nugent score, noted in the analysis of heatmaps above, was determined to be statistically significant (Fisher exact test, Benjamini-Hochberg adjusted Abundance and phylogenetic relationships of cpn60 defined species detected in samples from each CST. Neighbour-joining phylogeny of 164 unique cpn60 universal target using MEGA v6 for Mac, that are nearest neighbours for OTU detected at an abundance of at least 1% of at least one woman's sample. Branches proceeding from nodes with less than 50% bootstrap percent (100 replicates) are shown in grey. Circle area represents the proportion of each taxon (branch) in total normalized reads from women in different CST. Therefore, the combined area of all circles of the same colour equals the area of the central grey circle that represents 100% of reads attributed to that CST. Only the most abundant taxa are labelled. L. = Lactobacillus, P. = Prevotella, E. = Escherichia doi:10.1371/journal.pone.0135620.g002 p < 0.0001; Table 1), with greater than expected numbers of profiles in CST IVC and IVD having BV+ Nugent scores, and greater than expected numbers of profiles in CST IVA having BVI scores (Fig 3A). Principal component analysis of all samples according to CST (Fig 3B) and Nugent score (Fig 3C) revealed extensive overlap between the two categories, with BV-Nugent scores and BVI Nugent scores observed in all CST, while BV+ Nugent scores were observed only in CST IVA, IVC and IVD. Of the 24 women who self-identified as having vaginal symptoms within 48 hours prior to sample collection (Table 1), only three had Nugent scores consistent with BV and each belonged to a different CST group (IVA, IVC and IVD, respectively) (S4 Fig). A total of 35 OTU were significantly different across Nugent categories by Kruskal-Wallis test. Not surprisingly, all OTU with a lower abundance in BVI or BV+ Nugent categories were Lactobacillus spp., while the remainder were all present at a higher abundance in those with BVI and/or BV+ Nugent scores (Fig 4). OTU 1663: G. vaginalis subgroup B was somewhat higher in the BVI category, although this was not statistically significant. No significant relationship between Mollicutes detection (by targeted PCR) and Nugent score was observed. A phylogenetic tree illustrating relative abundance of organisms in relation to Nugent score category was constructed (S5 Fig), reinforcing previous observations regarding CST IVA in the CST abundance tree (Fig 2). Several sub-dominant organisms in the phyla Actinobacteria and Proteobacteria were observed almost exclusively in BV-and BVI samples, and several Streptococcus sp. were observed almost exclusively in BVI samples.

Discussion
The vaginal microbiota of 310 non-pregnant women aged 18 to 49 years was consistent with previous studies from Canada and other countries that sampled as few as 32 and as many as 494 adolescent girls and/or women [15][16][17][18][19]27]. The ethnic diversity in our study cohort reflects the Canadian population, with relatively large proportions of White and Asian women and smaller proportions of Black, Aboriginal and Hispanic subjects. Microbiota profiles in this cohort were largely characterized by the dominance of a single OTU most closely related to  Lactobacillus species including L. crispatus, L. iners, L. jensenii, and L. gasseri. However, we also observed CST dominated by two of the four known Gardnerella subgroups [47,48]. Individuals with dominant Gardnerella subgroups A and C separated into CST IVC and IVD, respectively, while those with dominant Gardnerella subgroup B clustered within the heterogeneous CST IVA category. Overall, this cpn60-based study confirms CST defined in earlier studies using the 16S rRNA target, while expanding the range of CST that can be defined due to increased resolution of cpn60. The ability to target Gardnerella subgroups in the context of microbial community profiles may help improve our clinical understanding of the role played by these organisms in the vaginal microbiome.
Similar to findings in our study, the majority of vaginal bacterial CST in other studies have been found to be dominated by lactobacilli, although the particular pattern of dominance of any one species varies among reports. This is perhaps not surprising given the variety of cohorts studied and the techniques used (S2 Table). In our study, CST groups I, III and V were dominated by L. crispatus, L. iners and L. jensenii respectively, but L. gasseri did not define any of the CST, although it was dominant in 7/36 (19%) of women in CST IVA. This species has been reported to define CST in previous T-RFLP [15] or 16S rRNA amplicon sequencing studies [17,18,27], but has also been found using the same methods as dominant or co-dominant in a subset of profiles within larger CST [16], or as a low abundance organism [19]. In this study, CST IVA was similar to other described groups, including a wide range of anaerobic species. Equivalent CST from other studies include "group IV" (Atopobium vaginae, Gardnerella vaginalis and Prevotella) [16], "group IV" (Prevotella, Dialister, Atopobium and others) [18], "group IVB" (Atopobium, Prevotella and others) [27], and "group IV" (Atopobium and others) [17]. Further groups have been defined containing Streptococcus spp. ("group VII") and high numbers of clones from a single clade of the family Lachnospiraceae ("group VIII") [15]. Atopobium is common to all of these previous descriptions, and was equally represented in CST IVA, IVB and IVC in this study (Fig 2).
A noteworthy difference between our results and those of other reports is the identification of two distinct CST dominated by Gardnerella: IVC and IVD, dominated by Gardnerella subgroups A and C respectively. Drell et al. [19] identified a Gardnerella dominated CST (CST IV) in their recent study of Estonian women, but in other studies Gardnerella occurs as a subdominant constituent of one or more CST, e.g. CST IV in [18] or CST IVB in [27]. The variation in reports of abundance and prevalence of Gardnerella in the vaginal microbiome of healthy reproductive aged women can be explained by differences in the detection of these sequences with various "universal" PCR primers for a variety of gene targets. However, the evidence from real-time quantitative PCR validation of cpn60 pyrosequencing read abundance for Gardnerella performed previously by our group suggests that the higher estimates provided by cpn60-based microbiome profiles are realistic [28]. Regardless of abundance, the discrimination of two distinct community types containing either Gardnerella cpn60-defined subgroups A or C, and the association of Gardnerella subgroup B with the BVI communities in our study is intriguing since these subgroups have been shown to be phenotypically and genomically distinct [47,48] and differentially associated with BV status [49]. Further studies of the biological and clinical significance of this diverse taxon are currently underway.
Analysis of CST composition in a phylogenetic context (Fig 2) offers several additional insights, including the ability to visualize group-level differences as well as low-and mediumabundance sequences that correspond to the lightest colours in heatmaps. Several lowabundance genera were observed most frequently in CST IVA (e.g. Streptococcus spp.) and/or in Lactobacillus-dominated CST (several Proteobacteria and Actinobacteria). These findings confirm previous observations in a cpn60-based study of women in Nairobi, Kenya, where Proteobacteria genera nearest Acidovorax and Sphingobium were associated with Lactobacillus-dominated profiles [22]. These organisms may be less likely to be detected using 16S rRNA gene sequencing [21], which would account for these components of the vaginal microbiota not being widely reported previously.
Reports of the prevalence of Mollicutes in reproductive aged women vary widely, and these reports are invariably based on culture and/or screening of samples with targeted PCR assays such as those used in the current study. Mollicutes are frequently undetected in 16S rRNA based studies due to bias in "universal" PCR primers [74] and they have not been detected previously in the vaginal microbiota using the cpn60 UT, due to the absence of the target sequence in almost all Mollicutes species [20][21][22]28]. Our observation that 70% of women in our study cohort were PCR positive for Mollicutes is consistent with the high end of reported values [75].
The range of total 16S rRNA copies per sample was remarkable, including samples with fewer than 10 4 to more than 10 9 copies. However, no significant relationship between bacterial population size and CST was detected (Table 1). While variations in 16S rRNA copy number per genome among different taxa and variations in the amount of sample material collected with the swabs may have contributed to this finding, it also reflects the enormous variation in microbial population density that is possible. Bacterial overgrowth, as evidenced on clue cells, is certainly associated with clinical BV, but our results suggest that even among healthy, asymptomatic women, there is large variation in density of the vaginal microbiota. Similar observations have been made using flow cytometry [76].
Our ability to associate OTU sequences assembled from cpn60 UT amplicon libraries with those of known bacterial species depends upon the availability of representative reference sequences. In the particular case of BVAB1 and BVAB2, this is problematic since these organisms have not been cultured to date and are known only by their 16S rRNA sequences [77]. We detected several clostridia-like OTU, such as OTU 0441: Clostridium sp. NC039 (79.8% identity), OTU 0440: Clostridium sp. NC039 (80% identity), OTU 0402: Clostridium sp. NC039 (80.4% identity) or OTU 0553: Clostridium sp. (64.6% identity), OTU 0207: Clostridium sp. (81.6% identity) that could possibly correspond to BVAB1 and BVAB2. However, identification of the cpn60 sequences of BVAB1 and BVAB2 awaits either culture or whole genome sequencing of these fastidious organisms.
Our results confirm a previously reported difference between the vaginal microbiota of Asian and White women in North America [18]. No obvious explanation can be given for this difference. No other socio-demographic, hygiene or behavioural practices showed any significant correlation with CST. This is perhaps affected by minimal variations in behavioural and hygiene practices in a healthy Canadian cohort, as well as the cross-sectional design of the study. Future comparisons with different cohorts of women are needed to address this question further.
At the CST level, the only significant factors besides ethnicity were Nugent score category and Shannon's diversity (highest in CST IVA), while at the OTU level, Nugent score was the only significant factor. Although the associations between Lactobacillus-dominated CST I, III and V with BV-Nugent scores and Gardnerella-dominated CST IVC and IVD with BV+ Nugent scores are straightforward, the link between the mixed dominant CST IVA and BVI Nugent scores is more difficult to draw. In the PCA plots, BVI samples and CST IVA clearly overlap with each other (Fig 3), however BVI samples are also observed in all other CST. In the abundance tree analysis, Streptococcus, Staphylococcus, Corynebacterium and Gardnerella subgroup B among others were more abundant in both CST IVA and BVI samples (Fig 2 and S6 Fig). Since CST IVA is significantly associated with BVI Nugent scores, these observations may provide insight into the ambiguous clinical significance of the intermediate BV Nugent score category.
In conclusion, this large study of Canadian women has provided a solid foundation for expanded investigation of the vaginal microbiota into clinically significant cohorts. Comparisons of this cohort with HIV-positive non-pregnant women, non-pregnant women with recurrent vulvovaginitis, pregnant women at high and low risks of complications are currently in progress. The overall agreement of CST composition in healthy Canadian women to women from other parts of the world reaffirms that vaginal microbiota research is converging on a broad understanding of microbial community membership in this body site. This makes it easier for research groups to share their investigative findings and collectively seek to improve the diagnosis of aberrant conditions and more appropriately treat them with existing and novel therapeutics. The challenge for future work will be to tease apart the details of this community structure, in concert with social and behavioural information, to understand and effect positive clinical outcomes for women's health. Hierarchical clustering of Jensen-Shannon distance matrices with Ward linkage on the relative proportions of reads for each OTU within women with Nugent scores consistent with BV (scores 7-10) (n = 32). Each column represents a woman's vaginal microbiome profile, and each row represents an OTU. For clarity, only the top 65 OTU by read abundance are shown on the heatmap. The proportion of the total microbiome comprised of each OTU is indicated in the yellow to red colour scheme. Community state type (CST) and whether vaginal symptoms (odor, abnormal discharge, and/or irritation) were selfreported within 48 hours of sample collection (Symptoms 48 hr) for each woman is indicated by the top bars. (PDF) S5 Fig. Nugent score abundance tree. Abundance and phylogenetic relationships of cpn60 defined species detected in samples from each Nugent score category. Neighbour-joining phylogeny of 164 unique cpn60 universal target using MEGA v6 for Mac, that are nearest neighbours for OTU detected at an abundance of at least 1% of at least one woman's sample. Branches proceeding from nodes with less than 50% bootstrap percent (100 replicates) are shown in grey. Circle area represents the proportion of each taxon (branch) in total normalized reads from women in different Nugent categories. Most abundant taxa are labeled. L. = Lactobacillus, P. = Prevotella, E. = Escherichia (PDF) S1