An Emerging Mycoplasma Associated with Trichomoniasis, Vaginal Infection and Disease

Humans are colonized by thousands of bacterial species, but it is difficult to assess the metabolic and pathogenic potential of the majority of these because they have yet to be cultured. Here, we characterize an uncultivated vaginal mycoplasma tightly associated with trichomoniasis that was previously known by its 16S rRNA sequence as “Mnola.” In this study, the mycoplasma was found almost exclusively in women infected with the sexually transmitted pathogen Trichomonas vaginalis, but rarely observed in women with no diagnosed disease. The genomes of four strains of this species were reconstructed using metagenome sequencing and assembly of DNA from four discrete mid-vaginal samples, one of which was obtained from a pregnant woman with trichomoniasis who delivered prematurely. These bacteria harbor several putative virulence factors and display unique metabolic strategies. Genes encoding proteins with high similarity to potential virulence factors include two collagenases, a hemolysin, an O-sialoglycoprotein endopeptidase and a feoB-type ferrous iron transport system. We propose the name “Candidatus Mycoplasma girerdii” for this potential new pathogen.


Introduction
Application of next-generation sequencing to the study of the human microbiome is rapidly transforming our understanding of the diversity of the microbial communities that inhabit the human body [1]. However, the identification of specific microbiome signatures or specific organisms with strong links to disease states has proven elusive. We characterize a new vaginal mycoplasma species, “Candidatus Mycoplasma girerdii”, previously identified only by its 16S rRNA sequence [2], that exhibits a strong and unique association with the sexually transmitted pathogen Trichomonas vaginalis.
Organisms of the Mycoplasma and Ureaplasma genera are collectively referred to as mycoplasmas. They lack cell walls, have small genomes and are often dependent on their hosts. Mycoplasmas of the female urogenital tract are associated with bacterial vaginosis (BV), pelvic inflammatory disease, preterm labor and preterm birth [3,4]. These mycoplasmas are among the most common organisms to invade the amniotic cavity, and their carriage is associated with chorioamnionitis in preterm premature rupture of membranes (PPROM) [5]. Moreover, uncultivated and uncharacterized bacterial species also invade the amniotic cavity and likely impact pregnancy outcome [6]. Mycoplasmas can induce inflammatory cytokines in the host [4], and they are more prevalent in the vaginal flora of HIV-infected women [7]. Ureaplasmas have been associated with complications during pregnancy [4], M. genitalium with pelvic inflammatory disease, cervicitis, endometritis and salpingitis [8], and M. hominis with BV [3] and trichomoniasis [9,10]. M. genitalium is an emerging sexually-transmitted infection, which causes nongonococcal urethritis in men. Despite these associations with disease, M. hominis and Ureaplasma are also common in apparently healthy women. T. vaginalis causes trichomoniasis, the most common non-viral sexually transmitted infection worldwide [11]. Trichomoniasis often accompanies low levels of lactobacilli [12] and BV, and has been implicated in an array of pregnancy complications [13,14]. Although the extracellular eukaryotic parasite binds to vaginal epithelial cells and is hemolytic [15], the mechanisms of its pathogenesis remain enigmatic.
Martin et al. [2] recently described a 16S rRNA sequence from an unknown Mycoplasma, which they called “Mnola”, in vaginal secretions and found it to be strongly associated with the presence of T. vaginalis in their study. Shortly thereafter, Costello et al. [16] reported a phylotype with a 16S rRNA sequence exhibiting ~99% identity to that of the new mycoplasma described herein and by Martin et al. [2] as a predominant taxon in the oral sample of a low birth-weight infant (24.5 wks), and speculated that it may have been acquired by vertical transmission during delivery. Hyman et al. subsequently identified, in a vaginal sample from a woman who delivered at full term, a partial 16S rRNA sequence that is 99% similar to that of “Ca. M. girerdii” [17]. We independently identified this phylotype and its association with T. vaginalis, first reported by Martin et al. In the current work, we confirm and extend the characterization of this new Mycoplasma using metagenomic strategies, present the genomic sequences of four independently identified strains, one of which was isolated from a pregnant woman who subsequently delivered preterm, and propose the name “Candidatus Mycoplasma girerdii”.

Results and Discussion
As part of the Vaginal Human Microbiome Project at Virginia Commonwealth University, we generated 16S rRNA gene-based microbiome profiles for 1,361 mid-vaginal samples collected from women visiting outpatient clinics and an additional 110 samples collected in a labor and delivery unit [18]. Our analyses revealed a novel mycoplasma phylotype that represented the most abundant bacterium observed in 25 mid-vaginal samples (i.e., 25/1,471), including at least one from a woman who experienced preterm labor. Twenty-two of these 25 (88%) women had a clinically diagnosed vaginal infection (Table 1), and all but one of these 22 women for whom vaginal pH was recorded exhibited an elevated pH value greater than 4.5 (median pH value = 5.8), an indicator of vaginal dysbiosis. Although microbiome profiles based on 16S rRNA gene surveys are not always accurate measures of the proportions of bacterial taxa present in a sample for a variety of reasons (e.g., biases inherent in DNA extraction, PCR and related sequencing technologies, and variations in the number of 16S rRNA genes per genome in different species), it is clear that this mycoplasma represents a very abundant taxon in the vaginal samples collected from these 25 women.
We examined the association between vaginal carriage of the novel bacterium, even as a minor component of the microbiome, and common clinically diagnosed vaginal infections. The association between “Ca. M. girerdii” and trichomoniasis was the highest among several organisms of the female urogenital tract, with a relative risk of 20.12 (Table 2). M. hominis, which has previously been linked with trichomoniasis [9,10], exhibits a much weaker association with a relative risk of 2.53, likely at least in part due to its strong association with BV. We did not find “Ca. M. girerdii” to be associated with an elevated relative risk for BV as diagnosed by Amsel's criteria [19]. Assessment by Amsel's criteria provides a dichotomous test with a relatively high specificity, but relatively low sensitivity [20]. BV assessed by Nugent's Gram-stain criteria [21] represents a continuum of alterations in vaginal flora. Both pregnant [22] and non-pregnant [2] women with intermediate Nugent scores have been reported to be more likely to have trichomoniasis. While Nugent scores were not recorded in this study, the 16S rRNA microbiome profiles (Figure S1) are consistent with the hypothesis that women co-infected with “Ca. M. girerdii” and T. vaginalis may also be more likely to have intermediate flora.
We detected “Ca. M. girerdii” at a threshold of at least 0.1% of the 16S rRNA profile in 28 of the 63 (44.4%) women with clinically diagnosed trichomoniasis. We also found the new mollicute at less than 0.1% of the 16S rRNA profile in eight additional women with trichomoniasis. Thus, we were unable to detect “Ca. M. girerdii” in the 16S rRNA gene profiles of only 27 of the 63 (42.9%) women with clinically defined trichomoniasis. In this study, trichomoniasis was clinically diagnosed by wet prep microscopy rather than culture, and microbiome profiles were generated using the V1-V3 hypervariable region of the 16S rRNA gene rather than the V4-V6 region used by others [2]. Despite these methodological differences and differences in the study populations, we confirmed the strong association between the presence of T. vaginalis and “Ca. M. girerdii” previously reported as statistically significant (p = 0.026) by Martin et al. [2].
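The detection counts above can be cross-checked with a few lines of arithmetic. This is only an illustrative sketch: the three counts come from the text, while the variable names and the `percent` helper are hypothetical.

```python
# Counts reported in the text for women with clinically diagnosed trichomoniasis.
TOTAL_TRICHOMONIASIS = 63
DETECTED_AT_THRESHOLD = 28    # mycoplasma at >= 0.1% of the 16S rRNA profile
DETECTED_BELOW_THRESHOLD = 8  # mycoplasma present, but at < 0.1% of the profile


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Women in whom the mycoplasma was not detected at any level.
not_detected = (
    TOTAL_TRICHOMONIASIS - DETECTED_AT_THRESHOLD - DETECTED_BELOW_THRESHOLD
)

print(percent(DETECTED_AT_THRESHOLD, TOTAL_TRICHOMONIASIS))  # 44.4
print(not_detected, percent(not_detected, TOTAL_TRICHOMONIASIS))  # 27 42.9
```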

Up to half of all T. vaginalis infections are asymptomatic and undiagnosed [11]. We performed real-time qRT-PCR on all mid-vaginal samples positive for “Ca. M. girerdii” and found that 49 of the 51 (96%) women who carried the mycoplasma at a 1% threshold by 16S rRNA gene profiling also carried T. vaginalis (Figure 1A; Table 1). Even at a lower 16S rRNA threshold of 0.1%, 61 of 72 (85%) women who carried “Ca. M. girerdii” were T. vaginalis positive. Thus, “Ca. M. girerdii” exhibits an unusually strong correlation with trichomoniasis. We also found that “Ca. M. girerdii” was associated with both of the previously described genotypes of T. vaginalis [23,24], type 1 and type 2 (Figure S3), indicating a broad-range association with this infectious disease. Both T. vaginalis genotypes have been reported in HIV-positive women [25]. Additional studies are needed to determine whether “Ca. M. girerdii” co-infection contributes to the increased risk of HIV acquisition and transmission or to adverse pregnancy outcomes associated with trichomoniasis.
Interestingly, of 22 women with no diagnosis who were positive for “Ca. M. girerdii”, 14 were also positive for T. vaginalis (Table 1). Lactobacillus crispatus is associated with decreased rates of T. vaginalis infection [12], and we found that the three “Ca. M. girerdii” positive women with a predominance of L. crispatus were negative for T. vaginalis (Figures 2 and S1). Thus, although our data are supportive of a dependent relationship, it appears that “Ca. M. girerdii” may not absolutely require T. vaginalis to colonize the human vagina. Our data suggest vaginal carriage of the new mycoplasma is associated with elevated vaginal pH and African American race (Table 3), risk factors for preterm birth [26], which are also associated with BV [27] and trichomoniasis [28]. Given the tight association of the mycoplasma with T. vaginalis, it is not possible to determine whether the organism is independently associated with these factors. [...] (Figure S2) showed that the bacterium is prominent in polymicrobial biofilms sometimes associated with “clue cells” (Figure 1B, 1C, 1E), a characteristic of BV. The mycoplasma was also dispersed with other bacteria and only occasionally co-localized with T. vaginalis (Figure 1D). It is not yet clear whether “Ca. M. girerdii” can enter and replicate inside of T. vaginalis like M. hominis [29], penetrate human cells like M. penetrans [30], or whether the mycoplasma is strictly extracellular. Given that eight women carrying the mycoplasma were negative for T. vaginalis (Table 1), our data suggest “Ca. M. girerdii” is not an obligate symbiont of the parasite as suggested by Martin et al. [2].

Genomic and Phylogenetic Analyses of “Ca. M. girerdii”
Our attempts to cultivate “Ca. M. girerdii” have not succeeded. Therefore, we assembled whole-metagenome shotgun sequence reads to complete the genome of a reference strain of this organism. The ~619 kb genome has a GC content of ~28.6% and features sequences for ~572 putative proteins, 34 structural RNAs and one predicted CRISPR locus (Figure 3). Three additional strains from other samples were similarly assembled and aligned to the reference. Gene synteny was very high among the four strains, but limited with other related species (Figure S4). The four strains of “Ca. M. girerdii” exhibited an average of 99.8% nucleotide identity.
The reference genome exhibits irregular GC skews with no distinctive inversion (Figure 3), which is indicative of the high genome plasticity that is typical for mycoplasmas and consistent with the overall lack of synteny with related species. Genes in the rRNA operon were organized 5′-16S-23S-5S, and, as with most Mycoplasma and Ureaplasma species [32], no tRNAs were identified in the intergenic transcribed spacer regions. Like other mycoplasmas, “Ca. M. girerdii” possesses a minimal number of tRNAs and utilizes UGA as a tryptophan codon. We identified 31 putative tRNAs: 11 amino acids represented by a single anticodon; seven amino acids (Gly, Lys, Ser, Thr, Trp, Met) represented by two anticodons; and two amino acids (Leu, Arg) represented by three anticodons. Some, but not all, mycoplasmas have lost the tRNA-Trp gene that utilizes the TGG codon [33], but “Ca. M. girerdii” appears to have both. We identified two tRNA-Trp genes, one that utilizes the UGA codon with an observed codon frequency of ~0.87% and another that utilizes the TGG codon with an observed codon frequency of ~0.13%. The dnaA and dnaN genes are co-localized, but recF appears to be absent and gyrB is only distantly linked. As with M. penetrans and U. urealyticum [34], no clusters of DnaA boxes were identified upstream of the dnaA gene, as only one 9-mer with two base differences from the DnaA box consensus (5′-TTATCCACA) was identified in that region. Homologs to putative virulence factors, including collagenases, a hemolysin, an O-sialoglycoprotein endopeptidase, and a feoB-like iron transport system, were identified in all four strains. Intriguingly, the tensile strength of fetal membranes is imparted by collagens, and thus bacterial collagenase activity could facilitate fetal membrane rupture. “Ca. M. girerdii” appears to lack the superoxide dismutase gene, but encodes a complete desulfoferrodoxin-type superoxide reductase system that likely functions to protect against oxidative stress.
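For readers unfamiliar with the GC content and GC skew measures discussed above, the following minimal sketch shows how they are conventionally computed, with skew defined as (G − C)/(G + C) over a sequence window. The function names, window size and step are hypothetical choices for illustration; the text does not state the parameters used to draw Figure 3.

```python
def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_skew(seq: str, window: int, step: int) -> list:
    """GC skew for successive windows along a sequence (window/step are assumptions)."""
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence, not genome data: a G-rich window followed by a C-rich window.
print(windowed_skew("GGGGCCCC", window=4, step=4))  # [1.0, -1.0]
```

In a genome with a single clear replication origin and terminus, the sign of the windowed skew typically flips once at each; the irregular pattern described above is what such a plot looks like when no distinct inversion is present.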
A ~16 kb plasmid is apparently present at approximately two copies per “Ca. M. girerdii” cell in the sample containing the reference strain, but was not observed in the samples containing the other strains. It carries a plasmid replication initiator protein, two genes resembling components of a type IV secretion system and ~9 hypothetical genes. Because of its prevalence in the former sample and its similarity to plasmids from related organisms, this may represent the first plasmid associated with a mycoplasma in this phylogenetic group (see below).
Phylogenetic analysis of 16S rRNA genes shows that “Ca. M. girerdii” is most closely related to other uncultivated organisms identified by 16S rRNA sequence: the organism reported by Martin et al. [2], an organism identified by Costello et al. in oral samples of a low birth weight neonate [16], and other organisms from bovine rumen [35], the gut of termites [36][37][38] and Asiatic elephant and Somali wild ass feces [39] (Figure 4). Interestingly, the environments of the gut of lower termites, the rumen of cattle and other foregut fermenters and the cecum of hindgut fermenters [...] (Figure 6). Arginine dihydrolase pathway and urease genes are absent; thus, “Ca. M. girerdii” is not predicted to utilize arginine and urea. Catabolism of galactose, mannose, sucrose, maltose, glycogen, starch or glycerol is not predicted, and the roles of genes in the lactose/galactose pathways are unclear. “Ca. M. girerdii” possesses genes for a putative IIA component of the lactose-specific phosphotransferase system (MGM1_4770), a ribose/galactose ABC-type transporter system (MGM1_3070, MGM1_3080) and a galactose-6-phosphate isomerase (MGM1_4760/MGM1_4750). However, other genes required for lactose and galactose catabolism, including 6-phospho-beta-galactosidase, tagatose-6-phosphate kinase and tagatose-bisphosphate aldolase, were not identified. Moreover, the L-lactate dehydrogenase gene (MGM1_4130) has an apparent frameshift. Other components of the phosphotransferase system (PTS) were also identified, including the HPr phosphocarrier protein (MGM1_1420) and an HPr phosphatase/kinase (MGM1_4210), which likely functions in the regulation of carbon metabolism. While all eight subunits of the F1F0 ATPase complex were identified (MGM1_4310 through MGM1_4380), these genes are thought to be involved in maintenance of the proton gradient rather than ATP generation in mycoplasma species, as the cytochrome components are absent.
Unique among the genital mycoplasmas, the “Ca. M. girerdii” genome encodes serine dehydratase (MGM1_2560, MGM1_0390), alanine dehydrogenase (MGM1_5480, MGM1_1820), and 2′,3′-cyclic-nucleotide 2′-phosphodiesterase (MGM1_1930) that may permit use of alternate energy sources in the absence of glucose: L-alanine, L-serine, and 2′,3′-cyclic AMP. No serine dehydratases have been previously described for the mollicutes, and while alanine dehydrogenase has been described for Acholeplasma laidlawii [40] and annotated for M. mycoides (ADH22225.1), M. mobile (AAT27586.1), M. leachii (ADR24467.1), and M. putrefaciens (YP_004790384.1), neither the gene nor the enzyme has been previously identified in any of the genital mycoplasmas or organisms classified in the Pneumoniae group. Moreover, 2′,3′-cyclic phosphodiesters may be available in the environment as intermediate products in the hydrolysis of RNA by ribonuclease I. This strategy has been proposed for Yersinia enterocolitica [41], which has been shown to grow on 2′,3′-cAMP as a sole carbon source. “Ca. M. girerdii” does not metabolize pyruvate through the pyruvate dehydrogenase pathway that is used by M. genitalium or the other mycoplasma species that catabolize pyruvate to acetate. However, “Ca. M. girerdii” may utilize one or both of two alternate enzymes identified in the genome: pyruvate-formate lyase (MGM1_5430), which produces acetyl-CoA and formate from CoA and pyruvate, and/or pyruvate ferredoxin/flavodoxin oxidoreductase (MGM1_5310), which yields acetyl-CoA and carbon dioxide from the same substrates by reducing either ferredoxin or flavodoxin. Both of these enzymes seem to be unique to “Ca. M. girerdii” among the mycoplasmas. Acetyl-CoA may be converted to acetate by phosphate acetyltransferase (MGM1_0120) and acetate kinase (MGM1_2290), resulting in the production of ATP.
As with other mollicutes, “Ca. M. girerdii” appears to have limited metabolic capabilities and imports much of what it needs from its environment or host. Like other mycoplasmas, “Ca. M. girerdii” seems to lack gluconeogenesis and the TCA cycle. It lacks enzymes for de novo purine or pyrimidine synthesis and amino acid synthesis, but appears to be capable of nucleotide salvage and amino acid transport. The genome encodes ~40 genes associated with transport of various ions and substrates including amino acids, glucose, ribose/lactose, potassium ion, magnesium ion, calcium ion, ferrous iron, cobalt, phosphate and spermidine/putrescine (Table S2). An alcohol dehydrogenase (MGM1_5890) exhibiting homology to a butanol dehydrogenase and a putative bifunctional aldehyde-alcohol dehydrogenase (MGM1_1150) were also annotated in the genome; thus, ethanol may also be an end product of metabolism. Although “Ca. M. girerdii” is predicted to be able to convert butanol to butanoyl-CoA, it appears to lack other enzymes of butanoate metabolism. The predicted metabolic reconstruction may provide insight and help guide future cultivation attempts.
Figure 3. [...] represents the reference strain (VCU_M1). Circles 5 (dark red) and 6 (blue) represent the predicted coding sequences in the forward and reverse orientations, respectively. Circle 7 (black) shows the GC content, and circle 8 shows GC skew (pink (-), green (+)). doi:10.1371/journal.pone.0110943.g003

BspA-like proteins encoded in “Ca. M. girerdii”
Mycoplasma species contain surface proteins that exhibit high-frequency antigenic variation [42]. Although these organisms exhibit a low level of horizontal gene transfer, expanded families of surface proteins are an exception [43][44][45]. In the reference genome of “Ca. M. girerdii”, we identified a family of 26 BspA-like proteins containing Treponema pallidum leucine rich repeat (TpLRR) domains with homology to the prototypical BspA virulence factor of Tannerella forsythia and a family of over 900 BspA-like proteins of T. vaginalis [46,47]. BspA-like proteins from “Ca. M. girerdii” vary in length from 136 to 1481 amino acids (Figure 7). Twenty-three of the BspA-like proteins contained a predicted C-terminal transmembrane domain, and a signal peptide was detected for five of the BspA-like proteins.
Members of this family stimulate a Toll-like receptor 2 (TLR2)-mediated host immune response. We also identified two other putative surface lipoproteins that lack the TpLRR domain, but exhibit homology to other mycoplasma proteins that stimulate a TLR-mediated innate immune response. It is intriguing to hypothesize that the expanded families of BspA-like proteins in “Ca. M. girerdii” and T. vaginalis may represent a common host-adaptation strategy.
The BspA from T. forsythia is perhaps the best studied protein containing the TpLRR domain. This protein has been shown to mediate a host innate immune response through Toll-like receptor 2 (TLR2) [48] by directly interacting with the receptor [49]. More recently, the protein has been shown to elicit a response through scavenger receptor gp340 [50]. T. forsythia BspA has also been shown to be required for host cell attachment and invasion [51] and co-aggregation with Fusobacterium nucleatum [51]. Thus, the “Ca. M. girerdii” BspA-like proteins may also mediate interactions with the host and contribute to virulence through induced host inflammation.
No TpLRR-containing proteins have been identified in U.

Conclusions
In summary, we confirmed the identity of a recently described and still uncultivated species of mycoplasma, further documented its strong association with the presence of T. vaginalis, and comprehensively characterized the genomes of “Ca. M. girerdii” from four vaginal samples collected in the Vaginal Human Microbiome Project at VCU. The genomes of this potentially emerging pathogen provide insight into its metabolic strategies and reveal a potential for virulence and for triggering host inflammatory responses through innate immune mechanisms. This work lays the foundation for understanding the impact of “Ca. M. girerdii” on women's urogenital health and the nature of its association with T. vaginalis.

Identification of “Ca. M. girerdii”
Mid-vaginal swab samples collected from women at 1,471 visits (1,361 outpatient visits, 110 visits to the labor and delivery unit) were assayed by 16S rRNA gene pyrosequencing according to the protocols of the Vaginal Human Microbiome Project at VCU [18,52]. The team that performed the PCR was blinded to the BV and T. vaginalis diagnoses and results. Mid-vaginal pH, clinical diagnosis and health history were recorded. The teams that recorded and digitally entered this information were blinded to the PCR and RT-PCR results. Clinical diagnosis of trichomoniasis was based on identification of motile trichomonads in a saline wet mount preparation of vaginal discharge. BV was clinically diagnosed in women meeting at least three of Amsel's four criteria [19]: characteristic BV discharge, clue cells on microscopy, vaginal pH >4.5 and positive whiff test. Consent was obtained from all participants in accordance with the study protocol (HM12169) as approved by the institutional review boards for human subjects protection at Virginia Commonwealth University and the Virginia Department of Health. Raw sequence data from the project are available from the Short Read Archive at NCBI (projectID phs000256).
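The Amsel decision rule described above (at least three of four criteria) can be expressed as a small predicate. This is an illustrative sketch only; the `AmselFindings` class, its field names and the `meets_amsel_bv` function are hypothetical and do not represent the study's data-entry system.

```python
from dataclasses import dataclass


@dataclass
class AmselFindings:
    """Clinical observations for one visit (hypothetical field names)."""
    characteristic_discharge: bool
    clue_cells: bool
    vaginal_ph: float
    positive_whiff_test: bool


def meets_amsel_bv(findings: AmselFindings, min_criteria: int = 3) -> bool:
    """Return True when at least `min_criteria` of Amsel's four criteria are met."""
    criteria = [
        findings.characteristic_discharge,
        findings.clue_cells,
        findings.vaginal_ph > 4.5,  # elevated pH criterion
        findings.positive_whiff_test,
    ]
    return sum(criteria) >= min_criteria


# Example: clue cells, elevated pH and positive whiff test, but no typical discharge.
visit = AmselFindings(False, True, 5.8, True)
print(meets_amsel_bv(visit))  # True
```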

Detection and genotyping of T. vaginalis
Detection of T. vaginalis by quantitative real-time RT-PCR was performed as described by Shirm et al. [53], and T. vaginalis genotyping was performed using three single-copy genes as described by Conrad et al. [23,24].

Relative Risk Analysis
Bootstrap (n = 1,000) samples were selected from the outpatient clinic population to reflect the outpatient community composition. Samples from women enrolled in labor and delivery were not included in this analysis. Median relative risk and 95% bootstrap confidence intervals were calculated. A bacterial taxon was considered present in the mid-vaginal sample if at least 0.1% of the metagenomic 16S rRNA gene microbiome profile reads classified to the taxon.
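The analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: it uses simple resampling with replacement and a percentile interval, whereas the text indicates that bootstrap samples were drawn to reflect the outpatient community composition, which may imply a weighted or stratified scheme. All function names and the toy data are hypothetical.

```python
import random
from statistics import median


def is_present(taxon_reads: int, total_reads: int, threshold: float = 0.001) -> bool:
    """A taxon counts as present if it holds at least 0.1% of the profile reads."""
    return total_reads > 0 and taxon_reads / total_reads >= threshold


def relative_risk(records):
    """records: list of (carries_taxon, has_diagnosis) booleans, one per woman."""
    exposed = [diagnosed for carries, diagnosed in records if carries]
    unexposed = [diagnosed for carries, diagnosed in records if not carries]
    if not exposed or not unexposed or sum(unexposed) == 0:
        return None  # relative risk is undefined for this sample
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


def bootstrap_relative_risk(records, n_boot=1000, seed=0):
    """Median relative risk and 95% percentile interval (simple resampling)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))
        rr = relative_risk(sample)
        if rr is not None:
            estimates.append(rr)
    estimates.sort()
    low = estimates[int(0.025 * len(estimates))]
    high = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return median(estimates), (low, high)


# Toy data (not study data): 8/10 carriers diagnosed versus 2/10 non-carriers.
toy = ([(True, True)] * 8 + [(True, False)] * 2
       + [(False, True)] * 2 + [(False, False)] * 8)
print(relative_risk(toy))
print(bootstrap_relative_risk(toy, n_boot=200, seed=1))
```

Resamples in which the relative risk is undefined (for example, no diagnosed women among non-carriers) are skipped here; how such resamples were handled in the study is not stated.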

Metagenomic assembly of “Ca. M. girerdii”
We selected one mid-vaginal sample (VCU_NT41; Table 1) with >90% of 16S rRNA reads classified to “Ca. M. girerdii” from which to assemble the reference genome of the organism. The woman who provided this sample was in active preterm labor and also tested positive for group B Streptococcus (GBS), Chlamydia trachomatis and T. vaginalis.
Fifty nanograms of total DNA was used in a tagmentation reaction with a Nextera DNA Sample Prep Kit (Roche Titanium-compatible, Epicentre Biotechnologies) following the manufacturer's protocol and sequenced in the Nucleic Acids Research Facilities at VCU. Titanium FLX pyrosequencing (Roche/454; 1/2 plate) yielded 793,732 reads and 241,486,162 bases. The raw data were pre-filtered to remove most human reads (55% of the total reads) using Bowtie 2 [54] with default parameters. The reads were then split into two bins using AbundanceBin [55]: (1) a bin containing abundant reads including those derived from “Ca. M. girerdii”, and (2) a bin containing less abundant reads derived from the minor components of the vaginal microbiome. Human-filtered reads from the bin of abundant sequences (i.e., 252,073 reads, 79,029,547 bases) were assembled using Newbler, resulting in 298 contigs larger than 500 bases and 1,966 contigs larger than 100 bases, with a total of 89.10% of reads aligned to a contig. Through a careful analysis of the single-end read flow information from Newbler, we inferred a circular scaffold for the organism that incorporated 19% (152,023 total reads) of the total unfiltered reads from the metagenomic sample. The scaffold incorporated eight of the largest ten contigs, which ranged in size from 144,547 bases to 4,826 bases and exhibited 67.2-fold to 95-fold coverage. Sequence reads incorporated into the “Ca. M. girerdii” genome did not map to known Mycoplasma or Ureaplasma sequences, and no other unnamed mollicutes were detected in these samples by 16S rRNA analysis. The eighth-largest contig encoded a 16 kb plasmid with an observed depth of 189-fold coverage. Because of its abundance, it is likely that this plasmid is from “Ca. M. girerdii”, but the host of the plasmid cannot yet be unequivocally assigned. The ninth-largest contig aligned to T. vaginalis ribosomal RNA genes.
An additional eight smaller contigs ranging in size from 161 bases to 814 bases were also incorporated into the scaffold, with two of the contigs incorporated twice. The majority of non-assembled contigs aligned to T. vaginalis (232 contigs), Homo sapiens (21 contigs), or other bacterial species; e.g., Gardnerella vaginalis (10 contigs). Physical gaps in the contig junctions were confirmed and closed by PCR across gaps and fluorescent chain termination sequence analysis on the AB3730 or AB3130 capillary sequencers (Applied Biosystems). These gaps commonly occurred either in genes exhibiting homology to type I restriction modification system proteins or those encoding BspA-like proteins, although one junction spanned a gene encoding a signal recognition particle protein, which was present in only one copy. PCR and sequencing primers are provided (Table S3). All physical gaps in the scaffold were closed by PCR amplification and sequencing using the Sanger capillary methodology. The circularity of the 16 kb plasmid was similarly confirmed.
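The read counts reported for the reference sample allow a quick consistency check of the stated percentages. The sketch below uses only figures quoted in the text; the variable names are hypothetical, and the derived mean read length is a back-of-the-envelope value rather than a reported result.

```python
# Figures quoted in the text for the reference sample (VCU_NT41).
TOTAL_READS = 793_732
TOTAL_BASES = 241_486_162
SCAFFOLD_READS = 152_023   # reads incorporated into the circular scaffold
ABUNDANT_BIN_READS = 252_073
ABUNDANT_BIN_BASES = 79_029_547

# Share of all unfiltered reads that ended up in the scaffold.
scaffold_fraction = SCAFFOLD_READS / TOTAL_READS

# Derived average read lengths (not reported in the text).
mean_read_length = TOTAL_BASES / TOTAL_READS
mean_bin_read_length = ABUNDANT_BIN_BASES / ABUNDANT_BIN_READS

print(round(100 * scaffold_fraction))  # 19
print(round(mean_read_length))         # 304
```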
Three additional mid-vaginal samples (VCU_CT62, VCU_NT44, VCU_NT71; Table 1), each containing more than 30% “Ca. M. girerdii” by metagenomic 16S rRNA gene microbiome analysis, were also sequenced by whole metagenome shotgun sequencing using Titanium FLX pyrosequencing and the protocol described above. Each sample was run on approximately one eighth of a plate and yielded between 185,612 and 222,667 total reads, which were assembled using Newbler and aligned to the reference strain. The genomes of “Ca. M. girerdii” have been deposited with NCBI under the Bioproject accession numbers PRJNA196996, PRJNA196997, PRJNA196998, and PRJNA196999. The complete genome of “Ca. M. girerdii” has been deposited at NCBI under accession number CP007711.

Genome annotation and metabolic reconstruction
Open reading frames (ORFs) greater than 100 nucleotides were predicted by Glimmer3 [56] and GeneMarkS [57] using translation table 4 and were manually examined. In most cases the start site predicted by Glimmer3 was chosen for genes that had the same predicted stop codon called by both Glimmer3 and GeneMarkS. Translated ORF predictions were searched against the non-redundant (nr) database from NCBI and a custom database of Mollicute proteins downloaded from NCBI using the blastp algorithm, and the gene products were manually annotated. Predicted gene products were compared to conserved domain databases (COGs and Pfam) by RPS-BLAST. Other annotation features were predicted using TMHMM 2.0c [58] for transmembrane domains and SignalP 4.0 for signal peptides. Although mollicutes have a unique membrane composition, SignalP has been previously validated on experimentally verified secreted proteins from mollicutes [59]. The hmmsearch program from HMMER3.0 [60] was used to search predicted proteins for the Treponema pallidum family of leucine rich repeats (TpLRR) using the Pfam raw hidden Markov model for the family (LRR_5; PF13306). Transfer RNA genes were predicted by tRNAscan-SE v 1.3 using the genetic code outlined in translation table 4. The tRNA-Ile, elongator tRNA-Met and initiator tRNA-fMet were distinguished by alignment with previously annotated tRNAs [61]. The CRISPR element containing four 34-nucleotide repeats and three 35-nucleotide spacers was identified in the genome with a consensus direct repeat sequence of 5′-AAGTATTAATATTCCAAGTAGTGTAACTAGTATT using the CRISPR recognition tool (CRT) [62]. Metabolic reconstruction and Gene Ontology classification assignments were performed using ASGARD [63] and the UniRef100 database.
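The practical effect of using translation table 4 rather than the standard code can be illustrated with a short sketch. In NCBI translation table 4, TGA (UGA) encodes tryptophan instead of a stop; the code below builds both tables from the standard NCBI amino-acid string and translates a toy sequence. The function and variable names are hypothetical, and this is not the pipeline used by Glimmer3, GeneMarkS or tRNAscan-SE.

```python
from itertools import product

# NCBI codon ordering: bases in the order T, C, A, G, first position varying slowest.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(codon) for codon in product(BASES, repeat=3)]

TABLE_1 = dict(zip(CODONS, STANDARD_AA))  # standard genetic code
TABLE_4 = dict(TABLE_1, TGA="W")          # table 4: TGA read as tryptophan


def translate(dna: str, table: dict) -> str:
    """Translate in frame 0 until the first stop codon; unknown codons become X."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = table.get(dna[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_orf = "ATGTGAAAATAA"  # toy sequence, not from the genome
print(translate(toy_orf, TABLE_1))  # M
print(translate(toy_orf, TABLE_4))  # MWK
```

Under the standard code the toy ORF would be truncated at the internal TGA, which is why gene prediction for mycoplasmas must be run with table 4.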

Phylogenetic analysis
One reference genome was selected for each species in the Mollicutes class for which a completely sequenced genome is available. A total of 57 transitively closed orthologous clusters were retrieved from the RoundUp [64] database (release date: Dec. 23, 2011). “Ca. M. girerdii” orthologs were identified using blastp and confirmed using the Reciprocal Smallest Distance (RSD) algorithm. Orthologs were similarly identified for M. iowae for which only a draft genome is available. The maximum-likelihood tree was inferred by RAxML 7.2.74 [65] using the gamma-distributed heterogeneity rate categories with 1,000 bootstraps. The tree was rooted using Lactobacillus gasseri as the outgroup. Phylogenetic trees based on 16S rDNA gene sequences were similarly constructed. The 16S rRNA gene alignments were manually inspected and the maximum likelihood tree was inferred by RAxML 7.2.74 using the gamma-distributed heterogeneity rate categories with 1,000 bootstraps.

Attempts to cultivate “Ca. M. girerdii”
Frozen vaginal swab samples were incubated on A8 and SP4 agar (Hardy Diagnostics). When these plates did not yield colonies, the frozen samples were cultured on PPLO broth base containing 10% horse or 10% human serum, 10% yeast extract, 1% arginine and 1.5% Bacto agar [66]. The samples were also incubated on supplemented BHI agar [67] containing 10% human blood. All solid media were supplemented with 100 µg ampicillin/mL, and plates were incubated at 37°C under anaerobic conditions and in air supplemented with 5% CO2. None of these efforts yielded detectable growth.

Ethics Statement
Consent was obtained from all participants in accordance with the study protocol as approved by the institutional review boards for human subjects protection at Virginia Commonwealth University and the Virginia Department of Health. All enrolled subjects were 18 years of age or older and provided written informed consent.
Figure S3 “Ca. M. girerdii” coexists with both genotypes of T. vaginalis. The maximum likelihood tree was constructed using concatenated, aligned partial protein sequences from three single-copy orthologs (CRN, PMS1, Mlh1a). Isolates indicated as type 1 or type 2 were previously typed using microsatellite markers. Strains from 43 clinically diagnosed cases of trichomoniasis from this study are indicated with the prefix “VCU”. The type 1 cluster is shaded blue and contains eight “Ca. M. girerdii” positive cases, the type 2 cluster is shaded green and contains ten “Ca. M. girerdii” positive cases and the ambiguous cluster is shaded gray and contains five “Ca. M. girerdii” positive cases. In this analysis, the ambiguous cluster groups with type 2 T. vaginalis, but the subgroup contains isolates that were differentially classified as type 1 using microsatellite markers. T. vaginalis strains from “Ca. M. girerdii” positive cases as determined by 16S rRNA microbiome profiling (0.1% threshold) are indicated with red boxes. Ambiguous cases that were detected at less than 0.1% threshold are denoted with pink boxes. Blue dots denote branches with bootstrap values greater than 50. (PDF)
[...] “Ca. M. girerdii” strains (VCU_PA1, VCU_JB1, VCU_G1). Panel (B) shows dot plot amino-acid based alignments of the reference with three closely related species (M. iowae, M. penetrans and U. parvum). Horizontal grid lines delineate contigs. Nucleotide-based and protein-based alignments were performed using Nucmer and Promer, respectively. (PDF)