Microbiota-based analysis reveals specific bacterial traits and a novel strategy for the diagnosis of infectious infertility

Tubal factor infertility (TFI) accounts for more than 30% of the cases of female infertility and mostly resides from an inflammatory process triggered by an infection. Clinical appearances largely differ, and very often infections are not recognized or remain completely asymptomatic over time. Here, we characterized the microbial pattern in females diagnosed with infectious infertility (ININF) in comparison to females with non-infectious infertility (nININF), female sex workers (FSW) and healthy controls (fertile). Females diagnosed with infectious infertility differed significantly in the seroprevalence of IgG antibodies against the C. trachomatis proteins MOMP, OMP2, CPAF and HSP60 when compared to fertile females. Microbiota analysis using 16S amplicon sequencing of cervical swabs revealed significant differences between ININF and fertile controls in the relative read count of Gardnerella (10.08% vs. 5.43%). Alpha diversity varies among groups, which are characterized by community state types including Lactobacillus-dominated communities in fertile females, an increase in diversity in all the other groups and Gardnerella-dominated communities occurring more often in ININF. While all single parameters did not allow predicting infections as the cause of infertility, including C. trachomatis IgG/IgA status together with 16S rRNA gene analysis of the ten most frequent taxa a total of 93.8% of the females were correctly classified. Further studies are needed to unravel the impact of the cervical microbiota in the pathogenesis of infectious infertility and its potential for identifying females at risk earlier in life.


Introduction
Tubal factor infertility is a major cause of involuntary childlessness in females that mostly derives from an inflammatory process in the fallopian tubes caused by a sexual-transmitted infection (STI). Epidemiological data suggest Chlamydia trachomatis and Neisseria gonorrhoeae as the most prevalent STIs worldwide, however, direct correlation between the STI and infertility is hampered by several limitations: 1) most acute STI are asymptomatic and are not recognized, 2) diagnostic approaches often miss chronic infections, 3) there is often a long time span between an acute STI-related inflammatory disease (e.g. pelvic inflammatory disease, salpingitis) and the recognition of female infertility.
The current diagnostic workflow in detecting infection as the underlying reason for female infertility comprises anamnestic and serological analysis on previous STIs, combined with diagnostic assessments of anomalies of the reproductive tract of both partners [1]. Limitations in detecting previous STIs that often remained asymptomatic during acute infection, but also discrepancies between the retrieval of pathogens from the lower and upper genital tract by cultivation and nucleic acid detection, makes it difficult in the clinical routine to identify all cases of infectious infertility. The emergence of bacterial vaginosis (BV) is considered as one of the key factors in the development of female diseases of the upper genital tract and has been associated with pre-term birth, salpingitis and pelvic inflammatory disease [2]. However, the role of BV in this context is not well understood, arguing for clinical, inflammatory and microbial patterns to be considered in the link between BV and long term sequels of STIs. Recently, a shift in the microbial composition of the female genital tract, which is accompanied by a decrease of Lactobacillus-dominated communities, has been determined as one of the hallmarks of BV in different clinical entities [3,4]. Furthermore, high diversity communities including high relative abundance of Gardnerella and Ureaplasma in the vaginal tract have been shown to be associated with preterm birth [5]. In general, next-generation-sequencing techniques offer a new approach in studying bacterial-bacterial association within the urogenital microbiome [6] in comparison to classical, cultivation-dependent methods. Clinical studies in females suffering from infectious infertility that integrate data from vaginal microbiota sequencing are not yet performed. By overcoming the limitations of classical studies, recent research highlights the importance of the cervico-vaginal microbiota in reproductive health, e.g. through the inhibition of pathogens such as E. coli [7], HIV [8] C. trachomatis [9]. Besides others, L. crispatus dominated vaginal bacterial communities have been described to be protective in the transmission of C. trachomatis from infected men to women [10]. As outlined in a recent review of Ziklo et al. a diverse microbial community is suggested to provide a particular microenvironment favoring the growth of C. trachomatis [11], thereby playing an important role in development of infectious infertility. However, clinical studies in females suffering from infectious infertility that integrate data from vaginal microbiota sequencing are not yet performed. Here, we characterized the microbial pattern found in females with infectious infertility in comparison to females with non-infectious infertility, fertile controls and female sex workers. The aims of this study were to obtain a detailed insight into the bacterial burden and communities of females at the time of diagnosis of infectious infertility, and propose a diagnostic algorithm for identification of females at risk during earlier screening.

Study design and sample collection
From 2012 to 2014 female patients from the Infertility Clinic at the University Hospital Schleswig-Holstein/Campus Luebeck, fertile controls from an Outpatient Clinic in Luebeck and female sex workers (FSW) from the Center of Sexual Health in Berlin were prospectively included in the study. The work on human samples within this study was approved by the ethics commitee of the University of Lübeck at the 05.02.2012 with the reference number 11-185. All participants were informed and signed a written consent. Females from the Infertility Clinic were classified by the treating physicians into the two groups, infectious infertility (ININF) and non-infectious infertility (nININF), after finalization of the conventional diagnostic assessment and before the samples were further processed. Three swabs from the cervix uteri were obtained from every woman by a trained gynecologist, avoiding contamination from the lower genital tract. Elution-swab samples (Copan) were used for subsequent storage in Amies transport medium (Copan) or UTM-medium (Copan) and transferred to the local microbiology laboratory. In addition, 10 ml of serum was collected from every women included in the study for testing anti-chlamydial antibodies. Only women with a complete set of swabs and serum were finally included in the analysis.

Processing of the cervical swabs and the serum
The first cervical swab was used for conventional bacterial culture of commensal and pathogenic bacteria of the urogenital tract. According to the standard operating procedures, the samples were plated on blood and chocolate agar plates and incubated for up to 48 hours. The second swab was stored at -80˚C until direct PCR-testing against C. trachomatis, N. gonorrhoeae, M. genitalium, M. hominis and U. urealyticum (TIB Molbiol) was performed. The third swab was also stored at -80˚C until further processing for the 16S rRNA gene sequencing. In addition, serum samples were tested with an ELISA (Mikrogen) and Immunblot Assay (Mikrogen) to specifically determine C. trachomatis IgG and IgA-antibodies against the chlamydial proteins MOMP, OMP2, TARP, CPAF and HSP60.

DNA-extraction
Swabs in UTM™ media (Copan Diagnostics) were vortexed for 1 minute on maximum speed. Bacterial DNA was isolated from 1 ml media using PowerSoil1 DNA Isolation Kit (MoBio) according to the manufacturer´s protocol. The method was modified by using double amounts of solutions C2, C3 and C4 and a 2h incubation with OB Protease (Peqlab) at 50˚C, followed by a homogenization step prior to the first centrifugation step. A negative extraction control was run for each extraction.

Amplicon generation, preparation and sequencing
Partial sequences of the bacterial 16S rRNA gene were amplified with primers spanning the V3/V4 hypervariable regions adopted from Kozich et. al. [12]. The primers contained unique indices and linker sequences for paired end sequencing on an lllumina1 MiSeq sequencer. PCR was performed as follows: 98˚C for 5 min followed by 30 cycles with 98˚C for 9 sec, 55˚C for 60 sec and 72˚C for 90 sec followed by a final step at 72˚C for 10 min. After PCR samples were stored at -20˚C until further usage. Amplicons were quantified on an agarose gel with a DNA ladder as reference, were the concentration of each amplicon was determined by comparison to a ladder band of the same size and intensity as the respective amplicon. Equimolar amounts of the correct sized fragments were pooled for sequencing. Afterwards the pool was run again on an agarose gel and eluated with MinEluteGel Extraction Kit (Qiagen). The pool was stored at -20˚C until sequencing. Sequencing was performed on a MiSeq sequencer (Illumina 1 ) using the MiSeq Reagent Kit v3 (600 cycles) as described by Kozich et. al. [12].

Data processing
Fastq files were processed using mothur version 1.35.0 [13]. Contigs were produced of forward and reverse sequences and any sequence was removed if it had ambiguous bases, a homopolymer length > 9 or a size longer than the amplified fragment. We aligned the remaining sequences using a customized SILVA reference data base [14] and removed unaligned sequences. Chimeras were detected using the UCHIME algorithm [15] as implemented in mothur [13] and removed from the data set. We classified the sequences using the mothur-formatted RDP [16] training set version 9, with a cutoff of 80 and removed non-bacterial sequences. Further analysis was done on a random subset of 2500 reads/sample, either using operational taxonomic units (OTUs) clustered with a similarity threshold of 97% or based on taxonomic assignment. To keep a consistent classification on the genus and species level, we used Stirrups for taxonomic assignment with a reference taxonomy specifically produced for vaginal microbiota [17]. We used PICRUSt (Phylogenetic Investigation of Communities of Unobserved States) as online tool (http://huttenhower.sph.harvard.edu/galaxy/) to predict the metagenome of microbial communities [18].

Statistical analysis
Statistics and graphical visualizations were produced using R version 3.2.2 [19]. We tested the prevalence of diagnostic parameters globally over all 4 groups using Pearson´s Chi-square tests with Holm correction for the number of calculations within each set of diagnostic tests (PCRtesting, cultivation of pathogens, IgG-serology and IgA serology). Subsequently, we used Fisher´s exact tests to determine which groups differ significantly from each other in each parameter. The relative read count of bacterial taxa from the sequencing was calculated per group and differences between the groups were tested using Kruskal-Wallis rank sum tests with Benjamini-Hochberg correction for the number of taxa tested. Subsequent pairwise comparisons were performed for significant taxa using Wilcoxon rank sum test with continuity correction. Assessing bacterial alpha diversity, we calculated Simpson´s diversity index and Shannon´s diversity index for each sample on the basis of OTUs using R package vegan [20]. Differences in alpha diversity between groups were tested based on Simpson´s diversity index using the Kruskal-Wallis rank sum test followed by pairwise Wilcoxon rank sum test with Benjamini-Hochberg correction. We tested the influence of sexual intercourse on alpha diversity based on Shannon´s diversity index. Therefore we used fractional polynomials to assess the best regression model [21]. Based on the best proposed model we performed linear regression and computed Pearson´s product-moment correlation. A heatmap of the 25 main taxa from the sequencing was produced using R package BoutrosLab.plotting.general [22]. Furthermore, bacterial communities were clustered into community state types (CST). CSTs were determined using euclidean distances with complete linkage clustering. The observed CSTs were named on the basis of the dominant member of the respective CST. Differences in the proportion of the community types were assessed using the Chi-square test.

Probability modelling
We established a prediction model for suffering from infectious infertility based on a subset of the clinical and sequencing data excluding FSW. Prediction was performed using binary logistic regression, and the following parameters were included as predictors in the modelling: detection of bacterial pathogens by culture/PCR, IgA and IgG-immunoblots against epitopes of C. trachomatis, and the most abundant 10 taxa from the sequencing analysis. Samples were assigned as suffering from infectious infertility if they reached a probability threshold of 0.2 while other samples were assigned as not suffering from infectious infertility. We assessed the prediction outcome and the total accuracy of the model. Goodness-of-fit of the model was assessed using McFadden's pseudo R 2 . ROC curves were generated assessing true positive rate versus false positive rate and the area under curve and its 95% confidence interval (CI) was computed using r-packages ROCR [23] and pROC [24].

Accession numbers
The datasets analyzed during the current study are available at the European Nucleotide Archive (ENA) with the accession number PRJEB17077.

Results
From 2012 to 2014, a total of 210 females were recruited for this study. Within the couples from the infertility group, 26 females were classified as having non-infectious infertility (nININF) on the basis of the currently recommended diagnostic assessment [1]. The main reasons for nININF included diagnosis of Uterus myomatosus, polycystic ovarian syndrome (PCOS), endometriosis or proven infertility of the male partner. 21 females were classified as having infectious infertility (ININF), including females with a history of pelvic inflammatory disease (PID), and females with tubal occlusion due to acute or chronic PID. 89 fertile females were included on the basis of having children and no evidence of current STI. In addition, 54 female sex workers (FSW) were randomly included in this study for adjusting the results with respect to frequencies of sexual intercourse and underlying STIs.

History of previous STIs and detection of pathogens from cervical swabs
In the group of female sex workers (FSW) significantly more infections with C. trachomatis and N. gonorrhoeae are reported when compared to fertile controls (Fig 1). Females with infectious infertility (ININF) significantly differ in the frequency of previous C. trachomatis infections in comparison to fertile controls (p<0.01) and females with nININF (p<0.05). No differences between the groups are observed for previous infections with HPV, HSV, Treponema pallidum, HIV or Hepatitis B and C.

Serum antibodies against specific C. trachomatis epitopes differ in infectious infertility
We next tested C. trachomatis antibodies against IgG and IgA (S1 Fig) and performed immunoblotting against the chlamydial proteins MOMP, OMP2, TARP, CPAF and HSP60. In contrast to fertile controls, females with ININF have IgG antibodies targeting chlamydial antigens MOMP, OMP2, CPAF and HSP60 (Fig 2C) significantly more often. IgG antibodies against CPAF and HSP60 are also significantly enhanced in ININF when compared to nININF. FSW exhibit significantly higher frequencies in the detection of IgG antibodies against MOMP, OMP2, TARP and CPAF compared to both fertile controls and nININF. HSP60 IgG antibodies discriminate ININF from FSW showing significant reduction in FSW. In addition, significantly higher frequencies of IgA antibodies against OMP2 and TARP are observed for FSW, and OMP2 and HSP60 for ININF, when compared to the fertile controls (Fig 2D).

Characterization of the cervical microbiota in infectious infertility
To further discriminate the local microbiota composition of the four different groups, we performed 16S rRNA gene sequencing of the V3/V4 hypervariable region from cervical swabs (Fig 3 and Fig 4). Comparing the average relative read count of genera in each study group, Lactobacillus is the most prominent genus. While on average 78.34% of all sequence reads in fertile females belong to Lactobacillus, the average percentage is reduced to 69.01% in nININF, 57.74% in ININF and 41.70% in FSW. In contrast, the relative read count of the genus Gardnerella increased from 5.43% in fertile females, to 5.61% in nININF and up to 10.08% in ININF and 10.78% in FSW. The same trend is observed for the genera Prevotella and Sneathia (Fig 3A). Some genera exhibit high detection rates in some single patients (e.g. Mycoplasma in ININF), while others are found frequently in moderate range in many patients (Clostridiales in FSW). However, this often allows the characterization down to the species level. Accordingly, all sequencing reads of Neisseria belong to N. gonorrhoeae and are exclusively found in FSW, whereas Mycoplasma reads could be attributed in more than 90% of the cases to M. hominis.
We specifically analyzed the sequencing results of those pathogens that were detected by conventional PCR-testing in a second swab. In contrast to the PCR-testing, the relative read count of C. trachomatis reads, as determined by microbiota sequencing, change. The mean relative read count of positively tested samples in fertile women, nININF and ININF is 0.14%, while the mean relative read count in FSW is 2.91% in positively tested samples.
An overview of statistical comparisons among taxa is given in the S1 Table. The diversity of each sample was calculated using Simpson´s diversity index (Fig 3B). The diversity is low in fertile females, with a median of 0.21, and higher in nININF (median: 0.52) and ININF (median: 0.57). FSW display the highest diversity indices with a median of 0.69. We observe an increase in diversity communities in groups with high sexual activity (S3 Fig), and a significant contribution of sexual activity to the alpha diversity ( S2 Fig and S2 Table, R 2 = 0.12, p<0.001). An established method for further characterization of differences in the microbial composition of the female urogenital tract is the classification in so called community state types (CST) [5]. Several CSTs can be attributed to the different study groups, showing greatly different relative read counts among taxa. The observed CSTs are dominated by L. crispatus (Lcr), L. iners (Lin), L. gasseri (Lga) or G. vaginalis (Gva), or displaying diverse community (div, Fig 4A-4D). Some single communities differ from these major CSTs (Other). Thus, fertile females mostly are comprised of Lactobacillus-dominated CSTs, with only low numbers of diverse communities or the Gva CST. The proportion of diverse communities is enhanced in nININF and INNIF, but in contrast to ININF, no Lga or Gva CST is observed in nININF. FSW are mostly colonized with diverse communities and the Lin CST, with only a limited number of the Lcr CST (Fig 4D, chi-square-test: p<0.001). While the Lcr CST is reduced in ININF and FSW compared to the other groups, the proportion of the Lin CST is relatively stable. Complete linkage clustering of samples based on predicted metagenomes and assignment of community state types display a high concordance (Fig 5).

A statistical model for the prediction of infectious infertility
We further established a model to predict the underlying cause of infertility by using the following parameters: 1) detection of potential pathogens by PCR and cultivation, 2) serological status of C. trachomatis IgG/IgA and 3) the first ten taxa found in microbiota sequencing. Using binary logistic regression, we are able to define infectious infertility with an overall accuracy of 89.8% within the patient samples used in this study. We correctly predict 17

Discussion
Identification of females suffering from an infectious cause of infertility is arduous and has been based in the past on a combination of parameters including patient history, conventional microbial testing and gynecological assessments of anomalies of the reproductive tract. One of the major obstacles is the long time period between the acute initial exposure to a sexually   Although frequencies of STIs among adults are in the range of 1-10%, the public awareness in many countries is still low. Epidemiologic studies on Chlamydia trachomatis as the most frequent STI worldwide reveal high inter-study variations concerning its prevalence in European countries ranging from below 1% up 12% with confidence intervals ranging up to 20% [25,26]. While PCR-based methods for the detection of acute infections are well established the usefulness of determining C. trachomatis antibody responses for diagnosing subsequent Chlamydia-derived sequelae are less certain. Detection of high antibody levels against the chlamydial-protease like activity factor (CPAF) in females with Chlamydia trachomatis cervicitis was first reported by Sharma et al., suggesting that CPAF is immunogenic during ascending chlamydial infection [27]. Forsbach-Birk et al. previously analyzed sera from 13 female patients with upper genital tract infections, and detection of C. trachomatis infection was confirmed by a positive PCR result and/or serum positivity of IgM antibodies. In this group MOMP, CPAF, OMP2, TARP and PmpD showed the highest overall diagnostic sensitivity and specificity [28]. In contrast to the findings in acute infections, our results highlight a role of HSP60, CPAF and OMP2 in host-pathogen interactions in females with ININF. In particular, HSP60 IgG antibodies separate the ININF group from all other groups, including FSW. Chlamydial HSP60 has been shown to be associated with PID [29,30] which might explain why ININF display higher antibody titers against HSP60 than FSW in our study. Interestingly, antibodies against HSP60 and CPAF were also detected at significantly higher levels in inflammatory ocular C. trachomatis infections, suggesting both as markers for disease severity [31]. In addition, IgG antibodies against CPAF were significantly elevated among the trachomatous trichiasis cases, which is seen as a consequence of chronic inflammation in the tarsal conjunctiva [31].
Microbiota sequencing of the V3/V4 hypervariable region of the 16s rRNA gene from cervical swabs revealed further characteristic differences between the groups. Thus we detected significant differences in the relative read count of specific taxa, the diversity of the microbial communities and the proportions of the community types. In particular, genera (Gardnerella, Prevotella, and Sneathia) that have been linked to BV, together with Atopobium, Peptostreptococcus, Mobiluncus, Leptotrichia, Mycoplasma, and BV-associated bacterium 1 (BVAB1) to BVAB3 [3], are increased in the ININF and FSW group. This is accompanied by a shift in the vaginal community from the dominant Lactobacillus to a polymicrobial community [3]. Some of the above mentioned taxa, including Dialister and Veillonella were shown to be capable of producing biogenic amines, which might favor the development of a vaginal dysbiosis, but also could be used as a biomarker for BV detection [32]. Interestingly, while total Lactobacillus-dominated CSTs are reduced in ININF and FSW in comparison to fertile females and nININF, this is not true for the Lin CST, which is distributed rather equally among the study groups. This is of particular interest, as van der Veer et al. have recently shown that diverse bacterial communities (often containing high relative abundance of G. vaginalis) and vaginal communities dominated by L. iners are associated with C. trachomatis transmission [10]. Thus, ININF and FSW comprise high numbers of CSTs showing to prevent infections that cause infertility.
In interpreting microbiota composition of individuals, it must be considered that some genera might exhibit prominent detection rates in single groups because of high read numbers in some patients, while others are found frequently in a moderate range in many patients of the respective group. However, this often allows for the characterization down to the species level, and thereby greatly complements single-/multiplex-PCR approaches for respective STIs. Significantly higher loads of M. hominis were detected in women with BV, and the respective loads in M. hominis and G. vaginalis co-infections displayed a significant positive correlation [33].
Our data show that while C. trachomatis prevalence is equally distributed among the study groups, as according to the diagnostic PCR-testing, the relative read count in FSW is higher compared to the other groups in the microbiota data. This result further supports the hypothesis that direct detection of pathogenic bacteria in cervical swabs is not a good predictor for infections of the upper female genital tract. However, it would be interesting to see whether changes in relative read count upon acute infection might indicate a more severe disturbance of the physiological community and represent a risk factor for more severe disease. This hypothesis is supported by the finding that diverse bacterial communities are associated with C. trachomatis transmission [10].
The primers we used amplifying the V3/V4 regions were shown to be a good option with a high overall coverage for bacterial sequences (89.0%) with the lowest number of undetectable phyla of all primer pairs used in 16S microbiota assessment using 16S rRNA gene sequencing projects [34]. Notably, the typically undetected phyla for this primer pair are not members of the vaginal microbiota. This is a major advantage over the V1/V2 region, which was used in several key vaginal microbiota studies [6,35], as this primer combination at best detects a limited amount of Bifidobacteriales (which includes the genus Gardnerella) and Chlamydiales [36], which are important in vaginal health and disease and play a prominent role in our urogenital community data.
We further analyzed whether the frequency of sexual intercourse has an impact on the microbial diversity, as an increase in sexual intercourse was proposed to induce a week negative effect on the stability of microbial communities [35]. It was reported in a group of African females that recent sexual exposure negatively affects the presence of Lactobacillus sp. and that having more than one sexual partner in the last three months significantly increases the prevalence of detecting G. vaginalis and L. iners [37]. In an observational study, Fethers at al. investigated the association between sexual risk factors and the occurrence of BV as determined by conventional techniques like the Amsel or a Gram stain method [38]. They observed that the epidemiological profile of BV was similar to that of established sexually transmitted infections. Bacterial vaginosis could also be linked with viral sexually transmitted infections (including HPV) in a univariate analysis. However, after controlling for sexual behaviour, the association was only significant for HSV1/HSV2 [39]. Although we observe a significant contribution of sexual activity to alpha diversity, the effect of increasing sexual intercourse is rather small and does not explain the variability in the data set.
Taken together, our data show that single parameters alone are not useful for distinguishing between infectious-and noninfectious infertility. Therefore we established a model for the prediction of ININF using binary logistic regression. The combination of several parameters, including microbiota sequencing data from cervical swabs significantly enhances the predictive power and discriminates all females suffering from noninfectious infertility. Further studies will be necessary on a larger scale to validate these findings and further refine our model in a clinical setting. Enhancing the number of ININF cases in the model building in future will increase its stability. Our results also highlight the potential of modern sequencing technologies to optimize classical microbiological diagnostic traits.  Table. Pearson´s product-moment correlation using Shannon´s diversity index as response and frequency of sexual intercourse/month as predictor. (PPTX) S3 Table. Prediction of the diagnosis "infectious infertility" using binary logististic regression. The number of positive predictions is given for different combinations of predictors in the groups fertile, nININF and ININF. Total accuracy and McFadden R 2 are displayed. (PPTX)