Urinary Microbiota Associated with Preterm Birth: Results from the Conditions Affecting Neurocognitive Development and Learning in Early Childhood (CANDLE) Study

Preterm birth (PTB) is the leading cause of infant morbidity and mortality. Genitourinary infection is implicated in the initiation of spontaneous PTB; however, examination of the urinary microbiota in relation to preterm delivery using next-generation sequencing technologies is lacking. In a case-control study nested within the Conditions Affecting Neurocognitive Development and Learning in Early Childhood (CANDLE) study, we examined associations between the urinary microbiota and PTB. A total of 49 cases (delivery < 37 weeks gestation) and 48 controls (delivery ≥ 37 weeks gestation) balanced on health insurance type were included in the present analysis. Illumina sequencing of the 16S rRNA gene V4 region was performed on urine samples collected during the second trimester. We observed no difference in taxa richness, evenness, or community composition between cases and controls or for gestational age modeled as a continuous variable. Operational taxonomic units (OTUs) classified to Prevotella, Sutterella, L. iners, Blautia, Kocuria, Lachnospiraceae, and S. marcescens were enriched among cases (FDR-corrected p ≤ 0.05). A urinary microbiota clustering partition dominated by S. marcescens was also associated with PTB (OR = 3.97, 95% CI: 1.19–13.24). These data suggest a limited role for the urinary microbiota in PTB when measured during the second trimester by 16S rRNA gene sequencing. The enrichment among cases in several organisms previously reported to be associated with genitourinary pathology requires confirmation in future studies to rule out the potential for false positive findings.


Introduction
Preterm birth (PTB) is the leading cause of infant morbidity and mortality [1], [2] and occurs in approximately 11% of all live births. [2][3][4] Microbial infection of the genitourinary and reproductive tract during pregnancy is thought to be an initiating factor in spontaneous PTB. Intrauterine infections, bacterial vaginosis (BV), urinary tract infections (UTI), and maternal systemic infections have been associated with preterm delivery. [5][6][7][8] A prevailing theory postulates that pathogenic organisms of the lower genital tract migrate to the fetal membranes, and subsequently into the amniotic fluid, invoking an inflammatory response that results in the initiation of preterm labor. [9][10][11] However, recent reports have also suggested hematogenous transmission of microbes to the amniotic fluid, providing a plausible mechanism for systemic infection in PTB. [12][13][14][15] Numerous organisms including Ureaplasma spp., G. vaginalis, and Mycoplasma spp. have been found in the amniotic tissue or fluid of preterm samples by culture or PCR. [16] Culture-dependent approaches have also identified BV-associated species including Prevotella spp., G. vaginalis, and Peptostreptococcus spp. to be associated with preterm delivery. [16] However, these methods are limited to identifying only a fraction of the bacterial diversity of the female genitourinary and reproductive tract. Moreover, the disappointing results of antibiotic treatment for the prevention of PTB [17] may, in part, be related to the lack of comprehensive assessment of pathogenic microbes associated with preterm delivery. The increased accessibility and reduced cost of next-generation sequencing (NGS) platforms now provide the opportunity to examine unculturable and rare organisms and to assess microbial community composition.
Culture-independent investigations of the cervicovaginal microbiota have reported PTB to be associated with increased α-diversity, as well as a high-diversity, Lactobacillus-poor community state type, although results have been inconsistent. [11][18][19][20][21] Recent efforts using culture-independent approaches have shown high diversity and number of operational taxonomic units (OTUs), as well as the presence of opportunistic, anaerobic bacteria associated with female urogenital pathology in the urine of asymptomatic women. [22][23][24][25][26][27] In particular, Lactobacillus, Prevotella and Gardnerella organisms have been shown to be dominant genera of the female urinary microbiota. [23] Thus, the use of NGS technologies may provide novel insight into the role of the female urinary microbiota in the etiology of PTB by allowing for the examination of large numbers of previously unobservable organisms.
For the present nested case-control study, we examined whether bacterial diversity and community composition identified from 16S rRNA gene sequencing of urine collected during the second trimester were associated with the risk of PTB among women participating in the Conditions Affecting Neurocognitive Development and Learning in Early Childhood (CANDLE) study. We also examined whether specific OTUs generated from percent similarity and entropy-based partitioning methods or the inferred metagenome were associated with delivery status.

Study Population
The Urban Child Institute's Conditions Affecting Neurocognitive Development and Learning in Early Childhood (CANDLE) project was designed to provide insights into the biological and environmental factors that influence development during the first years of life. The CANDLE project consists of a racially and socioeconomically diverse cohort of approximately 1,500 mother-child dyads. [28] Briefly, between December 2006 and July 2011, women in the second trimester of a singleton pregnancy; residing in Memphis and Shelby County, TN; age 16-40 years at enrollment; and intending to deliver at a participating hospital were eligible for participation. Exclusion criteria were existing chronic disease requiring medical treatment or pregnancy complications including premature rupture or prolapse of membranes, placenta previa, or oligohydramnios prior to enrollment. A total of 135 mothers (9%) experienced uninduced preterm deliveries (<37 weeks). A random subset (n = 50) was selected for the present analysis. An additional 50 mothers enrolled in the CANDLE study who delivered at term (≥37 weeks) and were balanced on health insurance type as a proxy for socioeconomic status were randomly selected as controls. Written informed consent was obtained from participants 18 years of age or older. Assent and parental or guardian written consent were obtained from all participants less than 18 years of age. The study protocol and informed consent process were approved by the institutional review board of the University of Tennessee Health Sciences Center.

Data and Sample Collection
Demographic, health, and pre-pregnancy anthropometric information was collected via self-reported questionnaires at enrollment. Nurse coordinators abstracted information concerning labor and delivery from the medical records. Maternal urine specimens (60 mL) were collected into 120 cc plastic containers, aliquoted, kept cold (2-4°C), and frozen (-70°C) within 1 hour of collection. Participants were provided with Benzalkonium (BZK) wipes and instructed to wipe from front to back and then collect the urine. Aliquots (2 mL) were mailed on dry ice to the laboratory of Piyathilake and kept frozen at -70°C until microbial DNA was isolated.

DNA Extraction, Amplicon Library Preparation, and Sequencing
Microbial DNA was isolated using the Fecal DNA isolation kit from Zymo Research (Irvine, CA). The V4 region of the 16S rRNA gene (515F/806R) was amplified from extracted DNA using the protocol and region-specific PCR primers described by Caporaso et al. [29] PCR products were purified by gel electrophoresis and quantitated using the PicoGreen (Invitrogen) dsDNA quantitation assay prior to sequencing. Paired-end sequencing (2 x 250) was performed on the Illumina MiSeq platform (Illumina Inc., San Diego, CA).

Data Processing
Sequence reads were quality filtered and processed using an integrated, high-throughput 16S rDNA analysis pipeline built on the QIIME v1.8 tool suite. [30] Read filtering and processing included: initial quality check of raw sequencing data (FASTQC) [31], quality filtering (FASTX-Toolkit) [32], merging of paired-end reads (QIIME), chimera filtering (UCHIME) [33], de novo clustering of OTUs at 97% similarity (UCLUST) [34], taxonomic assignment (RDP Classifier) [35] using the Greengenes v13.8 reference database [36], sequence alignment (PyNAST) [37], and construction of a phylogenetic tree (FastTree). [38] Filtered OTU tables were obtained by removing OTUs with an abundance < 0.005% [39] and subsampling to 20k reads, resulting in the exclusion of n = 3 participants due to low read counts. Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) [40] was used to infer metagenome functional gene content from reference-assisted, 16S-derived community composition. Measures of alpha-diversity (observed richness, Shannon Index [41] and Faith's phylogenetic diversity [42]) were obtained using QIIME. Beta-diversity metrics (UniFrac [43], Bray-Curtis [44], Jaccard [42], Morisita-Horn [45]) were calculated as implemented in the R package vegan. [46] Minimum Entropy Decomposition (MED) was also used to partition sequence reads into MED nodes. [47] MED uses Shannon entropy to iteratively partition sequence reads based on information-rich (high entropy) positions while accounting for stochastic variation via abundance filtering. Resolution to a single nucleotide is possible with MED and was chosen to allow for discrimination between closely related organisms while limiting artificial variation. MED was applied to the merged reads after filtering those < 200 bp in length and those with a minimum quality score < 25. MED partitioning was performed using the default settings with the -M noise filter set to 385 (7,697,666/20k), resulting in 17% of reads filtered and 810 MED nodes.
Taxonomic assignment of MED nodes was performed by manual BLAST against the NCBI reference database.
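The abundance filter, the MED noise threshold, and the Shannon index described above reduce to a few lines of arithmetic. The following Python sketch is illustrative only and is not the QIIME or MED implementation: the function names and toy counts are hypothetical, and rounding 7,697,666/20k to the nearest integer is an assumption about how the reported value of 385 was obtained.

```python
import math


def filter_low_abundance(otu_counts, min_fraction=0.00005):
    """Drop OTUs whose total count is below min_fraction (0.005%) of all reads."""
    total = sum(otu_counts.values())
    return {otu: n for otu, n in otu_counts.items() if n / total >= min_fraction}


def noise_filter_threshold(total_reads, divisor=20_000):
    """Total reads divided by 20k, rounded to the nearest integer (assumed)."""
    return round(total_reads / divisor)


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Hypothetical OTU table: otu_c holds 2 of 100,000 reads (0.002%) and is removed.
toy_counts = {"otu_a": 60_000, "otu_b": 39_998, "otu_c": 2}
filtered = filter_low_abundance(toy_counts)

# Read total given in the text alongside the -M setting.
med_threshold = noise_filter_threshold(7_697_666)

# Four equally abundant taxa give the maximum index for a richness of 4, ln(4).
even_community_h = shannon_index([10, 10, 10, 10])
```

The Shannon index here uses the natural logarithm; other tools may report it in base 2, so values are only comparable when the base matches.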

Statistical Analysis
The first objective was to compare cases and controls on α- and β-diversity. Differences in demographic and clinical characteristics were assessed by t-tests or Wilcoxon rank-sum tests for continuous variables and χ2 tests for categorical variables. We compared α-diversity metrics using the Wilcoxon rank-sum test, with differences in β-diversity examined by non-metric multidimensional scaling (NMDS) and tested by permutational ANOVA. [46] Differences in the PICRUSt imputed metagenome were also examined by NMDS and tested by permutational ANOVA. Dirichlet multinomial models were used to test for differences in phylum- and genus-level phylotypes, with expected frequencies under the Dirichlet multinomial model obtained using the HMP package in R and tested by the χ2 test of location. [48] The second objective was to evaluate associations for OTUs and MED nodes with PTB. We used negative-binomial regression as implemented in the DESeq2 package in R [49] through the phyloseq interface [50] to test for differences between cases and controls. Models were fit to the data without subsampling. Multivariable models included adjustment for covariates as described below. Log2-fold changes for OTUs and MED nodes with false discovery rate (FDR) corrected p ≤ 0.05 are reported.
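The FDR correction applied to the per-taxon tests can be sketched as follows. The text does not name the procedure; this sketch assumes the Benjamini-Hochberg method, and the function name and raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank m for the largest p-value, 1 for the smallest
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four taxa (not study results).
raw = [0.001, 0.02, 0.03, 0.6]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
```

With these toy values the first three taxa remain below the 0.05 threshold after adjustment and the fourth does not.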
Our third objective was to assess whether we could identify naturally occurring clusters of participants based on the co-occurrence of abundant organisms in the urinary microbiota and assess their association with PTB. A Dirichlet multinomial mixture (DMM) model was used to cluster samples into partitions. [51] The DMM model as implemented in Mothur v1.35 [52] was fit to the most abundant (n = 20) genus-level phylotypes to improve clustering. Optimal partition number was chosen based on minimizing the Laplace approximation to the negative log model evidence. The DMM provides an evidence-based approach to the estimation of metacommunities, as well as parameters for the weight (i.e., relative frequency, π) and precision (i.e., a measure inversely related to the degree of dispersion with respect to OTU composition, θ) of each partition. Odds ratios (OR) and 95% confidence intervals (CI) for preterm delivery according to urinary microbiota clustering partitions were calculated using logistic regression. ORs were calculated by contrasting each partition to samples collected from all other women. Multivariable models included adjustment for maternal age, body mass index, weight gain during pregnancy, race (black vs. non-black), income (<$20k/y, $20-$45k/y, >$45k/y), education (<high school, high school or technical college, college graduate), abnormal vaginal discharge at delivery (yes/no), Group B streptococcus infection at delivery (yes/no) and history of PTB (yes/no). Multiple imputation (n = 30 datasets) was performed for missing covariate data using the Amelia II package [53] in R.
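For intuition, the contrast of one partition against all other women can be sketched in its unadjusted form. A logistic regression with a single binary partition indicator yields the same odds ratio as the 2x2 cross-product, so the sketch below computes that ratio with a Wald 95% CI. It does not reproduce the covariate adjustment or multiple imputation used in the study; the function name and counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(cases_in, controls_in, cases_out, controls_out, z=1.96):
    """Odds ratio and Wald 95% CI for one partition versus all other samples."""
    odds_ratio = (cases_in * controls_out) / (controls_in * cases_out)
    # Standard error of the log odds ratio (Woolf's formula).
    se_log_or = math.sqrt(
        1 / cases_in + 1 / controls_in + 1 / cases_out + 1 / controls_out
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not the study data: 10 cases and 4 controls inside the
# partition, 39 cases and 44 controls outside it.
odds_ratio, lower, upper = odds_ratio_wald_ci(10, 4, 39, 44)
```

Small cell counts inflate the standard error and widen the interval, which is why estimates for small partitions are imprecise.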

Participant Characteristics
Mean maternal age at delivery was 25.7 (±5.2) years, 62.9% identified as non-Hispanic black, 39.2% were college graduates, and 35.1% reported a maternal income exceeding $45k annually (Table 1). Mean parity was 2.5 (±1.4), pre-pregnancy body mass index was 27.1 (±7.6), and weight gain during pregnancy was 14.5 (±6.9) kg. Cases were older than controls (cases = 26.8 y, controls = 24.5 y; p = 0.03), but did not differ on the other demographic or clinical characteristics examined. Notably, a history of PTB was more common among cases than among controls, but the difference did not reach statistical significance (p = 0.09). Infant gestational age at birth was 34.8 (±2.3) and 39.2 (±1.1) weeks for cases and controls, respectively (p<0.01).

Microbial Composition and PTB
Genus-level phylotypes are shown in Fig 1 according to delivery status. Lactobacillus, Serratia, Prevotella, Atopobium, and Gordonia organisms were among the most abundant genera detected. The Dirichlet parameter test did not support differences in the most abundant genera between cases and controls (p>0.82). Similar results were obtained for phyla (p>0.78, data not shown). Large numbers of reads were also classified to Bifidobacteriaceae (13.4%), Caulobacteraceae (3.7%), and Oxalobacteraceae (2.4%), with the relative abundance similar for cases and controls. Observed OTU richness was 275 (IQR = 84) for cases and 280 (IQR = 103) for controls (p = 0.99, Fig 2, S1 Table). Additional α-diversity metrics did not support differences in richness or evenness (p> 0.88); nor did ordinations and permutational ANOVA performed on dissimilarity matrices obtained from UCLUST OTUs, MED nodes, or the imputed metagenome (supplementary table). No associations were detected for α- or β-diversity metrics in sub-analyses restricted to early (< 34 weeks; n = 13) and late (34-36 weeks, n = 36) PTB or for gestational age when modeled as a continuous variable (data not shown).

Associations for OTUs and MED Nodes
Differential abundance testing identified enrichment in both UCLUST OTUs and MED nodes mapping to Prevotella, Sutterella/Parasutterella, and Lachnospiraceae organisms in the urine of cases (FDR-corrected p ≤ 0.05, S3 and S4 Tables). Enrichment in Streptococcus, S. agalactiae, A. vaginae, and Shuttleworthia was observed in the urine of controls. Large inverse associations were seen for an abundant OTU/node classified to Shuttleworthia. There were also inconsistencies between partitioning approaches. MED identified L. iners and two nodes classified to Blautia (one to B. obeum) as enriched in cases. Interestingly, MED partitioning identified several small nodes classified to Gardnerella as enriched in cases, whereas this association was not seen for the UCLUST OTUs. Similarly, some associations were observed for UCLUST OTUs but not MED nodes, including a positive association for an OTU classified to Kocuria. A UCLUST OTU (log2fold = 0.76, SE = 0.26, FDR p = 0.05) and MED nodes (log2fold = 0.90, SE = 0.31, FDR p = 0.05) mapping to S. marcescens were also enriched in cases. A total of 34 MED nodes and 15 UCLUST OTUs were associated with delivery status after false discovery rate correction.
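The log2-fold changes reported above can be read on a multiplicative scale with a short conversion. The sketch below is illustrative: the function names are hypothetical, and the Wald statistic is shown on the assumption that the estimates were tested by dividing each by its standard error, which the text does not state.

```python
def fold_change(log2_fold):
    """Convert a log2-fold change to a multiplicative fold change."""
    return 2.0 ** log2_fold


def wald_z(log2_fold, standard_error):
    """Wald statistic: the estimate divided by its standard error."""
    return log2_fold / standard_error


# Estimates reported in the text for S. marcescens.
otu_fold = fold_change(0.76)   # UCLUST OTU
node_fold = fold_change(0.90)  # MED nodes
otu_z = wald_z(0.76, 0.26)
```

On this scale the reported estimates correspond to roughly 1.7 and 1.9 times the abundance in cases relative to controls.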

Urinary Microbiota Clustering Partitions and PTB
Four partitions were identified from the DMM model (Table 2). Partition 1 was the most prevalent, had the lowest clustering precision, and was uniquely characterized by a high proportion of reads classified to Bifidobacteriaceae (24%), Prevotella (6%), and Shuttleworthia (6%) (S5 Table). Partition 2 had the highest clustering precision and was dominated by Lactobacillus (29%) followed by Bifidobacteriaceae (10%) organisms. Partition 3 was small with low precision and uniquely characterized by a high proportion of reads classified to S. marcescens (26%).

Discussion
In this case-control analysis nested in the CANDLE cohort, we found no difference in urinary microbiota α- or β-diversity between cases and controls. However, OTUs classified to Prevotella, Sutterella, L. iners, Blautia, Kocuria, Lachnospiraceae, and S. marcescens were enriched among cases delivering prior to 37 weeks gestation. A urinary microbiota clustering partition dominated by S. marcescens was also associated with PTB. Together, these results provide little support that the composition of the urinary microbiota in this cohort of predominantly non-Hispanic black women, as measured by 16S rRNA gene sequencing during the second trimester, is associated with PTB. Enrichment among cases in several organisms previously reported to be associated with genitourinary pathology is intriguing, but requires confirmation in future studies.
We observed a diverse microbiota in the urine of mothers collected during the second trimester. This is consistent with previous reports demonstrating high diversity in the urine of asymptomatic, culture-negative women. [22][23][24][25][26][27] The relative abundance of dominant taxa and OTU richness in our sample were similar to those reported by Siddiqui et al. [23] despite differences in the study population, PCR primers, and sequencing technology. Differences in the relative abundance of Lactobacillus and Gardnerella may, in part, reflect differences in the abilities of primers to amplify Lactobacillus and the high number of unclassified Bifidobacteriaceae reads, many expected to be Gardnerella organisms, in our study. Our ability to compare these findings to culture or PCR-based studies implicating BV-associated bacteria in the etiology of PTB [16,54] is limited by the lack of robust species-level resolution inherent to 16S rRNA gene (i.e. amplicon) sequencing. Future efforts utilizing shotgun metagenomic sequencing will provide further insight into associations for specific bacterial species. An extensive study of the female genital microbiota associated with PTB is currently underway as part of the Integrative Human Microbiome Project (iHMP) and is expected to provide significant contributions in this area. [55]
Using an unsupervised clustering approach, we identified four urinary microbiota partitions in a racially and socioeconomically diverse sample of women. Partition 3 was found to be uniquely dominated by S. marcescens and associated with PTB. S. marcescens is a non-endospore-forming, rod-shaped, gram-negative bacterium and a known human opportunistic pathogen that causes UTIs and indwelling urinary catheter infections and is an important source of hospital-acquired infections.
[56][57][58] It is also found in diverse environments, is resistant to a wide range of antibiotics, and produces virulence factors associated with cell toxicity, host inflammatory response,
and tissue penetration, [57][59][60][61] providing a plausible mechanism by which it may influence the risk of PTB. DiGiulio and Callahan et al. [19] reported that a high-diversity, Lactobacillus-poor vaginal community type, identified using an unsupervised clustering approach, was associated with PTB. Interestingly, both the duration and proportion of time the microbiota were in this community state were associated with PTB; however, it was also the least stable, requiring a frequent sampling interval to estimate the portion of time spent in the high-diversity state. Conversely, no association with PTB was reported by Romero et al. [11] for similar vaginal community state types examined across pregnancy. Discrepancies across studies could be due, in part, to differences in the urinary and vaginal microbiota, choice of primers, sequencing technologies, and informatics/statistical approaches; and/or differences in racial/ethnic composition of participants given demonstrated differences in the vaginal microbiota by race. [62]
Several individual OTUs and MED nodes, including those mapping to S. marcescens, were also associated with PTB. L. iners has been shown to be a dominant member of the vaginal microbiota [62] and was found here to be enriched in preterm samples. In vitro studies have suggested L. iners may induce IL-8 secretion [63] moderating localized proinflammatory activity. Prevotella and Streptococcus spp. are commonly found in the vagina, potentially uropathogenic, associated with UTIs and BV, [64][65][66][67] and were more abundant in samples from mothers delivering preterm. This is in contrast to a recent study in African American women reporting the abundance of Prevotella to be inversely associated with risk of PTB. [21] Similarly, the lower vaginal microbiota diversity suggested for women delivering preterm by Nelson et al. [21] was not seen here. It is also noteworthy that OTUs/nodes classified to F.
prausnitzii, Blautia, Ruminococcus, and Sutterella were associated with PTB. These organisms are found in high abundance in human fecal samples and may suggest pathogenic transmission from the rectum to the genitourinary tract. [68,69] However, we cannot rule out that their presence reflects fecal contamination occurring during collection.
OTUs or MED nodes classified to Lactobacillus, Shuttleworthia, and A. vaginae were inversely related to PTB in our sample. Lactobacillus was the dominant genus observed in cases and controls; comprising approximately 25% of all sequence reads. Lactobacillus dominance is generally considered characteristic of a "healthy" vaginal microbiota with reductions, and relative increases in facultative or anaerobic organisms, associated with dysbiosis. [70] Given that no difference was seen between cases and controls for total Lactobacillus reads, this may represent a chance finding and requires confirmation in future studies.
PCR and standard urine culture are limited to identifying only a fraction of true bacterial diversity. NGS platforms now offer inexpensive, high-throughput approaches to interrogate unculturable and rare organisms and their role in PTB. To the best of our knowledge, this study represents the first attempt to employ NGS to examine the urinary microbiota in relation to PTB and provides support that potential uropathogens can be detected in the urine of women delivering preterm. Collection of urine specimens is inexpensive and noninvasive and can be performed over the course of pregnancy, facilitating the examination of changes in specific bacteria or community structure over time. In addition, the urine is an environment distinct from that of the vagina, cervix, and rectum and may serve as a valuable biomarker of urogenital microbial communities of potential importance to preterm delivery. Integrated profiling of the genitourinary (e.g. urine, vagina, cervix) and amniotic microbiota across pregnancy will be best suited to address the role of these communities in PTB.
Strengths of the current study include the racial and socioeconomic diversity of participants, access to detailed demographic and clinical information, use of 16S rRNA gene sequencing, and the relatively large number of preterm births. There were also limitations. First, the number of cases and controls when examined across levels of urinary microbiota clustering partitions was small, contributing to imprecise estimates for associations with PTB. Second, this analysis was limited to urine specimens collected during the second trimester. This precluded assessment of the urinary microbiota at other time points, examination of associations for changes over time, and capture of shifts in the microbiota arising from normal biologic processes or environmental factors such as antibiotic use. The collection of urine at different time points could also be a source of bias should the urinary microbiota change over the course of the second trimester and sampling be performed at different times for cases and controls. However, the vaginal microbiota has been shown to be stable in community composition and diversity when examined across pregnancy [19] and collection was performed similarly for cases and controls. Third, despite providing materials and instructions, there remains a risk of urine contamination during collection from bacteria in the surrounding environment. Fourth, the high variance seen for several of the urinary microbiota partitions reflects the inherent challenge of attempting to cluster high dimensional microbial communities to a lower dimensional space. Of primary interest in assigning samples to partitions was to assess whether we could identify clusters based on similarities in community composition that may better capture the co-occurrence of abundant taxa (i.e. reduce dimensionality).
To the extent that the urinary microbiota is not comprised of distinct taxonomic subgroups, clustering is expected to result in partition heterogeneity, high variance, and reduced ability to detect associations in subsequent inferential tests. Despite these limitations, the results obtained here suggest the potential to identify subgroups of patients at increased risk of PTB. Future studies employing finer taxonomic profiling or shotgun metagenomics are expected to be better suited to identifying more homogenous subgroups of patients. Lastly, despite the advantages 16S rRNA gene sequencing provides over PCR and culture-based approaches, it is limited in the ability to discriminate between similar organisms (i.e. provide species or strain level resolution) and does not provide direct information on the metagenome or microbial gene expression in a given environment.

Conclusion
Our findings suggest that the composition of the urinary microbiota as measured during the second trimester of pregnancy is not associated with PTB. However, the enrichment of several organisms previously reported to be associated with genitourinary pathology in the urine of mothers delivering preterm warrants further investigation and confirmation. Whole genome sequencing and metatranscriptomic approaches are required to resolve associations for specific organisms at finer taxonomic levels and to assess microbial function in relation to PTB. Should these findings be reproduced they may suggest strategies for the targeting of specific organisms for the prevention of PTB.
Supporting Information
S1 Table. Wilcoxon rank-sum test for differences in α-diversity metrics according to delivery status.