Whole Blood Gene Expression Profile Associated with Spontaneous Preterm Birth in Women with Threatened Preterm Labor

Threatened preterm labor (TPTL) is defined as persistent premature uterine contractions between 20 and 37 weeks of gestation and is the most common condition that requires hospitalization during pregnancy. Most of these TPTL women continue their pregnancies to term while only an estimated 5% will deliver a premature baby within ten days. The aim of this work was to study differential whole blood gene expression associated with spontaneous preterm birth (sPTB) within 48 hours of hospital admission. Peripheral blood was collected at point of hospital admission from 154 women with TPTL before any medical treatment. Microarrays were utilized to investigate differential whole blood gene expression between TPTL women who did (n = 48) or did not have a sPTB (n = 106) within 48 hours of admission. Total leukocyte and neutrophil counts were significantly higher (35% and 41% respectively) in women who had sPTB than women who did not deliver within 48 hours (p<0.001). Fetal fibronectin (fFN) test was performed on 62 women. There was no difference in the urine, vaginal and placental microbiology and histopathology reports between the two groups of women. There were 469 significant differentially expressed genes (FDR<0.05); 28 differentially expressed genes were chosen for microarray validation using qRT-PCR and 20 out of 28 genes were successfully validated (p<0.05). An optimal random forest classifier model to predict sPTB was achieved using the top nine differentially expressed genes coupled with peripheral clinical blood data (sensitivity 70.8%, specificity 75.5%). These differentially expressed genes may further elucidate the underlying mechanisms of sPTB and pave the way for future systems biology studies to predict sPTB.


Introduction
Preterm birth (PTB; birth at ,37 weeks of gestation) occurs in about 8-11% of pregnancies worldwide and remains the main cause of perinatal mortality and morbidity in the developed world [1]. Medical advances have increased the survival rates of premature babies; however, premature infants remain vulnerable to disabilities such as respiratory disorders, cognitive impairment, blindness, deafness etc. [2]. In later life, they may face complications such as motor and sensory impairment, learning difficulties and behavioral issues. Prematurity leads to an immediate and long term emotional and financial burden to families, communities and the health care system [3][4][5][6][7].
Threatened preterm labor (TPTL) is defined as persistent premature uterine contractions between 20 and 37 weeks of gestation and may include other symptoms such as pelvic pressure, backache, increased vaginal discharge, menstrual-like cramps, bleeding/show and shortened cervix [8][9][10][11]. Treatment of TPTL involves administration of tocolytic agents to temporarily inhibit uterine contractions and prolong the pregnancy up to 48 hours. This 48 hour window serves to achieve both the benefits of corticosteroid administration on fetal lung maturation as well as the ability to transport the woman to a tertiary hospital with advanced neonatal facilities [12,13].
Generally, labor contractions in the majority of TPTL women will cease and they often continue their pregnancies to term [14]; while approximately 5% of these women will progress into true PTL and deliver a premature baby within 10 days [15]. Thus, women in ''false labor'' are subjected to unnecessary hospitalization, medical intervention, psychologic stress, exposed to drug side effects and contribute to healthcare costs [16,17]. For example, although corticosteroids augment fetal lung maturity and reduce respiratory distress syndrome, intraventricular hemorrhage and neonatal mortality in premature infants [18], their longer term side effects remain unclear [19][20][21][22].
There is limited success for clinicians to accurately stratify TPTL women into ''true PTL'' or ''false labor'' at the point of hospital admission. Currently, predicting PTB involves the assessment of clinical risk factors [23], detecting a short cervix (,15 mm) [1,24] and the presence of fetal fibronectin (fFN) in the cervicovaginal fluid [8,25]. fFN testing is clinically useful for its high negative predictive value (NPV) [15]. However, fFN cannot be performed on women who had prior vaginal/cervical examination, unprotected sexual intercourse, antepartum hemorrhage or ruptured fetal membranes as these factors can influence the fFN test results [26]. As a result, about 50% of TPTL women are generally ineligible for fFN testing.
The ability to predict PTB at point of hospital admission would provide clinicians the opportunity to focus therapeutic interventions on women who are more likely to deliver within the next 48 hours, whilst women in ''false labor'' can be offered supportive care and discharged. Therefore, a new diagnostic test that predicts PTB, using an easily accessible biological fluid (e.g. blood) and can be performed on all TPTL patients will be beneficial. This capability will allow a more rational approach to manage TPTL and provide considerable cost savings to the healthcare system. Recently, Chim and colleagues utilized gene expression microarray to identify placental genes associated with spontaneous PTB (sPTB) and subsequently screened for placental RNA transcripts and microRNAs in the maternal plasma in an attempt to discover novel biomarkers associated with sPTB [27].
Hence, this study utilized microarrays to characterize whole blood gene expression in women with TPTL. The specific hypotheses were: 1) women in TPTL who progress to true PTL and had a sPTB within 48 hours have a different gene expression profile compared with women who did not deliver within 48 hours, and 2) a genomic expression signature can predict sPTB within 48 hours in TPTL women. We obtained a nine gene signature coupled with peripheral clinical blood data to predict sPTB with 70.8% sensitivity and 75.5% specificity. Differentially expressed genes and their respective proteins may further elucidate the underlying mechanisms of sPTB and pave the way for the development of more precise and targeted therapies.

Patient recruitment and ethics statement
The study was approved by the Ethics Committee, Department of Health, Government of Western Australia (EC05-34.3). Participants provided written consent; consent procedure was also protocol and for comparison with microarray data. High vaginal swabs and urine were collected from each participant for standard hospital bacteria vaginosis screening, microscopy, culture and sensitivities as part of their routine clinical care. Women who delivered preterm at our hospital had their placentae sent for microscopy, culture and formal histopathology. Clinical microbiology assessment of vagina, urine and placenta cultures and placenta histopathology were reported where available. The lack of placenta reports in women who did not deliver within 48 hours was either due to them subsequently delivering at another hospital or histology was not routinely performed on term placentas. Participants were subsequently stratified into two groups: sPTB within 48 hours of hospital admission and no sPTB within 48 hours.

Blood sample collection, RNA extraction and quality check
Maternal blood samples were collected at point of hospital admission prior to any medical treatment. Five PAXgene blood collection tubes (PreAnalytics, Hombrechtikon, Switzerland) were collected from each participant, total RNA was isolated using PAXgene Blood RNA system kit (QIAGEN, Doncaster, Victoria, Australia). RNA quality check was done at The Centre for Applied Genomics (TCAG; The Hospital for Sick Children (SickKids), Toronto, ON, Canada) using an Agilent 2100 BioAnalyser with the RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA). The bioanalyser provides a RNA integrity number (RIN) to gauge RNA integrity, compare samples and ensure the repeatability of experiments. RIN is calculated using an algorithm and the bioanalyzer's electrophoretic trace where a RIN score of one represents strongly degraded RNA and a score of 10 represents intact RNA [28]. Adhering to TCAG's protocol, microarrays were only performed on samples with RIN greater than six.

Microarray
Full microarray experiments were performed by TCAG. Total RNA samples were hybridized to Affymetrix Human Genome U133 Plus 2.0 arrays (Affymetrix, Santa Clara, CA). The globin reduction protocol was incorporated into microarray processing [29]. Microarray data have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE46510: (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc = GSE46510). Clinical data associated with the microarray data are presented in Table S1 in File S1.

Differential gene expression analysis
Affymetrix U133 Plus2 GeneChip CEL files were normalized using the GeneChip Robust Multiarray Average (GCRMA) (Bioconductor, R). Custom (Gene)Chip Definition Files (CDFs) for Entrez Gene (version 17) were used to map Affymetrix GeneChip probes to transcripts/exons/genes for specific databases [30]. Differential gene expressions were analyzed using Limma (Bioconductor, R) [31]. To correct for multiple hypotheses testing, significant differentially expressed genes were identified based on a false discovery rate (FDR) threshold of ,0.05 using the Benjamini and Hochberg approach. Fold changes were calculated using median values and expressed as logarithm base 2 (Log 2 ).
Gene Ontology (GO) Slim annotations were obtained for significant genes (Limma FDR,0.05) [32,33]. Reactome Functional Interaction Cytoscape plug-in (version 4.0 beta) was used to visualize community networks and to determine if differently expressed genes (passing Limma FDR threshold of ,0.05) may be enriched and form clusters of interconnected molecular events or 'reactions' associated with sPTB within 48 hours [34,35]. Representative GO Slim terms were obtained for each cluster. Enrichment analyses using FuncAssociate 2.0 were performed separately on genes passing Limma FDR threshold of ,0.1 that were either up or down regulated [36]; and pathway analyses were performed using DAVID Bioinformatics Resources 6.7 (BioCarta, KEGG and Panther) with genes passing Limma FDR threshold of ,0.1 [37]. For both enrichment and pathway analyses, the total number of genes observed in the microarray (n = 19,008) was used as the background. A less stringent FDR of ,0.1 allows more genes to be included in the analyses.

Quantitative real time-PCR
Quantitative real-time PCR (qRT-PCR) was performed to validate a subset of significant genes that displayed a Log 2 fold change of 60.6 (i.e. $50% increase or $34% decrease in women who delivered within 48 hours) and a minimum average microarray intensity expression of .4 (CEL files arbitrary Microbiology not performed 3 1 Decidual vasculopathy (total n) 2 1 Ureaplasma spp. or Mycoplasma hominis or GBS positive 2 1 Ischaemia (total n) 1

Statistical analysis
Demographics, peripheral clinical blood data and microbiology data comparison between women who did or did not have a sPTB within 48 hours were done using the Mann-Whitney U test and Fisher's exact test; correlation was performed using Pearson's r (Statistical Package for Social Sciences version 17.0, SPSS Inc, Chicago, IL). qRT-PCR data analyses were performed using binary logistic regression (LogXact 10, Cytel Inc, Cambridge, MA). The dot plots were produced using GraphPad Prism v5.02 (GraphPad Software, San Diego, CA).

Random forest classifier
A random forest classifier model was constructed to differentiate between women who had a sPTB within 48 hours and those who did not. The dataset consists only of numeric values and missing values were imputed with the median across the samples. The model was built using the random forest implementation (randomForest version 4.6-7, R). Leave-one-out cross validation was performed to provide out-of-bag evaluation of the classifier's performance. Receiver Operating Characteristics (ROC) curve was used to assess prediction performance.

Results
Two hundred out of 300 samples were retrospectively selected based on delivery outcome and re-examined for eligibility adhering to our inclusion and exclusion criteria; 178 samples were confirmed for eligibility and sent to the microarray facility. Twenty four samples were unfavorable post RNA quality check or post microarray procedure (chip defect/error) thus resulting in 154 samples for final data analysis (Table 1). There were 48 women who had a sPTB within 48 hours of hospital admission and 106 women who did not. Gestational age at presentation was significantly different between women who had or did not have  a sPTB within 48 hours. Figure 1 displays the gestational age at presentation of women in the microarray and qRT-PCR studies, respectively. Peripheral blood clinical laboratory results were obtained from 125 women. Total leukocyte and neutrophil counts were 34.7% and 40.8% higher in women who had sPTB within 48 hours of hospital admission (p,0.001). There were no differences in urine culture, vaginal microbiology as well as placental histopathology and microbiology assessment between the two groups (p.0.05; Identification of differentially expressed genes using microarray As gestational age at presentation was significantly different between women who did or did not have a sPTB within 48 hours, this suggested that gene expression in whole blood may be gestational age dependent. We thus analyzed the microarray data by pairing matched gestational age samples using a modified Limma design matrix (moderated paired t-test). Samples were rounded down to the nearest gestational week and then grouped according to their gestational age at presentation. Data analysis using Limma obtained 469 significantly differentially expressed genes (256 up-regulated genes, 213 down-regulated genes, FDR, 0.05; Table S2 in File S1). Table 3 displays the top 20 up and down regulated genes (ranked by Log 2 fold change). Of the 469 significant genes, 176 genes were confidently mapped to relevant GO Slim terms, 254 genes were reported as ambiguous and 39 genes were not annotated [32] (Figure 2).

Enrichment analyses
From the Reactome network constructed around the 469 differentially expressed genes (Limma FDR,0.05), eight major clusters with minimum five genes were discovered. These clusters generally represent metabolic process, response to stress, immune system process and signal transduction (Figure 3). A total of 1333 genes (Limma FDR,0.1) were subjected to functional enrichment analysis. Enrichment analyses found 14 functions (odds ratio .2) significantly enriched in 570 up-regulated genes and 17 functions (odds ratio .2) significantly enriched in 763 down-regulated genes ( Table 4). No significant pathway analysis was obtained using DAVID.

Validation of microarray using qRT-PCR analysis
Twenty eight differentially expressed genes (60.6 Log 2 fold change and $4 microarray intensity expression) were chosen for microarray data validation using qRT-PCR. A total of ninety six samples (a subset of the 154 women) were selected for qRT-PCR validation: 41 samples from women who had sPTB within 48 hours were used and 55 samples were randomly chosen from the 106 women who did not deliver within 48 hours. After adjusting for gestational age at presentation, qRT-PCR confirmed the differential expression of 20 genes (p,0.05) and the direction of change in all 28 genes were 100% consistent with the microarray analysis (Table S3 in File S1). There was a significant correlation between microarray and qRT-PCR data (Pearson's r = 0.956, p, 0.001).

Generation of models to predict sPTB within 48 hours
Optimal random forest classifier model performances was achieved using the top nine differentially expressed genes (ZDHHC19, HPGD, GPR84, OPLAH, METTL18, TDRD9, ATP9A, GALNT14 and GOLGA8A; ranked by decreasing magnitude of fold change, Limma FDR ,0.05) with 100 decision trees. We also evaluated whether the inclusion of the five peripheral clinical blood data (total leukocyte, neutrophil, lymphocyte and monocyte counts and hemoglobin levels) in the model improved prediction. Table 5 displays the random forest classifier performances generated with 154 samples using the top nine genes with and without clinical blood data ( Figure 4A). A representative heat map of the top 50 genes demonstrating the highest fold changes (Limma FDR ,0.05) is shown in Figure 5.

Comparison of gene signatures and fFN
To obtain an unbiased performance comparison between the microarray genomic signature and fFN, separate classifier models were constructed on the subset of patients who had corresponding fFN test results (n = 62). Table 5 also displays the comparison of the predictive efficacies of the random forest classifier models (using top nine genes, with and without clinical blood data, and with and without fFN test results included in the model) and the stand-alone fFN test to predict sPTB within 48 hours in 62 women ( Figure 4B).

Discussion
This study has utilized microarrays to provide novel genomic insights of sPTB and our Limma analysis revealed 469 differentially expressed genes associated with sPTB within 48 hours in women with TPTL. PTB is a heterogeneous condition with varying etiologies. We accounted for known causes of PTB with our exclusion criteria and attempted to discover a blood-based gene signature to predict imminent idiopathic sPTB. Labor is an inflammatory process with elevated levels of maternal circulating leukocytes [41,42] and increased leukocyte infiltration into the myometrium, decidua and cervix before and during labor [43][44][45]. Thus, it was unsurprising to observe elevated whole blood mRNA expression of cytokines (IL-1beta, OSM), cytokine/chemokine receptors (IL-4R, IL10RB, TNFRSF10D, LTBR, CCR1, CXCR1), Table 5. Predicting spontaneous preterm birth within 48 hours in 154 women using the top 9 genes (with and without peripheral clinical blood data) and comparing the predictive efficacies of fetal fibronectin and the random forest classifier models (top 9 genes, with or without peripheral clinical blood data and fFN) to predict spontaneous preterm birth within 48 hours in 62 women.  genes involved in cytokine signaling pathways (JAK3, TNIP2, SOCS3) and related transcriptional factors (NFKB1, JUNB) in our Limma differential analysis as well as inflammation-related clusters (Reactome) and enriched GO terms (FuncAssociate) in women who had sPTB within 48 hours. This suggests that a repertoire of inflammatory markers in the blood are differentially expressed 48 hours prior to labor and associated with impending delivery.
There is a wealth of information linking infection and PTB (reviewed in [4,46,47]). Despite our attempt to study idiopathic sPTB, these participants may have undiagnosed sub-clinical infection. Thus, the differential expression of infection-or inflammation-related genes in our study could be attributed to a combination of infection and labor. For example, inflammatoryrelated genes and their respective proteins such as NFKB1 [48], CCR1 [49], SOCS3, JAK3 [50], JUNB [51], RHOG [52], TIMP1 [53], IL1beta [54][55][56][57], ITGAM [58], CD44 [59,60], TLR5 [61][62][63] and CXCR1 [64,65] from our Reactome analysis are known to be up-regulated in reproductive tissues and other biologic fluids during term labor or PTL with or without infection. CD63 was increased in our study but fetal blood CD63 was not associated with sPTB [66]. Whole blood is widely studied [67,68] because it serves as reservoir that collects information from various physiologic processes such as immune response and cell-cell communication. It can be assumed that differentially expressed whole blood mRNAs are in part, contributed by activated maternal peripheral leukocytes with impending labor [69,70]. We observed an increase in peripheral blood leukocyte counts in women who delivered within 48 hours and this increase appears to be mainly contributed by neutrophils. This phenomenon agrees with the study by Campbell et al where women with TPTL without infection had increased leukocyte counts [71] and supports previous studies which reported increased circulating maternal leukocytes before and during labor [41,42,[72][73][74][75]. Thus, our study has identified novel genes, most likely originating from leukocytes, which may serve as therapeutic targets for sPTB prevention.
We decided to focus our discussion on the top three differentially expressed genes, ZDHHC19, HPGD and GPR84. DHHC19 (ZDHHC19; 2.3-fold increase) belongs to a family of palmitoyltransferases/acyltransferases that catalyzes post-translational attachment of palmitic acid groups onto cysteine residues via thioester linkage (i.e. palmitoylation) on newly synthesized protein substrates, thereby enhancing the association of these palmitoylated proteins with lipid bilayers of the plasma membrane and transport vesicles [76,77]. Protein palmitoylation is essential for a diverse range of cellular processes such as protein-protein interactions, protein trafficking, stability, degradation and intracellular localization [76,78]. Palmitoylation by DHHC19 increased R-Ras association with plasma membranes and lipid rafts in COS7 cells [77] as well as enhanced phosphodiesterase 10A2 (PDE10A2) binding to plasma membranes in HEK293 cells and mouse primary striatal neurons [79]. The up-regulation of ZDHHC19 in women who delivered within 48 hours could be linked to protein palmitoylation and the regulation of signaling pathways involving R-Ras and PDE10A2 in peripheral leukocytes. HPGD demonstrated a 2.1-fold increase in women who had a sPTB within 48 hours. HPGD catabolizes prostaglandin into inactive keto-metabolites. HPGD is expressed in reproductive tissues, especially in the myometrium [80], chorion [81] and cervix [82]. HPGD is significantly decreased during term and PTL [80][81][82][83], thus promoting the effect of prostaglandin during labor [84,85]. The increase in whole blood HPGD mRNA may be a homeostatic response to the elevated concentration of prostaglandin associated with impending labor onset [86,87].
GPR84 (2.0-fold increase) is a G protein-coupled receptor highly expressed in immune cells such as microglia, monocytes and neutrophils [88][89][90][91]. It is activated by medium-chain (C 9 -C 14 ) free fatty acids (FFAs) such as capric, undecanoic and lauric acids [89] but not medium-chain triglycerides [92]. Hyper production of IL-4 occurs in GPR84 2/2 activated T-cells displaying T helper 2 phenotype [90]. Lipopolysaccharide (LPS) increases GPR84 expression in monocytes and microglia [88,89]. In addition, medium-chain FFAs can enhance LPS-stimulated secretion of the pro-inflammatory cytokine, IL-12 p40, via GPR84 in monocytes [89] while TNF and IL-1 can also increase GPR84 mRNA expression in microglia [88]. The increase in GPR84 mRNA in our study could be contributed by leukocytes which are stimulated by IL-1 and TNF in the serum [93,94], or suggests the potential novel involvement of medium-chain FFAs in signaling pathways and immunologic regulation prior to and during labor.
From our bioinformatics modeling of all 154 women, a set of nine genes coupled with clinical blood data could classify women who would or would not have a sPTB within 48 hours of hospital admission with 70.8% sensitivity and 75.5% specificity. The next step was to benchmark the performance of the stand-alone fFN test and our nine gene signature. fFN was only performed on 62 eligible women. When we modeled data from these 62 women, the nine genes coupled with clinical blood data was the best performing model with the highest area under ROC curve. It had improved sensitivity (91.7% vs 83.3%) and specificity (88.0% vs 66.0%) compared with the stand-alone fFN test. The effect of fFN was generally minimal (identical or lower area under the ROC curves) when fFN was included as a feature in the classifier models. The minimal contribution of fFN in the models may be because whole blood mRNA and fFN represent two fundamentally different elements of the parturition process -fFN indicates a disruption of the maternal-fetal interface and activated leukocytes contribute to the process of labor. However, it was also interesting to note that the addition of fFN into the model shifted the performance of the original model (nine genes and clinical blood data) from being a highly sensitive test to a more specific test (nine genes, clinical blood data and fFN) to predict sPTB. This influence may be attributed to the high specificity (NPV) of the fFN test [15].
The heterogeneity of sPTB and the inability to identify sub clinical patients may explain the small fold changes observed in this study and why no significant pathway analysis was obtained. This is a common challenge in PTL discovery or diagnostic studies. Moreover, our findings are limited to our institution with a homogenous population and therefore, the generalizability of our gene signatures is limited. Nevertheless, this gene expression study provides a broad overview of the mechanisms of sPTB and generates future hypotheses to investigate molecular interactions of various cell types (e.g. leukocytes) and reproductive tissues (e.g. placenta and myometrium) associated with sPTB. The predictive efficacy of our nine gene signature coupled with clinical blood data outperformed the fFN test and highlights the advantage of utilizing a blood-based diagnostic test for sPTB where all women could be tested. This study may lead the way for a blood-based systems biology approach to sPTB in the future.

Supporting Information
File S1 Contains the following files: Table S1. Clinical data of 154 women (GEO microarray submission). Table S2. The list of 469 significant differentially expressed genes obtained using Limma ranked by magnitude of fold change. Table S3. Microarray and quantitative real time-PCR of the selected 28 genes. (DOCX)