Developing a Gene Expression Model for Predicting Ventilator-Associated Pneumonia in Trauma Patients: A Pilot Study

Background Ventilator-associated pneumonia (VAP) carries significant mortality and morbidity. Predicting which patients will become infected could lead to measures to reduce the incidence of VAP. Methodology/Principal Findings The goal was to begin constructing a model for VAP prediction in critically-injured trauma patients, and to identify differentially expressed genes in patients who go on to develop VAP compared to similar patients who do not. Gene expression profiles of lipopolysaccharide stimulated blood cells in critically injured trauma patients that went on to develop ventilator-associated pneumonia (n = 10) was compared to those that never developed the infection (n = 10). Eight hundred and ten genes were differentially expressed between the two groups (ANOVA, P<0.05) and further analyzed by hierarchical clustering and principal component analysis. Functional analysis using Gene Ontology and KEGG classifications revealed enrichment in multiple categories including regulation of protein translation, regulation of protease activity, and response to bacterial infection. A logistic regression model was developed that accurately predicted critically-injured trauma patients that went on to develop VAP (VAP+) and those that did not (VAP−). Five genes (PIK3R3, ATP2A1, PI3, ADAM8, and HCN4) were common to all top 20 significant genes that were identified from all independent training sets in the cross validation. Hierarchical clustering using these five genes accurately categorized 95% of patients and PCA visualization demonstrated two discernable groups (VAP+ and VAP−). Conclusions/Significance A logistic regression model using cross-validation accurately predicted patients that developed ventilator-associated pneumonia and should now be tested on a larger cohort of trauma patients.


Introduction
Ventilator-associated pneumonia (VAP) is the most common serious infection in critically ill patients and results in significant morbidity, mortality, and health care costs [1]. Overall, 9-27% of mechanically ventilated patients develop VAP; however, trauma patients are at the highest risk [2]. Trauma-related risk factors for VAP have been identified, but a fundamental unanswered question is why some patients develop VAP while similar patients do not. A clinically useful tool to identify patients who are at risk for VAP would allow targeted prophylaxis. One method to determine which patients receive prophylaxis would be to establish the genetic profile that identifies patients more likely to develop VAP.
Previously, a number of single gene polymorphism studies have shown that over or under expression of immuno/inflammatory genes such as TNF-a, interleukin (IL)-1, IL-10, interferon gamma, and CD14 receptor are related to infection development [3].
However, none of these polymorphisms alone are sensitive or specific enough to be used to predict infections. It is highly unlikely that one polymorphism will be responsible for, or a suitable indicator of, infection risk.
Genome-wide screening approaches may be useful for identification of new genetic factors or gene expression profiles that are associated with infection development. cDNA microarrays can identify a broad range of differentially expressed genes in patients who develop infection compared to those who do not. These gene expression profiles may be used to predict infection risk. A focus on VAP is important because it is the most common serious infection in the intensive care unit (ICU). The purpose of this pilot study was to begin constructing a model for ventilator-associated pneumonia (VAP) prediction in critically-injured trauma patients, and to identify differentially expressed genes in patients who go on to develop VAP compared to similar patients who do not.

Patient enrollment
This study was performed at the Level 1 Presley Regional Trauma Center housed in the Regional Medical Center in Memphis, TN. Inclusion criteria were age 1 8-65 years, expected need for mechanical ventilation .6 days, and one of the following trauma-related risk factors for VAP [4]: severe traumatic brain injury (Glasgow coma score 4-8), severe thoracic trauma (multiple rib fractures or pulmonary contusions), spinal cord injury with paralysis, or a combination of injuries that placed the patient at risk for VAP as determined by the attending physician (e.g. severe intraabdominal trauma). Exclusion criteria were expected nonsurvivability of injuries, history of significant lung disease (e.g. COPD, asthma) immunocompromised state (e.g. pharmacologic, HIV infection), or pregnancy.
Patients were enrolled within 72 hours of trauma intensive care unit (TICU) admission. Patients were then followed throughout their ICU stay to monitor for the development of VAP, and were subsequently categorized at VAP+ or VAP2. VAP was definitively diagnosed using the center's standard criteria. Patients with fever/ hypothermia (.38uC or ,36uC), leukocytosis/leukopenia (.12,000/mm 3 or ,4,000/mm 3 ), purulent sputum, and new or progressive infiltrate on chest radiograph underwent diagnostic bronchoscopic bronchoalveolar lavage (BAL) using a method previously described [5,6]. A definitive diagnosis of VAP required growth of a pathogenic organism from the quantitative BAL culture $10 5 colony forming units/mL. This diagnostic method is recommended by the ATS/IDSA guidelines (1). At design of this study, the optimal determination of study size for microarray studies had not been established. Therefore, the method by Simon et al. was used. A power analysis using the method by Simon et al. suggested a need for approximately 24 patients for this study [7]. Based on previous data, it was expected that the incidence of VAP would approach 50% in this population [8]. Thus, patients were expected to accrue into the VAP+ and VAP2 groups in approximately equal fashion.

Sample preparation
Upon enrollment, 40 mL of whole blood was collected. Blood samples were immediately stimulated with 1000 ng/mL of lipopolysaccharide (LPS) solution (E. coli 011B4 LPS in RPMI 1640 culture medium 210% fetal bovine serum and 100 U/mL penicillin-streptomycin) to approximate the effect of colonization with Gram-negative bacilli [9]. This model was used because the development of Gram-negative VAP is preceded by bacterial colonization. It was thought that the interpatient genetic variability in the immuno/inflammatory response to bacterial colonization would be important in determining which patients went on to develop VAP. Samples were incubated for 3 hours in a water bath at 37uC and then centrifuged for 10 minutes at 4uC. The plasma layer was decanted. The white blood cell layer was removed and incubated in 30 mL of RBC lysis buffer (Tris HCL/TRIZMA HCl/NH4Cl) at 37uC for 15 minutes then centrifuged for 15 minutes at 4uC. The supernatant was decanted and the pellet resuspended and washed with D-PBS two times.
Total RNA was isolated from peripheral blood using phenolchloroform extraction per the RNAgents Total RNA isolation kit protocol (Promega, Madison, WI). RNA concentration was initially estimated by comparing the ultraviolet absorbance (A260/A280) ratio of the sample. The integrity and concentration of the RNA sample was subsequently assessed using the RNA 6000 Nano LabChip Kit and Agilent 2100 bioanalyzer (Agilent Technologies). Approximately 5 mcg of total RNA was used for cDNA synthesis. The MessageAmp aRNA kit (Ambion) was used for cDNA and cRNA synthesis. The labeled cRNA samples were then hybridized to the GeneChip Human Genome Focus Arrays (Affymetrix).

Ethics
This study was approved by the University of Tennessee Health Science Center Institutional Review Board. The study was conducted in accordance with the Declaration of Helsinki. Informed consent was required from the patient or legally authorized representative. Once written consent was obtained the patient was enrolled into the study.

Expression Analysis
All analysis was performed using Partek software (St Louis, MO). Differentially expressed genes were identified by ANOVA. Estimation of false discovery rate (FDR) due to multiple hypothesis testing was obtained by calculating the q-value as described by Storey and Tibshirani [10]. These genes were used to cluster patients by Hierarchical clustering using Pearson's dissimilarity scores and average linkage parameters. In addition, principal components analysis (PCA), which is a mathematical technique used to reduce the dimensionality of the data, was used to project patients in 3 dimensions based on their gene expression profiles. Lastly, patients were classified based on gene expression profiles using logistic regression model. VAP status prediction was obtained by fitting data to a logit function/logistic curve and calculating the posterior class probability. Leave-one-out partitioning cross-validation was used to evaluate the model accuracy. One round of cross-validation divided data into a training set (19 patients) and a test set (1 patient). The top 20 significant genes were selected from the training dataset based on all available genes on the chip using ANOVA. There were 20 rounds of crossvalidation performed. The 20 correct rates were normalized to get the final accuracy.
Functional analysis of genes was performed using GO and KEGG annotations available through WebGestalt tool [11]. The p-values for enrichment of each category C, given our gene list A (containing n genes) and the reference gene list B (containing m genes) were calculated as follows. If there were k genes from A and j genes from B in a given category C, the expected number of genes in Category C can be calculated by k e~( n=m) Ã j. Therefore, the enrichment ratio is calculated byr~k=k e . If k is greater than the expected number k e (i.e. r is greater than 1), then category C is considered to be enriched in our gene list. The significance of the enrichment was calculated using the hypergeometric distribution, p~P n i~k m{j n{i j i m n .

Results
Thirty-two patients were enrolled within 72 hours of admission to the TICU, 12 patients could not be evaluated. Two patients died within 48 hours of admission, two patients were extubated within 48 hours of admission and samples in eight patients had insufficient quantities of RNA for the microarrays. Thus 20 patients were included, ten that went on to develop VAP (VAP+) and ten that did not (VAP2). Patients in the two groups were similar except with regard to duration of mechanical ventilation and length of ICU stay, which were significantly longer in the patients that developed VAP (Table 1).
A microarray approach was utilized to examine gene expression profiles in LPS treated blood cells from the VAP+ and VAP2 groups. Using a one-way ANOVA test, 810 genes were identified whose transcript levels were significantly different between the two groups (Table S1). The q-value FDR estimates ranged from 0.035 to 0.44 (Table S1). This suggests that up to 44% of 810 genes (p,0.05) could be false positives. Although this number is quite high, it does not necessarily rule out the functional significance of the results. Indeed, functional analysis using Gene Ontology and KEGG classifications revealed several processes that were enriched among the differentially expressed genes (DEGs) ( Table  S2). For instance, a large number of DEGs are involved in regulation of protein translation (43 gene, p,6.98e 208 ), protein folding (21 genes, p,5e 204 ) and ribosomal machinery (28 genes, p,7.14e 208 ). Also, a set of DEGs are involved in regulation of protease activity: Serine peptidase (20 genes, p,2.2e 203 ), inhibitors of endopeptidases (18 genes, 2.3e 203 ), and inhibitor of metalloprotease activity (4 genes, p,4.00e 203 ). More importantly, a set of DEGs appeared to be involved in bacterial infection: antimicrobial humoral response (2 genes, p,8.80e 203 ), bacterial binding (5 genes, p,1.00e 203 ) and vibrio cholera infection (10 genes, 3.50e 203 ). These genes may contribute in some way to development of VAP.
Multiple approaches were used to determine if gene expression profiles could be used to cluster patients that go on to develop VAP. First, hierarchical clustering was performed using the gene expression levels of 810 DEGs identified by ANOVA ( Figure 1A). Seventeen out of 20 patients' samples clustered according to their VAP classification, except for #1, #4 and #5 (VAP+). These results were consistent with the distribution of samples observed by principal component analysis (PCA), a mathematical method to reduce dimensionality of the data. Projection of the samples in the 3 largest components (accounting for approximately 59.5% of the variation in the 810 DEGs across the 20 samples) revealed that all patients except #1, #4 and #5 were segregating clearly in the two groups (Figure 2A).
To more rigorously test the predictive power of the gene expression profiles, a logistic regression model was implemented using a leave-one-out cross validation. Here, each sample was removed from the pool of 20 and DEGs were identified using a one-way ANOVA. The positive predictive value of the top 20 significant genes to classify patients as VAP+ or VAP2 was then determined and reported as a posterior probability (Table 2). This model accurately predicted the VAP status of all patients except #15. The posterior probability of predicting VAP+ for VAP2 patients ranged from 2.85e 205 to 0, with no false positives. Conversely, the posterior probability of predicting VAP2 for VAP+ patients was 0 for all cases, except for one patient (#15), which could not be determined.
Five genes (PIK3R3, ATP2A1, PI3, ADAM8, and HCN4) were common to all significant gene sets used in the cross validation tests. After LPS stimulation of whole blood, the expression of these five genes were down regulated in patients' that went on to develop VAP ( Figure 1B). To further validate the association of these genes with VAP status, PCA and hierarchical clustering analysis were repeated ( Figure 1B and 2B). Again, expression profiles of these 5 genes accurately clustered all patients except sample 14 ( Figure 1B). Two groups were discernible by PCA visualization except patient samples 1 and 14, which appeared to be at the interface ( Figure 2B).

Discussion
The key result of this pilot study was development of a logistic regression model that accurately predicted critically-injured trauma patients that went on to develop VAP (VAP+) and those that did not (VAP2). Five genes (PIK3R3, ATP2A1, PI3, ADAM8, and HCN4) were common to all top 20 significant genes which were identified from all independent training sets in the cross validation. This model is supported by the fact that hierarchical clustering using these five genes accurately categorized 95% of patients and PCA visualization demonstrated two discernable groups, VAP+ and VAP2. This preliminary research is the first step in developing a clinically useful screening tool to identify patients at highest genetic risk of VAP secondary to trauma and subsequent bacterial exposure.
Of the five genes identified, three appear to have a clear role in host response to infection. Since all five genes were down regulated in response to LPS stimulation, it is logical that they may play a role in the relative susceptibility to infection. PI3 has been best studied in the host response to infections. It encodes for an elastase-specific inhibitor with antimicrobial peptide activity that is synthesized in response to cytokine and bacterial stimuli [12]. It has been shown to promote early clearance of Pseudomonas aeruginosa via macrophage activation and neutrophil recruitment [13]. ADAM8 encodes a distintegrin and metalloproteinase domain-containing protein 8. Neutrophil activation by proinflammatory cytokines induces rapid translocation of ADAM8 to the cell membrane with subsequent shedding [14]. This process is associated with shedding of L-selectin, which is intimately tied to transendothelial migration of neutrophils [15]. PIK3R3 encodes for phosphoinositide 3-kinase regulatory subunit gamma (PI3Kc), which is part of a large family of enzymes involved in intracellular signaling. PI3Kc is predominately expressed immune cells and plays a role in chemoattractant-induced cell migration [16,17,18,19]. Thus, down regulation of these genes following LPS stimulation would seem to place the host at a greater risk of developing an infection.
The role of the remaining two genes in response to infection is less clear. HCN4 codes for the hyperpolarization activated cyclic nucleotide-gated potassium channel. Its role in cardiac rhythm maintenance is well defined [20], but there are currently no data linking it to infectious complications of critically ill patients. ATP2A1 encodes the sarcoplasmic reticulum calcium transporting ATPase (SERCA1) in the of skeletal muscle [21]. It is interesting to note that LPS can cause myocardial dysfunction in critically-ill patients and that SERCA is intimately related to cardiac relaxation [22]. These two processes could be related and may be a reason why we found ATP2A1 was down regulated in VAP+ patients after LPS stimulation. While the role of these two genes in infections are not as clear, they were still important to the model as they were among the top 20 DEGs identified for every leave-oneout cross validation. This consistency is noteworthy and could represent a relation to VAP in critically-injured trauma patients that has yet to be identified. A number of previous studies suggest that genetic variability in infection risk exists in hospitalized patients. The most widely studied polymorphism is associated with overproduction of TNF-a; which has been related to a 2.1-13 fold increase in the incidence of severe sepsis from all causes including pneumonia [23]. Similar data exist on a smaller scale for IL-1, IL-10, interferon gamma, and CD14 receptor genes [24,25,26,27]. However, none of these associations were sensitive or specific enough to be used as clinical tests for an increased risk of infection. As such, the current study design is different because it used gene expression profiling to search for candidate genes that may be useful clinical markers of infection risk.
Changes in gene expression patterns have been used in the early diagnosis of VAP [28,29]. McDunn et al. identified 85 genes in a riboleukogram that identified patients with VAP at 24 to 96 hours before the clinical diagnosis could be established. These data demonstrate that differences in gene expression occur well before patients manifest clinical signs of infection. Cobb et al. validated the early VAP diagnosis using the riboleukogram, and the studies together support the concept that gene expression is a useful clinical diagnostic tool. It is reasonable that the genes identified in both studies relate to immune response, neutrophil activation, and intracellular signaling pathways. The current study is similar in that the genes identified in the logistic regression model also relate to the immune response and involve cell signaling (PIK3R3), antimicrobial peptides (PI3), and cell adhesion (ADAM8). The current study is different in that the logistic regression model developed does not diagnose pneumonia, but instead identifies patients at future risk of VAP. The potential benefit of these results would be the ability to identify at-risk patients upon ICU admission. The ultimate goal would be to use that information to employ prophylactic measures such as systemic and/or topical (i.e. gut, aerosolized) antibiotics to prevent VAP. Prophylaxis is highly effective and would benefit patients by reducing VAPassociated morbidity and mortality. Only targeting those at risk would reduce the likelihood for development of bacterial resistance, as this has been the main reason for avoiding prophylactic antibiotic therapies.
In addition to providing a predictive tool, gene expression profiling can also provide new insights into the mechanism of the disease. We found 810 genes, which were differentially regulated in VAP patients, compared to controls. Unfortunately, the q-value FDR estimates were as high as 44% for some genes (Table S1). This makes it difficult to determine if a given gene might have a specific role in VAP development. In spite of this shortcoming, we found that using the 810 gene expression profiles, the samples could be delineated fairly accurately by hierarchical clustering or by PCA visualization, which gives us some confidence that the DEGs might have some functional relevance. Indeed, we noted several sets of functionally related genes that were affected similarly in VAP patients. For example all 25 genes in 'translational elongation' category and both genes in 'antimicrobial humoral response' category were down regulated in VAP+ patients. On the other hand, all five genes in 'bacterial binding' category were up regulated in VAP+ patients. In addition, a large number of genes involved in regulation of endopeptidase activities were differentially regulated in VAP+ patients. Taken together, these results suggest that these processes may play an important role in development of VAP.

Limitations
The current study has limitations. First, the pilot nature of the study is hypothesis generating and requires further validation of the findings. Second, the predictive model was developed with 20 patients and was not tested on an independent sample of patients. While these issues limit the immediate clinical value of this study, the stage is set to test the predictive value of the five genes identified across a larger patient cohort. Third, the timeframe for enrollment was large, within the first 72 hours following trauma intensive care unit admission. It is likely that time from injury results in changes in inflammation and gene expression. However, the optimal time to relate gene expression to future infectious processes has not been established. Future studies may benefit from a shorter time window for enrollment. Fourth, as mentioned above, due to multiple hypothesis testing the FDR reported for the 810 genes was high. However, it is important to note that this result does not impact the identification of predictive genes by an independent regression model, which included a leave-one-out cross validation procedure. Finally, performing LPS stimulation prior to measuring gene expression is relatively uncommon. This method was used in the current study to provide a gene expression response that would approximate the in vivo response in patients exposed to Gram-negative bacteria. In designing the study, the investigators believed that this would better accentuate the true genetic differences between those who go on to develop VAP compared to those who do not. A recent study by Bryant et al. found that LPS stimulation produced variable expression in immune-related genes. The authors concluded that this variability may be related to interpatient differences in response to an infectious insult [30]. Additionally, Textoris J, et al. analyzed unstimulated whole blood and failed to see gene expression differences in trauma patients that developed VAP and those that did not [31].
This small pilot study has three important strengths. First, the definition of VAP used in the study was the most rigorous diagnosis available, and is considered to be an optimal diagnosis by the current ATS/IDSA guidelines [1]. Thus, there is confidence that the patients in the study group actually had VAP and the control patients did not. Second, the use of the Affymetrix platform provides external validity. This is the most widely used microarray platform and the chip used in this study is widely available. Third, this was a relatively homogenous population of critically injured trauma patients with similar demographics between the VAP+ and VAP2 groups.

Conclusion
In this pilot study a logistic regression model using crossvalidation accurately predicted patients that went on to develop ventilator-associated pneumonia. This model should now be tested on a larger cohort of trauma patients to validate its prognostic value.

Supporting Information
Table S1 Differentially Expressed Genes. Using a one-way ANOVA test, 810 genes were differentially expressed in the VAP2 and VAP+ groups. The q-value FDR estimates ranged from 0.035 to 0.44. (DOC)