Prediction Model for Critically Ill Patients with Acute Respiratory Distress Syndrome

Background and objectives: Acute respiratory distress syndrome (ARDS) is a major cause of respiratory failure in the intensive care unit (ICU). Early recognition of patients at high risk of death is vital to their management. The aim of the study was to establish a prediction model using variables that are readily available in routine clinical practice.

Methods: The study was a secondary analysis of data obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center. Patients were enrolled between August 2007 and July 2008 from 33 hospitals. Demographics and laboratory findings were extracted from the dataset. Univariate analyses were performed to screen for variables with p<0.3. These variables were then subjected to automatic stepwise forward selection with a significance level of 0.1. Interaction terms and fractional polynomials were examined for variables in the main effect model. Multiple imputation and bootstrap procedures were used to obtain coefficient estimates with better external validity. Overall model fit and logistic regression diagnostics were explored.

Main results: A total of 282 ARDS patients were included for model development. The final model included eight variables without interaction terms or non-linear functions. Because the variable coefficients changed substantially after exclusion of the most poorly fitted and influential subjects, we estimated the coefficients after excluding these outliers. The equation for the fitted model was:

$$g(\mathbf{X}) = 0.06 \times \text{age (years)} + 2.23\,(\text{if on vasopressor}) + 1.37 \times \text{potassium (mmol/l)} - 0.007 \times \text{platelet count } (\times 10^{9}) + 0.03 \times \text{heart rate (/min)} - 0.29 \times \text{Hb (g/dl)} - 0.67 \times T\,(^{\circ}\text{C}) + 0.01 \times \text{PaO}_2 + 13$$

and the probability of death is

$$\pi(\mathbf{X}) = \frac{e^{g(\mathbf{X})}}{1 + e^{g(\mathbf{X})}}.$$

Conclusion: The study established a prediction model for ARDS patients requiring mechanical ventilation. The model was examined with rigorous methodology and can be used for risk stratification in ARDS patients.



Introduction
Acute respiratory distress syndrome (ARDS) is a severe form of acute lung injury, most commonly seen in the intensive care unit (ICU), and is associated with significant morbidity and mortality. The incidence of ARDS is estimated at approximately 40-80 per 100,000 patient-years [1][2][3][4], although the figure varies with the definition of ARDS used. The mortality rate in patients with established ARDS is around 50-60% [4][5][6]. More recently, owing to advances in management such as low tidal volume ventilation and extracorporeal membrane oxygenation, mortality has been reported to fall to around 30% [7,8]. However, there is limited evidence that specific interventions decrease mortality in ARDS, and ARDS mortality has not shown a significant reduction over time [9][10][11][12]. Therefore, ARDS remains a great challenge to clinicians.
The initial step in reducing mortality is to identify risk factors for poor clinical outcomes, an area that has been studied extensively. For instance, sepsis-induced ARDS has been found to be associated with an increased risk of death compared with other causes [13]. Patients with lower bronchoalveolar lavage (BAL) levels of procollagen peptide showed lower mortality than those with higher levels [14]. However, most of these studies investigated risk factors in isolation. Because multiple factors act together to determine the final outcome of ARDS patients, a prediction model for risk stratification is more clinically useful. Gajic and colleagues [15] developed a well-calibrated mortality prediction model for ARDS patients; however, it required information on organ function three days after intubation. In another study, a risk tertile model was developed for predicting mortality in ARDS. However, that study categorized continuous variables into tertiles, which is thought to cause information loss [16]. In the present study, we aimed to develop a prediction model for ARDS patients requiring mechanical ventilation. The guiding principle in developing the model was a balance between parsimony and model fit. Furthermore, the variables included in the model should be readily available in routine clinical practice.

Methods
The secondary analysis was approved by the ethics committee of Jinhua Municipal Central Hospital, and informed consent was waived. Patient records and information were anonymized and de-identified prior to analysis. The study was a secondary analysis of data obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center. The original randomized controlled trial, entitled "Randomized, Placebo-controlled Clinical Trial of an Aerosolized β2-Agonist for Treatment of Acute Lung Injury" (NCT00434993), has been published elsewhere [17]. Patients were enrolled between August 2007 and July 2008 from 33 hospitals of the National Heart, Lung, and Blood Institute (NHLBI) ARDS Clinical Trials Network. Inclusion criteria were: 1) intubation and mechanical ventilation; 2) bilateral infiltrates consistent with edema on chest X-ray; 3) PaO2/FiO2 < 300; and 4) no clinical evidence of left atrial hypertension. ARDS was defined according to the American-European Consensus Conference (AECC) criteria [18]. By contrast, the later Berlin definition takes PEEP into account and categorizes ARDS into mild, moderate and severe forms [19]. Exclusion criteria were: 1) coexisting chronic lung disease; 2) inability to obtain consent; 3) acute myocardial infarction; 4) chronic liver disease; and 5) neuromuscular disease.
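To illustrate how the oxygenation criterion drives the Berlin severity categories mentioned above, here is a minimal sketch. It assumes PaO2 in mmHg and FiO2 as a fraction; `pf_ratio` and `berlin_category` are hypothetical helpers, not part of the trial dataset. They apply only the Berlin oxygenation thresholds (mild 200 < PaO2/FiO2 ≤ 300, moderate 100 < PaO2/FiO2 ≤ 200, severe ≤ 100, with PEEP ≥ 5 cmH2O) and deliberately ignore the timing, imaging and edema-origin criteria.

```python
def pf_ratio(pao2_mmhg: float, fio2: float) -> float:
    """PaO2/FiO2 ratio; FiO2 must be a fraction between 0.21 and 1.0."""
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2


def berlin_category(pao2_mmhg: float, fio2: float, peep_cmh2o: float) -> str:
    """Oxygenation-only Berlin severity (hypothetical helper).

    Onset, chest imaging and cardiac origin of edema are NOT checked here.
    """
    if peep_cmh2o < 5:
        return "not classifiable (PEEP < 5 cmH2O)"
    ratio = pf_ratio(pao2_mmhg, fio2)
    if ratio > 300:
        return "no ARDS by oxygenation criterion"
    if ratio > 200:
        return "mild"
    if ratio > 100:
        return "moderate"
    return "severe"


# Example: PaO2 80 mmHg on FiO2 0.5 with PEEP 10 cmH2O gives a ratio of 160.
print(pf_ratio(80, 0.5), berlin_category(80, 0.5, 10))  # 160.0 moderate
```

Note that the AECC criteria used in the trial do not require a minimum PEEP, so a patient can satisfy the inclusion criteria above while being "not classifiable" by this Berlin sketch.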

Data extraction
Data were extracted from the dataset obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center after approval. Variable names were annotated in a file named "data dictionary". The following variables were extracted: age, gender, body mass index (BMI), past history of cigarette smoking and alcohol consumption, type of ICU (medical, surgical or mixed ICU), admission type (unscheduled surgery, scheduled surgery or medical admission), cause of ARDS (sepsis, transfusion, aspiration, pneumonia or other lung conditions), admission source (operating room, emergency department, floor ward or other), comorbidities (chronic dialysis, leukemia, immunodeficiency, cirrhosis, diabetes, hypertension, myocardial infarction, heart failure, vascular disease, dementia, chronic pulmonary disease, arthritis and peptic ulcer), vasopressor use, hemoglobin, white blood cell count, platelet count, creatinine, bilirubin, sodium, potassium, glucose, bicarbonate, phosphate, magnesium, total protein, albumin, minimum glucose, FiO2, PaO2, PaCO2, pH, lowest and highest temperature, lowest and highest systolic blood pressure, lowest and highest mean blood pressure, lowest and highest respiratory rate, 24-hour urine output, red blood cell (RBC) transfusion and fresh frozen plasma (FFP) transfusion. All measurements, including laboratory and physiological findings, were obtained within the first 24 hours.

Statistical analysis
Univariate logistic regression was performed to screen for factors associated with mortality in ARDS patients requiring mechanical ventilation. The dependent variable was binary, coded "1" for death and "0" for survival. Independent variables were either continuous or indicator variables. For continuous variables such as age, laboratory measurements and urine output, odds ratios (ORs) were reported per one-unit increase. For indicator variables, ORs were reported for each category relative to the reference category. For instance, patients exposed to cigarette smoking were compared with those without, and the OR was reported for that variable. Alcohol intake was recorded as frequency, which we dichotomized into patients with and without a history of alcohol intake. Smoking status was recorded as non-smoker, former smoker or current smoker; we combined the latter two categories as smokers. The admission source variable had eight categories. Because too many categories might compromise statistical power, we grouped patients from the operating room and recovery room together.
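The univariate screen can be sketched in a few lines. This is a minimal pure-Python illustration, assuming Wald p-values from a Newton-Raphson fit of logit P(death) = b0 + b1·x; the helper names (`univariate_logit`, `univariate_screen`) and the toy data are hypothetical. The OR per one-unit increase is exp(b1), and variables are kept when p < 0.3, the screening threshold used in this study.

```python
import math


def _score_and_info(x, y, b0, b1):
    """Score vector and 2x2 Fisher information of the univariate logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def univariate_logit(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(death) = b0 + b1 * x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # solve information @ step = score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    _, _, h00, h01, h11 = _score_and_info(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # from the inverse information
    return b0, b1, se_b1


def univariate_screen(candidates, y, threshold=0.3):
    """Keep variables whose two-sided Wald p-value is below `threshold`.

    Returns {name: (OR per one-unit increase, p-value)}.
    """
    kept = {}
    for name, x in candidates.items():
        _, b1, se_b1 = univariate_logit(x, y)
        p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))
        if p_value < threshold:
            kept[name] = (math.exp(b1), p_value)
    return kept


# Toy data (hypothetical): 1 = death, 0 = survival.
y = [0, 0, 1, 0, 1, 0, 1, 1]
x_signal = [1, 2, 3, 4, 5, 6, 7, 8]   # higher values in non-survivors
noise = [2, 7, 4, 5, 6, 3, 1, 6]      # same mean in both outcome groups
print(univariate_screen({"x_signal": x_signal, "noise": noise}, y))
```

An indicator variable (for example smoker = 1, non-smoker = 0) goes through the same function; its exp(b1) is then the OR relative to the reference category.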
Variables with p < 0.3 in univariate analysis were included as candidates for automatic stepwise selection of covariates. Furthermore, variables with prevalence < 10% or with > 15% missing observations were excluded from further analysis. In the study, phosphate and bilirubin both had more than 30% missing observations and were excluded from multivariable analysis despite their significance in univariate analysis. Stepwise forward selection with an entry criterion of p < 0.1 was performed to screen for variables independently associated with mortality, yielding a main effect model. To make full use of the available data and improve statistical power, we used multiple imputation, rather than relying only on conventional complete-case analysis, as a sensitivity analysis to examine the robustness of our results [20]. Overfitting, which produces optimistic coefficient estimates, is another concern when building a prediction model [21]. We therefore used bootstrapping to adjust the coefficients estimated from conventional multivariable analysis [22]. Bootstrapping is resampling with replacement; here, the sampling was repeated 500 times. Bootstrap adjustment shrinks the estimated coefficients but is expected to provide better predictions in future samples.
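The text does not state exactly how the bootstrap adjusted the coefficients. One common approach, sketched here under that assumption, estimates a uniform shrinkage factor as the mean calibration slope: each bootstrap model (500 resamples in the study) is applied to the original sample, the original outcomes are regressed on its linear predictor, and the average slope, typically below 1, multiplies the original coefficients. The helper names `fit_logit` and `bootstrap_shrinkage` and the simulated data are hypothetical.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30.0, 30.0)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information X'WX
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    return beta


def bootstrap_shrinkage(X, y, n_boot=500, seed=0):
    """Mean calibration slope of bootstrap models applied to the original sample.

    A maximum-likelihood model has slope 1 in the sample it was fitted on, so the
    mean slope in the original data serves as an optimism-corrected shrinkage factor.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)               # resample subjects with replacement
        lp = X @ fit_logit(X[idx], y[idx])             # bootstrap model applied to original data
        slopes[b] = fit_logit(np.column_stack([np.ones(n), lp]), y)[1]
    return float(slopes.mean())


# Simulated data (hypothetical): two real predictors and one pure-noise predictor.
rng = np.random.default_rng(42)
n = 200
Z = rng.normal(size=(n, 3))
true_lp = -1.0 + 0.8 * Z[:, 0] - 0.5 * Z[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_lp))).astype(float)
X = np.column_stack([np.ones(n), Z])

beta = fit_logit(X, y)
s = bootstrap_shrinkage(X, y, n_boot=200)   # fewer resamples than the study, for speed
shrunk = beta.copy()
shrunk[1:] *= s          # shrink slopes only; the intercept would normally be re-estimated
print(np.round(beta, 3), round(s, 3), np.round(shrunk, 3))
```

If the shrinkage factor comes out close to 1, the bootstrap leaves the coefficients essentially unchanged, which is the situation the Discussion describes for this cohort.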
Potential interactions among the covariates in the main effect model were tested by adding them one at a time. To simplify this process, we created a local macro named "covariate" that stored all covariates in the main effect model, so that the process could be automated with Stata's foreach syntax. Interaction terms with p < 0.1 would be included in the model. Linearity of covariates on the logit scale is a fundamental assumption of model fitting. We therefore employed multivariable fractional polynomials (MFP) to test whether other power terms were superior to the linear term [23,24]. First, the best-fitting one-term and two-term models were identified by choosing power transformations from the set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where 0 denotes the log transformation. Next, a closed test procedure was performed in which the best-fitting two-term model was compared with the linear model. If the two-term model was significantly better than the linear one (p < 0.05), it was then compared with the best-fitting one-term model; otherwise, the linear model was adopted. The procedure continued until there was no statistically significant difference, and the best-fitting model was chosen.
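The fractional polynomial (FP) building blocks and the closed test described above can be sketched as follows. This is a minimal illustration, assuming a strictly positive covariate and deviances (-2 log-likelihood) already obtained from fitted models; model fitting itself is omitted, and all function names are hypothetical. It uses the FP convention that power 0 means log(x) and that a repeated power p yields x^p and x^p·log(x); FP2 versus linear is tested on 3 df and FP2 versus FP1 on 2 df.

```python
import itertools
import math

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # FP power set; 0 denotes log(x)


def fp_power(x: float, p: float) -> float:
    """Single FP transform; x must be strictly positive."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x: float, p1: float, p2: float) -> list:
    """Two FP terms; a repeated power p yields x^p and x^p * log(x)."""
    if p1 == p2:
        return [fp_power(x, p1), fp_power(x, p1) * math.log(x)]
    return [fp_power(x, p1), fp_power(x, p2)]


# Candidate models for one covariate: 8 one-term (FP1) and 36 two-term (FP2).
FP1_CANDIDATES = [(p,) for p in POWERS]
FP2_CANDIDATES = list(itertools.combinations_with_replacement(POWERS, 2))


def chi2_sf_df2(x: float) -> float:
    """Upper-tail chi-square probability with 2 df."""
    return math.exp(-x / 2.0)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail chi-square probability with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def closed_test(dev_linear: float, dev_fp1: float, dev_fp2: float, alpha: float = 0.05) -> str:
    """Closed test on deviances of the best-fitting linear, FP1 and FP2 models.

    FP2 vs linear: 3 df (two powers + one extra coefficient).
    FP2 vs FP1: 2 df (one extra power + one extra coefficient).
    """
    if chi2_sf_df3(dev_linear - dev_fp2) >= alpha:
        return "linear"
    if chi2_sf_df2(dev_fp1 - dev_fp2) >= alpha:
        return "FP1"
    return "FP2"


print(len(FP1_CANDIDATES), len(FP2_CANDIDATES))                     # 8 36
print(closed_test(dev_linear=250.0, dev_fp1=249.0, dev_fp2=248.0))  # linear
```

In this study the procedure stopped at the first step for every continuous covariate, so the linear terms were retained.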
Model fit was assessed from two aspects: a summary measure and regression diagnostics. The Hosmer-Lemeshow test was used, in which covariate patterns were grouped by estimated probability [25]. The Hosmer-Lemeshow goodness-of-fit statistic was obtained by calculating the Pearson chi-square statistic from the g×2 table of observed and estimated frequencies, where g is the number of groups. A p-value < 0.05 indicates a significant discrepancy between the predicted and observed outcomes. Furthermore, the discrimination of the fitted model was assessed graphically. Logistic regression diagnostics were calculated to assess whether the model fitted well across the entire set of covariate patterns [26]. Statistics including leverage, change in Pearson chi-square (Δχ²), change in deviance (ΔD) and Cook's distance (Δb) were plotted against the estimated probability of death (p) [27]. The aim was to identify cases lying far away from the others. The model was then refitted after excluding these outliers. However, diagnostic statistics only identify poorly fitted or influential subjects; the decision on exclusion should also incorporate subject-matter considerations.
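The Hosmer-Lemeshow grouping and Pearson statistic described above can be sketched as follows. This assumes g = 10 groups of near-equal size formed after sorting subjects by predicted probability, so that df = g − 2 = 8, matching the χ² (df = 8) reported in the Results. Stata's own grouping treats tied probabilities differently, so values may differ slightly. `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical helper names, and the chi-square tail formula used is exact only for even df.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # accumulates half**i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p_hat, g=10):
    """Pearson chi-square over a g x 2 table of observed vs expected deaths and
    survivors, with subjects grouped by sorted predicted risk (assumes len(y) >= g)."""
    order = sorted(range(len(y)), key=lambda i: p_hat[i])
    n = len(order)
    stat = 0.0
    for k in range(g):
        group = order[k * n // g:(k + 1) * n // g]   # near-equal-size risk groups
        obs_death = sum(y[i] for i in group)
        exp_death = sum(p_hat[i] for i in group)
        obs_surv = len(group) - obs_death
        exp_surv = len(group) - exp_death
        stat += (obs_death - exp_death) ** 2 / exp_death
        stat += (obs_surv - exp_surv) ** 2 / exp_surv
    df = g - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Perfect agreement in every group gives a statistic of 0 and p = 1.
print(hosmer_lemeshow([0, 1] * 10, [0.5] * 20))   # (0.0, 8, 1.0)
# Tail probability for the statistic reported in the Results.
print(round(chi2_sf_even_df(6.54, 8), 3))         # 0.587
```

The second line reproduces the arithmetic behind the reported result: χ² = 6.54 on 8 df corresponds to p ≈ 0.59.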
All statistical analyses were performed using Stata 13.1 (StataCorp, College Station, Texas 77845, USA).

Results
A total of 2688 patients were initially screened, of whom 2406 were excluded, leaving 282 ARDS patients who required invasive mechanical ventilation. The most common reasons for exclusion were chronic lung disease (19.2%), inability to obtain consent (15.2%) and exceeded time window (14.7%). There were 61 non-survivors and 221 survivors, giving an overall mortality rate of 21.63%. Univariate logistic regression analysis ([…] (9), pH (9), mean blood pressure (2), urine output (2). Because bilirubin and phosphate had too many missing values (>15%), they were excluded from the analysis. Furthermore, the comorbidity variables leukemia, immunodeficiency and vascular disease were excluded because their prevalence was less than 10%. There was no significant difference in PEEP (9.1±3.7 vs. 9.3±3.3 cmH2O; p = 0.71) or FiO2 (0.58±0.19 vs. 0.57±0.17; p = 0.69) between survivors and non-survivors (Fig. 1). As a result, a total of 21 covariates were entered into the full model. After stepwise forward selection at a significance level of 0.1, eight covariates remained in the model (Table 2). All possible interaction terms were evaluated, and none was statistically significant. The linearity assumption for continuous variables in the main effect model was assessed using multivariable fractional polynomials, which showed that no non-linear function fitted better than the linear term. We therefore adopted the original main effect model as the final model. Overall model fit was assessed using the Hosmer-Lemeshow goodness-of-fit test, which gave χ² (df = 8) = 6.54 (p = 0.59). Graphical […] or Δχ² (poorest fit) and two with outlying values of Δb (largest influence). These covariate patterns (#72, 126, 137, 147, 171, 207) are shown in Table 3. Because there were many continuous covariates, each covariate pattern corresponded to a single subject.
For instance, subject #137 was characterized by old age, vasopressor use, hyperkalemia and thrombocytopenia, a covariate pattern with a high probability of death. However, this subject survived, contrary to the fitted model, and was therefore considered an outlier. We examined how exclusion of these outliers influenced the coefficient estimates (Table 4). All coefficients changed substantially after exclusion of the outliers. We therefore propose the following model for predicting the probability of death in future ARDS patients requiring mechanical ventilation:

$$\pi(\mathbf{X}) = \frac{e^{g(\mathbf{X})}}{1 + e^{g(\mathbf{X})}},$$

where

$$g(\mathbf{X}) = 0.06 \times \text{age (years)} + 2.23\,(\text{if on vasopressor}) + 1.37 \times \text{potassium (mmol/l)} - 0.007 \times \text{platelet count } (\times 10^{9}) + 0.03 \times \text{heart rate (/min)} - 0.29 \times \text{Hb (g/dl)} - 0.67 \times T\,(^{\circ}\text{C}) + 0.01 \times \text{PaO}_2 + 13.$$

The prediction model was compared with the APACHE III score for its discrimination in predicting mortality (Fig. 4). The prediction model had better discrimination than APACHE III (AUC 0.85, 95% CI 0.79-0.90 vs. AUC 0.77, 95% CI 0.70-0.84; p = 0.037).

Fig. 3. […] (Δχ² > 10). The size of the symbol is proportional to Δb, allowing the relative contributions of residual and leverage to Δb to be ascertained more clearly. The largest circle in the right corner corresponds to moderate leverage and a large Δχ², indicating that high leverage might not be a contributing factor. The same five points are shown in the lower right panel, but note that the range of Δχ² is much greater than that of the change in deviance (ΔD). As a result, we identified five covariate patterns with large values of ΔD or Δχ² (poorest fit) and two with outlying values of Δb (largest influence). doi:10.1371/journal.pone.0120641.g003
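The published equation can be turned into a small calculator for illustration. This sketch copies the coefficients of g(X) as given in the text; the PaO2 unit is not stated there (mmHg is assumed), the platelet count is taken in units of 10^9 as written, the dictionary keys are hypothetical names, and the example patient is invented. Because the coefficients are rounded and the model lacks external validation, the output only reproduces the published formula.

```python
import math

# Coefficients copied from the published equation for g(X).
COEFFICIENTS = {
    "age_years": 0.06,
    "on_vasopressor": 2.23,      # 1 if on vasopressor, 0 otherwise
    "potassium_mmol_l": 1.37,
    "platelets_1e9": -0.007,     # platelet count in units of 10^9
    "heart_rate_per_min": 0.03,
    "hb_g_dl": -0.29,
    "temperature_c": -0.67,
    "pao2": 0.01,                # unit not stated in the source; mmHg assumed
}
INTERCEPT = 13.0


def linear_predictor(patient: dict) -> float:
    """g(X) = intercept + sum of coefficient * covariate value."""
    return INTERCEPT + sum(coef * patient[name] for name, coef in COEFFICIENTS.items())


def probability_of_death(patient: dict) -> float:
    """pi(X) = e^g / (1 + e^g), computed in the equivalent form 1 / (1 + e^-g)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(patient)))


# Hypothetical patient, for illustration only.
example = {
    "age_years": 60, "on_vasopressor": 1, "potassium_mmol_l": 4.0,
    "platelets_1e9": 200, "heart_rate_per_min": 100, "hb_g_dl": 10.0,
    "temperature_c": 37.0, "pao2": 100,
}
print(round(linear_predictor(example), 2), round(probability_of_death(example), 3))  # -0.78 0.314
```

For the same hypothetical patient without vasopressors, g(X) drops by 2.23 to -3.01 and the predicted probability of death falls to about 0.05.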

Discussion
The study, using a prospectively collected dataset, established a prediction model for ARDS patients requiring mechanical ventilation. The model included eight covariates without interaction terms or non-linear functions. The parsimony of the model may improve prediction. For a prediction model to be clinically useful, it should be easy to use; we therefore incorporated variables that are readily available in routine clinical practice. We found that older age was a significant independent risk factor for death, consistent with other studies and prediction models [29][30][31]. Vasopressor use was also an independent risk factor for mortality. Vasopressor use indicates circulatory failure, which is known to be associated with multiple organ failure (e.g., acute kidney injury) in critically ill patients [32,33]. Organ failure such as acute kidney injury is a well-known mortality risk factor, as also supported by our previous study [34]. Platelet count was also associated with mortality risk in this cohort. However, our previous study showed that platelet distribution width and mean platelet volume, rather than platelet count, were independently associated with mortality risk [35]. In that study, we included unselected critically ill patients, and mortality was slightly higher. Differences in study population and severity of illness may partly explain the disparity between these two cohorts.
Because the prediction model was established in a single cohort without external validation, overfitting is a major concern. We used a bootstrap procedure to shrink the coefficients and chose the model according to the principle of parsimony. However, the bootstrap procedure did not change the coefficients, indicating that the estimated coefficients are less likely to be biased. There was no evidence of a substantial problem with model fit, as reflected by the non-significant Hosmer-Lemeshow goodness-of-fit test. However, good overall fit cannot exclude individual outlying observations. We therefore further examined model fit over the entire set of covariate patterns. Six covariate patterns showed large values of the diagnostic statistics, indicating that they were either poorly fitted or influential. After exclusion of these six subjects, the coefficients changed substantially, and we chose to retain the model with the outliers excluded. The strength of this approach is that it removes the influence of a small minority of outlying covariate patterns. Its obvious shortcoming is that the model cannot be used for patients with those covariate patterns.
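The covariate-pattern diagnostics referred to here can be computed as in the following minimal sketch. It assumes one subject per covariate pattern (as noted in the Results) and uses the Hosmer-Lemeshow definitions Δχ² = r²/(1−h), ΔD = d²/(1−h) and Δb = r²h/(1−h)², where r is the Pearson residual, d the deviance residual and h the leverage. The function name `logistic_diagnostics` and the simulated data are hypothetical; the study itself computed these statistics in Stata.

```python
import numpy as np


def logistic_diagnostics(X, y, p_hat):
    """Hosmer-Lemeshow style diagnostics, one subject per covariate pattern.

    X: design matrix including an intercept column; y: 0/1 outcomes;
    p_hat: fitted probabilities from the logistic model.
    Returns leverage h, delta chi-square, delta deviance and delta-beta (Δb).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.asarray(p_hat, dtype=float)
    v = p * (1.0 - p)
    info_inv = np.linalg.inv(X.T @ (X * v[:, None]))          # (X'VX)^-1
    h = v * np.einsum("ij,jk,ik->i", X, info_inv, X)          # leverage (hat-matrix diagonal)
    r = (y - p) / np.sqrt(v)                                   # Pearson residual
    d = np.sign(y - p) * np.sqrt(-2.0 * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))  # deviance residual
    delta_chi2 = r ** 2 / (1.0 - h)
    delta_dev = d ** 2 / (1.0 - h)
    delta_beta = r ** 2 * h / (1.0 - h) ** 2                   # Cook's-distance analogue
    return h, delta_chi2, delta_dev, delta_beta


# Simulated data (hypothetical) and a Newton-Raphson fit of the logistic model.
rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ np.array([-1.0, 1.0, 0.5]))))).astype(float)
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    beta += np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))

h, dchi2, ddev, dbeta = logistic_diagnostics(X, y, p_hat)
print(np.argsort(dchi2)[::-1][:5])   # indices of the five largest delta chi-square values
```

Subjects flagged this way are candidates for closer clinical review; as the Methods note, the decision to exclude them should also rest on subject-matter considerations.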
Several limitations of the study need to be acknowledged. First, the major methodological flaw of this secondary analysis is the use of subjects enrolled in an RCT instead of data from an observational cohort of consecutive patients. Patients included in an RCT are a selected population that differs from typical patients with the diagnosis under study. For example, the overall mortality rate of this selected group of patients (21%) is lower than figures reported in recent epidemiological studies [4][5][6]. Second, the present analysis lacks a validation cohort to test the model. The trial was stopped early because of futility of the intervention; thus the sample size was small and the dataset could not be split into training and validation subsets. However, we examined the overall model fit as well as the influence of outliers. Furthermore, overfitting was addressed by using a bootstrap procedure to shrink the coefficients, and the final model was chosen according to the principle of parsimony. Third, the reported model is not specific to ARDS patients; in fact, none of the predictive variables is specific to ARDS. It would therefore be interesting to test our prediction model in patients without ARDS. Fourth, the study relies on a single, local cohort, so its international generalizability is questionable. This can be addressed by validating our prediction model in ARDS cohorts from other institutions.
In aggregate, the present study established a prediction model for ARDS patients requiring mechanical ventilation. The model contained eight covariates that are readily available in routine clinical practice and can be applied to all critical care settings. Interaction terms or nonlinear functions are not included in the model for parsimony.