Seven-Day Mortality Can Be Predicted in Medical Patients by Blood Pressure, Age, Respiratory Rate, Loss of Independence, and Peripheral Oxygen Saturation (the PARIS Score): A Prospective Cohort Study with External Validation

Background
Most existing risk stratification systems that predict mortality in emergency departments or admission units are complex to use clinically or have not been validated to the level at which their use is considered appropriate. We aimed to develop and validate a simple system that predicts seven-day mortality of acutely admitted medical patients using routinely collected variables obtained within the first minutes after arrival.
Methods and Findings
This observational prospective cohort study used three independent cohorts from the medical admission units of a regional teaching hospital and a tertiary university hospital and included all adult (≥15 years) patients. Multivariable logistic regression analysis was used to identify the clinical variables that best predicted the endpoint, defined as seven-day all-cause mortality. From this, we developed a simplified model that can be calculated without specialized tools or loss of predictive ability. Seventy-six patients (2.5%) met the endpoint in the development cohort, 57 (2.0%) in the first validation cohort, and 111 (4.3%) in the second. Systolic blood Pressure, Age, Respiratory rate, loss of Independence, and peripheral oxygen Saturation were associated with the endpoint (full model). By dichotomizing these variables, we developed a simple score (range 0–5), the PARIS score. Both models showed excellent discriminatory power (ability to identify patients at increased risk) in all three cohorts; calibration was good for the full model but failed for the PARIS score in the second validation cohort. For patients with a PARIS score ≥3, sensitivity was 62.5–74.0%, specificity 85.9–91.1%, positive predictive value 11.2–17.5%, and negative predictive value 98.3–99.3%. Patients with a score ≤1 had a low mortality (≤1%); those with a score of 2, intermediate mortality (2–5%); and those with a score ≥3, high mortality (≥10%).
Conclusions
Seven-day mortality can be predicted upon admission with high sensitivity and specificity and excellent negative predictive values.


Introduction
Emergency departments and admission units across the globe are experiencing a steady increase in admissions. [1][2][3][4] Frontline personnel treating these patients must quickly assess the severity of illness. However, clinical assessment and prognostication are difficult.
Although prognostication is key to treatment selection, it is not an integrated part of modern medicine, [5] and many physicians feel inadequately trained in it. [6] This lack of training adds to the importance of developing risk stratification systems that can help estimate a patient's prognosis and plan treatment and resource allocation accordingly. Indeed, two studies of patients admitted to intensive care have shown that many patients received inadequate care before transfer, potentially increasing mortality. [7,8] Triage is widely used when handling high-risk patients, but its goal is resource allocation, [9] not risk stratification. Several specific risk stratification systems have been introduced. [10,11] However, most of these were developed using inadequate methodology and do not meet the standards necessary for implementation in daily clinical practice. [10,11] To be clinically valuable, a system has to be easy to use, perform adequately, and be reliable across patient groups in various settings. [12] Our objective was to develop a risk stratification system that, at admission, can accurately predict seven-day mortality of acutely admitted medical patients using routinely collected variables easily obtained within the first few minutes after arrival.

Materials and Methods
We used multivariable logistic regression to identify the clinical variables that best predict seven-day all-cause mortality. On the basis of this, we developed a simplified model that can be calculated without special technology and without loss of performance (see Online-only Material).
We included only parameters that are easily recorded upon admission, and we validated our models extensively. Only variables that were strong predictors of outcome were included in our model, and simplification did not compromise performance or reliability.

Setting
This prospective observational cohort study comprises three independent cohorts. The development cohort was collected at the medical admission unit (MAU) at Sydvestjysk Sygehus Esbjerg from October 2008 through February 2009, and the first validation cohort at the same MAU from February 2010 through May 2010. The second validation cohort was collected at the MAU at Odense University Hospital from March 2011 through July 2011.
Sydvestjysk Sygehus Esbjerg is a regional 460-bed teaching hospital in western Denmark with a mixed urban and rural catchment population of 220 000. All subspecialties of internal medicine, pediatrics, and general and orthopedic surgery, as well as a 12-bed intensive care unit (ICU), are present. Odense University Hospital is a 1300-bed, level 1 trauma center and university teaching hospital with all specialties present; it has a catchment population of 290 000 and serves as a tertiary referral center for 1.2 million people. All adult medical patients (age 15 and older) admitted through the MAU from all sources (ie, emergency department, family physician, or out-patient clinic) were included; at Odense University Hospital, cardiology, neurology, hematology, oncology, and nephrology patients are admitted through other departments.

Variables
Before patient inclusion began, we selected nine potential independent variables based on relevance and practical considerations: loss of independence (LOI), systolic blood pressure, age, peripheral oxygen saturation (SaO2), respiratory rate, level of consciousness, temperature, pulse, and blood glucose. Upon admission, a nurse recorded the first set of vital signs and assessed LOI on a form, and the data were entered into an electronic database. During data collection, all nurses were blinded to the details of the study purpose (ie, the precise endpoint and the prioritized independent variables).
SaO2 was measured using the department's electronic non-invasive equipment. To take the fraction of inspired oxygen (FiO2) into account, we used the SaO2/FiO2 ratio suggested by Rice et al. [13] and Pandharipande et al. [14] LOI was defined as an inability to get into bed without assistance, either from a wheelchair or from an emergency department/ambulance gurney, regardless of previous status. Level of consciousness was recorded using the AVPU scale (Alert, responsive to Vocal stimuli, responsive to Pain, or Unresponsive). [15,16]
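A minimal illustration of why FiO2 matters: the same saturation gives a very different SaO2/FiO2 ratio on room air and on supplementary oxygen. How FiO2 was derived for patients receiving oxygen is not described here, so the FiO2 of 0.35 below is an assumed value, and the function name is hypothetical.

```python
def sao2_fio2_ratio(sao2_percent: float, fio2: float) -> float:
    """SaO2/FiO2 ratio with SaO2 in percent and FiO2 as a fraction (0.21 = room air)."""
    if not 0 < sao2_percent <= 100:
        raise ValueError("SaO2 must be a percentage in (0, 100]")
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return sao2_percent / fio2


# Same saturation, different inspired oxygen (FiO2 = 0.35 is an assumed value).
room_air = sao2_fio2_ratio(97, 0.21)
on_oxygen = sao2_fio2_ratio(97, 0.35)
print(round(room_air), round(on_oxygen))  # 462 277
```

A patient who needs oxygen to reach 97% thus has a markedly lower ratio, a difference that a raw saturation reading alone would hide.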

Endpoint
The endpoint was all-cause seven-day mortality regardless of admission status, co-morbidity, and "do not attempt resuscitation" orders. Data on the endpoint were extracted from the Danish Person Register [17] and retrieved after all patients were discharged. Foreign nationals (n = 50; 0.6%) who were discharged alive were considered to be alive at the endpoint, even though complete follow-up was impossible.

Ethics
The study was approved by the Danish Data Protection Agency and reported in accordance with the STROBE statement. [18] Danish law does not require approval by the regional ethics committee for observational studies.

Statistics
To reduce the risk of overfitting, [19][20][21] we required 10 events per independent variable, ie, 90 events to include all nine predefined variables. With fewer events, the number of independent variables had to be reduced. Before beginning analyses, we decided, based on the existing literature, that LOI, systolic blood pressure, age, and respiratory rate would remain. We determined that blood glucose could be discarded (because it is easily lowered and raised), as could temperature, because it can be measured in various ways (eg, tympanic, axillary, and rectal), which could affect predictions. [22] If further variables had to be discarded, we ranked the remaining ones for retention in the order level of consciousness, peripheral oxygen saturation, and, lowest, pulse.
Both the full and the simple models were developed using only patients from the development cohort. Both models were afterwards validated independently in the validation cohorts using coefficients and scores as identified in the development cohort (see S1 Text).
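The key point of this validation design is that coefficients are frozen after development and simply applied to new patients; nothing is re-estimated in the validation cohorts. A minimal Python sketch, assuming a plain logistic model; the predictor names and coefficients are hypothetical placeholders, not the published values in Table 2.

```python
import math
from typing import Dict, List

# Hypothetical coefficients "estimated once" in the development cohort.
# These are placeholders for illustration, NOT the published full model.
DEV_COEFS: Dict[str, float] = {"intercept": -4.0, "x1": 0.8, "x2": 1.2}


def predicted_risk(coefs: Dict[str, float], patient: Dict[str, float]) -> float:
    """Apply frozen logistic-regression coefficients to one patient."""
    linear_predictor = coefs["intercept"] + sum(
        beta * patient[name] for name, beta in coefs.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def validate(coefs: Dict[str, float], cohort: List[Dict[str, float]]) -> List[float]:
    """Predicted risks for a validation cohort; the model is not refitted."""
    return [predicted_risk(coefs, patient) for patient in cohort]


validation_cohort = [{"x1": 0.0, "x2": 0.0}, {"x1": 1.0, "x2": 1.0}]
print([round(p, 3) for p in validate(DEV_COEFS, validation_cohort)])  # [0.018, 0.119]
```

Discrimination and calibration in each validation cohort are then computed from these predicted risks and the observed outcomes.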

Generation of the full model
We analyzed the association between each independent variable and the endpoint using univariable analyses with a 25% significance level. The variables that met this level were then included in a multivariable logistic regression analysis with a 5% significance level. We tested for interaction and collinearity, and for deviation from linearity in the continuous variables using fractional polynomials. [23] To minimize the impact of missing values, we used multiple imputation (assuming data were missing at random) [24][25][26] in our main analyses and report these coefficients.

Generation of the simplified model
To develop a model that would be easy to use in clinical practice and allow mental calculation, we defined a simplified model by dichotomizing the continuous variables included in the full model. The cutoff for dichotomization was arbitrarily defined as the value at which mortality for each variable rose above 5%. Because SaO2/FiO2 is difficult to calculate mentally, this item was scored as positive if SaO2 on room air was below the 5% mortality level or if the patient received any supplementary oxygen.
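One way to read the "mortality rose above 5%" rule is to bin a variable, compute observed mortality per bin, and scan from the normal end towards the abnormal end until a bin exceeds 5%. The sketch below assumes 10-unit bins and per-bin (not cumulative) mortality; the exact binning used in the study is not stated, and the synthetic data and function name are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def first_bin_above(
    records: Iterable[Tuple[int, bool]],
    bin_width: int,
    threshold: float = 0.05,
    risk_increases_with_value: bool = True,
) -> Optional[int]:
    """Lower edge of the first bin, scanning from the normal end, whose
    observed mortality exceeds `threshold`; None if no bin does."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for value, died in records:
        lower_edge = (value // bin_width) * bin_width
        totals[lower_edge] += 1
        deaths[lower_edge] += int(died)
    # Scan low-to-high for age-like variables, high-to-low for blood-pressure-like ones.
    for edge in sorted(totals, reverse=not risk_increases_with_value):
        if deaths[edge] / totals[edge] > threshold:
            return edge
    return None


# Synthetic ages (not study data): 2%, 4%, 8% and 12% mortality per decade.
synthetic_ages = (
    [(65, i < 2) for i in range(100)]
    + [(75, i < 4) for i in range(100)]
    + [(85, i < 8) for i in range(100)]
    + [(92, i < 6) for i in range(50)]
)
print(first_bin_above(synthetic_ages, bin_width=10))  # 80
```

Real data are noisier than this, so in practice one might smooth the mortality curve or require the excess to persist in subsequent bins before accepting a cutoff.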

Performance of the models
Discriminatory power (the ability to identify the participants at highest risk) of both the full and simplified models was assessed using the area under the receiver-operating characteristic curve (AUROC). [27] Calibration (ie, the ability to correctly estimate the risk of death) was tested using the Hosmer-Lemeshow goodness-of-fit test [28] for the full model and Pearson's χ2 goodness-of-fit test for the simplified model. To further explore the calibration of our simplified model, we replicated the method introduced by Seymour et al. [29] Briefly, we first predicted the probability of death for each score value using logistic regression analysis and then calculated the Hosmer-Lemeshow goodness-of-fit test.
Discriminatory power was considered excellent when the AUROC was over 0.8, [28] and calibration was considered acceptable when the goodness-of-fit test reached P>0.05. [28]
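Both performance measures can be sketched in plain Python. This is a simplified illustration, not the Stata routines used in the study: the AUROC uses the pairwise (Mann-Whitney) definition, and the Hosmer-Lemeshow test assumes ten equal-sized groups of sorted predicted risk with g − 2 degrees of freedom, using the closed-form chi-square tail that holds for even degrees of freedom. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auroc(risks: Sequence[float], died: Sequence[bool]) -> float:
    """AUROC as the probability that a random death has a higher predicted
    risk than a random survivor (ties count one half)."""
    pos = [r for r, d in zip(risks, died) if d]
    neg = [r for r, d in zip(risks, died) if not d]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


def hosmer_lemeshow(
    risks: Sequence[float], died: Sequence[bool], groups: int = 10
) -> Tuple[float, float]:
    """HL statistic and P value from equal-sized groups of sorted predicted risk."""
    n = len(risks)
    order = sorted(range(n), key=lambda i: risks[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        expected = sum(risks[i] for i in idx)
        observed = sum(1 for i in idx if died[i])
        stat += (observed - expected) ** 2 / (expected * (1 - expected / len(idx)))
    return stat, chi2_sf_even_df(stat, groups - 2)


print(auroc([0.1, 0.4, 0.35, 0.8], [False, False, True, True]))  # 0.75
```

With the criteria above, a model would count as having excellent discrimination when the AUROC exceeds 0.8 and acceptable calibration when the Hosmer-Lemeshow P value exceeds 0.05. The pairwise AUROC loop is quadratic in cohort size, which is fine for a sketch but slow for large cohorts.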

Sensitivity analysis
We planned an extensive set of sensitivity analyses. Our primary concern was missing data, so we reran the analysis using list-wise deletion and imputation of the mean instead of multiple imputation. [24][25][26] Because development of our full model was not automated, it could potentially have been affected by subjective preferences. We therefore also performed automated model development using stepwise regression with backward elimination, starting both from all nine potential independent variables and from the prioritized variables only (in case of too few events).
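The two simpler missing-data strategies used in the sensitivity analysis can be contrasted on toy data. This is a minimal sketch with hypothetical helper names and invented values; multiple imputation, the main approach, is not shown because it involves drawing several plausible values per missing entry and pooling the resulting estimates.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def listwise_deletion(rows: List[Row], keys: List[str]) -> List[Row]:
    """Keep only admissions with no missing value in any of `keys`."""
    return [r for r in rows if all(r.get(k) is not None for k in keys)]


def mean_imputation(rows: List[Row], keys: List[str]) -> List[Row]:
    """Replace each missing value by the mean of the observed values."""
    means = {k: mean(r[k] for r in rows if r.get(k) is not None) for k in keys}
    return [{**r, **{k: means[k] for k in keys if r.get(k) is None}} for r in rows]


# Toy data (not study data): respiratory rate and systolic blood pressure.
admissions = [
    {"resp_rate": 18, "sbp": 130},
    {"resp_rate": None, "sbp": 100},
    {"resp_rate": 30, "sbp": None},
]
print(len(listwise_deletion(admissions, ["resp_rate", "sbp"])))  # 1
print(mean_imputation(admissions, ["resp_rate", "sbp"])[1])  # {'resp_rate': 24, 'sbp': 100}
```

List-wise deletion discards two of the three admissions here, which matters in a cohort with few deaths, whereas mean imputation keeps every admission but understates variability.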
LOI is not widely used in risk stratification, and there is no generally accepted definition. We therefore tested two other markers, ie, inability to stand unaided [30] and inability to rise from a chair unaided. [31] Use of SaO2/FiO2 is new in this context; for this reason, we introduced the ratio of the partial pressure of O2 to FiO2 (PaO2/FiO2) as an alternative, as suggested by Rice et al. and Pandharipande et al. [13,14] PaO2 was estimated using linear regression.
Our arbitrary choice of a 5% cutoff for the dichotomization in the simplified model was not based on statistical calculation. As an alternative, we applied a 10% cutoff.
Last, we recalculated the simplified model under the assumption that missing values of the variables in the score were normal, ie, that they had a score of 0.

Sample size and descriptive statistics
To define the sample size, we required 90 events (deaths) if we were to include nine independent variables. [19][20][21] With an estimated mortality of 3%, this required 3000 admissions in the development cohort.
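The arithmetic behind this target can be written out as a short sketch (function name hypothetical). Exact fractions avoid floating-point rounding in the division, which could otherwise push the ceiling up by one.

```python
import math
from fractions import Fraction


def required_sample_size(n_predictors: int, event_rate: Fraction, events_per_variable: int = 10):
    """Events needed (events-per-variable rule) and admissions needed to observe them."""
    events_needed = n_predictors * events_per_variable
    admissions_needed = math.ceil(events_needed / event_rate)
    return events_needed, admissions_needed


print(required_sample_size(9, Fraction(3, 100)))  # (90, 3000)
```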
Data are reported as mean (standard deviation [SD]) or proportions, as appropriate, with 95% confidence intervals (CIs) where applicable. Stata version 12.1 (StataCorp LP, College Station, Texas, USA) was used for analyses.

Results
We had 3046 admissions (2608 patients) in the development cohort, 2848 (2463 patients) in the first validation cohort, and 2561 (2210 patients) in the second validation cohort; all were included in the study. Seventy-six patients (2.5%) died within seven days of admission in the development cohort, as did 57 patients (2.0%) in the first validation cohort and 111 (4.3%) in the second. Patients who died were older and had a higher pulse, blood glucose, and respiratory rate but a lower systolic blood pressure, temperature, and SaO2/FiO2; fewer were alert, and more had lost their independence. Characteristics of the admissions are shown in Table 1.

Development of the full model
Based on the number of outcomes (fewest in the first validation cohort), we could analyze six independent variables; as prespecified, these were LOI, systolic blood pressure, age, SaO2/FiO2, respiratory rate, and level of consciousness. All were associated with the endpoint in univariable analyses.
Using multivariable logistic regression, we found systolic blood pressure, age, respiratory rate, SaO2/FiO2, and LOI to be associated with the endpoint, whereas level of consciousness was not (see S1 Table). We found no interaction between variables and no evidence of deviation from linearity (see also S1 Text). The full model is presented in Table 2.

Development of the simplified model
Mortality rose above 5% when systolic blood pressure was 115 mmHg, age 80 years, respiratory rate 25 breaths per minute, and SaO2 93%. These limits, any use of supplementary oxygen, and LOI were used as cutoffs in our simplified model, allowing for a score ranging from 0–5 (Table 2). We named our simplified model the PARIS score, derived from systolic blood Pressure, Age, Respiratory rate, loss of Independence, and peripheral oxygen Saturation.
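The scoring logic can be sketched in a few lines. The direction of each cutoff follows the text, but whether the boundaries are inclusive (≤115 mmHg, ≥80 years, ≥25 breaths/min, ≤93%) is an assumption here; Table 2 holds the authoritative definitions. Missing values score 0, mirroring the sensitivity analysis in which missing values were assumed normal, and the risk groups follow the bands given in the Abstract. Class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Admission:
    systolic_bp: Optional[float]          # mmHg
    age: Optional[float]                  # years
    resp_rate: Optional[float]            # breaths per minute
    sao2: Optional[float]                 # percent, peripheral saturation
    on_oxygen: Optional[bool]             # any supplementary oxygen
    loss_of_independence: Optional[bool]  # cannot get into bed unaided


def paris_score(a: Admission) -> int:
    """PARIS score (0-5), one point per abnormal item.

    ASSUMPTION: boundaries are inclusive (<=115 mmHg, >=80 y, >=25/min, <=93%).
    Missing values score 0, i.e. they are treated as normal.
    """
    items = [
        a.systolic_bp is not None and a.systolic_bp <= 115,          # P
        a.age is not None and a.age >= 80,                           # A
        a.resp_rate is not None and a.resp_rate >= 25,               # R
        bool(a.loss_of_independence),                                # I
        bool(a.on_oxygen) or (a.sao2 is not None and a.sao2 <= 93),  # S
    ]
    return sum(items)


def risk_group(score: int) -> str:
    """Mortality bands from the abstract: <=1 low, 2 intermediate, >=3 high."""
    if score <= 1:
        return "low"
    if score == 2:
        return "intermediate"
    return "high"


patient = Admission(systolic_bp=105, age=84, resp_rate=22, sao2=95,
                    on_oxygen=False, loss_of_independence=True)
score = paris_score(patient)
print(score, risk_group(score))  # 3 high
```

Because each item is a yes/no check, the same calculation can be done mentally at the bedside, which was the design goal of the simplified model.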

Sensitivity analyses
Our sensitivity analyses did not lead to improvement or major deviations from our models (see S2, S3, S4 and S5 Tables).

Performance of the models
The discriminatory power of the full model was excellent (AUROC ≥0.87) and its calibration good in all cohorts (Table 3). For the PARIS score, discriminatory power was excellent (AUROC ≥0.86) in all cohorts; calibration was acceptable in the first validation cohort but failed in the second (Table 3).
For the PARIS score, seven-day mortality increased with increasing score (Fig 1). With a score of three or higher, sensitivity was 74.0%, specificity 85.9%, positive predictive value 11.9%, and negative predictive value 99.2% in the development cohort. Sensitivity was lower in the validation cohorts, specificity was slightly higher, and the negative predictive value remained high (Table 4). Patients with a score ≤1 had a mortality ≤1.1%; with a score of 2, 1.9–4.6%; and with a score ≥3, ≥8.3% (S6 and S7 Tables).
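The four measures follow from a 2×2 table of "score ≥3" against death within seven days. A minimal sketch with a hypothetical function name and invented counts (not study data):

```python
from typing import Dict, Sequence


def metrics_at_cutoff(scores: Sequence[int], died: Sequence[bool], cutoff: int = 3) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when score >= cutoff means 'high risk'."""
    tp = fp = fn = tn = 0
    for s, d in zip(scores, died):
        flagged = s >= cutoff
        if flagged and d:
            tp += 1
        elif flagged:
            fp += 1
        elif d:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical cohort: 3 flagged deaths, 1 missed death, 20 false alarms, 76 true negatives.
scores = [4] * 3 + [1] * 1 + [3] * 20 + [0] * 76
died = [True] * 4 + [False] * 96
print({k: round(v, 3) for k, v in metrics_at_cutoff(scores, died).items()})
# {'sensitivity': 0.75, 'specificity': 0.792, 'ppv': 0.13, 'npv': 0.987}
```

Sensitivity and specificity condition on the outcome, whereas PPV and NPV condition on the score, which is why the predictive values shift with mortality in the cohort.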

Discussion
We have developed and validated a risk stratification system that can predict seven-day all-cause mortality for acutely admitted medical patients. Using five easily obtainable variables (ie, systolic blood pressure, age, respiratory rate, loss of independence, and peripheral oxygen saturation), …

Table 3. Performance measures of the models, both discriminatory power (ability to identify patients at increased risk) and calibration (precision in predictions).

Use of risk stratification tools might help the clinician but is not without important limitations. Statistics, chance, and human perseverance dictate that even the best risk stratification system will not be completely accurate, and patients predicted to be at low risk might eventually die. This is one reason why authors have advocated that these systems be used with caution in individual patients, [32][33][34][35] as our data remind us. Even with a cutoff of 1, two patients in the development cohort would have been designated as low risk yet still died (Table 4).
Clinical assessment relying on experience alone is an interesting alternative to complex models. However, clinical assessment alone has never been shown scientifically to be a strong predictive tool in an admission unit, and data from other settings suggest that it has limitations. Comparing clinicians' gut feeling with clinical features (eg, medical history, observation, and clinical examination), Van den Bruel et al. found that gut feeling could identify sick children missed by clinical features, at the cost of decreased specificity. [36] Asking attending physicians, residents, and nurses to predict in-hospital mortality of medical ICU patients, Meadow et al. found a high level of discordant predictions: only 52% of the patients predicted to die actually died, while 15% survived unexpectedly. [37] Our PARIS score is not perfect either. Used without critical evaluation, it will lead to cases being missed. If the suggested cutoff of 3 is implemented, 13–29 patients will be missed and 198–273 falsely identified. Development of more accurate models is needed.
Compared with clinical experience, risk stratification systems have some advantages. First, they are expected to have better intra- and inter-observer reliability because fewer parameters are subject to interpretation. Second, they should have better external validity because they do not require the same clinicians to be present at each institution to make the prediction. Last, most scores can be calculated automatically once the staff has collected the information. The predicted mortality can then be added to the overall clinical picture as another piece of the puzzle for the physician. At this point, we do not know to what degree risk stratification systems add to performance in clinical practice, and further studies are warranted.
We provide two models: a complex (full) model with a precise prediction of mortality and a simplified model giving a score for seven-day mortality (the PARIS score). Both models have their place in an MAU. The full model, although precise, is difficult to calculate and requires computational support; its discriminatory power is excellent and its calibration good, even in an external setting. We believe that the full model is best suited for research purposes (eg, comparing cohorts). The PARIS score can easily be calculated mentally. Its discriminatory power is excellent, but its calibration in an external setting was not perfect. However, mortality increases with increasing score (Fig 1), and we believe that the PARIS score can be used as an additional tool for identifying patients at increased risk of poor outcome.
The external validity of our models is good. We included all admitted patients, not only those thought to be at high or low risk or with other selected characteristics. Our models have been through rigorous statistical analyses and, most important, have been validated externally. Our second validation cohort is a completely independent sample from an institution far removed from our own, not only geographically but also in time and in terms of case-mix. In both validation cohorts, the nursing staff received a short written and oral introduction to the variables assessed and were fully able to record the necessary information. To further test the generalizability of our score, Dr. John Kellett of Nenagh Hospital in Ireland has kindly validated our simplified score. He found a discriminatory power of 0.803 and acceptable calibration (p = 0.08) in an Irish sample, and a discriminatory power of 0.714 with good calibration (p = 0.27) in a Ugandan cohort from Kitovu Hospital (personal communication).
The difference in case-mix (ie, mortality) between the two institutions may explain the differences in negative and positive predictive values (as well as calibration) in the second validation cohort. With mortality almost twice as high (for multifactorial reasons, eg, access to outpatient evaluation, proportion of urban population, and decisions to admit being made by attending rather than resident physicians), such differences are expected.
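The effect of case-mix on predictive values can be made concrete with Bayes' theorem. The sketch below holds sensitivity and specificity fixed at the development-cohort values (74.0% and 85.9%) and varies only mortality; it is an illustration only, since sensitivity and specificity also differed between cohorts, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """PPV and NPV from sensitivity, specificity and outcome prevalence (Bayes)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


for mortality in (0.025, 0.043):
    ppv, npv = predictive_values(0.740, 0.859, mortality)
    print(f"mortality {mortality:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# mortality 2.5%: PPV 11.9%, NPV 99.2%
# mortality 4.3%: PPV 19.1%, NPV 98.7%
```

At 2.5% mortality this reproduces the development-cohort predictive values reported in the Results; raising mortality alone increases the PPV and lowers the NPV.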
Our study has limitations. First, we were affected by missing data (especially for LOI and respiratory rate); to compensate, we used multiple imputation, and our extensive sensitivity analyses showed that this did not materially affect the results. Second, the case-mix was limited because we evaluated our models only in medical patients. Within this spectrum, our models have proven reliable, but they still must be tested in surgical patients. Also, our first two cohorts are very similar; only the second validation cohort differs significantly. Therefore, further validation in larger groups of medical patients is warranted. Third, use of LOI is unconventional. It is not routinely documented, but we included it regardless because previous studies have shown that its inclusion improves models. [10] Fourth, our model does not include specific variables on co-morbidity and physical capacity; to compensate, we added LOI, which can be seen as a general marker of capability. Last, we have not assessed inter-observer reliability of our models or tested reproducibility.
From a patient, clinician, and organizational perspective, a risk stratification model has no meaning in itself. The true value lies in its ability to guide the clinician to deliver improved care. The optimal measure would be reduced seven-day mortality after implementation, but we have not performed an impact analysis; therefore, we still need to test whether our model will improve patient care.

Conclusions
We have shown, and validated, that seven-day all-cause mortality can be predicted upon admission for acutely admitted medical patients with excellent discriminatory power and acceptable calibration. Before our models are used in clinical practice, further independent validation studies are needed, as is a randomized trial to evaluate patient outcomes when the scoring system is used.
Supporting Information
S1 Text. Additional methods. (DOCX)
S1 Table. Internal validation in the development cohort using bootstrapping with 1984 replications. (DOCX)
S2 Table. Logistic regression using two alternative definitions of loss of independence, ie, inability to stand unaided and inability to get out of a chair unaided. (DOCX)
S3 Table. Performance measures using two alternative definitions of loss of independence, ie, inability to stand unaided and inability to get out of a chair unaided. (DOCX)
S4 Table. Missing data in all three cohorts, data presented as number (%). (DOCX)
S5 Table. Logistic regression of the full model using list-wise deletion without multiple imputation. (DOCX)
S6 Table. Seven-day mortality in the simplified model in each of the three cohorts, number (%). (DOCX)
S7 Table. Logistic regressions of the simplified score, both univariable and multivariable analyses; CI, confidence interval. (DOCX)