Machine learning predicts the short-term requirement for invasive ventilation among Australian critically ill COVID-19 patients

Objective(s) To use machine learning (ML) to predict short-term requirements for invasive ventilation in patients with COVID-19 admitted to Australian intensive care units (ICUs). Design A machine learning study within a national ICU COVID-19 registry in Australia. Participants Adult patients who were spontaneously breathing and admitted to participating ICUs with laboratory-confirmed COVID-19 from 20 February 2020 to 7 March 2021. Patients intubated on day one of their ICU admission were excluded. Main outcome measures Six machine learning models predicted the requirement for invasive ventilation by day three of ICU admission from variables recorded on the first calendar day of ICU admission: (1) random forest classifier (RF), (2) decision tree classifier (DT), (3) logistic regression (LR), (4) K-nearest neighbours classifier (KNN), (5) support vector machine (SVM), and (6) gradient boosted machine (GBM). Cross-validation was used to assess the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity of the machine learning models. Results 300 ICU admissions from 53 ICUs across Australia were included. The median [IQR] age of patients was 59 [50–69] years, 109 (36%) were female, and 60 (20%) required invasive ventilation on day two or three. Random forest and gradient boosted machine were the best performing algorithms, achieving mean (SD) AUCs of 0.69 (0.06) and 0.68 (0.07), and mean (SD) sensitivities of 77% (19%) and 81% (17%), respectively. Conclusion Machine learning can be used to predict the subsequent requirement for invasive ventilation in spontaneously breathing patients with COVID-19 admitted to Australian ICUs.


Introduction
SARS-CoV-2 is a highly transmissible upper respiratory tract virus that causes coronavirus disease 2019 (COVID-19). A striking feature of COVID-19 is rapidly progressive respiratory failure, which develops in approximately 5% of infected adults, typically one week after the onset of coryzal symptoms [1,2]. Globally, two-thirds of adult patients admitted to intensive care with respiratory failure secondary to severe COVID-19 require invasive mechanical ventilation [3]. The institution of mechanical ventilation is strongly associated with poor outcomes in COVID-19, so identifying cohorts at high risk of mechanical ventilation is important to allow therapies to be targeted to specific populations and for resource allocation [4]. Avoiding intubation where possible decreases the risks of the intubation procedure, ventilator-induced lung injury, and nosocomial infection. Conversely, delaying an inevitable intubation increases the risk of sudden respiratory arrest and unplanned airway management, which exposes staff to a greater risk of infection [5]. Accordingly, developing tools to accurately identify patients at risk of deteriorating is a priority [6].
During the COVID-19 pandemic, the worldwide prominence of the electronic medical record has allowed artificial intelligence researchers to interrogate rich databases with machine learning algorithms to improve the speed and accuracy of diagnosis [7,8], analyse responses to therapeutic interventions [9], identify susceptible patients based on genomics [10], and predict mortality [11,12]. There is a paucity of artificial intelligence research modelling predictors of mechanical ventilation, and no studies utilise Australian data. This is important because a limitation of supervised machine learning models is that they are subject to regional bias [13].
The Short Period Incidence Study of Severe Acute Respiratory Infections (SPRINT-SARI) Australia registry [4] has been prospectively collecting comprehensive data on critically ill patients with COVID-19 admitted to Australian intensive care units (ICU) from February 2020. The aim of this study was to use the SPRINT-SARI database to develop a machine learning algorithm to predict progression to mechanical ventilation within the first three days of admission to an Australian ICU.
Ethics approval
The conduct of SPRINT-SARI Australia was approved by the Victorian State Government Chief Health Officer (Professor Brett Sutton) as an Enhanced Surveillance Project "to capture detailed clinical, epidemiological and laboratory data relating to COVID-19 patients in the intensive care setting". The requirement for informed consent was waived, as was site-specific governance at most contributing sites.

Study design, setting and participants
The methodology for SPRINT-SARI Australia has been described in detail elsewhere [4]. In brief, the SPRINT-SARI Australia case report form prospectively collected data on all COVID-19 admissions to participating ICUs. Patients had to have a positive polymerase chain reaction (PCR) test for COVID-19 and require ICU admission. Patients without PCR-confirmed COVID-19 and those < 18 years of age were excluded. Data pertaining to baseline demographics, past medical history, clinical characteristics, treatments, and outcomes were collected prospectively and extracted from the SPRINT-SARI Australia database for patients admitted from 20 February 2020 until 7 March 2021. Consistent with previous machine learning studies in severe COVID-19, our study aimed to predict progression to mechanical ventilation within 72 hours of admission using data from the first calendar day of admission [13]. Intubation on the first calendar day of ICU admission was considered to reflect pre-ICU variables, so this time window was excluded.

Variable selection
All available variables were analysed for inclusion in the predictive modelling. Initial exploration of the data involved univariate analysis of variables using Pearson's Chi-Squared test for categorical variables, and Welch two-sample t-tests for continuous variables. All clinically relevant variables were included in machine learning models regardless of univariate significance. Only data available from day one of ICU admission were used as model inputs. Variable reduction/feature selection was trialled on a per-model basis, removing all inputs with a mutual information score of zero. Sensitivity analysis was subsequently performed, comparing the performance of 'full' and 'reduced variable' models. A complete list of the input variables included in the final models can be found in the results.
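The zero-mutual-information filter described above can be sketched as follows. This is an illustrative Python/scikit-learn sketch on synthetic data, not the study's actual code or inputs:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for the day-one data frame: 300 admissions, 5 candidate inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

# Score each input against the outcome and drop anything with a score of zero,
# mirroring the per-model variable-reduction step.
mi = mutual_info_classif(X, y, random_state=0)
X_reduced = X[:, mi > 0]
```

A sensitivity analysis would then compare models fitted on the full and reduced matrices, as the study did for its 'full' and 'reduced variable' models.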

Outcome definition
A binary outcome variable was defined as "1" if patients received invasive ventilation by either day two or day three of their ICU admission, and "0" if this did not occur [13]. Notably, patients classified as "0" may have ultimately required invasive ventilation at a later point than day three of their ICU admission. Patients discharged from ICU prior to day three were assigned "0". Deaths within the designated time frame were included in the final analysis.

Data pre-processing
Continuous variables were rescaled to between 0 and 1 using a min-max approach, retaining the shape of the continuous distribution. However, rescaling was not used for the tree-based approaches, namely the random forest, gradient boosting, and decision tree algorithms. For any missing values (see Appendix C of the S1 File) in the final data frame, k-nearest-neighbour imputation was performed using R statistical software (version 3.5.3) with k = 5. Based on the study protocol and the circumstances surrounding data collection, the assumption that observations were missing at random was deemed fair in the context of this investigation [15,16].
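A minimal Python sketch of this pre-processing (the study performed the imputation in R; scikit-learn's KNNImputer is used here as an equivalent, and the data are synthetic):

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the final data frame, with a few missing values.
rng = np.random.default_rng(1)
X = rng.normal(loc=50, scale=10, size=(20, 3))
X[2, 1] = np.nan
X[7, 0] = np.nan

# k-nearest-neighbour imputation with k = 5, as in the study.
X_imputed = KNNImputer(n_neighbors=5).fit_transform(X)

# Min-max rescaling to [0, 1]; this step would be skipped for the
# tree-based models (random forest, gradient boosting, decision tree).
X_scaled = MinMaxScaler().fit_transform(X_imputed)
```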
Hyperparameter optimisation was achieved with grid search. Models were supplied the same input variables, and the AUC was the main optimisation metric. Final hyper-parameter values and training metrics are detailed in Appendix A of the S1 File. Machine learning models were constructed using open-source software libraries (Python version 3.6, scikit-learn version 0.24).
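The grid-search step might look like the following scikit-learn sketch. The grid and model shown are illustrative only; the study's final hyper-parameter values are in Appendix A of its S1 File:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data mimicking the roughly 80/20 class split of the cohort.
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2],
                           random_state=0)

# Exhaustive search over a small illustrative grid, optimising AUC.
param_grid = {"n_estimators": [100, 200], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="roc_auc", cv=5)
search.fit(X, y)
```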

Training and evaluation
Five-fold cross-validation repeated four times was used to assess model performance. Metrics measuring performance were the AUC, sensitivity, and specificity; the latter two were calculated using Youden's Index on a per-fold basis. To account for class imbalance in the data set, minority class oversampling was applied to the training data using the synthetic minority oversampling technique (SMOTE) [19].
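Youden's Index selects the probability threshold that maximises sensitivity + specificity - 1. The per-fold computation can be sketched as follows, with toy labels and predicted probabilities standing in for one cross-validation fold:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and predicted probabilities standing in for one CV fold.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.20, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
j = tpr - fpr                     # Youden's J = sensitivity + specificity - 1
best = int(np.argmax(j))
threshold = thresholds[best]      # operating point for this fold
sensitivity = tpr[best]
specificity = 1.0 - fpr[best]
```

The SMOTE step would be applied to the training folds only (for example via the imbalanced-learn library); it is omitted from this sketch.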

Explanatory model generation
Model accuracy is often achieved through increased complexity, at the cost of compromised explainability. Explanatory modelling of the most performant algorithms in this investigation was achieved with Shapley additive values [20], which provide a unified framework for interpreting feature importance in black-box algorithms. Explanatory modelling was developed for predictions on a test set (20%), from the algorithm trained on a training set (80%). Explanations for correctly classified samples were visualised with a summary plot, where the points on the plot are the change in model output, derived from the Shapley value of that feature, for each patient in the test set [20].
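The study applied the SHAP library to its tree models; the additive property underlying the summary plots can be illustrated exactly for a simple linear model, where (with independent features) the Shapley value of feature i is w_i(x_i - E[x_i]). The weights and data below are synthetic:

```python
import numpy as np

# Synthetic background data and a toy linear "risk" model.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
w = np.array([1.5, -2.0, 0.5, 0.0])
b = 0.3

def f(X):
    return X @ w + b

x = X[0]  # one patient to explain

# Exact Shapley values for a linear model with independent features.
phi = w * (x - X.mean(axis=0))

# Additivity: the per-feature contributions sum to the gap between this
# patient's prediction and the mean prediction over the background data.
gap = f(x) - f(X).mean()
```

Each point in a summary plot is one such phi value for one patient and one feature; for the RF and GBM models the values are computed by SHAP's tree algorithms rather than this closed form.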

Results
The raw dataset consisted of 608 patients, of whom 387 (63.7%) were not ventilated on day one of admission. A further 87 patients had inadequate data collected on day one of admission (no blood results for the given patient at the relevant site and time point) and were excluded, leaving 300 patients from 53 ICUs in the final analysis. This included 60 (20%) patients who required invasive ventilation by day three of their ICU admission and 240 (80%) who did not. The median (IQR) age for the final dataset was 59 (50-69) years, comprising 191 (63.7%) male patients. Inputs utilised in the modelling are shown in Tables 1 (discrete variables) and 2 (continuous variables), along with their population characteristics (stratified by whether or not invasive ventilation was required by day 3) and respective p-values. Variable reduction did not yield a statistically significant improvement in model performance (see Appendix D of the S1 File). A further 26 patients from the original cohort went on to require invasive ventilation beyond day three. The median time to invasive ventilation was two days (range 2-14, IQR 2-4).

Training and model fit
Final performance metrics (see Appendix A of the S1 File) suggest that, despite hyperparameter optimisation, all models evaluated in this study suffered from a degree of overfitting.

[Table 1: input variables with the percentage of patients who did and did not receive invasive ventilation on day 2 or 3; table body not reproduced here.]

Predicting the need for invasive ventilation by day 2 or 3 of ICU admission
The best overall performing machine learning algorithms were the gradient boosted machine and random forest classifier, with mean (SD) AUCs of 0.68 (0.07) and 0.69 (0.06) respectively. These models additionally demonstrated high mean (SD) sensitivities of 0.81 (0.17) and 0.77 (0.19) respectively. DeLong's test revealed no significant difference in the performance of the gradient boosted machine and random forest classifier (Z = 0.82, p-value = 0.41), and that both significantly outperformed each of the remaining machine learning algorithms tested. A comprehensive list of DeLong's test coefficients can be found in Appendix B of the S1 File. The next best performing model was the support vector machine, with a mean (SD) AUC of 0.65 (0.08), followed by LR with a mean (SD) AUC of 0.64 (0.08). The decision tree was the poorest performing model tested, with a mean (SD) AUC of 0.54 (0.07), representing near-zero class separation capability. A complete outline of the models tested, their AUCs, and additional performance metrics can be seen in Table 3.

Explanatory modelling
The top 20 most impactful features that contributed to correct sample classification in RF and GBM are shown in Figs 1 and 2, respectively. Red and blue points indicate higher and lower variable values respectively, whilst points to the left of the vertical zero line imply favouring a requirement for short-term ventilation. For example, in the case of GBM (Fig 2), the estimated weight variable is red to the left of the vertical zero line, and blue to its right. This broadly suggests that the model attributed a higher risk of short-term ventilation to overweight patients. Numerous highly weighted features were shared between the two algorithms, with an apparent focus on arterial blood gas derived data, including the fraction of inspired oxygen (FiO2), arterial partial pressure of oxygen (PaO2), pH, and base excess. Other laboratory derived data (worst plasma sodium, potassium, and lactate levels) and clinical observations (lowest systolic and diastolic blood pressure) were also shared between the two models. Minor differences included that the gradient boosted machine utilised pulse oximetry derived oxygen saturation (SaO2) whereas the random forest classifier did not, and, conversely, the random forest classifier gave relative importance to the arterial partial pressure of carbon dioxide (PaCO2) whilst the gradient boosted machine did not. The logistic regression coefficients are shown in Fig 3, with blue and red bar colours representing direct and indirect correlation, respectively, with the requirement for short-term ventilation. There were multiple prominent inputs from a linear standpoint that were not deemed important by the random forest classifier or gradient boosted machine. These were an array of both clinical (chronic kidney disease, wheeze, skin ulcers, diarrhoea) and demographic features (Aboriginal ethnicity, presence in a healthcare facility with documented COVID-19, close contact with a confirmed or suspected COVID-19 case).
That being said, a handful of inputs were deemed to be of high utility in both linear and non-linear modelling, particularly arterial blood gas derived values such as the arterial partial pressure of oxygen.
Whilst univariate analysis (see Tables 1 and 2) was not used for input filtration, the categorical (high flow nasal cannula therapy) and continuous (pH) inputs deemed to be of greatest significance by univariate analysis nonetheless featured as highly weighted inputs for all three of the gradient boosted machine, random forest classifier and logistic regression. Finally, neither age nor sex featured in the top 20 impactful features of the most performant algorithms in this investigation.

Discussion
This is the first study to leverage artificial intelligence/machine learning to identify readily available clinical risk factors for mechanical ventilation in COVID-19 patients admitted to ICU using Australian data [21]. The population in this study represents a 'grey-area' cohort deemed unwell enough for ICU admission who nonetheless did not require invasive ventilatory support on admission to ICU. The high-sensitivity (81%) AI-driven tools developed in this investigation empower institutions to anticipate resource allocation for COVID-19 patients at risk of requiring intubation in the short term.
Consensus guidelines on when to intubate patients with severe COVID-19 are lacking, and the decision to intubate is at present based on the discretion of the treating physician. Early in the pandemic, the Chinese Society of Anaesthesiology Task Force on Airway Management advocated early intubation of patients showing no improvement in respiratory distress and poor oxygenation (PaO2:FiO2 ratio < 150 mmHg) after two hours of high flow oxygen or non-invasive ventilation [22]. Concerns regarding aerosolising the virus with high-flow oxygen and non-invasive ventilation, with a subsequent increased risk to healthcare workers, further reinforced calls to intubate early [5]. More recently, there has been a shift away from protocolised early intubation. A French prospective multicentre observational study of 245 patients with severe COVID-19 categorised early intubation as within the first two days of ICU admission [23]. Patients in the early intubation cohort had higher rates of pneumonia and bacteraemia, longer ICU stays and increased 60-day mortality (weighted hazard ratio 1.784, 95% CI 1.07-2.83) [23]. A systematic review of 12 studies involving 8944 critically ill patients with COVID-19 found that the timing of intubation had no effect on morbidity or mortality [24]. In the absence of traditional evidence-based guidelines to guide the timing of intubation, machine learning algorithms have been proposed as a tool to inform this important clinical decision [11][12][13].
Utilising a supervised machine learning algorithm, Arvind et al. used 24-hour admission data to predict mechanical ventilation at 72 hours in 4,087 patients admitted to hospital in New York City (United States) with suspected or confirmed COVID-19 [13]. Using a random forest classifier they demonstrated a superior AUC of 0.84 [13]. In a retrospective study of 1,980 COVID-19 patients in Michigan (United States), Yu et al. used an XGBoost machine learning model to predict mechanical ventilation from emergency department data with a prediction accuracy of 86% (95% CI 0.03) and an AUC of 0.68 [12]. In a single-centre prospective observational study of 198 patients admitted to an Infectious Disease Clinic in Modena (Italy), Ferrari et al. applied GBM machine learning to predict mechanical ventilation with a superior AUC of 0.84 [25]. Finally, Heldt et al. applied machine learning to inpatient data from 879 confirmed COVID-19 patients in London (United Kingdom) to predict the risk of ICU admission, the need for mechanical ventilation and death [11]. Prediction performance was best with the random forest and XGBoost models, with AUCs of 0.87. The algorithms developed in this study are the first to use Australian data to predict outcomes in critically ill patients with COVID-19. The performance of our GBM model, with an AUC of 0.68 and sensitivity of 0.81, is inferior to what has been reported internationally [11][12][13][25]. This is not surprising; our population comprised critically ill patients who had already deteriorated to the point of requiring admission to the intensive care unit, as opposed to previous machine learning models which were developed on patients in the emergency department or hospital ward [11][12][13]. By virtue of a greater severity of illness at baseline, we hypothesise that any signals for deterioration to requiring mechanical ventilation are more diluted in our critically ill cohort.
Strengths of our study include that it was performed using readily available data from a national database in which data collection was performed by experienced research staff using a standardised case report form. The follow-up rate was high with complete data for the primary outcome of invasive ventilation. Our study also represents a unique high acuity cohort for AI modelling of mechanical ventilation risk. Whereas previous studies modelled data from COVID-19 patients in the emergency department [12] and/or hospital ward [11,13] our cohort were exclusively patients admitted to intensive care. Additionally, it has been shown that the interpretability of the results for time-constrained decision-makers are critical success factors when attempting to integrate automated processes into clinical tasks [26]. Advances in explanatory modelling systems, such as Shapley additive values [20] utilised in this investigation, increase 'black box' transparency and thus clinical interpretability. Taken together, these models highlight the potential for artificial intelligence/machine learning to guide clinical decision making across an array of hospital settings.
There are, however, important limitations. Firstly, we restricted our prediction window to a 72-hour interval as per Arvind et al. [13]. This meant that a proportion of patients in our cohort who eventually required invasive ventilation beyond day three of their ICU admission were not detected by the model (26/86, 30.2%). This shortened forecasting window was deemed appropriate given that the median time to ventilation was two days (IQR 2-4 days) and events beyond three days were thought to have a weaker mechanistic link to variables collected on the day of admission [13]. Nevertheless, these models do not predict the risk of mechanical ventilation throughout the entire ICU admission. Secondly, increasing the complexity of ML models, especially in the context of smaller datasets such as that utilised in this investigation, can cause overfitting [27]. Although our investigation attempted to address overfitting via active and appropriate choices of pre-training, hyper-parameter selection, and regularisation [28], all models evaluated in the study suffered from overfitting, as indicated by performance discrepancies between the training and test sets during cross-validation (see Appendix A of the S1 File).
Cross-validation during hyperparameter optimisation was also not nested, potentially posing a source of bias [29]. These limitations may impact the external validity of these models. Additionally, despite a rigorous and highly protocolised data collection process, the degree of missingness was high for a selection of the variables. The clinical design of this investigation, however, supported the assumption that these values were missing at random, justifying the implementation of conventional imputation. We tested six of the most commonly used classes of machine learning algorithms, chosen based on their clinical utility in predicting patient deterioration in critical care settings [11][12][13][25]. We acknowledge that there is a multitude of high-performing machine learning algorithms with clinical and medical informatics utility, and we cannot exclude that these additional classes would have superior predictive ability [30][31][32]. During the capture period Australia experienced two distinct 'waves': an initial wave from 27 February to 30 June 2020 and a second wave from 1 July 2020 to 7 March 2021. Due to an insufficient sample size, we were unable to undertake a time-period analysis by COVID wave. Furthermore, we were not able to compare the model to the current standard, namely intensivist prediction of mechanical ventilation. Machine learning models may perform better than, worse than, or the same as experienced clinician gestalt.
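The nested cross-validation identified above as the unbiased alternative can be sketched with scikit-learn by wrapping the grid search (inner loop) inside an outer scoring loop; the model, grid and data here are illustrative, not the study's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data mimicking the roughly 80/20 class split of the cohort.
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2],
                           random_state=0)

# Inner loop: hyper-parameter tuning by AUC on the training folds only.
inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     {"max_depth": [3, None]}, scoring="roc_auc", cv=3)

# Outer loop: scores the whole tuning procedure, so the reported AUC is
# not optimistically biased by tuning and evaluating on the same folds.
outer_auc = cross_val_score(inner, X, y, scoring="roc_auc", cv=5)
```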

Conclusions
ML models based on readily available demographic, observational and laboratory data can reliably predict short-term requirements for invasive ventilation in Australian patients with COVID-19 who were not intubated on day one of their ICU admission.

Writing - original draft: Roshan Karri, Mark P. Plummer.