Development and validation of a prediction model for tocilizumab failure in hospitalized patients with SARS-CoV-2 infection

Background The aim of this secondary analysis of the TESEO cohort is to identify, early in the course of treatment with tocilizumab, factors associated with the risk of progressing to mechanical ventilation and death and develop a risk score to estimate the risk of this outcome according to patients’ profile. Methods Patients with COVID-19 severe pneumonia receiving standard of care + tocilizumab who were alive and free from mechanical ventilation at day 6 after treatment initiation were included in this retrospective, multicenter cohort study. Multivariable logistic regression models were built to identify predictors of mechanical ventilation or death by day-28 from treatment initiation and β-coefficients were used to develop a risk score. Secondary outcome was mortality. Patients with the same inclusion criteria as the derivation cohort from 3 independent hospitals were used as validation cohort. Results 266 patients treated with tocilizumab were included. By day 28 of hospital follow-up post treatment initiation, 40 (15%) underwent mechanical ventilation or died [26 (10%)]. At multivariable analysis, sex, day-4 PaO2/FiO2 ratio, platelets and CRP were independently associated with the risk of developing the study outcomes and were used to generate the proposed risk score. The accuracy of the score in AUC was 0.80 and 0.70 in internal validation and test for the composite endpoint and 0.92 and 0.69 for death, respectively. Conclusions Our score could assist clinicians in identifying, early after tocilizumab administration, patients who are likely to progress to mechanical ventilation or death, so that they could be selected for eventual rescue therapies.


Introduction
After 10 months of SARS-CoV-2 pandemic, it is still unclear which should be the best standard of care for the treatment of COVID-19. On the basis of large randomized clinical trials, only glucocorticoids were shown to be able to reduce mortality, especially among patients undergoing mechanical ventilation [1]. Nevertheless, mortality rate in the RECOVERY trial remained high at 20%, implying that salvage therapies are needed after the failure of this treatment [1]. A few immunomodulatory agents, mainly tocilizumab, anakinra and baricitinib, have been tested in observational studies on the basis of their possible activity against the cytokine storm characterizing more severe COVID-19 clinical pictures and results have been encouraging [2][3][4][5]. Tocilizumab is a recombinant humanized monoclonal antibody, of the IgG1 class, directed against both the soluble and the membrane bound IL-6 receptor [6]. A systematic review and meta-analysis of 10 observational studies including 1358 patients demonstrated that mortality was 12% lower for COVID-19 patients treated with tocilizumab compared to patients who were not treated. The number needed to treat was 11, suggesting that for every 11 COVID-19 patients treated with tocilizumab, 1 death could be prevented [7]. These results were confirmed by two more recently published meta-analyses including a larger number of studies [8,9].
Tocilizumab was also tested in randomized clinical trials with discordant results. A pharmaceutical company press releases showed no evidence of clinical improvement and reduced risk of mortality when comparing patients treated with tocilizumab vs. placebo in the double-blind COVACTA trial [10], but final results remain unpublished. Similar negative results were obtained prescribing tocilizumab to patients with mild to moderate COVID-19 pneumonia in the attempt to prevent the cytokine storm [11,12]. In contrast, a 44% reduced risk of mechanical ventilation in people treated with tocilizumab was observed in the EMPACTA trial which included patients with a greater severity of disease at enrollment and a significant enrichment in Hispanic, Native American and Black ethnic minorities as compared to the target population examined in COVACTA [13]. Additionally, CORIMUNO, a French multi-centre randomized clinical trial including 129 patients (65 standard of care + 64 tocilizumab), also released a statement regarding the beneficial effect of tocilizumab vs. standard of care in reducing mechanical ventilation and/or mortality at 14 days [14]. More recently, the REMAP-CAP trial conducted in critically ill patients showed a reduction in mortality in patients receiving either tocilizumab or sarilumab. As a consequence of these results, tocilizumab is now recommended for use in the clinics for patients in critical conditions in the UK [15].
Discordant results may be explained by different inclusion criteria, study methods, faulty interpretation of trial results when sample size was small (confusing 'no statistical significance' with 'no effect of the drug') and overall clinical competence in managing severe COVID-19 patients. Indeed, even before the publication of the REMAP-CAP results, the drug has been largely used in clinical practice worldwide as salvage treatment for critical pneumonia in patients failing standard of care. Nevertheless, there is still an appreciable percentage of patients who fail to respond to tocilizumab, for example in the TESEO cohort by day 28, 15% of patients treated with tocilizumab underwent mechanical ventilation or death [4].
Therefore, as many more people will be now treated with this drug, it is crucial to early identify patients who are likely to fail tocilizumab, so that treatment could be intensified or modified. The aim of this analysis was to identify early predictors of tocilizumab failure in order to develop and validate a risk score to predict mechanical ventilation or death within 28 days of follow-up.

Methods
This is a secondary analysis of the Tocilizumab in Patients with Severe COVID-19 Pneumonia (TESEO) Study [4], restricted to people treated with tocilizumab. It is a retrospective, observational cohort study done in three tertiary care hospitals in the Emilia-Romagna region, Italy, on patients with COVID-19 severe pneumonia. All consecutively enrolled adult patients (�18 years) with severe COVID-19 pneumonia, defined by the presence of at least one of the following: a respiratory rate (RR) � 30 breaths per minute (bpm), peripheral blood oxygen saturation (SaO 2 ) � 93%, a PaO 2 /FiO 2 ratio < 300 mmHg in room air and lung infiltrates >50% within 24-48 hours were included in the study [16,17].
All patients with moderate and severe COVID-19 disease requiring hospital admission were considered for analysis. Patients were admitted from 12 March 2020 to 28 July 2020. The retrospective data were fully anonymized (the only sensitive data was year of birth) and were assessed on 9 September 2020. The sample of patients analyzed is representative, as the majority of COVID-19 cases during the first wave of epidemic were observed in Lombardy, Veneto and Emilia-Romagna.
A non-randomly selected subset of patients was selected to receive tocilizumab after failing standard of care (SoC) of hydoxycloroquine, lopinavir and low molecular weight heparin. Tocilizumab was given at the dose of 8 mg/kg given intravenously twice 12 hours apart and subcutaneously at a dose of 162 mg administered in two simultaneous doses. The study was approved by the Regional Ethical Committee of Emilia Romagna. This analysis focuses on the subset of people in TESEO who were treated with SoC+ tocilizumab.

Statistical analysis
For a subset of markers collected in at least one of the tertiary care hospitals contributing data to this analysis, we defined the following 3 time points: baseline, day4 and day9 values. Baseline was defined as the most recent value collected prior to the initiation of tocilizumab. Day 4 value was the most recent value in the time window [+2;+6] days from baseline and day 9 was defied as the most recent value in the time window [+7;+11] days from tocilizumab infusion. We also calculated the two changes, as compared to baseline levels, at day 4 and day 9, respectively.
The main outcome was initiation of mechanical ventilation or death by day 28 from the date of starting tocilizumab and the secondary outcome was death. Of note, at day 6 from starting tocilizumab participants had to be alive and free from invasive mechanical ventilation to be included in the analysis. This is because day6 was the upper limit of the defined day4 window. A similar selection was used for the day9 values but not used for the main analysis. People who developed the event over follow-up and up to day 28 from starting the drug, were labelled as cases and the remaining controls.
Proportion of females was described and compared by case-control status. Mean and standard deviation (SD) of age, sequential organ failure assessment (SOFA) score and a set of biomarkers at each of the 3 time points were also compared in cases and controls. Chi-square test was used to compare proportions and unpaired t-test was used to compare mean values.
Factors who were most strongly associated (by chi-square and t-test) with the primary endpoint in univariable analysis were selected as candidates to be included in a prediction score. Kaplan-Meier method was used to estimate the percentage of patients still event-free starting from 4 days after tocilizumab initiation.
Univariable and multivariable associations between these factors and the risk of outcome were estimated by means of logistic regression and the magnitude of the association expressed by means of Odds Ratios (OR).
Participants were grouped according to the median value of the markers and ORs of the primary endpoint associated with a value below/above the median were shown.
In order to evaluate the goodness of fit of the model and its ability to predict the outcomes, the area under the receiver operating characteristic (ROC) curve was calculated. This was first calculated in the training set (including all patients in TESEO) and compared to the classifier with AUC = 0.5 using a Mann-Whitney statistics and chi-square test. In order to control for extra-sample variation, a leave-one-out cross validation (CV) was implemented. This amounts to a K-fold cross validation, with K equal to the total number of participants N. This means that N separate times, the function approximator is trained on all the data except for one point and a prediction is made for that point. The AUC in CV was computed and used to evaluate the predictive ability of the model by comparing it with the value obtained on the training set.
In addition, we used an independent sample of patients seen in other three Italian hospitals with identical inclusion criteria and definition of outcomes (Niguarda-Milano, INMI-Rome and Padua) to test the score on external data (the test set).
Symmetrical analyses were performed for the primary and secondary endpoints. For the purpose of simplicity and to broadly categorize the risk of failing tocilizumab, the study population was divided into those with low (0-10%), moderate (11-20%), and high (>20%) risk of day-28 mechanical ventilation/death. These groups were matched to actual risk ranges calculated from the propensity score formula below (1). Exact individual's risk can be calculated as per this formula by entering the individual's own demographics and biomarkers values.

IRB approval and data information
The study was approved by Regional ethical committees of Emilia Romagna (Modena and Bologna), Lombardy (Niguarda-Milano), Lazio (INMI-Rome) and Veneto (Padua).
In Modena and Bologna all tocilizumab-treated patients provided verbal, not written, informed consent due to isolation precautions, while the patients from Niguarda-Milano, INMI-Rome and Padua provided written informed consent.

Results
Preliminary exploration analyses in the whole dataset of 323 individuals treated with tocilizumab at the Clinics of Infectious Diseases of Modena, Reggio Emilia and Bologna identified the day-4 values for the markers to have the strongest association with the primary outcome (S1 Table). We consequently restricted the analysis to the subset of 266 patients who at day 6 after starting the treatment were still alive and free from mechanical ventilation and for whom day-4 PaO 2 /FiO 2 ratio, platelets and C-reactive protein (CRP) values were also available. By day 28 of hospital follow-up post treatment initiation, 40 of these (15%) were put under mechanical ventilation or died. Of these events, 26 (10%) were deaths. Table 1 shows the main demographic characteristics and average markers values recorded at baseline, day 4 and day 9 after starting tocilizumab, as well as day-4 and day-9 marker changes from baseline. The PaO 2 /FiO 2 ratio was greater in controls at baseline and remained stable over day0-day9 while in controls there was an appreciable deterioration over time from 221 to 157 mmHg (a value approximating the indication for mechanical ventilation) ( Table 1 In a screening univariable analysis, gender and day-4 PaO 2 /FiO 2 ratio, platelets and CRP were the factors showing the strongest association with the composite outcome of day-28 mechanical ventilation or death. Baseline SOFA score and respiratory rates, other markers of COVID-19 disease severity, were also strongly associated with the risk of outcome but were not further considered for the construction of the tocilizumab response prediction score because too correlated with the day-4 PaO 2 /FiO 2 ratio value. Other markers considered (IL-6, D-dimer and total lymphocytes at the various time-points) showed less strong associations and were also discarded at this stage. Table 2 shows the magnitude of the effect of the four factors selected at univariable analysis from fitting a logistic regression model of the day-28 odds of developing the composite event.
In the unadjusted analysis, all factors were associated with at least a 2-fold difference in risk of developing the outcome, results which were confirmed after controlling for gender alone and after full mutual adjustment (Table 2).
When these variables were fitted as binary variables classifying patients according to whether they had a value above or below the median of the study population values, day-4 PaO 2 /FiO 2 ratio showed the largest association with an aOR of developing the composite endpoint of 18.9 (95% CI: 4.14-86.6, p<0.001, Table 3) comparing people with values below and above the median of 209 mmHg in PaO 2 /FiO 2 .
The prediction score was constructed as a linear predictor of the four factors identified in the screening phase with sex as a binary variable and the 3 biomarkers fitted in the log10 scale. The area under the curve (AUC) of the ROC curve using the training dataset was AUC = 0.89, showing a good trade-off between sensitivity and specificity (S1 Fig). As expected, as the performance of the model is overestimated in training, the AUC was significantly better than a random classification of the participants (Mann-Whitney test vs. AUC = 0.5, p-value<0.001). Of note, the internal validation, by means of CV, provided only a slightly lower value for AUC = 0.87 (Fig 1).
On the basis of the estimates of the logistic regression analysis, a simplified prediction score has been developed allocating weights to each of the four components of the score. Thus, a person with a day-4 platelet value above the median was given a score of +3, female gender and a day-4 CRP value above the median both a score of +4 and a day-4 PaO 2 /FiO 2 ratio value above the median the largest weight of +6 (Table 4). The final score for a person is then obtained by summing the weights of each of the four variables. This sum total score was used to create three risk groups as follows: low risk (total score 0-4, 0-10% risk), intermediate risk (total score 5-9, 10-20% risk) and high risk (total score �10, >20% risk). Thus, for a virtual participant who is male (+4), with a CRP> 1.24 mg/dL (+4), a PaO 2 /FiO 2 ratio >210 mmHg (0) and platelets> 334 cells/mm 3 (0) the sum risk score would be 4+4 = +8, placing her in the intermediate risk category. The exact propensity risk for this person estimated from the logistic regression model is 12.4% (it can be calculated from the formula (1) described in the Methods and shown again at the bottom of Table 4). On average, 40/106 (38%, 95% CI 29%-48%) of individuals in the training dataset classified as having a high risk of mechanical ventilation or death experienced this composite event by day 28 vs. 15% and 4% in the intermediate and low risk groups, respectively.
The predictive value of the proposed score was higher when we used death alone as the endpoint (n = 26 events). AUC was 0.94 in training and 0.92 in cross-validation (S3 Fig). We also put together an external validation dataset which included patients treated with tocilizumab in 3 independent clinics in Italy and satisfying the same inclusion criteria and endpoint definitions used for the training population. This included 36 patients treated at INMI Spallanzani Roma, 29 at Niguarda hospital Milano and 20 at Padua hospital for a total of 85 additional participants. In this set, 17% were females, median (IQR) PaO 2 /FiO 2 ratio, platelets and CRP were 223 mmHg (172-329), 294 cells/mm 3 (248-383) and 1.0 mg/dL (0.4-4.9), respectively. In total, 11 people experienced mechanical ventilation or death, of whom 7 died. AUC in test were 0.70 for the composite endpoint (Fig 2)   we are 95% confident that the score could be successfully applied to predict the outcome of a minimum of 88% of the patients who started the drug.

Discussion
Despite the TESEO study estimates of a substantial reduction in risk of mechanical ventilation in patients treated with SoC+tocilizumab vs. SoC [4], still a non-negligible 15% (40/266) of patients in the tocilizumab group experienced the composite endpoint. The present study gives a tool for early recognition of treatment failure. We propose a novel predictive model which assumes the knowledge of patients' sex and whether the values of one blood gas analysis and two biomarkers measured at day 4 after tocilizumab administration are above or below a certain threshold: PaO 2 /FiO 2 ratio <210 mmHg, CRP >1.23 mg/dL and platelets <333 cells/mm 3 . Our model and risk score had high predictive accuracy and performed similarly in patients treated at different hospitals in Italy, providing independent validation.
As an example, for a virtual male patient who, 4 days after tocilizumab administration, still has relatively preserved respiratory function (e.g. with a PaO 2 /FiO 2 ratio> 210 mmHg, a value well above the indication for starting invasive mechanical ventilation) the score produces an estimated risk of >20% of failing to respond to the drug if her CRP is >1.24 mg/dL and platelets are >333 cells/mm 3 (Table 4).
In support of the composition of our risk score, a recent meta-analysis showed that tocilizumab treatment was associated with reduction in a number of biomarkers including C-reactive protein [18]. Absolute values at baseline did not predict and, although the greatest change was observed at day 9, both absolute values and the changes from baseline at day 9 showed weaker associations with the outcomes and were not included in the algorithm.
Several risk score to predict mortality or respiratory failure in the whole population of patients admitted to hospital with COVID-19 disease have been published, using a large number of variables and a variety of statistical approaches [19][20][21][22]. In these studies, detection of clinical worsening after first-or second-line therapies, seemed to be more challenging to forecast [19]. To our knowledge, our risk score is the first algorithm derived specifically for people treated with tocilizumab. To build the prediction model, we have used our previous experience in developing risk scores in the field of HIV resistance which was easily transferable to the field of COVID-19 research [23]. Our study has several limitations. First, we included a selected population of patients who were still alive and with no need for mechanical ventilation 6 days after starting tocilizumab. This selection may have resulted in an underestimation of the overall mortality risk and our proposed score is only applicable to a similar target population. Of note, the specific inclusion criteria hardly affected the process of variables selection. Second, we cannot rule out that a full machine learning approach with a larger number of parameters as well as their interactions could have led to better values for the AUC in validation and test. For example, the score does not account for extent of pre-existing co-morbidities. On the other hand, the simplicity and transparency should make the score more widely applicable in clinical practice. Third, sample size of the test cohort was not large and the AUC in test for the endpoint death was below 70% which is classified as the minimum for a fair predictive value. Also, data collection was not complete for some biomarkers at all centers and we performed a complete case analysis excluding those with missing data. Another limitation is the fact that the score cannot be calculated before day 4 although the average time from admission to ICU appears to be even shorter at 3.5 days [16]. Nevertheless, the data from our cohort showed that up to 88% of patients were still free from mechanical ventilation and death by day 4 after starting tocilizumab. However, the calibration performance of the prediction model was not assessed and the score should be further externally validated in the future using other test sets of data collected during the more recent waves of the pandemic. Finally, in the real-life setting people, future target populations are likely to be people who have failed glucocorticoids and not the drugs considered as standard of care at the time of this study (e.g. chloroquine) and this might or might not have an impact on the predictive ability of the score.
In conclusion, our study provides an important tool that could assist clinicians in identifying early in the course of treatment with tocilizumab, patients who are likely to progress to mechanical ventilation or death. This algorithm may also rapidly identify patients who qualify for rescue therapies with new promising strategies such as monoclonal antibodies. Of note, the score highlights the fact that it is not only respiratory function but the combination of different factors routinely tested in clinical practice that could alert clinicians to change strategy. We believe that our proposed score could be easily implemented in clinical practice and has the potential to be useful for many more patients now that tocilizumab is recommended for use in those critically ill.