We collected a multi-centric retrospective dataset of patients (N = 213) who were admitted to ten hospitals in the Czech Republic and tested positive for SARS-CoV-2 during the early phases of the pandemic in March–October 2020. The dataset contains baseline patient characteristics, breathing support required, pharmacological treatment received and multiple markers at daily resolution. Patients in the dataset were treated with hydroxychloroquine (N = 108), azithromycin (N = 72), favipiravir (N = 9), convalescent plasma (N = 7), dexamethasone (N = 4) and remdesivir (N = 3), often in combination. To explore the association between treatments and patient outcomes we performed a multiverse analysis, observing how the conclusions change between defensible choices of statistical model, predictors included in the model and other analytical degrees of freedom. Weak evidence to constrain the potential efficacy of azithromycin and favipiravir can be extracted from the data. Additionally, we performed external validation of several proposed prognostic models for Covid-19 severity, showing that they mostly perform unsatisfactorily on our dataset.
Citation: Modrák M, Bürkner P-C, Sieger T, Slisz T, Vašáková M, Mesežnikov G, et al. (2021) Disease progression of 213 patients hospitalized with Covid-19 in the Czech Republic in March–October 2020: An exploratory analysis. PLoS ONE 16(10): e0245103. https://doi.org/10.1371/journal.pone.0245103
Editor: Aleksandar R. Zivkovic, Heidelberg University Hospital, GERMANY
Received: December 24, 2020; Accepted: July 15, 2021; Published: October 6, 2021
Copyright: © 2021 Modrák et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data that support the findings of this study are available on request from the corresponding author or the secretariat of the Institute of Microbiology of the Czech Academy of Sciences (contact via email@example.com) for researchers who meet the criteria for access to confidential data. The data are not publicly available due to privacy restrictions imposed by the Ethical committee of General University Hospital in Prague and the GDPR regulation of the European Union. We will be happy to arrange to run any analytical code locally and share the results, provided the code and the results do not leak personal information.
Funding: MM was supported by ELIXIR CZ research infrastructure project (Ministry of Youth, Education and Sports of the Czech Republic, Grant No: LM2018131, https://www.msmt.cz/) including access to computing and storage facilities. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The Covid-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has, as of June 2021, led to over 172 million cases and over 3.7 million deaths. The present study was designed and conducted during March–October 2020, when the Czech Republic experienced a relatively mild first wave of the pandemic due to early and strict lockdowns. Low numbers of cases continued throughout the summer, but during September and October, after most of the data collection for this study concluded, the number of cases was rising again. On October 1st 2020, the Czech Republic had accumulated 74283 total confirmed Covid-19 cases and 704 confirmed Covid-19 related deaths.
At the time the study was conducted, the proposed treatments included antivirals approved for other indications (chloroquine, hydroxychloroquine, lopinavir/ritonavir, remdesivir, favipiravir, umifenovir), azithromycin, corticosteroids, immunoglobulins, tocilizumab and convalescent plasma [2, 3]. Notably, the anti-malarial and anti-rheumatic drug hydroxychloroquine and the macrolide antibiotic azithromycin showed promise in early data, were broadly available, and were thus frequently used in the early stages of the pandemic. Remdesivir, previously designed and approved for Ebola, SARS and MERS, also showed good initial results. However, during spring and summer 2020 remdesivir was available in the Czech Republic only in limited amounts via a compassionate use programme. The RECOVERY trial reported positive results of the corticosteroid dexamethasone for severe cases in June 2020, but at this point the number of Covid-19 patients hospitalized in the Czech Republic was low and dexamethasone thus did not see wider use until later in the pandemic.
Our understanding of the efficacy of Covid-19 treatments has improved substantially since the present study was conducted. As of April 2021, the pharmacological treatments that were deemed to be beneficial for at least one outcome in a systematic review of randomized trials were the corticosteroid dexamethasone (mortality, mechanical ventilation), colchicine (mortality, length of hospital stay), the antiviral remdesivir (mechanical ventilation), Janus kinase inhibitors (mechanical ventilation, duration of ventilation), IL-6 inhibitors (mechanical ventilation, length of hospital stay), the antiviral favipiravir (length of hospital stay, resolution of symptoms) and the anti-androgen proxalutamide (admission to hospital). Hydroxychloroquine, interferon beta, lopinavir-ritonavir, azithromycin, vitamin C, vitamin D, anticoagulants and ACE inhibitors were considered to be no better than standard of care, and lopinavir-ritonavir showed evidence of harm, although most of the conclusions were considered to be of low certainty. Interestingly, in observational studies, hydroxychloroquine was often found to be associated with better outcomes [6–8]. Likewise, no benefit was observed in a meta-analysis of randomized trials of convalescent plasma treatment.
High IL-6 and D-dimer values were observed to be associated with worse outcome and increased disease severity. A large study of electronic health records showed an increase in C-reactive protein in early disease and an increase of D-dimer and white blood cell count in later stages of the disease.
An ongoing challenge in evaluating Covid-19 treatments is that the analysis and interpretation of the data is often inappropriate or misleading, most notably interpreting lack of evidence due to small sample size as evidence of no effect [12, 13].
Additionally, many methods for predicting disease severity of Covid-19 were published, but the methods are at high risk of bias and lack external validation.
The present study aims to describe the outcomes and disease course of hospitalized patients with mild to severe clinical presentation in a multicentric Czech cohort during the early stages of the pandemic, to explore the association between the outcomes and pharmacological interventions, and to provide external validation of previously published prognostic models for Covid-19 severity.
Patients and study design
A convenience sample of patients from 10 sites was collected. The study sites span the whole spectrum of sizes, from large university hospitals in major cities with multiple dedicated Covid-19 wards (Thomayer University Hospital in Prague, Motol University Hospital in Prague, Královské Vinohrady University Hospital in Prague, General University Hospital in Prague, University Hospital in Pilsen) through major regional/specialized hospitals (Na Homolce Hospital in Prague, Military Hospital Olomouc) to smaller hospitals caring for just several Covid-19 patients at a time (AGEL Hospital Nový Jičín, Hořovice Hospital, Třebíč Hospital). The sites were chosen based on availability and willingness of the personnel to participate in data collection. None of the study sites was exclusively dedicated to treating Covid-19 patients. For each site, the dataset contains all patients hospitalized in the participating wards over the data collection period. The data collection started at the onset of the Covid-19 pandemic in March 2020 (except for one site where some older records were inaccessible), but the end date for collection differs between sites due to time constraints of the participating physicians. Three sites included a total of 23 patients that could be considered part of the “second wave” (admitted after September 1st). The last patient included in the dataset was admitted on October 12th. See S1 Fig for per-site data collection periods. Patients over the age of 18 were included if they had PCR-confirmed infection with SARS-CoV-2 and were not participating in a clinical trial of any Covid-19 pharmacotherapy.
Not all patients developed pneumonia or other symptoms of Covid-19. All patients received the standard of care, which could include supplemental oxygen, ventilation and antibiotics for bacterial superinfections, as determined by the attending physician. Some patients were not indicated for all treatment modalities (especially mechanical ventilation) based on the decision of the attending physician and the underlying patient condition. We note that the participating sites were not homogeneous in either patient population or treatment protocols. The choice of pharmacological treatment was based on the decision of the attending clinician and the availability of the treatment.
The study was approved by the Ethical committees of General University Hospital, Hospital Nový Jičín, Motol University Hospital, Thomayer Hospital, University Hospital Vinohrady, Military Hospital Olomouc, Na Homolce Hospital, University Hospital in Pilsen, Hořovice Hospital and Jihlava Hospital. All data were collected in fully anonymized form. Data were collected between June and October 2020 for patients that were treated between March and October 2020.
We collected data on comorbidities and information about disease progression at daily resolution, including breathing support required, oxygen flow rate, experimental anti-Covid-19 and antimicrobial drugs taken and several laboratory markers (PCR positivity for SARS-CoV-2, C-reactive protein, D-dimer, Interleukin 6, Ferritin, lymphocyte count). The full protocol for data collection is attached in S1 File and the data collection table in S2 File. Due to the very low number of patients using extra-corporeal membrane oxygenation (N = 1) or non-invasive positive pressure ventilation (N = 6) in our sample, we merged those categories with mechanical ventilation.
The character of the convenience sample does not allow for a proper assessment of the association between treatments and patient outcomes, because the treatments had not been assigned to patients at random but were only observed retrospectively. This can be partially remedied by adjusting for patient characteristics in the analysis, but such adjustments will always be imperfect and the analysis needs to be treated as exploratory and interpreted cautiously.
Since many details of the analysis may influence the conclusions made, we performed a multiverse analysis and report results for all the hypotheses tested across multiple different models using both frequentist and Bayesian paradigms. For each model class we worked with several possible sets of adjustments. All analyses were performed in the R language; visualization and data cleaning were run with the tidyverse package.
The first class of models comprises frequentist survival and multistate models under the proportional hazards assumption, as implemented in the coxph function from the survival package. We primarily use a model with competing risks for death and discharge from hospital (see Fig 1a).
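The bookkeeping behind such a multiverse can be sketched as a grid of model specifications. The snippet below is only an illustration in Python (the study itself used R); the model labels, adjustment sets and the function name `enumerate_multiverse` are hypothetical placeholders, not the specifications actually fitted, which are described in S3 File.

```python
from itertools import product

# Hypothetical labels for illustration only; the actual model classes and
# adjustment sets used in the study are listed in S3 File.
MODEL_CLASSES = ["cox_competing", "hmm", "bayes_categorical_day28"]
ADJUSTMENT_SETS = [
    ("age", "sex"),
    ("age", "sex", "site"),
    ("age", "sex", "site", "comorbidities"),
]
TREATMENTS = ["hydroxychloroquine", "azithromycin", "favipiravir"]


def enumerate_multiverse(models, adjustments, treatments):
    """Return one specification per (model, adjustment set, treatment)."""
    specs = []
    for model, adjust, treatment in product(models, adjustments, treatments):
        specs.append({
            "model": model,
            "treatment": treatment,
            "adjust_for": list(adjust),
        })
    return specs


specs = enumerate_multiverse(MODEL_CLASSES, ADJUSTMENT_SETS, TREATMENTS)
# 3 model classes x 3 adjustment sets x 3 treatments = 27 specifications
print(len(specs))
```

Each specification would then be fitted separately and its estimate for the treatment reported alongside all the others, rather than selecting a single preferred model.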
AA = Ambient air, Oxygen = Nasal oxygen, Ventilated = any form of ventilation (non-invasive positive-pressure ventilation, mechanical ventilation and extra-corporeal membrane oxygenation). In all models the ‘Death’ and ‘Discharged’ states are terminal. In the second hidden Markov model (c), the ‘Improving’ and ‘Worsening’ variants of each non-terminal state are not observable—only the breathing support is observed and improving/worsening is inferred from progression of the disease.
The second class of models comprises Bayesian hidden Markov models (HMM) of disease progression implemented via a custom extension to the brms package. The parametrization of the HMM is inspired by Williams et al.: the actual process of disease is assumed to be continuous and to allow only for transitions between neighboring states (as shown in Fig 1b and 1c). The total probability of transition between any two states over the period of a day is then computed as the total probability of transition across all possible paths. This class of models does not satisfy the proportional hazards assumption. Instead, it is assumed that the process has the Markov property, i.e., that the (potentially unobserved) state and the covariates at a given day contain all the information needed to determine the probabilities of the states on the next day. We use two versions of such models, one working solely with the observed breathing support and one assuming a hidden improving/worsening distinction. All of the hidden Markov models take into account whether best supportive care was initiated and a patient was thus not indicated to progress to more intensive treatment modalities.
Finally, we used a set of Bayesian regression models implemented with the brms package. Those included overall survival, state at day 7 or 28 as either a binary or a categorical outcome, and a Bayesian version of the Cox proportional-hazards model.
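One common way to obtain daily transition probabilities "across all possible paths" from a continuous-time process is to exponentiate a rate matrix. The sketch below illustrates that idea in plain Python under stated assumptions: the state layout, the rate values and the helper names are hypothetical, and this is not the brms extension used in the study.

```python
STATES = ["Discharged", "AA", "Oxygen", "Ventilated", "Death"]

# Hypothetical per-day transition rates, allowed between neighbouring states only.
RATES = {
    ("AA", "Discharged"): 0.15,
    ("AA", "Oxygen"): 0.10,
    ("Oxygen", "AA"): 0.20,
    ("Oxygen", "Ventilated"): 0.05,
    ("Ventilated", "Oxygen"): 0.08,
    ("Ventilated", "Death"): 0.04,
}


def rate_matrix(states, rates):
    """Build a generator matrix Q whose rows sum to zero."""
    n = len(states)
    index = {s: i for i, s in enumerate(states)}
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        q[index[src]][index[dst]] = rate
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this is minus the exit rate
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t=1.0, squarings=10, terms=12):
    """exp(Q * t) via a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


Q = rate_matrix(STATES, RATES)
P = mat_exp(Q)  # one-day transition probabilities
```

In this toy setup `Q` has no direct rate from `AA` to `Ventilated`, yet the corresponding entry of `P` is positive, because a patient can pass through `Oxygen` within the same day. The terminal states have all-zero rows in `Q` and therefore stay put with probability one.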
Except for age, sex and comorbidities, all covariates are treated as time-varying, e.g., the effect of taking a drug is only included for the days after the drug was taken. More details on the exact model formulations can be found in the supplementary statistical analysis in S3 File.
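The time-varying treatment indicator can be illustrated as follows. This is a minimal Python sketch, assuming that the indicator switches on from the day of the first dose onward (the exact convention used in the study is given in S3 File); both function names are hypothetical.

```python
def daily_treatment_indicator(n_days, first_dose_day):
    """Return a 0/1 list over hospital days 0..n_days-1.

    first_dose_day is None for patients who never received the drug.
    Assumption: the indicator is 1 from the day of the first dose onward.
    """
    if first_dose_day is None:
        return [0] * n_days
    return [1 if day >= first_dose_day else 0 for day in range(n_days)]


def to_intervals(indicator):
    """Collapse a daily 0/1 sequence into (start, stop, value) intervals.

    This is the counting-process layout commonly used to pass
    time-varying covariates to Cox-type models.
    """
    intervals = []
    start = 0
    for day in range(1, len(indicator) + 1):
        if day == len(indicator) or indicator[day] != indicator[start]:
            intervals.append((start, day, indicator[start]))
            start = day
    return intervals


# A patient hospitalized for 5 days who started the drug on day 2.
indicator = daily_treatment_indicator(5, 2)
intervals = to_intervals(indicator)
```

For this patient the daily indicator is `[0, 0, 1, 1, 1]`, which collapses to one untreated interval covering days 0 to 2 and one treated interval covering days 2 to 5.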
Evaluating prognostic models
We searched the living systematic review of Covid-19 prognostic models for those that could be applied to our dataset (i.e., where we have gathered all the input features). We primarily focused on the Area Under the Receiver Operating Characteristic Curve (AUC) and its bootstrapped 95% confidence intervals, which we computed using the pROC package. When there were multiple reasonable ways to evaluate the outcome or a predictor in our dataset, we computed and reported all of those options. We used two simple scores with age or the decade of age as the sole predictor to have a baseline to compare the scores against.
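For readers unfamiliar with the metric, the AUC and a bootstrapped interval can be sketched in a few lines. The Python code below is an illustration only: it uses a plain stratified percentile bootstrap, which may differ in detail from what pROC does, and the ages and outcomes are synthetic values invented for the example, not study data.

```python
import random


def auc(scores, outcomes):
    """Probability that a random positive case scores higher than a random
    negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are needed")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, outcomes, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling cases and controls separately."""
    rng = random.Random(seed)
    pairs = list(zip(scores, outcomes))
    pos = [p for p in pairs if p[1] == 1]
    neg = [p for p in pairs if p[1] == 0]
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg))
        s, y = zip(*sample)
        stats.append(auc(s, y))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Synthetic data: age as the sole predictor of death.
ages = [34, 45, 52, 58, 61, 67, 70, 74, 79, 83, 88, 91]
died = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

point = auc(ages, died)
lower, upper = bootstrap_auc_ci(ages, died)
reversed_point = auc([-a for a in ages], died)  # equals 1 - point
```

Reversing a score turns an AUC of `a` into `1 - a`, which is why a score with AUC below 0.5 would predict better if reversed.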
Complete code for all analyses is available at https://github.com/cas-bioinf/covid19retrospective/.
In total, we were able to gather data for 213 patients; see Table 1 for the overall characteristics of the patient sample and several subgroups we used in the analysis, including treatments taken. Counts of all treatment combinations are shown in S2 Fig, and S3 Fig shows outcomes by study site, demonstrating quite large hospital-specific differences. The dataset includes 19 patients already reported in a study of inflammatory signatures of Covid-19.
In Fig 2 we show the overall disease progression for all patients and in Fig 3 we show the time-course of a subset of the markers we have measured. The data show some interesting patterns: patients with low Interleukin-6 or D-dimer values are overrepresented among patients with better outcomes, most patients had high CRP upon admission and for many the CRP levels stayed elevated over the whole hospitalization. However, the limited nature of the data does not allow for any statistically robust conclusions. We also see that the marker levels were not substantially stratified by study site. Those patterns should however be interpreted with care due to systematic missingness of the data—in particular, patients that fared worse were probably more likely to have the markers measured. However, we believe this kind of patient-level view is useful to appreciate the extent of both between-patient and within-patient variability.
Each vertical strip is a single patient, the ordering on the horizontal axis is by disease severity. Ventilated = any form of ventilation (non-invasive positive-pressure ventilation, mechanical ventilation and extra-corporeal membrane oxygenation).
Each line represents a patient, stratified by the worst breathing support required over the hospitalization. Color indicates study sites. The vertical scale is logarithmic. Ventilated = any form of ventilation (non-invasive positive-pressure ventilation, mechanical ventilation and extra-corporeal membrane oxygenation), CRP = C-reactive protein [mg/l], D-dimer [ng/ml DDU], Ly = lymphocyte count [10^9/l], IL-6 = Interleukin 6 [ng/l].
The between-patient variability is notable even across outcomes—when ordering the patients by the highest CRP levels experienced throughout the hospital stay, the top 20% of patients that breathed ambient air for the whole hospitalization experienced higher levels than the bottom 20% of patients that required ventilation or died. This overlap is even larger when comparing only against the patients that died and D-dimer, Interleukin-6 and lymphocyte count also show a notably larger overlap than CRP (S4 Fig).
Association between patients’ characteristics and treatments
As noted above, the nature of the convenience sample did not enforce random assignment of treatments to patients. In fact, patients with worse baseline characteristics, which lead to worse outcomes, were less likely to receive hydroxychloroquine (see Fig 4). This clearly creates a bias towards a positive effect of hydroxychloroquine on the outcome (and potentially for other treatments as well—most were used in combination with hydroxychloroquine), which, however, could be false.
Comorbidities were associated with both worse outcome (black) and lower chance of treatment with hydroxychloroquine (red). Dots and lines represent the estimates and the 95% confidence intervals of the log odds ratio of the respective outcome. HCQ: hydroxychloroquine, IHD: ischemic heart disease, HD: hypertension drugs, HF: heart failure history, COPD: chronic obstructive pulmonary disease, LungD: other lung disease, Dia: diabetes, RenalD: renal disease, LiverD: liver disease, HighCr: creatinine above 115 for males or above 97 for females, HighInr: Prothrombin time (Quick test) as International Normalized Ratio above 1.2, LowAlb: albumin in serum/plasma below 36 g/l.
Taken quantitatively, the comorbidities known upon hospitalization were informative with respect to the future hydroxychloroquine treatment: the score representing the cumulative presence of ischemic heart disease, hypertension drugs, former heart failure, COPD, other lung diseases, renal disease, or high creatinine was associated with a lower chance of taking hydroxychloroquine over the course of the hospitalization (the chance was only 79.9%, 95% confidence interval (65.3, 97)%, Chi-square test in the logistic regression model, χ2 = 5.18, df = 1, P = 0.023).
Association between treatments and outcomes
Here, we focus on hydroxychloroquine and azithromycin as those are the only treatments with a larger sample size. We also investigate favipiravir as it is less well reported in the literature. Hydroxychloroquine was dosed almost exclusively in a 5-day regime starting with a loading dose of 800 mg on the first day, followed by 400 mg. The majority of patients complemented hydroxychloroquine with azithromycin, while azithromycin was rarely used alone (see Table 1). Azithromycin was most frequently dosed at 250 or 500 mg/day, but doses ranging from 100 mg/day to 1500 mg/day were observed. Favipiravir was used only at one site, with a loading dose of 3600 mg on the first day followed by at most 9 days with a 1600 mg dose. All but one of the patients receiving favipiravir also received hydroxychloroquine. Treatment was initiated mostly within two days of admission (see S5 Fig).
The results of the multiverse analysis for the association between hydroxychloroquine, azithromycin and favipiravir usage and death are shown in Fig 5. Here, we only show models that were not found to have immediate problems representing the data well or computational issues (see S3 File for details). Results for all models we tested are reported in S6–S8 Figs, with additional details in S3 File. The results do not change noticeably when only patients from the first wave are included (S6–S8 Figs).
Each row represents a model: Categorical 7/28 = Bayesian categorical regression for state at day 7/28, Bayes Cox = Bayesian version of the Cox proportional hazards model with a binary outcome, Cox (single) = frequentist Cox model with a binary outcome, Cox (competing) = frequentist Cox model using competing risks (as in Fig 1a), HMM A = Bayesian hidden-Markov model as in Fig 1b with predictors for rate groups, HMM B = Bayesian hidden-Markov model as in Fig 1b with predictors for individual rates, HMM C = Bayesian hidden-Markov model as in Fig 1c. For frequentist models, we show the maximum likelihood estimate and 95% confidence interval; for Bayesian models we show the posterior mean and 95% credible interval. The estimands are either log odds-ratio (Categorical, HMM) or log hazard ratio (Cox variants). In all cases a coefficient <0 means better patient outcome in the treatment group. Vertical lines indicate zero (blue) and substantial increase or decrease with odds or hazard ratio of 3:2 or 2:3 (green). Additionally, the factors the model adjusted for are listed: Site = the study site, admitted = Admitted for Covid-19, Supportive = best supportive care initiated, Comorb. = total number of comorbidities, AZ = took azithromycin, HCQ = took hydroxychloroquine, FPV = took favipiravir, C. plasma = received convalescent plasma.
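As a reading aid for the reference lines, the 3:2 and 2:3 ratios can be converted to the log scale on which the coefficients are plotted. The short calculation below is a worked illustration, with β standing for any plotted coefficient (a symbol introduced here, not taken from the original text).

```latex
\[
\log\frac{3}{2} \approx 0.405, \qquad
\log\frac{2}{3} = -\log\frac{3}{2} \approx -0.405
\]
\[
\text{ratio} = e^{\beta}, \qquad \text{e.g.}\quad \beta = -0.405 \;\Rightarrow\; e^{\beta} \approx 0.67
\]
```

A coefficient to the left of the lower green line thus corresponds to an odds or hazard ratio below roughly 0.67.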
Most models report that using hydroxychloroquine is associated with lower risk of death. We must however bear in mind the potential bias noted in the previous section. Also, we see that for the HMM models, as we add adjustments the credible intervals do not widen but instead shift towards zero. This is a weak indication that further adjustments could drive the effect towards zero. We did not attempt to model additional adjustments as the models became computationally unstable. The case of hydroxychloroquine serves as a “control group” for our other results: since randomized trials give us high confidence that hydroxychloroquine does not substantially reduce mortality, we can be quite certain the associations we observe for hydroxychloroquine are just a measure of bias in the data. Additionally, our models either cannot determine the sign of the association between azithromycin and risk of death or even show an increase in risk of death. This serves as weak evidence that a substantial improvement in mortality from azithromycin is unlikely.
Most models exclude very strong association between increased risk of death and using favipiravir, but our data are necessarily quite limited, which is reflected in the very wide uncertainties around estimates. We also cannot put strict bounds on the association between favipiravir and length of hospitalization.
We also examined the association between treatments and length of hospital stay for all the patients that survived. Almost none of the models can discern the sign of the association for any of the treatments examined (S6–S8 Figs). Similarly, we studied the association between D-dimer and Interleukin 6 and outcomes, again with inconclusive results (S9 Fig).
Published prognostic models are not better than using age as the sole predictor of outcome
Following Wynants et al. we found five prediction models we were able to recompute: Li et al. report the ACP index combining CRP and age to form 3 grades, Chen & Liu report a continuous score using age, CRP, D-dimer and lymphocyte count, Shi et al. use age, sex and hypertension to form 4 grades, Caramelo et al. use age, sex, hypertension, diabetes, cardiac disease, chronic respiratory disease and cancer to form a continuous score, and Bello-Chavolla et al. use age, diabetes, obesity, pneumonia, chronic kidney disease, COPD and immunosuppression to build a score ranging from -6 to 22. For the latter two scores we had to impute some of the predictors as they had no immediate equivalent in our dataset. The outcomes present in the studies were 12-day mortality, 30-day mortality and mortality without any further details; here we report results for both 12-day and 30-day mortality. Full details on the scores and how we used our dataset to compute them are given in S3 File.
All prognostic models we tested performed similarly to or notably worse than using age as the only predictor and also worse than originally reported (Fig 6). Additionally, some publications did not provide enough detail to unambiguously reconstruct how the score and/or outcome was assessed. We thus concur with Wynants et al. that reported prediction scores are at high risk of bias and need additional careful evaluation.
AUC = 1 means perfect prediction, while AUC ≤ 0.5 means that the score is no better than a random guess; for AUC < 0.5, a better prediction would be obtained by reversing the score (marked by thick blue line). The line ranges represent the bootstrapped 95% confidence intervals. Red dots show results computed in the present study. Model variants (horizontal axis) vary in the outcome measured (12-day or 30-day mortality, severe disease) and potentially in how ambiguities in score computation were resolved, although this rarely makes a big difference (see S3 File for details). Cyan triangles show AUC as reported by the original authors or recomputed based on their published data. When the confidence interval or the AUC of the original study is not shown, it means that the value was not reported by the authors and not enough information to recompute it was given.
Our data show the extent of between-patient variability in progression of the disease in terms of length of hospital stay, duration of various types of breathing support and basic markers. A direct comparison with other studies is hard to perform as almost always only summaries of measurements are reported.
For multiple candidate Covid-19 treatments, observational data repeatedly contradicted results of randomized controlled trials (contrast e.g. Catteau et al. to the RECOVERY trial for hydroxychloroquine and Liu et al. to Agarwal et al. for convalescent plasma). Our results for hydroxychloroquine also fit into this pattern. This should make us wary about over-interpreting the results of this study for azithromycin and favipiravir, although some higher-quality evidence that suggests clinical benefit of favipiravir has been reported.
The current (April 2021) Covid-19 treatment guidelines in the Czech Republic recommend monoclonal antibodies and in some cases convalescent plasma or favipiravir as early treatment for high-risk patients with mild or no symptoms. For more severe cases dexamethasone and anticoagulants are recommended, while remdesivir is recommended only for patients that have severe disease but do not require mechanical ventilation. This is similar to recommendations from the National Institutes of Health in the USA, which additionally recommend tocilizumab in some cases while not recommending convalescent plasma and favipiravir. We do not believe our results should directly inform clinical decisions, though we see some potential for inclusion of our results in future meta-analyses.
Regarding methodology, there are multiple approaches that are, at least in principle, capable of deriving causal conclusions from observational data, most notably the DAG framework [33, 34] and the target trial framework [35, 36]. In all approaches, and also in some randomized designs, it is common that substantial uncertainty about the best statistical model for the task at hand remains and cannot be eliminated. Nevertheless, most published papers present results only from a single statistical model. We believe that this uncertainty about model choice should not be ignored. Rather, we should embrace the uncertainty and employ a multiverse analysis or other forms of robustness checks to explore how our conclusions would differ had different assumptions been adopted. In this work we tried to show how such an analysis could be performed and reported in practice. We note that modelling choices that are often made semi-arbitrarily, e.g., logistic regression for survival at 28 days vs. a Cox proportional hazards model for time to event, did in our case lead to substantially different results.
The hidden-Markov models (HMMs) used in this study are of some interest because they fitted the data well and allowed for inclusion of a larger number of predictors than the Cox proportional hazards model without making the posterior uncertainty unreasonably large. We believe this is because such models make better use of the available detailed data. Additionally, HMMs can be used even when the outcomes are observed only indirectly or noisily, as in the original study that motivated our models, which concerns the progression of Alzheimer’s disease. Noisily observed outcomes can pose problems for the proportional hazards model and require some special care [37, 38]. We should however note that the Markov property assumed in HMMs is likely to be a reasonable approximation in fewer settings than the proportional hazards assumption of the Cox model.
Common problems with prognostic models in medicine are small sample sizes used to develop the models, weak or problematic statistical methods and lack of external validation on independent datasets. Those problems are also prevalent in prediction models for Covid-19. We used our dataset to validate several models and observed very poor performance for four out of the five models tested. In fact, the simple assumption that older patients tend to have worse outcomes provides better or similar results to all of the models we were able to implement. This is despite all of the scores including age as a predictor. There seem to be two causes. First, three of the models dichotomize age into just two groups, which is known to lose information [40, 41]. Second, of the other two models, Chen & Liu use stepwise variable selection, which is known to be a problematic approach. The resulting model puts the largest relative weight on laboratory markers and deemphasizes age. Caramelo et al. take the decade of age as a very strong predictor and perform the best on our data. Still, their results are not distinguishable from just using age. Our findings are almost the same as in a similar but larger validation study using 22 models and 411 patients from the United Kingdom, where no tested model provided better prediction for mortality than age alone.
We provide very weak observational evidence against a substantial beneficial effect of using azithromycin (with or without hydroxychloroquine) and against a substantial negative effect of using favipiravir in hospitalized Covid-19 patients. We also observed better outcomes associated with taking hydroxychloroquine, which is likely linked to substantial confounding by indication. Where our results contradict randomized trials, the most likely explanation is systematic bias in our dataset.
A lesson from our analysis is that the assessment of treatment efficacy from observational data is sensitive to modelling assumptions while it is usually almost impossible to determine which of the models is more likely to reflect reality (if any). We believe that using multiverse analysis is an appropriate way to explore data in such contexts as it lets us be transparent about this sensitivity. We further believe that using hidden Markov models is a promising complement to the standard Cox proportional hazards analysis when detailed information on disease progression is available, particularly because it lets us impose additional structure on the model and thus make inferences with more disease states than would be possible to handle in the Cox framework, making better use of the available data.
Additionally, our experience indicates that a substantial fraction of published prognostic models will perform much worse on new patients than on the datasets they were built for and that external validation is crucial. We suggest that comparing the prognostic models against simple baselines (e.g., decade of age as the single predictor) should be a first step in validation. Furthermore, some of the published scores lack enough information to let others implement the score in the same way.
S1 Fig. Data collection timeline.
Data collection periods at individual sites, showing the range of admission dates of patients included in the study. Note that we cannot provide additional information to link the sites here with data shown elsewhere as that would increase the risk of deanonymization of the patients.
S2 Fig. Treatment combinations.
Upset plot of treatment combinations. Each vertical bar displays the number of patients that received the combination indicated by filled dots in the matrix. Horizontal bars show the total number of patients receiving the given treatment.
S3 Fig. Outcomes per site.
Number of patients and outcomes at the individual sites. The numbers above bars are the exact counts. Hospitalized = still hospitalized at the end of data collection at the site or transferred to another site and lost to follow-up. Sites are anonymized to preserve patient privacy.
S4 Fig. Markers and outcomes.
Density plots of worst marker values per patient, stratified by worst condition experienced by the patient. For each patient that had a given marker measured, the worst value was taken. Additionally the patients are classified by the worst condition (regardless of the timing relative to the worst marker levels). For each set of patients and marker an empirical density plot of the worst marker values is shown.
S5 Fig. Treatment onset.
Histogram of the timing of first treatment relative to admission to one of the study sites. Two patients initiated treatment before admission, which is shown as negative values.
S6 Fig. Association of HCQ with outcomes.
Estimates of model coefficients for association between hydroxychloroquine and main outcomes. The “Suspicious” section shows models that were found to not fit the data well or have computational issues; see supplementary statistical analysis for details. Each row represents a model: Categorical All/7/28 = Bayesian categorical regression for state at last observed day/day 7/day 28, Binary All/7/28 = Bayesian logistic regression for state at last observed day/day 7/day 28, Bayes Cox = Bayesian version of the Cox proportional hazards model with a binary outcome, Cox (single) = frequentist Cox model with a binary outcome, Cox (competing) = frequentist Cox model using competing risks (as in Fig 1a), HMM A = Bayesian hidden Markov model as in Fig 1b with predictors for rate groups, HMM B = Bayesian hidden Markov model as in Fig 1b with predictors for individual rates, HMM C = Bayesian hidden Markov model as in Fig 1c. For frequentist models, we show maximum likelihood estimate and 95% confidence interval, for Bayesian models we show posterior mean and 95% credible interval. The estimands are either log odds-ratio (Categorical, HMM) or log hazard ratio (Cox variants) or log ratio of mean duration of hospitalization (HMM duration). In all cases coefficient <0 means better patient outcome in the treatment group. Vertical lines indicate zero (blue) and substantial increase or decrease with odds or hazard ratio of 3:2 or 2:3 (green). Additionally the factors the model adjusted for are listed: Site = the study site, admitted = Admitted for Covid-19, Supportive = best supportive care initiated, Comorb. = total number of comorbidities, AZ = took azithromycin, HCQ = took hydroxychloroquine, FPV = took favipiravir, C. plasma = received convalescent plasma, first wave = only patients admitted before September 1st were included.
S7 Fig. Association of azithromycin with outcomes.
Estimates of model coefficients for association between azithromycin and main outcomes. The “Suspicious” section shows models that were found to not fit the data well or have computational issues; see supplementary statistical analysis for details. Each row represents a model: Categorical All/7/28 = Bayesian categorical regression for state at last observed day/day 7/day 28, Binary All/7/28 = Bayesian logistic regression for state at last observed day/day 7/day 28, Bayes Cox = Bayesian version of the Cox proportional hazards model with a binary outcome, Cox (single) = frequentist Cox model with a binary outcome, Cox (competing) = frequentist Cox model using competing risks (as in Fig 1a), HMM A = Bayesian hidden Markov model as in Fig 1b with predictors for rate groups, HMM B = Bayesian hidden Markov model as in Fig 1b with predictors for individual rates, HMM C = Bayesian hidden Markov model as in Fig 1c. For frequentist models, we show maximum likelihood estimate and 95% confidence interval, for Bayesian models we show posterior mean and 95% credible interval. The estimands are either log odds-ratio (Categorical, HMM) or log hazard ratio (Cox variants) or log ratio of mean duration of hospitalization (HMM duration). In all cases coefficient <0 means better patient outcome in the treatment group. Vertical lines indicate zero (blue) and substantial increase or decrease with odds or hazard ratio of 3:2 or 2:3 (green). Additionally the factors the model adjusted for are listed: Site = the study site, admitted = Admitted for Covid-19, Supportive = best supportive care initiated, Comorb. = total number of comorbidities, AZ = took azithromycin, HCQ = took hydroxychloroquine, FPV = took favipiravir, C. plasma = received convalescent plasma, first wave = only patients admitted before September 1st were included.
S8 Fig. Association of favipiravir with outcomes.
Estimates of model coefficients for association between favipiravir and main outcomes. The “Suspicious” section shows models that were found to not fit the data well or have computational issues; see supplementary statistical analysis for details. Each row represents a model: Categorical All/7/28 = Bayesian categorical regression for state at last observed day/day 7/day 28, Binary All/7/28 = Bayesian logistic regression for state at last observed day/day 7/day 28, Bayes Cox = Bayesian version of the Cox proportional hazards model with a binary outcome, Cox (single) = frequentist Cox model with a binary outcome, Cox (competing) = frequentist Cox model using competing risks (as in Fig 1a), HMM A = Bayesian hidden Markov model as in Fig 1b with predictors for rate groups, HMM B = Bayesian hidden Markov model as in Fig 1b with predictors for individual rates, HMM C = Bayesian hidden Markov model as in Fig 1c. For frequentist models, we show maximum likelihood estimate and 95% confidence interval, for Bayesian models we show posterior mean and 95% credible interval. The estimands are either log odds-ratio (Categorical, HMM) or log hazard ratio (Cox variants) or log ratio of mean duration of hospitalization (HMM duration). In all cases coefficient <0 means better patient outcome in the treatment group. Vertical lines indicate zero (blue) and substantial increase or decrease with odds or hazard ratio of 3:2 or 2:3 (green). Additionally the factors the model adjusted for are listed: Site = the study site, admitted = Admitted for Covid-19, Supportive = best supportive care initiated, Comorb. = total number of comorbidities, AZ = took azithromycin, HCQ = took hydroxychloroquine, FPV = took favipiravir, C. plasma = received convalescent plasma, first wave = only patients admitted before September 1st were included.
S9 Fig. Association of markers with outcomes.
Estimates of model coefficients (log hazard ratios) for association between markers and death. The “Suspicious” section shows models that were found to not fit the data well or have computational issues, and the “Problematic” section shows models that were completely broken; see supplementary statistical analysis for details. Each row represents a model: Cox (competing) = frequentist Cox model using competing risks (as in Fig 1a), HMM A = Bayesian hidden Markov model as in Fig 1b with predictors for rate groups, JM = Bayesian joint longitudinal and time-to-event model. For frequentist models, we show maximum likelihood estimate and 95% confidence interval, for Bayesian models we show posterior mean and 95% credible interval. Additionally the factors the model adjusted for are listed: Site = the study site, Supportive = best supportive care initiated, HCQ = took hydroxychloroquine.
S2 File. MS Excel table used for data collection.
- 1. COVID-19 v ČR: Otevřené datové sady a sady ke stažení [Internet]. Ministry of Health of the Czech Republic; Available: https://onemocneni-aktualne.mzcr.cz
- 2. Sanders JM, Monogue ML, Jodlowski TZ, Cutrell JB. Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19): A Review. JAMA. 2020;323: 1824–1836. pmid:32282022
- 3. Tobaiqy M, Qashqary M, Al-Dahery S, Mujallad A, Hershan AA, Kamal MA, et al. Therapeutic management of patients with COVID-19: a systematic review. Infection Prevention in Practice. 2020;2: 100061. pmid:34316558
- 4. Horby P, Lim WS, Emberson J, Mafham M, Bell J, Linsell L, et al. Effect of Dexamethasone in Hospitalized Patients with COVID-19—Preliminary Report [Internet]. Infectious Diseases (except HIV/AIDS); 2020 Jun. Available: http://medrxiv.org/lookup/doi/10.1101/2020.06.22.20137273
- 5. Siemieniuk RA, Bartoszko JJ, Ge L, Zeraatkar D, Izcovich A, Kum E, et al. Drug treatments for covid-19: living systematic review and network meta-analysis. BMJ. 2020;370. pmid:32732190
- 6. Catteau L, Dauby N, Montourcy M, Bottieau E, Hautekiet J, Goetghebeur E, et al. Low-dose hydroxychloroquine therapy and mortality in hospitalised patients with COVID-19: a nationwide observational study of 8075 participants. International Journal of Antimicrobial Agents. 2020; 106144. pmid:32853673
- 7. Del Valle DM, Kim-Schulze S, Huang H-H, Beckmann ND, Nirenberg S, Wang B, et al. An inflammatory cytokine signature predicts COVID-19 severity and survival. Nature Medicine. 2020; 1–8. pmid:32839624
- 8. Castelnuovo AD, Costanzo S, Antinori A, Berselli N, Blandi L, Bruno R, et al. Use of hydroxychloroquine in hospitalised COVID-19 patients is associated with reduced mortality: Findings from the observational multicentre Italian CORIST study. European Journal of Internal Medicine. 2020;
- 9. Janiaud P, Axfors C, Schmitt AM, Gloy V, Ebrahimi F, Hepprich M, et al. Association of Convalescent Plasma Treatment With Clinical Outcomes in Patients With COVID-19: A Systematic Review and Meta-analysis. JAMA. 2021;325: 1185. pmid:33635310
- 10. Maeda T, Obata R, Rizk DO D, Kuno T. The association of interleukin-6 value, interleukin inhibitors, and outcomes of patients with COVID-19 in New York City. J Med Virol. 2020; jmv.26365. pmid:32720702
- 11. Brat GA, Weber GM, Gehlenborg N, Avillach P, Palmer NP, Chiovato L, et al. International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. npj Digital Medicine. Nature Publishing Group; 2020;3: 1–9. pmid:32864472
- 12. Hood K, Dahly D, Wilkinson J. Statistical review of Efficacy and safety of lopinavir/ritonavir or arbidol in adult patients with mild/moderate COVID-19: an exploratory randomized controlled trial. 2020;
- 13. Hood K, Goulao B, Dahly D, Yap C. Statistical review of Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial [Internet]. Zenodo; 2020 May. Available: https://zenodo.org/record/3819778#.X1yBlotS-Uk
- 14. Wynants L, Calster BV, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369. pmid:32265220
- 15. Steegen S, Tuerlinckx F, Gelman A, Vanpaemel W. Increasing Transparency Through a Multiverse Analysis. Perspect Psychol Sci. 2016;11: 702–712. pmid:27694465
- 16. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. 2020. Available: https://www.R-project.org/
- 17. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. JOSS. 2019;4: 1686.
- 18. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. New York: Springer; 2000.
- 19. Bürkner P-C. Advanced Bayesian Multilevel Modeling with the R Package brms. The R Journal. 2018;10: 395.
- 20. Williams JP, Storlie CB, Therneau TM, Jack CR Jr, Hannig J. A Bayesian Approach to Multistate Hidden Markov Models: Application to Dementia Progression. Journal of the American Statistical Association. 2020;115: 16–31.
- 21. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12: 77. pmid:21414208
- 22. Parackova Z, Zentsova I, Bloomfield M, Vrabcova P, Smetanova J, Klocperk A, et al. Disharmonic Inflammatory Signatures in COVID-19: Augmented Neutrophils’ but Impaired Monocytes’ and Dendritic Cells’ Responsiveness. Cells. Multidisciplinary Digital Publishing Institute; 2020;9: 2206. pmid:33003471
- 23. Lu J, Hu S, Fan R, Liu Z, Yin X, Wang Q, et al. ACP risk grade: a simple mortality index for patients with confirmed or suspected severe acute respiratory syndrome coronavirus 2 disease (COVID-19) during the early stage of outbreak in Wuhan, China. medRxiv. 2020;
- 24. Chen X, Liu Z. Early prediction of mortality risk among severe COVID-19 patients using machine learning. medRxiv. 2020;
- 25. Shi Y, Yu X, Zhao H, Wang H, Zhao R, Sheng J. Host susceptibility to severe COVID-19 and establishment of a host risk score: findings of 487 cases outside Wuhan. Critical Care. 2020;24: 108. pmid:32188484
- 26. Caramelo F, Ferreira N, Oliveiros B. Estimation of risk factors for COVID-19 mortality—preliminary results. medRxiv. 2020;
- 27. Bello-Chavolla OY, Bahena-López JP, Antonio-Villa NE, Vargas-Vázquez A, González-Díaz A, Márquez-Salinas A, et al. Predicting Mortality Due to SARS-CoV-2: A Mechanistic Score Relating Obesity and Diabetes to COVID-19 Outcomes in Mexico. J Clin Endocrinol Metab. 2020;105: 2752–2761. pmid:32474598
- 28. Horby P, Mafham M, Linsell L, Bell JL, Staplin N, Emberson JR, et al. Effect of Hydroxychloroquine in Hospitalized Patients with COVID-19: Preliminary results from a multi-centre, randomized, controlled trial. medRxiv. 2020;
- 29. Liu STH, Lin H-M, Baine I, Wajnberg A, Gumprecht JP, Rahman F, et al. Convalescent plasma treatment of severe COVID-19: a propensity score–matched control study. Nat Med. 2020;26: 1708–1713. pmid:32934372
- 30. Agarwal A, Mukherjee A, Kumar G, Chatterjee P, Bhatnagar T, Malhotra P, et al. Convalescent plasma in the management of moderate COVID-19 in India: An open-label parallel-arm phase II multicentre randomized controlled trial (PLACID Trial). medRxiv. 2020;
- 31. Štefan M, Chrdle A, Husa P, Beneš J, Dlouhý P. Covid-19: diagnostika a léčba [Internet]. Společnost infekčního lékařství ČLS JEP; 2021. Available: https://www.infekce.cz/Covid2019/DPcovid-19_SIL_0421.pdf
- 32. COVID-19 Treatment Guidelines Panel. Coronavirus Disease 2019 (COVID-19) Treatment Guidelines [Internet]. National Institutes of Health; Available: https://www.covid19treatmentguidelines.nih.gov/
- 33. Greenland S, Pearl J, Robins JM. Causal Diagrams for Epidemiologic Research. Epidemiology. 1999;10: 37–48. pmid:9888278
- 34. Arah OA. Analyzing Selection Bias for Credible Causal Inference: When in Doubt, DAG It Out. Epidemiology. 2019;30: 517–520. pmid:31033691
- 35. Lodi S, Phillips A, Lundgren J, Logan R, Sharma S, Cole SR, et al. Effect Estimates in Randomized Trials and Observational Studies: Comparing Apples With Apples. American Journal of Epidemiology. 2019;188: 1569–1577. pmid:31063192
- 36. Hernán MA, Sauer BC, Hernández-Díaz S, Platt R, Shrier I. Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. Journal of Clinical Epidemiology. 2016;79: 70–75. pmid:27237061
- 37. Meier AS, Richardson BA, Hughes JP. Discrete Proportional Hazards Models for Mismeasured Outcomes. Biometrics. 2003;59: 947–954. pmid:14969473
- 38. Chen Y, Lawrence J, Hung HMJ, Stockbridge N. Methods for Employing Information About Uncertainty of Ascertainment of Events in Clinical Trials. Ther Innov Regul Sci. 2021;55: 197–211. pmid:32870460
- 39. Steyerberg EW, Moons KGM, van der Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis Research Strategy (PROGRESS) 3: Prognostic Model Research. PLoS Med. 2013;10: e1001381. pmid:23393430
- 40. Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ. 2006;332: 1080. pmid:16675816
- 41. Thoresen M. Spurious interaction as a result of categorization. BMC Med Res Methodol. 2019;19: 28. pmid:30732587
- 42. Smith G. Step away from stepwise. J Big Data. 2018;5: 32.
- 43. Gupta RK, Marks M, Samuels THA, Luintel A, Rampling T, Chowdhury H, et al. Systematic evaluation and external validation of 22 prognostic models among hospitalised adults with COVID-19: an observational cohort study. Eur Respir J. 2020;56: 2003498. pmid:32978307