COVID-19 and the kidney: A retrospective analysis of 37 critically ill patients using machine learning

  • Anna Laura Herzog,

    Roles Data curation, Investigation, Methodology, Project administration, Resources, Visualization, Writing – original draft

    Affiliation Division of Nephrology, Medizinische Klinik I, Transplantationszentrum, University of Würzburg, University Hospital Wuerzburg, Würzburg, Germany

  • Holger K. von Jouanne-Diedrich,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Writing – review & editing

    Affiliation Faculty of Engineering, Competence Centre for Artificial Intelligence, TH Aschaffenburg (University of Applied Sciences), Aschaffenburg, Germany

  • Christoph Wanner,

    Roles Supervision, Writing – review & editing

    Affiliation Division of Nephrology, Medizinische Klinik I, University of Würzburg, University Hospital Wuerzburg, Würzburg, Germany

  • Dirk Weismann,

    Roles Resources

    Affiliation Intensive Care Unit, Medizinische Klinik I, University of Würzburg, University Hospital Wuerzburg, Würzburg, Germany

  • Tobias Schlesinger,

    Roles Resources

    Affiliation Department of Anaesthesiology and Intensive Care, University of Würzburg, University Hospital Wuerzburg, Würzburg, Germany

  • Patrick Meybohm,

    Roles Methodology, Supervision, Writing – review & editing

    Affiliation Department of Anaesthesiology and Intensive Care, University of Würzburg, University Hospital Wuerzburg, Würzburg, Germany

  • Jan Stumpner

    Roles Resources

    Affiliation Department of Anaesthesiology and Intensive Care, University of Würzburg, University Hospital Wuerzburg, Würzburg, Germany



Abstract

Background

There is evidence that SARS-CoV-2 has a particular affinity for kidney tissue and is often associated with kidney failure.


Methods

We assessed whether proteinuria can predict kidney failure, the development of chronic kidney disease (CKD), and mortality in 37 critically ill COVID-19 patients. We used machine learning (ML) methods, such as decision trees and cut-off points generated with the OneR package, to add new perspectives, even in small cohorts.


Results

Among the 37 patients, 24 suffered higher-grade renal failure, 20 of whom required kidney replacement therapy. More than 40% of patients remained on hemodialysis after discharge from the intensive care unit, and 27% died. Because of frequent anuria, proteinuria was measured in only two-thirds of the patients and was not predictive of the investigated endpoints; albuminuria was higher in patients with acute kidney injury (AKI) stage 3, but the difference was not significant. ML identified cut-off points of >31.4 kg/m2 for BMI and >69 years for age, constructed decision trees with high accuracy, and identified highly predictive variables for outcome and remaining CKD.


Conclusions

Different ML methods, especially decision trees, can provide valuable support for clinical decision-making. The presence of proteinuria was not predictive of CKD or AKI; this finding should be confirmed in a larger cohort.


Introduction

In late 2019, a new lung disease caused by a previously unknown coronavirus, SARS-CoV-2, appeared for the first time in Wuhan, China. It has since become a global pandemic, infecting more than 50 million people worldwide and causing more than 1.2 million deaths by November 2020. In Germany, a highly industrialized European nation with 82 million inhabitants, more than 700,000 infections and almost 12,000 deaths had been reported by the end of October [1].

Eighty percent of those infected have mild symptoms, such as dyspnea (21.9%), cough (68.6%), fever (88.5%), myalgia (35.8%), and anosmia (47%) [2–4]. Clinical worsening may occur in 20%, often 7–10 days after disease onset. Approximately 5% of affected patients require intensive care, with mortality varying between 3% and 50% depending on local factors [5–8].

Older age, underlying hypertension, high cytokine levels (interleukin [IL]-2R, IL-6, IL-10, and tumor necrosis factor [TNF]-α), and high ferritin levels are significantly associated with severe coronavirus disease 2019 (COVID-19). The estimated mortality is 1.1% in non-severe and 32.5% in severe cases during an average follow-up of 32 days [9]. Underlying cardiac or cerebrovascular disease and elevated cardiac troponin also appear to be as predictive of severe COVID-19 [10,11] as hypertension [5], male sex, cardiac injury, and hyperglycemia [9].

The incidence of acute kidney injury (AKI) is high in critically ill patients, affecting almost 60% [12], and seems to be common among severe COVID-19 cases, affecting approximately 20–40% of patients admitted to intensive care [13].

We assessed 37 critically ill patients treated at University Hospital Wuerzburg to evaluate whether proteinuria or albuminuria can predict the development of AKI, chronic kidney disease (CKD), or higher mortality. In addition to classical statistical analysis, we used machine learning (ML) methods. To the best of our knowledge, this approach adds a new dimension to the body of COVID-19 research.

ML is the study of computer algorithms that allow computer programs to improve automatically through experience [14]. Here we are concerned with the subarea of supervised learning, the ML task of learning a function that maps an input to an output based on example input-output pairs [15]. More specifically, the tasks at hand are classification problems, in which the aim is to learn the boundary separating the instances of one class from those of the other classes [16].

Specifically, we used the One Rule (OneR) classification algorithm [17] implemented in the OneR package [18] and Classification and Regression Trees (CARTs) [19] implemented in the rpart package [20].

Patients and methods

Thirty-seven critically ill patients were treated between March and May 2020 in the intensive care unit (ICU) of the Department of Anesthesia and Critical Care and the Department of Internal Medicine I, University Hospital Wuerzburg, Wuerzburg, Germany. The median patient age was 63 years (range 36 to 84 years), and 76% were male. Invasive ventilation lasted 19 days on average (range 4 to 63 days); two patients did not receive invasive ventilation. Patient demographics are given in Table 1. The institutional ethics board of the University of Würzburg approved the study. The need for informed consent from individual patients was waived.

We recorded sex, age, body mass index (BMI), selective (albuminuria) and non-selective proteinuria (measured in 2/3 of patients), troponin T, NTproBNP, invasive treatment, such as continuous veno-venous hemodialysis (CVVHD) or kidney replacement therapy (KRT), extracorporeal membrane oxygenation (ECMO), and invasive ventilation, the stage of AKI as defined by the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines, and remaining CKD. We assessed the clinical condition after intensive care treatment as death, recovery, or remaining critical illness. Remaining critical illness was defined as requiring transfer to a weaning unit, a smaller hospital, or other care facilities, as no patients had initially been admitted from nursing homes. CKD at the time of discharge from the ICU was classified as none or restoration to baseline kidney function (class 1), remaining or worsened CKD without KRT (class 2), or prolonged need for KRT after ICU discharge (class 3). Proteinuria, either selective as albuminuria or non-selective, was classified as none (<30 mg/gCrea), moderately increased (30–300 mg/gCrea), or severely increased (>300 mg/gCrea) according to the KDIGO definition [21]. Laboratory findings included troponin T and NTproBNP, which were collected daily during the first 7 days, on days 10 and 14, and at discharge. The highest value was recorded.
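The KDIGO-based grading above reduces to two thresholds on the urinary albumin (or protein) to creatinine ratio. The following Python sketch is illustrative only: the function name `proteinuria_class` is hypothetical, the input is assumed to be a ratio already expressed in mg/g creatinine, and placing values of exactly 30 and 300 mg/gCrea in the moderate class is our reading of the 30–300 range.

```python
def proteinuria_class(ratio_mg_per_g):
    """Return the proteinuria class for a ratio in mg/g creatinine.

    1 = none (<30), 2 = moderately increased (30-300), 3 = severely increased (>300).
    Assumption: exactly 30 and exactly 300 mg/g are placed in class 2.
    """
    if ratio_mg_per_g < 0:
        raise ValueError("ratio must be non-negative")
    if ratio_mg_per_g < 30:
        return 1
    if ratio_mg_per_g <= 300:
        return 2
    return 3


if __name__ == "__main__":
    # Illustrative values in mg/gCrea (not individual patient data)
    for value in (0, 29, 30, 106, 300, 301, 1660):
        print(value, proteinuria_class(value))
```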

Statistical analysis

Statistical analyses were performed using R 4.0.2 [22]. Metric variables were expressed as arithmetic mean and standard deviation. Means of two groups of normally distributed data were compared using a two-sided unpaired Student's t-test. Means of more than two groups were compared using analysis of variance (ANOVA), followed by post-hoc testing (Tukey's test) if significant differences occurred.
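As a reminder of what the two classical tests compute, the Python sketch below derives the pooled-variance (Student) t statistic and the one-way ANOVA F statistic from raw group values using only the standard library. It is not the R code used in the study, the example numbers are made up, and it returns only test statistics and degrees of freedom; p-values additionally require the t and F distributions (for example via `scipy.stats.ttest_ind` or `scipy.stats.f_oneway`).

```python
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (equal variances assumed)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(a) - mean(b)) / se, df


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


if __name__ == "__main__":
    # Made-up values, not study data
    a, b = [1, 2, 3], [4, 5, 6]
    t, df = student_t(a, b)
    f, _, _ = one_way_anova_f(a, b)
    print(round(t, 3), df)  # -3.674 4
    print(f)                # 13.5
```

For two groups, F equals t², which is a convenient sanity check. Note that R's `t.test` performs Welch's test by default; the classical Student's test corresponds to `var.equal = TRUE`.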

In addition to classical statistical analysis, we used ML methods, namely One Rule (OneR) and decision trees. OneR is a simple classification algorithm that generates one rule for each predictor in the data and then selects the rule with the smallest total error as its one rule. According to Holte, very simple rules can be expected to perform well on most datasets [17]. An additional advantage is the good interpretability of the resulting rules, which is particularly important in a clinical setting. The OneR package improves on the original OneR algorithm with sophisticated handling of numeric data, which allows cut-off values to be detected [18]. The OneR algorithm has previously been used successfully in medical research [18,23].

To evaluate the quality of the classification algorithms, we used confusion matrices. A confusion matrix is a contingency table that cross-tabulates the classes predicted by an ML algorithm against the actual classes. The accuracy of a classification is the number of correctly predicted patients (the diagonal of the matrix) divided by the total number of patients. In addition, the numbers of false positives and false negatives can be read off the matrix.
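A minimal Python sketch of a confusion matrix and the accuracy derived from it, using hypothetical labels rather than study data. Rows are actual classes and columns are predicted classes, so correctly classified cases lie on the diagonal.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a table with rows = actual class and columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Correctly classified cases (the diagonal) divided by all cases."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    labels = ["died", "survived"]
    # Hypothetical outcomes for five patients (not study data)
    actual = ["died", "died", "survived", "survived", "survived"]
    predicted = ["died", "survived", "survived", "died", "survived"]
    m = confusion_matrix(actual, predicted, labels)
    print(m)            # [[1, 1], [1, 2]]
    print(accuracy(m))  # 0.6
```

With "died" treated as the positive class, the off-diagonal cells give one false negative (row "died", column "survived") and one false positive. By the same arithmetic, 24 correct classifications out of 37 patients correspond to an accuracy of 24/37 ≈ 0.649.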


Results

During the COVID-19 pandemic in 2020, we analyzed 37 patients treated at our ICUs between March and May. A total of 28 patients suffered acute kidney failure, 20 of them AKI 3 (54.1%); 9 patients died (24.3%), all with AKI 3. The average invasive ventilation time was 19 days (range 4–63 days), and 12 patients (32.4%) were dependent on ECMO for an average of 12 days. Twenty-two patients required KRT (59.5%) for an average of 17 days. Sixteen patients remained on hemodialysis beyond the inpatient stay (43.2%); no structured follow-up was performed beyond 60 days. Seventeen patients were transferred to rehabilitation or weaning centers while still critically ill (45.9%; Table 1).

We used Student's t-test to examine the association between age and COVID-19 mortality and, as suggested by prior data, found a significantly higher age among deceased patients. Their mean age was almost 70 years (range 52 to 84 years), whereas the 27 surviving patients had a mean age of 60 years (range 36 to 80 years; p = 0.05; Fig 1).

Fig 1.

a, Association between age and COVID-19 mortality. The mean age was 60 years in survivors and almost 70 years in deceased patients. b, Association between BMI and COVID-19 mortality (p = 0.13). The mean BMI was 28.5 kg/m2 in survivors and 33.4 kg/m2 in deceased patients; the difference was not significant. Data are presented as arithmetic mean and median.

A higher BMI also appeared to be associated with increased mortality, but this effect was not significant in our population (p = 0.12). The mean BMI in the survivor group was 28.6 kg/m2 (range 21.5 to 34.7 kg/m2), whereas the mean BMI among deceased patients was 33.4 kg/m2 (range 25.4 to 54.1 kg/m2; Fig 1b).

The optbin function of the OneR package discretizes all numerical data into categorical bins where the cut points are optimally aligned with the target categories. When building a OneR model, this could result in fewer rules with enhanced accuracy. The cutpoints are calculated by pairwise logistic regressions (method “logreg”) or as the means of the expected values of the respective classes (“naive”). The function is likely to give unsatisfactory results when the distributions of the respective classes are not (linearly) separable [18].
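To make the "naive" variant concrete, here is a Python sketch under one plausible reading of that description: sort the class means of a numeric variable and place a cut point halfway between adjacent means. This is our illustration, not code from the OneR package, whose internals may differ; the function names and the BMI-like values are hypothetical. It also shows why the approach struggles when class distributions overlap: only the class means enter the calculation.

```python
from statistics import mean


def naive_cutpoints(values, classes):
    """Cut points halfway between adjacent sorted class means.

    A simplified reading of the 'naive' idea, not the OneR package's code.
    """
    groups = {}
    for value, cls in zip(values, classes):
        groups.setdefault(cls, []).append(value)
    class_means = sorted(mean(g) for g in groups.values())
    # One cut point between each pair of neighbouring class means
    return [(lo + hi) / 2 for lo, hi in zip(class_means, class_means[1:])]


def bin_index(value, cutpoints):
    """0-based bin index; a value equal to a cut point falls into the lower bin."""
    return sum(value > cp for cp in cutpoints)


if __name__ == "__main__":
    # Hypothetical BMI values and outcomes (not study data)
    bmi = [22, 25, 27, 30, 33, 38]
    outcome = ["survived"] * 3 + ["died"] * 3
    cuts = naive_cutpoints(bmi, outcome)
    print(cuts)  # a single cut point of about 29.17
    print([bin_index(b, cuts) for b in bmi])  # [0, 0, 0, 1, 1, 1]
```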

To obtain a precise cut-off point, we applied the OneR algorithm with the optbin function to the same variables. The resulting rules predicted survival for a BMI from 21.6 to 31.4 kg/m2 and death for a BMI above 31.4 (up to 54.1) kg/m2. These rules have an accuracy of 81.08% (30 of 37 patients correctly classified).

The resulting diagnostic plot is shown in Fig 2; the cut-off point of 31.4 kg/m2 is in line with previous reports [24].

Fig 2. OneR model diagnostic plot.

Cut-off point for likely death at BMI = 31.4.

We also analyzed urine samples for proteinuria. Selective proteinuria (albuminuria) was higher in patients with AKI 3 than in patients with a lower AKI stage, but the difference was not significant. However, only nine patients did not suffer acute kidney failure, and three of them were not tested for albuminuria. Overall, albuminuria was slightly lower in this group than in the AKI 1–3 group, but the variation was large (Fig 3). Albuminuria ranged from 0 to 106 mg/gCrea in AKI 1, from 29 to 112 mg/gCrea in AKI 2, from 17 to 500 mg/gCrea in AKI 3, and from 0 to 1660 mg/gCrea in the group without AKI.

Fig 3. Selective proteinuria in COVID-19 patients with AKI.

AKI, acute kidney injury. Albuminuria ranges: AKI 1, 0 to 106 mg/gCrea; AKI 2, 29 to 112 mg/gCrea; AKI 3, 17 to 500 mg/gCrea; None, 0 to 1660 mg/gCrea.

ANOVA showed no significant relationship between proteinuria or albuminuria and the development of AKI, remaining CKD, or death. The development of AKI was also not significantly related to death or remaining CKD (Table 2).

Table 2. Patients in the individual classification groups of albuminuria and non-selective proteinuria regarding AKI, outcome and CKD.

The distribution of patients must be taken into account: in some constellations, the case numbers in the individual groups were very small, and proteinuria was not determined in 12 of 37 patients, mostly because of early anuria. As mentioned above, we also constructed various decision trees using COVID-19, AKI, CKD, outcome, NTproBNP, troponin T, BMI, age, and other variables. Fig 4 shows the first binary decision tree, with two decision levels and outcome as the target variable. Based on the presented data, the tree shows that death is very likely in cases of severe AKI combined with cardiac damage, expressed by a strongly elevated or dynamic troponin T level; here, the cut-off was 88 μg/l. In less severe AKI or no AKI, initial albuminuria seems to determine whether the patient fully recovers or remains critically ill. In this tree, higher proteinuria even appears to be associated with complete recovery.

Fig 4. CART regarding outcome.

Trop_max: maximum value of troponin T in μg/l; sel.Prot.cI: selective proteinuria in mg/g creatinine (1: <30 mg/gCrea, 2: 30–300 mg/gCrea, 3: >300 mg/gCrea). Outcome 1: Death, 2: Remaining illness, 3: Recovery.

With new CKD as the dependent variable, the occurrence of AKI and cardiac involvement also determined the extent to which recovery of renal function or long-term dependence on KRT could be expected. In our cohort, according to the algorithm, the risk of remaining dependent on KRT after dialysis-requiring AKI 3 was increased at NTproBNP levels of 726 ng/ml or higher (i.e., cardiac impairment due to COVID-19). At the last level of the tree (troponin T > 782 μg/l), the algorithm predicted a good chance of complete recovery of renal function despite increased NTproBNP. In our cohort, two patients recovered and did not retain CKD despite cardiac involvement in addition to renal failure, which is reflected in the lowest level of the right arm in Fig 5. The algorithm identifies the most informative variables and assigns them to the individual levels of the tree. Multiple configurations are possible; by defining the depth of the tree in advance, less relevant variables are disregarded. In this case, neither selective nor non-selective proteinuria was informative enough to be used in the CART.

Fig 5. CKD outcome determined by AKI.

Variables: AKI (acute kidney injury), maximum NTproBNP value, and maximum troponin T value. CKD classes: 1, Recovery; 2, Remaining or worsened CKD; 3, KRT.

When we related proteinuria and remaining CKD to outcome, the algorithm found higher mortality in patients with extended dependence on KRT. If renal function fully recovers or remains only slightly impaired, the algorithm also predicts that remaining disease is more common with moderate proteinuria, whereas patients without proteinuria have a better chance of complete recovery. According to the confusion matrix, 24 of 37 patients were correctly classified (accuracy 64.9%; Fig 6a and 6b). Prior studies have also found higher mortality in COVID-19 patients with renal involvement [25,26], but whether proteinuria can serve as a reliable predictive marker is not yet clear [27]. In patients critically ill due to other causes, proteinuria is certainly a risk factor for the development of AKI and a predictor of higher mortality [28]; for COVID-19 patients, this has yet to be proven in a larger cohort.

Fig 6.

a, Outcome and its correlation with proteinuria. CKD 1: Restitution to initial function, CKD 2: Remaining new onset or worsening of existing chronic kidney disease, CKD 3: Remaining KRT. Sel. 1: None, 2: 30–300 mg/gCrea, 3: >300 mg/gCrea. Outcome 1: Death, 2: Remaining illness, 3: Recovery. b, Confusion matrix with the distribution of the predicted values in absolute numbers. Relative (not shown) describes the percentage deviation.

When the algorithm examined the relationship between selective or non-selective proteinuria and outcome, the results appeared to be of rather low significance. The most informative variable is shown at the top of the plot, i.e., it forms the basis for the division into the first two groups. In the standard ANOVA, the association between proteinuria and mortality in our cohort was not significant, but the numbers of patients and of proteinuria values were probably too small.

An important consideration is the depth of the tree. In principle, decision trees can be built to any depth. In our cohort, patients who developed dialysis-dependent AKI were significantly more likely to require KRT for a prolonged period (p<0.05). In this case, the tree consists of only one node, which is easy to interpret but limits the accuracy of the classification to only 70%. The few patients who retained CKD were assigned to one of the other two classes in this two-armed tree (Fig 7).

Fig 7. Remaining CKD, KRT, or recovery depending on the severity of AKI.

CKD 1: Restitution to initial function, CKD 2: New onset or worsening of existing chronic kidney disease, CKD 3: Remaining KRT.


Discussion

Our observation is that ML methods are adopted reluctantly in medical research, which is firmly grounded in classical statistics. Several authors have described the divide between these "two cultures". The main difference between the two approaches is that classical statistics assumes that the data are generated by a given stochastic data model, whereas ML uses algorithmic models and treats the data mechanism as unknown [29].

Although classical statistics can be seen as the foundation of all scientific medicine, ML is used only in selected areas, such as intensive care medicine. Some projects facilitate diagnosis, whereas others try to create early warning systems based on a variety of data to increase the effectiveness of treatment. An ML approach to predicting ICU readmission has been shown to be significantly more accurate than previously published algorithms in internal validation [30]. Many ML systems used in medical settings are artificial neural networks, mainly for image recognition (e.g., radiology, histology), where they are used, for example, to differentiate between malignant and benign tumors [31,32]. The larger the number of cases, the more accurate ML algorithms tend to be, which makes the final result more reliable. A limitation of our study is the small number of cases and the fact that urine samples could not be obtained from all patients, especially once anuria had occurred. Nevertheless, we tried to present a comprehensive overview of different methods and their potential application in clinical routine.

A problem with neural networks is that they are so-called black boxes: their decisions cannot be readily explained [33], which poses a significant challenge in a medical setting.

Here, we tried to show that both approaches can coexist and complement one another. Interestingly, tree-based ML methods were developed largely by statisticians in the 1970s [34]. We consider them well equipped to bridge the gap between the "two cultures" because of their firm grounding in classical statistics and their convenient availability as mature packages in the R ecosystem. As we have shown in this paper, tree-based methods are readily comprehensible and can provide new insights, even for data sets that have already been analyzed with more traditional methods.

The general idea of the OneR algorithm is to go through each attribute and evaluate how well it functions as a predictor of the dependent variable. The algorithm creates a frequency table for each attribute, giving the number of occurrences of each combination of attribute level and class of the dependent variable. For each frequency table (i.e., each attribute), a total error is calculated by summing, over all levels of the attribute, the number of cases that do not belong to the most frequent class at that level. The attribute with the smallest total error is chosen as the best predictor. The generated rules map every level of this predictor to the most frequent class of the dependent variable at that level [17].
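The procedure just described can be sketched in a few lines of Python for attributes that are already categorical. This is a minimal sketch of Holte's basic algorithm, not the OneR package: it omits the package's enhancements, such as the discretization of numeric data, and the attribute names in the example (`aki3`, `bmi_group`) and the data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Basic OneR for categorical attributes.

    rows:   list of dicts mapping attribute name -> categorical level
    target: list of class labels, one per row
    Returns (best_attribute, rules, total_error); rules maps each level
    of the best attribute to its most frequent class.
    """
    best = None
    for attr in rows[0]:
        # Frequency table: level -> counts of each class at that level
        table = defaultdict(Counter)
        for row, label in zip(rows, target):
            table[row[attr]][label] += 1
        rules, error = {}, 0
        for level, counts in table.items():
            majority, hits = counts.most_common(1)[0]
            rules[level] = majority
            # Cases at this level that the rule would misclassify
            error += sum(counts.values()) - hits
        # Keep the attribute with the smallest total error (first one wins ties)
        if best is None or error < best[2]:
            best = (attr, rules, error)
    return best


if __name__ == "__main__":
    # Hypothetical patients (not study data)
    rows = [
        {"aki3": "yes", "bmi_group": "high"},
        {"aki3": "yes", "bmi_group": "low"},
        {"aki3": "yes", "bmi_group": "low"},
        {"aki3": "no", "bmi_group": "high"},
        {"aki3": "no", "bmi_group": "low"},
        {"aki3": "no", "bmi_group": "low"},
    ]
    outcome = ["died", "died", "survived", "survived", "survived", "survived"]
    print(one_r(rows, outcome))
    # ('aki3', {'yes': 'died', 'no': 'survived'}, 1)
```

Here `aki3` misclassifies one patient and `bmi_group` two, so `aki3` becomes the single rule.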

Numeric attributes have to be discretized before they can be used by the OneR algorithm. The implementation of the OneR algorithm used in this paper (package "OneR") offers several discretization methods. The original OneR algorithm is significantly enhanced by discretization methods that optimally align cut points with the dependent variable (function "optbin"). The method "infogain" used here is an entropy-based method from information theory, which calculates cut points based on "information gain". The idea is that uncertainty is minimized by making the resulting categories as pure as possible. This method is also the standard method of many decision tree algorithms [18].
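The entropy-based idea can be illustrated with a single binary cut: among all midpoints between adjacent distinct values, choose the one that maximizes information gain, i.e., the reduction in class entropy. The Python sketch below is a simplification under that assumption; the "infogain" method in the OneR package may place several cut points and differs in detail, and the function names and example values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def best_infogain_cut(values, labels):
    """Single cut point maximizing information gain; returns (cut, gain)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_cut, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted entropy remaining after the split
        remainder = (i * entropy(left) + (n - i) * entropy(right)) / n
        gain = base - remainder
        if gain > best_gain:
            best_cut, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2, gain
    return best_cut, best_gain


if __name__ == "__main__":
    # Hypothetical BMI values and outcomes (not study data)
    bmi = [22, 25, 27, 30, 33, 38]
    outcome = ["survived"] * 3 + ["died"] * 3
    print(best_infogain_cut(bmi, outcome))  # (28.5, 1.0)
```

A gain of 1.0 bit means the cut removes all uncertainty about the two equally frequent classes, i.e., both resulting categories are pure.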

Decision trees are a natural generalization of the OneR algorithm. Whereas OneR uses only one attribute for its predictions, decision trees are not bound by this restriction, which often results in better accuracy but worse interpretability (a trade-off well known in ML) [33]. Random forests generalize further; they are not covered in this paper [35].

There are several decision tree generation algorithms (e.g., ID3, C4.5, and C5.0); we used CART in the rpart implementation [20]. Unlike linear methods, such as Pearson correlation or linear regression, decision trees capture non-linear relationships well [36]. Interestingly, the opening example of Breiman's seminal work was a medical example from cardiology [19]. In our cohort, we used decision trees to create models predicting mortality or remaining CKD or KRT.

For CART, trees are constructed by repeatedly splitting subsets of the population into two descendant subsets [19]. Whereas OneR can split an attribute into several subsets, the splits in CART are always binary. For numeric data, cut-off values are determined. The splitting is conducted recursively, and the same attribute can be used several times on different levels of the resulting tree. Unlike some other tree methods and the OneR discretization described above, CART's splitting criterion is based not on entropy but on Gini impurity. In practice, both criteria often lead to similar results [23].
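The CART ideas above (binary splits, Gini impurity, recursion, and reuse of attributes on different levels) can be sketched in Python as follows. This is a teaching sketch under simplifying assumptions, not rpart: there is no pruning, no surrogate splits, and no minimum node size, and only numeric features are handled. The `max_depth` argument plays a role loosely similar to rpart's `maxdepth` control parameter; feature names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Best binary split 'row[feature] <= cut' over all numeric features.

    Returns (weighted_impurity, feature, cut); feature is None when no
    split lowers the impurity of the node.
    """
    n = len(labels)
    best = (gini(labels), None, None)
    for feature in rows[0]:
        pairs = sorted(zip((row[feature] for row in rows), labels))
        for i in range(1, n):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # no cut point between identical values
            left = [y for _, y in pairs[:i]]
            right = [y for _, y in pairs[i:]]
            impurity = (i * gini(left) + (n - i) * gini(right)) / n
            if impurity < best[0]:
                best = (impurity, feature, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def build_tree(rows, labels, max_depth):
    """Grow a binary tree recursively; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    _, feature, cut = best_split(rows, labels)
    if feature is None:
        return majority
    go_left = [row[feature] <= cut for row in rows]
    return {
        "feature": feature,
        "cut": cut,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [y for y, g in zip(labels, go_left) if g], max_depth - 1),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [y for y, g in zip(labels, go_left) if not g], max_depth - 1),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["cut"] else tree["right"]
    return tree


if __name__ == "__main__":
    # Hypothetical patients (not study data)
    rows = [
        {"troponin": 10, "bmi": 25}, {"troponin": 20, "bmi": 35},
        {"troponin": 30, "bmi": 28}, {"troponin": 90, "bmi": 24},
        {"troponin": 120, "bmi": 33}, {"troponin": 200, "bmi": 30},
    ]
    outcome = ["survived"] * 3 + ["died"] * 3
    tree = build_tree(rows, outcome, max_depth=2)
    print(tree)
    # {'feature': 'troponin', 'cut': 60.0, 'left': 'survived', 'right': 'died'}
    print(predict(tree, {"troponin": 150, "bmi": 27}))  # died
```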

An important consideration is the depth of the tree. The deeper a tree, the better it represents the training data, but the less interpretable it becomes. An additional problem is overfitting, which is well known in the ML literature [37]: a fully grown tree could end up with only one example per leaf, which would render the decision next to useless in practice because the tree would fail to generalize (i.e., it would model the noise in the data). CART therefore prunes the tree to an optimal size according to a cost-complexity function [38].
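For reference, the cost-complexity criterion of CART makes this depth trade-off explicit. For a subtree T of the fully grown tree, with misclassification (resubstitution) error R(T) and |T̃| terminal nodes (leaves), a complexity parameter α ≥ 0 penalizes tree size:

```latex
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|, \qquad \alpha \ge 0
```

For each α, pruning keeps the subtree that minimizes R_α(T): α = 0 retains the full tree, and larger values favor smaller trees; α is usually chosen by cross-validation. In rpart, the complexity parameter `cp` is, to our understanding, a version of α scaled by the error of the root node; this is background on the method, not a statement of the pruning settings used in this study.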

We also performed ANOVA and t-tests on the collected parameters. Proteinuria has been commonly observed during SARS-CoV-2 infection and is reported in 7 to 63% of cases [25,39]. Proteinuria is mostly reported as non-selective due to tubular injury, but in some cases selective proteinuria indicates glomerular damage [40]. A direct link between proteinuria and mortality in COVID-19 patients has not been shown, although previous data from patients critically ill due to other causes strongly suggest such a link [28]. Gross et al. hypothesized as early as May 2020 that proteinuria could be an early marker of AKI onset or a severe course [41]. Our single-center cohort included too few patients to confirm this hypothesis in COVID-19; as already mentioned, this is a limitation of our study. It also affects some of the following results, although we were still able to reproduce some findings from previous studies.

We found a relationship between higher age and mortality, as shown previously in various retrospective studies [5,42,43]. In our study population, the average age was 60 years among surviving critically ill patients but 69 years among deceased patients. This is similar to prior results, in which age >65 years was shown to be a risk factor for higher mortality [2,10].

ANOVA showed no significant relationship between selective or non-selective proteinuria and the development of AKI, permanent CKD, increased mortality, or protracted disease progression. Prior data suggest that more than 40% of patients have abnormal proteinuria at hospital admission and that 20–40% of critically ill patients develop AKI [13,39]. In our center, only 9 patients did not experience acute kidney failure (24.3%), and 20 developed AKI 3 (54.1%). This may be due, among other reasons, to the fact that serum creatinine measured in an already critically ill state does not reflect baseline creatinine. Pei et al. found that 75.4% of 333 patients had abnormal urine dipstick tests or AKI, and 50% of them developed AKI 3. Among the 35 patients who developed AKI in the study by Pei et al., 45.7% experienced complete recovery of kidney function [26]. We were able to reproduce these results in our patient cohort.

Nine patients died (24.3%); all of them had AKI 3 requiring KRT. In this group, six patients had no or only low-grade CKD, and three were admitted with CKD 3b or 4. Other studies reported similar mortality rates among ICU patients [44], especially among those requiring mechanical ventilation [45]. One meta-analysis showed that AKI is associated with a 13-fold increased risk of mortality and that the incidence of AKI is up to 20% in critically ill patients. Higher age, diabetes, hypertension, and baseline serum creatinine levels were associated with increased AKI incidence [46].

Several studies have reported higher BMI as a significant risk factor [47,48]. A meta-analysis by Hussain et al. demonstrated significantly higher mortality in patients with a BMI >25 kg/m2 and identified obesity (BMI >30 kg/m2) as a significant factor for critical illness during COVID-19 [49]. In our study population, the mean BMI was 33 kg/m2 among deceased and 28 kg/m2 among surviving patients; however, this difference was not significant in our cohort.

In a meta-analysis of a multinational database [50], the incidence of AKI in mechanically ventilated patients was reported to be 22%, slightly higher than among general inpatients [51]. We found an incidence of AKI 1–3 of 75.7%; in 24 of the 37 cases (64.9%), it was AKI 2 or higher. A Chinese meta-analysis reported that the incidence of AKI in hospitalized Chinese adults was up to 50% among those in the ICU and that AKI was associated with a more severe course of infection [52]. Of course, definitive statements are difficult to derive from such a small number of cases. The high incidence of AKI at our center may also reflect patient selection, as our hospital is a center for ECMO therapy [53,54].

Prior data have shown that SARS-CoV-2 uses angiotensin-converting enzyme 2 (ACE-2), which is expressed in the lung, liver, oesophagus, gastrointestinal tract, kidney, and cardiovascular system, to enter target cells [55–57]. Uncontrolled release of cytokines and other proinflammatory substances is also responsible for either AKI or acute cardiac disorders [58]. If proteinuria emerges as an early biomarker of AKI in COVID-19 in larger studies, another very interesting question for future investigation would be whether potential cardiac deterioration due to COVID-19 can also be detected early in this way.


Conclusions

ML methods are traditionally reserved for large data sets, at least in widespread application. The medical field in particular, where research is mostly based on traditional statistical methods, can provide such large data sets. In the current situation, structured cooperation among all countries is needed that adapts daily to the current and steadily growing knowledge about COVID-19. The methods shown in this paper can be enhanced further by generalizations of tree-based methods, especially random forests, which are ensembles of decision trees known for their better accuracy but which unfortunately lose some of the comprehensibility of simpler tree-based methods [35]. Further research is warranted to address this issue.


  1. COVID-19 Map [Internet]. Johns Hopkins Coronavirus Resource Center. [cited 2020 Jun 24].
  2. Li L-Q, Huang T, Wang Y-Q, Wang Z-P, Liang Y, Huang T-B, et al. COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(6):577–83. pmid:32162702
  3. Wang D, Hu B, Hu C, Zhu F, Liu X, Zhang J, et al. Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China. JAMA. 2020;323(11):1061–9. pmid:32031570
  4. Klopfenstein T, Kadiane-Oussou NJ, Toko L, Royer P-Y, Lepiller Q, Gendrin V, et al. Features of anosmia in COVID-19. Med Mal Infect. 2020 Apr 17.
  5. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054–62. pmid:32171076
  6. Weiss P, Murdoch DR. Clinical course and mortality risk of severe COVID-19. Lancet. 2020;395(10229):1014–5. pmid:32197108
  7. Karagiannidis C, Mostert C, Hentschker C, Voshaar T, Malzahn J, Schillinger G, et al. Case characteristics, resource use, and outcomes of 10 021 patients with COVID-19 admitted to 920 German hospitals: an observational study. The Lancet Respiratory Medicine. 2020 Sep 1;8(9):853–62. pmid:32735842
  8. Sterblichkeitsrate beim Coronavirus nach Ländern 2020 [Coronavirus mortality rate by country, 2020] [Internet]. Statista. [cited 2020 Jun 24].
  9. Li X, Xu S, Yu M, Wang K, Tao Y, Zhou Y, et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J Allergy Clin Immunol. 2020 Apr 12. pmid:32294485
  10. Du R-H, Liang L-R, Yang C-Q, Wang W, Cao T-Z, Li M, et al. Predictors of mortality for patients with COVID-19 pneumonia caused by SARS-CoV-2: a prospective cohort study. Eur Respir J. 2020;55(5). pmid:32269088
  11. Shi S, Qin M, Shen B, Cai Y, Liu T, Yang F, et al. Association of Cardiac Injury With Mortality in Hospitalized Patients With COVID-19 in Wuhan, China. JAMA Cardiol. 2020 Mar 25. pmid:32211816
  12. Hoste EAJ, Bagshaw SM, Bellomo R, Cely CM, Colman R, Cruz DN, et al. Epidemiology of acute kidney injury in critically ill patients: the multinational AKI-EPI study. Intensive Care Med. 2015 Aug;41(8):1411–23. pmid:26162677
  13. Richardson S, Hirsch JS, Narasimhan M, Crawford JM, McGinn T, Davidson KW, et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA. 2020 Apr 22.
  14. Mitchell TM. Machine Learning. New York: McGraw-Hill; 1997. 414 p. (McGraw-Hill series in computer science).
  15. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. Pearson [Internet]. [cited 2020 Oct 13].
  16. Alpaydın E. Introduction to Machine Learning. In: Machine Learning. 2004.
  17. Holte RC. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning. 1993 Apr 1;11(1):63–90.
  18. Jouanne-Diedrich H von. OneR: One Rule Machine Learning Classification Algorithm with Enhancements [Internet]. 2017 [cited 2020 Oct 18].
  19. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and Regression Trees. UK ed. Boca Raton: Taylor & Francis Ltd; 1984. 368 p.
  20. rpart: Recursive Partitioning and Regression Trees, package reference manual (rpart.pdf) [Internet]. [cited 2020 Oct 12].
  21. Palevsky PM, Liu KD, Brophy PD, Chawla LS, Parikh CR, Thakar CV, et al. KDOQI US commentary on the 2012 KDIGO clinical practice guideline for acute kidney injury. Am J Kidney Dis. 2013 May;61(5):649–72. pmid:23499048
  22. R: The R Project for Statistical Computing [Internet]. [cited 2020 Oct 8].
  23. Rokach L, Maimon O. Top-down induction of decision trees classifiers—a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2005 Nov;35(4):476–87.
  24. Simonnet A, Chetboun M, Poissy J, Raverdy V, Noulette J, Duhamel A, et al. High Prevalence of Obesity in Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) Requiring Invasive Mechanical Ventilation. Obesity. 2020;28(7):1195–9. pmid:32271993
  25. Wang L, Li X, Chen H, Yan S, Li D, Li Y, et al. Coronavirus Disease 19 Infection Does Not Result in Acute Kidney Injury: An Analysis of 116 Hospitalized Patients from Wuhan, China. Am J Nephrol. 2020;51(5):343–8.
  26. Pei G, Zhang Z, Peng J, Liu L, Zhang C, Yu C, et al. Renal Involvement and Early Prognosis in Patients with COVID-19 Pneumonia. J Am Soc Nephrol. 2020;31(6):1157–65. pmid:32345702
  27. Ronco C, Reis T, Husain-Syed F. Management of acute kidney injury in patients with COVID-19. Lancet Respir Med. 2020;8(7):738–42. pmid:32416769
  28. Han SS, Ahn SY, Ryu J, Baek SH, Chin HJ, Na KY, et al. Proteinuria and hematuria are associated with acute kidney injury and mortality in critically ill patients: a retrospective observational study. BMC Nephrol. 2014 Jun 18;15:93. pmid:24942179
  29. Breiman L. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist Sci. 2001 Aug;16(3):199–231.
  30. Rojas JC, Carey KA, Edelson DP, Venable LR, Howell MD, Churpek MM. Predicting Intensive Care Unit Readmission with Machine Learning Using Electronic Health Record Data. Ann Am Thorac Soc. 2018 Jul;15(7):846–53. pmid:29787309
  31. Choy G, Khalilzadeh O, Michalski M, Do S, Samir AE, Pianykh OS, et al. Current Applications and Future Impact of Machine Learning in Radiology. Radiology. 2018 Aug;288(2):318–28. pmid:29944078
  32. Adamson AS, Smith A. Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatol. 2018;154(11):1247–8. pmid:30073260
  33. Ribeiro MT, Singh S, Guestrin C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv:160204938 [cs, stat] [Internet]. 2016 Feb 16 [cited 2020 Oct 14];
  34. Matloff N. Efron updates Breiman’s “Two Cultures” essay [Internet]. Mad (Data) Scientist. 2020 [cited 2020 Oct 6].
  35. Ho TK. Random decision forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition. Vol. 1. 1995. p. 278–82.
  36. Decision Boundaries for Deep Learning and other Machine Learning classifiers [Internet]. KDnuggets. [cited 2020 Oct 12].
  37. Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach [Internet]. 2nd ed. New York: Springer-Verlag; 2002 [cited 2020 Oct 12].
  38. Esposito F, Malerba D, Semeraro G, Kay J. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997 May;19(5):476–91.
  39. Cheng Y, Luo R, Wang K, Zhang M, Wang Z, Dong L, et al. Kidney impairment is associated with in-hospital death of COVID-19 patients [Internet]. Nephrology; 2020 Feb [cited 2020 Sep 23].
  40. Martinez-Rojas MA, Vega-Vega O, Bobadilla NA. Is the kidney a target of SARS-CoV-2? American Journal of Physiology-Renal Physiology. 2020 May 15;318(6):F1454–62. pmid:32412303
  41. Gross O, Moerer O, Weber M, Huber TB, Scheithauer S. COVID-19-associated nephritis: early warning for disease severity and complications? The Lancet. 2020 May 16;395(10236):e87–8. pmid:32423587
  42. Kang S-J, Jung SI. Age-Related Morbidity and Mortality among Patients with COVID-19. Infect Chemother. 2020 Jun;52(2):154–64. pmid:32537961
  43. Cesari M, Proietti M. COVID-19 in Italy: Ageism and Decision Making in a Pandemic. J Am Med Dir Assoc. 2020;21(5):576–7. pmid:32334771
  44. Zhao Z, Chen A, Hou W, Graham JM, Li H, Richman PS, et al. Prediction model and risk scores of ICU admission and mortality in COVID-19. PLoS One [Internet]. 2020 Jul 30 [cited 2020 Sep 4];15(7). pmid:32730358
  45. Auld S, Caridi-Scheible M, Blum JM, Robichaux CJ, Kraft CS, Jacob JT, et al. ICU and ventilator mortality among critically ill adults with COVID-19. medRxiv. 2020 Apr 26. pmid:32511599
  46. Hansrivijit P, Qian C, Boonpheng B, Thongprayoon C, Vallabhajosyula S, Cheungpasitporn W, et al. Incidence of acute kidney injury and its association with mortality in patients with COVID-19: a meta-analysis. Journal of Investigative Medicine: The Official Publication of the American Federation for Clinical Research. 2020;68(7):1261–70. pmid:32655013
  47. Ruan Q, Yang K, Wang W, Jiang L, Song J. Correction to: Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care Med. 2020 Apr 6;1–4.
  48. Grasselli G, Zangrillo A, Zanella A, Antonelli M, Cabrini L, Castelli A, et al. Baseline Characteristics and Outcomes of 1591 Patients Infected With SARS-CoV-2 Admitted to ICUs of the Lombardy Region, Italy. JAMA. 2020 Apr 28;323(16):1574–81. pmid:32250385
  49. Yang J, Hu J, Zhu C. Obesity aggravates COVID-19: A systematic review and meta-analysis. J Med Virol. 2021 Jan;93(1):257–61. Epub 2020 Oct 5. pmid:32603481
  50. Esteban A, Frutos-Vivar F, Muriel A, Ferguson ND, Peñuelas O, Abraira V, et al. Evolution of mortality over time in patients receiving mechanical ventilation. Am J Respir Crit Care Med. 2013 Jul 15;188(2):220–30. pmid:23631814
  51. Lombardi R, Nin N, Peñuelas O, Ferreiro A, Rios F, Marin MC, et al. Acute Kidney Injury in Mechanically Ventilated Patients: The Risk Factor Profile Depends on the Timing of AKI Onset. Shock. 2017 Oct;48(4):411–7. pmid:28379920
  52. Rabb H. Kidney diseases in the time of COVID-19: major challenges to patient care. J Clin Invest. 2020;130(6):2749–51. pmid:32250968
  53. Liao X, Cheng Z, Wang L, Li B. Analysis of the risk factors of acute kidney injury in patients receiving extracorporeal membrane oxygenation. Clin Nephrol. 2018 Oct;90(4):270–5. pmid:30168414
  54. Devasagayaraj R, Cavarocchi NC, Hirose H. Does acute kidney injury affect survival in adults with acute respiratory distress syndrome requiring extracorporeal membrane oxygenation? Perfusion. 2018;33(5):375–82. pmid:29360002
  55. Yan R, Zhang Y, Li Y, Xia L, Guo Y, Zhou Q. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science. 2020;367:1444–8. pmid:32132184
  56. Liu Z, Xiao X, Wei X, Li J, Yang J, Tan H, et al. Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS-CoV-2. J Med Virol. 2020. pmid:32100877
  57. Gagliardi I, Patella G, Michael A, Serra R, Provenzano M, Andreucci M. COVID-19 and the Kidney: From Epidemiology to Clinical Practice. J Clin Med. 2020 Aug 4;9(8):2506. pmid:32759645
  58. Ielapi N, Licastro N, Provenzano M, Andreucci M, Franciscis S, Serra R. Cardiovascular disease as a biomarker for an increased risk of COVID-19 infection and related poor prognosis. Biomark Med. 2020 Jun;14(9):713–716. pmid:32426991 Epub 2020 May 19.