Metabolic profiling during COVID-19 infection in humans: Identification of potential biomarkers for occurrence, severity and outcomes using machine learning

  • Gamalat A. Elgedawy,

    Roles Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing – original draft

    Affiliation Department of Clinical Biochemistry and Molecular Diagnostics, National Liver Institute, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Mohamed Samir,

    Roles Data curation, Formal analysis, Funding acquisition, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    naglaa.alabd.12@med.menofia.edu.eg, naglaa_elabd@yahoo.com (NE); mohsamir2016@yahoo.com (MS)

    Affiliation Faculty of Veterinary Medicine, Department of Zoonoses, Zagazig University, Zagazig, Egypt

  • Naglaa S. Elabd,

    Roles Conceptualization, Data curation, Resources, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Faculty of Medicine, Department of Tropical Medicine, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Hala H. Elsaid,

    Roles Conceptualization, Data curation, Methodology, Validation, Writing – original draft

    Affiliation Department of Clinical Biochemistry and Molecular Diagnostics, National Liver Institute, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Mohamed Enar,

    Roles Data curation, Validation, Writing – original draft

    Affiliation Al Mahala Elkobra Fever Hospital, Al Mahala Elkobra, Egypt

  • Radwa H. Salem,

    Roles Data curation, Investigation, Resources, Validation

    Affiliation Department of Clinical Microbiology and Immunology, National Liver Institute, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Belal A. Montaser,

    Roles Investigation, Methodology, Validation, Writing – original draft

    Affiliation Faculty of Medicine, Department of Clinical Pathology, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Hind S. AboShabaan,

    Roles Investigation, Methodology, Validation, Writing – original draft

    Affiliation Ph.D. of Biochemistry, National Liver Institute Hospital, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Randa M. Seddik,

    Roles Data curation, Formal analysis, Resources, Validation, Writing – review & editing

    Affiliation Faculty of Medicine, Department of Tropical Medicine, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Shimaa M. El-Askaeri,

    Roles Methodology, Supervision, Validation, Writing – original draft

    Affiliation Department of Clinical Microbiology and Immunology, National Liver Institute, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Marwa M. Omar,

    Roles Data curation, Resources, Validation, Writing – original draft

    Affiliation Faculty of Medicine, Department of Clinical Pathology, Menoufia University, Shebin El-Kom, Menoufia, Egypt

  • Marwa L. Helal

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Writing – original draft

    Affiliation Department of Clinical Biochemistry and Molecular Diagnostics, National Liver Institute, Menoufia University, Shebin El-Kom, Menoufia, Egypt

Abstract

Background

After its emergence in China, the coronavirus SARS-CoV-2 has swept the world, leading to global health crises with millions of deaths. COVID-19 clinical manifestations differ in severity, ranging from mild symptoms to severe disease. Although perturbation of metabolism has been reported as a part of the host response to COVID-19 infection, scarce data exist that describe stage-specific changes in host metabolites during the infection and how this could stratify patients based on severity.

Methods

Given this knowledge gap, we performed targeted metabolomics profiling and then used machine learning models and biostatistics to characterize the alteration patterns of 50 metabolites and 17 blood parameters measured in a cohort of 295 human subjects. The subjects were categorized into healthy control, non-severe, severe and critical groups, and their outcomes were recorded. Subjects’ demographic and clinical data were also used in the analyses to provide more robust predictive models.

Results

The non-severe and severe COVID-19 patients experienced the strongest changes in metabolite repertoire, whereas less intense changes occurred during the critical phase. Panels of 15, 14, 2 and 2 key metabolites were identified as predictors for non-severe, severe, critical and dead patients, respectively. Specifically, arginine and malonyl methylmalonyl succinylcarnitine were significant biomarkers for the onset of COVID-19 infection, and tauroursodeoxycholic acid was a potential biomarker for disease progression. Measuring blood parameters enhanced the predictive power of metabolic signatures during critical illness.

Conclusions

Metabolomic signatures are distinctive for each stage of COVID-19 infection. This has great translational potential, as it opens new therapeutic and diagnostic prospects based on key metabolites.

Introduction

Since its first emergence in Wuhan city, China, in December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to be a public health threat and is still spreading around the world, especially with new variants [1,2]. As of May 2023, more than seven hundred million cases had been confirmed by the WHO, with ~7 million confirmed deaths globally [3]. Infection with this virus has imposed a great financial and social burden on countries and individuals throughout the world owing to its disastrous consequences for both patients and their families [4]. Egypt is one of the countries impacted by the COVID-19 pandemic, with the number of cases to date slightly exceeding half a million, of whom ~5% died from the disease [3].

Most individuals who acquire SARS-CoV-2 infection experience mild disease. However, an estimated ~14% of COVID-19 patients develop severe disease and ~5% experience shock or multiple organ failure, as well as lung dysfunction necessitating ventilation, especially those with complications or those suffering acute respiratory distress syndrome (ARDS) [5]. As the virus tries to hijack the host machinery for its own survival, infection with SARS-CoV-2 triggers a plethora of host responses [6], of which the metabolome represents one important component. The human metabolome is an ensemble of several thousand molecules that cover an ample range of concentrations (from <1 nM to >1 μM) and are produced by either the host genome or the genome of the host microflora [7]. Blood is the primary carrier of host metabolites, whose relative concentrations mirror the pathophysiological status of an individual and could thus inform on virus-induced tissue lesions and organ failure. It has been reported that an organism’s metabolome is a more accurate gauge of its metabolic state than its proteome or transcriptome [8].

Changes in endogenous metabolites are one of the characteristics of COVID-19 infection [9–11]. The intensive hypoxia associated with COVID-19-induced lung impairment possibly leads to an altered metabolic profile [12]. Reports have indicated reductions in the levels of some metabolites in severe COVID-19 infection in patients suffering from diabetes or hypertension [13]. Therefore, investigating the alteration in metabolites during COVID-19 infection could unravel important aspects of disease mechanisms, such as revealing diagnostic or prognostic metabolic markers and discriminating patient groups based on disease severity [10,14,15]. Along with metabolite changes, hyperinflammation and overproduction of cytokines have been observed during COVID-19 infection and have been recognized as a major cause of mortality in COVID-19 patients [16,17].

There have been multiple studies that profiled metabolites in COVID-19 patients with the aim of finding novel biomarkers or stratifying patients. However, most of these studies did not address the stage-to-stage differential regulation of metabolites during disease progression. Instead, they either compared controls with patients at each of the severity stages [18], compared all stages simultaneously [14], or only contrasted controls with COVID-19-positive patients without stage definition [19,20]. Yet the genesis of COVID-19 is a multistep process that progresses over time [15,18]. It is generally assumed that the different stages of the virus replication cycle, from entry to virus release, are entirely fueled by the host cell’s energy and metabolic resources [21]. This has already been demonstrated for SARS-CoV-2 infection in a robust animal model that mimics humans [22] and is published in the most recent WHO guidelines for patient stratification [23]. This suggests that stage-to-stage reprogramming of host metabolites during COVID-19 infection is plausible and worth investigating [18]. The accurate classification of patient groups, however, remains challenging, particularly because of the wide and overlapping spectrum of patients’ symptoms and, as a result, the different pathophysiological pathways that are affected and interrupted during disease progression. Although a few metabolomics studies have used nuclear magnetic resonance (NMR) [24], the preferred method for exploring potentially diagnostic biomarkers in COVID-19 has been mass spectrometry (MS)-based metabolomics. Even though liquid chromatography (LC) coupled with MS has been used in many of these studies, gas chromatography coupled with MS has also been shown to produce intriguing results about the evolution of illness [25].

In the current study, a targeted metabolic analysis was applied on a cohort of Egyptian subjects who exemplified consecutive stages of COVID-19 infection with the purpose of determining the most promising metabolites that could be used as biomarkers for disease occurrence, progression, and outcome. We also tested, by data analytics and machine learning models, how this could inform patient disease stage and whether the addition of blood indices measured on the same subjects could substantiate the clinical biomarker utility of the identified metabolites.

Materials and methods

2.1 Study design and study participants

From November 2021 to May 2022, 200 patients with confirmed COVID-19 who were admitted to Menoufia University Hospitals, Menoufia province, Egypt, were recruited. All patients had a clinical suspicion of COVID-19 and were confirmed to be positive by PCR applied to nasopharyngeal and oropharyngeal samples. The study subjects (initial number = 300) were categorized into 4 groups (100 healthy controls (HC), 100 non-severe, 50 severe and 50 critical). This classification follows the latest update of the WHO “COVID-19 Clinical management: Living guidance, 25 January 2021” [23]. Non-severe COVID-19 patients were defined as those meeting neither the severe nor the critical COVID-19 criteria. Severe COVID-19 included patients with any of the following criteria: oxygen saturation < 90% on room air; in adults, signs of severe respiratory distress (accessory muscle use, inability to complete full sentences, respiratory rate > 30 breaths per minute), in addition to the signs of pneumonia. Critical COVID-19 patients had to meet the criteria of acute respiratory distress syndrome (ARDS), sepsis, septic shock or other conditions that would normally require the provision of life-sustaining therapies such as mechanical ventilation (invasive or non-invasive) or vasopressor therapy. Patients with a known history of hepatic, renal or cardiac disease or coagulation disorders were excluded from the study. The HC group included apparently healthy healthcare workers with no evidence of COVID-19 infection by standard clinical criteria and laboratory investigation. Patients’ demographic and clinical data were also included in the study for analysis (Table 1).

Table 1. Summary of demographic and clinical data of healthy individuals and COVID-19 patients.

https://doi.org/10.1371/journal.pone.0302977.t001

Ethical approval: The study was conducted in accordance with the Helsinki Declaration and was approved by the National Liver Institute Ethical Committee (IRB: NLI 00003413). Prior to enrollment, each participant was informed about the aims of the study and was asked to provide written informed consent.

2.2 Clinical assessment and samples collection

Patients with clinical suspicion of COVID-19 were assessed upon attending the COVID-19 isolation unit at the Faculty of Medicine, Menoufia University Hospital, throughout the research period. Following clinical, laboratory, and radiological assessment, patients were classified as non-severe, severe, or critical. Non-severe cases received an outpatient treatment prescription, and further follow-up was conducted over the phone or in the COVID-19 outpatient clinic. Those with severe or critical presentations were admitted to the COVID-19 quarantine ward or ICU, where baseline clinical, laboratory, and radiological data were recorded at the time of admission. Additionally, the course of the illness and the response to treatment were evaluated and recorded daily. Blood samples (10 ml) were drawn from the cubital vein of all subjects by venipuncture after fasting at the time of admission/diagnosis. Of these, 5 ml were collected in a plain vacutainer tube, allowed to clot at room temperature and then centrifuged (3000 rpm, 5 min), and the clear supernatant serum was separated and divided into three aliquots. The 1st serum aliquot was used to measure ferritin, procalcitonin, C-reactive protein (CRP), lactate dehydrogenase (LDH), liver enzymes (ALT, AST), and kidney function tests (urea and creatinine). The 2nd serum aliquot was used to measure IL-6 with a human IL-6 ELISA kit (Thermo Fisher Scientific, US; catalog no. EH2IL6) following the manufacturer’s instructions. The 3rd serum aliquot was kept at −80 °C for bile acid analysis. From the remaining 5 ml, 2 ml were collected into a tube containing ethylenediaminetetraacetic acid (EDTA) for the complete blood count (CBC); 1 ml was spotted on filter paper (Whatman 903 paper, NJ, USA), left to dry on a clean surface for 6 hours, and then stored at −80 °C until analysis of amino acids, carnitine, and acylcarnitines; and the final 2 ml were collected into sodium citrate tubes for D-dimer, prothrombin time, and INR measurements.

2.3 Targeted metabolomics using ultra-performance liquid chromatography tandem mass spectrometry (UPLC MS/MS)

2.3.1 Amino acid and carnitines quantification.

Amino acids, blood carnitine and acylcarnitines were measured using MassChrom® Amino Acids and Acylcarnitines from Dried Blood / Non Derivatised - LC-MS/MS (order No.: 57000/F, Chromsystems Instruments & Chemicals GmbH, Germany). A 3 mm disk of the dried blood spot was punched into a well of a V-bottomed plate containing 100 μl of lyophilized internal standard reconstituted with 25 ml Extraction Buffer. The plate was sealed with a protective sheet and agitated at 600 rpm for 20 min at room temperature. The supernatant was transferred to a new V-bottomed well plate and covered with an aluminum foil sheet. Ten μl of the eluate was injected into the MS/MS system at two-min intervals in a flowing stream of 80% acetonitrile at a flow rate of 200 μl/min, which was reduced to 20 μl/min in 0.25 min. The flow rate was then increased to 600 μl/min in 1.25 min and decreased again to 200 μl/min. The scan time of the MS/MS system was 1.2 min. The spectra of all analytes were acquired in multiple reaction monitoring (MRM) mode. Quantitative analysis was achieved using Neolynx software (Neolynx Inc., Glendale, CA, USA) by comparing the signal intensity of an analyte against that of the corresponding internal standard. The quantification included 14 blood amino acids and 26 blood carnitine and acylcarnitines (S1 Table).
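The internal-standard comparison mentioned above can be illustrated with a minimal Python sketch. It is not the vendor software's algorithm: it assumes a simple single-point ratio with a response factor of 1, and the function name, signal values and internal standard concentration are hypothetical.

```python
def concentration_from_internal_standard(analyte_signal, is_signal, is_conc):
    """Estimate analyte concentration from the analyte / internal standard signal ratio.

    Assumes (for illustration only) that analyte and internal standard
    respond identically, i.e. a response factor of 1.
    """
    if is_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return (analyte_signal / is_signal) * is_conc


# Hypothetical MRM intensities and internal standard concentration (umol/L)
example_conc = concentration_from_internal_standard(
    analyte_signal=45000.0, is_signal=30000.0, is_conc=20.0
)
print(f"Estimated concentration: {example_conc:.1f} umol/L")
```

In practice a kit-specific calibration would replace the unit response factor assumed here.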

2.3.2 Bile acids quantification.

We used standards for the bile acids listed in S1 Table. These were purchased from Sigma-Aldrich Chemicals (Merck KGaA, Darmstadt, Germany). Sample preparation for bile acid quantification was done according to [26] with modification. First, a 100 μL blood sample was added to 400 μL of ice-cold methanol to precipitate the sample proteins. The mixture was then vortexed and centrifuged (13500 rpm for 15 minutes), and the supernatant was collected and centrifuged again (13500 rpm for 15 minutes). Finally, 50 μL of the final supernatant was mixed with 100 μL of water/formic acid (1000:1, v/v). The solution was then injected into the LC/MS/MS system. Stock solutions of the 14 individual bile acid standards were prepared separately in methanol at 10 mmol/L and then stored at −20°C [26]. The individual stock solutions were then pooled to obtain a mixture of 50 μmol/L in deionized water and acetonitrile (50:50). Eight-point standard solutions ranging from 0 to 40 μmol/L were prepared by adding appropriate amounts of the 50 μmol/L mixture to bile acid-free pooled serum for external standard calibration. 5 μmol/L of the individual stock solutions was injected into the LC. Chromatographic separation was carried out on a triple-quadrupole tandem MS. The analytical column was an ACQUITY UPLC BEH C18, 1.7 μm, 2.1 × 50 mm column (Waters) at 50°C. Samples (5 μL) were eluted with a gradient at a flow rate of 0.28 mL/min. Mobile phase A was water/formic acid (1000:1, v/v) and mobile phase B was acetonitrile. The elution started with 80% mobile phase A and 20% mobile phase B for an initial 2.1 min after injection, then with a linear gradient of mobile phase B from 20% to 30% over 5.2 min, followed by mobile phase B at 80% over 8 min, which was held for 0.5 min. The column was equilibrated with 80% mobile phase A for 2 min before the injection of the next sample.

2.4 Data analysis and machine learning models

The overall study design is shown in Fig 1. Initially, 54 metabolites (14 serum bile acids, 14 blood amino acids, and 26 blood carnitine and acylcarnitines) measured in 300 subjects (100 HC, 100 non-severe, 50 severe and 50 critical) were included in the analyses. Principal component analysis (PCA) was used to visualize outliers in subjects and metabolites. Outliers were subsequently detected using box-and-whisker plots. These analyses were done using base R functions in R software version 4.3.1. The raw concentrations of metabolites were median normalized, log-transformed and then scaled by mean centering and dividing by the standard deviation of each variable. The normalized counts were used for further analyses. To compare metabolite concentrations among and within study groups, we ran three machine learning models: partial least squares discriminant analysis (PLS-DA), its orthogonal version (oPLS-DA) and random forest (RF). The parameters used to evaluate the performance of the PLS-DA model were accuracy, R2 (goodness of fit) and Q2 (predictive ability) [27]. These were generated with 3 components (to avoid over-fitting) and 10-fold cross-validation. The performance of the oPLS-DA model was assessed using the R2Y (variance among subjects explained by the model) and Q2Y (model predictability) parameters. The significance of the oPLS-DA model was assessed at a threshold of 0.5 for R2Y and Q2Y [28,29]. The RF model was applied using an ensemble of 1000 trees. The mtry parameter (R software version 4.3.1, randomForest package [30]) was used to set the number of variables randomly sampled as candidates at each split (the square root of the number of variables). For univariate analyses, data normality was assessed using the Shapiro-Wilk test [31], and Student’s t-test was run to identify significant differences between pairs of groups. One-way ANOVA (or its non-parametric counterpart, the Kruskal-Wallis test) was used whenever needed, with correction for multiple comparisons using Dunn’s test. An adj. P-value < 0.05 was considered significant. To obtain the degree and pattern of differential expression of metabolites, data (non-normalized for features) were used to calculate the ratio between each pair of groups. A fold change value of |1.5| and an adj. P-value of 0.05 were used as cutoffs for significance. These data were visualized using volcano plots generated in R software version 4.3.1 with the ggplot2 package [32]. The final panel of key diagnostic or prognostic metabolites had to meet the following criteria: a fold change value of 1.5 (using only those that are significant with adj. P-value < 0.05), an oPLS-DA VIP score > 1, and a rank among the top 15 metabolites in the RF model (based on the mean decrease in accuracy). ROC curves for comorbidities were visualized as fractions, their confidence intervals (CI) were estimated using the Wilson/Brown method, and the significance cutoff for the AUC was set at 0.05. Pathway enrichment analyses were done using MetaboAnalyst v.5 [33] and were based on combined functional enrichment analysis and network topology. The hypergeometric test was used for functional enrichment analyses. Significant pathways were selected based on an adj. P-value cutoff of 0.05.
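The preprocessing steps described above (median normalization, log transformation, and scaling by mean and standard deviation) can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study. It assumes that median normalization divides each subject's values by that subject's median and that the logarithm is base 2, neither of which is stated in the text; all function names and concentrations are hypothetical. The helper `default_mtry` shows the square-root rule for the number of variables tried at each RF split.

```python
import math
import statistics


def median_normalize(samples):
    """Divide each subject's (row's) values by that subject's median (assumed scheme)."""
    return [[v / statistics.median(row) for v in row] for row in samples]


def log_transform(samples):
    """Log2-transform every value (the base is an assumption; values must be > 0)."""
    return [[math.log2(v) for v in row] for row in samples]


def autoscale(samples):
    """Mean-center each metabolite (column) and divide by its standard deviation."""
    cols = list(zip(*samples))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in samples
    ]


def preprocess(samples):
    """Apply the three steps in the order given in the text."""
    return autoscale(log_transform(median_normalize(samples)))


def default_mtry(n_variables):
    """Square root of the number of variables, rounded down."""
    return math.floor(math.sqrt(n_variables))


# Hypothetical raw concentrations: 4 subjects (rows) x 4 metabolites (columns)
raw = [
    [10.0, 200.0, 5.0, 50.0],
    [12.0, 180.0, 7.0, 40.0],
    [30.0, 150.0, 9.0, 80.0],
    [25.0, 400.0, 4.0, 60.0],
]
scaled = preprocess(raw)
```

After this step every metabolite has mean 0 and standard deviation 1 across subjects, so no single high-abundance metabolite dominates the multivariate models. With the 50 metabolites retained in the analysis, `default_mtry(50)` gives 7 candidate variables per split.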

Fig 1. Schematic diagram showing the overall design of the study.

The general steps were subject selection and stratification, data quantification, acquisition, data analyses, and model evaluation.

https://doi.org/10.1371/journal.pone.0302977.g001

3. Results

3.1 Exploratory data analyses

To exclude unwanted variation that could bias the analyses, PCA plots based on all metabolites, metabolite subclasses, blood indices and the subjects were used to identify potential outliers. The analyses revealed some outliers within the HC and severe groups. To identify outlier subjects more accurately, the mean normalized counts of metabolites in each class were visualized using box-and-whisker plots. Based on these analyses, 5 subjects (4 severe COVID-19 patients based on the amino acid profile and 1 HC based on the bile acid profile) were excluded. Some carnitines were also excluded, either because they were not detected in all subjects (e.g. tetradecanoyl carnitine, tetradecadienoyl and 3-hydroxyoctadecenoylcarnitine) or because they gave no signal in 94.3% (n = 283) of the subjects and had very low concentrations in the remaining subjects (e.g. 3-Hydroxyhexadecanoyl). After outlier removal, a total of 295 subjects (99 HC, 100 non-severe, 46 severe and 50 critical), 50 metabolites (14 amino acids, 14 bile acids and 22 carnitines) and 17 blood parameters entered the formal analysis (S1 Table). The counts of metabolites and their subclasses were successfully normalized (S1 Fig).
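As an illustration of boxplot-based outlier flagging, the following Python sketch applies the conventional whisker rule (values beyond 1.5 times the interquartile range from the quartiles). The text does not state which whisker definition was used, so the 1.5 × IQR rule, the quartile method, the subject identifiers and the values are all assumptions made for this example.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the lower and upper whisker limits (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(named_values, k=1.5):
    """Return the sorted names of subjects whose value lies outside the whiskers."""
    low, high = tukey_fences(list(named_values.values()), k)
    return sorted(name for name, v in named_values.items() if v < low or v > high)


# Hypothetical mean normalized amino acid level per subject
mean_levels = {
    "S01": 0.10, "S02": -0.20, "S03": 0.05, "S04": 0.00,
    "S05": -0.10, "S06": 0.15, "S07": 3.50,
}
outliers = flag_outliers(mean_levels)
print(outliers)
```

Here only the hypothetical subject S07 falls outside the whiskers and would be a candidate for exclusion.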

3.2 Demographic, clinical characteristics, laboratory values and disease outcomes of study participants

After outlier removal, the clinical cohort presented in this study consisted of 295 subjects, whose demographic and clinical characteristics are shown in Table 1 and S2 and S3 Figs. The severe and critical groups included older participants than the other groups, with their ages being significantly higher than those in the HC group (P-value < 0.0001). The whole cohort contained significantly (P-value = 0.0002) more females (n = 170, 57.6%) than males (n = 125, 42.4%), and female patients (n = 121) significantly (P-value = 0.04) outnumbered male patients (n = 75). Upon admission, the COVID-19 patients were evaluated for severity using the COVID-19 Reporting and Data System (CO-RADS) and severity assessment. It was found that 53.5% of the patients had a CO-RADS score of 5, the majority of whom (77.1%) presented with critical (n = 41) or severe (n = 40) disease. All the COVID-19 patients were symptomatic, showing at least one symptom (S2F Fig), with fever being the most frequent symptom (80.1%), followed by dyspnea (76.5%) and dry cough (75.5%). The most frequently recorded symptoms in the non-severe group were dry cough and fever, recorded in 90% and 73% of these patients, respectively, whereas fever and dyspnea were the top symptoms in both severe (89.1% and 80.4%, respectively) and critical COVID-19 patients (86% and 84%, respectively). The number of individuals with diabetes, hypertension, or both varied significantly among all groups (P-value < 0.01) (Table 1). Out of the 196 COVID-19 patients, 40.3% (n = 79) had no comorbidities, 56.1% (n = 110) had diabetes, 29.1% (n = 57) had hypertension, and 25.5% (n = 50) had both diabetes and hypertension. The presence of these comorbidities varied according to COVID-19 severity. Half of non-severe (51%), close to half of severe (43.4%) and only 16% of critical COVID-19 patients had neither diabetes nor hypertension. Diabetes was more frequent than hypertension in all severity groups, with hypertension not being reported in severe patients (S3A Fig). We observed an increase in the proportion of patients with concurrent diabetes and hypertension as disease severity increased, with these patients representing a considerably high proportion of severe (28.2%) and critical (54%) COVID-19 patients. We were also able to follow the patients’ outcomes. There was a significant association between disease stage and outcome (P-value < 0.0001). The majority (78.5%) of severe COVID-19 patients survived the infection, whereas the majority (74%) of the critical cases died (S3B Fig). Seventeen blood parameters were measured at the time of patient admission (S2 Table and S4 Fig). All blood parameters showed significant differences among the studied groups, in particular between the HC and non-severe groups. However, the trends in among-group differences were characteristic for some parameters. For instance, inflammation-related indices (e.g. serum ferritin, CRP, LDH, procalcitonin, and D-dimer) increased as patients exhibited more severe disease, with a significant increase in severe compared to non-severe cases (S2B Table). IL-6 showed a significant increase in severe and critical cases over the HC but was not significantly altered between severe and critical patients. Furthermore, lymphocytes showed a significant (P-value = 0.006) reduction in severe compared to non-severe patients, whereas creatinine showed the reverse pattern. INR was the only parameter that showed a significant increase in critical patients over the severe ones.

3.3 Diagnostic and prognostic machine learning models

Based on the normalized counts of all metabolites, the initial PCA revealed clear separation between HC and COVID-19 patients, whereas patients with different degrees of severity (i.e. non-severe, severe and critical) appeared to overlap. This held true when blood parameters were added to metabolites (S5 Fig). Comparable separation patterns were seen when the same analyses were applied to each metabolite subclass and blood parameters (S6 Fig). Comparing all groups, the PLS-DA model applied to the validation data reflected the results from the PCA, with moderate performance (accuracy: 0.67, R2: 0.7, Q2: 0.7, permutation test: P-value < 0.05) (S7 Fig and Table 2). When applied to the validation data, the RF classification model outperformed the PLS-DA model, producing a significantly higher classification accuracy of 0.97 (CI: 0.9–0.9), P-value < 0.0001, and a misclassification error of only 0.02.

Table 2. Parameters for evaluation of each machine learning model discriminating pairs of groups using both the metabolites alone or metabolites plus blood indices.

NA: Not available. PLS-DA: Partial least squares discriminant analysis. oPLS-DA: Orthogonal partial least squares discriminant analysis. RF: Random forest.

https://doi.org/10.1371/journal.pone.0302977.t002

3.3.1 Models to predict COVID-19 occurrence (distinguishing HC from non-severe patients).

To determine how changes in metabolites could discriminate HC from non-severe patients, who are at the early infection phase (i.e., suffering mild and moderate disease), we trained and validated several machine learning models on the data comparing these two categories. As shown in S8A Fig and Table 2, the PLS-DA gave a clear separation between both groups (accuracy = 1, R2 = 0.95, Q2 = 0.95 at the 1st component). Applying the oPLS-DA model further supported this (R2Y and Q2Y = 0.95, P-value = 5.152e-16). As shown in Fig 2A and Table 2, the RF model validated the results of the previous models, with even better and significant differential clustering and with accuracy, sensitivity, and specificity equal to 1. To rule out overfitting, we reran the RF model using different proportions of training and validation data sets, which gave the same results. The predictive ability of this RF model was high as determined by ROC analyses (Fig 2B). To determine the key metabolites that drive the separation between the HC and non-severe groups, we combined the prediction results of the RF and oPLS-DA models with univariate analyses. The oPLS-DA model predicted 18 metabolites with VIP score > 1 (S3 Table), the top of which were arginine, malonyl methylmalonyl succinylcarnitine, and tyrosine. The RF model determined key metabolites with high discriminatory ability based on their mean decrease in accuracy and Gini indices (values of both indices are listed for all metabolites in S4 Table). The top 15 metabolites with the highest mean decrease in accuracy in the RF model are visualized in Fig 2C. The top 3 metabolites by mean decrease in accuracy were malonyl methylmalonyl succinylcarnitine (carnitine), glycodeoxycholic acid (bile acid) and arginine (amino acid). The first two of these were also the top metabolites based on mean decrease in Gini index (S4 Table). The results of univariate analyses are shown as a volcano plot in Fig 2D and in S5 Table. Out of the 50 metabolites, 37 (74%) were significantly differentially expressed (DE) between the HC and non-severe groups (16 upregulated and 21 downregulated). These were almost evenly represented across metabolite categories (11 bile acids, 12 amino acids and 14 carnitines). Intersecting the results of the RF and oPLS-DA models with the univariate analyses generated a list of 15 metabolites that are important predictors for non-severe disease (Panel 1) (Table 3, S9 Fig). The metabolic pathways associated with these predictor metabolites are shown in Table 4. Aminoacyl-tRNA biosynthesis and valine, leucine and isoleucine biosynthesis were significantly enriched in this group. The phenylalanine, tyrosine and tryptophan biosynthesis pathway had the highest impact but was not significantly enriched (adj. P-value > 0.05).
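The intersection of the three selection criteria from the methods (significant fold change, oPLS-DA VIP score > 1, and top 15 by RF mean decrease in accuracy) can be sketched in Python as below. This is a hedged illustration, not the authors' R workflow. It interprets the |1.5| fold change cutoff as at least a 1.5-fold change in either direction, which is an assumption; the metabolite names and statistics are placeholders, and `top_n` is reduced to 2 so the toy example stays small.

```python
import math

FC_CUTOFF = 1.5    # fold change cutoff from the methods
P_CUTOFF = 0.05    # adjusted P-value cutoff
VIP_CUTOFF = 1.0   # oPLS-DA VIP score cutoff
TOP_N = 15         # top metabolites by RF mean decrease in accuracy


def is_differential(fold_change, adj_p):
    """True if adj. P < 0.05 and the change is >= 1.5-fold up or down (assumed reading)."""
    return adj_p < P_CUTOFF and abs(math.log2(fold_change)) >= math.log2(FC_CUTOFF)


def select_panel(stats, top_n=TOP_N):
    """Return metabolites that pass all three criteria, sorted by name."""
    ranked = sorted(stats, key=lambda name: stats[name]["mda"], reverse=True)
    top_rf = set(ranked[:top_n])
    return sorted(
        name
        for name, s in stats.items()
        if is_differential(s["fc"], s["adj_p"])
        and s["vip"] > VIP_CUTOFF
        and name in top_rf
    )


# Placeholder statistics: fold change, adjusted P, VIP score, mean decrease in accuracy
toy_stats = {
    "met_A": {"fc": 2.0, "adj_p": 0.001, "vip": 1.8, "mda": 30.0},
    "met_B": {"fc": 0.5, "adj_p": 0.010, "vip": 1.2, "mda": 25.0},
    "met_C": {"fc": 3.0, "adj_p": 0.001, "vip": 1.4, "mda": 10.0},  # not in RF top 2
    "met_D": {"fc": 1.2, "adj_p": 0.001, "vip": 1.6, "mda": 5.0},   # change too small
    "met_E": {"fc": 2.5, "adj_p": 0.200, "vip": 1.1, "mda": 8.0},   # not significant
}
panel = select_panel(toy_stats, top_n=2)
print(panel)
```

Requiring agreement between a univariate test and two multivariate models makes the panel less dependent on the quirks of any single method. On a volcano plot, the adj. P-value cutoff of 0.05 corresponds to a horizontal line at −log10(0.05) ≈ 1.3.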

Fig 2. Machine learning models and univariate analyses discriminating HC from non-severe COVID-19 patients.

A. Proximity plot of the RF model discriminating controls from non-severe COVID-19 patients. The ellipse shows confidence intervals. B. ROC analyses showing the prediction ability of the model. AUC: Area under the curve. C. Top 15 metabolites that are important predictors for non-severe COVID-19 patients as revealed by the RF model. The metabolites are color-grouped by class and ranked in descending order of their mean decrease in accuracy (the higher the mean decrease in accuracy, the more important the metabolite). D. Volcano plot showing the results of the univariate analyses. The figure plots the log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis). Red and blue dots refer to metabolites that are significantly up- and downregulated, respectively. Non-significant DE metabolites are shown as grey dots (-log10 adj. p value < 1.3). The metabolite classes are shape-coded. The dashed horizontal line marks the value of 1.3, the -log10 of a 0.05 FDR. The vertical dashed lines mark the cutoff corresponding to a log2 fold change value of |1.5|.

https://doi.org/10.1371/journal.pone.0302977.g002

Table 3. Final panel of key metabolites as determined by univariate analysis, random forest and oPLS-DA models.

https://doi.org/10.1371/journal.pone.0302977.t003

Table 4. Pathways enriched by the predictor metabolites in each pairwise comparison.

https://doi.org/10.1371/journal.pone.0302977.t004

3.3.2. Models to predict COVID-19 severity.

3.3.2.1 Models comparing non-severe and severe COVID-19 patients. To reveal metabolites associated with severity and progression of COVID-19 infection, we first compared non-severe and severe groups. As shown in S8b Fig and Table 2, PLS-DA model generated significant separation between both groups (accuracy: 0.96, R2 = 0.76, Q2 = 0.74 at 1st component). In addition, oPLS-DA obtained comparable results (R2Y = 0.76, Q2Y = 0.73, P-value < 0.0001). Random forest model revealed perfect separation with an accuracy = 1 (P-value = 4.405e-08), 100% sensitivity and 100% specificity (Fig 3A). Applying RF on different proportions of training and validation data sets obtained similar results. This model was found to have excellent classification ability using ROC analyses (AUC = 1) (Fig 3B).
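The accuracy, sensitivity and specificity quoted for the classifiers can be derived from a two-class confusion matrix. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `classification_metrics` and the labels are hypothetical and serve only as an illustration.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical labels for illustration only (not the study data).
y_true = ["severe", "severe", "severe", "non-severe", "non-severe", "non-severe"]
y_pred = ["severe", "severe", "non-severe", "non-severe", "non-severe", "non-severe"]

metrics = classification_metrics(y_true, y_pred, positive="severe")
print(metrics)
```

A model with perfect separation, as reported for this comparison, would yield 1.0 for all three values.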

Fig 3. Machine learning models and univariate analyses discriminating non-severe and severe COVID-19 patients.

A. Proximity plot of the RF model discriminating non-severe from severe COVID-19 patients. The ellipse shows confidence intervals. B. ROC analyses showing the prediction ability of the model. AUC: Area under the curve. C. Top 15 metabolites that are important predictors for severe COVID-19 patients as revealed by the RF model. The metabolites are color-grouped by their class and are ranked in descending order of their mean decrease in accuracy (the higher the mean decrease in accuracy, the more important the metabolite). D. Volcano plot showing the results of the univariate analyses. The figure depicts the log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis). Red and blue dots refer to metabolites that are significantly up and down regulated, respectively. Non-significant DE metabolites are shown as grey dots (-log10 adj. p value <1.3). The metabolite classes are shape-coded. The dashed horizontal line refers to the value of 1.3, the -log10 for a 0.05 FDR. The vertical dashed lines refer to the cutoff that equates a log2fold change value of |1.5|.

https://doi.org/10.1371/journal.pone.0302977.g003

To obtain key metabolites with high prognostic value for severe disease, the oPLS-DA model first revealed 20 prognostic metabolites with VIP score > 1, among which glycodeoxycholic acid, taurodeoxycholic acid and tyrosine scored highest (the full list is in S3 Table). Applying RF models produced a list of key metabolites with high predictive value for severe disease (the full list is in S4 Table). The top 15 metabolites identified by the RF model, ranked by mean decrease in accuracy, are shown in Fig 3C. The highest three were bile acids (namely lithocholic acid, taurolithocholic acid and taurodeoxycholic acid). Lithocholic acid and taurodeoxycholic acid were also the top ones as determined by mean decrease in GINI index. The univariate analyses performed on the same pairwise comparison identified 39 metabolites (78%) as significantly DE between non-severe and severe patients (the full list is in S5 Table and is shown in Fig 3D). These included 27 up regulated and 12 down regulated metabolites. These significantly DE metabolites were represented by 11 bile acids, 13 amino acids and 15 carnitines. Combining the lists of metabolites predicted by the RF and oPLS-DA models and the univariate analyses yielded a panel of 14 metabolites that are important predictors for severe disease (Panel 2) (Table 3 and S9 Fig). As shown in Table 4, the pathway enrichment analyses on those 14 metabolites revealed primary bile acid biosynthesis, tyrosine metabolism and arginine biosynthesis as significantly enriched pathways.
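The panel construction described above amounts to keeping only the metabolites flagged by all three approaches. A minimal sketch of that intersection is given here, assuming each approach yields a plain list of metabolite names; the function `consensus_panel` and the placeholder names are hypothetical, and the real lists are those in S3 to S5 Tables.

```python
def consensus_panel(rf_top, oplsda_vip, univariate_de):
    """Return the metabolites selected by all three approaches, sorted by name."""
    return sorted(set(rf_top) & set(oplsda_vip) & set(univariate_de))


# Placeholder names for illustration only (not the study results).
rf_top = ["metabolite_A", "metabolite_B", "metabolite_C", "metabolite_D"]
oplsda_vip = ["metabolite_B", "metabolite_D", "metabolite_E"]  # VIP score > 1
univariate_de = ["metabolite_A", "metabolite_B", "metabolite_D"]  # FDR < 0.05

panel = consensus_panel(rf_top, oplsda_vip, univariate_de)
print(panel)
```

Requiring agreement between all three methods is deliberately conservative: a metabolite ranked highly by only one model does not enter the panel.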

3.3.2.2 Models comparing severe and critical COVID-19 patients. To demonstrate the potential of metabolites as prognostic markers for critical COVID-19 patients, severe and critical groups were compared in the models. As shown in S10A Fig and Table 2, the PLS-DA model revealed considerable overlap between those two groups with low accuracy (accuracy = 0.63, R2 = 0.47, Q2 = 0.10 at the 1st component). Although oPLS-DA outperformed the PLS-DA model, it still had low classification and prediction power (R2Y = 0.47, Q2Y = 0.15). Applying RF on the same data obtained better and significant performance (P-value = 7.95e-06), yet with relatively low accuracy (0.92), sensitivity (0.93) and specificity (0.92) (Fig 4A). To test whether this reflected overfitting, we reran the RF model on different proportions of training and validation data sets, which revealed similar results. ROC analyses on this model indicated a good classifier with an AUC of 0.88 (Fig 4B). The RF model was able to correctly predict 95.5 and 92.8% of the subjects when applied on train and test data, respectively. Accordingly, 4 subjects that belonged to the severe group were predicted to be critical patients by the RF model, and 1 subject that was a critical patient was predicted as severe. Using prediction from oPLS-DA, we obtained 15 significant prognostic metabolites with VIP score > 1 (the full list is in S3 Table), the top of which were tauroursodeoxycholic acid, malonyl methylmalonyl succinylcarnitine and octenoylcarnitine. Furthermore, the RF model predicted a list of key metabolites with high discriminatory ability according to their mean decrease in accuracy and GINI index (the full list is shown in S4 Table). The top 15 metabolites detected by RF and ranked by mean decrease in accuracy are shown in Fig 4C. The highest 3 metabolites by both mean decrease in accuracy and GINI index were malonyl methylmalonyl succinylcarnitine (carnitine), tauroursodeoxycholic acid (bile acid) and glutarylcarnitine (carnitine) (S4 Table).
For further exploration, the univariate analyses performed on the same pairwise comparison identified only 3 metabolites (6%) as significantly DE between severe and critical subjects, 2 of which were up regulated (bile acids) and 1 of which was down regulated (a carnitine) (the full list is in S5 Table and Fig 4D). Combining the lists of significant metabolites predicted by the oPLS-DA and RF models and the univariate analyses yielded a panel of 2 metabolites that are important predictors for critical disease (Panel 3) (Table 3 and S9 Fig).
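The significance calls in the volcano plots rest on two thresholds: an FDR of 0.05 (whose -log10 is about 1.3) and a fold-change cutoff. The sketch below shows how such a call could be made for one metabolite. It is an illustration under assumptions: the function `classify_de` is hypothetical, and because the text does not state whether the fold-change cutoff was required for a metabolite to count as significantly DE, it is exposed as an optional parameter that defaults to no cutoff.

```python
import math


def classify_de(log2_fc, fdr, fdr_cutoff=0.05, log2_fc_cutoff=0.0):
    """Label one metabolite as 'up', 'down' or 'not significant'.

    log2_fc_cutoff is an assumed, optional magnitude threshold on log2FC.
    """
    if fdr >= fdr_cutoff:
        return "not significant"
    if log2_fc > log2_fc_cutoff:
        return "up"
    if log2_fc < -log2_fc_cutoff:
        return "down"
    return "not significant"


# The horizontal dashed line in the volcano plots: -log10 of a 0.05 FDR.
fdr_line = round(-math.log10(0.05), 1)
print(fdr_line)

# Hypothetical values for illustration only (not the study data).
print(classify_de(log2_fc=2.0, fdr=0.01))
print(classify_de(log2_fc=-2.0, fdr=0.01, log2_fc_cutoff=1.5))
print(classify_de(log2_fc=1.0, fdr=0.01, log2_fc_cutoff=1.5))
```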

Fig 4. Machine learning models and univariate analyses discriminating severe and critical COVID-19 patients.

A. Proximity plot of the RF model discriminating severe from critical COVID-19 patients. The ellipse shows confidence intervals. B. ROC analyses showing the prediction ability of the model. AUC: Area under the curve. C. Top 15 metabolites that are important predictors for critical COVID-19 patients as revealed by the RF model. The metabolites are color-grouped by their class and are ranked in descending order of their mean decrease in accuracy (the higher the mean decrease in accuracy, the more important the metabolite). D. Volcano plot showing the results of the univariate analyses. The figure depicts the log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis). Red and blue dots refer to metabolites that are significantly up and down regulated, respectively. Non-significant DE metabolites are shown as grey dots (-log10 adj. p value <1.3). The metabolite classes are shape-coded. The dashed horizontal line refers to the value of 1.3, the -log10 for a 0.05 FDR. The vertical dashed lines refer to the cutoff that equates a log2fold change value of |1.5|.

https://doi.org/10.1371/journal.pone.0302977.g004

Given the reduced classification ability and accuracy of the best-performing model (i.e. the RF model) in discriminating severe and critical cases, we sought to investigate whether modeling only the top 15 metabolites revealed by the RF model (those shown in Fig 4C) would enhance model classification and accuracy. Training and validating the RF on this subset showed no additional improvement, as evidenced by model accuracy (0.92) and ROC-based predictability (AUC = 0.88) (S11 Fig). The pathway enrichment analyses performed on the metabolites of panel 3, as well as on the top 15 metabolites identified by RF, did not reveal any significant pathway, although in the latter case aminoacyl-tRNA biosynthesis was the top enriched pathway.

3.3.3 Models to predict COVID-19 infection outcomes (distinguishing survivors from dead subjects).

Given the information on the disease outcome, it was possible to compare patients who survived the infection with those who died by the end of the disease course. As shown in S12 Fig, the PLS-DA model revealed considerable overlap between survived and dead patients (accuracy: 0.74, R2: 0.39, Q2: 0.21 at the 1st component). Training the data using the oPLS-DA model obtained somewhat better separation between the two groups, but its performance was low (R2Y = 0.39, Q2Y = 0.27). Running the RF model on the same data performed the best, with a significant accuracy of 0.96, 100% sensitivity and 94% specificity (Fig 5A). The predictive ability of this model as a classifier was high as determined by ROC analyses (AUC = 0.9) (Fig 5B).
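The AUC values reported from the ROC analyses can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function `roc_auc` and the scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    correct = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for illustration only (not the study data).
scores_dead = [0.9, 0.8, 0.4]
scores_survived = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(scores_dead, scores_survived)
print(round(auc, 3))
```

In this toy example one pair out of twelve is ranked incorrectly, which is why the AUC falls just short of 1.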

Fig 5. Random forest classification model predicting the classification of different COVID-19 outcomes.

A. Proximity plot of the RF model discriminating survived from dead COVID-19 patients. The ellipse shows confidence intervals and each dot refers to one patient. B. ROC analyses showing the prediction ability of the RF model. AUC: Area under the curve.

https://doi.org/10.1371/journal.pone.0302977.g005

Applying the oPLS-DA model on survived and dead subjects uncovered 20 metabolites with VIP score > 1 (the full list is in S3 Table), the top of which were deoxycholic acid, prop. /Acety. ratio and hexanoylcarnitine. In addition, RF models obtained key metabolites with high discriminatory ability as determined by their mean decrease in accuracy (the full list is shown in S4 Table). The top 15 metabolites detected by this model, ranked by mean decrease in accuracy, are visualized in S13 Fig. The metabolites with the highest mean decrease in accuracy were deoxycholic acid, glutarylcarnitine and hexanoylcarnitine. The univariate analyses identified 7 metabolites (14%) as significantly DE between the survivor and the dead groups (the full list is in S5 Table and S13B Fig), of which 3 were up regulated (2 bile acids and one carnitine) and 4 carnitines were down regulated. Combining the predictions from the oPLS-DA and RF models and the univariate analyses produced only two metabolites that are important predictors for disease outcome (Panel 4) (Table 3).

3.4 Metabolic profiling in critical COVID-19 patients with and without comorbidities

Since metabolic disorders are known to be risk factors for progression of COVID-19 infection, we investigated the enrichment of diabetes and hypertension in our cohort, focusing on the critical patient group. Interestingly, the PCA plot for this patient group suggests that the profile of the studied metabolites did not discriminate patients with critical COVID-19 infection only from those with critical COVID-19 infection plus comorbidities. The RF model run on the same data indicated a slight overlap between these two categories (accuracy = 0.93, P-value = 0.005). ROC analyses revealed that having hypertension alone or diabetes plus hypertension were significant predictors for critical COVID-19 infection (P-value < 0.05), with moderate AUCs of 0.7 and 0.6, respectively (S14 Fig).

Given the significant proportion of critical COVID-19 patients with comorbidities (S3A Fig), it could be that the 2 predicted significant metabolites (panel 3 in Table 3) are not prognostic markers for critical COVID-19 infection only, but are rather markers for this infection superimposed with comorbidities. To investigate this further, the levels of these 2 metabolites were compared across all categories within the critical COVID-19 patients (Fig 6). The level of taurochenodeoxycholic acid did not differ significantly across groups, whereas malonyl methylmalonyl succinylcarnitine showed a significant increase in patients with critical COVID-19 plus hypertension over those with critical COVID-19 alone.

Fig 6. Differences in levels of key metabolites between patients with critical COVID-19 infection only and those with critical COVID-19 infection superimposed with other comorbidities.

DM: Diabetes, HT: Hypertension.

https://doi.org/10.1371/journal.pone.0302977.g006

3.5 Diagnostic and prognostic power of combined blood indices and metabolites models

We aimed to investigate whether combining blood and metabolite measurements as inputs to one model (combined model) would enhance the classification and predictability of COVID-19 infection compared with using metabolites alone (single model). The results of the different comparisons are detailed in Table 2 and in S7 and S10 Figs. Considering all subject groups, PCA showed no enhancement in among-group separation in the combined model over the single one. PLS-DA and RF models produced similar accuracies and predictability in the single and combined models (S7 Fig). The oPLS-DA model (results are denoted as ‘NA’ in Table 2) was not applicable because it is limited to pairwise comparisons. Interestingly, both PLS-DA and oPLS-DA models showed reduced accuracy, predictability and classification ability in the combined model compared with the single model when contrasting HC vs non-severe and non-severe vs severe groups, whereas the performance of RF remained the same (Table 2). When comparing severe and critical COVID-19 cases, PLS-DA and oPLS-DA demonstrated an enhancement in performance in the combined model compared with the single model (Table 2 and S10B Fig). In particular, the accuracy, interpretability (R2) and predictability (Q2) of the PLS-DA model increased by about 30.1, 14.8 and 200%, respectively, in the combined model. Along the same line, oPLS-DA showed increases of 14.8 and 60% in the overall variance explained by all features between severe and critical groups (R2Y) and in the goodness of prediction (Q2Y), respectively. The RF model, however, showed no enhancement in accuracy in the combined model over the single one (0.92 in both), yet its AUC increased by 10.2%, from 0.88 in the single model to 0.97 in the combined model.
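The percentage gains quoted above are relative changes of the combined model over the single model. As a worked check, the sketch below reproduces the 10.2% figure from the two RF AUC values given in the text; the helper name `percent_increase` is hypothetical.

```python
def percent_increase(single, combined):
    """Relative change of the combined model over the single model, in percent."""
    return (combined - single) / single * 100.0


# RF AUC values as reported in the text: 0.88 (single) and 0.97 (combined).
auc_gain = percent_increase(0.88, 0.97)
print(round(auc_gain, 1))  # 10.2
```

Because the change is expressed relative to the single-model value, a small baseline such as Q2 = 0.10 can produce a large percentage (200%) from a modest absolute gain.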

To identify more accurately the combination of biomarkers that would result in better classification and prediction of critical cases, we built several linear support vector machine models, each with a different combination of features drawn from the 3 important metabolites (panel 3) and the measured blood indices (n = 17), and compared their performance and predictive power using confusion matrices and multivariate ROC analyses. These analyses indicated that increasing the number of combined features resulted in a gradual, yet slight, enhancement in the predictability and accuracy of the models. The best model was obtained when all 20 features were included, giving the highest accuracy of 91% (Fig 7A) and the highest predictability with AUC = 0.9 (Fig 7B).

Fig 7. Support vector machine models built using important metabolites in panel 3 (those that best discriminate severe from critical covid-19 cases) together with normalized counts of 17 blood indices.

A. Various predictive accuracies as determined by support vector machine models using feature combination (from 2–20). B. ROC curves showing the predictive ability of different support vector machine models of feature combination. Var. refers to different combination. AUC: Area under the curve. CI: Confidence intervals.

https://doi.org/10.1371/journal.pone.0302977.g007

3.6 Correlation of blood parameters with key metabolites in different COVID-19 patients

Here we tried to determine whether the correlation between blood indices and key metabolites remains the same across patient groups. Significant correlations between pairs of metabolites and blood parameters were selected using a stringent significance criterion (P-value < 0.001) and were limited to the top 10 positive and top 10 negative correlations. The analyses showed that none of the significant correlation partners that appeared in the HC group was present in the other groups (S6 Table). Likewise, each of the COVID-19 stages showed unique pairs of correlations. In particular, 80, 85 and 75% of correlation pairs appeared uniquely in the non-severe, severe and critical groups, respectively.
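The selection rule described above (P-value < 0.001, then the 10 strongest positive and 10 strongest negative correlations per group) and the share of pairs unique to a group can be sketched as follows. This is an illustration under assumptions: the functions `select_top_correlations` and `unique_percent`, the record layout and all values are hypothetical, and the correlation coefficients are taken as already computed.

```python
def select_top_correlations(records, p_cutoff=0.001, top_n=10):
    """Keep significant pairs, then the top_n most positive and most negative."""
    significant = [r for r in records if r["p_value"] < p_cutoff]
    positive = sorted(
        (r for r in significant if r["r"] > 0), key=lambda r: r["r"], reverse=True
    )[:top_n]
    negative = sorted(
        (r for r in significant if r["r"] < 0), key=lambda r: r["r"]
    )[:top_n]
    return positive, negative


def unique_percent(group_pairs, other_groups):
    """Percentage of a group's pairs that appear in none of the other groups."""
    seen_elsewhere = set().union(*other_groups) if other_groups else set()
    unique = [pair for pair in group_pairs if pair not in seen_elsewhere]
    return 100.0 * len(unique) / len(group_pairs)


# Hypothetical correlation records for illustration only (not the study data).
records = [
    {"pair": ("metabolite_A", "index_1"), "r": 0.62, "p_value": 0.0002},
    {"pair": ("metabolite_B", "index_2"), "r": -0.55, "p_value": 0.0004},
    {"pair": ("metabolite_C", "index_3"), "r": 0.70, "p_value": 0.02},
    {"pair": ("metabolite_D", "index_1"), "r": 0.81, "p_value": 0.00001},
]
positive, negative = select_top_correlations(records)
print([r["pair"] for r in positive], [r["pair"] for r in negative])

group = {"p1", "p2", "p3", "p4"}
print(unique_percent(group, [{"p1"}, {"p9"}]))
```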

4 Discussion

In this study, we aimed to investigate whether the reprogramming in metabolites that occurs during COVID-19 infection could help distinguish patients with varying degrees of disease severity. This would allow the identification of key metabolites that are important diagnostic or prognostic biomarkers. Determining stage-specific changes in metabolites could enable informed decisions on hospital discharge or help in modulating disease treatment if the infection progresses. In addition, understanding the alteration patterns in circulating metabolites during COVID-19 infection could possibly help in devising novel anti-SARS-CoV-2 therapeutics, as shown previously [34,35].

Demographic data of the study participants

As expected, older patients were more enriched in the groups with more severe forms of the disease (i.e. those in the severe and critical stages), with significant differences in age between severe and critical cases. Age is already known to be a good predictor of COVID-19 severity [36,37]. Despite the reports of sex-induced changes in lipids, amino acids and other metabolites, there have been discrepancies over the importance of sex and age as determinants of disease occurrence and fatality [38], and non-significant gender-related differences have also been reported [18]. This suggests the importance of seeking other predictors of COVID-19 occurrence and severity.

How different is our design and analyses scheme from other research?

Machine learning models have been successfully applied to metabolomics data for predicting COVID-19 disease occurrence [11], severity and evolution [14,15,18]. However, some of these studies employed a single ML model, possibly because of the inclusion of a large number of patients [10,18]. In the current study, we combined the results from two common and robust ML models, the oPLS-DA and RF models, with the results of univariate analyses to construct a more trustworthy and accurate multivariate prediction and classification scheme. Although the oPLS-DA model applied herein gave a good performance, our data indicate that the more complex, non-linear RF model outperformed oPLS-DA (Table 2) in all comparisons. Indeed, the RF model seems to be more suitable because its complex non-linear algorithm fits the non-linear nature of most biological data [39], as stated previously [11]. The RF model is known to be tolerant of outliers and robust to over-fitting, as shown on simulated and real data [40]. Using these models, our results suggest that HC were well discriminated from all other COVID-19 patients based on all metabolites or on metabolite subclasses, indicating that the onset of COVID-19 infection is associated with a strong metabolic footprint. With the intention of determining the stage-specific alterations in metabolites, we opted to run pairwise comparisons between consecutive disease stages.

Value of metabolites as biomarkers of early COVID-19 infection phase

In the current study, all applied ML models, and in particular the RF, showed clear discrimination between HC and non-severe COVID-19 patients, even when the RF model was run on each metabolite subclass. This suggests that changes in the metabolome could signal the early phase of COVID-19 infection before severe disease develops. Previous data showed that changes in metabolites are able to distinguish COVID-19 patients from healthy subjects [15,41]. López-Hernández et al. showed that PCR+ non-hospitalized patients are well separated from matched controls with a high accuracy of 0.88 (R2: 0.8, Q2: 0.5) [37]. Similarly, Meoni et al. used an RF model to show high discrimination between HC and COVID-19 patients using metabolites and lipoprotein parameters, with high accuracy (0.87 to 0.91) [11]. Metabolites have shown dramatic changes at early COVID-19 infection even in the absence of clinical signs or changes in blood indices [42]. We observed that 74% of the metabolites were significantly DE in the univariate analyses, with high fold changes, especially malonyl methylmalonyl succinylcarnitine (4.9-fold change over HC), suggesting that the underlying reprogramming in metabolites was intense in the non-severe patients relative to HC. Our analyses showed that the differences between HC and non-severe patients originate from changes in a pool of 15 metabolites (panel 1) that are potential markers for disease initiation. Some of the metabolites from this panel were reported previously by others. For instance, methionine has been identified as a robust marker for COVID-19 occurrence in two subsequent COVID-19 waves [43]. Leucine and isoleucine were among the metabolites identified in positive COVID-19 patients, although leucine was down regulated in our study [44]. Our data highlight the differential regulation (mainly down-regulation) of bile acids upon COVID-19 infection in non-severe patients (except for chenodeoxycholic acid).
In alignment, bile acids have been shown previously to be perturbed in COVID-19 infection, and they were down regulated [45]. However, up regulation of bile acids was also observed in other studies [46,47]. The role of bile acids in COVID-19 pathogenesis has been puzzling. Generally, bile acids can limit in-vitro replication of some viruses (e.g. herpes simplex [48] and influenza A virus [49]) and promote in-vitro replication of other viruses such as hepatitis B and C [50]. Administration of antibiotics to SARS-CoV-2-infected mice resulted in a reduction in certain microbiome members that metabolize primary bile acids to secondary bile acids. This leads to accumulation of primary bile acids, which subsequently were found to inhibit the nsp15 endoribonuclease of the virus [51]. Furthermore, infection with SARS-CoV-2 itself causes a reduction in the diversity of the gut microbiome, as shown in human patients [52] and primates [53], and microbiota in the human gut are known to process primary bile acids into secondary bile acids such as deoxycholic acid and ursodeoxycholic/chenodeoxycholic acid [54]. Bile acids are also biologically active molecules that organize a variety of immune functions, including inflammatory responses. Ursodeoxycholic acid does, in fact, have anti-inflammatory, antioxidant, anti-apoptotic as well as immunomodulatory properties [55]. Taken together, it is plausible to assume that the down regulation of bile acids in our study could be virus-induced to facilitate infection.

Except for malonylcarnitine (C3-DC), which showed strong up regulation, carnitines in the non-severe group were down regulated. They were slightly up regulated in subsequent stages. Our data do not allow us to explain this variable DE pattern, so any interpretation remains speculative. Generally, L-carnitines tend to reduce inflammation and oxidative stress [56] and stimulate immunity by improving neutrophil and macrophage function [57]. Recently, it has been reported that an increase in the amount of carnitine is associated with lower vulnerability to COVID-19 [58]. Therefore, we could assume that the host triggers expression of some carnitines (e.g. malonylcarnitine) at the early disease phase as a defensive mechanism, while the virus tries to down-regulate other carnitine species as the disease progresses. The down regulation of arginine in the non-severe patients in our study complements previous observations in both adults and children infected with COVID-19, who had substantially lower levels of plasma l-Arginine than controls, associated with a low l-Arginine-to-ornithine ratio [59].

Value of metabolites in predicting COVID-19 severity

One core intention behind the current work was to determine how changes in metabolites could inform or explain the array of severity degrees seen in patients, and subsequently reveal prognostic biomarkers [16,41]. An obvious observation was a gradual decrease in the ability of ML models to classify or predict patients as the severity increases (Table 2). In parallel, the number of significantly DE metabolites declined from 39 metabolites (78%), when contrasting non-severe and severe patients, to 3 metabolites (6%) when comparing severe vs critical patients. The magnitude of fold changes also decreased for most metabolites. This was also observable when analyzing metabolite subclasses. Regarding the comparison between non-severe and severe groups, our results partially agree with those reported in another study [15], where partial separation existed between symptomatic mild and more severe forms of COVID-19. Similarly, a PLS-DA model showed a certain degree of overlap between mild (non-hospitalized) and both severe (hospitalized) plus critical (intubated) patients [37]. Of note, direct comparison among studies could be biased, especially if done on different populations, because of the influence of an individual's genetic background on the metabolome [60], in addition to other confounders (e.g. environment and lifestyle). The studies also differ largely in the criteria used to define the severity scale of patients. Interestingly, we observed considerable overlap between severe and critical patients based on metabolite changes, and only a small fraction of metabolites were significantly DE (n = 3). Here, the predictive ability of the RF model was the lowest (AUC = 0.88) as compared to other comparisons. Although our study did not follow up the same patients, these results suggest that changes in metabolites at the peak of COVID-19 severity might be minimal and not reflective of the patient's stage of severity. Similar results were obtained by Gu et al. in China, who found low separation between severe and critical groups [15]. Our blood analyses reflected the same notion. Indeed, multiple studies have reported increased levels of inflammatory or coagulation markers such as IL-6, C-reactive protein (CRP), procalcitonin, ferritin and D-dimers in more severe forms of COVID-19 compared to less severe ones [10,61–63]. Our data, however, showed that these indices increased significantly between HC, non-severe and severe patients, but not when comparing severe vs critical patients (S2 Table). Therefore, the inflammatory markers that were reported to be indicators for COVID-19 severity, as well as the studied metabolites, exhibited minimal differences between patients in the severe and critical groups. The profile of clinical symptoms was also matched between these two groups (S4F Fig). Taken together, this suggests that the difference in the magnitude of changes in metabolites and inflammatory blood indices between severe and critical cases is minimal, pointing out that other biomarkers might be worth studying at that peak of disease severity, where additional hospitalization care (e.g. intubation) might be needed [37].

The analyses of the pathways suggest unique operating mechanisms in critical phase. For instance, aminoacyl-tRNA biosynthesis was significantly enriched in critical patients. Mining of transcriptomic and proteomic database of aminoacyl tRNA synthetases (aaRSa), essential enzymes in protein translation, revealed an overexpression of many aaRSa in response to infection with three SARS-CoV-2 viruses and that there is a physical interaction between virus M protein and members of these enzymes [64]. In addition, arginine biosynthesis pathway was highly enriched in this group. Generally, l-Arginine levels have been shown to affect T cell function [65,66]. These results are supported by previous studies, which found a reduced proliferation of lymphocytes in critically ill septic patients, which has been linked to a decrease in l-Arginine availability [65].

Combination of metabolites and blood parameters may enhance the stratification of critical COVID-19 patients

Given the low performance of the metabolite model in predicting critical patients and the reported correlation between changes in metabolites and blood parameters [10,11], we sought to test whether adding blood measurements to the metabolites (combined model) would enhance the identification of critical COVID-19 patients. A similar approach has been applied by our group in the context of COVID-19 diagnosis [67]. Indeed, the addition of blood parameters to metabolites (combined model) enhanced the predictive and classification power of the metabolites model (single model) for stratifying critical patients. It was also found that the addition of more blood parameters could enhance the performance of the model and, by extension, provide a better predictor panel. Similar results were obtained previously by Sindelar et al. [10], who found increased prediction ability of a metabolite plus blood model (AUC = 0.7) compared to a metabolite model (AUC = 0.6). This suggests the additional clinical value of measuring blood parameters during progression of COVID-19 infection.

We acknowledge that our study has some limitations. The cross-sectional nature of this study does not allow following up the same patients as they progress through different disease stages, which would have given a more precise picture of metabolic changes over the disease course. Indeed, doing so is challenging during a pandemic. Due to financial limitations, we were not able to study other blood and urine metabolites such as sphingolipids and organic acids. It is worth noting that some of our critically ill patients were either treated at home or were admitted from other clinics, making it possible that prior medications given to those patients introduced some bias in the measured metabolites. While our study exemplifies how a large sample size allows convenient ML model construction in targeted metabolite-based investigations, a non-targeted screening of these molecules is highly warranted since it offers a complete picture of metabolite reprogramming and enables the discovery of novel molecules. It is also recommended to include additional patient metadata, especially the underlying metabolic disorders, when studying metabolic alterations associated with COVID-19 infection.

5 Conclusion

In conclusion, the underlying changes in metabolites were more characteristic, and thus could be important predictors for patients in non-severe and severe stages, but not for those suffering critical disease. Concurrent measurements of blood parameters and key metabolites could enhance the prediction ability of metabolites in those critically ill patients. Our analyses scheme suggests panels of key metabolites that could be used as diagnostic and prognostic markers for COVID-19 infection. Subsequent wide scale validation studies could further consolidate these results and open the door for using them in clinical settings.

Supporting information

S1 Fig. Box plot with whiskers showing the mean of normalized counts of the analyzed features in each subject in the various groups.

Each blue dot refers to the mean normalized count of the respective metabolite subclass in one subject. The horizontal red line links the medians of the mean normalized counts across all groups and indicates the trend of change across the subject groups. The y-axes show the mean of normalized counts of metabolites in each subject.

https://doi.org/10.1371/journal.pone.0302977.s001

(TIF)

S2 Fig. Demographics, comorbidities and clinical symptoms of healthy controls and COVID-19 patients grouped by infection severity.

A. Age (years) of the study participants. Each dot refers to one patient. Significant differences among groups were calculated using one-way ANOVA with a post hoc test at a cutoff P-value of 0.05. B. Distribution of sex in all participants. M: Male, F: Female. C-E. Proportions of COVID-19 patients with the respective comorbidity (diabetes, hypertension or both). HTN: Hypertension, DM: Diabetes. F. Proportions of COVID-19 patients showing different symptoms.

https://doi.org/10.1371/journal.pone.0302977.s002

(TIF)

S3 Fig.

Frequency of participants with and without comorbidities in the patient groups (A) and in those with a certain outcome (B). The figure shows the proportion of each class as a part of the total number of patients within the respective COVID-19 severity group.

https://doi.org/10.1371/journal.pone.0302977.s003

(TIF)

S4 Fig.

A & B. Values of blood parameters in controls and COVID-19 patients grouped by their disease severity. Each dot refers to one participant. Details of numerical values and statistical differences among groups for the laboratory parameters are shown in S2 Table.

https://doi.org/10.1371/journal.pone.0302977.s004

(TIF)

S5 Fig.

PCA plot of the study groups (shown as color-coded circles) based on the normalized counts of metabolites in all subjects (n = 295) (A) and on the concentrations of metabolites plus blood parameters (B). Distances between points are Euclidean distances.

https://doi.org/10.1371/journal.pone.0302977.s005

(TIF)

S6 Fig. PCA plot of the study groups (shown as color-coded circles) based on the concentration of each metabolite subclass (i.e. amino acids, bile acids, carnitines) and blood indices (n = 17).

Distances between points are Euclidean distances.

https://doi.org/10.1371/journal.pone.0302977.s006

(TIF)

S7 Fig.

Score scatter plot of the PLS-DA model showing the classification of patients in all groups based on the normalized concentrations of metabolites only (n = 50) (A) and of both metabolites and blood parameters (B). Parameters for model evaluation are shown as accuracy, variation between classes (R2Y) and predictive ability (Q2Y).

https://doi.org/10.1371/journal.pone.0302977.s007

(TIF)

S8 Fig.

Score scatter plot of PLS-DA and oPLS-DA models comparing HC vs non-severe subjects (A) and non-severe vs severe patients (B). Each dot refers to one patient.

https://doi.org/10.1371/journal.pone.0302977.s008

(TIF)

S9 Fig. Venn diagrams showing the most important metabolites (features) as revealed by the overlap of two machine-learning models (oPLS-DA and random forest) and the univariate analyses.

The central intersection of the three approaches corresponds to panel 1, panel 2 and panel 3, details of which are shown in Table 3.

https://doi.org/10.1371/journal.pone.0302977.s009

(TIF)

S10 Fig.

Score scatter plot of PLS-DA and oPLS-DA models discriminating severe and critical COVID-19 cases based on normalized concentrations of metabolites (A) and both metabolites and blood indices (B). Each dot refers to one patient.

https://doi.org/10.1371/journal.pone.0302977.s010

(TIF)

S11 Fig. Performance of the random forest model in classifying severe and critical COVID-19 patients using the top 15 metabolites previously identified by the model.

A. Score scatter plot showing the classification of severe and critical patients. B. ROC analyses showing the predictive ability of the model as a classifier. AUC: Area under the curve.

https://doi.org/10.1371/journal.pone.0302977.s011

(TIF)

S12 Fig.

Score scatter plots showing the PLS-DA (A) and oPLS-DA (B) models using the normalized concentrations of metabolites to classify patient outcomes (survived or died). Each dot refers to one patient.

https://doi.org/10.1371/journal.pone.0302977.s012

(TIF)

S13 Fig.

A. Top 15 metabolites that are important predictors of patient outcome as revealed by the RF model. The metabolites are color-grouped by class and ranked in descending order of their mean decrease in accuracy (the higher the mean decrease in accuracy, the more important the metabolite). B. Volcano plot showing the results of the univariate analyses. The figure plots the log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis). The pattern of differential expression of each metabolite is color-coded and its class is shape-coded. The dashed horizontal line refers to 1.3, the -log10 of a 0.05 FDR. The vertical dashed lines refer to the cutoff that equates to a fold change value of |1.5|.

https://doi.org/10.1371/journal.pone.0302977.s013

(TIF)
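The cutoffs quoted in the S13 Fig caption can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the handling of values that fall exactly on a cutoff is an assumption, since the caption does not specify it.

```python
import math

# Cutoffs as stated in the caption: FDR of 0.05 and fold change of 1.5.
FDR_CUTOFF = 0.05
FC_CUTOFF = 1.5


def volcano_thresholds(fdr_cutoff=FDR_CUTOFF, fc_cutoff=FC_CUTOFF):
    """Return (y_line, x_line): the dashed-line positions on the plot axes.

    y_line is -log10 of the FDR cutoff; x_line is log2 of the fold-change
    cutoff, so the two vertical lines sit at -x_line and +x_line.
    """
    return -math.log10(fdr_cutoff), math.log2(fc_cutoff)


def classify(log2_fc, fdr, fdr_cutoff=FDR_CUTOFF, fc_cutoff=FC_CUTOFF):
    """Label one metabolite as 'up', 'down' or 'unchanged' (hypothetical labels).

    Assumption: a metabolite must pass both cutoffs strictly on the FDR side
    (fdr < cutoff) and inclusively on the fold-change side.
    """
    if fdr >= fdr_cutoff or abs(log2_fc) < math.log2(fc_cutoff):
        return "unchanged"
    return "up" if log2_fc > 0 else "down"


if __name__ == "__main__":
    y_line, x_line = volcano_thresholds()
    print(f"horizontal line at {y_line:.2f}, vertical lines at +/-{x_line:.3f}")
```

Under these assumptions the horizontal line falls at about 1.30, matching the caption, and the vertical lines at about ±0.585 on the log2FC axis.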

S14 Fig. ROC analyses showing the classification ability of comorbidities in discriminating different severity groups of COVID-19 patients.

AUC: Area under the curve.

https://doi.org/10.1371/journal.pone.0302977.s014

(TIF)

S1 Table. Normalized concentrations of the metabolites analyzed in the study, measured in the different patient groups.

https://doi.org/10.1371/journal.pone.0302977.s015

(XLSX)

S2 Table. A. Laboratory parameters of HC and COVID-19 patients of various severity at admission. P-values were calculated using the non-parametric Kruskal-Wallis test.

B. Significance levels of pairwise comparisons of laboratory parameters between the various groups. Red text indicates significant comparisons.

https://doi.org/10.1371/journal.pone.0302977.s016

(XLSX)

S3 Table. Top predictor metabolites discriminating various stages of COVID-19 infection as revealed by oPLS-DA model.

VIP: Variable importance in projection.

https://doi.org/10.1371/journal.pone.0302977.s017

(XLSX)

S4 Table. Ranks of different metabolites in pairwise comparisons as revealed by random forest model.

https://doi.org/10.1371/journal.pone.0302977.s018

(XLSX)

S5 Table. Univariate analyses of all metabolites comparing different groups of participants.

https://doi.org/10.1371/journal.pone.0302977.s019

(XLSX)

S6 Table. Top 20 significant correlations between pairs of metabolites and blood parameters (significance was determined by P-value < 0.001) in different patient groups.

R is the correlation coefficient.

https://doi.org/10.1371/journal.pone.0302977.s020

(XLSX)

Acknowledgments

The authors would like to thank members of staff at the Clinical Biochemistry and Molecular Diagnostics Department, National Liver Institute, for their support.

References

  1. 1. Medicine, J.H.U. Coronavirus Resource Center: COVID-19 Map. 2022 [cited 2022; Available from: https://coronavirus.jhu.edu/map.html
  2. 2. (WHO), W.H.O. WHO Director-General’s Opening Remarks at the Media Briefing on COVID-19–3 March 2020. 2020 [cited 2020 3.03.2020]; Available from: www.who.int/dg/speeches/detail/who-director-general-s-openingremarks-at-the-media-briefing-on-covid-19—3-march-2020.
  3. 3. (WHO), W.H.O., WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/, 2023.
  4. 4. Zhang X., et al., Viral and host factors related to the clinical outcome of COVID-19. Nature, 2020. 583(7816): p. 437–440. pmid:32434211
  5. 5. Sanyaolu A., et al., Comorbidity and its Impact on Patients with COVID-19. SN Compr Clin Med, 2020. 2(8): p. 1069–1076. pmid:32838147
  6. 6. Zhang Q., et al., Molecular mechanism of interaction between SARS-CoV-2 and host cells and interventional therapy. Signal Transduction and Targeted Therapy, 2021. 6(1): p. 233. pmid:34117216
  7. 7. Van Treuren W. and Dodd D., Microbial Contribution to the Human Metabolome: Implications for Health and Disease. Annu Rev Pathol, 2020. 15: p. 345–369. pmid:31622559
  8. 8. Kell D.B. and Oliver S.G., The metabolome 18 years on: a concept comes of age. Metabolomics, 2016. 12(9): p. 148. pmid:27695392
  9. 9. Masoodi M., et al., Disturbed lipid and amino acid metabolisms in COVID-19 patients. 2022. 100(4): p. 555–568. pmid:35064792
  10. 10. Sindelar M., et al., Longitudinal metabolomics of human plasma reveals prognostic markers of COVID-19 disease severity. Cell Rep Med, 2021. 2(8): p. 100369. pmid:34308390
  11. 11. Meoni G., et al., Metabolomic/lipidomic profiling of COVID-19 and individual response to tocilizumab. PLOS Pathogens, 2021. 17(2): p. e1009243. pmid:33524041
  12. 12. Maile M.D., et al., Associations of the plasma lipidome with mortality in the acute respiratory distress syndrome: a longitudinal cohort study. Respir Res, 2018. 19(1): p. 60. pmid:29636049
  13. 13. Elrayess M.A., et al., Metabolic Signatures of Type 2 Diabetes Mellitus and Hypertension in COVID-19 Patients With Different Disease Severity. Front Med (Lausanne), 2021. 8: p. 788687. pmid:35083246
  14. 14. Valdes A., et al., Metabolomics study of COVID-19 patients in four different clinical stages. Sci Rep, 2022. 12(1): p. 1650. pmid:35102215
  15. 15. Gu M., et al., Sera Metabolomics Characterization of Patients at Different Stages in Wuhan Identifies Critical Biomarkers of COVID-19. Frontiers in Cellular and Infection Microbiology, 2022. 12. pmid:35586248
  16. 16. Zhu Z., et al., Clinical value of immune-inflammatory parameters to assess the severity of coronavirus disease 2019. Int J Infect Dis, 2020. 95: p. 332–339. pmid:32334118
  17. 17. Tan L.Y., Komarasamy T.V., and Rmt Balasubramaniam V., Hyperinflammatory Immune Response and COVID-19: A Double Edged Sword. Front Immunol, 2021. 12: p. 742941. pmid:34659238
  18. 18. Jia H., et al., Metabolomic analyses reveal new stage-specific features of COVID-19. European Respiratory Journal, 2022. 59(2): p. 2100284. pmid:34289974
  19. 19. Doğan H.O., et al., Understanding the pathophysiological changes via untargeted metabolomics in COVID-19 patients. J Med Virol, 2021. 93(4): p. 2340–2349. pmid:33300133
  20. 20. Fraser D.D., et al., Metabolomics Profiling of Critically Ill Coronavirus Disease 2019 Patients: Identification of Diagnostic and Prognostic Biomarkers. Crit Care Explor, 2020. 2(10): p. e0272. pmid:33134953
  21. 21. Bley H., Schöbel A., and Herker E., Whole Lotta Lipids-from HCV RNA Replication to the Mature Viral Particle. 2020. 21(8).
  22. 22. Mulka K.R., et al., Progression and Resolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Infection in Golden Syrian Hamsters. Am J Pathol, 2022. 192(2): p. 195–207. pmid:34767812
  23. 23. World Health, O., COVID-19 clinical management: living guidance, 15 September 2022. 2022, World Health Organization: Geneva, accessed 19.11.2022
  24. 24. Lodge S., et al., NMR Spectroscopic Windows on the Systemic Effects of SARS-CoV-2 Infection on Plasma Lipoproteins and Metabolites in Relation to Circulating Cytokines. Journal of Proteome Research, 2021. 20(2): p. 1382–1396. pmid:33426894
  25. 25. Páez-Franco J.C., et al., Metabolomics analysis reveals a modified amino acid metabolism that correlates with altered oxygen homeostasis in COVID-19 patients. Sci Rep, 2021. 11(1): p. 6350. pmid:33737694
  26. 26. Sugita T., et al., Analysis of the serum bile Acid composition for differential diagnosis in patients with liver disease. Gastroenterol Res Pract, 2015. 2015: p. 717431. pmid:25821461
  27. 27. Szymańska E., et al., Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics, 2012. 8(Suppl 1): p. 3–16. pmid:22593721
  28. 28. Yan X., et al., Intestinal Flora Modulates Blood Pressure by Regulating the Synthesis of Intestinal-Derived Corticosterone in High Salt-Induced Hypertension. Circ Res, 2020. 126(7): p. 839–853. pmid:32078445
  29. 29. Hayashi M., et al., Comprehensive Serum Glycopeptide Spectra Analysis (CSGSA): A Potential New Tool for Early Detection of Ovarian Cancer. Cancers (Basel), 2019. 11(5). pmid:31035594
  30. 30. Liaw A, W.M., Classification and Regression by randomForest. R News, 2002. 2(3): p. 18–22.
  31. 31. Shapiro S.S.a.W., M.B., An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 1965. 52: p. 591–611.
  32. 32. H, W., ggplot2: Elegant Graphics for Data Analysis. 2016: Springer-Verlag New York.
  33. 33. Pang Z., et al., MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights. Nucleic Acids Res, 2021. 49(W1): p. W388–W396. pmid:34019663
  34. 34. Bujak R., et al., Metabolomics for laboratory diagnostics. Journal of Pharmaceutical and Biomedical Analysis, 2015. 113: p. 108–120. pmid:25577715
  35. 35. Asim M., et al., A contemporary insight of metabolomics approach for COVID-19: Potential for novel therapeutic and diagnostic targets. Nepal J Epidemiol, 2020. 10(4): p. 923–927. pmid:33495710
  36. 36. Statsenko Y., et al., Impact of Age and Sex on COVID-19 Severity Assessed From Radiologic and Clinical Findings. Frontiers in Cellular and Infection Microbiology, 2022. 11. pmid:35282595
  37. 37. López-Hernández Y., et al., Targeted metabolomics identifies high performing diagnostic and prognostic biomarkers for COVID-19. Scientific Reports, 2021. 11(1): p. 14732. pmid:34282210
  38. 38. Doerre A. and Doblhammer G., The influence of gender on COVID-19 infections and mortality in Germany: Insights from age- and gender-specific modeling of contact rates, infections, and deaths in the early phase of the pandemic. PLOS ONE, 2022. 17(5): p. e0268119. pmid:35522614
  39. 39. Mosconi F., et al., Some nonlinear challenges in biology. Nonlinearity, 2008. 21(8): p. T131.
  40. 40. Ghosh T., et al., Predictive Modeling for Metabolomics Data. Methods Mol Biol, 2020. 2104: p. 313–336. pmid:31953824
  41. 41. Liu J., et al., Metabolite profile of COVID-19 revealed by UPLC-MS/MS-based widely targeted metabolomics. Front Immunol, 2022. 13: p. 894170. pmid:35924246
  42. 42. Hasan M.R., Suleiman M., and Pérez-López A., Metabolomics in the Diagnosis and Prognosis of COVID-19. Front Genet, 2021. 12: p. 721556. pmid:34367265
  43. 43. Lewis H.M., et al., Metabolomics Markers of COVID-19 Are Dependent on Collection Wave. 2022. 12(8).
  44. 44. Blasco H., et al., The specific metabolome profiling of patients infected by SARS-COV-2 supports the key role of tryptophan-nicotinamide pathway and cytosine metabolism. Scientific Reports, 2020. 10(1): p. 16824. pmid:33033346
  45. 45. Castañé H., et al., Machine learning and semi-targeted lipidomics identify distinct serum lipid signatures in hospitalized COVID-19-positive and COVID-19-negative patients. Metabolism, 2022. 131: p. 155197. pmid:35381232
  46. 46. Shen B., et al., Proteomic and Metabolomic Characterization of COVID-19 Patient Sera. Cell, 2020. 182(1): p. 59–72.e15. pmid:32492406
  47. 47. Xiao N., et al., Integrated cytokine and metabolite analysis reveals immunometabolic reprogramming in COVID-19 patients with therapeutic implications. Nat Commun, 2021. 12(1): p. 1618. pmid:33712622
  48. 48. Herold B.C., et al., Bile salts: natural detergents for the prevention of sexually transmitted diseases. Antimicrob Agents Chemother, 1999. 43(4): p. 745–51. pmid:10103175
  49. 49. Luo L., et al., Chenodeoxycholic Acid from Bile Inhibits Influenza A Virus Replication via Blocking Nuclear Export of Viral Ribonucleoprotein Complexes. 2018. 23(12).
  50. 50. Reese V.C., Oropeza C.E., and McLachlan A., Independent activation of hepatitis B virus biosynthesis by retinoids, peroxisome proliferators, and bile acids. J Virol, 2013. 87(2): p. 991–7. pmid:23135717
  51. 51. Ma Y., et al., Antibiotic-Induced Primary Biles Inhibit SARS-CoV-2 Endoribonuclease Nsp15 Activity in Mouse Gut. Frontiers in Cellular and Infection Microbiology, 2022. 12. pmid:35967852
  52. 52. Ren Z., et al., Alterations in the human oral and gut microbiomes and lipidomics in COVID-19. Gut, 2021. 70(7): p. 1253–1265. pmid:33789966
  53. 53. Sokol H., et al., SARS-CoV-2 infection in nonhuman primates alters the composition and functional activity of the gut microbiota. 2021. 13(1): p. 1–19.
  54. 54. Ridlon J.M., et al., Bile acids and the gut microbiome. Curr Opin Gastroenterol, 2014. 30(3): p. 332–8. pmid:24625896
  55. 55. Dumas A., et al., The role of the lung microbiota and the gut-lung axis in respiratory infectious diseases. Cell Microbiol, 2018. 20(12): p. e12966. pmid:30329198
  56. 56. Pekala J., et al., L-carnitine—metabolic functions and meaning in humans life. Curr Drug Metab, 2011. 12(7): p. 667–78. pmid:21561431
  57. 57. Thangasamy T., et al., Role of L-carnitine in the modulation of immune response in aged rats. Clin Chim Acta, 2008. 389(1–2): p. 19–24. pmid:18083121
  58. 58. Li C., et al., Carnitine and COVID-19 Susceptibility and Severity: A Mendelian Randomization Study. Front Nutr, 2021. 8: p. 780205. pmid:34901126
  59. 59. CA R., et al., Altered amino acid profile in patients with SARS-CoV-2 infection. Proc Natl Acad Sci U S A, 2021. 118(25).
  60. 60. Kastenmüller G., et al., Genetics of human metabolism: an update. Hum Mol Genet, 2015. 24(R1): p. R93–r101. pmid:26160913
  61. 61. Luo X., et al., Prognostic Value of C-Reactive Protein in Patients With Coronavirus 2019. Clin Infect Dis, 2020. 71(16): p. 2174–2179. pmid:32445579
  62. 62. Broman N., et al., IL-6 and other biomarkers as predictors of severity in COVID-19. 2021. 53(1): p. 410–412.
  63. 63. Lin Z., et al., Serum ferritin as an independent risk factor for severity in COVID-19 patients. J Infect, 2020. 81(4): p. 647–679. pmid:32592705
  64. 64. Feng Y., et al., The Landscape of Aminoacyl-tRNA Synthetases Involved in Severe Acute Respiratory Syndrome Coronavirus 2 Infection. Front Physiol, 2021. 12: p. 818297. pmid:35153822
  65. 65. Geiger R., et al., L-Arginine Modulates T Cell Metabolism and Enhances Survival and Anti-tumor Activity. Cell, 2016. 167(3): p. 829–842 e13. pmid:27745970
  66. 66. Li P., et al., Amino acids and immune function. Br J Nutr, 2007. 98(2): p. 237–52. pmid:17403271
  67. 67. Amer R.M., et al., Diagnostic performance of rapid antigen test for COVID-19 and the effect of viral load, sampling time, subject’s clinical and laboratory parameters on test accuracy. J Infect Public Health, 2021. 14(10): p. 1446–1453. pmid:34175237