Metabolomic profiling of microbial disease etiology in community-acquired pneumonia

  • Ilona den Hartog,

    Roles Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft

    Affiliation Division of Systems Biomedicine & Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

  • Laura B. Zwep,

    Roles Formal analysis, Software, Validation, Writing – review & editing

    Affiliations Division of Systems Biomedicine & Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands, Mathematical Institute, Leiden University, Leiden, The Netherlands, Leiden Centre of Data Science, Leiden University, Leiden, The Netherlands

  • Stefan M. T. Vestjens,

    Roles Data curation, Resources, Writing – review & editing

    Affiliation Department of Medical Microbiology and Immunology, St. Antonius Hospital, Nieuwegein, The Netherlands

  • Amy C. Harms,

    Roles Data curation, Funding acquisition, Project administration, Resources, Validation, Writing – review & editing

    Affiliation Division of Systems Biomedicine & Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

  • G. Paul Voorn,

    Roles Funding acquisition, Writing – review & editing

    Affiliation Department of Medical Microbiology and Immunology, St. Antonius Hospital, Nieuwegein, The Netherlands

  • Dylan W. de Lange,

    Roles Funding acquisition, Writing – review & editing

    Affiliations Intensive Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands, National Poison Information Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands

  • Willem J. W. Bos,

    Roles Funding acquisition, Resources, Writing – review & editing

    Affiliations Department of Internal Medicine, St. Antonius Hospital, Nieuwegein, The Netherlands, Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands

  • Thomas Hankemeier,

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing

    Affiliation Division of Systems Biomedicine & Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

  • Ewoudt M. W. van de Garde,

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing

    Affiliations Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht University, Utrecht, The Netherlands, Department of Clinical Pharmacy, St. Antonius Hospital, Nieuwegein, The Netherlands

  • J. G. Coen van Hasselt

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    coen.vanhasselt@lacdr.leidenuniv.nl

    Affiliation Division of Systems Biomedicine & Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

Abstract

Diagnosis of microbial disease etiology in community-acquired pneumonia (CAP) remains challenging. We undertook a large-scale metabolomics study of serum samples in hospitalized CAP patients to determine if host-response associated metabolites can enable diagnosis of microbial etiology, with a specific focus on discrimination between the major CAP pathogen groups S. pneumoniae, atypical bacteria, and respiratory viruses. Targeted metabolomic profiling of serum samples was performed for three groups of hospitalized CAP patients with confirmed microbial etiologies: S. pneumoniae (n = 48), atypical bacteria (n = 47), or viral infections (n = 30). A wide range of 347 metabolites was targeted, including amines, acylcarnitines, organic acids, and lipids. Single discriminating metabolites were selected using Student’s T-test and their predictive performance was analyzed using logistic regression. Elastic net regression models were employed to discover metabolite signatures with predictive value for discrimination between pathogen groups. Metabolites to discriminate S. pneumoniae or viral pathogens from the other groups showed poor predictive capability, whereas discrimination of atypical pathogens from the other groups was found to be possible. Classification of atypical pathogens using elastic net regression models was associated with a predictive performance of 61% sensitivity, 86% specificity, and an AUC of 0.81. Targeted profiling of the host metabolic response revealed metabolites that can support diagnosis of microbial etiology in CAP patients with atypical bacterial pathogens compared to patients with S. pneumoniae or viral infections.

Introduction

Community-acquired pneumonia (CAP) is a commonly occurring respiratory tract infection caused by bacterial or viral pathogens that can lead to severe disease, especially in elderly patients [1]. The predominant pathogens found in hospitalized CAP patients are Streptococcus pneumoniae and, to a lesser extent, Haemophilus influenzae, Legionella pneumophila, and respiratory viruses [2, 3]. Patients hospitalized with severe CAP typically receive empirical antibiotic treatment with broad-spectrum antibiotics until the microbial etiology is determined [4, 5]. Current standard diagnostic methods for microbial identification are pathogen-targeted and include culturing, antigen testing, and molecular diagnostics such as PCR [5]. In over 60% of CAP patients, no causative pathogen can be identified with these pathogen-targeted diagnostic techniques [2, 6]. As a consequence, broad-spectrum antibiotics are over-used, which facilitates the emergence of antimicrobial resistance [7, 8]. A need therefore exists to explore innovative methods to enhance the diagnostic performance for the detection of microbial pathogens in CAP.

Evaluation of differences in the host-response to CAP-associated pathogens may be an alternative approach to improve diagnosis [9]. There is growing evidence that the metabolic response of the host, i.e. the patient, to infections can be a relevant source of novel host immune response biomarkers [10, 11]. Several small studies have reported differences in metabolite profiles in blood and urine samples in patients with different types of infections (S1 Table) [12–18]. For instance, studies comparing metabolomic changes in CAP and tuberculosis (TB) patients show increased levels of plasma lipids and decreased levels of metabolites involved in cholesterol synthesis [12, 15]. A study comparing viral and bacterial respiratory tract infections showed that plasma metabolite profiles of patients with influenza A and bacterial pneumonia differed significantly [17]. In another study, urine samples of patients with a respiratory syncytial virus (RSV) or a bacterial respiratory tract infection showed differences in metabolite levels as well [18]. An important limitation of these studies is that the comparisons made cannot yet support the etiological diagnosis of CAP but merely focus on differences between diseases such as TB versus CAP. The studies that compared viral and bacterial causative pathogen groups of CAP used an untargeted metabolomics approach. While an untargeted approach is especially useful for the discovery of new features and hypothesis-free analysis, a targeted approach that can be fully quantified to clinical laboratory standards may be preferable for clinical implementation. Furthermore, these studies have the limitation that they focus on the comparison of pediatric patients while most hospitalized CAP patients are adults. No studies have evaluated differences in metabolite profiles of CAP patients comparing different microbial etiologies relevant for treatment of CAP, i.e. S. pneumoniae, atypical pathogens, and viral infections.

In the current study, we performed extensive targeted metabolomic profiling for three groups of hospitalized CAP patients with confirmed microbial etiologies of S. pneumoniae, atypical bacteria, or viral infections. We aimed to determine whether host-response associated metabolites can enable diagnosis of microbial etiology, focusing on discrimination between the pathogen groups S. pneumoniae, atypical bacteria, and respiratory viruses in patients hospitalized with CAP.

Materials and methods

Study population

Serum samples were taken from 505 patients that were diagnosed with CAP in two previously conducted clinical studies that were executed between October 2004 and September 2010 [2, 3]. The samples were taken from CAP patients within 24 hours after hospital admission. In 57% of these patient samples, the causative pathogen could be identified using conventional diagnostic methods such as culturing, PCR, and urinary antigen tests. The most commonly found causative pathogen in these patients was S. pneumoniae, followed by atypical bacterial and viral pathogens. A minority of patients was diagnosed with other bacteria.

From the selection of patients in which a causative pathogen was identified, we excluded patients with mixed infections. Furthermore, we constructed three distinct groups of patients with Streptococcus pneumoniae, atypical (Coxiella burnetii, Chlamydophila psittaci, Legionella pneumophila or Mycoplasma pneumoniae), or viral (influenza virus, herpes simplex virus (HSV), respiratory syncytial virus (RSV), parainfluenza virus, or another respiratory virus) infections. The number of available samples for the patient group with confirmed viral CAP infection was limited (n = 31). The patients included in the S. pneumoniae and atypical bacterial groups were randomly drawn from the remaining study population in an iterative fashion until the bacterial groups were composed in such a way that the three groups showed comparable means for sex and pneumonia severity index scores. This resulted in a group of 49 patients with S. pneumoniae and a group of 50 patients with atypical infections (Fig 1). No matching of individual samples was performed. An overview of patient characteristics is provided in Table 1 and S2 Table. Patient characteristics that might be considered as possible covariates were: age, sex, nursing home resident, renal disease, congestive heart failure, CNS disease, malignancy, COPD, diabetes, altered mental status, respiratory rate, systolic blood pressure, temperature, pulse, pH, BUN, sodium, glucose, hematocrit, partial pressure of oxygen, pleural effusion on x-ray, duration of symptoms before admission, and antibiotic treatment before admission. The analyses performed in this study were executed in accordance with the informed consent given by the patients. The clinical data were anonymized before use.

Fig 1. Flow chart of the formation of the three studied patient groups.

https://doi.org/10.1371/journal.pone.0252378.g001

Bioanalytical procedures

Serum samples were analyzed with six targeted, mass spectrometry-based metabolomics methods: five based on liquid chromatography and one based on gas chromatography. The metabolomics profiling covered 596 metabolite targets from 25 metabolite classes, including amino acids, biogenic amines, acylcarnitines, organic acids, and multiple classes of lipids (S3 Table). Levels of 374 unique metabolites were detected in the samples. The metabolomic profiling was performed within the Biomedical Metabolomics Facility of Leiden University in Leiden, The Netherlands. Details of the metabolomic analysis methods used are provided in S1 Method.

Data analysis

The data resulting from the metabolomic profiling was cleaned by removing patient samples with more than 10 missing metabolite values, for example, if results from one measurement platform were missing because of too low sample volumes, and by removing metabolites with missing patient samples, for example, because of a sample preparation error. The clean dataset consisted of 347 metabolite levels (S4 Table) for 125 patients diagnosed with the microbial etiology S. pneumoniae (n = 48), atypical (n = 47), or viral (n = 30). The pathogens identified in each group are shown in Table 2. The resulting metabolite levels were preprocessed by applying log transformation and standardized to correct for heteroscedasticity. The preprocessed metabolomics dataset was visually inspected using a principal component analysis.
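The log transformation and standardization described above can be illustrated for a single metabolite. This is a minimal sketch only: the natural logarithm, the use of the sample standard deviation, and the function name `log_standardize` are assumptions made for illustration, since the text does not specify them, and the input values are hypothetical.

```python
import math
import statistics


def log_standardize(values):
    """Log-transform one metabolite's levels, then scale to mean 0 and SD 1."""
    # Assumption: natural log; all values must be strictly positive.
    logged = [math.log(v) for v in values]
    mean = statistics.mean(logged)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(logged)
    return [(x - mean) / sd for x in logged]


# Hypothetical levels of a single metabolite in five samples.
levels = [1.0, 2.0, 4.0, 8.0, 16.0]
scaled = log_standardize(levels)
```

After this step each metabolite has the same scale, so metabolites with large absolute levels do not dominate a principal component analysis or a penalized regression.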

Table 2. Distribution of causative microbial agents per pathogen group for statistical data analysis.

https://doi.org/10.1371/journal.pone.0252378.t002

Data imputation was performed for patient characteristics that were to be evaluated as covariates in the statistical analysis and showed missingness in the data. Five times repeated imputation using predictive mean matching was performed with the ‘mice’ package for R to impute the patient data for the covariates with less than 25% missing data. Predictive mean matching is suitable for both numeric and binary covariates. Patient characteristics with >25% missing data were excluded from further analysis.

We performed logistic regression and elastic net regression modeling to determine if patients in one pathogen group could be discriminated from patients in the remaining two groups. Also, we aimed to determine which metabolites were important for prediction of the causative pathogen. In both methods, five-fold cross-validation was used to make the most efficient use of the available data for estimation of the predictive performance of the models and their associated metabolites [19]. Furthermore, the model generation was repeated 100 times to obtain robust estimates of the predictive performance of the models.

To identify single discriminative metabolites, Student’s T-tests with false discovery rate (FDR) multiple testing corrections were performed (p < 0.05). Then, significant metabolites and a combination of significant metabolites were modeled using logistic regression. Also, models containing covariates age and sex and all covariates were generated. The predictive logistic regression models were analyzed by comparison of their area under the curve (AUC), sensitivity, specificity, balanced error rate (BER), and receiver operating characteristic (ROC) curve.
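The multiple testing correction can be sketched as follows. The text states only that an FDR correction was applied, so the Benjamini-Hochberg procedure used here is an assumption, as are the function name and the hypothetical p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five metabolites (not study results).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

In this toy input two raw p-values fall just under 0.05 but no longer pass after adjustment, which shows why few metabolites may remain significant when several hundred are tested.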

Elastic net regression was performed to test if the predictive power of the metabolite data could be increased by including correlations between metabolites in addition to evaluating single metabolites. In elastic net regression, metabolites that have no explanatory power can be set to zero, as in a lasso regression, and metabolites that explain the same amount of variance can all be included with balanced coefficient sizes, as in a ridge regression [20].
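The balance between lasso and ridge behavior can be made concrete with the penalty term itself. The sketch below assumes the parametrization commonly used by elastic net software, in which α mixes the two penalties and λ sets the overall strength; the text does not state which parametrization was used, and the coefficients are hypothetical.

```python
def elastic_net_penalty(coefs, alpha, lam):
    """Penalty lam * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(b) for b in coefs)          # lasso part: can shrink to zero
    l2_squared = sum(b * b for b in coefs)   # ridge part: shrinks smoothly
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


# Hypothetical coefficients for three metabolites.
coefs = [0.5, -0.25, 0.0]
lasso_like = elastic_net_penalty(coefs, alpha=1.0, lam=0.1)  # pure L1
ridge_like = elastic_net_penalty(coefs, alpha=0.0, lam=0.1)  # pure L2
mixed = elastic_net_penalty(coefs, alpha=0.5, lam=0.1)
```

Under this assumption, α = 1 reduces to lasso regression and α = 0 to ridge regression, with intermediate values giving the mixed behavior described above.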

To obtain robust estimates of the predictive performance of the elastic net model, hyperparameters were optimized in a five-fold nested cross-validation, in which the hyperparameters were selected independently of the calculation of the predictive performance, as is schematically shown in Fig 2 [21]. In the inner cross-validation loop, the model optimization loop, optimal values for model hyperparameters α and λ were determined. In the outer cross-validation loop, the model performance loop, the optimal model for the training fold was built on the set hyperparameters α and λ (S1 Fig). Hyperparameter selection was performed using the balanced error rate (BER), which can be calculated from the numbers of true and false positives (TP, FP) and true and false negatives (TN, FN) (Eq 1). The BER accounts for different group sizes per model and therefore gives an accurate picture of the performance of models in the model optimization and model performance loop.

$$\mathrm{BER} = \frac{1}{2}\left(\frac{FP}{FP + TN} + \frac{FN}{FN + TP}\right) \tag{1}$$

Fig 2. Schematic representation of stratified nested cross-validation for elastic net regression model optimization and performance [21].

Abbreviations: CV: cross-validation.

https://doi.org/10.1371/journal.pone.0252378.g002
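The structure of the stratified nested cross-validation can be sketched as an index-splitting routine. This is an illustration under assumptions, not the deposited analysis code: the function names, the round-robin stratification, and the use of five inner folds are choices made here for clarity. Model fitting and hyperparameter search are left out.

```python
import random


def stratified_folds(indices, labels, k, rng):
    """Split indices into k folds, keeping class proportions similar."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels[i] for i in indices)):
        members = [i for i in indices if labels[i] == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal class members round-robin
    return folds


def nested_cv_splits(labels, k=5, seed=0):
    """Yield (inner_splits, outer_train, outer_test) for each outer fold."""
    rng = random.Random(seed)
    all_idx = list(range(len(labels)))
    for test in stratified_folds(all_idx, labels, k, rng):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        # Inner folds come from the outer training data only, so tuning
        # of alpha and lambda never sees the outer test fold.
        inner_splits = []
        for val in stratified_folds(train, labels, k, rng):
            val_set = set(val)
            inner_splits.append(([i for i in train if i not in val_set], val))
        yield inner_splits, train, test


# Group sizes from the text: 47 atypical (1) versus 48 + 30 others (0).
labels = [1] * 47 + [0] * 78
splits = list(nested_cv_splits(labels))
```

Repeating this with 100 different seeds would give the 500 outer-fold models referred to in the text.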

The overall predictive diagnostic performance was evaluated using sensitivity and specificity performance measures, generated from the confusion matrix that represents the number of samples falling into each possible outcome (Eqs 2 and 3). The average sensitivity and specificity of all 500 generated models and their standard deviations were used to compare the assay performance to currently used methods.

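$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3}$$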
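The performance measures can be computed directly from the four confusion matrix counts. The sketch below assumes the common definition of the BER as the mean of the false-positive and false-negative rates; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and balanced error rate from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of true cases detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly rejected
    # Assumed BER definition: mean of false-negative and false-positive rates.
    ber = 0.5 * ((1.0 - sensitivity) + (1.0 - specificity))
    return {"sensitivity": sensitivity, "specificity": specificity, "ber": ber}


# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=30, fp=20, tn=80, fn=20)
```

Under this definition, a sensitivity of 0.61 and a specificity of 0.86 give a BER of about 0.265, close to the value of 0.26 reported for the elastic net model of atypical pathogens.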

The relative contribution of each metabolite to the prediction of the expected pathogen group was quantified using the variable importance in prediction (VIP) score, expressed as a percentage. The VIP score was calculated per metabolite per fold or repeat as follows:

$$\mathrm{VIP}_j = \frac{\beta_j}{\sum |\beta|} \times 100\% \tag{4}$$

where βj is the regression coefficient of the metabolite in fold j and the denominator is the sum of the absolute values of all regression coefficients in the model. Metabolites were arranged based on their mean VIP score over all folds and repeats. Metabolites with an absolute VIP > 1% were considered to be most important. Furthermore, to determine the need to include age and sex, or all covariates, in the models, we compared the BER for models with and without age and sex, or all covariates, included. Finally, mean AUC values and ROC curves were calculated and generated to compare the performance of the elastic net models to the logistic regression models.
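A small worked example may help to read the VIP score. The sketch assumes that each coefficient is divided by the sum of the absolute coefficient values of the same model, so that the score keeps the sign of the coefficient; this normalization, the function name, and the placeholder metabolite names are assumptions for illustration.

```python
def vip_scores(coefs):
    """Signed percentage contribution of each coefficient in one fitted model."""
    # Assumption: normalize by the sum of absolute coefficient values.
    total = sum(abs(b) for b in coefs.values())
    if total == 0:
        return {name: 0.0 for name in coefs}  # empty model: no contributions
    return {name: 100.0 * b / total for name, b in coefs.items()}


# Hypothetical coefficients for one fold; the names are placeholders.
fold_coefs = {"metabolite_a": 0.5, "metabolite_b": -0.3,
              "metabolite_c": 0.2, "metabolite_d": 0.0}
vip = vip_scores(fold_coefs)
# Apply the threshold from the text: absolute VIP above 1%.
important = sorted(name for name, v in vip.items() if abs(v) > 1.0)
```

Averaging such per-fold scores over all folds and repeats would give the mean VIP used to rank metabolites.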

The scripts used for the statistical analyses were deposited on GitHub at http://github.com/vanhasseltlab/MetabolomicsEtiologyCAP.

Results

Metabolomics profiling and exploratory analysis of metabolomics data

Metabolomics profiling was performed for 130 patients and 596 metabolite targets. Preprocessing of the metabolomics dataset resulted in a reduced dataset including 125 patients and 347 metabolites (Fig 1). The patient characteristics of these 125 patients are displayed in Table 1. The patients were diagnosed with the microbial etiology S. pneumoniae (n = 48), atypical bacteria (n = 47), or respiratory virus (n = 30) (Table 2). A list of all targeted and detected metabolites and their identifiers can be found in S4 Table. Unsupervised principal component analysis showed no clear separation between pathogen groups (S2 Fig).

Single discriminating metabolites for pathogen groups

Three significant metabolites were found for the discrimination of atypical pathogens from S. pneumoniae and viral pathogens using a Student’s T-test with FDR multiple testing correction (p < 0.05): glycylglycine, symmetric dimethylarginine (SDMA), and lysophosphatidylinositol (18:1) (LPI (18:1)). For the other comparisons, no significantly discriminating metabolites were found.

The significantly differentiating metabolites were included in logistic regression models to differentiate patients with atypical pathogens from patients suffering from CAP caused by S. pneumoniae or viral pathogens. The logistic regression models were evaluated based on their AUC, sensitivity, specificity, BER, and ROC curve after five-fold cross-validation with 100 repeats (Table 3, Fig 3). They show that logistic regression models of the individual metabolites glycylglycine, SDMA, and LPI (18:1) can differentiate atypical pathogens from S. pneumoniae and viral pathogens with AUCs between 0.70–0.72, sensitivities between 0.32–0.36, specificities between 0.83–0.85, and BERs of 0.39–0.41. A logistic regression model including all three significantly discriminating metabolites yields a more successful separation with an AUC of 0.78, sensitivity of 0.57, specificity of 0.83, and BER of 0.30. Addition of the covariates age and sex to the three-metabolite model slightly improved the predictive performance of the model, resulting in a sensitivity of 0.63 and a specificity of 0.84. This model also showed the highest AUC (0.79) and lowest BER (0.26) of the tested logistic regression models. The addition of other covariates to the logistic regression model resulted in lower performance, probably due to overfitting of the model. The ROC curves emphasize the increased model performance upon the addition of more discriminating metabolites to the logistic regression model (Fig 3).

Fig 3. ROC curves of the results from logistic regression and elastic net regression models that were tested in five-fold cross-validation with 100 repeats for the comparisons: atypical versus S. pneumoniae and viral pathogens; S. pneumoniae pathogens versus atypical and viral pathogens; and viral versus S. pneumoniae and atypical pathogens.

Abbreviations: LR: logistic regression, EN: elastic net regression, SDMA: symmetric dimethylarginine, LPI (18:1): lysophosphatidylinositol (18:1).

https://doi.org/10.1371/journal.pone.0252378.g003

Table 3. Results from the logistic regression and elastic net regression models that were tested in a fivefold cross-validation with 100 repeats.

https://doi.org/10.1371/journal.pone.0252378.t003

Predictive metabolites for diagnosis of CAP-associated pathogens

Elastic net models including multiple metabolites were fit to discriminate S. pneumoniae, atypical bacterial, and viral pathogens from the remaining two groups (e.g., S. pneumoniae versus atypical bacterial and viral pathogens). Elastic net models separating patients with atypical bacterial pathogens from patients with S. pneumoniae and viral infections resulted in a mean AUC of 0.81, a sensitivity of 0.61, a specificity of 0.86, and a BER of 0.26. Prediction of S. pneumoniae or viral infection etiologies showed lower predictive capabilities with AUCs of 0.74 and 0.63, high sensitivities of 0.83 and 0.89, but low specificities of 0.50 and 0.23, and BERs of 0.33 and 0.44, respectively (Table 3).

We included the covariates age and sex, and all covariates in the elastic net models to account for potential confounding effects. The addition of these covariates showed no improved performance of the elastic net models for differentiation of atypical pathogens or S. pneumoniae from the other groups. For the differentiation of viral pathogens from the other two pathogen groups, a slight performance improvement was seen upon the addition of the covariates age and sex resulting in an AUC of 0.63, a sensitivity of 0.89, a specificity of 0.23, and a BER of 0.44 (Table 3).

The ROC curves for the separation of atypical pathogens from S. pneumoniae and viral pathogens show that elastic net models perform better than the logistic regression models for single metabolites. However, the logistic regression model including the three significant metabolites and the covariates age and sex shows similar performance as the elastic net regression which included 100 metabolites on average (Fig 3).

Metabolite classes predictive for atypical bacterial pathogens

Focusing on the metabolites that were shown to be predictive for atypical bacterial pathogens, i.e., the only comparison with clinically relevant predictive performance, we identified 26 metabolites with an absolute VIP > 1% using elastic net regression (Fig 4). The metabolites originated from multiple metabolite classes. However, the classes of biogenic amines and lysophospholipids were well represented (4–5 metabolites per class) compared to the other classes. The number of metabolites included in the models varied across folds without a clear correlation to the BER. Commonly, models including all metabolites were favored, followed by models including 20–100 metabolites (S3 Fig). We visualized the separation of the different pathogens in the atypical pathogen group using an unsupervised PCA including all metabolites. The PCA plot indicated that no clear sub-group is present within the atypical group that would prominently drive the separation from the S. pneumoniae and viral infections (S4 Fig).

Fig 4. Variable importance of metabolites for the prediction of an atypical bacterial infection versus S. pneumoniae and viral infections.

Only metabolites with an absolute mean percentage of influence > 1% are visualized.

https://doi.org/10.1371/journal.pone.0252378.g004

Discussion

Targeted profiling of the host metabolic response revealed metabolites that can support the diagnosis of microbial etiology in CAP patients with atypical bacterial pathogens compared to patients with S. pneumoniae or viral infections. CAP patients suffering from S. pneumoniae and viral infection could not be as successfully discriminated from the other groups based on the metabolic host-response.

The currently used clinical assays still outperform the metabolomics host-response assays developed in this study. For atypical pathogens, the sensitivity of 61% and specificity of 86% reported in this study are lower than those of the current urinary antigen tests for detection of Legionella pneumophila, which show a sensitivity of approximately 70% and a specificity up to 96% [22]. For detection of S. pneumoniae, the 83% sensitivity reached with the metabolomics-based assay outperforms the current antigen tests that show 70% sensitivity. However, the specificity of the metabolomics-based assay is only 50% while antigen tests reach specificity up to 96% [23, 24]. PCR assays of nasopharyngeal swabs for viral pathogens show sensitivities of up to 96% for influenza viruses A and B [25]. Our viral metabolomics-based assay shows a good sensitivity of 89% as well. However, the specificity of this assay is very low, at 23%. The expected clinical utility of the studied metabolite classes as host-response biomarkers for etiological diagnosis of CAP may therefore be considered limited.

The combination of the metabolites glycylglycine, SDMA, and LPI (18:1) and the covariates age and sex showed predictive capacities similar to elastic net models including 100 metabolites in the comparison of atypical pathogens versus S. pneumoniae and viral pathogens. This result suggests that a simple model might perform as well as a more complex elastic net model, which is an important finding when considering the use of these biomarkers for clinical diagnostic applications, e.g., where a limited set of 3 metabolites is preferable.

Glycylglycine, a biogenic amine, contributed significantly to the differentiation of atypical pathogens from the other pathogens, but was not often included in elastic net models. In contrast, SDMA and LPI (18:1) were often included in the elastic net models, as was shown in the overview of the 26 most influential metabolites. Metabolites of the classes biogenic amines and lysophospholipids, to which SDMA and LPI (18:1) have been assigned, were most represented in the 26 most influential metabolites compared to other metabolite classes in the comparison of atypical versus S. pneumoniae and viral pathogens. A comparison of the most influential metabolites in this study to metabolites of interest reported in previous studies of metabolomics in CAP patients shows limited overlap. Major reasons for this could be that (i) not all studies measured the same set of metabolic classes; (ii) some other studies poorly controlled patient comparator groups; and (iii) differences in bioanalytical methodologies, e.g. the use of NMR or MS as analytical method with their respective (dis)advantages, might provide different results [26]. For example, most lipids found to be predictive in this study have not been reported previously, most likely because the applied bioanalytical methodologies did not allow their detection. However, some overlap was found between the most influential metabolites for the comparison of atypical versus S. pneumoniae and viral pathogens in this study, and the metabolites of interest from other metabolomics studies involving CAP patients. The amino acid alanine was found in multiple studies [14, 16, 17]. Ceramide (d18:1/16:0), two diacyl-phosphatidylcholines, and diacyl-phosphatidylethanolamine (38:2) were found in other studies as well, the latter in the form of choline and ethanolamine [15, 16, 18]. Lactic acid was identified by several other metabolomics studies of respiratory bacterial and viral infections [12, 14, 17]. Lactic acid levels are also known to rise in case of severe disease. However, because the three pathogen groups were balanced in terms of disease severity and, for example, did not show significant differences in pH levels, we hypothesize that the differences in lactate levels are, in this case, an effect of the pathogen-specific host-response to infection. The results showed that models including disease severity covariates do not perform better than models without these confounders, thus supporting this hypothesis. Finally, 3-hydroxyisovaleric acid and betaine have been reported in a previous study comparing viral and bacterial pneumonia [18]. The overlap in these findings may provide insights into common metabolic responses to pathogens involved in CAP.

Multiple biological processes besides infection can influence metabolic processes in patients. Inclusion of age and sex in the models did not improve the predictive performance of the elastic net models for atypical bacteria and S. pneumoniae but did improve the model for viral pathogens. The average age in the viral pathogen group was higher than in the other groups, which could explain this result. For the other comparisons, we see that a model including age and sex or more covariates does not outperform models without these possible confounders. This does not imply that there is no metabolomic effect of age in the bacterial pathogen groups, but it implies that the separation between bacterial pathogen groups is more dependent on the metabolomic host-response to the infection than on the age-related metabolomic changes. In this study, we included patients with mild to severe CAP, reflecting the target patient population for which improvements in a diagnostic assay are required. However, the combination of samples from patients with different disease severities may negatively influence the predictive capabilities of the model because the effect from the causative pathogen on the host-metabolism may be less pronounced for less severe disease [27]. Separating the patients into groups with comparable disease severity scores would, on the other hand, decrease the power for statistical analysis. Furthermore, no standardization of sampling times and conditions was applied, e.g., patients had not fasted before blood sampling, which may influence the metabolite patterns found. Since variations in sampling conditions were unknown, we were unable to consider these in our analyses. However, we expect that the impact of not standardizing and correcting for these factors is limited because the noise in metabolite levels introduced by these factors is expected to be random with regard to the pathogen groups compared in this study. A standardized sampling approach could improve the sensitivity of the models to detect predictive metabolites because some noise is reduced. However, the specificity of the models with respect to the prediction of specific pathogens would be unchanged, since no correlation with pathogen groups is likely.

The sample size of this study (n = 125) was relatively large compared to studies researching metabolomic differences between causative pathogens of CAP that included approximately 70 patients [17, 18]. The compared groups S. pneumoniae, atypical bacteria, and viruses were chosen because antibiotic treatment strategies differ between these three groups. Ideally, we would have further investigated differences within studied groups, e.g. to identify metabolic responses to specific pathogens within the atypical pathogens and viral infection groups. For example, it would be of interest to study Legionella species more in-depth because their intracellular growth might result in a differentiated host-response. However, this was considered not feasible in this study due to sample size restrictions. The heterogeneous pathogen population in the atypical bacterial and viral pathogen groups might have lowered the predictive performance of the metabolomic analysis. Studying the individual pathogens in bigger sample sizes might reveal more characteristic metabolite signatures. In this study, no control group was included because the goal of the study was to provide a faster and optimal diagnostic method and a guide for antibiotic treatment in hospitalized CAP patients. In further studies, it would be preferable to include patients with all causes of CAP, including the remaining microorganisms, which were excluded in the current study because of their low frequency, to enable a more comprehensive comparison with current clinical assays. In this study, CAP patients with unknown pathogens were excluded. In a follow-up study, the metabolite pattern of the patients with unknown causative pathogens could be compared to the metabolite patterns of the distinguished pathogen groups to gain more information about the metabolomic resemblance of the samples in which pathogens could and could not be identified using the conventional diagnostic techniques.

Metabolomics analysis resulted in some missing data because of sample preparation errors or limited sample volume. Because the measurement platforms covered multiple metabolites within one pathway, metabolites with missing data could be removed without influencing the final results. Some patient samples had to be removed because multiple metabolite levels were missing, for example, when the results from a whole metabolomics platform were missing. Data imputation was not performed for the metabolomics data because, in our opinion, the wide range of patients included in the dataset did not provide enough information for accurate imputation.
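
The filtering described in this paragraph can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `filter_missing`, the `max_missing_fraction` cut-off, the order of the two steps (samples first, then metabolites), and the example values are all assumptions made for the sketch.

```python
from typing import Dict, Optional

Sample = Dict[str, Optional[float]]  # metabolite name -> level, None if missing


def filter_missing(
    samples: Dict[str, Sample],
    max_missing_fraction: float = 0.5,  # hypothetical cut-off, not from the study
) -> Dict[str, Dict[str, float]]:
    """Drop heavily incomplete samples, then drop metabolites that still have gaps."""
    metabolites = sorted({m for levels in samples.values() for m in levels})
    if not metabolites:
        return {sample_id: {} for sample_id in samples}

    # Step 1: remove samples missing many metabolites (e.g. a whole platform).
    kept = {}
    for sample_id, levels in samples.items():
        n_missing = sum(levels.get(m) is None for m in metabolites)
        if n_missing / len(metabolites) <= max_missing_fraction:
            kept[sample_id] = levels

    # Step 2: keep only metabolites measured in every remaining sample,
    # so that no imputation is needed afterwards.
    complete = [
        m for m in metabolites
        if all(levels.get(m) is not None for levels in kept.values())
    ]
    return {
        sample_id: {m: levels[m] for m in complete}
        for sample_id, levels in kept.items()
    }


# Made-up example data: patient_3 lacks all measurements, patient_2 lacks one.
example = {
    "patient_1": {"alanine": 1.2, "lactate": 3.4, "citrate": 0.8},
    "patient_2": {"alanine": 1.0, "lactate": None, "citrate": 0.9},
    "patient_3": {"alanine": None, "lactate": None, "citrate": None},
}
filtered = filter_missing(example)
print(filtered)
```

In this toy example the fully missing sample is removed first, after which only the metabolite with a remaining gap is dropped. Removing samples before metabolites avoids discarding a metabolite solely because of one largely empty sample.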

In summary, this comprehensive analysis of the host metabolic response across multiple metabolic classes, based on a well-balanced study cohort of CAP patients, has shown that it is possible to identify atypical pathogens in CAP, but that the utility for predicting S. pneumoniae and viral disease etiologies is limited.

Supporting information

S1 Method. Details on metabolomic sample analysis.

https://doi.org/10.1371/journal.pone.0252378.s001

(DOCX)

S1 Fig. Optimization of α and λ in the inner Cross-Validation (CV) to reach a minimal Balanced Error Rate (BER) in the outer CV.

(A) All α and λ values tested in the inner CV plotted against the mean BER of the inner CV. (B) A plot of the optimal α and λ combinations chosen in the inner CV against their BER in the outer CV shows a variety of favorable α and λ combinations. (C) A plot of the number of variables selected in the elastic net model in the outer CV shows that the number of variables decreases with increasing α, as is expected in an elastic net model. The data shown are from the comparison Atypical–(S. pneumoniae + viral).

https://doi.org/10.1371/journal.pone.0252378.s002

(DOCX)
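
For reference, the Balanced Error Rate used as the tuning criterion in S1 Fig is commonly defined as the mean of the per-class error rates, so that a small class weighs as much as a large one. The sketch below assumes that common definition; the function name and the example labels are illustrative and not taken from the study.

```python
from typing import Hashable, Sequence


def balanced_error_rate(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> float:
    """Mean of the per-class error rates (1 - recall) over the classes in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = set(y_true)
    if not classes:
        raise ValueError("y_true must not be empty")
    error_rates = []
    for c in classes:
        # Indices of all samples that truly belong to class c.
        idx = [i for i, label in enumerate(y_true) if label == c]
        wrong = sum(y_pred[i] != c for i in idx)
        error_rates.append(wrong / len(idx))
    return sum(error_rates) / len(error_rates)


# Made-up labels for an imbalanced two-class comparison.
y_true = ["atypical", "atypical", "other", "other", "other", "other"]
y_pred = ["atypical", "other", "other", "other", "other", "atypical"]
ber = balanced_error_rate(y_true, y_pred)
print(ber)
```

Here half of the two "atypical" samples and a quarter of the four "other" samples are misclassified, giving a BER of 0.375, whereas the plain error rate would be 2/6. The difference illustrates why a balanced measure is useful when the compared groups differ in size.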

S2 Fig. Unsupervised Principal Component Analysis (PCA) plot of all pathogen groups.

https://doi.org/10.1371/journal.pone.0252378.s003

(DOCX)

S3 Fig.

(A) Boxplot of BER per number of variables selected shows no clear relation between the number of variables selected and model performance. (B) Histogram of the number of variables selected shows that a model with all metabolites included is favored, followed by models including 34, 49, 82, 24, or 45 metabolites. Both panels contain the data of all folds and repeats (n = 500) for the comparison of atypical versus S. pneumoniae and viral infections.

https://doi.org/10.1371/journal.pone.0252378.s004

(DOCX)

S4 Fig. Principal Component Analysis (PCA) of the atypical pathogen group (log-transformed and standardized data) shows that there is no clear subgroup within the atypical group that would prominently drive the separation from the S. pneumoniae and viral infections.

https://doi.org/10.1371/journal.pone.0252378.s005

(DOCX)

S1 Table. Summary of previous studies focusing on bacterial and viral respiratory tract infections and related metabolites.

https://doi.org/10.1371/journal.pone.0252378.s006

(DOCX)

S2 Table. Additional patient characteristics per pathogen group.

https://doi.org/10.1371/journal.pone.0252378.s007

(DOCX)

S3 Table. Overview of the number of metabolites included in the metabolomics platforms, measured in the samples and included in the data analysis.

https://doi.org/10.1371/journal.pone.0252378.s008

(DOCX)

S4 Table. Information on measurement platforms used, metabolite classes targeted per platform, targeted metabolites, their abbreviations and names in R (if detected) and identifiers (if available).

https://doi.org/10.1371/journal.pone.0252378.s009

(XLSX)

S5 Table. Metabolomics data after quality control.

https://doi.org/10.1371/journal.pone.0252378.s010

(CSV)

References

  1. Kothe H, Bauer T, Marre R, Suttorp N, Welte T, Dalhoff K, et al. Outcome of community-acquired pneumonia: Influence of age, residence status and antimicrobial treatment. Eur Respir J. 2008 Jul 1;32(1):139–46. pmid:18287129
  2. Meijvis SCA, Hardeman H, Remmelts HHF, Heijligenberg R, Rijkers GT, Van Velzen-Blad H, et al. Dexamethasone and length of hospital stay in patients with community-acquired pneumonia: A randomised, double-blind, placebo-controlled trial. Lancet. 2011;377(9782):2023–30. pmid:21636122
  3. Endeman H, Schelfhout V, Paul Voorn G, van Velzen-Blad H, Grutters JC, Biesma DH. Clinical features predicting failure of pathogen identification in patients with community acquired pneumonia. Scand J Infect Dis. 2008 Jan 8;40(9):715–20. pmid:19086245
  4. Wunderink RG, Waterer GW. Community-Acquired Pneumonia. Solomon CG, editor. N Engl J Med. 2014 Feb 6;370(6):543–51. pmid:24499212
  5. Wiersinga WJ, Bonten MJ, Boersma WG, Jonkers RE, Aleva RM, Kullberg BJ, et al. Management of community-acquired pneumonia in adults: 2016 guideline update from the Dutch Working Party on Antibiotic Policy (SWAB) and Dutch Association of Chest Physicians (NVALT). Neth J Med. 2018;76(1).
  6. Postma DF, van Werkhoven CH, van Elden LJR, Thijsen SFT, Hoepelman AIM, Kluytmans JAJW, et al. Antibiotic Treatment Strategies for Community-Acquired Pneumonia in Adults. N Engl J Med. 2015 Apr 2;372(14):1312–23. pmid:25830421
  7. Bjarnason A, Westin J, Lindh M, Andersson LM, Kristinsson KG, Löve A, et al. Incidence, etiology, and outcomes of community-acquired pneumonia: A population-based study. Open Forum Infect Dis. 2018 Feb 1;5(2).
  8. WHO. Antimicrobial resistance: global report on surveillance. 2014.
  9. Saleh MAA, Van Hasselt CJG, Van De Garde EMW. Host-response biomarkers for the diagnosis of bacterial respiratory tract infections. Clin Chem Lab Med. 2018;
  10. Pearce EL, Pearce EJ. Metabolic pathways in immune cell activation and quiescence. Vol. 38, Immunity. Cell Press; 2013. p. 633–43. https://doi.org/10.1016/j.immuni.2013.04.005 pmid:23601682
  11. Khovidhunkit W, Kim MS, Memon RA, Shigenaga JK, Moser AH, Feingold KR, et al. Effects of infection and inflammation on lipid and lipoprotein metabolism: Mechanisms and consequences to the host. Vol. 45, Journal of Lipid Research. Lipid Research Inc.; 2004. p. 1169–96. https://doi.org/10.1194/jlr.R300019-JLR200 pmid:15102878
  12. Zhou A, Ni J, Xu Z, Wang Y, Zhang H, Wu W, et al. Metabolomics specificity of tuberculosis plasma revealed by (1)H NMR spectroscopy. Tuberculosis (Edinb). 2015 May 1;95(3):294–302. pmid:25736521
  13. Laiakis EC, Morris GAJ, Fornace AJ, Howie SRC. Metabolomic Analysis in Severe Childhood Pneumonia in The Gambia, West Africa: Findings from a Pilot Study. Lau ATY, editor. PLoS One. 2010 Sep 9;5(9):e12655. pmid:20844590
  14. Slupsky CM, Rankin KN, Fu H, Chang D, Rowe BH, Charles PGP, et al. Pneumococcal Pneumonia: Potential for Diagnosis through a Urinary Metabolic Profile. J Proteome Res. 2009 Dec 4;8(12):5550–8. pmid:19817432
  15. Lau SKP, Lee K-C, Curreem SOT, Chow W-N, To KKW, Hung IFN, et al. Metabolomic Profiling of Plasma from Patients with Tuberculosis by Use of Untargeted Mass Spectrometry Reveals Novel Biomarkers for Diagnosis. Land GA, editor. J Clin Microbiol. 2015 Dec 1;53(12):3750–9. pmid:26378277
  16. Antcliffe D, Jiménez B, Veselkov K, Holmes E, Gordon AC. Metabolic Profiling in Patients with Pneumonia on Intensive Care. EBioMedicine. 2017 Apr;18:244–53. pmid:28373096
  17. Banoei MM, Vogel HJ, Weljie AM, Kumar A, Yende S, Angus DC, et al. Plasma metabolomics for the diagnosis and prognosis of H1N1 influenza pneumonia. Crit Care. 2017 Apr 19;21(1):97. pmid:28424077
  18. Adamko DJ, Saude E, Bear M, Regush S, Robinson JL. Urine metabolomic profiling of children with respiratory tract infections in the emergency department: a pilot study. BMC Infect Dis. 2016;16(1):439. pmid:27549246
  19. Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006 Feb 23;7:91. pmid:16504092
  20. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Statistical Methodol). 2005 Apr;67(2):301–20.
  21. Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF. GEMS: A system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int J Med Inform. 2005 Aug 1;74(7–8):491–503. pmid:15967710
  22. Yzerman EPF, Den Boer JW, Lettinga KD, Schellekens J, Dankert J, Peeters M. Sensitivity of three urinary antigen tests associated with clinical severity in a large outbreak of Legionnaires’ disease in the Netherlands. J Clin Microbiol. 2002 Sep;40(9):3232–6. pmid:12202558
  23. Gutierrez F, Masia M, Rodriguez JC, Ayelo A, Soldan B, Cebrian L, et al. Evaluation of the Immunochromatographic Binax NOW Assay for Detection of Streptococcus pneumoniae Urinary Antigen in a Prospective Study of Community‐Acquired Pneumonia in Spain. Clinical Infectious Diseases Feb, 2003 p. 286–92. pmid:12539069
  24. Sordé R, Falcó V, Lowak M, Domingo E, Ferrer A, Burgos J, et al. Current and potential usefulness of pneumococcal urinary antigen detection in hospitalized patients with community-acquired pneumonia to guide antimicrobial therapy. Arch Intern Med. 2011 Jan 24;171(2):166–72. pmid:20876397
  25. Van Elden LJR, Nijhuis M, Schipper P, Schuurman R, Van Loon AM. Simultaneous detection of influenza viruses A and B using real-time quantitative PCR. J Clin Microbiol. 2001;39(1):196–200. pmid:11136770
  26. Emwas AH, Roy R, McKay RT, Tenori L, Saccenti E, Nagana Gowda GA, et al. NMR spectroscopy for metabolomics research. Vol. 9, Metabolites. MDPI AG; 2019.
  27. Ning P, Zheng Y, Luo Q, Liu X, Kang Y, Zhang Y, et al. Metabolic profiles in community-acquired pneumonia: developing assessment tools for disease severity. Crit Care. 2018 Dec 14;22(1):130. pmid:29759075