Computational reassessment of RNA-seq data reveals key genes in active tuberculosis

  • Rakesh Arya ,

    Contributed equally to this work with: Rakesh Arya, Hemlata Shakya

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Biotechnology, Yeungnam University, Gyeongsan, Gyeongbuk, South Korea

  • Hemlata Shakya ,

    Contributed equally to this work with: Rakesh Arya, Hemlata Shakya

    Roles Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Biomedical Engineering, Shri G. S. Institute of Technology and Science, Indore, Madhya Pradesh, India

  • Reetika Chaurasia ,

    Roles Data curation, Formal analysis, Investigation, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    kimjj@ynu.ac.kr (JJK); reetika.chaurasia@yale.edu (RC)

    Affiliation Department of Internal Medicine, Section of Infectious Diseases, Yale University School of Medicine, New Haven, CT, United States of America

  • Surendra Kumar,

    Roles Formal analysis, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Orthopaedic Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD, United States of America

  • Joseph M. Vinetz,

    Roles Formal analysis, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Internal Medicine, Section of Infectious Diseases, Yale University School of Medicine, New Haven, CT, United States of America

  • Jong Joo Kim

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Resources, Validation, Visualization, Writing – original draft, Writing – review & editing

    kimjj@ynu.ac.kr (JJK); reetika.chaurasia@yale.edu (RC)

    Affiliation Department of Biotechnology, Yeungnam University, Gyeongsan, Gyeongbuk, South Korea

Correction

4 Mar 2025: Arya R, Shakya H, Chaurasia R, Kumar S, Vinetz JM, et al. (2025) Correction: Computational reassessment of RNA-seq data reveals key genes in active tuberculosis. PLOS ONE 20(3): e0319694. https://doi.org/10.1371/journal.pone.0319694 View correction

Abstract

Background

Tuberculosis is a serious, life-threatening disease and one of the top global health challenges. Rapid and effective diagnostic biomarkers are vital for early diagnosis, especially given the increasing prevalence of multidrug resistance.

Methods

Two human whole blood microarray datasets, GSE42826 and GSE42830, were retrieved from the publicly available Gene Expression Omnibus (GEO) database. Deregulated genes (DEGs) were identified using the GEO2R online tool, and Gene Ontology (GO) and protein-protein interaction (PPI) network analyses were performed using the Metascape and STRING databases. Significant genes (n = 8) were identified using T-test/ANOVA and a Molecular Complex Detection (MCODE) score ≥10, and were validated in the GSE34608 dataset. The diagnostic potential of three biomarkers was assessed using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) plot. The transcriptional levels of these genes were also examined in a separate dataset, GSE31348, to monitor the patterns of variation during tuberculosis treatment.

Results

A total of 62 common DEGs (57 upregulated and 5 downregulated genes) were identified in the two discovery datasets. GO function and pathway enrichment analyses shed light on the functional roles of these DEGs in immune response and type-II interferon signaling. The genes in Module-1 (n = 18) were linked to the innate immune response and interferon-gamma signaling. The common genes (n = 8) were validated in the GSE34608 dataset, which corroborated the results obtained from the discovery sets. The expression levels of these genes were responsive to anti-TB therapy in the GSE31348 dataset. In the GSE34608 dataset, the expression levels of three specific genes, GBP5, IFITM3, and EPSTI1, emerged as potential diagnostic markers. In combination, these genes achieved 100% sensitivity and 89% specificity, with an Area Under the Curve (AUC) of 0.958. However, GBP5 alone showed the highest AUC of 0.986, with 100% sensitivity and 89% specificity.

Conclusions

The study presents insights into the critical gene network perturbed during tuberculosis. These genes may help in assessing the effectiveness of anti-TB therapy and in distinguishing individuals with active TB from healthy individuals. GBP5, IFITM3 and EPSTI1 emerged as candidate core genes in TB and hold potential as novel molecular targets for the development of interventions in the treatment of TB.

Introduction

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is a lethal infectious disease with a high mortality rate. Globally, TB affects more than 10 million individuals, leading to approximately 1.6 million annual fatalities [1]. Around one-fourth of the world’s total population carries latent tuberculosis infections (LTBI), which can potentially progress to active tuberculosis infections (ATBI) during their lifetime [2]. The risk of progression increases when the immune system is compromised due to concurrent conditions like HIV-coinfection, diabetes, and organ transplantation [2, 3]. Mtb is often resistant to one or more antituberculous drugs, and determining this resistance is challenging in resource-limited settings. Early identification of a lack of response to therapy using novel biomarkers is an important goal of clinical tuberculosis care.

Current diagnostic methods, such as chest X-ray, acid-fast bacilli (AFB) staining, solid/liquid cultures, and nucleic acid amplification tests such as GeneXpert MTB/RIF, are carried out in response to symptoms and are laboratory-based [4]. Other tests that assess the immune response to Mtb antigens, such as tuberculin skin tests (TST) and interferon-gamma release assays (IGRAs), come with drawbacks such as time, expense, potential cross-reactivity, and limited sensitivity [5]. These tests cannot distinguish different clinical manifestations of tuberculosis. Therefore, rapid and effective diagnostic approaches are needed to identify novel biomarkers in various body fluids such as blood/serum, urine, sputum, and bronchoalveolar lavage (BAL) fluid, to differentiate among different categories of TB and especially to monitor early responses to treatment.

Samples obtained from TB patients, including blood/serum, sputum, saliva, and urine, contain both host and pathogen biomarkers [6]. Biomarkers derived from pathogens in the blood/serum show exceptional specificity for diagnosing tuberculosis. Compared with host-related proteins, pathogen-derived proteins, non-peptide antigens (lipids, carbohydrates), and peptides are present in lower quantities, often near the lower limit of detection of analytical methods (for example, <10 ppM). Because these peptides are of low abundance, powerful techniques like mass spectrometry, specifically liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), can be employed for the identification of potential diagnostic biomarkers, as demonstrated by Kruh-Garcia et al. [7]. Several ‘omics’ approaches, including genomics, transcriptomics, metabolomics, and proteomics, have advanced the use of high-throughput methods/instruments for rapid and accurate analysis of comprehensive expression changes during tuberculosis infection [8]. The progression of TB disease may be correlated with the proteomic patterns of proteins released by Mtb and those produced by the normal lung flora. High-throughput techniques, including Next-Generation Sequencing (NGS) and microarrays, are employed to conduct transcriptome profiling, aiming to pinpoint the genes that exhibit differential expression in human diseases like tuberculosis.

Microarray gene chips can generate large amounts of data, which can be re-analyzed to identify deregulated biosignatures and related functional pathways [9]. However, the results are often inconsistent because individual studies are limited to a single population and because of the inherent variability in the samples. Therefore, expression profiling data from several different studies can be combined, which can offer insights in the quest to discover biomarkers.

In the present study, we reanalyzed two publicly accessible microarray datasets, namely GSE42826 and GSE42830. These datasets encompass a total of 27 individuals diagnosed with tuberculosis (TB) and 90 individuals serving as healthy controls, all of whom contributed whole blood samples. We employed various online tools and computational approaches to detect genes that exhibit deregulation (referred to as “deregulated genes” or DEGs). To further elucidate the functional implications of these DEGs in tuberculosis, we performed Gene Ontology (GO) pathway analysis and protein-protein interaction (PPI) network analysis. This comprehensive analysis allowed us to delineate the roles and associations of the key deregulated genes in tuberculosis. These findings have implications for the identification of potential biomarkers for diagnosing tuberculosis and for assessing response to therapy, potentially aiding in different phases of treatment.

Methods

Acquisition of gene expression microarray data from NCBI GEO

The NCBI Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo) is a freely accessible public repository that stores and shares high-throughput functional genomics data, including microarray and next-generation sequencing data. In this study, we obtained four microarray datasets from the GEO database by searching for ’tuberculosis, control’. Considering previous studies on blood-based sample diagnostics for tuberculosis, we inferred that the sample sizes in all four datasets, comprising both tuberculosis and control samples, were suitable for our analysis. The R/Bioconductor package (ver. 4.3.0), GEOquery, was used to extract the gene expression data from two discovery datasets, GSE42826 and GSE42830 [10], and two validation datasets, GSE34608 and GSE31348. GSE42826 consists of 52 control and 11 TB samples (GPL10558 platform; Illumina HumanHT-12 V4.0 expression bead chip). GSE42830 consists of 38 control and 16 TB samples (GPL10558 platform; Illumina HumanHT-12 V4.0 expression bead chip). The GSE34608 and GSE31348 datasets were used for the validation of important genes. GSE34608 consists of 18 control and 8 TB samples (GPL6480 platform; Agilent-014850 Whole Human Genome Microarray 4x44K G4112F) [11]. GSE31348 consists of samples from 27 subjects at five time points (135 samples): diagnosis, and treatment for 1, 2, 4, and 26 weeks (GPL570 platform; Affymetrix Human Genome U133 Plus 2.0 Array) [12]. All data from the GEO database were accessed on 22 September 2023. Because the data were taken from a public database, IRB or ethical committee approval and informed consent statements are not required.

Identification of deregulated genes (DEGs) between TB and control groups

The four datasets were downloaded as raw data matrix files with microarray platform annotations from NCBI. These data were analyzed using an online tool, MetaboAnalyst (ver. 5.0; https://www.metaboanalyst.ca/). Heatmaps and Principal Component Analysis (PCA) plots were generated after selecting quantile normalization as the normalization type and auto-scaling as the data scaling method [13]. Gene expression data were analyzed with the GEO built-in tool, GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/), to identify deregulated genes (DEGs) between TB and control groups [14]. The Benjamini & Hochberg (false discovery rate) p-value adjustment was used. The data transformation was set to automatic mode, and limma precision weights (vooma) were applied along with data normalization. Rows without a corresponding gene symbol were omitted from the analysis. The filtering criteria of |log2 (fold change)| ≥ 1 and p-value < 0.01 were used to select statistically significant DEGs from both the GSE42826 and GSE42830 datasets. The common up- or down-regulated DEGs from both datasets were represented by a Venn diagram.
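The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GEO2R implementation: the `GeneStat` record, the function names, and the choice to apply the 0.01 cutoff to the Benjamini-Hochberg adjusted p-value are all assumptions made for the example (the text does not say whether the raw or adjusted p-value was thresholded).

```python
from typing import List, NamedTuple, Set, Tuple


class GeneStat(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    symbol: str     # gene symbol; empty string when the probe has no annotation
    log2_fc: float  # log2 fold change, TB versus control
    p_value: float  # raw p-value from the differential-expression test


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(rows: List[GeneStat],
                lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.01) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) gene symbols for one dataset."""
    # Drop rows without a gene symbol before any testing, as in the text.
    rows = [r for r in rows if r.symbol]
    adjusted = benjamini_hochberg([r.p_value for r in rows])
    up: Set[str] = set()
    down: Set[str] = set()
    for row, q in zip(rows, adjusted):
        if q >= p_cutoff:
            continue
        if row.log2_fc >= lfc_cutoff:
            up.add(row.symbol)
        elif row.log2_fc <= -lfc_cutoff:
            down.add(row.symbol)
    return up, down


def common_degs(dataset_a: List[GeneStat],
                dataset_b: List[GeneStat]) -> Tuple[Set[str], Set[str]]:
    """Genes deregulated in the same direction in both datasets (Venn overlap)."""
    up_a, down_a = select_degs(dataset_a)
    up_b, down_b = select_degs(dataset_b)
    return up_a & up_b, down_a & down_b
```

Requiring the same direction of change in both datasets is what makes the overlap meaningful: a gene that is upregulated in one dataset and downregulated in the other is not counted as common.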

Gene Ontology (GO) and pathway enrichment analyses of DEGs

Metascape (https://www.metascape.org/) is an online portal that provides comprehensive gene list annotations and biological information about genes and proteins [15]. The function of these DEGs was investigated using Gene Ontology (GO) biological process (BP), Reactome, and WikiPathways enrichment analysis in Metascape. A network analysis of enriched terms was performed in two views. In the first, nodes are colored by cluster ID, and nodes that share the same cluster ID are typically close to each other. In the second, nodes are colored by p-value, and terms containing more genes tend to have a more significant p-value.

Protein-protein interaction (PPI) network and Module analysis

The PPI network was prepared using the online database, Search Tool for the Retrieval of Interacting Genes (STRING, ver. 12; https://www.string-db.org) [16]. The minimum interaction score required was set at medium confidence of 0.4. The PPI network was further visualized and analyzed with an open-source software platform, Cytoscape (ver. 3.10.1). The Molecular Complex Detection (MCODE) app embedded in the Cytoscape environment was used for clustering a given network based on topology to find highly interconnected regions as Modules. The GO functions (biological processes and molecular functions), pathways (WikiPathways, KEGG, Reactome), and protein domains (PFAM, INTERPRO) of Module-1 were analyzed using Metascape and STRING.

Validation of important genes and ROC analysis

The significant genes (n = 8) were selected based on two criteria: ranking among the top genes common to both datasets in the heatmap analysis by T-test/ANOVA in MetaboAnalyst, and an MCODE score ≥10. The expression levels of important genes were validated in independent datasets GSE34608 and GSE31348 and shown as boxplots with interquartile ranges. Following a data mining and clinical validation approach, the geometric mean of the three gene transcription levels was defined as the TB score, $\mathrm{TBscore} = (\mathrm{GBP5} \times \mathrm{IFITM3} \times \mathrm{EPSTI1})^{1/3}$ [17]. The TBscore and the individual genes were tested for diagnostic potential by Receiver Operating Characteristic (ROC) curves using an online web tool, SRPLOT (https://www.bioinformatics.com.cn/).
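As a rough illustration of the scoring step, the sketch below computes a geometric mean of three expression values and a rank-based AUC. It assumes strictly positive (non-log) expression values, and the function names are hypothetical; the study itself used the SRPLOT web tool, whose internals are not described here.

```python
import math
from typing import Sequence


def tb_score(gbp5: float, ifitm3: float, epsti1: float) -> float:
    """Geometric mean of three strictly positive expression values."""
    values = (gbp5, ifitm3, epsti1)
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    # Averaging logs is numerically safer than multiplying and taking a root.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def roc_auc(case_scores: Sequence[float],
            control_scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a validation set of a few dozen samples; a sort-based version would be preferable for larger cohorts.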

Results

Identification of deregulated genes between TB and control groups

Both gene expression datasets, GSE42826 and GSE42830, contained 31,266 features each. In GSE42826, there were 52 control samples and 11 samples from patients with TB, while GSE42830 had 38 control samples and 16 samples from patients with TB. Both datasets showed distinct separation between the control and TB sample groups, as indicated by the heatmaps and PCA plots (Fig 1).

Fig 1. Gene expression levels of TB and control groups using MetaboAnalyst.

(A) Heatmap and PCA plot of GSE42826 dataset; (B) Heatmap and PCA plot of GSE42830 dataset.

https://doi.org/10.1371/journal.pone.0305582.g001

Using GEO2R analysis, we identified 113 differentially expressed genes (DEGs) in GSE42826, of which 102 were upregulated and 11 were downregulated. In GSE42830, there were 324 DEGs, comprising 138 upregulated and 186 downregulated genes (Fig 2A). A Venn diagram revealed 62 DEGs common to both datasets, of which 57 genes were upregulated and 5 genes were downregulated (Fig 2A).

Fig 2. Selection and function of deregulated genes (DEGs).

(A) The Venn diagram illustrates DEGs identified in two datasets: GSE42826 and GSE42830. In total, 57 were found to be upregulated, while 5 genes were downregulated, and these were shared between both GSE datasets. (B) The functional annotation of DEGs was conducted using Metascape online tool, resulting in the representation of the top 16 terms in a bar plot based on their p-value (log10 scale). (C) The Network of enriched terms is color-coded by cluster ID, a configuration where nodes sharing the same cluster ID tend to be closely positioned to one another; (D) In this network, the terms are color-coded based on their p-value, with terms containing more associated genes having more significant p-value.

https://doi.org/10.1371/journal.pone.0305582.g002

Gene Ontology (GO) and pathway enrichment analyses of DEGs

Metascape was employed to conduct Gene Ontology (GO) function and pathway enrichment analyses on the identified DEGs. The results showed a notable enrichment of DEGs associated with various processes, including response to biotic stimulus, interferon-gamma signaling, Type-II interferon signaling, immune response-regulating signaling pathway, and complement and coagulation cascades (Fig 2B). Using Metascape, a subset of representative terms from the gene function analysis was transformed into a network arrangement (Fig 2C). Additionally, a tree-based hierarchical clustering method, based on Kappa-statistical similarities, was employed to group the important terms derived from the gene function analysis. Each term was represented by a circle node, and the size of the node depended on how many input genes fell under that term. Terms with a kappa score > 0.3 were linked by an edge. In the network of enriched terms colored by cluster ID, nodes with the same cluster ID are situated close to one another (Fig 2C). Furthermore, the network was colored by p-value, highlighting that terms encompassing more genes tended to have more significant p-values (Fig 2D).
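To make the kappa-based linking of terms concrete, the following sketch computes Cohen's kappa between two enriched terms from their gene memberships and keeps the pairs above the 0.3 threshold mentioned above. It is a simplified illustration, not Metascape's code: the function names are hypothetical, and using the input gene list as the reference universe is an assumption.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def kappa(term_a: Set[str], term_b: Set[str], universe: Set[str]) -> float:
    """Cohen's kappa between two terms, treating each gene as one rating."""
    n = len(universe)
    a = term_a & universe
    b = term_b & universe
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected by chance from the marginal term sizes.
    expected = ((both + only_a) * (both + only_b)
                + (neither + only_b) * (neither + only_a)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: both terms cover all genes or none
    return (observed - expected) / (1.0 - expected)


def term_edges(terms: Dict[str, Set[str]],
               universe: Set[str],
               threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """Return (term, term, kappa) for every pair with kappa above the threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(terms), 2):
        score = kappa(terms[name_a], terms[name_b], universe)
        if score > threshold:
            edges.append((name_a, name_b, score))
    return edges
```

Unlike a plain overlap count, kappa corrects for the overlap expected by chance, so two very large terms that share many genes are not linked unless they share more than their sizes alone would predict.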

Protein-protein interaction network and Module analysis

The PPI (Protein-Protein Interaction) network analysis was conducted on the 62 common DEGs using STRING, followed by an in-depth examination using the MCODE app within the Cytoscape environment. The analysis yielded two modules: Module-1, consisting of 18 genes and representing a core set of functional genes, and Module-2, with 3 genes (S1 Table). Gene Ontology (GO) analysis of these 18 genes revealed their involvement in a range of functions and pathways, including interferon-gamma signaling and immune response to tuberculosis (S1 Fig). The STRING network analysis of the 18 genes within Module-1 showed that the differentially expressed genes are involved in known and predicted protein-protein interactions supported by a confidence score ≥ 0.4 (medium confidence). The predicted associations among all genes are shown in seven colors representing fusion, neighborhood, co-occurrence, experimental, text mining, database, and co-expression evidence (S1 Fig). The expression levels of these 18 genes in the GSE42826 dataset showed that they were significantly upregulated in tuberculosis patients, as represented in the heatmap (S1 Fig). The same pattern was observed in GSE42830, as indicated by heatmap analysis (S1 Fig). Moreover, the analysis of GO, pathways, and protein domains showed that the GO functions were notably associated with biological processes such as defense response and innate immune response, as well as molecular functions like identical protein binding. Pathway analysis, including WikiPathways, KEGG, and Reactome pathways, indicated that these 18 genes were involved in processes like Type-II interferon signaling, immune response to tuberculosis, the NOD-like receptor signaling pathway, and interferon-gamma signaling. PFAM and INTERPRO protein domain analysis identified the N-terminal and C-terminal domains of guanylate-binding protein as significant features (Table 1).

Table 1. Functional analysis of 18 genes in Module-1 using STRING database.

https://doi.org/10.1371/journal.pone.0305582.t001

Statistically important gene analysis

Eight prominent genes, namely EIF2AK2, GBP5, GBP2, IFIT2, IFITM3, EPSTI1, BATF2, and TAP1, were selected as crucial genes from both datasets. They were chosen based on heatmap analysis employing T-test/ANOVA in MetaboAnalyst, together with an MCODE score ≥10 (Fig 3A and 3B, S2 Table). The volcano plots showed that these eight genes were upregulated in both discovery datasets (Fig 3C and 3D). Notably, according to annotations retrieved from the UniProt database, these genes are associated with interferon-gamma signaling pathways and the innate immune response (S2 Table).

Fig 3. Selection of 8 important common deregulated genes (DEGs) in both datasets.

Heatmaps showing 8 important genes selected based on T-test/ANOVA and MCODE score ≥10 (A) GSE42826 dataset; (B) GSE42830 dataset. In addition, the volcano plot depicts the differential gene expression in both datasets, employing a cutoff criterion of log2 (fold change) ≥ 1 and p-value < 0.01 (C) Volcano plot for the GSE42826 dataset; (D) Volcano plot for the GSE42830 dataset.

https://doi.org/10.1371/journal.pone.0305582.g003

Validation and detection of gene expression during TB treatment

The validation of gene expression in the GSE34608 dataset revealed significant differences in the expression level of all 8 genes between the control and TB patient groups, with the exception of TAP1 (Fig 4A). To explore how the expression level changes during the TB treatment regimen, we examined another validation dataset, GSE31348, which included a total of 135 samples from 27 TB patients at five different time points: diagnosis, and treatment for 1, 2, 4, and 26 weeks. Heatmap analysis illustrated that the expression levels of most of the genes were down-regulated during TB treatment (S2 Fig). Among these 8 genes, GBP5, IFITM3, and EPSTI1 showed a significant decrease in expression level during TB treatment (Fig 4B). This suggests that the three-gene panel could potentially serve as a valuable biomarker for TB diagnosis and for monitoring treatment response.

Fig 4. Expression validation of 8 important genes in GSE34608 and GSE31348 datasets.

The boxplots show the expression level of GBP5, GBP2, EIF2AK2, IFITM3, IFIT2, EPSTI1, BATF2 and TAP1 genes during TB infection. (A) The boxplots depict the GSE34608 dataset; (B) The boxplots depict the GSE31348 dataset during TB treatment at five distinct time points. In each boxplot, the central horizontal line represents the median, while the ends of the box represent the first quartile [Q1] and third quartile [Q3], defining the interquartile range (IQR). The ends of the central vertical line denote the minimum and maximum values. C = Control, TB = Tuberculosis, *p-value < 0.05, **p-value < 0.01, ***p-value < 0.001, ****p-value < 0.0001, ns = non-significant results.

https://doi.org/10.1371/journal.pone.0305582.g004

Diagnostic performance of significant genes

From the pool of 8 important genes, we assessed the diagnostic performance of three specific genes, GBP5, IFITM3, and EPSTI1, using ROC curve analysis and the AUC. These genes showed substantial differences between the TB and control groups in the GSE34608 validation dataset, which corroborates the findings from the discovery datasets. To ascertain the sensitivity and specificity of these three genes, we assessed the TBscore in an independent gene expression validation dataset, GSE34608. The results revealed that GBP5 had a higher AUC (0.986) than the combination TBscore, which had an AUC of 0.958. In contrast, IFITM3 (AUC 0.798) and EPSTI1 (AUC 0.902) showed lower AUC values than the combination. It is also worth noting that the ROC curve of each gene differed significantly from that of the TBscore combination, with a p-value < 0.05, according to the method of Venkatraman [18]. The combined TBscore had an AUC of 0.958 (95% CI 0.89–1), with a sensitivity of 100% and a specificity of 89%. GBP5 alone also showed a sensitivity of 100% and a specificity of 89%, with a higher AUC of 0.986 compared to the TBscore. Therefore, GBP5 has the potential to effectively distinguish TB patients from control individuals (Fig 5).

Fig 5. Performance of 3 genes by using ROC curve and the difference between the ROC curve of each gene and TBscore.

(*based on Youden’s index).

https://doi.org/10.1371/journal.pone.0305582.g005
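The sensitivity and specificity values reported with Fig 5 are based on Youden's index, which picks the cutoff that maximizes sensitivity + specificity − 1. The sketch below shows one way such a cutoff could be computed; the function name, the rule that a sample is called positive when its score is at or above the cutoff, and the tie-breaking toward the lowest cutoff are assumptions for illustration, not details taken from the study.

```python
from typing import Sequence, Tuple


def youden_cutoff(case_scores: Sequence[float],
                  control_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A sample is classified as positive when its score is >= cutoff.
    If several cutoffs tie on J, the lowest one is kept.
    """
    best = None
    # Only observed score values can change the classification.
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    _, cutoff, sensitivity, specificity = best
    return cutoff, sensitivity, specificity
```

Because the cutoff is chosen on the same samples it is evaluated on, the resulting sensitivity and specificity are optimistic estimates; with a small validation set such as 8 TB and 18 control samples, a single misclassified sample shifts these percentages noticeably.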

Discussion

The 2022 Global Tuberculosis Report by the World Health Organization (WHO) indicates that 1.6 million individuals died from tuberculosis worldwide in 2021 [1]. The global challenge of tuberculosis is exacerbated by the expense and availability of effective diagnostic techniques and treatments. To address this issue, studying the immune system’s defense mechanisms combating Mtb can provide insights into the development of innovative diagnostic and therapeutic approaches. Recent studies have shed light on the role of the innate immune response system, which serves as the first defense and possesses the ability to recognize foreign pathogenic bacterial antigens [19].

In the current study, we analyzed two GEO datasets, GSE42826 and GSE42830, to pinpoint key genes that undergo deregulation during tuberculosis infection. Both datasets demonstrated a robust separation between TB-affected and control groups in the heatmap and PCA plot (Fig 1). In total, we identified 62 differentially expressed genes (DEGs) through a combined analysis of these two GEO datasets (Fig 2). By performing a protein-protein interaction network analysis using the STRING database and MCODE analysis, we identified two modules: Module-1 encompasses 18 genes (PLSCR1, STAT1, TRIM22, SAMD9L, BATF2, GBP5, GBP1, IFIT3, PARP9, IFI35, TAP1, GBP2, IFIT2, EPSTI1, OASL, TNFSF10, IFITM3, and EIF2AK2), while Module-2 consists of 3 genes: FCGR1A, FCGR1B, and SERPING1 (S1 Table). To delve deeper into the functionality of the 18 genes within Module-1, we conducted a comprehensive functional and pathway analysis using the STRING database. This analysis revealed their involvement in critical processes such as the innate immune response to tuberculosis, interferon-gamma signaling, and the NOD-like receptor signaling pathway (Fig 2 and Table 1). Subsequent data analysis, employing STRING, T-test/ANOVA in heatmap analysis, and an MCODE score cutoff of ≥10, enabled us to pinpoint 8 important differentially expressed genes: GBP5, GBP2, EIF2AK2, IFITM3, IFIT2, EPSTI1, BATF2, and TAP1 (Fig 3 and S2 Table). These genes play essential roles in biological functions and pathways associated with interferon-gamma signaling and immune response to pathogens (S2 Table). The deregulation of these important genes correlates with TB infection, as well as with the impact of anti-TB therapy at various treatment stages (Fig 4).

The three candidate genes, namely GBP5, IFITM3, and EPSTI1, consistently showed significant deregulation in both the GSE34608 and GSE31348 datasets, warranting their evaluation for diagnostic efficiency through ROC curve analysis. These genes showed a gradual reduction in expression levels with effective treatment, making them potential biomarkers for monitoring treatment effectiveness. The ROC analysis was conducted using the GSE34608 dataset, revealing that the combination of the three genes, referred to as “TBscore”, achieved a sensitivity and specificity of 100% and 89%, respectively, with an AUC of 0.958. However, when considered individually, IFITM3 and EPSTI1 showed lower AUCs of 0.798 (sensitivity 87.5%, specificity 72.2%) and 0.902 (sensitivity 100%, specificity 72.2%), respectively. In contrast, GBP5 stood out with the highest AUC of 0.986 when evaluated on its own, along with a sensitivity and specificity of 100% and 88.9%, respectively. In this dataset, the AUC of GBP5 alone exceeds that of the combined three-gene set. Consequently, GBP5 emerges as a promising biomarker for clinical tuberculosis diagnosis and the monitoring of treatment response (Fig 5).

Our investigation unveiled the perturbation of crucial gene sets in response to Mtb infection. The gene expression profile in individuals with tuberculosis (TB) predominantly revolves around the activation of defense responses to bacterial and viral infections [20]. Numerous comparative studies have been undertaken to explore various biomarkers or biomarker combinations for the diagnosis of tuberculosis [21–27]. The Type-II interferon signaling pathway is well explored and is crucial for both innate and adaptive immunity against viral, bacterial, and protozoan infections. Within this pathway, IFN-gamma (IFN-γ) is primarily secreted by various immune cells such as natural killer (NK) cells, macrophages, and T-cells, typically in response to IL-12 and IL-18 [28]. Guanylate binding proteins (GBPs), including GBP5, function as hydrolases and are induced by IFN. They play a vital role in regulating host innate immune responses, notably by eliciting host apoptosis processes during pathogenic infection [29, 30]. Previous studies have reported elevated whole blood transcriptional levels of GBP5 in active TB [17, 27, 31]. The upregulation of the GBP5 protein in whole blood samples of active TB was initially reported by Yao et al. [32]. It is worth noting that one of the GBPs, GBP1, has a protective role during TB infection [33], and its expression levels showed a correlation with those of GBP5, BATF2, and EPSTI1, aligning with our findings [34].

IFN-γ triggers the JAK-STAT pathway, leading to the activation of IRF1 and AIM2. IRF1 promotes the expression of IFNs and GBPs, such as GBP5, which permeabilize the Mtb membrane, releasing Mtb DNA and other components. AIM2 detects intracellular DNA, facilitating the release of pro-inflammatory cytokines IL-18 and IL-1β, which have been associated with protection against pulmonary tuberculosis (PTB) [35]. Mtb infection also triggers the NFκB signaling pathway, regulating the expression of IL-10 and other GBPs, and can activate ASC-1, leading to plasma membrane lysis for cytokine release and mediating cellular pyroptosis [36]. Our study revealed a strong correlation between the upregulation of GBP5 and the high expression levels of innate immune response proteins like GBP2, EIF2AK2, IFITM3, and IFIT2. This finding further underscores the potential role of GBP5 in innate immunity processes, particularly in the activation of the AIM2 inflammasome during tuberculosis infection.

The interferon (IFN)-induced transmembrane (IFITM) gene family encodes three transmembrane proteins, specifically IFITM1, IFITM2, and IFITM3. These genes are known to be stimulated by various proinflammatory cytokines, including IL-6, IL-1β, IFNβ, and TNF, especially within the context of TLR2/4 signaling pathways and in response to Mtb infection. These genes have been demonstrated to limit the intracellular growth of Mtb [37]. In Mtb-infected monocytes, IFITM3 is observed to co-localize with Mtb within the maturing phagosome, contributing to the enhancement of endosomal acidification. The overexpression of IFITM3 has a significant inhibitory effect on Mtb growth [38]. Several biomarkers identified and validated in this study, namely GBP1, GBP2, GBP5, STAT1, IFIT3, and IFITM3, have also been reported as key components of TB diagnostic panels in other research [10, 25, 39]. In addition, one study emphasized the significance of a 4-gene transcriptional signature, which includes GBP1, ID3, P2RY14, and IFITM3 [40]. Another study showed that a distinct 3-gene transcriptional signature, consisting of GBP5, DUSP3, and KLF2 [17], can serve as the basis for TB diagnostic tests.

EPSTI1 (epithelial stromal interaction 1) is an interferon (IFN)-inducible gene [41] that was identified as a stromal fibroblast-induced gene in breast cancer and is highly upregulated in invasive breast carcinomas compared with normal breast tissue [42]. EPSTI1 is involved in the regulation of apoptotic pathways via physical interactions with the apoptotic initiator Caspase-8 and also with AKT1 and BCAR3 [43]. In the current study, the EPSTI1 gene was found to be upregulated, which suggests that increased levels of EPSTI1 may increase Caspase-8 levels and potentially modulate apoptotic pathways through alternative mechanisms, such as by interacting with the AKT1 or BCAR3 proteins.

Another important protein identified in our study, BATF2, is a member of the activator protein-1 (AP-1) transcription factor family. It is induced by interferon (IFN) in mononuclear phagocytic cells and has been demonstrated to be upregulated in response to innate immune stimulation triggered by factors such as lipopolysaccharide (LPS) or Mtb. BATF2 can bind IRF1 (IFN regulatory factor 1), promoting the activation of downstream elements, some of which are also part of the host immune response against Mtb [44, 45]. Given that systemic IFN activity is well established in TB [46], the high expression of BATF2 is likely a result of IFN responses rather than direct Mtb stimulation of circulating blood cells. One study also reported that BATF2 gene expression served as a unique blood transcript capable of accurately distinguishing individuals with active TB from those with latent tuberculosis infection (LTBI) and healthy individuals [47]. Another gene upregulated in TB patient samples is TAP1, which plays a role in peptide transport for antigen presentation. It forms a complex with MHC class-I molecules [48], influencing the activation of cytotoxic T-cells. Zak et al. reported GBP1 as a signature of disease risk, wherein GBP1, STAT1, and TAP1 were found to have a protective role during TB infection and were associated with favorable clinical outcomes [33].

Conclusions

In summary, our comprehensive analysis of a gene network highlights perturbed gene biosignatures triggered by Mtb infection and their changes during TB treatment. GBP5, IFITM3, and EPSTI1 showed significant perturbations in response to TB infection, making them potential targets for molecular drug interventions in TB infection. GBP5 demonstrates substantial diagnostic utility and holds promise as a prime biomarker candidate for the development of transcriptome-based TB prognostic or diagnostic assays.

Supporting information

S1 Fig. Functional and expression analysis of 18 genes of Module-1.

(A) Gene Ontology (GO) analysis uncovers the genes functionally related to interferon-gamma signaling during TB infection; (B) The network of these Module-1 genes was constructed using the STRING database, providing insights into their interactions and relationships; (C) and (D) Heatmaps showing the expression of the 18 genes within Module-1, derived from datasets GSE42826 and GSE42830, respectively. Each row in the heatmaps corresponds to an individual gene.

https://doi.org/10.1371/journal.pone.0305582.s001

(PDF)

S2 Fig. Heatmap representing the expression validation of 8 important genes in the GSE31348 dataset during TB treatment at 5 different time points.

https://doi.org/10.1371/journal.pone.0305582.s002

(PDF)

S1 Table. Two Modules were identified by the MCODE app in Cytoscape.

https://doi.org/10.1371/journal.pone.0305582.s003

(PDF)

S2 Table. The functions of the common top 8 proteins selected based on t-test/ANOVA and an MCODE score greater than or equal to 10.

https://doi.org/10.1371/journal.pone.0305582.s004

(PDF)

References

  1. Bagcchi S. WHO’s Global Tuberculosis Report 2022. Lancet Microbe. 2023;4: e20. pmid:36521512
  2. Scordo JM, Aguillón-Durán GP, Ayala D, Quirino-Cerrillo AP, Rodríguez-Reyna E, Joya-Ayala M, et al. Interferon gamma release assays for detection of latent Mycobacterium tuberculosis in older Hispanic people. International Journal of Infectious Diseases. 2021;111: 85–91. pmid:34389503
  3. Jenkins HE, Tolman AW, Yuen CM, Parr JB, Keshavjee S, Pérez-Vélez CM, et al. Incidence of multidrug-resistant tuberculosis disease in children: systematic review and global estimates. The Lancet. 2014;383: 1572–1579. pmid:24671080
  4. Singer SN, Ndumnego OC, Kim RS, Ndung’u T, Anastos K, French A, et al. Plasma host protein biomarkers correlating with increasing Mycobacterium tuberculosis infection activity prior to tuberculosis diagnosis in people living with HIV. EBioMedicine. 2022;75: 103787. pmid:34968761
  5. Herrera V, Perry S, Parsonnet J, Banaei N. Clinical Application and Limitations of Interferon-γ Release Assays for the Diagnosis of Latent Tuberculosis Infection. Clinical Infectious Diseases. 2011;52: 1031–1037. pmid:21460320
  6. Mousavian Z, Källenius G, Sundling C. From simple to complex: Protein‐based biomarker discovery in tuberculosis. Eur J Immunol. 2023. pmid:37740950
  7. Kruh-Garcia NA, Wolfe LM, Chaisson LH, Worodria WO, Nahid P, Schorey JS, et al. Detection of Mycobacterium tuberculosis Peptides in the Exosomes of Patients with Active and Latent M. tuberculosis Infection Using MRM-MS. PLoS One. 2014;9: e103811. pmid:25080351
  8. Yong YK, Tan HY, Saeidi A, Wong WF, Vignesh R, Velu V, et al. Immune Biomarkers for Diagnosis and Treatment Monitoring of Tuberculosis: Current Developments and Future Prospects. Front Microbiol. 2019;10. pmid:31921004
  9. Qin X-B, Zhang W-J, Zou L, Huang P-J, Sun B-J. Identification potential biomarkers in pulmonary tuberculosis and latent infection based on bioinformatics analysis. BMC Infect Dis. 2016;16: 500. pmid:27655333
  10. Bloom CI, Graham CM, Berry MPR, Rozakeas F, Redford PS, Wang Y, et al. Transcriptional Blood Signatures Distinguish Pulmonary Tuberculosis, Pulmonary Sarcoidosis, Pneumonias and Lung Cancers. PLoS One. 2013;8: e70630. pmid:23940611
  11. Maertzdorf J, Weiner J, Mollenkopf H-J, Network Tb, Bauer T, Prasse A, et al. Common patterns and disease-related signatures in tuberculosis and sarcoidosis. Proceedings of the National Academy of Sciences. 2012;109: 7853–7858. pmid:22547807
  12. Cliff JM, Lee J-S, Constantinou N, Cho J-E, Clark TG, Ronacher K, et al. Distinct Phases of Blood Gene Expression Pattern Through Tuberculosis Treatment Reflect Modulation of the Humoral Immune Response. J Infect Dis. 2013;207: 18–29. pmid:22872737
  13. Pang Z, Chong J, Zhou G, de Lima Morais DA, Chang L, Barrette M, et al. MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights. Nucleic Acids Res. 2021;49: W388–W396. pmid:34019663
  14. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2012;41: D991–D995. pmid:23193258
  15. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10: 1523. pmid:30944313
  16. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45: D362–D368. pmid:27924014
  17. Sweeney TE, Braviak L, Tato CM, Khatri P. Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis. Lancet Respir Med. 2016;4: 213–224. pmid:26907218
  18. Venkatraman ES. A Permutation Test to Compare Receiver Operating Characteristic Curves. Biometrics. 2000;56: 1134–1138. pmid:11129471
  19. Takeuchi O, Akira S. Pattern Recognition Receptors and Inflammation. Cell. 2010;140: 805–820. pmid:20303872
  20. Shi C, Pamer EG. Monocyte recruitment during infection and inflammation. Nat Rev Immunol. 2011;11: 762–774. pmid:21984070
  21. MacLean E, Broger T, Yerlikaya S, Fernandez-Carballo BL, Pai M, Denkinger CM. A systematic review of biomarkers to detect active tuberculosis. Nat Microbiol. 2019;4: 748–758. pmid:30804546
  22. Gupta RK, Turner CT, Venturini C, Esmail H, Rangaka MX, Copas A, et al. Concise whole blood transcriptional signatures for incipient tuberculosis: a systematic review and patient-level pooled meta-analysis. Lancet Respir Med. 2020;8: 395–406. pmid:31958400
  23. Warsinske H, Vashisht R, Khatri P. Host-response-based gene signatures for tuberculosis diagnosis: A systematic comparison of 16 signatures. PLoS Med. 2019;16: e1002786. pmid:31013272
  24. Singhania A, Wilkinson RJ, Rodrigue M, Haldar P, O’Garra A. The value of transcriptomics in advancing knowledge of the immune response and diagnosis in tuberculosis. Nat Immunol. 2018;19: 1159–1168. pmid:30333612
  25. Leong S, Zhao Y, Joseph NM, Hochberg NS, Sarkar S, Pleskunas J, et al. Existing blood transcriptional classifiers accurately discriminate active tuberculosis from latent infection in individuals from south India. Tuberculosis. 2018;109: 41–51. pmid:29559120
  26. Leong S, Zhao Y, Ribeiro-Rodrigues R, Jones-López EC, Acuña-Villaorduña C, Rodrigues PM, et al. Cross-validation of existing signatures and derivation of a novel 29-gene transcriptomic signature predictive of progression to TB in a Brazilian cohort of household contacts of pulmonary TB. Tuberculosis. 2020;120: 101898. pmid:32090859
  27. Turner CT, Gupta RK, Tsaliki E, Roe JK, Mondal P, Nyawo GR, et al. Blood transcriptional biomarkers for active pulmonary tuberculosis in a high-burden setting: a prospective, observational, diagnostic accuracy study. Lancet Respir Med. 2020;8: 407–419. pmid:32178775
  28. Liu S-Y, Sanchez DJ, Aliyari R, Lu S, Cheng G. Systematic identification of type I and type II interferon-induced antiviral factors. Proceedings of the National Academy of Sciences. 2012;109: 4239–4244. pmid:22371602
  29. Fisch D, Clough B, Domart M-C, Encheva V, Bando H, Snijders AP, et al. Human GBP1 Differentially Targets Salmonella and Toxoplasma to License Recognition of Microbial Ligands and Caspase-Mediated Death. Cell Rep. 2020;32: 108008. pmid:32783936
  30. Santos JC, Broz P. Sensing of invading pathogens by GBPs: At the crossroads between cell-autonomous and innate immunity. J Leukoc Biol. 2018;104: 729–735. pmid:30020539
  31. Francisco NM, Fang Y-M, Ding L, Feng S, Yang Y, Wu M, et al. Diagnostic accuracy of a selected signature gene set that discriminates active pulmonary tuberculosis and other pulmonary diseases. Journal of Infection. 2017;75: 499–510. pmid:28941629
  32. Yao X, Liu W, Li X, Deng C, Li T, Zhong Z, et al. Whole blood GBP5 protein levels in patients with and without active tuberculosis. BMC Infect Dis. 2022;22: 328. pmid:35369870
  33. Zak DE, Penn-Nicholson A, Scriba TJ, Thompson E, Suliman S, Amon LM, et al. A blood RNA signature for tuberculosis disease risk: a prospective cohort study. The Lancet. 2016;387: 2312–2322. pmid:27017310
  34. Shi T, Huang L, Zhou Y, Tian J. Role of GBP1 in innate immunity and potential as a tuberculosis biomarker. Sci Rep. 2022;12: 11097. pmid:35773466
  35. Figueira MB de A, de Lima DS, Boechat AL, Filho MG do N, Antunes IA, Matsuda J da S, et al. Single-Nucleotide Variants in the AIM2 –Absent in Melanoma 2 Gene (rs1103577) Associated With Protection for Tuberculosis. Front Immunol. 2021;12. pmid:33868225
  36. Nisa A, Kipper FC, Panigrahy D, Tiwari S, Kupz A, Subbian S. Different modalities of host cell death and their impact on Mycobacterium tuberculosis infection. American Journal of Physiology-Cell Physiology. 2022;323: C1444–C1474. pmid:36189975
  37. Drennan MB, Nicolle D, Quesniaux VJF, Jacobs M, Allie N, Mpagi J, et al. Toll-Like Receptor 2-Deficient Mice Succumb to Mycobacterium tuberculosis Infection. Am J Pathol. 2004;164: 49–57. pmid:14695318
  38. Ranjbar S, Haridas V, Jasenosky LD, Falvo JV, Goldfeld AE. A Role for IFITM Proteins in Restriction of Mycobacterium tuberculosis Infection. Cell Rep. 2015;13: 874–883. pmid:26565900
  39. Penn-Nicholson A, Mbandi SK, Thompson E, Mendelsohn SC, Suliman S, Chegou NN, et al. RISK6, a 6-gene transcriptomic signature of TB disease risk, diagnosis and treatment response. Sci Rep. 2020;10: 8629. pmid:32451443
  40. Maertzdorf J, McEwen G, Weiner J, Tian S, Lader E, Schriek U, et al. Concise gene signature for point‐of‐care classification of tuberculosis. EMBO Mol Med. 2016;8: 86–95. pmid:26682570
  41. Buess M, Nuyten DS, Hastie T, Nielsen T, Pesich R, Brown PO. Characterization of heterotypic interaction effects in vitro to deconvolute global gene expression profiles in cancer. Genome Biol. 2007;8: R191. pmid:17868458
  42. Nielsen HL, Rønnov-Jessen L, Villadsen R, Petersen OW. Identification of EPSTI1, a Novel Gene Induced by Epithelial–Stromal Interaction in Human Breast Cancer. Genomics. 2002;79: 703–710. pmid:11991720
  43. Capdevila-Busquets E, Badiola N, Arroyo R, Alcalde V, Soler-López M, Aloy P. Breast Cancer Genes PSMC3IP and EPSTI1 Play a Role in Apoptosis Regulation. PLoS One. 2015;10: e0115352. pmid:25590583
  44. Murphy TL, Tussiwand R, Murphy KM. Specificity through cooperation: BATF–IRF interactions control immune-regulatory networks. Nat Rev Immunol. 2013;13: 499–509. pmid:23787991
  45. Roy S, Guler R, Parihar SP, Schmeier S, Kaczkowski B, Nishimura H, et al. Batf2/Irf1 Induces Inflammatory Responses in Classically Activated Macrophages, Lipopolysaccharides, and Mycobacterial Infection. The Journal of Immunology. 2015;194: 6035–6044. pmid:25957166
  46. Berry MPR, Graham CM, McNab FW, Xu Z, Bloch SAA, Oni T, et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature. 2010;466: 973–977. pmid:20725040
  47. Roe JK, Thomas N, Gil E, Best K, Tsaliki E, Morris-Jones S, et al. Blood transcriptomic diagnosis of pulmonary and extrapulmonary tuberculosis. JCI Insight. 2016;1. pmid:27734027
  48. Eggensperger S, Tampé R. The transporter associated with antigen processing: a key player in adaptive immunity. Biol Chem. 2015;396: 1059–1072. pmid:25781678