Fig 1.
Clinical parameters of patients included in microarray dataset.
(A) Percentage of cohort by sex, (B) percentage of cohort by ethnicity (ISC, Indian sub-continent) and (C) age (mean; bars show min-max). Whole blood composition: (D) total leukocytes, (E) granulocytes, (F) lymphocytes and (G) monocytes. Statistical tests: Kruskal-Wallis with Dunn’s multiple testing correction.
Fig 2.
Testing the meta-signature and the most consistently identified genes in a new dataset.
(A) The 380-gene meta-signature was mapped to 687 Illumina transcripts, of which 113 transcripts were excluded as they were not significantly detected from background in 10% of samples. Hierarchical clustering (Pearson uncentred (cosine) with averaged linkage) on individuals and transcripts broadly separates healthy controls from pulmonary and extra-pulmonary TB patients. (B) Receiver operating characteristic curves for the extra-pulmonary and pulmonary cohorts against healthy controls, using MDTH derived from the most consistently identified genes as a potential diagnostic biomarker (15 genes identified in at least 15 of the meta-analysis datasets [18], representing the most robustly identified genes in that analysis; healthy controls used as the control group for MDTH). Both prediction results were validated using k-fold cross-validation with k equal to 10 and 1,000 iterations. The mean AUCs for the EPTB and PTB validation results are 0.865 (95% confidence interval: 0.857–0.872) and 0.977 (95% confidence interval: 0.974–0.981), respectively.
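As an illustration of the AUC reported in panel B, the area under a receiver operating characteristic curve for a single continuous score (such as MDTH) equals the probability that a randomly chosen patient scores higher than a randomly chosen healthy control, with ties counted as one half. The sketch below computes that quantity directly. The function name and the example values are hypothetical, and the sketch omits the repeated 10-fold cross-validation described in the caption; it is not the authors' analysis code.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC for one continuous score: the fraction of (case, control) pairs
    in which the case scores higher, with ties counted as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores, for illustration only (not data from the study).
example_cases = [3.0, 4.0, 5.0]
example_controls = [1.0, 2.0, 3.0]
example_auc = auc_from_scores(example_cases, example_controls)
```

An AUC of 1.0 means every patient scores above every control; 0.5 means the score does not separate the groups.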
Fig 3.
Molecular distance to health is linked to symptom status and site of disease.
(A) Molecular distance to health (MDTH) calculated for each individual (from 3409 transcripts which represent the transcripts of the 38 annotated modules shown in Fig 2A; healthy controls used as the control group). Individuals are grouped by disease status (median value, error bars SD); statistical testing: Kruskal-Wallis with Dunn’s multiple testing correction. (B) Patients grouped by number of reported symptoms (night sweats, fever, weight loss, chest pain or cough; median value, error bars SD); statistical testing: Kruskal-Wallis with Dunn’s multiple testing correction. (C) Patients grouped by site of disease and number of reported symptoms (as previous). (D) Mean MDTH (error bars, standard error of the mean) plotted for each site of disease against the percentage of the cohort reporting one or more symptoms (from the list of symptoms described previously). Blue line represents Pearson’s correlation (R² = 0.95, p = 0.0090).
Fig 4.
Symptom status linked to alteration in modular profile and blood counts.
(A) Individual modular analysis for all TB patients grouped by symptom status and site of disease; boxed modules are those annotated as interferon or inflammation. The 38 annotated modules are displayed as a heatmap, with red indicating significant over-abundance of transcripts and blue indicating significant under-abundance (p < 0.05). The colour intensity represents the percentage of genes in that module that are significantly differentially expressed. (B) MDTH calculated for all interferon module transcripts and all inflammatory module transcripts (healthy controls used as the control group). TB patients are grouped according to symptom status (median value, error bars SD); statistical testing: Kruskal-Wallis with Dunn’s multiple testing correction. (C) TB patients grouped according to symptom status, with lymphocyte, granulocyte and monocyte counts shown (median value, error bars SD); statistical testing: Mann-Whitney U.
Fig 5.
Tuberculosis and sarcoidosis have similar differentially regulated genes.
(A) Differentially expressed genes (DEGs) identified from the new dataset for the TB and sarcoidosis groups compared to healthy controls. Transcripts that were not significantly detected from background in at least 10% of samples were filtered out, as were low-expressed transcripts (less than 2-fold change from the median in 10% of samples), followed by statistical testing between the groups of interest (independent t-test with Benjamini-Hochberg multiple testing correction, q-value < 0.05). Transcripts were matched to Entrez gene identifiers, and duplicates and non-matched transcripts were filtered out. The Venn diagram shows the overlap of DEGs between these two diseases and the most significant 15-gene list from the meta-analysis. (B) Heatmap of normalised expression showing that the Bloom et al. 144-transcript list broadly differentiates pulmonary TB patients from sarcoidosis patients. Clustering (Pearson’s uncentred (cosine) with averaged linkage) on transcripts (rows) and individual patient blood samples (columns). (C) The same 144-transcript list [16] is unable to differentiate between mediastinal TB patients and sarcoidosis patients by clustering (as before). (D) Molecular scores calculated for transcripts from the 380-gene meta-signature, MDTH and transcripts representing the interferon modules (healthy controls acting as the control group). Z scores were calculated (with healthy controls used as the reference group for calculating mean and SD), and TB patients and sarcoidosis patients were then ranked according to the Z score of the 380-gene meta-signature. For all three outcomes, the mean of the pulmonary TB patients was significantly higher than that of both the extra-pulmonary TB and sarcoidosis patients (p < 0.05). Differences among means were tested with a generalized linear model assuming a normal distribution and a Bonferroni multiple testing correction. No symptoms indicates absence of any of the five symptoms listed previously.
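The Z-score step in panel D can be sketched as follows: each patient's molecular score is expressed in units of the healthy-control standard deviation from the healthy-control mean, and patients are then ranked by that value. This is a minimal sketch under stated assumptions. The caption does not say whether the sample or population SD was used, nor the direction of ranking, so the sample SD and a highest-first ranking are assumed here; the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def z_scores_vs_controls(control_scores, patient_scores):
    """Z score of each patient score, using the healthy-control mean and SD
    as the reference. Sample SD (n - 1) is an assumption."""
    reference_mean = mean(control_scores)
    reference_sd = stdev(control_scores)
    if reference_sd == 0:
        raise ValueError("control scores have zero variance")
    return [(score - reference_mean) / reference_sd for score in patient_scores]


def rank_by_z(patient_ids, z_scores):
    """Patient identifiers ordered from highest to lowest Z score
    (ranking direction is an assumption)."""
    paired = sorted(zip(patient_ids, z_scores), key=lambda pair: pair[1], reverse=True)
    return [patient_id for patient_id, _ in paired]


# Hypothetical values, for illustration only (not data from the study).
example_controls = [1.0, 2.0, 3.0]   # mean 2.0, sample SD 1.0
example_patients = [4.0, 2.0, 3.0]
example_z = z_scores_vs_controls(example_controls, example_patients)
example_ranking = rank_by_z(["P1", "P2", "P3"], example_z)
```

Using the healthy controls as the reference group means a Z score of 0 corresponds to the average healthy individual, so scores from different transcript sets can be compared on a common scale.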
Table 1.
Top 10 differentially upregulated genes in TB and sarcoidosis^a.