Fig 1.
PRISMA flow chart of the microarray meta-analysis.
Selection of eligible GEO datasets for systems biology analysis according to the PRISMA 2019 flow diagram.
Table 1.
Clinical and demographic characteristics of the study participants included in each dataset evaluated.
Fig 2.
Differentially expressed genes associated with cigarette smoking.
We analyzed publicly available data from 2 small-airway transcriptome (RNAseq) datasets. (A) A principal component analysis (PCA) model of 13,516 genes was used to distinguish smokers from nonsmokers. (B) Volcano plot of all genes (smokers vs. nonsmokers). (C) A total of 22 differentially expressed genes (DEGs), defined as p<0.05 after 1% FDR correction and a 1.0-fold change in expression, were found and together were able to discriminate the clinical conditions.
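The DEG definition in panel C combines a multiple-testing correction with a fold-change cutoff. The following is a minimal sketch of that kind of filter, not the authors' pipeline. It assumes the Benjamini-Hochberg procedure for FDR control (the caption does not name the method) and reads the fold-change criterion as an absolute log2 fold change of at least 1.0, which is also an assumption. The gene names and numbers are invented toy values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, pvalues, log2_fold_changes, alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes that pass both the adjusted p-value and fold-change cutoffs.

    alpha and min_abs_log2fc are illustrative defaults (assumptions).
    """
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, fc in zip(genes, adjusted, log2_fold_changes)
        if q < alpha and abs(fc) >= min_abs_log2fc
    ]


# Toy input: hypothetical genes, raw p-values, and log2 fold changes.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
pvalues = [0.001, 0.20, 0.004, 0.03]
log2_fold_changes = [2.3, 1.5, -1.2, 0.4]

degs = select_degs(genes, pvalues, log2_fold_changes)
print(degs)
```

In this toy input, gene_d survives the adjusted p-value cutoff but is dropped by the fold-change cutoff, which is the situation a volcano plot makes visible.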
Table 2.
Detailed information obtained from the ROC curve analysis used in the study.
Fig 3.
Gene pathway analysis in smokers and nonsmokers.
(A) Co-expressed modules of all genes. Circle sizes are proportional to the normalized enrichment scores (NES). (B) The modules were annotated using the KEGG package for R. Dashed lines represent the significance threshold. (C) Hierarchical cluster analysis (Ward’s method), using the NES values calculated for each annotated module and each person, was employed to test discrimination between smokers and nonsmokers.
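Ward's method, used in panel C, merges at each step the two clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below illustrates that criterion in plain Python. It is a naive implementation for small inputs, not the software used in the study. The function name `ward_clusters` and the per-person NES vectors are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's merge criterion.

    points: list of equal-length numeric vectors (one per person).
    Returns a list of clusters, each a list of point indices.
    """
    dim = len(points[0])
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    while len(clusters) > n_clusters:
        # Find the pair of clusters that is cheapest to merge.
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy NES vectors (two modules) for four hypothetical individuals.
nes_by_person = [
    [2.0, 1.8],
    [2.1, 1.9],
    [-1.5, -1.7],
    [-1.4, -1.6],
]

groups = ward_clusters(nes_by_person, n_clusters=2)
print(groups)
```

If the resulting groups line up with smoking status, the module scores discriminate the two conditions; with real data, a library routine such as a Ward linkage in a scientific computing package would be the practical choice.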
Fig 4.
Defining the molecular signatures of smoking.
(A) Data on the 22 DEGs found in our discovery analyses were used to validate discrimination between smokers and nonsmokers in 3 different previously published datasets. (B) Machine-learning decision trees were built for each dataset to describe the most relevant genes driving discrimination. Of note, AKR1B10 was found to be the main discriminator in 3 out of the 4 datasets examined. (C) Scatter plots of AKR1B10 gene expression in the 4 datasets. (D) Venn diagram of the DEGs in each dataset shows AKR1B10 at the intersection of the 3 datasets derived from lung tissue specimens; it was not among the DEGs from alveolar macrophages. *p<0.05 (Student’s t-test).
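In a decision tree, the gene chosen for the root split is the one that best separates the classes, which is how a tree can point to a "main discriminator" as in panel B. The sketch below finds that root split by minimizing weighted Gini impurity. It is a depth-1 illustration under stated assumptions, not the tree algorithm or settings used in the study. The gene names, expression values, and the function `best_root_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = nonsmoker, 1 = smoker)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_root_split(expression, labels):
    """Return (weighted_impurity, gene, threshold) for the best single split.

    expression: dict mapping gene name to per-sample expression values.
    labels: per-sample binary class labels, same order as the values.
    """
    n = len(labels)
    best = None
    for gene, values in expression.items():
        distinct = sorted(set(values))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(distinct, distinct[1:]):
            threshold = (low + high) / 2
            left = [y for v, y in zip(values, labels) if v <= threshold]
            right = [y for v, y in zip(values, labels) if v > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, gene, threshold)
    return best


# Toy data: six samples, two hypothetical genes.
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 3.4, 2.9],
    "gene_b": [2.0, 2.5, 1.0, 2.2, 1.1, 2.4],
}
labels = [0, 0, 0, 1, 1, 1]

impurity, gene, threshold = best_root_split(expression, labels)
print(gene, threshold, impurity)
```

Here gene_a separates the two groups perfectly while gene_b does not, so gene_a is selected for the root.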
Fig 5.
In nonsmokers, higher AKR1B10 expression is detected in lung cancer.
(A) We analyzed AKR1B10 gene expression values in a published microarray dataset of neoplastic lung tissue from nonsmoking individuals diagnosed with lung cancer and compared them with ipsilateral healthy lung tissue specimens (controls). Scatter plots show AKR1B10 gene expression in the groups. *p<0.05 (Student’s t-test). (B) Receiver operating characteristic (ROC) curve analysis indicated high accuracy in discriminating cancer tissue from controls.
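The accuracy reported from a ROC analysis is usually summarized by the area under the curve (AUC), which equals the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC directly from that definition. The expression values are invented for illustration and are not taken from the study.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case-control pairs ranked correctly.

    A pair counts 1 if the case value is higher, 0.5 if tied, 0 otherwise.
    """
    score = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(case_values) * len(control_values))


# Toy expression values for a single gene (hypothetical numbers).
tumor = [5.1, 6.0, 4.8]
healthy = [4.9, 3.2, 4.0]

auc = roc_auc(tumor, healthy)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.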