Fig 1.
(a) Dataset split and usage. The number in each cell is the number of subjects. The training set is split equally into 5 folds for deep learning model optimization (cross-validation for hyperparameter tuning and architecture search). The validation set is used to select the optimal model, and the testing set is held out for performance evaluation. (b) Model overview. Our model consists of a Feature Selection Layer (FSL), an Isoform Map Layer (IML; used only when the input features are exons), and standard fully connected layers. The FSL associates each input feature with a non-negative learnable weight that represents the importance of that feature with respect to smoking status. The IML encodes exon-to-isoform relationships in a binary matrix R, where R_ij = 1 if exon i is contained in isoform j and R_ij = 0 otherwise. Multiplying R element-wise with the corresponding learnable weight matrix W restricts the layer to canonical exon-to-isoform relationships.
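To make panel (b) more concrete, here is a minimal NumPy sketch of a forward pass through an FSL, an IML and one fully connected output layer. It is not the authors' implementation and omits training entirely. Several details are assumptions not stated in the caption: non-negativity of the FSL weights is enforced with an absolute value (ReLU or softplus would be equally plausible), the hidden activation is ReLU, and the output is a single sigmoid unit for smoking status. The function names and the `params` dictionary keys are hypothetical.

```python
import numpy as np


def feature_selection_layer(x, fs_weight):
    """FSL sketch: scale each input feature by its own non-negative weight.

    Assumption: non-negativity is enforced with abs(); the caption does not say how.
    """
    return x * np.abs(fs_weight)  # (n_samples, n_exons) * (n_exons,)


def isoform_map_layer(x, weight, R):
    """IML sketch: dense exon -> isoform layer masked by the binary matrix R.

    R[i, j] = 1 if exon i is contained in isoform j, else 0, so weight * R
    zeroes every connection without a known exon-isoform relationship.
    """
    return x @ (weight * R)  # (n_samples, n_exons) @ (n_exons, n_isoforms)


def forward(x, params, R):
    """Hypothetical forward pass: FSL -> IML -> ReLU -> dense -> sigmoid."""
    h = feature_selection_layer(x, params["fs"])
    h = np.maximum(isoform_map_layer(h, params["iml"], R), 0.0)  # ReLU (assumed)
    logits = h @ params["dense_w"] + params["dense_b"]
    return 1.0 / (1.0 + np.exp(-logits))  # assumed probability of smoking


# Toy example: 3 exons, 2 isoforms; exons 0 and 1 map to isoform 0,
# exons 1 and 2 map to isoform 1.
R = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)
rng = np.random.default_rng(0)
params = {
    "fs": rng.normal(size=3),
    "iml": rng.normal(size=(3, 2)),
    "dense_w": rng.normal(size=(2, 1)),
    "dense_b": np.zeros(1),
}
x = rng.normal(size=(4, 3))  # 4 subjects, 3 exon-level features
print(forward(x, params, R).shape)  # (4, 1)
```

Because the mask is applied to W on every forward pass, gradients for masked-out entries of W are also zero, so those connections stay inactive during training regardless of how W is initialized.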
Table 1.
Characteristics of subjects.
Fig 2.
ROC curves in test data for the 4-gene modified Beineke model using gene (black), isoform (blue), and exon-level (red) quantifications.
Isoform- and exon-level data outperform gene-level data (DeLong test p = 0.002 and p < 0.001, respectively).
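The p-values in these captions come from DeLong's test, which compares two correlated ROC AUCs computed on the same test subjects. A minimal NumPy sketch of the standard two-sided DeLong comparison is shown below for readers who want to reproduce this kind of comparison. It is not necessarily the software the authors used (for example, R's pROC package provides an equivalent `roc.test`), and the function name `delong_test` is hypothetical. Ties between a positive and a negative score count as 0.5, as in the Mann-Whitney form of the AUC.

```python
import math

import numpy as np


def _structural_components(pos, neg):
    """Per-subject placement values used by DeLong's variance estimator."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)  # 1, 0.5 (tie) or 0
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 (positives), V01 (negatives)


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for AUC(a) != AUC(b) on the same subjects.

    Returns (auc_a, auc_b, p_value).
    """
    y = np.asarray(y_true).astype(bool)
    v10, v01 = [], []
    for s in (scores_a, scores_b):
        s = np.asarray(s, dtype=float)
        a, b = _structural_components(s[y], s[~y])
        v10.append(a)
        v01.append(b)
    v10 = np.vstack(v10)  # shape (2, n_positives)
    v01 = np.vstack(v01)  # shape (2, n_negatives)
    aucs = v10.mean(axis=1)
    # Covariance of the two AUC estimates (np.cov uses ddof=1 by default).
    cov = np.cov(v10) / v10.shape[1] + np.cov(v01) / v01.shape[1]
    var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var <= 0:
        raise ValueError("AUC difference has zero variance; test undefined")
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return float(aucs[0]), float(aucs[1]), p


# Synthetic example: a low-noise and a high-noise score on the same labels.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(100), np.ones(100)]
strong = y + rng.normal(scale=0.3, size=200)
weak = y + rng.normal(scale=2.0, size=200)
print(delong_test(y, strong, weak))
```

The test is paired: because both scores are evaluated on the same subjects, the covariance term `cov[0, 1]` accounts for their correlation, which an unpaired comparison of two AUC confidence intervals would ignore.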
Table 2.
Predictive performance of modified Beineke models using gene-, isoform-, and exon-level expression data.
Fig 3.
Cross-validation accuracy calculated during model optimization for exon-level data.
Table 3.
Predictive performance of various models using exon-level data, including elastic net for comparison.
Fig 4.
ROC curves in test data for the base deep learning exon model (black) and the model that adds the isoform map layer and feature selection layer (red), which performs significantly better (DeLong test p = 0.02).
Fig 5.
ROC curves in test data for serum cotinine (black) and the exon model with the (exon-to-)isoform map layer and feature selection layer (red), which performs significantly better (DeLong test p = 0.01).
Table 4.
Top 10 enriched GO pathways.