Improved prediction of smoking status via isoform-aware RNA-seq deep learning models

doi:10.1371/journal.pcbi.1009433

Fig 1

Visual abstract.

(a) Dataset split and usage. Each cell gives the number of subjects. The training set is split equally into 5 folds for cross-validation, which is used to tune the hyperparameters and search the architecture of the deep learning model. The validation set is used to select the optimal model, and the testing set is held out for performance evaluation. (b) Model overview. Our model consists of a Feature Selection Layer (FSL), an Isoform Map Layer (IML) (used only when the input features are exons), and standard fully connected layers. The FSL associates each input feature with a non-negative learnable weight, which represents the importance of that feature with respect to smoking status. The IML encodes exon-to-isoform relationships via a binary matrix R: R_ij = 1 if exon i is contained within isoform j, and R_ij = 0 otherwise. Multiplying R element-wise with the corresponding learnable weights W ensures that only canonical exon-to-isoform relationships are considered.

doi: https://doi.org/10.1371/journal.pcbi.1009433.g001
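
The two custom layers described in panel (b) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation. The function names, the toy values, and the use of abs() to keep the FSL weights non-negative are assumptions made for illustration, and the caption does not say whether the IML includes a bias or an activation, so neither is included here.

```python
def feature_selection(x, raw_weights):
    """FSL sketch: scale each input feature by a non-negative weight.

    Assumption: non-negativity is enforced with abs(); the caption only
    states that the weights are non-negative, not how this is achieved.
    """
    return [xi * abs(w) for xi, w in zip(x, raw_weights)]


def isoform_map(x, W, R):
    """IML sketch: project exon features onto isoforms.

    W is a learnable (exons x isoforms) weight matrix and R is the binary
    exon-to-isoform matrix. Multiplying W element-wise by R removes every
    connection where exon i is not part of isoform j.
    """
    n_exons, n_isoforms = len(R), len(R[0])
    return [
        sum(x[i] * W[i][j] * R[i][j] for i in range(n_exons))
        for j in range(n_isoforms)
    ]


# Toy example (hypothetical values): 4 exons, 2 isoforms.
# Isoform 0 contains exons 0 and 1; isoform 1 contains exons 1, 2 and 3.
R = [[1, 0],
     [1, 1],
     [0, 1],
     [0, 1]]
W = [[0.2, 0.9],
     [-0.4, 0.1],
     [0.7, 0.3],
     [0.5, -0.6]]
fsl_raw_weights = [0.5, -1.0, 2.0, 0.0]
exon_input = [1.0, 2.0, 3.0, 4.0]

selected = feature_selection(exon_input, fsl_raw_weights)
isoform_features = isoform_map(selected, W, R)
```

In this sketch an FSL weight of zero removes a feature entirely, and changing any entry of W where R_ij = 0 leaves the isoform features unchanged. In a full model these isoform features would then feed the standard fully connected layers.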