Fig 1.
Performance of the PASO model. (A-C) Comparison of PASO with the classical machine learning models Random Forest and SVM.
Every machine learning model received the same input features (619 pathway differential features and a 256-dimensional digital encoding of the drug SMILES). (D-E) Predictive performance of PASO under Cell-Blind and Drug-Blind conditions, respectively. (F) Bar chart comparing drug response prediction among different approaches. The ‘Omics’ suffix indicates that an approach uses three types of omics data (Gep, CNV, and Mut).
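The Cell-Blind and Drug-Blind conditions in panels (D-E) are read here as splits in which the held-out cell lines (or drugs) never appear in the training data. The following is a minimal sketch under that assumption; the function name `blind_split`, the 20% test fraction, and the toy records are hypothetical and are not taken from the PASO code.

```python
import random

def blind_split(records, key_index, test_fraction=0.2, seed=0):
    """Split records so that no entity is shared between train and test.

    records   : list of (cell_line, drug, ln_ic50) tuples (assumed layout)
    key_index : 0 holds out cell lines (Cell-Blind), 1 holds out drugs (Drug-Blind)
    """
    entities = sorted({rec[key_index] for rec in records})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = max(1, round(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [rec for rec in records if rec[key_index] not in held_out]
    test = [rec for rec in records if rec[key_index] in held_out]
    return train, test

# Toy records with placeholder names and responses (illustrative only).
records = [
    (cell, drug, 0.0)
    for cell in ["C1", "C2", "C3", "C4", "C5"]
    for drug in ["D1", "D2", "D3", "D4", "D5"]
]
cell_train, cell_test = blind_split(records, key_index=0)
drug_train, drug_test = blind_split(records, key_index=1)
```

Splitting on the entity rather than on individual response records is what keeps a held-out cell line or drug entirely unseen during training.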
Table 1.
Comparison of drug response prediction across different approaches.
Fig 2.
Comprehensive analysis of lung cancer cell lines. (A) Box plot depicting the ln IC50 (Z-score) values predicted for all drugs across each lung cancer cell line.
(B) Heatmap showing the predicted ln IC50 (Z-score) for each drug in each lung cancer cell line. (C) Ridge plot displaying the overall distribution of predicted ln IC50 (Z-score) values across the four lung cancer subtypes. (D) Waterfall chart ranking the drugs by predictive performance. The red portion marks drugs predicted with high accuracy (ρ > 0.7). The bar plot displays the 10 drugs with the highest accuracy. (E) PCA plot showing how pathway-level attention weights cluster by lung cancer subtype. (F) PCA plot showing how pathway-level attention weights for these lung cancer subtypes cluster by drug.
Fig 3.
Overall analysis of pathway attention weights. (A) Waterfall chart ranking all pathways by average attention score.
The red portion marks the top 20% of pathways. The bar plot displays the 9 pathways among the top 20% that are most closely related to cancer development and progression. (B) Pathway attention scores are divided into five groups by proportion. The red box plots show the distribution of pathway attention scores in each group, and the blue triangles indicate the proportion of pathways closely related to cancer development and progression. (C) Hierarchical clustering result of pathway attention scores, with many pathways of the same or similar cancer types clustered together. (D) Waterfall chart showing where the pathway targeted by the drug Erlotinib ranks among all pathway scores.
Fig 4.
Drug efficacy prediction analysis and drug molecular attention weight analysis for targeted drugs. (A-C) Erlotinib: bar plots comparing the predicted and observed ln IC50 values in four LUAD cell lines; line plots showing the distribution of drug molecular attention weights in the PC14 and HCC827 cell lines; and a visualization of the molecular attention weights that highlights molecular structures with attention scores > 0.01, with red indicating structures attended to by both cell lines and green indicating structures attended to by only one cell line.
(D-F) Refametinib: bar plots comparing the predicted and observed ln IC50 values in four LUAD cell lines; line plots showing the distribution of drug molecular attention weights in the IGR37 and SKMEL1 cell lines; and a visualization of the molecular attention weights that highlights molecular structures with attention scores > 0.01, with red indicating structures attended to by both cell lines and green indicating structures attended to by only one cell line.
Fig 5.
Drug efficacy analysis. (A-C) Bar plots showing the predicted results for the ten cell lines most sensitive to each of the three drugs PD0325901, Dactolisib, and Pevonedistat, respectively.
(D-F) Bar plots showing the predicted results for the ten most effective drugs in each of the three cell lines LAMA84, SW948, and WSUDLCL2, respectively.
Fig 6.
Visualization analysis of drug efficacy. (A) Distribution of the relative sensitivity of the COLO800 cell line to various drugs.
Each data point corresponds to the response of the COLO800 cell line to a specific drug, classified by the Z-score of ln IC50. Drugs with a Z-score below -2 are defined as sensitive drugs for this cell line and marked in green, while drugs with a Z-score above 2 are defined as drugs to which the cell line is resistant and marked in orange. (B) Distribution of the relative sensitivity of different cell lines to the drug AZD5991, where each data point corresponds to the response of a specific cell line to AZD5991.
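The thresholding rule in this caption (Z-score below -2 is sensitive, above 2 is resistant) can be sketched as follows. This is a minimal illustration, assuming the Z-score is computed over the set of ln IC50 values passed in; the caption does not state the reference population, and the drug names and values below are placeholders rather than results from the paper.

```python
from statistics import mean, pstdev

def zscores(ln_ic50_by_name):
    """Z-score each ln IC50 value against the values supplied (assumed reference set)."""
    values = list(ln_ic50_by_name.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return {name: (v - mu) / sigma for name, v in ln_ic50_by_name.items()}

def classify(z, threshold=2.0):
    """Apply the caption's rule: z < -2 sensitive, z > 2 resistant."""
    if z < -threshold:
        return "sensitive"
    if z > threshold:
        return "resistant"
    return "neutral"

# Placeholder ln IC50 values for one cell line across ten hypothetical drugs.
ln_ic50 = {"drug_0": -10.0}
ln_ic50.update({f"drug_{i}": 0.0 for i in range(1, 10)})

z = zscores(ln_ic50)
labels = {name: classify(score) for name, score in z.items()}
```

The same two functions apply to panel (B) by passing the ln IC50 values of one drug across cell lines instead of one cell line across drugs.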
Table 2.
Resistant and sensitive drugs for COLO800.
Table 3.
Resistant and sensitive cell lines for AZD5991.
Fig 7.
Evaluation of PASO-TCGA-Classifier model efficiency. (A) Precision-recall curve representing the performance of the PASO-TCGA-Classifier model on the TCGA test dataset.
(B) Box plot showing the distribution of predicted drug response probabilities for different clinical drug responses on the TCGA test dataset. (C) Survival analysis of the TCGA test dataset across multiple cancer types. Patients were split into two groups at the median of the predicted probability. Kaplan-Meier analysis was performed, and the log-rank test yielded a p-value of 1.59e-9. (D-E) Survival analysis of the TCGA test dataset for BRCA and BLCA. (F) Bar chart showing the predicted responses to Cisplatin and Carboplatin across different cancer types on the TCGA test dataset.
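The grouping and survival estimate described for panel (C) can be sketched in plain Python. This is an illustrative sketch only: the patient identifiers, probabilities, and follow-up times are made up, the tie rule at the median (values equal to the median go to the low group) is an assumption, and the log-rank test itself is not reproduced here.

```python
from statistics import median

def median_split(pred_prob):
    """Split patients into high and low groups at the median predicted probability."""
    cutoff = median(pred_prob.values())
    high = [pid for pid, p in pred_prob.items() if p > cutoff]
    low = [pid for pid, p in pred_prob.items() if p <= cutoff]
    return high, low

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Made-up predicted probabilities for four hypothetical patients.
pred_prob = {"p1": 0.1, "p2": 0.4, "p3": 0.6, "p4": 0.9}
high_group, low_group = median_split(pred_prob)

# Made-up follow-up times; the second patient is censored.
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

In practice one curve would be estimated per group and the two curves compared with a log-rank test from a survival analysis library.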
Table 4.
Gene mutation contingency table.
Table 5.
Copy number contingency table.
Fig 8.
Workflow diagram. (A) Data Selection: We acquired diverse datasets for training PASO from multiple databases.
(B) Data Preprocessing: Statistical methods were used to calculate, for each type of omics data, the difference between values inside and outside each biological pathway. These pathway-based difference values were used as cell line features. Additionally, pytoda was used to process SMILES chemical structure information. (C) Model Architecture: The model is presented in three sections, from top to bottom. The upper section illustrates the SMILES Encoding Network, the middle section depicts the overall model workflow, and the lower section details the internal network structure of SMAN. (D) Evaluation of Predictions and Attention Weight Analysis: We evaluated the model using three distinct data partitioning strategies. Subsequently, we conducted drug efficacy analysis and attention weight analysis on the predicted results.
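One way to picture an inside-versus-outside pathway comparison for binary omics data such as mutations is a 2x2 contingency table followed by a chi-square statistic. The sketch below is an assumption made for illustration: the caption only says that statistical methods were employed, so the table layout, the choice of Pearson's chi-square without continuity correction, and the gene names are all hypothetical.

```python
def pathway_contingency(all_genes, pathway_genes, altered_genes):
    """Count genes by pathway membership (rows) and alteration status (columns).

    Returns (a, b, c, d) for the table [[a, b], [c, d]]:
      a: inside pathway, altered      b: inside pathway, not altered
      c: outside pathway, altered     d: outside pathway, not altered
    """
    universe = set(all_genes)
    inside = set(pathway_genes) & universe
    outside = universe - inside
    altered = set(altered_genes)
    return (
        len(inside & altered),
        len(inside - altered),
        len(outside & altered),
        len(outside - altered),
    )

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so no difference can be measured
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical gene universe, pathway membership, and altered genes.
all_genes = [f"g{i}" for i in range(1, 9)]
pathway_genes = ["g1", "g2", "g3", "g4"]
altered_genes = ["g1", "g2", "g3", "g5"]

table = pathway_contingency(all_genes, pathway_genes, altered_genes)
statistic = chi_square_2x2(*table)
```

A larger statistic indicates a stronger imbalance between alterations inside and outside the pathway, which is the kind of per-pathway difference value that could serve as a cell line feature.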
Fig 9.
Attention network. (A) The SMILES Attention Layer receives the drug features re-encoded by the SMILES Encoding Network together with the preprocessed omics features. It computes the interactions between the SMILES features and a specific type of omics features, and outputs SMILES features fused with those interactions.
(B) The Omic Attention Layer receives the same two inputs. It computes the interactions between a specific type of omics features and the SMILES features, and outputs omics features fused with those interactions.
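The two layers mirror each other: one weights SMILES features using the omics features as context, the other weights omics features using the SMILES features as context. The sketch below illustrates that symmetry with a generic scaled dot-product attention over a single context vector. It is a simplified stand-in, not the SMAN architecture: there are no learned projections, the context is assumed to be already pooled into one vector, and all names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_attention(items, context):
    """Weight each feature vector in `items` by its similarity to `context`.

    Returns (weights, fused), where `fused` is the attention-weighted sum of items.
    """
    scale = math.sqrt(len(context))
    scores = [sum(x * c for x, c in zip(item, context)) / scale for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    fused = [sum(w * item[j] for w, item in zip(weights, items)) for j in range(dim)]
    return weights, fused

# SMILES-side attention: three toy token embeddings, one pooled omics vector.
smiles_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
omics_context = [2.0, 0.0]
smiles_weights, smiles_fused = context_attention(smiles_tokens, omics_context)

# Omics-side attention: the roles are swapped.
omics_features = [[0.5, 0.5], [1.0, 0.0]]
drug_context = [0.0, 1.0]
omics_weights, omics_fused = context_attention(omics_features, drug_context)
```

The returned weights are the kind of quantity that the attention weight analyses in the earlier figures inspect, per SMILES token on one side and per omics feature on the other.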