Fig 1.
Boxplots and normal Q-Q plots of the activity areas in the CCLE dataset.
Panel (A) shows boxplots of the activity areas for the 24 drugs. Panel (B) shows normal Q-Q plots of the activity areas for two example drugs, Lapatinib and Paclitaxel.
Fig 2.
Workflow of the three-step quantile regression forest method.
All features were first screened by their Pearson correlations with drug response. A random forest was then trained to rank the retained features by importance. Features whose importance exceeded the mean importance by more than two standard deviations were selected for the final quantile regression forest.
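The screening in step 1 and the selection rule in step 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the caption does not give the correlation cutoff, so `top_k` is a hypothetical parameter, and the use of the population standard deviation (`ddof=0`) is also an assumption. Step 2 (fitting the random forest) is left to any forest implementation that reports per-feature importances; the helper names are hypothetical.

```python
import numpy as np

def screen_by_correlation(X, y, top_k):
    """Step 1 (sketch): keep the top_k features with the largest |Pearson r| to y.

    top_k is a hypothetical cutoff; the actual screening threshold is not stated.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)            # center each feature column
    yc = y - y.mean()                  # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # constant features (zero variance) get r = 0 instead of NaN
        r = np.where(denom > 0, (Xc.T @ yc) / denom, 0.0)
    order = np.argsort(-np.abs(r), kind="stable")
    return np.sort(order[:top_k])      # column indices of retained features

def select_by_importance(importance):
    """Step 3 rule: keep features whose importance exceeds mean + 2 * SD."""
    imp = np.asarray(importance, dtype=float)
    threshold = imp.mean() + 2.0 * imp.std()   # ddof=0 is an assumption
    return np.flatnonzero(imp > threshold)

# Toy usage: feature 0 equals y, feature 1 equals -y, feature 2 is uncorrelated.
y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([y, -y, [1.0, 0.0, 0.0, 1.0]])
kept = screen_by_correlation(X, y, top_k=2)             # features 0 and 1
selected = select_by_importance([1.0] * 9 + [10.0])     # only feature 9
```

Ranking by the absolute correlation keeps strongly negatively correlated features, which matters for drug response where both sensitizing and resistance markers are informative.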
Fig 3.
Prediction performance of quantile regression forests on the CCLE dataset.
(A) Bar chart of Pearson correlation coefficients between observed drug responses and the values predicted by QRFs, ENR, ISIS, and CRF-20000. QRFs (mean): (conditional) mean prediction of drug response given genomic features using QRFs; QRFs (median): (conditional) median prediction of drug response using QRFs. (B) Scatter plots of observed versus predicted drug responses (activity area) for four drugs in CCLE using QRFs.
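The difference between the QRFs (mean) and QRFs (median) predictions can be illustrated with the standard quantile regression forest construction (Meinshausen), in which each training response receives a weight according to how often it shares a leaf with the test sample. The sketch below assumes leaf indices are already available from a fitted forest (for example, scikit-learn's `RandomForestRegressor.apply` returns them with shape `(n_samples, n_trees)`, i.e. transposed relative to the layout used here); the function names and toy leaf assignments are hypothetical.

```python
import numpy as np

def qrf_weights(train_leaves, test_leaves):
    """QRF weights for one test sample.

    train_leaves: (n_trees, n_train) leaf index of each training sample per tree.
    test_leaves:  (n_trees,) leaf index reached by the test sample per tree.
    """
    train_leaves = np.asarray(train_leaves)
    n_trees, n_train = train_leaves.shape
    w = np.zeros(n_train)
    for t in range(n_trees):
        in_leaf = train_leaves[t] == test_leaves[t]
        if in_leaf.any():
            # each tree spreads unit weight evenly over its leaf's training samples
            w += in_leaf / in_leaf.sum()
    return w / w.sum()                 # normalize so the weights sum to 1

def weighted_quantile(y, w, alpha):
    """Smallest y at which the weighted empirical CDF reaches alpha."""
    order = np.argsort(y)
    y_sorted = np.asarray(y, dtype=float)[order]
    cdf = np.cumsum(np.asarray(w, dtype=float)[order])
    idx = np.searchsorted(cdf, alpha, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]

# Toy forest: 2 trees, 4 training cell lines with activity areas y_train.
y_train = np.array([1.0, 2.0, 3.0, 4.0])
train_leaves = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1]])
test_leaves = np.array([0, 1])

w = qrf_weights(train_leaves, test_leaves)
mean_pred = float(w @ y_train)                       # conditional mean
median_pred = weighted_quantile(y_train, w, 0.5)     # conditional median
```

Here the weights are 0.25, 5/12, 1/6 and 1/6, so the mean prediction is 2.25 while the median prediction is 2.0: the two summaries of the same conditional distribution can differ when that distribution is skewed.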
Table 1.
Pearson correlation coefficients between observed drug responses (activity area) and those predicted by QRFs.
Table 2.
Summary of the 95% and 80% prediction intervals of drug responses for the 24 drugs.
Fig 4.
The 95% prediction intervals and mean predictions by quantile regression forests.
Red triangles indicate the point (mean) prediction of drug response; the two red dots indicate the lower and upper bounds of the 95% prediction interval. Panels (A) and (B) compare the 24 drugs for cell lines “CAPAN2” and “C2BBE1”, respectively. Panels (C) and (D) compare four different cell lines for the drugs “Irinotecan” and “Topotecan”, respectively.
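A 95% prediction interval of the kind plotted here can be read off a quantile regression forest as the 2.5% and 97.5% weighted quantiles of the training responses, with the mean prediction as the weighted average. The sketch below assumes the per-sample QRF weights have already been computed (one weight per training cell line, summing to 1); the helper name and the toy weights are hypothetical, and the same function with `level=0.80` gives an 80% interval.

```python
import numpy as np

def qrf_interval(y_train, w, level=0.95):
    """Return (lower, mean, upper) from QRF weights w over training responses."""
    y_train = np.asarray(y_train, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    cdf = np.cumsum(w[order])                  # weighted empirical CDF

    def quantile(alpha):
        # smallest response whose cumulative weight reaches alpha
        idx = np.searchsorted(cdf, alpha, side="left")
        return y_sorted[min(idx, len(y_sorted) - 1)]

    tail = (1.0 - level) / 2.0                 # e.g. 0.025 for a 95% interval
    lower = quantile(tail)
    upper = quantile(1.0 - tail)
    mean = float(w @ y_train)                  # point (mean) prediction
    return lower, mean, upper

# Toy weights for one cell line-drug pair (hypothetical values).
y_train = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.25, 5 / 12, 1 / 6, 1 / 6])
lower, mean, upper = qrf_interval(y_train, w, level=0.95)
```

Because the bounds are taken directly from observed training responses, the interval is not forced to be symmetric around the mean prediction, which is why the two dots need not sit at equal distances from the triangle.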
Fig 5.
Variable importance and word clouds of functional annotations for the genes used by QRFs.
Panels (A) and (B) show bar charts of variable importance for the drugs 17-AAG and AZD6244. Panels (C) and (D) show word clouds of the functional annotations of the genes used for the 24 drugs, based on all genes (C) or on the pooled top 30 genes of each drug (D); the font size of each annotation indicates its enrichment score.