A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

doi:10.1371/journal.pone.0144490

Table 1.

Integral of Copula Differences for different dimensions.

Fig 1.

D₂ vs D₁ for left node.

(a) shows an example Pareto frontier (red circles) for the left child node for the first split of a specific tree (the D₁ and D₂ values are denoted by D_1L and D_2L respectively). (b) shows that the Pareto frontier can be approximated by two straight lines: one with slope greater than 1 and another with slope less than 1.
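As an illustration of what a Pareto frontier over (D₁, D₂) pairs means, the following is a minimal sketch and not the authors' implementation. The function name `pareto_frontier` and the `maximize` flag are hypothetical. The caption does not state whether D₁ and D₂ are maximized or minimized, so the direction is left as a parameter. Fitting the two approximating straight lines is not shown.

```python
import numpy as np

def pareto_frontier(d1, d2, maximize=True):
    """Return indices of the non-dominated (d1, d2) points.

    Assumption: both objectives share the same direction (both maximized
    or both minimized); the source caption does not specify which.
    """
    pts = np.column_stack([d1, d2]).astype(float)
    if not maximize:
        pts = -pts  # minimizing is equivalent to maximizing the negated values

    # Sort by d1 descending, breaking ties by d2 descending.
    order = np.lexsort((-pts[:, 1], -pts[:, 0]))

    frontier = []
    best_d2 = -np.inf
    for i in order:
        # A point is kept only if its d2 beats every point with larger or equal d1.
        if pts[i, 1] > best_d2:
            frontier.append(i)
            best_d2 = pts[i, 1]
    return np.array(frontier)
```

For example, `pareto_frontier(D_1L, D_2L)` would return the indices of the candidate splits that lie on the frontier, given arrays of D₁ and D₂ values for the left child node.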

Fig 2.

D₂ vs D₁ for right node.

(a) shows an example Pareto frontier (red circles) for the right child node for the first split of a specific tree (the D₁ and D₂ values are denoted by D_1R and D_2R respectively). (b) shows that the Pareto frontier can be approximated by two straight lines: one with slope greater than 1 and another with slope less than 1.

Fig 3.

Scatter plot of α’s across the trees.

(a) and (b) are scatter-plots for the first split of all the trees for α > 1 and α < 1 respectively.

Fig 4.

Two multivariate regression trees trained on the same input X and the same output responses [Y₁,Y₂], using a copula-based (Tree{C, [Y₁,Y₂]}) and a covariance-based (Tree{V, [Y₁,Y₂]}) node cost criterion, respectively.

Empty circles represent leaf nodes and circles enclosing a number represent split nodes; the number inside a circle indicates the feature selected at that node for splitting.

Fig 5.

CDF created from the left and right child nodes for a single split using CMRF.

It is compared visually with the original CDF created from the training samples.

Fig 6.

CDF created from the left and right child nodes for a single split using VMRF.

It is compared visually with the original CDF created from the training samples.

Table 2.

Variable importance measure calculated using CMRF_{Y₁, Y₂} and VMRF_{Y₁, Y₂}.

Table 3.

5-fold CV results for drug sensitivity prediction on the GDSC dataset for four drug sets, reported as correlation coefficients.

VMRF and CMRF denote Multivariate Random Forests using covariance and copula, respectively. KBMTL denotes kernelized Bayesian multitask learning (parameters: 200 iterations, α = β = 1, and subspace dimensionality = 20).

Table 4.

5-fold CV results for drug sensitivity prediction on the GDSC dataset for four drug sets, reported as MAE and NRMSE for the RF, VMRF, CMRF and KBMTL approaches.
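The three measures reported in Tables 3 and 4 (correlation coefficient, MAE, and NRMSE) can be sketched as follows. This is an illustrative sketch, not the authors' code. The names `prediction_metrics` and `five_fold_indices` are hypothetical. The NRMSE normalization is an assumption: the captions do not define it, and here the RMSE is divided by the RMSE of a predictor that always outputs the mean observed response.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """Correlation coefficient, MAE, and NRMSE for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Pearson correlation between observed and predicted responses.
    corr = np.corrcoef(y_true, y_pred)[0, 1]

    # Mean absolute error.
    mae = np.mean(np.abs(y_true - y_pred))

    # ASSUMPTION: NRMSE normalized by the error of the mean predictor;
    # the source captions do not specify the normalization.
    nrmse = np.sqrt(
        np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    )
    return {"corr": corr, "mae": mae, "nrmse": nrmse}

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Split shuffled sample indices into n_folds disjoint test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)
```

In a 5-fold run, each fold from `five_fold_indices` would serve once as the test set while a model is trained on the remaining samples, and `prediction_metrics` would be applied to the held-out predictions.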

Fig 7.

Scatter plots of predicted response vs original response for Erlotinib and Lapatinib (GDSC).

Here, corr-coef stands for the correlation coefficient between the predicted response and the output response.

Table 5.

5-fold CV results for drug sensitivity prediction on the CCLE dataset for four drug sets, reported as correlation coefficients for RF, VMRF, CMRF and KBMTL.

Table 6.

5-fold CV results for drug sensitivity prediction on the CCLE dataset for four drug sets, reported as MAE and NRMSE for RF, VMRF, CMRF and KBMTL.

Fig 8.

Scatter plots of predicted response vs original response for Crizotinib and PHA-665752 (CCLE).

Here, corr-coef stands for the correlation coefficient between the predicted response and the output response.

Fig 9.

Protein-protein interaction network observed between top regulators found from CMRF in GDSC dataset S_C1.

Disconnected nodes are hidden.

Fig 10.

Protein-protein interaction network observed between top regulators found from VMRF in GDSC dataset S_C1.

Disconnected nodes are hidden.
