Genomic Models of Short-Term Exposure Accurately Predict Long-Term Chemical Carcinogenicity and Identify Putative Mechanisms of Action
a) Classification performance (AUC, averaged over 100 iterations of random resampling) of a random forest classifier as a function of the number of gene sets used as predictors. The maximum AUC is reached with 150 gene sets, while 50 gene sets suffice to reach 99% of the expected maximum AUC. b) Heatmaps of the top 50 pathways, ranked by variable importance in a random forest classifier of hepato-carcinogenicity. Rows correspond to pathways, clustered into biological processes; columns correspond to chemical compounds. The left and right heatmaps show all non-carcinogenic and all carcinogenic compounds, respectively. Only profiles corresponding to the maximum duration and dose of treatment are displayed, with replicates averaged. A detailed version of the right heatmap, with all pathways and compounds labeled, is available in Figure S11. c) Details of the biological processes identified by the clustering, showing the individual differentially regulated pathways, their variable importance rankings, and their driving genes.
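
The reading of panel a (AUC averaged over resampling iterations, then the smallest number of gene sets that reaches 99% of the maximum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `mean_auc_curve`, `smallest_k_reaching`, and `k_at_max` are hypothetical, and the AUC values in `toy_runs` are synthetic placeholders chosen only to give the toy curve a plateau. They are not results from the study.

```python
from statistics import mean


def mean_auc_curve(auc_runs):
    """Average the AUC over resampling iterations for each predictor count.

    auc_runs maps a number of gene sets to a list of AUCs, one per iteration.
    """
    return {k: mean(aucs) for k, aucs in sorted(auc_runs.items())}


def smallest_k_reaching(curve, fraction=0.99):
    """Smallest predictor count whose mean AUC is >= fraction * max mean AUC."""
    target = fraction * max(curve.values())
    return min(k for k, auc in curve.items() if auc >= target)


def k_at_max(curve):
    """Smallest predictor count at which the mean AUC attains its maximum."""
    best = max(curve.values())
    return min(k for k, auc in curve.items() if auc == best)


# Synthetic placeholder AUCs (two iterations per setting), NOT data from the paper.
toy_runs = {
    10: [0.70, 0.72],
    50: [0.79, 0.80],
    150: [0.80, 0.80],
    300: [0.79, 0.80],
}

curve = mean_auc_curve(toy_runs)
print("gene sets at maximum AUC:", k_at_max(curve))
print("gene sets reaching 99% of maximum:", smallest_k_reaching(curve))
```

In a real analysis, each list would hold one AUC per resampling iteration (100 in the caption), obtained by training the classifier on the top-ranked gene sets and scoring it on held-out compounds.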