Fig 1.
Overview of toxicity biomarker discovery.
First, we collected toxicogenomic meta-data from public resources, preprocessed gene expression array data, and assigned toxicity classes. Second, we identified differentially expressed genes (DEGs) through meta-analysis followed by multistage feature reduction. DEGs were subjected to systems analysis of biological pathways and networks, and an optimized set of biomarkers was used to generate and validate a prediction model. Finally, the applicability of the discovered biomarkers to human cells was tested computationally and experimentally. GEO, Gene Expression Omnibus at the National Center for Biotechnology Information; ArrayExpress, ArrayExpress at the European Bioinformatics Institute; TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System of the National Institute of Health Sciences of Japan; CEBS, Chemical Effects in Biological Systems at the National Institute of Environmental Health Sciences; sPLS-DA, sparse partial least squares discriminant analysis.
Table 1.
Number of selected genes after meta-analysis, sPLS-DA, and wrapper approaches for all five meta-analysis comparisons.
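Table 1 counts the genes that survive each feature-reduction stage. As a rough illustration of the wrapper stage only, the sketch below shows a generic greedy forward-selection wrapper. The captions do not specify the classifier, scoring metric, or search strategy used, so `forward_selection`, the `score` callable, and the toy weights are hypothetical; in practice `score` would typically be something like the cross-validated accuracy of a classifier trained on the candidate gene subset.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Hypothetical greedy wrapper: repeatedly add the gene that most improves score(subset)."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every remaining gene added to the current subset; keep the best one.
        gene, value = max(
            ((g, score(selected + [g])) for g in remaining), key=lambda t: t[1]
        )
        if value <= best:
            break  # no candidate improves the score, so stop searching
        selected.append(gene)
        remaining.remove(gene)
        best = value
    return selected, best


# Toy additive score standing in for a real model-performance metric (hypothetical values).
toy_weights = {"Nrep": 0.5, "Ctsd": 0.3, "geneX": -0.1}
chosen, best_score = forward_selection(
    list(toy_weights), lambda subset: sum(toy_weights[g] for g in subset)
)
```

With this toy score, `Nrep` and `Ctsd` are selected and `geneX` is rejected because adding it lowers the score. Greedy forward search evaluates far fewer subsets than exhaustive search, at the cost of possibly missing interacting gene combinations.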
Fig 2.
Characterization of pharmacogenomics meta-data.
(A) Distribution of toxicity levels for 391 compounds from single-organ studies. Compounds were rank-ordered by relative toxicity level. (B) Distribution of toxicity levels for 62 compounds from multi-organ studies. Asterisks indicate compounds showing organ-specific toxicity. (C) Distribution of toxicity levels for two selected drugs at different doses and treatment durations. The same color scale is used in all panels. Missing information is shown in grey. For each row, the percentages of samples with level-0 and level-1 toxicity sum to 100% for each organ. See S5 Table for the exact values used to generate this figure.
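The per-organ percentages shown in panels A to C can be obtained by counting samples at each toxicity level. The sketch below is a minimal illustration, not the authors' code, assuming each sample is a hypothetical `(compound, organ, level)` record with level 0, 1, or `None` when pathology information is missing (the grey cells).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[str, str, Optional[int]]  # hypothetical (compound, organ, level) record


def level_fractions(samples: Iterable[Record]) -> Dict[Tuple[str, str], Dict[int, float]]:
    """Fraction of level-0 and level-1 samples for each (compound, organ) pair."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for compound, organ, level in samples:
        if level is None:
            continue  # missing information is excluded (drawn in grey)
        counts[(compound, organ)][level] += 1
    fractions = {}
    for key, c in counts.items():
        total = c[0] + c[1]
        # The two fractions for each organ sum to 1 (100%).
        fractions[key] = {0: c[0] / total, 1: c[1] / total}
    return fractions


# Hypothetical records for one compound in two organs.
records = [
    ("cisplatin", "kidney", 1),
    ("cisplatin", "kidney", 1),
    ("cisplatin", "kidney", 0),
    ("cisplatin", "liver", 0),
    ("cisplatin", "liver", None),
]
result = level_fractions(records)
```

Here the kidney row splits one-third level-0 and two-thirds level-1, while the liver row is entirely level-0 because its missing sample is ignored.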
Fig 3.
Meta-analysis identifies candidate biomarkers of organ toxicity.
(A) Schematic flow chart of the meta-analysis. Untreated, untreated samples (pathology score < 0.5); level-0, innocuous treatment; level-1, toxic treatment. (B) Venn diagram of the overlap of DEGs identified from the four drug-related meta-analysis comparisons in (A). Numbers indicate gene counts. (C-I) Forest plots of study-specific effect sizes and 95% confidence intervals for the studies included in the training dataset. Plots are shown for the seven DEGs from the MA1, MA3, and MA4 datasets with the greatest absolute average effect size (> 0.55; see S8 Table): Nrep (neuronal regeneration related protein) in untreated versus treated specimens; Spp1 (secreted phosphoprotein 1), Ctss (cathepsin S), Tubb5 (tubulin β5), and Trpm4 (transient receptor potential cation channel, subfamily M, member 4) in level-0 versus level-1 kidney specimens; and Ctsd (cathepsin D) and Tpm4 (tropomyosin 4) in level-0 versus level-1 liver specimens. Plots for the remaining 11 DEGs are shown in S3 Fig. Circle sizes are proportional to the fold-change (log2 ratio). The summarized effect size (mean fold-change) across all enrolled studies is shown as a black circle at the bottom of each plot. p-value: Z-test for the overall summarized effect for each gene.
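The forest plots combine study-specific effect sizes into one summarized effect and test it with a Z-test. The sketch below assumes a fixed-effect, inverse-variance-weighted model in which each study contributes a log2 fold-change and its standard error; the caption does not state the weighting scheme (a random-effects model would also add a between-study variance to each study's variance), so `fixed_effect_meta` is a hypothetical illustration of one plausible reading.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    effects: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, Tuple[float, float], float, float]:
    """Inverse-variance fixed-effect summary, its 95% CI, Z statistic, and two-sided p-value."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total_w = sum(weights)
    summary = sum(w * y for w, y in zip(weights, effects)) / total_w
    se_summary = math.sqrt(1.0 / total_w)
    z = summary / se_summary
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (summary - 1.96 * se_summary, summary + 1.96 * se_summary)
    return summary, ci, z, p


# Hypothetical log2 fold-changes and standard errors from two studies.
summary, ci, z, p = fixed_effect_meta([1.0, 0.0], [1.0, 1.0])
```

With two equally precise studies the summary is simply their mean (0.5), and the resulting Z-test is far from significant (p ≈ 0.48).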
Table 2.
Performance of prediction models using the 21 identified differentially expressed genes.
Fig 4.
(A) Enriched GO terms associated with DEGs from three meta-analysis comparisons. DEGs from MA5 were excluded from the analysis owing to insufficient dataset size. p-value: modified Fisher's exact test implemented in the Database for Annotation, Visualization and Integrated Discovery (DAVID). (B, C) Highly interconnected subnetworks within the individual sets of DEGs from MA3 and MA4. Circular nodes indicate proteins and diamond nodes indicate proteins/genes; solid lines and dashed arrows indicate physical and genetic interactions, respectively, as reported in our input databases (see Methods for details). Node color indicates the median expression fold-change in the training dataset (level-1/level-0).
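DAVID's modified Fisher's exact test (the EASE score) removes one gene from the count of list genes annotated to a term before computing the one-sided hypergeometric p-value, which makes terms supported by very few genes less significant. The standard-library sketch below illustrates that calculation; the function names and toy counts are hypothetical, and DAVID's own background and annotation handling are not reproduced.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n list genes)."""
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def ease_score(k: int, N: int, K: int, n: int) -> float:
    """Modified (EASE-style) one-sided Fisher p-value: one list hit is removed first."""
    return hypergeom_upper_tail(k - 1, N, K, n)


# Toy counts: 20 background genes, 5 annotated to a GO term, 5 DEGs, 3 DEGs in the term.
fisher_p = hypergeom_upper_tail(3, 20, 5, 5)
ease_p = ease_score(3, 20, 5, 5)
```

In this toy case the ordinary one-sided Fisher p-value is about 0.073, while the EASE score is about 0.37, showing how strongly the penalty affects small overlaps.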
Fig 5.
Computational and experimental validations identify NREP and CTSD as biomarkers of toxicity in human cell lines.
(A) Density plots comparing NREP expression levels between untreated and treated primary hepatocyte samples reported in TG-GATEs. (B) Boxplots of CTSD fold-changes (toxic/innocuous) at each of the given cell viability thresholds measured for the primary hepatocytes reported in TG-GATEs. *p < 0.05, **p < 0.001; t-test. (C, D) Dose-dependent viability of HEK293 (C) and HepG2 (D) cells exposed to cisplatin (C) or acetaminophen (D). Compounds were dissolved in DMSO, and cell viability was measured by MTS assay. Error bars represent ± standard deviation of triplicate experiments. See S4A and S4B Fig for results with the same compounds dissolved in growth medium. (E, F) NREP mRNA levels, determined by RT-PCR, after exposure to the indicated concentrations of cisplatin for 72 h (E) and acetaminophen for 48 h (F). (G, H) qRT-PCR assays for NREP (G) and CTSD (H). The y-axis indicates fold-changes in expression relative to chemically untreated samples (n = 5). Level-0 and level-1 drug concentrations for compounds dissolved in DMSO or DMEM were selected as those yielding cell viability of > 60% or < 60%, respectively, in C, D and S4A and S4B Fig. *p < 0.05, **p < 0.001; Student's t-test. Error bars represent ± standard deviation.
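Panels G and H rest on two computations: assigning each drug concentration a toxicity level from the 60% viability cut-off, and expressing qRT-PCR results as fold-changes relative to untreated cells. The caption does not state how fold-changes were calculated or how a viability of exactly 60% was treated, so the sketch below assumes the common 2^-ΔΔCt method with a hypothetical reference gene and leaves the boundary case unassigned; both function names are hypothetical.

```python
from typing import Optional


def assign_level(viability_pct: float, threshold: float = 60.0) -> Optional[int]:
    """Toxicity level from percent cell viability (assumed handling of the cut-off)."""
    if viability_pct > threshold:
        return 0  # level-0, innocuous concentration
    if viability_pct < threshold:
        return 1  # level-1, toxic concentration
    return None  # exactly at the threshold: left unassigned (assumption)


def ddct_fold_change(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_ctrl: float,
    ct_ref_ctrl: float,
) -> float:
    """Fold-change of a target gene vs. untreated control by the 2^-ddCt method (assumed)."""
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to the reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_treated - delta_ctrl)


# Hypothetical Ct values: the target amplifies two cycles earlier after normalization.
fold = ddct_fold_change(24, 18, 26, 18)
```

Here `fold` is 4.0, meaning a fourfold increase relative to untreated cells. Significance between groups would then be assessed with a Student's t-test on the replicate fold-changes.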
Table 3.
List of the 21 candidate biomarkers of drug-induced toxicity.