Meta-Analysis of Large-Scale Toxicogenomic Data Finds Neuronal Regeneration Related Protein and Cathepsin D to Be Novel Biomarkers of Drug-Induced Toxicity

  • Hyosil Kim,

    Affiliations Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea, Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Korea

  • Ju-Hwa Kim,

    Affiliation Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Korea

  • So Youn Kim,

    Affiliation Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea

  • Deokyeon Jo,

    Affiliation Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea

  • Ho Jun Park,

    Affiliation Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea

  • Jihyun Kim,

    Affiliation Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea

  • Sungwon Jung (corresponding author),

    Affiliation Department of Genome Medicine and Science, School of Medicine, Gachon University, Incheon, Korea

  • Hyun Seok Kim (corresponding author),

    Affiliations Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Korea, Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea

  • KiYoung Lee †

    † Deceased.

    Affiliation Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea

15 Sep 2016: Kim H, Kim JH, Kim SY, Jo D, Park HJ, et al. (2016) Correction: Meta-Analysis of Large-Scale Toxicogenomic Data Finds Neuronal Regeneration Related Protein and Cathepsin D to Be Novel Biomarkers of Drug-Induced Toxicity. PLOS ONE 11(9): e0163403. doi: 10.1371/journal.pone.0163403


Abstract

Undesirable toxicity is one of the main reasons for withdrawing drugs from the market or eliminating them as candidates in clinical trials. Although numerous studies have attempted to identify biomarkers capable of predicting pharmacotoxicity, few have attempted to discover robust biomarkers that are coherent across various species and experimental settings. To identify such biomarkers, we conducted meta-analyses of massive gene expression profiles for 6,567 in vivo rat samples and 453 compounds. After applying rigorous feature reduction procedures, our analyses identified 18 genes as related to toxicity upon comparisons of untreated versus treated and innocuous versus toxic specimens of kidney, liver and heart tissue. We then independently validated these genes in human cell lines. In doing so, we found several of these genes to be coherently regulated in both in vivo rat specimens and in human cell lines. Specifically, mRNA expression of neuronal regeneration-related protein was robustly down-regulated in both liver and kidney cells, while mRNA expression of cathepsin D was commonly up-regulated in liver cells after exposure to toxic concentrations of chemical compounds. Use of these novel toxicity biomarkers may enhance the efficiency of screening for safe lead compounds in early-phase drug development prior to animal testing.


Introduction

In the early phases of drug development, efficient methods of assessing the safety of new drugs are needed. Current toxicity assessments typically require excessive animal sacrifice, large quantities of the drug compound, and long-term testing [1]. Generally involving the observation of drug responses in animals and the extrapolation thereof to humans, these assessment methods can be expensive, time consuming, and low in throughput [2]. Accordingly, demand for in vitro methods capable of predicting compound toxicity in humans is growing. Recently, molecular biomarker-based methods of predicting toxicity have gained traction for their potentially greater speed and accuracy compared to conventional methods [3].

In toxicogenomics, researchers seek to identify reliable molecular markers whose expression is tightly coupled to the development of specific target organ/systemic toxicity [3]. Historically, the rat has been the preferred model system for identifying organ-specific markers that respond to a wide variety of clinical compounds [4]. For example, in rats, 19 genetic biomarkers, including Kim1 (kidney injury molecule-1) and Spp1 (secreted phosphoprotein 1), and 35 genes, including Grik4 (glutamate receptor, ionotropic, kainate 4) and Hspb7 (heat shock 27kDa protein family, member 7), have been identified as markers for kidney toxicity [5,6], while a 200-gene signature has been discovered for liver toxicity [7]. In addition, in vitro model systems utilizing human cell lines have identified EGR1 (early growth response 1), ATF3 (activating transcription factor 3), GDF15 (growth differentiation factor 15), and FGF21 (fibroblast growth factor 21) as biomarkers of drug toxicity, based on responses to 158 clinical compounds [8].

Nevertheless, although previous toxicogenomic studies have identified many candidate genes, their results are typically limited to specific compounds and the context in which they were derived (i.e., species and experimental setting) [9]. This raises major challenges in discerning the coherence between in vitro and in vivo settings, the appropriateness of the use of animal-derived markers in humans, and the robustness of a biomarker to different types of chemical perturbations. To address these challenges, we conducted a meta-analysis of publicly available toxico-transcriptomic datasets, followed by stepwise feature-selection procedures, to identify molecular biomarkers that respond robustly to a broad range of drugs in both in vivo rat specimens and in vitro human cell lines. Importantly, using computational and experimental cross-validation, we identified two novel toxicity molecular biomarkers, neuronal regeneration related protein (NREP) and cathepsin D (CTSD), as holding distinct prediction capabilities. As these marker proteins exhibit the ability to forecast in vivo toxicity and are easily detected in human cell line models, we believe that they can be important additions to existing toxicity assessment, with the potential to filter or prioritize drug candidates in the early stages of drug development.

Materials and Methods


Our data mining and experimental workflow consisted of three main stages: 1) data preparation, 2) computational identification of biomarker candidates and generation/validation of the prediction model, and 3) human cell line-based evaluation of biomarker candidates. The workflow is presented in Fig 1 and is explained in detail below.

Fig 1. Overview of toxicity biomarker discovery.

First, we collected toxicogenomic meta-data from public resources, preprocessed gene expression array data, and assigned toxicity classes. Second, we attempted to identify differentially expressed genes (DEGs) through meta-analysis and subsequent multistage feature reductions. DEGs were subjected to systems analysis of biological pathways and networks, and an optimized set of biomarkers was used to generate and validate a prediction model. The final step involved computationally and experimentally testing the applicability of the discovered biomarkers in human cells. GEO, Gene Expression Omnibus at the National Center for Biotechnology Information; ArrayExpress, ArrayExpress at the European Bioinformatics Institute; TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System of the National Institute of Health Sciences of Japan; CEBS, Chemical Effects in Biological Systems at the National Institute of Environmental Health Sciences; sPLS-DA, sparse partial least squares discriminant analysis.

Data collection and preprocessing

We initially performed a keyword-based search and downloaded toxicity-related gene expression profiles with associated pathology information from in vivo rat studies stored in multiple repositories and databases. The following keywords were utilized: organ toxicity, kidney toxicity, nephrotoxicity, liver toxicity, hepatotoxicity, heart toxicity, cardiac toxicity, brain toxicity, neurotoxicity, blood toxicity, hemotoxicity, lung toxicity, respiratory system toxicity, skin toxicity, dermotoxicity, phototoxicity, immune system toxicity, immunotoxicity, ocular and visual system toxicity, endocrine system toxicity, and pituitary toxicity. We collected gene expression profiles (n = 19,521) from the following resources: i) Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information [10], ii) ArrayExpress at the European Bioinformatics Institute [11], iii) Chemical Effects in Biological Systems (CEBS) at the National Institute of Environmental Health Sciences [12], and iv) the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System (TG-GATEs) of the National Institute of Health Sciences of Japan [13]. Various microarray platforms from Affymetrix Inc. (Santa Clara, CA), Agilent Technologies Inc. (Santa Rosa, CA), Illumina Inc. (San Diego, CA), and GE Healthcare/Amersham Biosciences (Tempe, AZ) were used for these studies. Affymetrix CEL data format files were quantile normalized using the Robust Multiarray Averaging (RMA) method [14]. Background-subtracted median intensity values for Agilent and pre-processed values for Illumina and GE Healthcare/Amersham data were quantile normalized. After additionally downloading related data, including experimental conditions and pathology results, we conducted in-depth manual curation of the experimental conditions, based on the ToxRefDB format [15], and mapped histopathological descriptions of specific organs to standardized specific pathology codes (S1 Table).

For toxicity class assignment, we used the pathology scores assigned by the individual studies: namely, 0 (within normal limits), 1 (minimal), 2 (mild), 3 (moderate), or 4 (severe), based on the pathology code of the corresponding organ and according to previous guidelines [16]. We focused on kidney, liver, and heart tissue, which are the primary organs used in research on drug-induced organ injury. We excluded genes with > 50% missing values in each study. We also excluded samples with > 80% missing genes [17]. After filtering out samples and genes with missing values, inter-array gene expression values were quantile normalized for 8,508 sample arrays (1,335 kidney, 6,491 liver, and 682 heart specimens) to make the entire dataset coherent in distribution. No further correction of nonsystematic variation from between-batch effects was implemented, because the datasets for our study were originally generated by several different groups using a wide variety of chemical perturbations and are therefore highly confounded with numerous potential sources of batch effects.
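The gene filter and the inter-array quantile normalization described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, missing values are represented as `None`, the normalization step assumes a complete genes-by-samples matrix, and ties are broken by position rather than averaged.

```python
from statistics import mean

def filter_genes(matrix, max_missing=0.5):
    """Keep gene rows whose fraction of missing values (None) is <= max_missing."""
    return [row for row in matrix
            if sum(v is None for v in row) / len(row) <= max_missing]

def quantile_normalize(matrix):
    """Quantile-normalize a complete genes x samples matrix (list of rows)."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        # Rank the genes within this sample (ties broken by position: an assumption).
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = reference[rank]
    return normalized

# Toy matrix: 4 genes (rows) x 3 samples (columns); values are made up.
expression = [[5, 4, 3], [2, 1, 4], [3, 5, 6], [4, 2, 8]]
normalized = quantile_normalize(expression)
```

After normalization every sample column contains the same set of values, which is what "coherent in distribution" means in practice.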

Toxicity class assignment

The toxicity class of each sample was assigned using the average of severity scores for the mapped histopathological codes. For this, we used the partial least squares discriminant analysis (PLS-DA) approach [18] implemented in the mixOmics R package [19] to determine the optimal threshold value of the average severity score that minimizes the classification error rate in a 10-fold cross-validation (S1 Fig). PLS-DA is one of the most popular classification techniques for generating a multivariate model that maximizes the discrimination between pre-defined sample groups [20]; however, the approach is prone to overfitting and therefore requires rigorous validation, internally by cross-validation and/or externally with an independent dataset. In the present study, the PLS-DA classification model was built for the top 300 DEGs from the meta-analysis for each of the discretized threshold values. To calculate the error rate of the PLS-DA classification model, a three-dimensional projection space was considered (S1 Fig). Further details on the meta-analysis are given below. Using the optimal threshold of 0.5, we assigned 6,567 of the 8,508 samples with histopathological information on 453 human drugs to one of three toxicity classes: i) untreated, normal samples showing a lower average toxicity score than the threshold; ii) level-0, drug-treated samples with no or minor histopathological phenotype and an average toxicity score below the threshold; and iii) level-1, drug-treated samples with a major toxicity phenotype and an average toxicity score above the threshold. Of the 6,567 samples, 4,373 were used for discovering biomarkers and generating a prediction model, and 2,194 were used for validation (S2 Table and S3 Table). To reduce the complexity of the analysis, samples were not stratified by dosage or treatment schedule.
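The three-class rule can be written down as a short sketch. The function name is hypothetical, and two details are assumptions because the text does not specify them: a score exactly equal to the threshold is treated as level-1, and untreated samples at or above the threshold are left unassigned.

```python
from statistics import mean

THRESHOLD = 0.5  # optimal average-severity cut-off reported in the text

def toxicity_class(severity_scores, treated, threshold=THRESHOLD):
    """Assign 'untreated', 'level-0' or 'level-1' from per-code severity scores (0-4)."""
    avg = mean(severity_scores) if severity_scores else 0.0
    if not treated:
        # Assumption: untreated samples with a high score are not assigned a class.
        return "untreated" if avg < threshold else None
    # Assumption: a score exactly at the threshold counts as toxic (level-1).
    return "level-1" if avg >= threshold else "level-0"

print(toxicity_class([0, 0, 1], treated=True))   # average 0.33, below the threshold
print(toxicity_class([1, 2], treated=True))      # average 1.5, above the threshold
```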

Identification of DEGs by meta-analysis

For the 4,373 training samples, we identified DEGs (i.e., candidate molecular biomarkers) using the random-effect meta-analysis method Hedges’ g, applying the standardized mean difference as an effect-size index in a stepwise manner [21]. The random-effect model assumes that all studies are heterogeneous; therefore, in assigning weights of studies, it simultaneously considers intra-study and inter-study variance. This characteristic of the random-effect model helps to reduce bias between studies, including batch effects, during analysis. Five meta-analysis (MA) comparisons were used to identify DEGs: (i) MA1, untreated versus treated to identify pan-organ treatment-specific DEGs; (ii) MA2, kidney versus liver versus heart specimens among drug-treated samples to identify organ-specific DEGs; (iii) MA3, level-0 versus level-1 kidney specimens; (iv) MA4, level-0 versus level-1 liver specimens; and (v) MA5, level-0 versus level-1 heart specimens, with MA3 to MA5 identifying organ-specific DEGs in accordance with toxicity responses. DEGs were identified in each comparison using a p-value cut-off of 0.0005. This p-value threshold was selected because 44 known toxicity markers (S4 Table) [5, 9, 22–25] were most significantly enriched under this threshold and, at the same time, it provided a sufficiently large number of resultant DEGs for us to perform downstream analysis.
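As a rough illustration of the effect-size calculation, the sketch below computes Hedges' g for one gene in one study and pools several studies under a random-effects model. The text does not say which between-study variance estimator was used; the DerSimonian-Laird estimator here is an assumption, as are the function names.

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Return (g, variance): the bias-corrected standardized mean difference."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)                      # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def random_effects(effects, variances):
    """Pool per-study effects; tau^2 estimated by DerSimonian-Laird (an assumption)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Weights combine intra-study variance and inter-study variance (tau^2).
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value from the Z-test
    return pooled, se, z, p

# Hypothetical per-study summary statistics for a single gene.
g, var_g = hedges_g(10.0, 8.0, 2.0, 2.0, 10, 10)
pooled, se, z, p = random_effects([0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
is_deg = p < 0.0005   # cut-off used in the text
```

A 95% confidence interval for the pooled effect, as drawn in a forest plot, would be `pooled ± 1.96 * se`.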

Feature reduction by sparse PLS-DA and wrappers

Sparse PLS-DA (sPLS-DA) was performed to select the most discriminative genes among DEGs obtained by the MA comparisons for the training dataset. sPLS-DA achieved variable selection and classification in one procedure by iterating the following steps over discrete tuning parameters, such as sparsity and number of latent variables: i) generation of a multivariate model using a given number of genes, ii) selection of a pre-determined number of variables with the longest Euclidean distance, and iii) model validation by 10 repetitions (10x) of 10-fold cross-validation. For classification, we considered the first three sPLS-DA dimensions because this configuration outperformed classifiers with one or two dimensions. When more than one gene shared a distance rank, all were selected. sPLS-DA was conducted using the mixOmics R package [19].

A wrapper method was applied for further feature reduction using additional classification methods other than PLS-DA. A wrapper approach is powerful in identifying optimal variable subsets when the number of variables is relatively small [26]. Again, sPLS-DA was used to generate gene subsets for each of the five MA comparisons, followed by five different classifiers (linear discriminant analysis [LDA], random forest [RF], K-nearest neighbor [KNN], probabilistic K-nearest neighbors [PKNN], and support vector machine [SVM] methods), to find conditions with the lowest median classification error in a 10-fold cross-validation using different wrappers for each comparison. The optimal number of genes was determined by taking the median value of the numbers of DEGs from the five classifiers that showed the lowest error for each comparison (S2 Fig). The same Euclidean distance based method employed for sPLS-DA was used to identify the optimal set of DEGs.
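The rule for choosing the number of genes (the median, across the five classifiers, of the gene count with the lowest cross-validation error) can be sketched as follows. The error values are invented for illustration, the function names are hypothetical, and breaking ties toward the smaller gene count is an assumption.

```python
from statistics import median

def best_gene_count(error_by_count):
    """Smallest gene count that reaches the lowest CV error for one classifier."""
    lowest = min(error_by_count.values())
    return min(n for n, err in error_by_count.items() if err == lowest)

def optimal_gene_count(errors_per_classifier):
    """Median, across classifiers, of each classifier's best gene count."""
    return median(best_gene_count(curve) for curve in errors_per_classifier.values())

# Made-up median CV error rates, keyed by classifier and number of genes.
errors = {
    "LDA":  {3: 0.30, 5: 0.25, 8: 0.25},
    "RF":   {3: 0.20, 5: 0.22, 8: 0.21},
    "KNN":  {3: 0.35, 5: 0.31, 8: 0.28},
    "PKNN": {3: 0.33, 5: 0.27, 8: 0.29},
    "SVM":  {3: 0.26, 5: 0.24, 8: 0.25},
}
print(optimal_gene_count(errors))
```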

Prediction model generation and performance assessment

Using the optimal set of DEGs selected above, we compared the performances of the five classification models (LDA, RF, KNN, PKNN, and SVM). The accuracies of 10x, 10-fold cross-validations were averaged. The best-performing classifier model, that is, the one with the lowest error rate, was selected and further optimized with regard to the number and size of the decision trees. We assessed the performance of the generated model in 2,194 independent test samples and compared it with the performance of a model built with the 44 known genomic biomarkers.
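The averaging of accuracies over 10 repetitions of 10-fold cross-validation can be sketched in plain Python. This is only a skeleton of the resampling scheme: `fit_predict` is a hypothetical callable standing in for any of the five classifiers, and `majority_class` is a trivial placeholder, not one of the models used in the study.

```python
import random

def repeated_kfold_accuracy(samples, labels, fit_predict, k=10, repeats=10, seed=0):
    """Average accuracy over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)                        # new random partition each repeat
        folds = [indices[i::k] for i in range(k)]   # k disjoint folds (needs n >= k)
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            predictions = fit_predict(
                [samples[i] for i in train_idx],
                [labels[i] for i in train_idx],
                [samples[i] for i in test_idx],
            )
            correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
            accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

def majority_class(train_x, train_y, test_x):
    """Placeholder classifier: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)

# Toy data: 15 level-0 and 5 level-1 samples with dummy feature values.
toy_x = list(range(20))
toy_y = [0] * 15 + [1] * 5
accuracy = repeated_kfold_accuracy(toy_x, toy_y, majority_class)
```

Swapping `majority_class` for a real classifier wrapper would give the per-model accuracies that are compared when selecting the final model.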

Gene Ontology and protein-protein interaction network analysis

Gene ontology (GO) analysis was performed with DEGs from the MA3 and MA4 comparisons against biological process terms using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [27] to identify enriched biological functions in response to drug-induced toxicity in kidney and liver tissue. DEGs from MA5 (heart toxicity) were not considered because the number of DEGs from this comparison was not sufficiently large for GO analysis. Significantly enriched terms were identified using a p-value cut-off of 0.05. For network analysis, we used the Cytoscape plugin “Molecular Complex Detection” (MCODE) to identify protein-protein interaction (PPI) subnetworks [28]. The results were visualized using Cytoscape [29]. Rat PPIs from the following 13 sources were combined: BIND [30], BIND_t [30], BioGRID [31], CORUM [32], DIP [33], HPRD [34], IntAct [35], MINT [36], MPPI [37], OPHID [38], InnateDB [39], MatrixDB [40], and mentha [41]. Using the ortholog mapping relationships reported in the NCBI HomoloGene database [42], we combined the most recent human PPIs [43] with the rat PPIs. A total of 169,723 interactions involving 13,768 proteins were included in the network model.
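The idea behind a GO enrichment p-value can be illustrated with a plain one-sided hypergeometric test. This sketch does not reproduce DAVID's modified, more conservative variant of Fisher's exact test; the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(hits, list_size, term_size, background):
    """P(X >= hits) when list_size genes are drawn from `background` genes,
    of which `term_size` are annotated with the GO term."""
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    favourable = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total

# Made-up counts: 12 of 300 DEGs carry a term that annotates 150 of 13,000 genes.
p_value = hypergeom_upper_tail(12, 300, 150, 13000)
enriched = p_value < 0.05   # cut-off used in the text
```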

Analysis of a toxicogenomic dataset for human primary hepatocytes

A large-scale human cell-based pharmacotoxicity assay dataset is publicly available for the liver. We obtained data from the TG-GATEs database [13], which contains a large-scale gene expression profile (n = 2,004 samples) and associated cell viability data (determined based on DNA content) for human primary hepatocyte cells treated with 158 chemical compounds. Information in this dataset was considered to be relevant to our biomarker candidates discovered in comparisons MA1 and MA4. Before the analysis, we first filtered out untreated samples with low DNA content (< 80%), which may represent contamination or some unknown environmental stress, and removed treated samples with > 100% DNA content to limit our search space to toxicity, not to hyper-proliferation. Application of these filters yielded a total of 572 untreated and 838 treated samples. In these samples, we examined the differential expression of the five MA1 DEGs (NREP, ATRN, TBXA2R, KIFC1, and EPHX1) obtained in our study after all feature reductions (Table 1). Second, using varying thresholds of 76–98% DNA content, we assigned toxicity classes of level-0 (≥ threshold) or level-1 (< threshold) to the treated samples, and further analyzed the differential expression of the three MA4 DEGs (CTSD, TPM4 and RPL35A) for liver toxicity obtained after feature reductions (Table 1). For the analysis, raw expression data (.cel format) were RMA normalized with the affy R package.
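The sample filters and the threshold-dependent class assignment can be sketched as below. The record layout (`treated`, `dna_content`) and the function names are hypothetical, and the step of 2 percentage points used to sweep the 76–98% range is an assumption, since the text gives only the range.

```python
def filter_samples(samples):
    """Drop untreated samples below 80% and treated samples above 100% DNA content."""
    kept = []
    for s in samples:
        if s["treated"] and s["dna_content"] <= 100:
            kept.append(s)
        elif not s["treated"] and s["dna_content"] >= 80:
            kept.append(s)
    return kept

def assign_levels(treated_samples, threshold):
    """level-0 if DNA content >= threshold, otherwise level-1."""
    return ["level-0" if s["dna_content"] >= threshold else "level-1"
            for s in treated_samples]

# Made-up records; dna_content is a percentage relative to control.
records = [
    {"treated": False, "dna_content": 95},
    {"treated": False, "dna_content": 70},   # removed: untreated, below 80%
    {"treated": True, "dna_content": 105},   # removed: treated, above 100%
    {"treated": True, "dna_content": 90},
    {"treated": True, "dna_content": 60},
]
kept = filter_samples(records)
treated = [s for s in kept if s["treated"]]
# Assumed sweep: thresholds 76, 78, ..., 98.
level1_counts = {t: assign_levels(treated, t).count("level-1")
                 for t in range(76, 99, 2)}
```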

Table 1. Number of selected genes after meta-analysis, sPLS-DA, and wrapper approaches for all five meta-analysis comparisons.

Cell lines and drug assays

HepG2 (human liver carcinoma) and HEK293 (human embryonic kidney) cell lines were purchased from the American Type Culture Collection (Rockville, MD, USA) and cultured in Dulbecco’s modified Eagle’s medium (DMEM) with 10% fetal bovine serum (FBS) under a humidified atmosphere with 5% CO2 at 37°C. HepG2 cells at 70–80% confluence were incubated for 48 h with 0–20 mM acetaminophen; HEK293 cells at 70–80% confluence were incubated for 72 h with 0–40 μM cisplatin. For these experiments, each of the two compounds was dissolved in both dimethyl sulfoxide (DMSO) and growth medium (DMEM with 10% FBS) before being added to cells. Cells incubated with 1% DMSO or growth medium served as controls. The effect of exposure to the respective compounds on cell viability was determined using an MTS assay (Promega Corp., Madison, WI, USA), which estimates titers of metabolically active cells. Levels of toxicity for each cell line were determined based on MTS assay results: level-0, viability ≥ 60%, or level-1, viability < 60%. On the basis of preliminary MTS assay results, HepG2 cells were treated with 1 mM (level-0) and 10 mM (level-1 for DMSO) or 20 mM (level-1 for DMEM) acetaminophen for 48 h, and HEK293 cells were treated with 2 μM (level-0) or 20 μM (level-1) cisplatin for 72 h.

Semi-quantitative RT-PCR and qRT-PCR

Total RNA was extracted using QIAzol lysis reagent (Qiagen, Hilden, Germany), according to the manufacturer’s instructions. Aliquots of total RNA (1 μg) were used to synthesize first-strand cDNA with Superscript reverse transcriptase (Invitrogen, Carlsbad, CA, USA) for PCR amplification. Semi-quantitative RT-PCR was then performed under the following conditions: 40 cycles of denaturation at 95°C for 15 s, annealing at 60°C for 30 s, and extension at 72°C for 10 s, followed by a terminal extension at 72°C for 10 min. A housekeeping gene, GAPDH, served as an internal control. Quantitative RT-PCR (qRT-PCR) was performed using the 7500 Real-Time PCR system (Applied Biosystems, Foster City, CA, USA) with Power SYBR Master Mix (Applied Biosystems). The thermocycling conditions used for the qRT-PCR experiments were 40 cycles of denaturation at 95°C for 15 s and extension at 60°C for 1 min. The following primer sequences for PCR were used: NREP, 5’-catgcactgcacttcttcgt-3’ and 5’-catgcactgcacttcttcgt-3’; CTSD, 5’-catgcactgcacttcttcgt-3’ and 5’-catgcactgcacttcttcgt-3’; TPM4, 5’-ttgaggaggagttggacagg-3’ and 5’-gctgcatctcctgaatctcc-3’; TRPM4, 5’-ccactgtcaggaccaccttt-3’ and 5’-ccccagtgtgaggaatctgt-3’; and GAPDH, 5’-gagtcaacggatttggtcgt-3’ and 5’-gacaagcttcccgttctcag-3’.


Results

Characterization of toxicogenomic meta-data

We characterized the 6,567 meta-data samples in terms of tested organs, drug identity, and treatment conditions (dose and duration) in relation to toxicity levels. Toxicity to 453 drugs was evaluated in kidney (n = 47), liver (n = 278), heart (n = 66), or multiple organ (n = 62) specimens (Fig 2A and 2B). Varying degrees of toxicity were observed for drugs tested in a single organ (Fig 2A and S5 Table). Drugs tested in multiple organs exhibited organ-selective toxicity (Fig 2B): for example, well-known nephrotoxins, such as cisplatin and gentamicin, were toxic to the kidney but not to the liver. Conversely, the well-known liver-damaging drugs acetaminophen and fluvastatin showed selective hepatotoxicity but not nephrotoxicity in our dataset (Fig 2B). As expected, dose-dependent and treatment time-dependent toxicity was also observed in our dataset (Fig 2C). Therein, phenacetin was toxic at 15 μg/kg and 45 μg/kg in the liver and kidney, respectively; meanwhile, thioacetamide induced kidney and liver toxicity in all samples at 1,000 μg/kg after 8–15 days of treatment (Fig 2C).

Fig 2. Characterization of pharmacogenomics meta-data.

(A) Distribution of toxicity levels for 391 compounds from single-organ studies. Compounds were rank-ordered by relative toxicity level. (B) Distribution of toxicity levels for 62 compounds from multi-organ studies. Asterisks indicate compounds showing organ-specific toxicity. (C) Distribution of toxicity levels for two selected drugs at different doses and treatment durations. The same color scale is used in all panels. Missing information is shown in grey. For each row, the sum of samples with level-0 and level-1 toxicity per each organ is 100%. See S5 Table for the exact values used to generate this figure.

Discovery of pharmacotoxicity biomarkers

Using a multi-step method, we attempted to identify robust molecular biomarkers of toxicity to 453 drugs among 4,373 meta-data samples. With these biomarkers, we then built a prediction model that was subsequently tested in an independent dataset of 2,194 samples (Fig 3A and S3 Table). First, we applied Hedges’ g statistic to identify DEGs across multiple orthogonal datasets that compared innocuous versus toxic drug treatments separately for kidney (MA3), liver (MA4), and heart (MA5) tissue. Two additional comparisons were conducted to detect primary responders to broad chemical perturbations in treated versus un-treated specimens for multi-organ datasets (MA1) and to distinguish organ-specific responses to drug treatment for liver, kidney, and heart specimens (MA2). DEGs discovered in MA1 and MA2 were considered along with DEGs from MA3-5 during feature selection so that the results from organ-specific comparisons of toxic versus non-toxic treatment in MA3-5 would be supported by those from comparison of pan-organ and organ-specific responses to drug treatment in MA1 and MA2, respectively.

Fig 3. Meta-analysis identifies candidate biomarkers of organ toxicity.

(A) Schematic flow chart of the meta-analysis. Untreated, untreated samples (pathology score < 0.5); level-0, innocuous treatment; level-1, toxic treatment. (B) Venn diagram for the overlap of DEGs identified from the four drug-related meta-analysis comparisons in (A). Numbers indicate gene counts. (C-I) Forest plots display the study-specific meta-analysis effect-sizes and 95% confidence intervals for the studies included in the training dataset. Plots for the seven DEGs from the MA1, MA3, and MA4 datasets with the greatest absolute average effect-size (> 0.55; See S8 Table) are shown. Plots for the remaining 11 DEGs are shown in S3 Fig. Nrep in untreated versus treated specimens; Spp1 (secreted phosphoprotein 1), Ctss (cathepsin S), Tubb5 (tubulin β5), and Trpm4 (transient receptor potential cation channel, subfamily M, member 4) in level-0 versus level-1 kidney specimens; and Ctsd (cathepsin D) and Tpm4 (tropomyosin 4) in level-0 versus level-1 liver specimens. The sizes of the circles are proportional to the fold-change (log2 ratio). The summarized effect-size (mean fold-change) of all enrolled studies is shown as a black circle at the bottom of the plot. p-value, Z-test for the overall effect of the summarized meta-analysis results for each gene.

Excluding the MA2 comparison, a total of 982 DEGs were found to be relevant to responses to drug treatment (Fig 3B). Intriguingly, about 30% of the DEGs from MA3, MA4, and MA5 were co-detected in more than one organ, suggesting their use in detecting toxicity in multiple organs. However, none of these DEGs was detected in all three organs, likely reflecting a lower detection power in the heart than in the kidney and liver. Some of the organ-selective DEGs identified here correspond to previously reported toxicity markers (S6 Table). For example, DEGs noted in MA3, such as kidney injury molecule-1 (Kim1), ceruloplasmin (Cp), clusterin (Clu), and secreted phosphoprotein 1 (Spp1), are known kidney toxicity markers [24]. DEGs in MA4 included known liver toxicity markers, such as heme oxygenase (decycling) 1 (Hmox1), cathepsin L1 (Ctsl) [9], receptor-interacting serine-threonine kinase 3 (Ripk3), solute carrier family 7 (cationic amino acid transporter, y+ system), member 1 (Slc7a1), and chemokine (C-C motif) ligand 2 (Ccl2) [23].

To identify an efficient, succinct, and robust set of markers with sufficient resolution power to discriminate target classes, we performed feature reduction using the 4,023 DEGs obtained therein. To this end, we applied sPLS-DA, a multivariate exploratory approach that is a computationally efficient one-stage variable-selection and classification method [19], to select the smallest number of features from each comparison that would minimize the average misclassification error rate using a 10-fold cross-validation. A total of 40 genes were selected from the five comparisons (S7 Table). We then used five wrapper methods, LDA, RF, KNN, PKNN, and SVM, in an attempt to further reduce features, especially for comparisons MA1 and MA3 (S2 Fig). Application of these methods resulted in the selection of 21 genes from the five comparisons (Table 1). This rigorous feature reduction was necessary to narrow down our discoveries to the most discriminative and smallest set of biomarkers for greater utility in multiplex gene expression assay platforms, where the number of biomarkers (probes) is often limited technically by the number of resolvable detection channels and economically by the cost per probe.

Fig 3C–3I highlights seven of the 18 genes with the highest fold-change values from the four drug-relevant comparisons. Of these, the top-ranked Spp1 and third-ranked Ctss are well-known kidney pharmacotoxicity markers [5]. The other five genes, Nrep, Trpm4, Tubb5, Ctsd, and Tpm4, are all novel pharmacotoxicity biomarker candidates.

Evaluation of the toxicity prediction model with a test dataset

We compared the performance of five different classification models built using the training samples and selected the best classifier to build a toxicity prediction model for the kidneys, liver, and heart. We applied LDA, RF, KNN, PKNN, and SVM as classification algorithms and assessed the prediction accuracy thereof via 10-fold cross-validation as a performance measure (Table 2). Among these methods, RF with 10-fold cross-validation achieved the best performance (82% correct classification); thus, it was selected as the final classifier. Next, we evaluated the performance of the 21-gene RF prediction model in 2,194 independent test samples in comparison to a model with an identical degree of complexity built using previously known organ-specific toxicity markers, which we compiled from various studies [5, 9, 22–25]. As shown in Table 2, the accuracy of our 21-gene RF prediction model (62%) was slightly higher than that of the model built with previously known biomarkers (60.5%).

Table 2. Performance of prediction models using the 21 identified differentially expressed genes.

Pathway and network level functional characterization of the differentially expressed genes

To characterize underlying biological responses to chemical stress, we separately investigated enriched biological processes associated with DEGs discovered in the MA1, MA3, and MA4 comparisons by GO analysis. Due to a lack of available DEGs, MA5 (level-0 versus level-1 heart specimens) was excluded from the analysis. GO terms related to cell death, stress response, immune response, metabolic process, and signal transduction were identified (Fig 4A). Specifically, in the comparison of untreated versus treated specimens (MA1), the GO terms with the most significant p-values were “response to external stimulus” (p = 2.56×10^−5), “inflammatory response” (p = 3.24×10^−5), and “cellular metabolic process” (p = 5.57×10^−3), whereas terms related to cell death or apoptosis were not significant, suggesting that this comparison detected gene expression signals relevant to the early responses to chemical stress prior to the development of pathophysiological responses. In contrast, DEGs associated with both kidney and liver toxicity responses were enriched for cell death and mitochondrial apoptosis, as well as “response to external stimulus” and “inflammatory response.” Interestingly, GO analyses also yielded liver-specific terms, such as “organ regeneration” (p = 1.05×10^−3) and “cellular metabolic process” (p = 3.35×10^−15). The liver is the only internal human organ capable of regeneration upon tissue loss or after acute toxic injury [44]. Moreover, one of the liver’s most important roles is to metabolize various xenobiotics. Accordingly, our DEGs well represent the biology of organ-specific responses to toxicity.

Fig 4. Functional analysis of DEGs.

(A) Enriched GO terms associated with DEGs from three meta-analysis comparisons. DEGs from MA5 were excluded from the analysis owing to insufficient dataset size. p-value: modified Fisher’s exact test implemented in the Database for Annotation, Visualization and Integrated Discovery (DAVID). (B, C) Highly interconnected subnetworks present within the individual sets of DEGs from MA3 and MA4. A circular node indicates proteins, a diamond node indicates proteins/genes, and solid lines and dashed arrows respectively indicate physical and genetic interactions reported in our input databases (see Methods for details). Node color indicates the median expression fold-change of the training dataset (level-1/level-0).

To further identify protein complexes within DEGs noted in MA3 and MA4 comparisons, we applied the cluster finding algorithm MCODE to the PPI network formed by the 303 genes from MA3 and the 661 genes from MA4 that corresponded with the DEGs. MCODE identified five clusters for MA3 and 14 clusters for MA4 (Fig 4B and 4C). Notably, proteasomes and ribosomes were identified for both MA3 and MA4. Cluster-1 from MA3 and cluster-1 from MA4 are different subsets of the giant proteasome complex. Proteasome-mediated degradation of damaged proteins is an important defense mechanism against xenobiotic-driven reactive oxygen species-mediated stress [45–47]. Meanwhile, cluster-2 from MA3 and cluster-2, -3, -4, and -5 from MA4 are part of the ribosomal complex. Up-regulation of ribosomal machinery is also a known cellular defense mechanism against various genotoxic compounds that restores homeostasis by activating translation [48].

Different PPI components from DNA damage response-related processes were also discovered for both comparisons, namely, cluster-3 from MA3 and cluster-6, -7, -8, -9, -10, and -11 from MA4. Consistent with the GO analyses, MCODE identified protein complexes involved in inflammatory responses, including caspase-1-induced activation of interleukins (IL)-1B and IL-18 (cluster-4 of MA3), elevated components of complement (cluster-5 of MA3), and β-colony-stimulating factor 2 receptor (CSF2RB) complex (cluster-7 of MA4). Notably, complement activation is associated with various organ injuries, including acetaminophen-induced liver injury [49], and CSF2RB is a high-affinity receptor for IL-3, IL-5, and colony-stimulating factor [50]. Intriguingly, we identified several liver-specific PPI subnetworks, including nucleoporin (cluster-12) and hepatic lectin (cluster-14 of MA4), the latter of which is down-regulated in the liver upon exposure to various drugs. As a transmembrane protein, hepatic lectin is a known target of liver-specific drug delivery that internalizes receptor-bound molecules and viruses through endocytosis [51]. Thus, its down-regulation may reflect a defense mechanism against hepatotoxins.

NREP and CTSD: novel biomarkers of toxicity in human cell lines

Recapitulation of our biomarker candidates in appropriate human cell models would potentially allow for their adoption into prediction of toxicity for drug candidates early in drug development. Accordingly, the following two-track approach was undertaken to identify toxicity markers applicable to human cell lines: 1) evaluating the performance of eight of the 21 candidate markers from MA1 (untreated versus treated) and MA4 (level-0 versus level-1 liver specimens) in TG-GATEs, a massive in vitro toxicogenomic dataset for human hepatocytes [13]; 2) experimentally testing expression changes for five of the 21 candidate markers from MA1, MA3, and MA4 that were selected based on pooled effect-size (> 0.55) after exposure to relevant pharmacological compounds in the human cell lines HEK293 and HepG2. Three of the 21 candidate genes from MA5 were not included because a human heart cell line is not commercially available (S8 Table).
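
The pooled effect-size cutoff used for candidate selection can be made concrete with a short sketch. The section does not restate how the pooled effect size was computed, so the code below is an assumption: it uses Hedges' g per study with fixed-effect inverse-variance pooling, a standard approach from the meta-analysis literature cited in the reference list [21]. The expression values are invented, and applying the 0.55 cutoff to the absolute value is likewise an assumption.

```python
from math import sqrt
from statistics import mean, variance

def hedges_g(treated, control):
    """Bias-corrected standardized mean difference and its approximate variance."""
    n1, n0 = len(treated), len(control)
    pooled_sd = sqrt(((n1 - 1) * variance(treated) + (n0 - 1) * variance(control))
                     / (n1 + n0 - 2))
    d = (mean(treated) - mean(control)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n0) - 9)          # small-sample correction factor
    var_d = (n1 + n0) / (n1 * n0) + d ** 2 / (2 * (n1 + n0))
    return j * d, j ** 2 * var_d

def pooled_effect(studies):
    """Fixed-effect pooled estimate: inverse-variance weighted mean of per-study g."""
    effects = [hedges_g(treated, control) for treated, control in studies]
    weights = [1 / var for _, var in effects]
    return sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)

# Hypothetical expression values for one gene in three studies: (level-1, level-0).
studies = [
    ([2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0, 5.0]),
    ([5.1, 5.9, 6.4, 7.0], [4.8, 5.0, 5.5, 6.1]),
    ([3.2, 3.9, 4.4, 4.1, 5.0, 4.6], [3.0, 3.1, 3.8, 3.5, 4.2, 3.3]),
]
pooled = pooled_effect(studies)
passes_cutoff = abs(pooled) > 0.55   # cutoff quoted in the text; absolute value assumed
print(f"pooled effect size = {pooled:.3f}, passes cutoff: {passes_cutoff}")
```

A random-effects model would add a between-study variance term to each weight; the fixed-effect form is shown only because it is the shortest correct illustration.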

TG-GATEs comprises 1,410 transcriptome profiles and associated cell titers for human primary hepatocytes exposed to 119 pharmacological compounds. For this analysis, we first performed t-tests for differences in expression of the five early-toxicity marker candidates, NREP, ATRN, TBXA2R, KIFC1, and EPHX1, between untreated and treated samples. Of these, only NREP, which had the largest fold-changes in the meta-analysis (S8 Table), showed consistent and statistically significant depletion upon drug treatment (Fig 5A). Second, fold-changes in the expression of each of the three liver marker candidate genes, CTSD, TPM4, and RPL35A, were estimated using varying toxicity thresholds determined by relative cell titers (Fig 5B). Of these candidates, CTSD was found to be a statistically significant positive toxicity marker that showed an increasing fold-change with decreasing cell viability threshold (Fig 5B).
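
The threshold sweep described for the liver marker candidates can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: samples are represented as hypothetical (relative viability in %, linear-scale expression) pairs, a sample counts as toxic when its viability falls below the threshold, and the fold-change is the ratio of mean expression in toxic versus innocuous samples.

```python
from statistics import mean

def fold_change_by_threshold(samples, thresholds):
    """Toxic/innocuous fold-change of mean expression at each viability threshold."""
    result = {}
    for threshold in thresholds:
        toxic = [expr for viability, expr in samples if viability < threshold]
        innocuous = [expr for viability, expr in samples if viability >= threshold]
        if toxic and innocuous:            # skip thresholds that leave a group empty
            result[threshold] = mean(toxic) / mean(innocuous)
    return result

# Invented data: expression rises only once viability drops markedly.
samples = [(100, 1.0), (90, 1.0), (80, 1.0), (70, 1.0), (60, 1.0),
           (50, 1.0), (40, 1.5), (30, 2.0), (20, 3.0), (10, 4.0)]

fc = fold_change_by_threshold(samples, thresholds=[80, 60, 40])
for threshold, value in fc.items():
    print(f"viability < {threshold}%: fold-change = {value:.2f}")
```

For a marker that tracks toxicity, the fold-change grows as the threshold becomes stricter, which is the pattern reported for CTSD in Fig 5B.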

Fig 5. Computational and experimental validations identify NREP and CTSD as biomarkers of toxicity in human cell lines.

(A) Density plots comparing expression levels of NREP between untreated and treated samples of liver primary hepatocytes reported in TG-GATEs. (B) Boxplots display fold-changes in CTSD (toxic/innocuous) at each of the given cell viability thresholds measured for the liver primary hepatocytes reported in TG-GATEs. * t-test p-value < 0.05, ** < 0.001. (C, D) Dose-responsive viability of HEK293 (C) and HepG2 (D) cells exposed to cisplatin (C) or acetaminophen (D). DMSO was used to dissolve the compounds. Cell viability was measured by MTS assay. Error bars represent ± standard deviation of triplicate experiments. See S4A and S4B Fig for the results with the same compounds dissolved in growth media. (E, F) NREP mRNA levels after exposure to the indicated concentrations of cisplatin for 72 h and acetaminophen for 48 h, respectively, determined by RT-PCR. (G-H) qRT-PCR assays for NREP (G) and CTSD (H). Y-axis indicates fold-changes in expression compared to chemically untreated samples (n = 5). Level-0 and level-1 drug concentrations for DMSO and DMEM were selected based on cell viability of > or < 60%, respectively, in C-D and S4A and S4B Fig. *p < 0.05, ** p < 0.001; Student’s t-test. Error bars represent ± standard deviation.

In parallel, we experimentally evaluated five novel toxicity biomarker candidates, NREP, TUBB5, TRPM4, CTSD, and TPM4, which showed the largest pooled effect-size (> 0.55) in RF analysis (S8 Table), in the human cell lines HEK293 and HepG2. In comparison to innocuous samples (level-0), NREP, a DEG identified from the comparison of untreated and treated specimens, was markedly down-regulated in both HEK293 (Fig 5C, 5E and 5G) and HepG2 cells (Fig 5D, 5F and 5G) treated with toxic concentrations (level-1) of known organ-selective toxins. Expression of CTSD, a DEG identified from the comparison of level-0 and level-1 liver specimens (MA4), was significantly elevated upon exposure to toxic concentrations of the liver toxin acetaminophen (Fig 5D and 5H). Notably, the observed expression changes in the two validated biomarker candidates were found to be robust against the type of vehicle used for drug preparation (DMSO or growth medium) (Fig 5G and 5H).

In contrast, TRPM4 and TPM4, DEGs discovered in the comparisons of level-0 and level-1 kidney specimens (MA3) and liver specimens (MA4), respectively, did not show consistent expression changes in relevant human cells treated with toxic concentrations of the drug compounds (S4E and S4F Fig). TUBB5 showed no changes in gene expression in HEK293 cells (data not shown).

Importantly, NREP and CTSD were cross-validated by a two-track approach in human cells: computationally validated in silico via TG-GATEs and experimentally validated in vitro in the human cell lines HEK293 and HepG2. We found that NREP is a multi-organ biomarker for a wide variety of chemical perturbations and is down-regulated in response to drug-induced toxicity. We also found that CTSD is a liver-specific biomarker of toxicity that is concurrently induced with the onset of a pathological phenotype following various chemical perturbations.

Discussion


Biomarkers of drug-induced toxicity enable cost effective pre-clinical evaluation of drug candidates in cell line models; however, their use is currently limited by a lack of markers robustly applicable to a wide variety of chemical compounds and predictive of in vivo pathological outcomes.

In the present study, we utilized massive toxicogenomic meta-datasets from various in vivo rat studies covering 453 pharmacological compounds at different doses and durations and accompanied by histopathological information. With these datasets, we performed meta-analysis, followed by multiple feature reduction procedures, to identify organ-specific biomarker candidates of drug-induced toxicity (Table 3). Subsequent in silico and in vitro analyses in human cell lines validated NREP and CTSD as strong biomarkers of drug-induced toxicity. The canonical functions of NREP include cell migration through activation of RalA in the initial wound matrix of proto-myofibroblasts and myofibroblasts [52] and wound healing in human and mouse cell lines [53, 54]. CTSD is a lysosomal aspartic endopeptidase known to mediate apoptosis in response to oxidative stress [55]. In the present study, NREP was down-regulated in response to general chemical stress and coupled to the onset of multi-organ toxicity. In addition, CTSD was found to be up-regulated upon drug-induced toxicity in the liver.

Herein, the compilation of the large number of relevant toxicogenomic studies enabled us to perform meta-analyses capable of identifying robust prediction biomarkers that were otherwise undiscovered in the individual studies because of their limited statistical power. Nevertheless, combining different studies inevitably incorporates experimental biases. Potential sources of bias in toxicogenomic meta-analyses include experimental model selection, drug selection, organ selection, and pathological phenotype selection. In the present study, to tackle the bias in experimental model selection, both in vivo rat models and human cell line models were employed in the initial discovery and subsequent validation, respectively. Several previous studies have indeed shown that these types of pre-clinical models successfully predict toxicity in humans [56, 57]. Regarding the potential bias in drug selection, among 453 unique compounds compiled in our study, the three most-frequently tested drugs were acetaminophen (n = 191, 2.9%), diquat dibromide (n = 142, 2.2%) and 1,4-dichlorobenzene (n = 80, 1.2%), indicating that our study is not heavily biased in favor of only a few compounds. To avoid potential organ bias in combined analyses, we conducted separate analyses for three major organs: kidney (n = 1,140), liver (n = 4,813), and heart (n = 614). With regard to possible pathological phenotype bias, our compiled dataset was annotated with comprehensive organ-specific histopathology terms, as shown in S1 Table. The most frequently used terms were “infiltration, cellular” (n = 4,701, 71.6%) and “necrosis” (n = 3,599, 54.8%).
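
The percentages quoted above can be checked with a few lines of arithmetic. The sketch assumes that the denominator is the total number of samples across the three organs (1,140 + 4,813 + 614 = 6,567), which the text implies but does not state explicitly; the variable names are illustrative.

```python
# Sample counts per organ, as given in the text.
organ_counts = {"kidney": 1140, "liver": 4813, "heart": 614}
total = sum(organ_counts.values())

# Counts for the most frequent compounds and histopathology terms, as given in the text.
quoted_counts = {
    "acetaminophen": 191,
    "diquat dibromide": 142,
    "1,4-dichlorobenzene": 80,
    "infiltration, cellular": 4701,
    "necrosis": 3599,
}

# Share of all samples, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in quoted_counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}% of {total} samples")
```

Under this assumption, every computed share agrees with the value quoted in the paragraph, which supports reading all of the percentages as fractions of the 6,567 compiled samples.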

Additional challenges associated with heterogeneous sources of data involve addressing nonsystematic variation originating from batch effects, such as differences in laboratory conditions, reagent lots, and personnel [58]. Although several available algorithms are able to correct for batch effects, these would not be appropriate or efficient for use in studies such as ours, since our compiled datasets originated from studies of differing experimental conditions and of many different chemical stress conditions. Thus, the studies would be highly confounded with several potential sources of batch effect. The presence of nonsystematic variation may result in false positives and/or false negatives. Of these, false positives were the greatest concern in our study. To address this concern, we attempted to validate positive results experimentally in human cell lines. In doing so, we identified and validated two robust biomarkers of drug-induced toxicity.

In our study, organ-specific interactions were assessed only for toxicity levels (level-0 versus level-1) not for treatment status (untreated versus treated), as the focus of our study was to identify candidate biomarkers that could predict organ-specific toxicity rather than to outline organ-specific responses to chemical stress. Accordingly, we initially attempted to discover DEGs in MA1 without considering organ-specificity, since, when identified, these DEGs may serve as ubiquitous chemical stress markers to help interpret the results of organ-specific comparisons (MA3-5). NREP, which was identified from MA1, was both computationally and experimentally validated in our study as a multi-organ responder to chemical stress coupled to toxic phenotype.

Not all of the biomarkers identified in vivo were recapitulated in human cells. Only two of eight genes were reproduced in an orthogonal liver cell line toxicogenomic dataset, and two of five genes were experimentally validated by qRT-PCR in human cell lines. This limited coherence may originate from differences in species and between in vivo and in vitro experimental settings. One potential difference in the experimental settings that may have affected our results was the type of vehicle used for drug preparation. DMSO, a popular cryoprotectant for long-term cell storage, is widely used as a solvent and vehicle for many pharmaceutical compounds, especially in in vitro settings. Frequently, the effect of DMSO on drug responses and toxicity is ignored. Nevertheless, in our study, we experimentally confirmed that the two drug-induced toxicity biomarkers were not affected by the type of drug solvent.

In this study, we identified and validated two novel gene expression biomarkers that were found to be predictive of drug-induced toxicity. If incorporated into an existing panel of biomarkers or further developed into an independent cell line-based molecular assay system, these biomarkers might enable guided design of lead compounds, obviating the need to test large numbers of drugs in animals in order to evaluate in vivo toxicity. This would improve the overall efficiency of drug-development processes.

Supporting Information

S1 Fig. Determination of the PLS-DA threshold of the average toxicity score for class assignment.



S2 Fig. Feature reduction by wrappers.



S3 Fig. Meta-analysis identifies toxicity biomarker candidates (second tier), related to Fig 3C–3I.



S4 Fig. Experimental validations identify NREP and CTSD as biomarkers of toxicity in human cell lines, related to Fig 5.



S1 Table. List of curated pathology terms for each organ according to the standardized ToxRefDB vocabulary.



S2 Table. List of samples for meta-analysis.



S3 Table. Summary of the analyzed meta-data.



S4 Table. The 44 previously known biomarkers.



S5 Table. Distribution of compound selective toxicity levels in meta-data, related to Fig 2.



S6 Table. List of DEGs identified in each meta-analysis.



S7 Table. List of 40 genes selected from sPLS-DA.



S8 Table. Summary statistics for the 18 toxicity biomarker candidates.

Acknowledgments




We are deeply grateful to KiYoung Lee for his mentorship, support, and seminal contributions to computational biology and systems medicine.

Author Contributions

Conceived and designed the experiments: HSK KL. Performed the experiments: Ju-Hwa K. SYK DJ. Analyzed the data: HK. Contributed reagents/materials/analysis tools: HJP Jihyun K. Wrote the paper: HSK KL HK SJ.

References


  1. Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, et al. How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nature reviews Drug discovery. 2010;9(3):203–14. doi: 10.1038/nrd3078 pmid:20168317.
  2. Andersen ME, Krewski D. Toxicity testing in the 21st century: bringing the vision to life. Toxicological sciences: an official journal of the Society of Toxicology. 2009;107(2):324–30. doi: 10.1093/toxsci/kfn255 pmid:19074763.
  3. Chen M, Zhang M, Borlak J, Tong W. A decade of toxicogenomic research and its contribution to toxicological science. Toxicological sciences: an official journal of the Society of Toxicology. 2012;130(2):217–28. doi: 10.1093/toxsci/kfs223 pmid:22790972.
  4. Ganter B, Tugendreich S, Pearson CI, Ayanoglu E, Baumhueter S, Bostian KA, et al. Development of a large-scale chemogenomics database to improve drug candidate selection and to understand mechanisms of chemical toxicity and action. Journal of biotechnology. 2005;119(3):219–44. doi: 10.1016/j.jbiotec.2005.03.022 pmid:16005536.
  5. Wang EJ, Snyder RD, Fielden MR, Smith RJ, Gu YZ. Validation of putative genomic biomarkers of nephrotoxicity in rats. Toxicology. 2008;246(2–3):91–100. doi: 10.1016/j.tox.2007.12.031 pmid:18289764.
  6. Fielden MR, Eynon BP, Natsoulis G, Jarnagin K, Banas D, Kolaja KL. A gene expression signature that predicts the future onset of drug-induced renal tubular toxicity. Toxicologic pathology. 2005;33(6):675–83. doi: 10.1080/01926230500321213 pmid:16239200.
  7. Natsoulis G, Pearson CI, Gollub J, B PE, Ferng J, Nair R, et al. The liver pharmacological and xenobiotic gene response repertoire. Molecular systems biology. 2008;4:175. doi: 10.1038/msb.2008.9 pmid:18364709. Pubmed Central PMCID: 2290941.
  8. Zhang JD, Berntenis N, Roth A, Ebeling M. Data mining reveals a network of early-response genes as a consensus signature of drug-induced in vitro and in vivo toxicity. The pharmacogenomics journal. 2014;14(3):208–16. doi: 10.1038/tpj.2013.39 pmid:24217556. Pubmed Central PMCID: 4034126.
  9. Minami K, Saito T, Narahara M, Tomita H, Kato H, Sugiyama H, et al. Relationship between hepatic gene expression profiles and hepatotoxicity in five typical hepatotoxicant-administered rats. Toxicological sciences: an official journal of the Society of Toxicology. 2005;87(1):296–305. doi: 10.1093/toxsci/kfi235 pmid:15976192.
  10. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic acids research. 2013;41(Database issue):D991–5. doi: 10.1093/nar/gks1193 pmid:23193258. Pubmed Central PMCID: 3531084.
  11. Rustici G, Kolesnikov N, Brandizi M, Burdett T, Dylag M, Emam I, et al. ArrayExpress update—trends in database growth and links to data analysis tools. Nucleic acids research. 2013;41(Database issue):D987–90. doi: 10.1093/nar/gks1174 pmid:23193272. Pubmed Central PMCID: 3531147.
  12. Waters M, Stasiewicz S, Merrick BA, Tomer K, Bushel P, Paules R, et al. CEBS—Chemical Effects in Biological Systems: a public data repository integrating study design and toxicity data with microarray and proteomics data. Nucleic acids research. 2008;36(Database issue):D892–900. doi: 10.1093/nar/gkm755 pmid:17962311. Pubmed Central PMCID: 2238989.
  13. Igarashi Y, Nakatsu N, Yamashita T, Ono A, Ohno Y, Urushidani T, et al. Open TG-GATEs: a large-scale toxicogenomics database. Nucleic acids research. 2015;43(Database issue):D921–7. doi: 10.1093/nar/gku955 pmid:25313160. Pubmed Central PMCID: 4384023.
  14. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–64. doi: 10.1093/biostatistics/4.2.249 pmid:12925520.
  15. Knudsen TB, Martin MT, Kavlock RJ, Judson RS, Dix DJ, Singh AV. Profiling the activity of environmental chemicals in prenatal developmental toxicity studies using the U.S. EPA's ToxRefDB. Reproductive toxicology. 2009;28(2):209–19. doi: 10.1016/j.reprotox.2009.03.016 pmid:19446433.
  16. Mann PC, Vahle J, Keenan CM, Baker JF, Bradley AE, Goodman DG, et al. International harmonization of toxicologic pathology nomenclature: an overview and review of basic principles. Toxicologic pathology. 2012;40(4 Suppl):7S–13S. doi: 10.1177/0192623312438738 pmid:22637736.
  17. Yu T, Peng H, Sun W. Incorporating Nonlinear Relationships in Microarray Missing Value Imputation. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 2011;8(3):723–31. doi: 10.1109/TCBB.2010.73 pmid:20733236. Pubmed Central PMCID: 3624752.
  18. Perez-Enciso M, Tenenhaus M. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human genetics. 2003;112(5–6):581–92. pmid:12607117.
  19. Le Cao KA, Boitard S, Besse P. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC bioinformatics. 2011;12:253. doi: 10.1186/1471-2105-12-253 pmid:21693065. Pubmed Central PMCID: 3133555.
  20. Bartel J, Krumsiek J, Theis FJ. Statistical methods for the analysis of high-throughput metabolomics data. Computational and structural biotechnology journal. 2013;4:e201301009. doi: 10.5936/csbj.201301009 pmid:24688690. Pubmed Central PMCID: 3962125.
  21. Hedges LV, Olkin I. Statistical methods for meta-analysis. Orlando: Academic Press; 1985.
  22. Dunnick J, Blackshear P, Kissling G, Cunningham M, Parker J, Nyska A. Critical pathways in heart function: bis(2-chloroethoxy)methane-induced heart gene transcript change in F344 rats. Toxicologic pathology. 2006;34(4):348–56. doi: 10.1080/01926230600798583 pmid:16844662.
  23. Huang L, Heinloth AN, Zeng ZB, Paules RS, Bushel PR. Genes related to apoptosis predict necrosis of the liver as a phenotype observed in rats exposed to a compendium of hepatotoxicants. BMC genomics. 2008;9:288. doi: 10.1186/1471-2164-9-288 pmid:18558008. Pubmed Central PMCID: 2478688.
  24. Kondo C, Minowa Y, Uehara T, Okuno Y, Nakatsu N, Ono A, et al. Identification of genomic biomarkers for concurrent diagnosis of drug-induced renal tubular injury using a large-scale toxicogenomics database. Toxicology. 2009;265(1–2):15–26. doi: 10.1016/j.tox.2009.09.003 pmid:19761811.
  25. Mori Y, Kondo C, Tonomura Y, Torii M, Uehara T. Identification of potential genomic biomarkers for early detection of chemically induced cardiotoxicity in rats. Toxicology. 2010;271(1–2):36–44. doi: 10.1016/j.tox.2010.02.015 pmid:20211217.
  26. Inza I, Larranaga P, Blanco R, Cerrolaza AJ. Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial intelligence in medicine. 2004;31(2):91–103. doi: 10.1016/j.artmed.2004.01.007 pmid:15219288.
  27. Dennis G Jr., Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, et al. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome biology. 2003;4(5):P3. pmid:12734009.
  28. Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics. 2003;4:2. pmid:12525261. Pubmed Central PMCID: 149346.
  29. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research. 2003;13(11):2498–504. doi: 10.1101/gr.1239303 pmid:14597658. Pubmed Central PMCID: 403769.
  30. Bader GD, Betel D, Hogue CW. BIND: the Biomolecular Interaction Network Database. Nucleic acids research. 2003;31(1):248–50. pmid:12519993. Pubmed Central PMCID: 165503.
  31. Chatr-Aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, et al. The BioGRID interaction database: 2013 update. Nucleic acids research. 2013;41(Database issue):D816–23. doi: 10.1093/nar/gks1158 pmid:23203989. Pubmed Central PMCID: 3531226.
  32. Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, et al. CORUM: the comprehensive resource of mammalian protein complexes—2009. Nucleic acids research. 2010;38(Database issue):D497–501. doi: 10.1093/nar/gkp914 pmid:19884131. Pubmed Central PMCID: 2808912.
  33. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic acids research. 2004;32(Database issue):D449–51. doi: 10.1093/nar/gkh086 pmid:14681454. Pubmed Central PMCID: 308820.
  34. Mathivanan S, Ahmed M, Ahn NG, Alexandre H, Amanchy R, Andrews PC, et al. Human Proteinpedia enables sharing of human protein data. Nature biotechnology. 2008;26(2):164–7. doi: 10.1038/nbt0208-164 pmid:18259167.
  35. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, et al. The IntAct molecular interaction database in 2012. Nucleic acids research. 2012;40(Database issue):D841–6. doi: 10.1093/nar/gkr1088 pmid:22121220. Pubmed Central PMCID: 3245075.
  36. Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, et al. MINT, the molecular interaction database: 2012 update. Nucleic acids research. 2012;40(Database issue):D857–61. doi: 10.1093/nar/gkr930 pmid:22096227. Pubmed Central PMCID: 3244991.
  37. Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, et al. The MIPS mammalian protein-protein interaction database. Bioinformatics. 2005;21(6):832–4. doi: 10.1093/bioinformatics/bti115 pmid:15531608.
  38. Brown KR, Jurisica I. Online predicted human interaction database. Bioinformatics. 2005;21(9):2076–82. doi: 10.1093/bioinformatics/bti273 pmid:15657099.
  39. Breuer K, Foroushani AK, Laird MR, Chen C, Sribnaia A, Lo R, et al. InnateDB: systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic acids research. 2013;41(Database issue):D1228–33. doi: 10.1093/nar/gks1147 pmid:23180781. Pubmed Central PMCID: 3531080.
  40. Chautard E, Fatoux-Ardore M, Ballut L, Thierry-Mieg N, Ricard-Blum S. MatrixDB, the extracellular matrix interaction database. Nucleic acids research. 2011;39(Database issue):D235–40. doi: 10.1093/nar/gkq830 pmid:20852260. Pubmed Central PMCID: 3013758.
  41. Calderone A, Castagnoli L, Cesareni G. mentha: a resource for browsing integrated protein-interaction networks. Nature methods. 2013;10(8):690–1. doi: 10.1038/nmeth.2561 pmid:23900247.
  42. Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, et al. Database resources of the National Center for Biotechnology Information. Nucleic acids research. 2000;28(1):10–4. pmid:10592169. Pubmed Central PMCID: 102437.
  43. Lee K, Byun K, Hong W, Chuang HY, Pack CG, Bayarsaikhan E, et al. Proteome-wide discovery of mislocated proteins in cancer. Genome Res. 2013;23(8):1283–94. doi: 10.1101/gr.155499.113 pmid:23674306. Pubmed Central PMCID: 3730102.
  44. Stanger BZ. Cellular homeostasis and repair in the mammalian liver. Annual review of physiology. 2015;77:179–200. doi: 10.1146/annurev-physiol-021113-170255 pmid:25668020.
  45. Aiken CT, Kaake RM, Wang X, Huang L. Oxidative stress-mediated regulation of proteasome complexes. Molecular & cellular proteomics: MCP. 2011;10(5):R110–006924. doi: 10.1074/mcp.M110.006924 pmid:21543789. Pubmed Central PMCID: 3098605.
  46. Ray PD, Huang BW, Tsuji Y. Reactive oxygen species (ROS) homeostasis and redox regulation in cellular signaling. Cellular signalling. 2012;24(5):981–90. doi: 10.1016/j.cellsig.2012.01.008 pmid:22286106. Pubmed Central PMCID: 3454471.
  47. Schmidt M, Finley D. Regulation of proteasome activity in health and disease. Biochimica et biophysica acta. 2014;1843(1):13–25. doi: 10.1016/j.bbamcr.2013.08.012 pmid:23994620. Pubmed Central PMCID: 3858528.
  48. Godderis L, Thomas R, Hubbard AE, Tabish AM, Hoet P, Zhang L, et al. Effect of chemical mutagens and carcinogens on gene expression profiles in human TK6 cells. PloS one. 2012;7(6):e39205. doi: 10.1371/journal.pone.0039205 pmid:22723965. Pubmed Central PMCID: 3377624.
  49. Singhal R, Ganey PE, Roth RA. Complement activation in acetaminophen-induced liver injury in mice. The Journal of pharmacology and experimental therapeutics. 2012;341(2):377–85. doi: 10.1124/jpet.111.189837 pmid:22319198. Pubmed Central PMCID: 3336815.
  50. Chen Q, Wang X, O'Neill FA, Walsh D, Fanous A, Kendler KS, et al. Association study of CSF2RB with schizophrenia in Irish family and case—control samples. Molecular psychiatry. 2008;13(10):930–8. doi: 10.1038/ pmid:17667962. Pubmed Central PMCID: 4034748.
  51. Onizuka T, Shimizu H, Moriwaki Y, Nakano T, Kanai S, Shimada I, et al. NMR study of ligand release from asialoglycoprotein receptor under solution conditions in early endosomes. The FEBS journal. 2012;279(15):2645–56. doi: 10.1111/j.1742-4658.2012.08643.x pmid:22613667.
  52. Shi J, Badri KR, Choudhury R, Schuger L. P311-induced myofibroblasts exhibit ameboid-like migration through RalA activation. Experimental cell research. 2006;312(17):3432–42. doi: 10.1016/j.yexcr.2006.07.016 pmid:16934802.
  53. Tan JL, Peng X, Luo GX, Ma B, Cao C, He WF, et al. Investigating the Role of P311 in the Hypertrophic Scar. PloS one. 2010;5(4). doi: 10.1371/journal.pone.0009995 pmid:20404911.
  54. Sun W, Yao ZH, Zhan RX, Zhang XR, Cui YY, Tan JL, et al. [Effects of P 311 on the migration of epidermal stem cells in mice with superficial partial-thickness burn and injured cell model in vitro]. Zhonghua shao shang za zhi = Zhonghua shaoshang zazhi = Chinese journal of burns. 2012;28(3):213–8. pmid:22967977.
  55. Hah YS, Noh HS, Ha JH, Ahn JS, Hahm JR, Cho HY, et al. Cathepsin D inhibits oxidative stress-induced cell death via activation of autophagy in cancer cells. Cancer letters. 2012;323(2):208–14. doi: 10.1016/j.canlet.2012.04.012 pmid:22542809.
  56. Olson H, Betton G, Robinson D, Thomas K, Monro A, Kolaja G, et al. Concordance of the toxicity of pharmaceuticals in humans and in animals. Regulatory toxicology and pharmacology: RTP. 2000;32(1):56–67. doi: 10.1006/rtph.2000.1399 pmid:11029269.
  57. Tamaki C, Nagayama T, Hashiba M, Fujiyoshi M, Hizue M, Kodaira H, et al. Potentials and limitations of nonclinical safety assessment for predicting clinical adverse drug reactions: correlation analysis of 142 approved drugs in Japan. The Journal of toxicological sciences. 2013;38(4):581–98. pmid:23824014.
  58. Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature reviews Genetics. 2010;11(10):733–9. doi: 10.1038/nrg2825 pmid:20838408. Pubmed Central PMCID: 3880143.