
Translating Clinical Findings into Knowledge in Drug Safety Evaluation - Drug Induced Liver Injury Prediction System (DILIps)

  • Zhichao Liu,

    Affiliation Division of Systems Biology, National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, Arkansas, United States of America

  • Qiang Shi,

    Affiliation Division of Systems Biology, National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, Arkansas, United States of America

  • Don Ding,

    Affiliation ICF International Company at FDA's National Center for Toxicological Research, Jefferson, Arkansas, United States of America

  • Reagan Kelly,

    Affiliation ICF International Company at FDA's National Center for Toxicological Research, Jefferson, Arkansas, United States of America

  • Hong Fang,

    Affiliation ICF International Company at FDA's National Center for Toxicological Research, Jefferson, Arkansas, United States of America

  • Weida Tong

    Affiliation Division of Systems Biology, National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, Arkansas, United States of America


Abstract

Drug-induced liver injury (DILI) is a significant concern in drug development due to the poor concordance between preclinical and clinical findings of liver toxicity. We hypothesized that the DILI types (hepatotoxic side effects) seen in the clinic can be translated into the development of predictive in silico models for use in the drug discovery phase. We identified 13 hepatotoxic side effects with high accuracy for classifying marketed drugs for their DILI potential. We then developed in silico predictive models for each of these 13 side effects, which were further combined to construct a DILI prediction system (DILIps). The DILIps yielded 60–70% prediction accuracy for three independent validation sets. To enhance the confidence for identification of drugs that cause severe DILI in humans, the “Rule of Three” was developed in DILIps by using a consensus strategy based on the 13 models. This gave a high positive predictive value (91%) when applied to an external dataset containing 206 drugs from three independent literature datasets. Using the DILIps, we screened all the drugs in DrugBank and investigated their DILI potential in terms of protein targets and therapeutic categories through network modeling. We demonstrated that two therapeutic categories, anti-infectives for systemic use and musculoskeletal system drugs, were enriched for DILI, which is consistent with current knowledge. We also identified protein targets and pathways that are related to drugs that cause DILI by using pathway analysis and co-occurrence text mining. While marketed drugs were the focus of this study, the DILIps has potential as an evaluation tool to screen and prioritize new drug candidates or chemicals, such as environmental chemicals, to avoid those that might cause liver toxicity. We expect that the methodology can also be applied to other drug safety endpoints, such as renal or cardiovascular toxicity.

Author Summary

Translational research involves the use of clinical data to address challenges in drug discovery and development. The rationale behind this study is that the side effects observed in clinical trials and post-marketing surveillance can be translated into a screening system for use in drug discovery. As a proof-of-concept study, we developed an in silico system based on 13 hepatotoxic side effects to predict drug-induced liver injury (DILI), which is one of the most frequent causes of drug failure in clinical trials and of post-marketing withdrawal, and also one of the most difficult clinical endpoints to predict from preclinical studies. We first identified 13 types of liver injury that yielded high accuracy in distinguishing drugs known to cause DILI from those that do not. To apply these 13 hepatotoxic side effects effectively in the drug discovery process, we developed in silico models for each of these side effects based solely on chemical structure data. Finally, we constructed a DILI prediction system (DILIps) by combining these 13 in silico models in a consensus fashion, which yielded >91% positive predictive value for DILI in humans. The DILIps methodology can be extended to address other drug safety issues, such as renal and cardiovascular toxicity.

Introduction

Drug-induced liver injury (DILI) poses a significant challenge to medical and pharmaceutical professionals as well as regulatory agencies. It is the leading cause of acute liver failure, which has a high mortality rate (30%) because treatment is limited by the availability of livers for transplantation [1]. Although many dangerous drugs are identified during animal testing, thus protecting humans from this damage, a consortium determined that about half of the drugs that cause human hepatotoxicity were not identified as having this potential in nonclinical animal testing [2]. Many drugs have been withdrawn from the market or have received restrictions and warnings due to DILI [3]. Regulatory agencies such as the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA) and Health Canada have released DILI information and guidance for the pharmaceutical industry, highlighting both the significance of and the difficulties in DILI research. At the FDA, the Critical Path Initiative identified DILI as a key area of focus in a concerted effort to broaden the agency's knowledge for better evaluation tools and safety biomarkers.

Determining the potential for a drug candidate to cause DILI in humans is challenging for two reasons. First, standard preclinical animal studies do not effectively predict DILI events in humans. In one notorious example, five subjects in a phase 2 clinical trial experienced fatal hepatotoxicity induced by fialuridine, an investigational nucleoside analogue that showed no liver damage in animal studies [4]. Across 221 pharmaceuticals, the overall concordance of liver toxicity between humans and experimental animals is as low as 55%, in sharp contrast with the concordance for other target organs such as the hematological (91%), gastrointestinal (85%), and cardiovascular (80%) systems [2]. Second, even well-controlled clinical trials fail to accurately predict post-marketing DILI events. The main reason is limited statistical power: the per-subject risk of severe idiosyncratic DILI is very low, while clinical trials are usually carried out with only several thousand patients [5], rendering them significantly underpowered to detect rare DILI events.
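A quick calculation shows why trials of this size are underpowered. The sketch below assumes a hypothetical per-patient incidence of severe DILI of 1 in 10,000 and independent patients; both numbers are illustrative and are not taken from the cited studies.

```python
import math


def prob_at_least_one_case(incidence: float, n_patients: int) -> float:
    """P(a trial of n_patients observes >= 1 case), assuming each patient
    independently carries the same per-patient risk `incidence`."""
    return 1.0 - (1.0 - incidence) ** n_patients


def patients_needed(incidence: float, confidence: float = 0.95) -> int:
    """Smallest trial size whose chance of seeing >= 1 case reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - incidence))


risk = 1 / 10_000  # hypothetical incidence, for illustration only
for n in (1_000, 3_000, 5_000):
    print(f"{n:>5} patients: P(>=1 case) = {prob_at_least_one_case(risk, n):.2f}")
print("patients needed for a 95% chance:", patients_needed(risk))
```

Under these assumptions a 3,000-patient trial has only about a 26% chance of observing even one case, and roughly 30,000 patients (about 3 divided by the incidence) would be needed for a 95% chance.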

To enhance the predictability of DILI, many researchers have explored novel approaches. Notable examples include (a) development of new DILI biomarkers [6], (b) introduction of high-content screening [7], (c) adoption of more sensitive animal models [8], [9], [10], and (d) use of toxicogenomics [11]. Most of these investigations focus on developing biomarkers in either animal or in vitro models for predicting DILI in humans, which still requires synthesis of the drug and elaborate testing. An in silico approach, by contrast, could inform chemists at the earliest point in the drug discovery pipeline and help them select the best chemical structures.

We hypothesized that there exists a distinct set of liver side effects that can be used to characterize the DILI risk of drugs in humans. We identified 13 types of hepatotoxicity (hepatotoxic side effects, or HepSEs) from the hepatobiliary disorders system organ class in the Medical Dictionary for Regulatory Activities (MedDRA) ontology, and found that these 13 HepSEs can discriminate DILI drugs from non-DILI drugs with high accuracy (∼83%). Because side effects are clinical observations obtained from clinical trials or post-marketing surveillance, and thus have limited utility in drug discovery, we developed quantitative structure-activity relationship (QSAR) models for each of the HepSEs. We then constructed a DILI prediction system (DILIps) based on the 13 HepSE models, with an improved prediction strategy using a “Rule of Three” (RO3) criterion (incriminated by 3 or more HepSE models). The system was evaluated on several external test sets, with performance surpassing most in silico models in the field. Finally, we screened all drugs in DrugBank using the DILIps and evaluated the RO3 drugs in terms of therapeutic use and drug targets.

Results

Identification and assessment of hepatotoxic side effects (HepSEs)

Figure 1 gives an overview of the approach. First, HepSEs were identified and assessed. We used the SIDER database [12] to identify drugs and their associated side effects. Of the 1450 side effects in the database, we selected only those associated with more than 20 drugs (an arbitrary cut-off), which yielded 473 side effects. The distribution of the 888 drugs over the 473 side effects, and vice versa, is depicted in Supplementary Figure S1; over 90% of the drugs were associated with at least 10 side effects. These side effects were then mapped directly onto MedDRA lowest level terms. The terms were linked to the system organ class (SOC) level according to the hierarchical structure of MedDRA (Supplementary Table S1) to determine their attributes at the organ level. Finally, we considered the side effects that the MedDRA ontology assigns to the hepatobiliary disorders SOC as HepSEs, and identified 13 HepSEs: bilirubinemia, cholecystitis, cholelithiasis, cirrhosis, elevated liver function tests, hepatic failure, hepatic necrosis, hepatitis, hepatomegaly, jaundice, liver disease, liver fatty, and liver function tests abnormal.

Figure 1. Overview of the workflow of DILIps development and its evaluation.
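The HepSE selection step can be sketched as a frequency filter followed by an SOC lookup. This is a minimal illustration, not the authors' pipeline: it assumes the SIDER data have already been parsed into (drug, side effect) pairs, and the dictionary `se_to_soc` is a hypothetical stand-in for walking the MedDRA hierarchy from lowest level term up to SOC.

```python
from collections import defaultdict


def select_hepses(pairs, se_to_soc, min_drugs=20, target_soc="Hepatobiliary disorders"):
    """Return side effects reported for MORE than `min_drugs` distinct drugs
    whose MedDRA system organ class (SOC) is `target_soc`."""
    drugs_per_se = defaultdict(set)
    for drug, side_effect in pairs:
        drugs_per_se[side_effect].add(drug)
    frequent = {se for se, drugs in drugs_per_se.items() if len(drugs) > min_drugs}
    return sorted(se for se in frequent if se_to_soc.get(se) == target_soc)


# Hypothetical toy data: drug counts and SOC assignments are made up.
pairs = [(f"drug{i}", "jaundice") for i in range(25)]
pairs += [(f"drug{i}", "hepatitis") for i in range(10)]
pairs += [(f"drug{i}", "nausea") for i in range(30)]
se_to_soc = {
    "jaundice": "Hepatobiliary disorders",
    "hepatitis": "Hepatobiliary disorders",
    "nausea": "Gastrointestinal disorders",
}
print(select_hepses(pairs, se_to_soc))  # ['jaundice']
```

In this toy example "hepatitis" is dropped by the more-than-20-drugs cut-off and "nausea" by the SOC filter; the order of the two filters does not affect the result.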

We evaluated the ability of these 13 HepSEs to differentiate drugs that do and do not cause DILI using the Liver Toxicity Knowledge Base Benchmark Dataset (LTKB-BD) [13] and PfizerData [14]. For both datasets, we used only the drugs they had in common with SIDER. The two datasets differ in how they call a drug DILI or non-DILI (see Materials and Methods): (1) LTKB-BD is based on FDA-approved drug labeling, while PfizerData is based on case reports; (2) the two datasets apply different criteria for DILI assessment; and (3) the two datasets group drugs differently. To obtain an objective evaluation of the 13 HepSEs, we selected DILI positive and negative drugs from the two datasets as follows: (a) in LTKB-BD [13], Most-DILI-Concern drugs were classified as positive and No-DILI-Concern drugs as negative; and (b) in PfizerData [14], drugs with evidence of human toxicity were considered DILI positive, while drugs with no evidence of toxicity in any species were considered DILI negative. Defining a drug as causing DILI if it was positive for any of the 13 HepSEs, this approach yielded 91% and 74% accuracy for LTKB-BD and PfizerData, respectively.
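The "positive for any of the 13 HepSEs" rule and its accuracy can be sketched as follows. The HepSE names are the 13 listed above; the drug profiles and labels are hypothetical placeholders, not LTKB-BD or PfizerData entries.

```python
HEPSES = frozenset({
    "bilirubinemia", "cholecystitis", "cholelithiasis", "cirrhosis",
    "elevated liver function tests", "hepatic failure", "hepatic necrosis",
    "hepatitis", "hepatomegaly", "jaundice", "liver disease", "liver fatty",
    "liver function tests abnormal",
})


def predict_dili(side_effects, hepses=HEPSES):
    """DILI positive if the drug carries at least one HepSE."""
    return not hepses.isdisjoint(side_effects)


def rule_accuracy(profiles, labels, hepses=HEPSES):
    """Fraction of labelled drugs whose prediction matches the reference label."""
    hits = sum(predict_dili(profiles[d], hepses) == labels[d] for d in labels)
    return hits / len(labels)


# Hypothetical toy drugs; True stands in for e.g. Most-DILI-Concern,
# False for No-DILI-Concern.
profiles = {
    "drugA": {"jaundice", "nausea"},
    "drugB": {"headache"},
    "drugC": {"cirrhosis"},
    "drugD": {"rash"},
}
labels = {"drugA": True, "drugB": False, "drugC": True, "drugD": True}
print(rule_accuracy(profiles, labels))  # 0.75
```

Here drugD is a DILI positive with no HepSE in its profile, so the rule misses it; this is the kind of false negative that lowers the accuracy on a real dataset.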

It is important to note that the 26 MedDRA SOCs are not all strictly related to human organs in a conventional sense. For example, “investigations” and “general disorders and administration site conditions” are not organs (the complete list of MedDRA SOCs is available in Supplementary Table S1). Some side effects indicative of DILI reside in SOCs other than hepatobiliary disorders. For example, the “investigations” SOC includes “elevate liver enzyme” and “alkaline phosphatase increased”, both of which are conventional DILI indicators. Moreover, some side effects in the “general disorders and administration site conditions” SOC could also be manifestations of DILI. We therefore conducted a permutation test to confirm that the 13 HepSEs distinguish DILI drugs from non-DILI drugs significantly better than chance. We randomly selected 3, 5, …, 21 side effects from the pool of 473 side effects, repeating each selection 20,000 times. As shown in Figure 2, the classification accuracy of the 13 HepSEs, indicated by the red dot, was considerably higher than the average accuracy for each set of randomly selected side effects, demonstrating that the observed classification accuracy of the 13 HepSEs was not due to chance.

Figure 2. Box plot of classification accuracy with the number of selected side effects using a permutation test.

(a) The test set consisted of 128 drugs (69 DILI positives and 59 DILI negatives) in LTKB-BD, and (b) 258 drugs (168 DILI positives and 90 DILI negatives) in PfizerData. For a given number of randomly selected side effects, a drug positive for any of them was considered DILI positive. The process was repeated 20,000 times for each number of selected side effects. The red dot denotes the result for the 13 HepSEs selected from the MedDRA hepatobiliary disorders category.
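A minimal sketch of this permutation test on synthetic data follows. The drug profiles, pool size and subset size are invented for illustration; only the any-of rule and the repeat-and-compare logic mirror the description above. The study used 20,000 repetitions per subset size; the example uses fewer so it runs quickly.

```python
import random


def any_of_accuracy(profiles, labels, selected):
    """Accuracy of the rule 'DILI positive if the drug has any selected side effect'."""
    selected = set(selected)
    hits = sum((not selected.isdisjoint(profiles[d])) == labels[d] for d in labels)
    return hits / len(labels)


def permutation_null(profiles, labels, pool, k, n_iter=20_000, seed=0):
    """Accuracies of the any-of rule for n_iter random k-subsets of `pool`."""
    rng = random.Random(seed)
    pool = sorted(pool)  # random.sample needs a sequence, not a set
    return [any_of_accuracy(profiles, labels, rng.sample(pool, k)) for _ in range(n_iter)]


def empirical_p(observed, null):
    """One-sided permutation p-value with the usual +1 correction."""
    return (1 + sum(acc >= observed for acc in null)) / (1 + len(null))


# Synthetic data (made up): 20 positives each carry one of three "liver"
# side effects se0-se2 plus one other term; 20 negatives carry unrelated terms.
pool = {f"se{i}" for i in range(50)}
profiles, labels = {}, {}
for i in range(20):
    profiles[f"pos{i}"] = {f"se{i % 3}", f"se{3 + i}"}
    labels[f"pos{i}"] = True
    profiles[f"neg{i}"] = {f"se{23 + i}"}
    labels[f"neg{i}"] = False

observed = any_of_accuracy(profiles, labels, {"se0", "se1", "se2"})
null = permutation_null(profiles, labels, pool, k=3, n_iter=5_000)
print(observed, sum(null) / len(null), empirical_p(observed, null))
```

Comparing the observed accuracy with the whole null distribution, rather than only with its mean, is what turns the box plot of Figure 2 into a significance statement.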

DILI prediction system (DILIps)

As illustrated in step 2 (Figure 1), QSAR models were developed for each of the 13 HepSEs to enable their use in screening new drug candidates computationally. The QSAR models developed from the drugs related to each of these 13 HepSEs had high prediction accuracy (>93%) in a leave-one-out cross-validation (LOO-CV) process (Table 1).

Table 1. Performance of leave-one-out cross-validation for the 13 HepSE models.
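The source does not state which learning algorithm underlies the HepSE QSAR models. The sketch below assumes a 1-nearest-neighbour classifier on binary structural fingerprints (Tanimoto similarity) purely to illustrate the leave-one-out procedure; the fingerprints and labels are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def loo_cv_accuracy(fingerprints, labels):
    """Leave-one-out CV: each compound is held out once and predicted from the
    remaining compounds with a 1-nearest-neighbour classifier."""
    n = len(fingerprints)
    correct = 0
    for i in range(n):
        nearest = max(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto(fingerprints[i], fingerprints[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / n


# Hypothetical fingerprints (bit indices) and HepSE labels (1 = drug has the HepSE).
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}), frozenset({7, 8, 10})]
labels = [1, 1, 0, 0]
print(loo_cv_accuracy(fps, labels))  # 1.0
```

Any other classifier could be dropped into the inner loop; the essential LOO-CV property is that the held-out compound never contributes to the model that predicts it.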

Based on the 13 HepSE models, we further developed the DILIps (step 3 of Figure 1, left box). Using the same classification rule described above (i.e., drugs incriminated by any of the 13 HepSE models are considered DILI positive), we applied the DILIps to three external validation sets. The LTKB-BD and PfizerData validation sets contain only drugs not used in developing the 13 HepSE models. For the O'Brien et al. dataset [15], drugs with severe and moderate hepatotoxicity were combined as DILI positives, while non-toxic drugs were defined as DILI negatives (again, only drugs not used by the 13 HepSE models were included). As summarized in Table 2, the DILIps showed reasonable predictive performance on the three external validation sets, with accuracies between 60% and 70%.

Table 2. Performance of DILI prediction system (DILIps) on three literature datasets.
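Performance figures of the kind summarized in Table 2 are derived from a confusion matrix. The helper below is a generic sketch with a hypothetical name and toy inputs, not code from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and PPV for binary DILI calls
    (True = DILI positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy example: 3 true positives in the reference, 2 negatives.
y_true = [True, True, True, False, False]
y_pred = [True, True, False, True, False]
print(classification_metrics(y_true, y_pred))
```

Undefined ratios (for example PPV when nothing is called positive) are returned as NaN rather than raising, which matters for the stricter consensus rules that may retain very few drugs.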

Development of the “Rule of Three” criterion in DILIps

Identifying drugs with severe DILI potential with high confidence is important, because such drugs are likely to be withdrawn from the market or restricted in use with a black box warning (BBW) owing to serious public health concerns. We assumed that the number of models calling a drug DILI positive is positively correlated with the severity of the drug's DILI and with the confidence of that call. We generated a union set from the three validation datasets listed in Table 2, removing the three drugs with inconsistent DILI assignments across the datasets (i.e., positive in one dataset and negative in another): moxisylyte, carbidopa and terfenadine. This process resulted in 145 DILI positives and 63 DILI negatives (see Supplementary Table S2). We used this union set to assess how many HepSE models need to agree to identify high-risk DILI drugs (i.e., withdrawn or BBW drugs) with high positive predictive value (PPV).
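Building the union set while dropping conflicting calls can be sketched as below; the dataset contents and drug names are hypothetical placeholders.

```python
def build_union(*datasets):
    """Merge drug -> label dicts (True = DILI positive, False = negative),
    dropping drugs whose label differs between datasets."""
    calls = {}
    for dataset in datasets:
        for drug, label in dataset.items():
            calls.setdefault(drug, set()).add(label)
    inconsistent = sorted(d for d, seen in calls.items() if len(seen) > 1)
    union = {d: next(iter(seen)) for d, seen in calls.items() if len(seen) == 1}
    return union, inconsistent


# Hypothetical toy datasets (drug names are placeholders).
set_a = {"drug1": True, "drug2": False}
set_b = {"drug1": True, "drug3": True}
set_c = {"drug2": True, "drug4": False}
union, dropped = build_union(set_a, set_b, set_c)
print(union)    # {'drug1': True, 'drug3': True, 'drug4': False}
print(dropped)  # ['drug2']
```

Drugs that appear in several datasets with the same label are kept once, so the union contains no duplicates.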

Specifically, for each “Rule of N” combination, which requires a drug to be incriminated by N or more HepSE models, we calculated the PPV and the number of drugs retained. As depicted in Figure 3, the PPV reaches a maximum of 91.3% when N = 3, and the percentage of high-risk DILI drugs also reaches a local maximum at N = 3. We therefore established the RO3 criterion in the DILIps for identifying drugs that might cause severe DILI with high confidence (step 3 of Figure 1, right box). The number of drugs meeting the RO3 is 23, a sharp decrease from 100 (RO1) and 49 (RO2); this loss of coverage is expected when the optimization is tilted toward increasing PPV. Because the intended application is identifying drugs with severe DILI potential with high confidence, we accepted this trade-off and used the RO3 in the rest of the study.

Figure 3. The evaluation of the “Rule of Three” (RO3).

The positive predictive value, percentage of withdrawn or boxed warning (BW) drugs, and number of drugs meeting the “Rule of N” for different values of N in the combined HepSE model.
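The “Rule of N” evaluation can be sketched as a vote count followed by a threshold sweep. Model names, drugs and labels below are hypothetical; in the study there are 13 HepSE models and the reference labels come from the union set.

```python
def count_votes(model_calls):
    """model_calls: HepSE model name -> set of drugs that model calls positive.
    Returns drug -> number of models incriminating it."""
    votes = {}
    for positives in model_calls.values():
        for drug in positives:
            votes[drug] = votes.get(drug, 0) + 1
    return votes


def rule_of_n(votes, labels, max_n):
    """For N = 1..max_n, keep drugs with >= N votes and report (N, retained, PPV)."""
    rows = []
    for n in range(1, max_n + 1):
        retained = [d for d, v in votes.items() if v >= n]
        true_pos = sum(labels[d] for d in retained)
        ppv = true_pos / len(retained) if retained else float("nan")
        rows.append((n, len(retained), ppv))
    return rows


# Hypothetical calls from four models and reference labels.
model_calls = {
    "model1": {"d2", "d3", "d4", "d5", "d6", "d7"},
    "model2": {"d4", "d5", "d6", "d7"},
    "model3": {"d6", "d7"},
    "model4": {"d7"},
}
labels = {"d1": False, "d2": False, "d3": True, "d4": True,
          "d5": False, "d6": True, "d7": True}
for row in rule_of_n(count_votes(model_calls), labels, max_n=4):
    print(row)
```

As in Figure 3, raising N shrinks the retained set while PPV tends to rise, so the choice of N is a coverage versus confidence trade-off.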

DILI potential varies for different therapeutic categories

We applied the RO3 criterion to the drugs in DrugBank (small molecules only) to investigate which therapeutic categories were most likely to be associated with DILI (represented by the graph at the right of step 4, Figure 1). Figure 4(a) shows the distribution of drugs across the 14 therapeutic categories defined by the first level of the Anatomical Therapeutic Chemical (ATC) classification system, with the RO3 positive drugs highlighted in red. The enrichment of RO3 drugs in each therapeutic category was determined by Fisher's exact test. Two therapeutic categories (anti-infectives for systemic use and musculoskeletal system drugs) were significantly associated with drugs that cause DILI (p-value = 5.00E-11 and 0.002, respectively). To confirm these findings, we carried out the same analysis for drugs in the SIDER database that met the RO3. As shown in Figure 4(b), the same two therapeutic categories were again significantly associated with drugs that cause DILI (p-value = 8.94E-8 and 2.36E-7, respectively). Both results suggest that care should be taken when developing drugs against existing targets in these two categories.

Figure 4. The distribution of small-molecule drugs in (a) DrugBank and (b) SIDER that satisfy the “Rule of Three” (RO3) at the first level of the Anatomical Therapeutic Chemical (ATC) Classification System.

DILI drugs are significantly enriched in two therapeutic categories: J (anti-infectives for systemic use) and M (musculoskeletal system).
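The enrichment test can be sketched as a one-sided Fisher's exact test, i.e. the upper tail of a hypergeometric distribution. Whether the study used a one- or two-sided test is not stated, so the one-sided form here is an assumption; `scipy.stats.fisher_exact` with `alternative="greater"` on the corresponding 2×2 table computes the same quantity, while this stdlib version avoids the dependency. The counts in the example are toy numbers.

```python
from math import comb


def fisher_enrichment_p(k, n_category, n_flagged, n_total):
    """One-sided Fisher's exact test for over-representation.

    n_total:    drugs screened
    n_flagged:  drugs meeting the criterion (e.g., RO3 positive)
    n_category: drugs in the therapeutic category
    k:          flagged drugs inside the category
    Returns P(X >= k) for X ~ Hypergeometric(n_total, n_flagged, n_category).
    """
    denom = comb(n_total, n_category)
    upper = min(n_category, n_flagged)
    tail = sum(
        comb(n_flagged, x) * comb(n_total - n_flagged, n_category - x)
        for x in range(k, upper + 1)
    )
    return tail / denom


# Toy counts: 10 drugs, 3 flagged, a category of 3 drugs containing all 3 flagged.
print(fisher_enrichment_p(k=3, n_category=3, n_flagged=3, n_total=10))
```

Python integers are exact, so the binomial coefficients do not overflow even for DrugBank-sized counts; only the final division produces a float.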

These findings are consistent with real-world observations. For example, non-steroidal anti-inflammatory drugs (NSAIDs, a subcategory of musculoskeletal system drugs) are often associated with DILI. Among the anti-infectives, a good example is didanosine (Videx®), an antiviral drug used to treat human immunodeficiency virus (HIV) infection. On January 29, 2010, the FDA notified healthcare professionals and patients about a rare but serious liver complication, non-cirrhotic portal hypertension, in patients using the drug. Subsequently, a black box warning was added to the drug label to warn doctors and consumers of this risk. Didanosine can also cause lactic acidosis and severe hepatomegaly with steatosis, and has resulted in several fatal cases.

Associating the protein targets with DILI potential via network analysis

We next asked whether a drug's targets are related to its likelihood of causing DILI. Accordingly, we investigated the RO3 positive drugs from DrugBank in target space using network analysis, as illustrated on the left side of Figure 1, step 4. These drugs were associated with 134 human protein targets. In the network, two protein targets were considered directly related (connected by an edge) if one or more drugs were associated with both targets.
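The edge rule above can be sketched as follows. Treating a "module" as a connected component is an assumption made here for illustration, since the text does not state how modules were delineated; the drugs and targets are hypothetical.

```python
from itertools import combinations


def build_target_network(drug_targets):
    """Adjacency sets: two targets are linked if at least one drug hits both."""
    adj = {}
    for targets in drug_targets.values():
        for t in targets:
            adj.setdefault(t, set())  # keep targets of single-target drugs as nodes
        for a, b in combinations(sorted(targets), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def modules(adj):
    """Connected components of the target network, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


drug_targets = {  # hypothetical drugs and targets
    "drugA": {"T1", "T2"},
    "drugB": {"T2", "T3"},
    "drugC": {"T4", "T5"},
    "drugD": {"T6"},
}
for module in modules(build_target_network(drug_targets)):
    print(sorted(module))
```

T1 and T3 share no drug but still fall in the same component because both connect through T2, which is how multi-target drugs knit targets into large modules.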

As depicted in Figure 5, the network contains two large modules (Modules #1 and #2) and several small modules. Module #1 contains 72 targets associated with 125 RO3 positive drugs, and Module #2 contains 23 targets associated with 8 drugs. We conducted toxicity function and pathway analyses for both modules using Ingenuity Pathway Analysis (IPA). For each module, particularly Module #1, we examined the biological functions related to diseases and disorders to assess whether the targets of the drugs meeting the RO3 are related to hepatic system diseases or disorders. As shown in Table 3, the liver injury and disease related functions enriched in Module #1 were hepatic system disorder, jaundice, liver cancer, hepatocellular carcinoma, and hepatitis C. All of these functions fall under the hepatic system diseases branch of the top toxicity functions in IPA. The other significant toxicity functions of Module #1 are listed in Supplementary Table S3.

Figure 5. Target network of corresponding drugs satisfying the “Rule of Three”.

Table 3. Summary of significant functions and pathways for module #1 identified in the network analysis of DILI targets using IPA.

We also found that drugs in the two largest modules were associated with more than three targets on average. Drugs that interact with multiple targets are prone to multiple side effects, since different targets may invoke different side effects [16], [17]. We conducted text mining to verify the associations between the 13 HepSEs and the 134 targets of the RO3 positive drugs, and identified 45 proteins associated with eight HepSEs in a co-occurrence analysis (Figure 6 and Supplementary Table S4). Most of these targets are associated with hepatitis, while targets such as PTGS2/COX-2 (prostaglandin-endoperoxide synthase 2) and ABCD1 (ATP-binding cassette, sub-family D, member 1) are related to multiple HepSEs.

Figure 6. Text mining results to associate types of DILI (columns) with protein targets (rows).

Each cell gives the number of co-occurrences (papers) between a target and a side effect type, followed in parentheses by the normalized value: the number of co-occurrence reports divided by the total number of reports for that protein target.
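The cell values of Figure 6 can be sketched as below. This assumes entity recognition has already reduced each paper to the sets of targets and side effects it mentions; the paper records and target names are hypothetical.

```python
def cooccurrence_table(papers, targets, side_effects):
    """papers: iterable of (targets_mentioned, side_effects_mentioned) set pairs.
    Returns (target, side_effect) -> (co-occurrence count, normalized value),
    where normalized = count / total papers mentioning the target."""
    counts = {(t, s): 0 for t in targets for s in side_effects}
    totals = dict.fromkeys(targets, 0)
    for mentioned_targets, mentioned_ses in papers:
        for t in mentioned_targets & targets:
            totals[t] += 1
            for s in mentioned_ses & side_effects:
                counts[(t, s)] += 1
    return {
        key: (c, c / totals[key[0]] if totals[key[0]] else 0.0)
        for key, c in counts.items()
    }


# Hypothetical paper records.
papers = [
    ({"TARGET_A"}, {"hepatitis"}),
    ({"TARGET_A"}, {"hepatitis", "jaundice"}),
    ({"TARGET_A"}, set()),
    ({"TARGET_B"}, {"jaundice"}),
]
table = cooccurrence_table(papers, {"TARGET_A", "TARGET_B"}, {"hepatitis", "jaundice"})
print(table[("TARGET_A", "hepatitis")])
```

Normalizing by the target's total report count keeps heavily studied targets from dominating the table simply because they appear in more papers.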

Discussion

One application of translational science involves the use of clinical data to address challenges in drug discovery. The key concept of this study is that the side effects observed in clinical trials and post-marketing surveillance can be translated for use in drug discovery. As a proof-of-concept study, we developed the DILIps to address DILI, one of the most difficult clinical endpoints to predict from preclinical studies. The DILIps was built in three sequential steps. First, we identified 13 HepSEs based on the MedDRA ontology, which discriminated well between drugs with and without DILI potential (91% and 74% accuracy for LTKB-BD and PfizerData, respectively). Second, we developed HepSE-based QSAR models using all 888 drugs in SIDER; these models were highly predictive compared with published models [14], [18], [19] and offered a robust in silico translation of clinical observations (i.e., side effects) to the drug discovery and preclinical testing stages of drug development. Third, we combined these 13 HepSE QSAR models into the DILIps, which yielded 60–70% prediction accuracy for three independent validation sets, and implemented a RO3 criterion with >91% positive predictive value for identifying drugs that might cause severe DILI.

The DILIps is a modular system; each of its components can be replaced by other methods or constructed using different variables. For example, besides selecting the 13 HepSEs from the hepatobiliary disorders category in MedDRA, we also examined the effect of adding two DILI-related terms from the investigations SOC, or of selecting 14 DILI-relevant terms suggested by an expert (Supplementary Table S5). Both alternatives yielded performance similar to that of the 13 HepSE-based approach. Because each MedDRA category is a stand-alone ontology and the other options did not yield notably better performance, we chose the terms under hepatobiliary disorders as representative types of DILI in this study. For the second component of the DILIps, we developed HepSE-based QSAR models because chemical structure data were readily available for the entire set of 888 drugs in SIDER, providing a sufficiently large sample from which to build the models. Other technologies, such as gene expression microarrays, might yield better HepSE models, but such data were not available for the complete set of SIDER drugs. With different choices in components 1 and 2, the criterion in component 3 could be altered to optimize DILI classification using consensus approaches other than RO3. The DILIps can therefore be revised and improved as new data, technologies, and knowledge become available.

Development of models to predict drugs that might cause DILI in humans has been an active research field, with much of the work done using QSARs. However, the DILI labels used in these studies come from different sources, some based on case reports and others derived by text mining, and the modeling methods also differ, which makes the studies difficult to compare. For example, Greene et al. [14] used Derek for Windows (DfW), a knowledge-based expert system, to predict a drug's potential to cause DILI using the DILI classification scheme developed by Pfizer. The system had 56% overall accuracy with 73% specificity and 46% sensitivity. Fourches et al. [18] applied text mining to DILI reported in different species in MEDLINE abstracts, finding that the concordance of liver effects between species is low (39–44%). They also developed QSAR models using a text-mining-derived DILI classification, with external prediction accuracies ranging from 56% to 73%. More recently, Ekins et al. [19] developed a Bayesian model based on a DILI endpoint from cellular imaging predictions [7], which gave a concordance of 60%, sensitivity of 56%, and specificity of 67%.

Development of DILI models for humans always faces two distinct but related challenges: (1) a comprehensive drug list with DILI annotation is usually not available, and (2) there is no authoritative assessment of whether a drug causes DILI. In this study, we compiled three large datasets from our LTKB project and used only the drugs at the opposite extremes of DILI classification (clearly positive or clearly negative), removing drugs with ambiguous calls. The RO3 criterion of the DILIps reached >91% positive predictive value for the combined drug list from these three literature datasets. We also applied the DILIps to the drugs with ambiguous calls; the results are available in Supplementary Table S6.

The translation of clinical observations to evaluation of drugs earlier in the drug development pipeline is a goal of translational medicine [20]. DILI is an endpoint influenced by several important factors, and it is difficult to adequately predict with a single model. The SIDER database has collected clinical observation data (side effects) from drug labels and the scientific literature, which allows the linkage of disease endpoints and related symptom profiles. This, in turn, provides an opportunity to combine drug information and patient information into a unified prediction method, a focus of this study. The HepSEs provide a new direction to predict DILI based on the consensus of multiple clinical endpoints (side effects) using an in silico method.

Elucidating the therapeutic uses, drug targets, and pathways related to DILI from a systematic perspective is of great use in drug discovery and pharmacovigilance. By applying the RO3 criterion to the entire drug space defined by DrugBank, we constructed a DILI landscape in terms of therapeutic and drug target space. We acknowledge that the findings from this exercise depend on the accuracy of DrugBank annotations.

We identified two therapeutic categories (anti-infectives for systemic use and musculoskeletal system drugs) whose drugs have a high risk of causing DILI. This is consistent with the general understanding that, for example, NSAIDs (a subcategory of musculoskeletal system drugs) are often associated with DILI and have been the subject of a broad range of studies of drug-specific, therapeutic class-specific, and genetic effects [21]. Another possibility is that these drugs have higher exposure rates: they are used by many people over prolonged periods, which may inadvertently increase the risk of DILI. The RO3 criterion identified most of the “bad actors” among NSAIDs, including celecoxib, diclofenac, diflunisal, ibuprofen, leflunomide, and rofecoxib. Most of them are PTGS2 (COX-2) inhibitors. PTGS2 is also involved in several hepatic system functions, such as hepatic system disorder, liver cancer, and hepatocellular carcinoma. Cyclooxygenase (COX) is an enzyme responsible for the formation of important biological mediators called prostanoids, and pharmacological inhibition of COX can relieve the symptoms of inflammation and pain. However, a growing number of reports indicate that the selectivity profile of COX inhibitors can lead to serious adverse drug reactions. A classic example is rofecoxib (brand name Vioxx®), which was withdrawn in 2004 because of an increased risk of heart attack associated with selective COX-2 inhibition. Rofecoxib was also associated with DILI [22]. Another example is lumiracoxib, a selective COX-2 inhibitor developed for the symptomatic treatment of osteoarthritis and acute pain; concern over hepatotoxicity contributed to its withdrawal or non-approval in most major drug markets worldwide [23]. Studying the relationship between drug targets and DILI, such as COX selectivity and DILI, may therefore provide new insights into DILI at the molecular level [24].

We also found that DILI drugs often act on multiple targets, a property often associated with drugs used in multiple therapeutic categories [25]. Drugs interacting with multiple targets are considered “dirty” since they can initiate different adverse reactions. On the other hand, such drugs may also be candidates for repositioning outside their original therapeutic indications. One example is diclofenac, which is used to relieve pain, tenderness, swelling and stiffness caused by osteoarthritis, rheumatoid arthritis, and ankylosing spondylitis. Diclofenac is labeled with four different ATC codes (i.e., four different therapeutic uses) and is associated with a number of targets catalogued in DrugBank, including prostaglandin G/H synthase 1 and 2, the cytochrome P450 family (2C18/2E1/2C19/1A2/2C8/2D6/2C9/3A4/1A1/2B6), and the UDP-glucuronosyltransferase family (1–1, 2B7). Several case-control studies have investigated whether polymorphisms in the genes encoding these drug-metabolizing enzymes and transporters determine susceptibility to diclofenac-induced hepatotoxicity [26], [27], [28], [29], [30]. Diclofenac has been withdrawn in several countries due to liver injury and other adverse drug reactions, including ulcers, bleeding, and ulceration of the stomach and intestinal linings [31]. Diclofenac-induced liver injury presents in a number of side effect patterns, including cirrhosis, hepatic failure, hepatic necrosis, hepatitis, and jaundice, all of which are among our 13 HepSEs.

DILI is associated with two distinct but related factors: drug properties and patient susceptibility. Some drugs are more likely to cause DILI, while some patients are more likely to develop it. The DILIps primarily addresses the former, with the aim of enhancing DILI identification in drug discovery. Identifying genetic variations and their protein products that contribute to DILI is another important research area, but one that requires costly and time-consuming collection of samples from large numbers of affected individuals. Studies of genetic risk factors for DILI or other conditions usually identify genes associated with key disease mechanisms and immunological reactions using genotyping approaches. The network analysis conducted in this study connected DILI drugs with pathways and targets and might contribute to identifying mechanisms that link a patient's genetic predisposition to DILI. Only a small number of genetic risk factors for DILI have been identified; most involve an interaction between a drug and a specific polymorphism of the human leukocyte antigen (HLA) system within the major histocompatibility complex (MHC), such as lumiracoxib (HLA-DRB1*15:01) [23], antituberculosis chemotherapy (HLA-DQB1*02:01) [32], ticlopidine (HLA-A*33:03) [33], ximelagatran (HLA-DRB1*07:01) [34], flucloxacillin (HLA-B*57:01) [21], and amoxicillin-clavulanate (HLA-DRB1*15:01) [35]. Other genetic risk factors, such as those involving drug-metabolizing enzymes, have also been reported, for example CYP2C8*4 (diclofenac), CYP2E1*1A (isoniazid), GSTT1-M1 (troglitazone), and UGT2B7*2 (diclofenac) [36], [37], [38].

Drug safety is a key area of focus at the FDA. The FDA's recent initiative on advancing regulatory science advocates modernizing safety evaluation and proposes incorporating both in vitro and in silico methodologies into drug development and safety assessment [39]. The DILIps follows the philosophy underlying this initiative. It could serve as a predictive system for the FDA to use and reference when hepatotoxicity issues arise during the various stages of regulatory review, and as a proof of concept for using predictive systems for drug safety to support the FDA's regulatory science. Although the DILIps was developed for DILI, its methodology could be applied equally well to other drug safety issues, such as renal and cardiovascular toxicity.

Materials and Methods

Preparation of datasets

SIDER database.

SIDER is a computer-readable database of side effects that connects 888 drugs with 1450 different side effect terms [12]. The side effects were extracted from drug labels in either Structured Product Labeling (SPL) or Portable Document Format (PDF) documents. The standardized Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART), part of the Unified Medical Language System (UMLS) Metathesaurus, was used as the basic lexicon of side effects. In this study, we downloaded the entire SIDER database and constructed a matrix of 888 drugs by 1450 side effects, supplemented with chemical structure data.
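As a minimal sketch of the matrix construction step, the Python below builds a binary drug-by-side-effect matrix. It assumes the SIDER annotations have already been parsed into (drug, side effect) pairs; the function name and the toy pairs are hypothetical, and neither the SIDER file format nor the merging of chemical structure data is shown.

```python
from collections import defaultdict


def build_side_effect_matrix(pairs):
    """Build a binary drug x side-effect matrix from (drug, side_effect) pairs.

    Returns (drugs, effects, matrix), where matrix[i][j] == 1 if drugs[i] is
    annotated with effects[j]. Duplicate pairs are collapsed.
    """
    by_drug = defaultdict(set)
    for drug, effect in pairs:
        by_drug[drug].add(effect)
    drugs = sorted(by_drug)
    effects = sorted({e for effs in by_drug.values() for e in effs})
    column = {e: j for j, e in enumerate(effects)}
    matrix = []
    for d in drugs:
        row = [0] * len(effects)
        for e in by_drug[d]:
            row[column[e]] = 1
        matrix.append(row)
    return drugs, effects, matrix


# Hypothetical toy pairs, not real SIDER records
pairs = [("drugA", "jaundice"), ("drugA", "nausea"),
         ("drugB", "nausea"), ("drugA", "nausea")]
drugs, effects, m = build_side_effect_matrix(pairs)
print(drugs, effects, m)  # ['drugA', 'drugB'] ['jaundice', 'nausea'] [[1, 1], [0, 1]]
```

Sorting rows and columns keeps the matrix layout reproducible across runs, which matters when later steps index side effects by column.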

DrugBank.

DrugBank is a richly annotated database of drugs and drug target information [40], [41]. It contains extensive information on the nomenclature, chemistry, structure, function, mode of action, pharmacology, pharmacokinetics, metabolism, and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. The updated DrugBank 3.0 contains 6,800 drug entries, including 1,400 FDA-approved small molecule drugs, 132 FDA-approved biotech (protein/peptide) drugs, 82 nutraceuticals, and 5,200 experimental drugs. In addition, more than 4,300 non-redundant protein (i.e., drug target) sequences are linked to these drug entries [42].

In this study, information on 6620 small molecule drugs (approximately 1,400 FDA-approved and 5,200 experimental drugs), including chemical structure, approval status, therapeutic categories, and protein targets, was retrieved to generate the DILI landscape in terms of therapeutic uses and drug targets.

LTKB benchmark dataset (LTKB-BD).

As part of the Liver Toxicity Knowledge Base (LTKB) project, a research team from the FDA's National Center for Toxicological Research developed the LTKB-BD dataset, which contains 287 drugs with DILI annotations based on FDA-approved drug labels. The data are available from [13]. The drugs are classified into three categories: Most-DILI-Concern, Less-DILI-Concern, and No-DILI-Concern. In this study, only drugs in the Most-DILI-Concern and No-DILI-Concern categories were used (gemtuzumab was excluded because it is a biotechnology product). The dataset was divided into two sets. The first overlapped with the SIDER database and contained 69 Most-DILI-Concern and 59 No-DILI-Concern drugs; it was used to evaluate the performance of the HepSEs. The remaining 67 Most-DILI-Concern and 6 No-DILI-Concern drugs, which were not in SIDER, were used to validate the performance of the DILIps (Supplementary Table S6).

Pfizer hepatotoxicity dataset (PfizerData).

Another independent test set comes from part of the Derek for Windows (DfW) system [14], a knowledge-based expert system designed to assess the potential toxicity of a chemical from its structure. A total of 626 compounds were classified into four categories based on case reports: evidence of human hepatotoxicity (HH); no evidence of hepatotoxicity in any species (NE); weak evidence (<10 case reports) of human hepatotoxicity; and evidence of animal hepatotoxicity without testing in humans. In this study, only HH and NE drugs were used, leaving 406 drugs. Of these, 168 HH (positive) and 90 NE (negative) drugs overlapped with the SIDER database, and the other 92 HH (positive) and 56 NE (negative) drugs, which were not in SIDER, were selected as another independent test set (Supplementary Table S6).
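The same filtering and splitting logic applies to both LTKB-BD and PfizerData: keep only the categories used as positives and negatives, then separate drugs present in SIDER (used to assess the HepSEs) from those absent (used as external test sets). Below is a minimal sketch. The function name, the drug identifiers, and the `weak`/`animal` category codes are illustrative assumptions; only HH and NE follow the text.

```python
def split_by_overlap(categories, label_map, reference):
    """Split a categorised drug list into SIDER-overlapping and external parts.

    categories: drug -> category label.
    label_map: category -> 1 (positive) or 0 (negative) for the categories
        kept; drugs in any other category are dropped.
    reference: set of drugs present in SIDER.
    """
    overlap, external = {}, {}
    for drug, cat in categories.items():
        if cat not in label_map:
            continue  # e.g. weak-evidence or animal-only categories
        target = overlap if drug in reference else external
        target[drug] = label_map[cat]
    return overlap, external


# Hypothetical toy data; "weak" and "animal" are illustrative category codes
pfizer = {"d1": "HH", "d2": "NE", "d3": "weak",
          "d4": "HH", "d5": "NE", "d6": "animal"}
sider_drugs = {"d1", "d2", "d3"}
hepse_eval, external_test = split_by_overlap(pfizer, {"HH": 1, "NE": 0}, sider_drugs)
print(hepse_eval)     # {'d1': 1, 'd2': 0}
print(external_test)  # {'d4': 1, 'd5': 0}
```

For LTKB-BD the same call would take a mapping such as `{"Most-DILI-Concern": 1, "No-DILI-Concern": 0}`.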

O'Brien et al. dataset.

O'Brien et al. classified drugs into four categories according to the severity of human hepatotoxicity, based on the frequency of observed ALT elevations and other evidence [15]. In this study, “Severely” and “Moderately” hepatotoxic drugs were treated as DILI positive and non-toxic drugs as DILI negative, and only drugs that did not overlap with the SIDER database were used. This gave 25 positive and 15 negative drugs (Supplementary Table S6).

Data analysis method

Identification and assessment of Hepatotoxic Side Effects (HepSEs).

This analysis corresponds to step 1 in Figure 1. Of the 1450 side effects listed in the SIDER database, we retained the 473 that are each associated with more than 20 drugs. HepSEs were then identified with MedDRA, an ontology that provides a controlled vocabulary for describing adverse events: the 473 side effect terms were mapped to the hepatobiliary disorders System Organ Class (SOC) in MedDRA to extract the HepSEs (Supplementary Table S1). The drugs in LTKB-BD that overlapped with SIDER (128 drugs) and those in PfizerData that overlapped with SIDER (258 drugs) were used to assess the performance of the HepSEs. A drug was considered DILI positive if it was associated with any HepSE in the SIDER database. To determine whether the 13 HepSEs performed better than would be expected by chance alone, we randomly selected sets of M side effects (M = 3, 5, …, 21) and used each set to predict DILI potential in the same way. The selection was repeated 20,000 times for each M, and the average performance for each M was compared with the performance of the 13 HepSEs.
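Below is a Python sketch of the HepSE-based labelling rule and the random-selection baseline described above. It assumes each drug's SIDER side effects are available as a set of terms. The text does not say which performance measure was compared, so plain accuracy is an assumption here, as are all function names and the toy data.

```python
import random


def predict_any(effects, selected):
    """DILI positive (1) if the drug has at least one selected side effect."""
    return 1 if effects & selected else 0


def accuracy(drug_effects, labels, selected):
    """Fraction of labelled drugs whose any-rule call matches the label."""
    hits = sum(predict_any(drug_effects[d], selected) == y for d, y in labels.items())
    return hits / len(labels)


def random_baseline(drug_effects, labels, pool, m, n_iter=20000, seed=0):
    """Mean accuracy over n_iter randomly drawn sets of m side effects."""
    rng = random.Random(seed)
    pool = sorted(pool)  # fixed order so a given seed gives reproducible draws
    total = 0.0
    for _ in range(n_iter):
        total += accuracy(drug_effects, labels, set(rng.sample(pool, m)))
    return total / n_iter


# Hypothetical toy data, not SIDER records
drug_effects = {"a": {"jaundice", "nausea"}, "b": {"hepatitis"},
                "c": {"nausea", "rash"}, "d": {"headache"}}
labels = {"a": 1, "b": 1, "c": 0, "d": 0}
hepses = {"jaundice", "hepatitis"}
all_effects = set().union(*drug_effects.values())
print(accuracy(drug_effects, labels, hepses))  # 1.0
print(random_baseline(drug_effects, labels, all_effects, m=2, n_iter=2000))
```

In the study the pool would be the 473 frequent side effects and M would range over 3, 5, …, 21. Any other metric, such as sensitivity or positive predictive value, could replace `accuracy` without changing the structure.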

DILI prediction system (DILIps).

Development of the DILIps consists of two steps (steps 2 and 3 of Figure 1). In step 2, all drugs were encoded as functional-class fingerprints (FCFP_6), a well-established structural fingerprint implemented in Pipeline Pilot 8.0 (SciTegic). Other studies have shown that Bayesian models built on circular fingerprints perform well in virtual screening tasks [14], [43], [44], [45], [46], [47]. Multiple-category naïve Bayesian classifiers were then trained for each of the selected HepSE endpoints, and leave-one-out cross-validation (LOO-CV) was used to assess model performance on the training set. For each model, a receiver operating characteristic (ROC) plot was drawn to select the Bayesian score cut-off that best distinguishes DILI drugs from non-DILI drugs.
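Below is a minimal Python sketch of step 2. It assumes each compound is already represented as a set of integer fingerprint feature identifiers, which stand in for FCFP_6 features (fingerprint generation is not shown). The scoring uses a Laplacian-corrected naïve Bayes weight, log((A_F + 1)/(T_F·p + 1)), a form commonly used with circular fingerprints. This is a binary (one endpoint) illustration, not Pipeline Pilot's implementation. The cut-off is chosen by maximizing Youden's J (sensitivity + specificity − 1) along the ROC curve, which is one reasonable reading of "best Bayesian score" rather than the criterion stated by the authors.

```python
import math


def train_laplacian_nb(samples, labels):
    """Laplacian-corrected naive Bayes weights for binary feature sets.

    weight(F) = log((A_F + 1) / (T_F * p + 1)), with T_F the number of training
    compounds containing feature F, A_F the positives among them, and p the
    overall fraction of positives.
    """
    p = sum(labels) / len(labels)
    total, active = {}, {}
    for feats, y in zip(samples, labels):
        for f in feats:
            total[f] = total.get(f, 0) + 1
            active[f] = active.get(f, 0) + y
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}


def bayes_score(weights, feats):
    """Sum of feature weights; features unseen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in feats)


def loo_scores(samples, labels):
    """Leave-one-out: score each compound with a model trained on all others."""
    scores = []
    for i in range(len(samples)):
        w = train_laplacian_nb(samples[:i] + samples[i + 1:],
                               labels[:i] + labels[i + 1:])
        scores.append(bayes_score(w, samples[i]))
    return scores


def youden_cutoff(scores, labels):
    """Cut-off maximising TPR - FPR, calling a compound positive if score > cut.

    Assumes both classes are present in labels.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    levels = sorted(set(scores))
    best_cut, best_j = None, -1.0
    for cut in [levels[0] - 1.0] + levels:  # first candidate calls everything positive
        tp = sum(1 for s, y in zip(scores, labels) if s > cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > cut and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy fingerprints: positives share feature 1, negatives share feature 9
samples = [{1, 2}, {1, 3}, {1, 4}, {9, 5}, {9, 6}, {9, 7}]
labels = [1, 1, 1, 0, 0, 0]
scores = loo_scores(samples, labels)
cut, j = youden_cutoff(scores, labels)
print([round(s, 3) for s in scores], cut, j)
```

In this separable toy set every held-out positive scores above every held-out negative, so the selected cut-off reaches J = 1.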

In step 3, the independent test sets were submitted to the 13 HepSE models to calculate Bayesian scores and obtain predictions. For each HepSE endpoint i (i = 1, 2, …, n), the predicted Bayesian score (PB-SCORE_i) was compared with the cut-off obtained in step 2: if PB-SCORE_i > cut-off, the drug was considered positive for that endpoint, and otherwise negative. A drug was considered to have the potential to cause DILI if any HepSE endpoint was called positive (left side of Figure 1, step 3). On the right side of Figure 1 (step 3), consensus prediction strategies were used to investigate the effectiveness of combining results from multiple HepSEs into a single prediction. A “Rule of N” strategy was evaluated, in which 13 separate consensus strategies were examined, each predicting a drug as causing DILI if at least N (N = 1, 2, …, 13) HepSEs were positive for that drug.
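The step 3 decision logic can be sketched as below: each endpoint is thresholded, then the calls are combined by the "any" rule or by a "Rule of N" consensus. The sketch reads "Rule of N" as "at least N positive endpoints", so the "any" rule is the Rule of One. It uses three endpoints instead of 13, and all scores, cut-offs, and labels are hypothetical toy values.

```python
def endpoint_calls(pb_scores, cutoffs):
    """Per-endpoint call: 1 if PB-SCORE_i > cut-off_i, else 0."""
    return [1 if s > c else 0 for s, c in zip(pb_scores, cutoffs)]


def rule_of_n(calls, n):
    """Consensus call: DILI positive if at least n endpoints are positive."""
    return 1 if sum(calls) >= n else 0


def ppv(preds, labels):
    """Positive predictive value; NaN when no drug is called positive."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    called = sum(preds)
    return tp / called if called else float("nan")


# Hypothetical toy values: 3 endpoints, 4 drugs
cutoffs = [0.0, 0.5, -0.2]
pb = {"d1": [0.3, 0.9, 0.1], "d2": [0.1, 0.2, -0.5],
      "d3": [-0.4, 0.1, -0.3], "d4": [0.2, 0.6, -0.6]}
labels = [1, 0, 0, 1]
calls = [endpoint_calls(pb[d], cutoffs) for d in ["d1", "d2", "d3", "d4"]]
for n in range(1, len(cutoffs) + 1):
    preds = [rule_of_n(c, n) for c in calls]
    print(n, preds, ppv(preds, labels))
# 1 [1, 1, 0, 1] 0.6666666666666666
# 2 [1, 0, 0, 1] 1.0
# 3 [1, 0, 0, 0] 1.0
```

In the toy data, raising N increases positive predictive value at the cost of calling fewer drugs positive. This is the trade-off a "Rule of Three" consensus exploits.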

DILI landscape.

Three sets of analyses were conducted, as summarized in step 4 of Figure 1. The first assessed the relationship between therapeutic use and the DILI potential of drugs meeting the “Rule of Three” (RO3). For this analysis, the Anatomical Therapeutic Chemical (ATC) codes [48] of the small molecule drugs meeting the RO3 criterion were extracted.
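One way to examine therapeutic use is to group drugs by the first letter of their ATC codes, the anatomical main group (for example, J for anti-infectives for systemic use and M for the musculo-skeletal system). The share of RO3 drugs in each group can then be compared with the background share. The sketch below does this and adds a one-sided hypergeometric p-value. The text does not state which enrichment statistic was used, so the test, the function names, and the toy data are assumptions.

```python
from collections import Counter
from math import comb


def atc_main_groups(codes):
    """ATC anatomical main groups (first letter of each code) for one drug."""
    return {code[0] for code in codes if code}


def atc_enrichment(drug_atc, ro3_drugs):
    """Return {group: (k, n, fold, p)} comparing RO3 drugs with all drugs.

    N = drugs with ATC codes, K = RO3 drugs among them (assumed > 0),
    n = drugs in the group, k = RO3 drugs in the group,
    fold = (k / K) / (n / N), p = hypergeometric P(X >= k).
    """
    N = len(drug_atc)
    ro3 = [d for d in ro3_drugs if d in drug_atc]
    K = len(ro3)
    n_all = Counter(g for codes in drug_atc.values() for g in atc_main_groups(codes))
    n_ro3 = Counter(g for d in ro3 for g in atc_main_groups(drug_atc[d]))
    result = {}
    for g, n in sorted(n_all.items()):
        k = n_ro3[g]
        fold = (k / K) / (n / N)
        tail = sum(comb(n, i) * comb(N - n, K - i) for i in range(k, min(n, K) + 1))
        result[g] = (k, n, fold, tail / comb(N, K))
    return result


# Hypothetical drugs with truncated, illustrative ATC codes
drug_atc = {"a": ["J01"], "b": ["J02"], "c": ["M01", "D11"],
            "d": ["C09"], "e": ["A02"], "f": ["C07"]}
for group, stats in atc_enrichment(drug_atc, {"a", "b", "c"}).items():
    print(group, stats)
```

A drug with several ATC codes is counted once per main group, which mirrors how a multi-use drug such as diclofenac can contribute to more than one therapeutic category.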

The right side of Figure 1 depicts the assessment of the association between protein targets and the DILI potential of RO3 drugs. The protein targets associated with small molecule drugs meeting the RO3 criterion were obtained from DrugBank 3.0. Of the 4437 protein targets from various organisms, only human targets were retained. A protein target network was built in which two targets were connected if at least one drug was associated with both. Two large modules were identified with SCAN, a structural clustering algorithm for finding modules in networks [49]. The protein targets in these two modules were submitted to Ingenuity Pathway Analysis (IPA) software for pathway analysis. In addition, text mining based on co-occurrence analysis [50] was used to verify the relationships between protein targets and HepSEs found in the network analysis. For each target–HepSE pair, the number of PubMed papers in which the target and the HepSE co-occur was extracted. Because some proteins are studied far more extensively than others, this count was normalized by dividing it by the total number of papers on that protein.
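The network construction and module detection can be sketched in Python as follows. Targets are linked when a drug is associated with both. Modules are then found with a minimal SCAN-style procedure. Structural similarity is σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)|·|Γ(v)|), where Γ(v) is v plus its neighbors. A vertex is a core if at least μ vertices in Γ(v) have σ ≥ ε, and clusters are grown from cores. This is a simplified reading of SCAN [49] that does not label hubs and outliers. The values of ε and μ, the `keep` filter standing in for "human targets only", and the toy data are illustrative assumptions. The last function shows the co-occurrence normalization described above.

```python
import math
from itertools import combinations


def build_target_network(drug_targets, keep=None):
    """Undirected target network: two targets are linked if some drug hits both.

    keep: optional set of allowed targets (a stand-in for "human targets").
    """
    adj = {}
    for targets in drug_targets.values():
        ts = sorted({t for t in targets if keep is None or t in keep})
        for t in ts:
            adj.setdefault(t, set())
        for a, b in combinations(ts, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def scan_clusters(adj, eps=0.7, mu=3):
    """Simplified SCAN: grow clusters from core vertices; hubs/outliers not labelled."""
    gamma = {v: adj[v] | {v} for v in adj}  # closed neighborhoods

    def sigma(u, v):
        return len(gamma[u] & gamma[v]) / math.sqrt(len(gamma[u]) * len(gamma[v]))

    eps_nb = {v: {w for w in gamma[v] if sigma(v, w) >= eps} for v in adj}
    cores = {v for v in adj if len(eps_nb[v]) >= mu}
    label, clusters = {}, []
    for c in sorted(cores):
        if c in label:
            continue
        cid, members, stack = len(clusters), set(), [c]
        label[c] = cid
        while stack:
            u = stack.pop()
            members.add(u)
            if u in cores:  # only cores expand the cluster
                for w in eps_nb[u]:
                    if w not in label:
                        label[w] = cid
                        stack.append(w)
        clusters.append(members)
    return clusters


def normalized_cooccurrence(n_pair_papers, n_target_papers):
    """Papers linking a target and a HepSE, divided by all papers on the target."""
    return n_pair_papers / n_target_papers if n_target_papers else 0.0


# Hypothetical toy data: two three-target drugs plus one drug bridging T3 and T4
drug_targets = {"d1": ["T1", "T2", "T3"], "d2": ["T4", "T5", "T6"], "d3": ["T3", "T4"]}
net = build_target_network(drug_targets)
print(scan_clusters(net))                # two modules: {T1, T2, T3} and {T4, T5, T6}
print(normalized_cooccurrence(12, 300))  # 0.04
```

The single bridging drug links T3 and T4, but their structural similarity (0.5) falls below ε, so the two dense groups stay separate modules.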

Supporting Information

Figure S1.

The distribution of 888 drugs over 473 side effects and vice versa.



Table S1.

Information for the 473 side effects. Information included: (1) the number of drugs involved; (2) system organ classes (SOC) number and annotation of MedDRA; and (3) the distribution of 473 side effects in the SOC levels of MedDRA.



Table S2.

Information on the validation sets from three literature datasets (i.e., LTKB-BD, PfizerData, and O'Brien et al.) for drugs that do not overlap with SIDER.



Table S3.

The top toxicity functions of Module 1.



Table S4.

Literature evidence for the co-occurrence of HepSE terms and protein targets. The EntrezGene ID and PubMed ID (which can be linked to PubMed directly) are provided.



Table S5.

Performance of the QSAR models for the 13 HepSEs plus two additional terms from the Investigation category, and for 14 HepSEs suggested by an expert.



Table S6.

Information for datasets. Datasets include: (1) SIDER, (2) LTKB-BD, (3) PfizerData, (4) O'Brien et al., and (5) Small molecules in DrugBank.



Author Contributions

Conceived and designed the experiments: ZL WT. Performed the experiments: ZL QS. Analyzed the data: ZL. Contributed reagents/materials/analysis tools: ZL WT. Wrote the paper: ZL QS RK HF WT DD.

References

1. Lee WM, Squires RH Jr, Nyberg SL, Doo E, Hoofnagle JH (2008) Acute liver failure: Summary of a workshop. Hepatology 47: 1401–1415.
2. Olson H, Betton G, Robinson D, Thomas K, Monro A, et al. (2000) Concordance of the toxicity of pharmaceuticals in humans and in animals. Regul Toxicol Pharmacol 32: 56–67.
3. Kaplowitz N (2005) Idiosyncratic drug hepatotoxicity. Nat Rev Drug Discov 4: 489–499.
4. McKenzie R, Fried MW, Sallie R, Conjeevaram H, Di Bisceglie AM, et al. (1995) Hepatic failure and lactic acidosis due to fialuridine (FIAU), an investigational nucleoside analogue for chronic hepatitis B. N Engl J Med 333: 1099–1105.
5. Navarro VJ, Senior JR (2006) Drug-related hepatotoxicity. N Engl J Med 354: 731–739.
6. Shi Q, Hong H, Senior J, Tong W (2010) Biomarkers for drug-induced liver injury. Expert Rev Gastroenterol Hepatol 4: 225–234.
7. Xu JHJ, Henstock PV, Dunn MC, Smith AR, Chabot JR, et al. (2008) Cellular imaging predictions of clinical drug-induced liver injury. Toxicol Sci 105: 97–105.
8. Hosomi H, Fukami T, Iwamura A, Nakajima M, Yokoi T (2011) Development of A Highly Sensitive Cytotoxicity Assay System for CYP3A4-mediated Metabolic Activation. Drug Metab Dispos 39: 1388–1395.
9. Deng XM, Luyendyk JP, Ganey PE, Roth RA (2009) Inflammatory Stress and Idiosyncratic Hepatotoxicity: Hints from Animal Models. Pharmacol Rev 61: 262–282.
10. Boelsterli UA, Lim PLK (2007) Mitochondrial abnormalities - A link to idiosyncratic drug hepatotoxicity? Toxicol Appl Pharmacol 220: 92–107.
11. Anderson N, Meier T, Borlak J (2009) Toxicogenomics Applied to Cultures of Human Hepatocytes Enabled an Identification of Novel Petasites hybridus Extracts for the Treatment of Migraine with Improved Hepatobiliary Safety. Toxicol Sci 112: 507–520.
12. Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P (2010) A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 6: 6.
13. Chen M, Vijay V, Shi Q, Liu Z, Fang H, et al. (2011) FDA-approved drug labeling for the study of drug-induced liver injury. Drug Discov Today 16: 697–703.
14. Greene N, Fisk L, Naven RT, Note RR, Patel ML, et al. (2010) Developing Structure-Activity Relationships for the Prediction of Hepatotoxicity. Chem Res Toxicol 23: 1215–1222.
15. O'Brien PJ, Irwin W, Diaz D, Howard-Cofield E, Krejsa CM, et al. (2006) High concordance of drug-induced human hepatotoxicity with in vitro cytotoxicity measured in a novel cell-based model using high content screening. Arch Toxicol 80: 580–604.
16. Yildirim MA, Goh KI, Cusick ME, Barabasi AL, Vidal M (2007) Drug-target network. Nat Biotechnol 25: 1119–1126.
17. Hopkins AL (2009) Drug Discovery: Predicting promiscuity. Nature 462: 167–168.
18. Fourches D, Barnes JC, Day NC, Bradley P, Reed JZ, et al. (2010) Cheminformatics Analysis of Assertions Mined from Literature That Describe Drug-Induced Liver Injury in Different Species. Chem Res Toxicol 23: 171–183.
19. Ekins S, Williams AJ, Xu JJ (2010) A Predictive Ligand-Based Bayesian Model for Human Drug-Induced Liver Injury. Drug Metab Dispos 38: 2302–2308.
20. (2009) Translational medicine. Nat Rev Drug Discov 8: 593–593.
21. Daly AK, Day CP (2009) Genetic Association Studies in Drug-Induced Liver Injury. Semin Liver Dis 29: 400–411.
22. Yan B, Leung Y, Urbanski SJ, Myers RP (2006) Rofecoxib-induced hepatotoxicity: A forgotten complication of the coxibs. Can J Gastroenterol 20: 351–355.
23. Singer JB, Lewitzky S, Leroy E, Yang F, Zhao XJ, et al. (2010) A genome-wide study identifies HLA alleles associated with lumiracoxib-related liver injury. Nat Genet 42: 711–714.
24. Aithal GP (2011) Hepatotoxicity related to antirheumatic drugs. Nat Rev Rheumatol 7: 139–150.
25. Nacher JC, Schwartz JM (2008) A global view of drug-therapy interactions. BMC Pharmacol 8: 5–13.
26. Daly AK, Donaldson PT, Bhatnagar P, Shen YF, Pe'er I, et al. (2009) HLA-B*5701 genotype is a major determinant of drug-induced liver injury due to flucloxacillin. Nat Genet 41: 816–819.
27. Thibaudeau J, Lepine J, Tojcic J, Duguay Y, Pelletier G, et al. (2006) Characterization of common UGT1A8, UGT1A9, and UGT2B7 variants with different capacities to inactivate mutagenic 4-hydroxylated metabolites of estradiol and estrone. Cancer Res 66: 125–133.
28. Duguay Y, Baar C, Skorpen F, Guillemette C (2004) A novel functional polymorphism in the uridine diphosphate-glucuronosyltransferase 2B7 promoter with significant impact on promoter activity. Clin Pharmacol Ther 75: 223–233.
29. Daly AK, Aithal GP, Leathart JBS, Swainsbury RA, Dang TS, et al. (2007) Genetic susceptibility to diclofenac-induced hepatotoxicity: Contribution of UGT2B7, CYP2C8, and ABCC2 genotypes. Gastroenterology 132: 272–281.
30. Jensen LE, Wall AM, Cook M, Hoess K, Thorn CF, et al. (2004) A common ABCC2 promoter polymorphism is not a determinant of the risk of spina bifida. Birth Defects Res Part A 70: 396–399.
31. Morison WL, Baughman RD, Day RM, Forbes PD, Hoenigsmann H, et al. (1998) Consensus workshop on the toxic effects of long-term PUVA therapy. Arch Dermatol 134: 595–598.
32. Sharma SK, Balamurugan A, Saha PK, Pandey RM, Mehra NK (2002) Evaluation of clinical and immunogenetic risk factors for the development of hepatotoxicity during antituberculosis treatment. Am J Respir Crit Care Med 166: 916–919.
33. Hirata K, Takagi H, Yamamoto M, Matsumoto T, Nishiya T, et al. (2008) Ticlopidine-induced hepatotoxicity is associated with specific human leukocyte antigen genomic subtypes in Japanese patients: a preliminary case-control study. Pharmacogenomics J 8: 29–33.
34. Kindmark A, Jawaid A, Harbron CG, Barratt BJ, Bengtsson OF, et al. (2008) Genome-wide pharmacogenetic investigation of a hepatic adverse event without clinical signs of immunopathology suggests an underlying immune pathogenesis. Pharmacogenomics J 8: 186–195.
35. O'Donohue J, Oien KA, Donaldson P, Underhill J, Clare M, et al. (2000) Co-amoxiclav jaundice: clinical and histological features and HLA class II association. Gut 47: 717–720.
36. Russmann S, Jetter A, Kullak-Ublick GA (2010) Pharmacogenetics of Drug-Induced Liver Injury. Hepatology 52: 748–761.
37. Andrade RJ, Robles M, Ulzurrun E, Lucena MI (2009) Drug-induced liver injury: insights from genetic studies. Pharmacogenomics 10: 1467–1487.
38. Wilke RA, Lin DW, Roden DM, Watkins PB, Flockhart D, et al. (2007) Identifying genetic risk factors for serious adverse drug reactions: current progress and challenges. Nat Rev Drug Discov 6: 904–916.
39. Hamburg MA (2011) Advancing Regulatory Science. Science 331: 987.
40. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, et al. (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36: D901–D906.
41. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, et al. (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34: D668–D672.
42. Knox C, Law V, Jewison T, Liu P, Ly S, et al. (2011) DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res 39: D1035–D1041.
43. Bender A, Scheiber J, Glick M, Davies JW, Azzaoui K, et al. (2007) Analysis of pharmacology data and the prediction of adverse drug reactions and off-target effects from chemical structure. ChemMedChem 2: 861–873.
44. Crisman TJ, Parker CN, Jenkins JL, Scheiber J, Thoma M, et al. (2007) Understanding false positives in reporter gene assays: in silico chemogenomics approaches to prioritize cell-based HTS data. J Chem Inf Model 47: 1319–1327.
45. Glen RC, Bender A, Arnby CH, Carlsson L, Boyer S, et al. (2006) Circular fingerprints: Flexible molecular descriptors with applications from physical chemistry to ADME. IDrugs 9: 199–204.
46. Nettles JH, Jenkins JL, Bender A, Deng Z, Davies JW, et al. (2006) Bridging chemical and biological space: “Target fishing” using 2D and 3D molecular descriptors. J Med Chem 49: 6802–6810.
47. Nidhi, Glick M, Davies JW, Jenkins JL (2006) Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. J Chem Inf Model 46: 1124–1133.
48. (2011) ATC The Anatomical Therapeutic Chemical (ATC) classification. WHO Collaborating Centre for Drug Statistics and Methodology. Available:
49. Xu X, Yuruk N, Feng Z, Schweiger TAJ (2007) SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, California, USA. pp. 824–833.
50. Cohen AM, Hersh WR, Dubay C, Spackman K (2005) Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts. BMC Bioinformatics 6: 103.