Genome Sequence Variability Predicts Drug Precautions and Withdrawals from the Market

Despite substantial premarket efforts, a significant portion of approved drugs has been withdrawn from the market for safety reasons. The deleterious impact of nonsynonymous substitutions predicted by the SIFT algorithm on structure and function of drug-related proteins was evaluated for 2504 personal genomes. Both withdrawn (n = 154) and precautionary (Beers criteria (n = 90), and US FDA pharmacogenomic biomarkers (n = 96)) drugs showed significantly lower genomic deleteriousness scores (P < 0.001) compared to others (n = 752). Furthermore, the rates of drug withdrawals and precautions correlated significantly with the deleteriousness scores of the drugs (P < 0.01); this trend was confirmed for all drugs included in the withdrawal and precaution lists by the United Nations, European Medicines Agency, DrugBank, Beers criteria, and US FDA. Our findings suggest that the person-to-person genome sequence variability is a strong independent predictor of drug withdrawals and precautions. We propose novel measures of drug safety based on personal genome sequence analysis.


Introduction
The person-to-person variability in drug response is a major challenge for current clinical practice, drug development, and drug regulation. [1] A drug with proven clinical efficacy in some patients often fails to work in others and may even cause serious side effects, including death. [1][2][3] The incidence of severe adverse drug reactions (ADRs) has been estimated at 6.2-6.7% in hospitalized patients, and more than 2 million ADR cases occur annually in the United States (US), including 100,000 deaths. [3,4] As a result, many drugs causing unexpected severe ADRs are eventually withdrawn from the market.
The impact and cost burden of pharmaceutical market withdrawals are enormous. Of the 548 drugs that were newly approved by the US Food and Drug Administration (FDA) between market. [5,6] Twenty (3.8%) of the 528 drugs approved between 1990 and 2009 in Canada were withdrawn for safety reasons, and the percentage of drug removals from the market did not change significantly during that period indicating that there was also no change in the effectiveness of the drug review system. [7] Despite substantial premarket evaluation efforts, including long and costly clinical trials, postmarket drug withdrawals are generally not preventable by the currently available means. [8,9] The variability of drug responses among different individuals, which makes long-term predictability of drug performance difficult, may be explained by the genotypic diversity in the population. Advancements in pharmacogenomics now provides the information about the effects of genetic variations on individual drug responses, [10] which is currently listed on the labels of approximately 150 drugs that have been approved by the US FDA. [11] This accumulation of genetic data relevant to drug response is a significant step towards realization of personalized medicine. Population-based genome-wide observational research, such as Genome-Wide Association Studies (GWAS), is currently one of the most powerful tools for investigating the relationship between genotypic variability and drug responses. However, the current population-based approach in an affected-versus-nonaffected case-control setting is inherently limited because the genotypes, drugs, and their associations are too numerous to be reliably tested in a foreseeable future. Thus, a large number of commonly used drugs are not examined by GWAS.
Genome sequencing technology has revealed hundreds of genetic variants that are predictive of loss of function (LoF) in protein-coding genes. [12] The genes related to drug pharmacokinetics and pharmacodynamics (PK/PD) have many LoF variants, and their prevalence shows significant person-to-person variability. [13] The strikingly large number of deleterious genetic variants even in apparently healthy subjects and their uneven distribution across individual genomes may explain the unusual responses to certain drugs in small subpopulations. [14] Clinical trials are generally conducted in groups that are homogeneous in age, gender, and/or ethnicity, and the results are then extrapolated to the general population. [15] However, the extrapolation may not be applicable to certain genetically differentiated subpopulations, which may not have been significantly represented during phase-III clinical trials. The significant inclusion of the susceptible subpopulations following premarket approval of a drug by regulatory authorities may then result in unexpected ADRs that may ultimately lead to its withdrawal from the pharmaceutical market. Therefore, we hypothesized that the person-to-person variability in the distribution of deleterious variants of PK/PD genes in different ethnic groups will reveal subpopulations predisposed to severe ADRs, which may subsequently result in drug removal from the market.
Our findings suggest that the person-to-person genetic variability is a strong independent predictor of drug withdrawals for safety reasons. Therefore, we propose a method to identify individuals (or subpopulations) with increased vulnerability to side effects from use of specific drugs and suggest novel measures to ensure drug safety based on personal genome sequence analysis, which can improve drug development, use, and regulations.

Drugs, Genes, and Genomes
The DrugBank database is a bioinformatics resource that provides detailed drug information, including chemical structure, pharmacological mechanism, drug targets, metabolic enzymes, carriers, and transporters. [16] Among the 7793 drugs in the DrugBank, 5099 drugs had at least one identified PK/PD genes (accessed September 4, 2015 at http://www.drugbank.ca/), of which 1041 drugs having five or more PK/PD genes were included in the following analysis (Fig 1) by excluding 4058 drugs having less than five PK/PD genes. In our study, we analyzed the 1041 drugs with five or more PK/PD genes and 2807 PK/PD genes known to be involved in drug responses via 12,887 drug-gene interactions (Fig 1 and S1 File). We downloaded 2504 publicly available personal genomes from the 1000 Genomes Project, which provided unbiased data on the genetic variability within 26 ethnic subgroups from Europe, East Asia, South Asia, Africa, and the Americas [17] (accessed September 4, 2015 at http://www.1000genomes.org/).

Selection of Precautionary and Market-withdrawn Drugs
In order to create a comprehensive list of withdrawn drugs, we performed an in-depth review of the following publically available resources: 1) the 8th, 10th, 12th, and 14th Issues of the In total, 1041 drugs from the DrugBank with at least five identified genes involved in drug pharmacokinetics (PK) and pharmacodynamics (PD) were included in the study. A comprehensive list of withdrawn drugs (n = 154) was obtained by reviewing three resources: 1) the 8th, 10th, 12th, and 14th Issues of the 'Consolidated List of Products Whose Consumption and/or Sale Have Been Banned, Withdrawn, Severely Restricted, or Not Approved by Governments: Pharmaceuticals' published by the United Nations [18][19][20][21]; 2) applications for withdrawal from European Medicines Agency (EMA) [22]; and 3) the DrugBank annotations for withdrawals. Precautionary drugs (n = 165) were identified based on the Beers criteria [23] and US FDA table of pharmacogenomics biomarkers [11]. The remaining 752 drugs were classified as other drugs.  [18][19][20][21] which are the most comprehensive lists of withdrawn and severely restricted drugs produced by at least one of 94 governments, 2) drug applications withdrawn by the Committee for Medicinal Products for Human Use (CHMP) at the European Medicines Agency (EMA), [22] and 3) DrugBank annotations for pharmaceutical market withdrawals. [16] (S2 File) In our study, a drug was defined as "withdrawn" if it had been removed, banned, or disapproved by at least one country for any reason. We manually selected 350 drugs that were withdrawn by at least one country according to the UN Consolidated Lists [18][19][20][21] and 174 drugs that were withdrawn from January 2006 to September 2015 according to the EMA. [22] Furthermore, the DrugBank annotated 181 drugs as withdrawn. For our analysis, we also selected "precautionary" drugs by applying the Beers criteria (n = 137) [23] and US FDA pharmacogenomic biomarker information in drug labels (n = 148). [11] The Beers criteria for Potentially Inappropriate Medication Use in Older Adults, which was initially published in 1991 [24] and most recently revised by the American Geriatric Society in 2012, [25] is widely used in clinical care to prevent ADRs in the elderly population. Fig 1 shows that among the 1041 drugs with at least five PK/PD genes, 289 were classified as withdrawn or precautionary (n = 165). "Other" drugs were not included in the list of precautionary drugs and also not withdrawn by any of the available resources (n = 752). Among the 154 withdrawn drugs, 16 and 18 were also redundantly classified as precautionary according to the Beers criteria (n = 90) and US FDA pharmacogenomic biomarker information (n = 96), respectively (Fig 1).

Deleteriousness Scores for Genes, Variants, Drugs, and Populations
The genes encoding for drug targets, related metabolic enzymes, and drug transporters affect an individual's response to a drug. Furthermore, previous studies have demonstrated that the impact of nonsynonymous substitutions on protein structure and function can be reliably predicted by applying straightforward empirical rules using specific algorithms such as SIFT, [26] PolyPhen-2, [27] MutationAssessor, [28] CADD, [29] and Condel. [30] For our analysis, we used the SIFT algorithm [26] to compute the variant deleteriousness score (V) for evaluating the impact of all nonsynonymous coding variants found in the 2,807 PK/PD genes of the 2,504 personal genomes selected from the 1000 Genomes Project. The lower the SIFT score (range, 0~1) of a variant, the more deleterious the impact of the variant on the function of the gene. The gene deleteriousness score (G) of a gene was defined as the geometric mean of the V scores for all nonsynonymous coding variants of the gene to evaluate the overall impact of multiple deleterious variants on the gene (Fig 2). Similarly, the drug deleteriousness score (D) was defined as the geometric mean of the G scores for all drugrelated PK/PD genes (Fig 2). All PK/PD genes of a drug were equally weighted without considering pharmacokinetic parameters. The geometric mean or the nth root of the product of the numbers, where n is the number count, was used to identify the central tendency in the change of the analyzed parameters. The V, G, and D deleteriousness scores ranged from 0 to 1; low V and G scores indicated severely altered structure and/or function of the corresponding gene, while low D value indicated increased predisposition to unintended drug responses.

Statistical Analysis
ANOVA and post-hoc Tukey analysis were used to compare the AUCs among the four drug categories: withdrawn, precautionary by Beers criteria, precautionary by FDA pharmacogenomics labeling, and other drugs (Fig 3). Relative frequencies of drug withdrawals (Fig 4) and precautions ( Fig 5) were computed across AUC score bins for further analysis. The Cochrane-Armitage Trend test was performed across AUC score bins to estimate the effect of the AUC scores on the frequencies of drug withdrawal and precaution (Table 1). All P values were two sided, and considered statistically significance at < 0.05. All statistical analyses were conducted using the R statistical package (ver. 3.01). [31] Results

Drug Withdrawals and Precautions
Altogether, we analyzed 1041 drugs from the DrugBank that had at least five listed PK/PD genes (Fig 1). Overall, 154 (14.8%) of the 1041 analyzed drugs were eventually withdrawn from the market by at least one country. Among the 137 drugs in the Beers list and 148 drugs with FDA pharmacogenomic labels, 90 (65.7%) and 96 (64.9%), respectively, met our inclusion criteria of having at least five known PK/PD genes. Interestingly, 30 (18.2%) of these precautionary drugs were also withdrawn from the market by at least one country (Fig 1, Venn diagram).  Table, it was consistently confirmed at all thresholds of the number of PK/PD genes from 1 to 10 for study drug inclusion that the AUC values of withdrawn and precautionary drugs were significantly lower than that of other drugs (P < 0.001 by ANOVA) .  Fig 4 consists of histograms that display the frequencies of drug withdrawal relative to the population deleteriousness score (P) or AUC of the drug according to the UN Consolidated Lists, [18][19][20][21] EMA, [22] and DrugBank. [16] (S2 Fig) Because each country has different policies regarding the approval of pharmaceutical products, we analyzed these resources separately as well as in combination. The results indicated that lower AUC values correlate significantly with higher relative frequencies of drug withdrawals from the market in every configuration  Table 1). [11] Genomic Variability and Drug Safety "If it were not for the great variability among individuals, medicine might as well be a science and not be an art, " said Sir William Osler. [32] The present study demonstrated unexpectedly high person-to-person variability of deleterious variants among drug-related genes. A drug might as well be 'genetically ideal' if the drug-related genes have no deleterious variants across the entire human population, which is rather rare. Based on the analysis of drug-related gene  Table 1). Relative frequencies of drug precautions according to the Beers criteria (E) and US FDA pharmacogenomics labels (F) were significantly higher in lower AUC score bins compared to higher AUC score bins (P < 0.01, Cochran-Armitage Trend test; see Table 1). AUC, area under the drug deleteriousness score curve; EMA, European Medicines Agency; UN, United Nations. variability in the population, we propose here two genomic parameters to assess drug safety, AUC and V t , which may facilitate drug development, use, and regulation, and eventually prevent drug withdrawal. Fig 5 shows the distribution curve of the D scores among 2504 individuals for three drugs (disopyramide, procainamide, and quinidine) that belong to class Ia antiarrythmics (C01BA) according to the Anatomical Therapeutic Chemical (ATC) classification system. The AUC for disopyramide (β+γ+δ) is larger than that of procainamide (γ+δ) and quinidine (δ) but smaller than that of a 'genetically ideal' drug (α+β+γ+δ), which is equal to 1.0 and has zero person-toperson variability in the drug specific PK/PD genes. Thus, α (or 1-AUC of disopyramide) represents the distance of disopyramide from a 'genetically ideal' drug in terms of genome sequence variation. Interestingly, 1-AUC of disopyramide is smaller than that of procainamide (1-[γ+δ]) and quinidine (1-δ; Fig 5) indicating that among the 2504 individuals in the 1000 Genomes Project, the genes related to PK/PD of disopyramide have less deleterious variants than those of procainamide and quinidine.

Discussion
The decision process of drug withdrawals is highly complex and is influenced by political, commercial, ethical, and other factors. In this study, we have demonstrated that the person-to-person genome variability is a strong independent prognostic factor for drug withdrawals from the pharmaceutical market, which has long been considered unpredictable. Therefore, we propose the use of these novel measures of drug safety assessment based on personal genome sequence analysis to improve drug development, use, and regulation. The proposed method may be considered for future application in the clinical setting to identify individuals with increased genetic vulnerability to adverse side effects from a specific drug. When clinically validated for specific drugs and ADRs, healthcare providers can use this method as a tool to support clinical decisions and prevent unintended drug reactions. For pharmaceutical companies, the proposed genomic measures can be used for efficient screening of candidate pharmaceuticals, development of safer drugs, genome-based pharmacovigilance, and regulation of drug safety.
Critics have argued that clinical trials involving up to 2000~3000 participants cannot reliably detect rare ADRs with an incidence of less than 1 per 10,000. [33] Although increasing the number of trial participants is an option, albeit costly, it may still be inadequate to address the complexity of personal genome sequence variations. [34] The current affected-versus-unaffected case-control approach used in pharmacogenomics research [35] cannot provide comprehensive analysis of multiple complex associations between numerous genotypes and drugs. In contrast, the proposed ab initio method is scalable to increasing numbers of genotypes and drugs and relies only on integrated public resources: personal genome sequence databases covering different ethnic groups, drug PK/PD information, and the records of withdrawn and/ or precautionary drugs. Recently, US FDA recommended incorporating the ethnic category of participants in clinical trials.
[36] Incorporating ethnicity-related data regarding genome sequence variability, which can affect drug safety, may be necessary for future clinical trial design. For example, to reliably detect ADRs before marketing, drugs with low AUC values (i.e., large genomic distance from a 'genetically ideal' drug) should be tested in larger populations than drugs with high AUC values. Considering the distribution of genetically vulnerable subpopulations or ethnic groups for individual drugs based on genome sequence variability may lead to a more rational clinical trial design and ultimately improve drug safety.
For the purpose of comprehensive evaluation, the present study was replicated with other variant function-prediction algorithms. ANNOVAR [37] provides with seven variant function prediction scores (i.e., SIFT [26], PolyPhen HIVD [27], PolyPhen HVAR [27], LRT [38], Muta-tionTaster [28], MutationAssessor [39], FATHMM [40]) and three variant conservation scores (i.e., GERP++ [41], phyloP [42], Siphy [43]). Of these 10 methods, MutationAssessor, FATHMM, and Siphy were excluded for further analysis due to their severely right-skewed variant score distributions, resulting no drug remaining in the lower score bins for the drugs with the lowest deleteriousness scores. LRT was also excluded because its binary weighting scheme of variant scores based on dN/dS ratio. As shown in S2 Table, all six prediction scores consistantely confirmed that the AUC values of withdrawn and precautionary drugs were statistically significantly lower than those of other drugs (P value <0.001 by ANOVA, post-hoc Tukey test). More importantly, drug withdrawal rates significantly increased as the AUC scores decreased for all six scores (P value < 0.05, Cochrane-Armitage tests for trend, S2 Table).
The computation of D scores conducted in our study is subject to some limitations due to the imperfect knowledge about the PK/PD of drugs [16] and the precise effects of nonsynonymous coding variants on protein structure and function. [26][27][28][29][30] The scope and accuracy of the D scores are also limited by insufficiency of public resources such as those that provide personal genome sequence data from various ethnic groups and standardized reports of withdrawn and precautionary drugs. Further development of these critical resources could, therefore, greatly improve and refine drug safety strategies. The proposed method has some limitations. First, the current knowledge on PK/PD drug-gene relationships is not perfect. We discovered that many drugs eligible for inclusion in our study lacked PK/PD information. Second, the current method assigns the same scores to different drugs having the same PK/PD genes without considering different pharmacokinetic parameters such as Km/Vmax. We found that less than 10% of all DrugBank drugs have at least one properly matched pharmaokinetic parameters after systematic survey of the current databases including PubChem [44], BRENDA [45], SABIO-RK [46] and MetaCyc [47]. Moreover, it may not be true for the drugs with the same PK/PD genes to have the same genetic effect in vivo because multiple factors other than genetic variants such as PK (i.e., route of administation, dose, and form), PD (i.e., targets and their physiological roles), and patient (i.e., disease, behavior, and environment) factors will affect toxicity and efficacy. Thus, given the substantial effects and utmost importance of genomic variability on drug responses, the availability of more accurate and comprehensive PK/PD data could greatly enhance drug development, use, and regulation. Third, since noncoding regions of the human genome contain up to 88% of the weakly phenotype-associated variants as identified by the GWAS [48] and ENCODE (ENCyclopedia Of DNA Elements) [49] the integration of both coding and noncoding gene information could further improve the present scoring system.
To the best of our knowledge, this is the first study that has addressed the long-standing problem of unpredictability in pharmaceutical market withdrawals by incorporating state-of-the-art personal genome sequencing technology. The strong correlation between the population deleteriousness score (or AUC) and pharmaceutical market withdrawal rate was consistent among different databases and lists of drug withdrawals and precautions, including UN Consolidated Lists, DrugBank, European Medicines Agency, Beers guidelines, and US FDA. One interesting byproduct of our study is that the proposed method may rescue necessary drugs that have already been withdrawn from the market for safety reasons [8]. Clinical reinstatement of a previously withdrawn drug, however, should be deferred until the genetic cause of ADRs in a vulnerable subpopulation has been clearly identified and corresponding diagnostic tests have been developed.
Supporting Information S1 Fig. Comparison of population deleteriousness scores between withdrawn, precautionary, and other drugs across different numbers of drug-related genes. Population deleteriousness scores (AUC) were significantly lower for the three withdrawn and precautionary drug groups than others at all thresholds (i.e., the numbers of PK/PD genes from 1 to 10) (P < 0.05 by post-hoc Tukey tests after one-way ANOVA (P < 0.001)) for study drug inclusion. In contrast, population deleteriousness scores did not show statistically significant difference among the three withdrawn and precautionary drug groups at all thresholds (P > 0.05). ÃÃ P < 0.001 and Ã P < 0.05 by post-hoc Tukey test (see Table 1). Numbers in parentheses represent the numbers of included drugs at each threshold. AUC, area under the drug deleteriousness score curve; FDA PGx, FDA-approved drugs with pharmacogenomic information on drug labels; PD, Pharmacodynamics; PK, Pharmacokinetics.  ) show drug deleteriousness score curves typical for the corresponding AUC score bins. The shaded area representing 1-AUC shows the distance of the drug from a 'genetically ideal' pharmaceutical in terms of genome sequence variation, i.e., no variation of relevant genes between individuals. AUC, area under the drug deleteriousness score curve; EMA, European Medicines Agency; UN, United Nations. (TIFF) S1 File. 1041 including drugs with five or more pharmacokinetics/pharmacodynamic genes. Among 5099 drugs had at least one identified PK/PD genes from the DrugBank Version 4.1, excluding 4058 drugs having less than five PK/PD genes, 1041 drugs having five or more PK/PD genes were included. For these inclusion drugs, we matched the drug regulation information for further analysis. (XLSX) S2 File. Total withdrawn and precautionary drugs. Withdrawn or precautionary drug information was collected comprehensively from the multiple publicly available resources. For withdrawn drugs, we reviewed reports from United Nations, European Medicines Agency, and DrugBank annotation. For precautionary drugs, we collected information from the Beers criteria and US FDA pharmacogenomic biomarker information in drug labels. (XLSX) S1 Table. Comparison of population deleteriousness scores between withdrawn, precautionary, and other drugs at all thresholds of different number of PK/PD genes for drug inclusion. Ã P < 0.05 and ÃÃ P < 0.001 by post-hoc Tukey test after one-way ANOVA in comparison to other drugs. † P < 0.001 by one-way ANOVA at all thresholds of different numbers of drugs for study drug inclusion. ‡ P > 0.05 by post-hoc Tukey test after one-way ANOVA for all pairwise comparisons between the the three drug groups that are withdrawn and precautionary. Population score values are mean (SD) (see S1 Fig.) AUC, area under the drug deleteriousness score curve; FDA PGx, FDA-approved drugs with pharmacogenomic information on drug labels; PD, Pharmacodynamics; PK, Pharmacokinetics; P score, Population deleteriousness score. (DOCX) S2 Table. Descriptive statistics and statistical test results for six function prediction scores. For each score, all including drugs were collected independently. We included drugs with at least five identified pharmacokinetics (PK) and pharmacodynamics (PD) gene relationships (which have specific score annotated variants) in the analysis. Ã P < 0.05 and ÃÃ P < 0.001. AUC, area under the drug deleteriousness score curve; EMA, European Medicines Agency; FDA PGx, FDA-approved drugs with pharmacogenomic information on drug labels; UN, United Nations. (DOCX)