Polymorphic Cytochrome P450 Enzymes (CYPs) and Their Role in Personalized Therapy

The cytochrome P450 (CYP) enzymes are major players in drug metabolism. More than 2,000 mutations have been described, and certain single nucleotide polymorphisms (SNPs) have been shown to have a large impact on CYP activity. Therefore, CYPs play an important role in inter-individual drug response and their genetic variability should be factored into personalized medicine. To identify the most relevant polymorphisms in human CYPs, a text mining approach was used. We investigated their frequencies in different ethnic groups, the number of drugs that are metabolized by each CYP, the impact of CYP SNPs, as well as CYP expression patterns in different tissues. The most important polymorphic CYPs were found to be 1A2, 2D6, 2C9 and 2C19. Thirty-four common allele variants in Caucasians led to altered enzyme activity. To compare the relevant Caucasian SNPs with those of other ethnicities, a search in 1,000 individual genomes was undertaken. We found 199 non-synonymous SNPs with frequencies over one percent in the 1,000 genomes, many of them not described so far. With knowledge of frequent mutations and their impact on CYP activities, it may be possible to predict patient response to certain drugs, as well as adverse side effects. With improved availability of genotyping, our data may provide a resource for understanding the effects of specific SNPs in CYPs, enabling the selection of a more personalized treatment regimen.


Introduction
Inter-individual variability of drug response and drug clearance is a complex and common problem in clinical practice [1]. Overlapping substrate specificity of enzymes, a multitude of single nucleotide polymorphisms (SNPs) [2] and variations between ethnic groups [3] make prediction of phenotypic drug response difficult. To avoid treatment failure and unnecessary toxicity, tailoring dosages and drug-cocktails for each individual is essential [4].
Differences in drug response can be attributed to variability in the DNA sequences of specific genes whose products are crucial for drug metabolism. For instance, SNPs in phase 1 enzymes, such as cytochrome P450 oxidases (CYPs) [3], phase 2 enzymes, such as uridine 5'-diphospho-glucuronosyltransferases (UGTs) [5], and absorptive and efflux transporters, such as ATP-binding cassette transporters (ABC transporters) [4], have been previously reported.
Characterization of these enzymes and the effects of minor allele variants on the metabolism of specific drugs have been described in the literature and have recently been compiled by our group into a comprehensive database called SuperCYP [6]. Phase I reactions include oxidation, reduction, hydrolysis and cyclization. Using oxygen and NADPH as co-substrates, CYPs are the major enzymes responsible for catalyzing such reactions [7] and account for approximately 75% of total drug metabolism [8].
The Human Genome Project identified 57 human CYPs, which were classified into 18 families and 43 subfamilies based on sequence similarity [9]. CYP families 1, 2 and 3 are responsible for metabolism of drugs, xenobiotics and certain endogenous molecules [3] and hence are of particular relevance to the current study. Most CYPs metabolize more than one drug. Similarly, a drug is often metabolized by multiple CYPs. Drugs can also inhibit or induce CYP activity, either by directly interacting with the enzyme or by altering its expression. Characterization of these interactions is important to determine and predict compatible drug combinations [10]. Human CYPs are primarily membrane-associated proteins [11] that are expressed in most tissues. The highest expression is generally found in liver tissue, but the distribution of particular CYPs varies [12], which indicates that the actual efficiency of a drug is likely to depend on CYP expression in the target tissue. There are significant inter-individual differences in enzyme activity, leading to distinct phenotypes. For example, the most frequent phenotype of CYP 2D6 is the extensive metabolizer (78.8%), followed by intermediate (12.1%), poor (7.6%) and ultra-rapid metabolizers (1.5%) [13].
In addition to drug catabolism, many CYPs are responsible for activation of prodrugs, such as cancer therapeutics [14] and antipsychotics [15]. Prodrugs are pharmacologically inactive compounds that require activation via metabolic conversion [16], allowing control of where, when and how much drug activity occurs [17]. This is particularly important for chemotherapeutic drugs, where the active drug ideally only acts on tumor cells in order to reduce toxic side effects [18]. Prodrugs can be activated by photo irradiation [19], change in pH [20] or enzymatically [21], for instance by CYPs [22]. Polymorphisms in CYPs can result in ineffective or aberrant activation of prodrugs [22], which can lead to toxicity [4]. Fortunately, advances in genetic research have made genotyping of a large number of patients possible, leading to identification of SNPs that alter expression or activity of drug metabolizing enzymes [3]. In this study we set out to determine the most frequent CYP polymorphisms having the highest impact on drug metabolism in Caucasians. This knowledge could facilitate the development of tests for efficient genotyping of patients thus leading to a better and more personalized treatment.

Text mining
Information on drug metabolism can be found in more than 100,000 PubMed articles, yet limited data are available regarding the frequencies of SNPs in human CYPs. To identify relevant articles, a specific search tool was developed for text mining the literature, using Apache Lucene™ (http://lucenenet.apache.org) as a search engine library together with LingPipe (http://alias-i.com/lingpipe). Figure 1 summarizes the different methods used for the text mining approach. Complete Medline/PubMed data were downloaded from the NCBI FTP site in XML format and then indexed. The indexed data were dynamically queried by a search engine written in Java that outputs an SQL file with the text mining hits, which was subsequently used for manual validation. The search engine comprises several lists of synonyms for identifying entities, such as chemical compounds, biological targets, genes, cell types and polymorphisms, as well as interaction-related entities. If available, information on CYP polymorphisms was extracted from the literature. Definitions and synonyms were taken from the UMLS® Metathesaurus®, which contains millions of biomedical and health-related concepts, their synonymous names, and their relationships. As an example, the query for CYP2C19 took the form: (Abstract: CYP2C19\** OR Title: CYP2C19\**) AND (Abstract: population OR Title: population) AND (Abstract: effect OR Title: effect) AND (Abstract: frequenc* OR Title: frequenc*).
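The structure of the example query can be sketched in a few lines of Python. This is only an illustration of how such a query string might be assembled from term lists; the function names `cyp_pattern` and `build_query` are hypothetical, and the original search engine was written in Java.

```python
def cyp_pattern(cyp_name):
    # Append an escaped asterisk plus a wildcard, as in the example query
    # (hypothetical helper; the name is not from the original tool).
    return cyp_name + r"\**"


def field_clause(terms):
    # Search every term in both the Abstract and the Title field.
    parts = []
    for term in terms:
        parts.append(f"Abstract: {term}")
        parts.append(f"Title: {term}")
    return "(" + " OR ".join(parts) + ")"


def build_query(cyp_terms, population_terms, effect_terms):
    # All four concept groups must be present, hence the AND between clauses.
    clauses = [
        field_clause(cyp_terms),
        field_clause(population_terms),
        field_clause(effect_terms),
        field_clause(["frequenc*"]),
    ]
    return " AND ".join(clauses)


query = build_query([cyp_pattern("CYP2C19")], ["population"], ["effect"])
print(query)
```

Passing longer lists of synonyms, ethnicity terms or outcome terms extends each parenthesized clause with further OR alternatives.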
The term 'CYP2C19' was replaced by each human CYP and its synonyms, and different ethnicity and outcome terms were used for 'population' and 'effect'. The positional distance between the different terms had to be restricted to reduce false positive hits that arose when terms occurred far from each other in the abstract. The records found were scored with a rule-based approach. The rules employed order, redundancy, distance, topic segmentation and sentence breaking for boundaries. For example, a distance ≤ 7 between the CYP and the ethnicity and ≤ 6 between the frequency and the CYP was given a score of 100. Greater distances and negative interaction words resulted in lower scores. Duplicates were removed and a team of scientists manually processed 1,037 papers found in PubMed for relevance to polymorphisms and their frequency in Caucasian populations. The team consisted of three medical scientists with three years' experience in the validation of text mining results. During this time, they reviewed over 10,000 abstracts with a focus on CYPs. A weekly meeting took place to maintain and improve the quality of the text mining, to discuss problems and to keep the review procedure consistent across reviewers. The text mining validation tool is shown in Figure 2. CYP polymorphisms that occurred with a frequency of more than one percent in the Caucasian population were included in this study.
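The distance rule can be illustrated with a minimal Python sketch. Only the two thresholds (7 and 6 tokens) and the top score of 100 come from the description above; the whitespace tokenization, the penalty of 5 points per extra token, the negation word list and the negation penalty of 30 are assumptions made purely for illustration.

```python
# Hypothetical list of negative interaction words (assumption).
NEGATION_WORDS = {"no", "not", "without", "absent"}


def tokenize(sentence):
    # Simple whitespace tokenization with surrounding punctuation stripped.
    tokens = [t.strip(".,;:()") for t in sentence.lower().split()]
    return [t for t in tokens if t]


def token_distance(tokens, prefix_a, prefix_b):
    # Smallest positional distance between tokens starting with each prefix.
    pos_a = [i for i, t in enumerate(tokens) if t.startswith(prefix_a)]
    pos_b = [i for i, t in enumerate(tokens) if t.startswith(prefix_b)]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def score_sentence(sentence, cyp, ethnicity, frequency_stem="frequenc",
                   penalty_per_token=5, negation_penalty=30):
    tokens = tokenize(sentence)
    d_ethnicity = token_distance(tokens, cyp.lower(), ethnicity.lower())
    d_frequency = token_distance(tokens, frequency_stem, cyp.lower())
    if d_ethnicity is None or d_frequency is None:
        return 0  # a required term is missing from the sentence
    score = 100
    # Thresholds from the text: <= 7 (CYP to ethnicity), <= 6 (frequency to CYP).
    score -= penalty_per_token * max(0, d_ethnicity - 7)
    score -= penalty_per_token * max(0, d_frequency - 6)
    if NEGATION_WORDS & set(tokens):
        score -= negation_penalty
    return max(score, 0)
```

A sentence in which all three terms occur close together and without negation keeps the full score; each assumed penalty lowers it, mirroring the statement that greater distances and negative interaction words lead to lower scores.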

Localization of SNPs in a 3D CYP model
The evolutionary conservation taken from a multiple sequence alignment of CYPs was projected onto the 3D structure using CYP 2D6 as template (PDB ID: 3TDA). Frequent SNPs in the four most polymorphic CYPs (1A2, 2C9, 2C19 and 2D6) were labeled in the 3D model. The number of mutations was used to determine the thickness of the ribbon (Figure 3).

CYP SNPs and 1,000 Genomes
The 1,000 Genomes Project (www.1000genomes.org) is an international initiative designed to provide full genomic sequence information from an ethnically diverse population [23]. CYP SNPs in 1,092 individuals were extracted using the online data slicer from the 1,000 Genomes Project (http://browser.1000genomes.org). Frequency analysis focused on non-synonymous coding SNPs with a prevalence of one percent or higher in all genomes regardless of ethnicity. The search included the main 29 CYP alleles from "The Human Cytochrome P450 (CYP) Allele Nomenclature Database" (http://www.cypalleles.ki.se/) [24]. In addition, 16 CYP alleles not listed in the CYP allele database due to very heterogeneous distributions were included. The 1,000 Genomes Database includes SNP effect predictions on CYPs, calculated by PolyPhen [25], which predicts possible functional alterations in human proteins after amino acid substitution based on physical and comparative considerations [26].
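The frequency filter described above might look as follows in Python. The `Variant` record, its field names, the set of consequence labels treated as non-synonymous and the placeholder identifiers are all assumptions for illustration; they do not reproduce the data slicer output format.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    rsid: str
    consequence: str    # consequence label of the coding change
    alt_count: int      # number of alternate alleles observed
    n_individuals: int  # number of genotyped (diploid) individuals


# Assumed set of consequence labels counted as non-synonymous.
NON_SYNONYMOUS = {"missense_variant", "stop_gained", "stop_lost"}


def allele_frequency(variant):
    # Each diploid individual contributes two alleles.
    return variant.alt_count / (2 * variant.n_individuals)


def common_non_synonymous(variants, min_frequency=0.01):
    # Keep non-synonymous coding SNPs with a frequency of at least one percent.
    return [
        v for v in variants
        if v.consequence in NON_SYNONYMOUS
        and allele_frequency(v) >= min_frequency
    ]


# Placeholder records (hypothetical IDs and counts, not real data).
variants = [
    Variant("CYP_X", "rs_example_1", "missense_variant", 50, 1092),
    Variant("CYP_X", "rs_example_2", "missense_variant", 10, 1092),
    Variant("CYP_Y", "rs_example_3", "synonymous_variant", 500, 1092),
]
selected = common_non_synonymous(variants)
```

With 1,092 individuals, an allele must be observed at least 22 times among the 2,184 chromosomes to reach the one percent threshold.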

Expression data
Affymetrix data were used to compare human CYP mRNA expression in 41 different types of tissue, further subcategorized into different regions of an organ, yielding a total of 65 tissue types. The series of datasets obtained from GEO (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/) were originally generated from 10 post-mortem donors (5 females and 5 males), and represent normal human tissues (Series GSE3526) [27]. The 84 probe sets, which measure the expression level of CYPs, were normalized and assigned to 40 types of CYPs. To display differences in expression, a heatmap was generated using Genesis software [28]. Relative expression was calculated as the intensity of the gene in the region minus the mean intensity of the gene in all regions, divided by the standard deviation. This heatmap served as the data source for the CYP body map, in which only values increased or decreased at least two-fold were considered.
Our work would not have been possible without the publicly available datasets mentioned above. We are grateful to the research groups involved and acknowledge their work.
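The relative expression calculation corresponds to a per-gene standardization across regions, which can be sketched in Python as below. The use of the sample standard deviation and the reading of the two-fold cutoff as an absolute relative-expression value of at least 2 are assumptions; the function names are hypothetical.

```python
import statistics


def relative_expression(intensities):
    # intensities: mapping of region name -> normalized intensity for one gene.
    values = list(intensities.values())
    mean = statistics.mean(values)
    # Sample standard deviation (assumption); needs at least two regions.
    sd = statistics.stdev(values)
    if sd == 0:
        return {region: 0.0 for region in intensities}
    return {region: (value - mean) / sd for region, value in intensities.items()}


def body_map_entries(relative, cutoff=2.0):
    # Keep only regions whose relative expression deviates by at least the cutoff.
    return {region: z for region, z in relative.items() if abs(z) >= cutoff}


# Placeholder intensities for one gene (hypothetical values, not real data).
profile = {f"region_{i}": 0.0 for i in range(9)}
profile["kidney"] = 10.0
highlighted = body_map_entries(relative_expression(profile))
```

Standardizing each gene separately makes expression differences comparable between genes with very different absolute intensities, which is what a heatmap of this kind requires.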

Frequencies of SNPs in CYPs
Analysis of the SNPs identified by text mining showed that SNPs predominantly occurred in three polymorphic CYPs (2D6, 2A6 and 2B6) regardless of ethnic group. Only frequencies of known nucleotide changes were assessed to identify the extent of SNPs in CYPs. Figure 4 displays nine CYPs, including 2D6 (114 SNPs), 2A6 (68 SNPs) and 2B6 (57 SNPs), which showed the highest number of SNPs. For other CYPs, the number of known SNPs was less than 22. CYP 2D6 is a major polymorphic CYP and, as expected, was the greatest contributor of polymorphic alleles in Caucasians.
Comparison among different ethnic groups revealed that frequencies differed considerably and displayed a heterogeneous distribution of CYP alleles. For instance, in Asian and African populations, CYP2A6*2 possessed a frequency of 28.0% and 62.0%, respectively, whereas a frequency of 8.0% was observed in Caucasians. A more detailed table with additional information on CYP SNPs in Caucasians and other ethnic groups is available in Table S1.

Major drug metabolizing CYPs
Not all 57 human CYPs are involved in drug metabolism. The primary CYPs responsible for drug metabolism were determined by first ranking the CYPs according to the total number of drug substrates (Figure 5). Twelve CYPs accounted for 93.0% of drug metabolism, relative to the total of 1,839 known drug-metabolizing reactions in the SuperCYP database. CYP 1A2, 2D6, 2C9 and 2C19 were responsible for nearly 40.0% of drug metabolism and, together with CYP 3A4, for 60.0%.
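The ranking and the cumulative shares can be computed as in the following Python sketch. The enzyme labels and counts are placeholders invented for illustration, not SuperCYP data, and `cumulative_share` is a hypothetical function name.

```python
def cumulative_share(reaction_counts):
    # Rank enzymes by number of reactions and accumulate their share of the total.
    total = sum(reaction_counts.values())
    ranked = sorted(reaction_counts.items(), key=lambda item: item[1], reverse=True)
    running = 0
    table = []
    for enzyme, count in ranked:
        running += count
        table.append((enzyme, count, running / total))
    return table


# Placeholder counts (hypothetical, for illustration only).
example_counts = {"ENZYME_A": 50, "ENZYME_B": 30, "ENZYME_C": 20}
for enzyme, count, share in cumulative_share(example_counts):
    print(f"{enzyme}: {count} reactions, cumulative share {share:.1%}")
```

Reading off the cumulative column shows how many top-ranked enzymes are needed to cover a given fraction of all reactions.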
Since the four CYPs described are highly polymorphic and commonly occur in Caucasians, further detailed analyses were restricted to these four CYPs. Within the overall CYP system, these four CYPs are expected to have the greatest impact on inter-individual variability of drug response. Although CYP 2A6 and 2B6 possess various relevant alleles in Caucasians, they do not cover a large range of drug interactions (51 and 74 substrates, respectively).
Figure 2. Text mining validation tool. The table shows columns for score, PubMed ID, relation sentence and checkboxes for the validation. The SNPs are highlighted in blue, frequencies in green, effects in orange and the ethnicities in red. 'Scientist 1' reads the abstract and, if necessary, has access to the full text. Afterwards, the relation has to be validated as 'true', 'false' or 'not sure'. If the relation is 'true', the relation is copied into the 'comment' field. These relations are copied into a new SQL file. If 'Scientist 1' activates the 'not sure' field, the relation has to be validated again by another scientist. doi: 10.1371/journal.pone.0082562.g002
All SNPs in the four most polymorphic CYPs (1A2, 2C9, 2C19 and 2D6) influenced enzymatic activity due to localization in the substrate-binding cavity as shown in Figure 3.

Expression data
Because of high CYP expression levels in some tissues, an impact of CYP isoforms in particular tissues can be deduced. The work of Nishimura and colleagues demonstrated differences in CYP mRNA expression in various human tissues. For example, CYP 2F1, 4B1, 4F8, 11S, 11A, 11B1, 11B2, 19 and 24 are not expressed in the liver [29]. The current study confirmed these results and extended the findings, which are shown in Figure 6. Nishimura analyzed mRNA levels of 30 CYP isoforms in 11 tissue types. Similarly, the current study investigated the expression of 40 CYP isoforms in 41 tissue types. The liver was considered separately in the analysis in order to identify the differences between the other tissues. In 21 different tissues, a heterogeneous distribution of CYPs was observed. For instance, 39 different CYP isoforms showed higher mRNA expression in at least one tissue type. Significant differences were observed in the adrenal gland cortex, which possessed 6-fold higher expression of CYP 11A1, 11B1 and 11B2 (compared to the mean expression). Interestingly, no other tissue showed high levels of expression of these three CYPs. Large differential expression compared to other tissues was also observed in the kidneys, where a 6-fold increase in CYP 4A22, a 5-fold increase in CYP 8B1 and 4-fold increases in 4V2, 4F2, 4A11 and 2B6 were noted. In addition, 5-fold higher expression was found for CYP 2C8 in lung, CYP 4F8 in prostate, 4F3 in bone, 2F1 in bronchial tubes and 2C8 in stomach. Furthermore, high expression of CYP 2C18 was restricted to the oral cavity, pharynx and esophagus. Two-fold lower expression was detected for CYP 2A1 in the esophagus, 2A7 in the prostate, as well as 2C9 and 2D6 in the spleen.

CYP SNPs and 1,000 Genomes
The current study identified 199 non-synonymous coding SNPs with frequencies greater than one percent (Table S2). Compared to the "Human Cytochrome P450 Allele Nomenclature Database" (http://www.cypalleles.ki.se/), we found several SNPs in 1,000 Genomes that are not related to alleles defined and named in the database. To illustrate the difference between the "Human Cytochrome P450 Allele Nomenclature Database" and the 1,000 Genomes data regarding new SNPs, we examined CYP2A6 as an example. Table 2 summarizes the SNPs most likely to alter enzyme activity [25]. It displays five SNPs that can lead to altered enzyme activity, with frequencies between 1.4% and 5.1%. Only I471T (rs5031016) is also contained in the CYP nomenclature and reflects the CYP2A6*36 allele. Further updates are needed to build a comprehensive CYP SNP data source.
With the potential to alter drug metabolism, the 72 listed SNPs occurred in 24 CYPs. The most frequent SNPs were CYP 4A11 rs112743 (42.6%; highly expressed in kidney tissue), CYP 4F11 rs1060463 (49.5%; highly expressed in bronchus tissue) and CYP 2A7 rs3869579 (46.7%; highly expressed in the pituitary gland but low in the prostate gland).
Our study findings show that some CYPs are not only heterogeneously expressed, but also highly polymorphic.

Genetic diversity and polymorphisms
Mutations in a CYP gene can lead to functional alterations, such as increased or decreased activity. If a mutant allele occurs at a frequency of at least one percent in a population, it is referred to as a pharmacogenetic polymorphism. Such polymorphisms can be discovered at the genotype level and/or the phenotype level based on altered function of the enzyme [30].
Individuals in a population can be stratified according to metabolic ratios of particular CYPs, which have great clinical relevance. For example, a CYP 2D6 poor metabolizer should not be administered codeine since the drug would have no effect. Conversely, a CYP 2D6 ultra-rapid metabolizer would likely suffer side effects from a normal dosage [31,32]. CYP 2D6 is a highly polymorphic CYP with at least 70 allelic variants [33] that can be categorized into four phenotypic classes. Overall CYP 2D6 expression in liver tissue is only approximately 2%, but hundreds of drugs are metabolized by this enzyme, including opiates, beta-blockers, anti-arrhythmics, tricyclic antidepressants, SSRIs, 5-HT3-antagonists and neuroleptics [34]. About 10% of the Caucasian population have difficulties in fully metabolizing these drugs [35], leading to harmful side effects [32,36]. Therefore, personalized prescriptions will become of great importance [37].

Personalized medicine
Since 2009, the Clinical Pharmacogenetics Implementation Consortium (CPIC) has provided information on how genetic test results can be used to optimize drug therapy. The guidelines center on genes or on specific drugs. For some drugs, they also provide dosing guidelines for clinicians [38].
Psychiatric drugs. As most psychiatric drugs are metabolized by the highly polymorphic CYP 2D6 and CYP 2C19, psychiatrists were the first to propose the idea of CYP genotyping [39][40][41]. Three state hospitals in Kentucky recruited 4,532 psychiatric patients for genotyping of both CYPs with the help of DNA microarray technology.
Results from the current study were consistent with previous studies of allele frequency [35], demonstrating the importance of personalized prescription given that more than one tenth of patients are not likely to respond to standard treatment and suffer unwarranted toxicity. In the study performed by de Leon and colleagues, the dosage was adapted to the guidelines of Kirchheiner [15] for antipsychotics and antidepressants. The authors propose a numeric dosage adaptation system that reflects expression of CYP 2D6 and CYP 2C19.
Cardiovascular drugs. An important area of focus is stent implantation and/or inhibition of blood clots after an acute coronary syndrome (ACS) to prevent ischemic events. Therefore, antiplatelet agents are administered before and after percutaneous coronary intervention (PCI) to reduce the risk of ischemic events. Currently, the gold standard therapy is a combination of aspirin and clopidogrel [42,43]. Unfortunately, approximately 29% of people respond poorly to clopidogrel [44] and, therefore, have an increased risk for recurrent ischemic events after PCI [45]. Several different factors were discovered to contribute to the variability in clopidogrel response, including polymorphisms, impaired absorption or bioavailability, poor compliance and pre-existing conditions (increased body mass index, diabetes mellitus, ACS) [46]. In addition, clopidogrel is a prodrug that requires activation through the CYP system. The activated metabolite inhibits the ADP P2Y12 receptor [47]. Polymorphisms causing loss of function in the CYP system are associated with poor drug response. Most notably, the CYP 2C19*2 polymorphism was shown to lead to a 30% increased risk of major adverse cardiovascular events during treatment with clopidogrel [48][49][50][51]. Furthermore, the CYP 2B6*5 and P2Y12 polymorphisms are also associated with clopidogrel resistance [52]. In contrast, an enhanced response due to increased transcriptional activity occurs with the CYP 2C19*17 polymorphism, leading to increased risk of bleeding during clopidogrel therapy [53,54].
The CYP3A4*2 allele, with a frequency of 2.7% in Caucasians, leads in vitro to a six- to nine-fold reduction in the intrinsic clearance of nifedipine [55]. This could strongly influence how well patients tolerate this dihydropyridine calcium channel blocker. Nifedipine has a wide range of indications, e.g. angina pectoris, hypertension, achalasia and Raynaud's phenomenon, so it is very commonly used. In vivo studies of altered nifedipine metabolism in CYP3A4*2 carriers should be carried out, so that toxic effects and/or increased side effects can possibly be prevented.
The findings described above emphasize the importance of CYP polymorphisms and alternatively metabolized drugs in clinical practice. Prediction of CYP activity may be helpful to assess drug response. For instance, the (13)C-pantoprazole breath test, which measures CYP 2C19 activity, can detect clopidogrel resistance [56] and support the use of suitable drug alternatives such as ticagrelor (no activation required, metabolized via CYP 3A4).

Additional observed effects
Apart from altered drug metabolism, CYP polymorphisms were also potentially associated with neoplastic growth, adverse psychological behavior and other diseases. In women, polymorphisms in CYP 1A1 seemed to increase susceptibility to genital cancers [57,58]. Conversely, the CYP 2D6*4 polymorphism has been shown to have a protective effect against breast cancer [59]. Furthermore, 2C19*2, 2D6*4, 2D6*10 and 1A1*2A have been associated with increased risk of head and neck squamous cell carcinoma [60].
In addition to the role that CYP polymorphisms play in pathological processes and susceptibility to certain diseases, recent genome-wide association studies (GWASs) have demonstrated an association between increased coffee consumption and SNPs rs2472297-T (located between CYP1A1 and CYP1A2) and rs6968865 (next to aryl hydrocarbon receptor) [61]. Huo et al. (2012) determined that certain SNPs are associated with increased susceptibility to schizophrenia [62], while Peñas-Lledó and colleagues found a positive association between the extent of active CYP 2D6 and frequency of suicide attempts, providing evidence that CYP diversity may need to be accounted for in clinical practice [63].

Diversity of expression in human tissues
Variable expression of functionally distinct CYP isoforms across different tissue types indicates that certain isoforms play specific roles in a tissue-dependent manner. Figure 6 provides an illustrative overview of CYP expression in the human body. Such knowledge may be useful for the development of new prodrugs activated by a specific CYP that is highly expressed in the targeted tissue, ultimately leading to increased bioavailability at the target site and reduced side effects. On the other hand, variable expression of CYPs in different tissues may adversely affect drug efficacy in some tissues. This could occur if a drug is inactivated by a CYP that is more highly expressed in its target tissue. In either case, further clinical investigation is required.
Even polymorphic CYP isoforms show a heterogeneous tissue distribution. In particular, CYP 1A2, 2C19 and 2D6 are highly expressed in the pituitary gland. Furthermore, the highest expression of CYP 2C9 was detected in the cerebellum, while the greatest expression of 2D6 and 2B6 was found in skeletal muscle and kidneys, respectively. The influence of mutations in CYPs in particular organs remains to be determined and requires further investigation.
Differential distribution of CYPs may have an influence on specific side effects of drugs. For example, cyclophosphamide (CPA) therapy can lead to development of hyponatremia. CPA is a prodrug converted by CYP 2B6 into the active form [64]. The hyponatremia is the result of increased expression of aquaporins 1 and 7, which is induced by CPA [65]. CYP 2B6 has high expression in kidneys, indicating that a higher level of active CPA is likely to occur in the kidneys and lead to the undesirable side effect.

Conclusions
In summary, the current study identified four major CYPs (1A2, 2D6, 2C9 and 2C19) and 34 polymorphic alleles with a significant impact on drug metabolism in the Caucasian population. Once genomic testing becomes part of routine analysis, these data may enable prediction of complications in drug therapy and development of a personalized treatment regimen, where drug dosages are based on an individual's specific CYP profile [6]. Ultimately, this approach may prevent treatment failures and avoid unnecessary side effects. Another interesting application could be the consideration of CYP polymorphisms in clinical trials. Including information on potential polymorphisms in different ethnic groups could reduce the number of trial failures. Findings from the current study will be included in the SuperCYP database.
With the aim of assessing the effects of CYP polymorphisms on chemotherapy and establishing a cost efficient method to detect relevant CYP polymorphisms, a retrospective study in leukemia cells from pediatric patients is currently under way [66].