A Novel Simple Method for Determining CYP2D6 Gene Copy Number and Identifying Allele(s) with Duplication/Multiplication

Background: Cytochrome P450 2D6 (CYP2D6) gene duplication and multiplication can result in ultrarapid drug metabolism and therapeutic failure or excessive response in patients. Long range polymerase chain reaction (PCR), restriction fragment length polymorphism (RFLP) and sequencing are usually used for genotyping CYP2D6 duplications/multiplications and for identifying the duplicated allele, but they are labor intensive, time consuming, and costly.
Methods: We developed a simple allele quantification-based Pyrosequencing genotyping method that facilitates CYP2D6 copy number variation (CNV) genotyping while also identifying allele-specific CYP2D6 CNV in heterozygous samples. Most routine assays do not identify the allele containing a CNV. A total of 237 clinical and Coriell DNA samples with different known CYP2D6 gene copy numbers were genotyped for CYP2D6*2, *3, *4, *6, *10, *17, *41 polymorphisms and CNV determination.
Results: CYP2D6 allele quantification and identification were performed simultaneously with CYP2D6*2, *3, *4, *6, *10, *17, *41 genotyping. We determined the exact CYP2D6 gene copy number, identified which allele had the duplication or multiplication, and assigned the correct phenotype and activity score for all samples.
Conclusions: Our method can efficiently identify the duplicated CYP2D6 allele in heterozygous samples, determine its copy number in a fraction of the time required by conventional methods, and prevent incorrect ultrarapid phenotype calls. It also greatly reduces the cost, effort and time associated with CYP2D6 CNV genotyping.


Introduction
Cytochrome P450 2D6 (CYP2D6) is responsible for the metabolism of more than 30% of all orally administered drugs, including many antipsychotics, antidepressants, antiarrhythmics, and opioid analgesics [1]. Interest in studying the CYP2D6 gene continues to grow because of its significant contribution to interindividual variability in drug metabolism, which results in a higher incidence of adverse events or a lack of therapeutic efficacy.
The CYP2D6 gene is located on chromosome 22 and is flanked by two pseudogenes, CYP2D7 and CYP2D8, that share 95% sequence homology with CYP2D6 [2,3]. The gene is highly polymorphic, and more than 100 CYP2D6 variant alleles have been identified to date (www.cypalleles.ki.se). The genetic variations in CYP2D6 result in four different drug metabolism phenotypes: poor metabolizer (PM), intermediate metabolizer (IM), extensive metabolizer (EM), and ultrarapid metabolizer (UM). The latter is the result of gene duplication/multiplication and occurs with inheritance of more than two copies of fully functional CYP2D6 alleles [4].
Current genotyping techniques targeting the CYP2D6 variant alleles have facilitated phenotype prediction from genotype information without the need to administer CYP2D6 probe drugs [5]. Hence, correct genotyping and genotype-to-phenotype translation are very important in a clinical setting, where this information may serve as a guide for individualization of drug therapy [4].
The presence of different genetic variations in CYP2D6, such as single nucleotide polymorphisms (SNPs), short insertions and deletions, gene conversions, and copy number variations (CNVs), including deletion and duplication/multiplication of the whole gene, mandates the use of more than one assay to determine CYP2D6 genotypes [3,6]. Additionally, in certain cases of gene duplication or multiplication, deriving phenotypes from genotypes via the activity score system becomes difficult because most copy number assays do not identify which of the two alleles carries the duplication or multiplication. Long range PCR, sequencing or Southern-RFLP methods are typically used to resolve this issue [3,7,8,9,10,11]. However, these methods are laborious and costly and hence impractical in clinical settings, where rapid retrieval of phenotype information is often warranted.
In this study we describe a simple, rapid, allele quantification-based pyrosequencing method that can identify the duplicated CYP2D6 allele and estimate its copy number in heterozygous samples that carry duplications/multiplications of the CYP2D6 gene.

DNA samples
A total of 218 DNA samples from the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR-II) clinical study (ClinicalTrials.gov identifier: NCT01203852), isolated from blood leukocytes with the FlexiGene DNA Kit (Qiagen, Valencia, CA, USA), were used to genotype CYP2D6 SNPs and copy number variations. All subjects from the PEAR-II clinical study provided written informed consent to participate in screening before active enrollment and to supply genetic material. The study protocol was approved by the institutional review board of each participating institution [12]. These 218 samples were collected from 125 Caucasians (57.5%), 84 African Americans (38.5%), 1 Asian (0.5%), and 8 others (3.5%). Eighteen (8.25%) of the 218 PEAR-II DNA samples, covering different CYP2D6 gene copy numbers (1, 2, 3, and 4 copies) and including all those with duplications or multiplications, were selected for further analysis in this study [12]. These 18 samples, along with another 19 DNA samples with known CYP2D6 copy number [13] purchased from the Coriell Cell Repository (http://ccr.coriell.org/), were used to evaluate the new CYP2D6 pyrosequencing allele quantification-based genotyping method. The Coriell DNA samples and their genotypes are shown in Table 1.

CYP2D6 genotype and phenotype determination
All DNA samples were genotyped for the CYP2D6*2, *3, *4, *6, *10, *17 and *41 variant alleles by pyrosequencing [12,14]. All pyrosequencing reactions were carried out on the Pyrosequencing PSQ HS 96 platform according to the manufacturer's recommendations (Qiagen, Valencia, CA, USA). The PCR reaction was performed in a final volume of 12.5 μl consisting of 6.5 μl HotStarTaq Master Mix (Qiagen), 1 μl of 10 pmole forward PCR primer, 1 μl of 10 pmole reverse PCR primer, 2 μl of H2O, and 2 μl of genomic DNA (20 ng/μl). The PCR program consisted of an initial step at 95°C for 15 minutes, followed by 45 cycles of 95°C for 30 seconds, 60°C for 30 seconds, and 72°C for 1 minute, and a final extension at 72°C for 7 minutes. The PCR and sequencing primers and the annealing temperature for the pyrosequencing assays are shown in Table 2. The CYP2D6 metabolizer phenotype was inferred from the genotype information based on the activity score system recommended by the Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines [4].
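The per-reaction volumes listed above add up to the stated 12.5 μl, and scaling them to a plate is simple arithmetic. The following is a minimal sketch of that bookkeeping, not part of the published protocol; the function name `master_mix` and the 10% pipetting overage are assumptions made for illustration.

```python
# Per-reaction volumes in microliters, as listed in the text.
PER_REACTION_UL = {
    "HotStarTaq Master Mix": 6.5,
    "forward PCR primer": 1.0,
    "reverse PCR primer": 1.0,
    "H2O": 2.0,
    "genomic DNA (20 ng/ul)": 2.0,
}


def master_mix(n_reactions, overage=0.10):
    """Scale the shared reagents for n reactions.

    Genomic DNA is sample specific, so it is left out of the shared mix.
    The default 10% overage is an assumption, not a value from the paper.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        reagent: round(volume * factor, 2)
        for reagent, volume in PER_REACTION_UL.items()
        if not reagent.startswith("genomic DNA")
    }


total_per_reaction = sum(PER_REACTION_UL.values())
print("Total per reaction (ul):", total_per_reaction)
print("Shared mix for a 96-well plate (ul):", master_mix(96))
```

Keeping the DNA out of the shared mix reflects the usual practice of adding template to each well separately; a lab with a different workflow would adjust the dictionary accordingly.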

CYP2D6 copy number determination
The CYP2D6 gene copy number for all DNA samples was first determined by the TaqMan Copy Number Assay (Life Technologies, CA), and then by the new pyrosequencing allele quantification-based method. For the TaqMan method, the RNase P assay (ID: 431683) served as the internal control for copy number analysis (Life Technologies, CA). The primers used in this method were selected to target a specific sequence in CYP2D6 exon 9 (TaqMan Copy Number Assay ID: Hs00010001_cn) and in intron 6 (TaqMan Copy Number Assay ID: Hs04502391_cn). The CYP2D6 exon 9 copy number assay was used to quantify all non-CYP2D6*36 alleles, and the CYP2D6 intron 6 copy number assay was used to detect and quantify all alleles including the CYP2D6*36 allele [11,13]. All samples were run in quadruplicate along with 4 Coriell DNA samples [15] of known copy number, used as positive controls and carrying a CYP2D6 deletion (1 gene copy) or 2, 3, or 4 CYP2D6 gene copies. The TaqMan copy number assay was performed according to the manufacturer's recommendations and a published protocol [11,13]. Relative quantification of CYP2D6 gene copy number was performed with CopyCaller Software (Life Technologies, CA) following the comparative ΔΔCT method [6,13,16].
After the DNA samples were genotyped for CYP2D6*2, *3, *4, *6, *10, *17, *41 by pyrosequencing on the PSQ HS 96 platform, the allele quantification application of the pyrosequencing software was used to analyze and quantify the CYP2D6 gene copy number and to identify which allele carried the duplication or multiplication. Variations in CYP2D6 gene copy number were assessed by measuring the percentage of each allelic base at the polymorphic site that defines each variant allele, using the allele quantification (QA) option in the Pyrosequencing analysis software. Between 1 and 96 samples can be genotyped at a time by this method, depending on the workload.
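The comparative ΔΔCT calculation used for the TaqMan assay can be illustrated with a short sketch. This is not how CopyCaller Software is implemented; it only shows the textbook relation copies = calibrator copies × 2^(−ΔΔCT), assuming roughly 100% amplification efficiency for both the CYP2D6 and RNase P assays. The Ct values below are hypothetical and are not data from the study.

```python
from statistics import mean


def copy_number_ddct(ct_target, ct_reference, ct_target_cal, ct_reference_cal,
                     calibrator_copies=2):
    """Estimate gene copy number with the comparative ddCt method.

    dCt  = Ct(target) - Ct(reference), computed within one sample
    ddCt = dCt(sample) - dCt(calibrator)
    copies = calibrator_copies * 2 ** (-ddCt)
    Assumes both assays amplify with about 100% efficiency.
    """
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_cal - ct_reference_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)


# Hypothetical quadruplicate Ct values (illustration only).
sample_cyp2d6 = [25.40, 25.45, 25.38, 25.42]
sample_rnasep = [26.00, 26.02, 25.98, 26.00]
calib_cyp2d6 = [26.00, 26.01, 25.99, 26.00]   # calibrator known to carry 2 copies
calib_rnasep = [26.00, 26.00, 26.01, 25.99]

estimate = copy_number_ddct(mean(sample_cyp2d6), mean(sample_rnasep),
                            mean(calib_cyp2d6), mean(calib_rnasep))
print(f"estimated copies: {estimate:.2f} (rounded: {round(estimate)})")
```

Averaging the replicates before taking differences is one simple choice; dedicated software typically also reports a confidence measure for the rounded call.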

Statistical Analysis
The chi-squared test with one degree of freedom was used to test the departure from Hardy-Weinberg equilibrium (HWE) for each CYP2D6 variant allele in each race/ethnic group. Mean allelic ratios are presented with standard deviations (SD). Data (means of allelic ratios) for samples with 2, 3, and 4 gene copies were compared by analysis of variance (ANOVA), and P<0.05 was considered statistically significant. Post hoc analysis using Tukey's test was done to identify the copy number associated with a mean allelic ratio that was significantly different from the others. Pearson's correlation test was also performed to determine whether there is a positive linear correlation between gene copy number and allelic ratio. All statistical analyses were performed with SAS software (version 9.3, NC, USA).

Table 2. Pyrosequencing PCR and sequencing primers for CYP2D6*2, *3, *4, *6, *10, *17, *41 alleles. (Columns: CYP2D6 alleles; PCR primer; pyrosequencing primer.)
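The Hardy-Weinberg test mentioned in this section can be sketched for a single biallelic site as follows. The study used SAS; this standalone version is only an illustration, and the genotype counts are hypothetical. The p-value uses the identity that, for one degree of freedom, the chi-squared upper tail equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_ref_hom, n_het, n_var_hom):
    """Chi-squared test (1 df) for departure from Hardy-Weinberg equilibrium.

    Takes observed genotype counts at a biallelic site and returns
    (chi-squared statistic, p-value).
    """
    n = n_ref_hom + n_het + n_var_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)   # reference allele frequency
    q = 1.0 - p                             # variant allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_var_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, not data from the study.
chi2, p_value = hwe_chi_square(n_ref_hom=90, n_het=30, n_var_hom=5)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen threshold would be read as no evidence of departure from HWE for that allele in that group.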

PCR and Pyrosequencing assay optimization
Since the genotyping and allele quantification are performed simultaneously in this method, the PCR reactions and pyrosequencing assays for CYP2D6*2, *3, *4, *6, *10, *17, *41 were optimized to be robust, efficient and reproducible. The annealing temperature for all PCR reactions was adjusted to 60°C, and the annealing temperature for the pyrosequencing reaction was kept at 80°C for 3 minutes for all the assays. Under these conditions the DNA samples with known [13,15] and unknown [12] CYP2D6 copy numbers (2, 3, and 4) were repeatedly genotyped (at least 4 times on different days) to assure the reproducibility of our assay.

Assay verification
DNA samples from the Coriell Cell Repository (http://ccr.coriell.org/) with known CYP2D6 copy numbers (2, 3, and 4 copies) and genotypes (Table 1) were used to verify the gene copy number determined by the new pyrosequencing quantification-based method. These samples were genotyped by pyrosequencing for CYP2D6*2, *3, *4, *6, *10, *17, *41 alleles and copy number variations. The CYP2D6 genotypes and copy number variations obtained with the current method were in 100% concordance with the known values for the Coriell samples [13] (Table 1). There was also no difference between the pyrosequencing method and the TaqMan copy number assay in determining CYP2D6 copy number variations.

CYP2D6 genotyping results
The genotypes and phenotypes of CYP2D6*2, *3, *4, *6, *10, *17, *41 alleles for all 218 PEAR-II DNA samples are shown in Table 3 [12]. All genotypes were in Hardy-Weinberg equilibrium (HWE). CYP2D6*1/*2 (18.8%) and CYP2D6*1/*1 (13.76%) were the most prevalent genotypes and together made up 32.5% of the samples. Based on the CYP2D6 activity score, EM (84.4%) and UM (3.7%) were the most and least frequent phenotypes, respectively. The percent distribution of each allele-specific nucleotide varied in proportion to the copy number of its allele. In heterozygous samples with no gene copy number variation (Fig. 1A), the percent distribution of both alleles was equal or close to 50%, whereas in samples with a total of 3 and 4 gene copies the pyrosequencing software yielded allele percent distributions of 62% to 70% and 72% to 76%, respectively (Fig. 1B and 1D). There was no discordance among the results from our method (allele quantification-based Pyrosequencing), those from the TaqMan copy number assay, and the reported copy numbers of the known samples.
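The observed percent ranges can be compared with what a simple proportional model would predict. The sketch below assumes, as an idealization, that each gene copy contributes equally to the pyrosequencing signal; the function name is hypothetical and the output is arithmetic, not measured data.

```python
def expected_allele_percent(copies_a, copies_b):
    """Ideal percent signal for alleles A and B given their copy numbers.

    Assumes every gene copy contributes equally to the signal.
    """
    total = copies_a + copies_b
    return 100.0 * copies_a / total, 100.0 * copies_b / total


for copies_a, copies_b in [(1, 1), (2, 1), (3, 1), (2, 2)]:
    pct_a, pct_b = expected_allele_percent(copies_a, copies_b)
    print(f"{copies_a}:{copies_b} ({copies_a + copies_b} copies) -> "
          f"{pct_a:.1f}% / {pct_b:.1f}%")
```

Under this model a 2:1 arrangement gives about 66.7% and a 3:1 arrangement gives 75%, which fall inside the reported 62% to 70% and 72% to 76% ranges. The model also suggests that a 2:2 arrangement would be indistinguishable from 1:1 by percentage alone, so the total copy number from a separate assay would still be needed in such a case.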
Deriving accurate phenotypes from the genotype information by standard approaches that do not identify the duplicated/multiplied allele (e.g. the TaqMan copy number assay) was a challenge for 12 samples that had heterozygous genotypes and copy number variations in the CYP2D6 gene. The inferred metabolic phenotype for each of the 12 samples varied depending on which of the two alleles was duplicated or multiplied (Table 3). For example, one sample had a genotype of *1/*17 and a total gene copy number of 3. According to the recent CPIC guidelines for codeine therapy, the *17 variant allele is partially active and is therefore assigned an activity score of 0.5, whereas the wild type allele, *1, is given an activity score of 1, indicating a fully functional allele. Therefore, if the *1 allele is duplicated, the predicted phenotype would be ultrarapid metabolizer (activity score = 2.5, greater than 2, the cut-off for the EM phenotype), whereas if the *17 allele is duplicated the activity score would be 2, and thus the phenotype would be EM. Based on our method, the percent distributions of the two alleles (C and T) at the polymorphic position in this example were 65.3% and 34.7%, respectively, suggesting that the variant allele, *17, had a higher percent distribution or copy number than the wild type allele, *1 (Fig. 1C). Hence the genotype-derived phenotype for this sample was EM and not UM (Table 4).
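The *1/*17 example can be worked through in a few lines. This sketch uses only the two activity values stated in the text (1 for *1, 0.5 for *17) and the stated cut-off of an activity score greater than 2 for UM; it is not a full CPIC genotype-to-phenotype translator, and the function names are hypothetical.

```python
# Activity values stated in the text; other alleles are deliberately omitted.
ALLELE_ACTIVITY = {"*1": 1.0, "*17": 0.5}


def activity_score(allele_copies):
    """Sum the activity over every gene copy, e.g. {"*1": 1, "*17": 2}."""
    return sum(ALLELE_ACTIVITY[allele] * copies
               for allele, copies in allele_copies.items())


def is_ultrarapid(score):
    """UM is called only when the score exceeds 2, the EM cut-off in the text."""
    return score > 2.0


# The two possible arrangements for a *1/*17 sample with 3 gene copies.
# Labelling the non-UM case as EM is valid for these two scores only.
for label, copies in [("*1x2/*17", {"*1": 2, "*17": 1}),
                      ("*1/*17x2", {"*1": 1, "*17": 2})]:
    score = activity_score(copies)
    print(label, score, "UM" if is_ultrarapid(score) else "EM")
```

The two arrangements give scores of 2.5 and 2.0, which is why knowing the duplicated allele changes the phenotype call for this genotype.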
In homozygous samples, where both copies are the same CYP2D6 allele, the percent distribution of the allele was about 100%. The method yields the same percentage (100%) for allele quantification in samples with one gene copy (carriers of the CYP2D6*5 variant allele) and in homozygous samples with 2, 3, or 4 CYP2D6 gene copies. The main limitation of this method is therefore its inability to detect *5 or to identify homozygous samples carrying more than 2 copies of the CYP2D6 gene. Of the 218 samples, 55 (25.2%) were homozygous for the CYP2D6 gene and 18 (8.3%) were carriers of the *5 variant allele. Inferring the metabolizer phenotypes from the assigned CYP2D6 genotypes for these samples was therefore not possible by this method.

CYP2D6 allelic ratio
The mean allelic ratios for samples with 2, 3, and 4 gene copies were 0.97, 1.83, and 2.76, respectively (Table 5 and Fig. 2), and varied significantly with CYP2D6 gene copy number (ANOVA, p < 0.0001). Additionally, Pearson's correlation coefficient (r = 0.95) indicated a positive linear correlation between allelic ratio and gene copy number.
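As an illustration of how an allelic ratio could be turned into a copy number call, the sketch below divides the larger allele percentage by the smaller one and picks the copy number whose reported mean ratio is closest. Both the orientation of the ratio and the nearest-mean rule are assumptions made for this example; the text does not specify the decision rule that was used.

```python
# Mean allelic ratios reported in the text for 2, 3, and 4 gene copies.
REPORTED_MEAN_RATIO = {2: 0.97, 3: 1.83, 4: 2.76}


def allelic_ratio(percent_a, percent_b):
    """Ratio of the larger to the smaller allele percentage (assumed orientation)."""
    high, low = max(percent_a, percent_b), min(percent_a, percent_b)
    if low <= 0:
        raise ValueError("single-allele signal: the ratio is undefined")
    return high / low


def nearest_copy_number(ratio):
    """Pick the copy number whose reported mean ratio is closest (assumed rule)."""
    return min(REPORTED_MEAN_RATIO,
               key=lambda copies: abs(REPORTED_MEAN_RATIO[copies] - ratio))


# Allele percentages from the *1/*17 example in the Results.
ratio = allelic_ratio(65.3, 34.7)
print(f"ratio = {ratio:.2f}, nearest copy number = {nearest_copy_number(ratio)}")
```

The explicit error for a single-allele signal mirrors the limitation described for homozygous and *5 samples, where only one allele is available for quantification.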

Discussion
In this study we describe a simple and reliable allele quantification-based Pyrosequencing genotyping method that facilitates CYP2D6 CNV genotyping while also identifying allele-specific CYP2D6 copy number variation in heterozygous samples. DNA samples carrying 2, 3, and 4 CYP2D6 gene copies were identified, and the allele that had the duplication or multiplication was determined. Determination of CYP2D6 genotype as well as copy number is important in the clinical setting, where phenotype prediction and individualization of drug therapy depend on the accuracy of the phenotypes that are inferred from genotype information.
In the current method for analyzing CYP2D6 gene copy number, we used a different and novel strategy based on pyrosequencing. Our approach is simply based on calculating the ratio of the two CYP2D6 alleles from the percentage (quantification) of each allele that is generated by the software at the end of the pyrosequencing reaction. The means of the allelic ratios for samples with 2 and 3 gene copies were comparable to what was reported before (0.98 for 2 gene copies and 1.7 for 3 gene copies) [17]. Additionally, we were able to estimate the mean allelic ratio for samples that carried a total of 4 CYP2D6 gene copies (allelic ratio = 2.76), which further underscores the importance of our approach in terms of distinguishing between samples with 2, 3 and 4 copies of the CYP2D6 gene, as reported by others [6,17].
The TaqMan copy number assay is the most commonly used method for quantification of gene copy number, mainly because of its simplicity in comparison with more complex assays such as long range PCR, RFLP, sequencing and Southern blotting [3,6,7,8,9,13]. Its accuracy and reproducibility have been evaluated in several published studies [18]. Söderbäck et al. (2005) described a pyrosequencing-based method to determine CYP2D6 gene copy number by measuring the ratio of the CYP2D6- and CYP2D8-specific peak heights. They reported peak height ratios of 0.5 for samples with one copy of the CYP2D6 gene, 1 for samples with 2 copies, and 1.5 for those with 3 gene copies.
Although different methods for analyzing copy number variations are described in the literature [6,17,18,19,20], they all share the same limitation: they cannot determine which of the two CYP2D6 alleles in a DNA sample carries the duplication or multiplication, which adds complexity to the process of inferring genotype-derived phenotypes.
In contrast to long range PCR and sequencing or Southern-RFLP, which are labor intensive and time consuming, this method allows rapid estimation of gene copy number and, most importantly, identification of the duplicated/multiplied allele in heterozygous samples. One limitation of this assay is that, at present, it cannot quantify gene copy number in samples that are homozygous for the CYP2D6 gene or carry the *5 allele, since only one CYP2D6 allele is present for sequencing. The other limitation is that our assay cannot discriminate between CYP2D6*10 and CYP2D6*36, which share the same SNP (100C>T). There is ongoing research in our lab to find a way to overcome these limitations.
To our knowledge, our method for determining copy number variations in CYP2D6 has not been described before. The method is straightforward, easy to perform, and significantly reduces the cost and time of genotyping because it can identify the CYP2D6 SNPs of interest and measure CNVs in a single reaction. Most importantly, it overcomes the limitation of other techniques with regard to identifying the duplicated CYP2D6 allele in heterozygous samples, and thereby allows faster prediction of phenotypes from genotypic data.