Actionable pharmacogenetic variants in Hong Kong Chinese exome sequencing data and projected prescription impact in the Hong Kong population

Preemptive pharmacogenetic testing has the potential to improve drug dosing by providing point-of-care patient genotype information. Nonetheless, its implementation in the Chinese population is limited by the lack of population-wide data. In this study, secondary analysis of exome sequencing data was conducted to study pharmacogenomics in 1116 Hong Kong Chinese. We aimed to identify the spectrum of actionable pharmacogenetic variants and rare, predicted deleterious variants that are potentially actionable in Hong Kong Chinese, and to estimate the proportion of dispensed drugs that may potentially benefit from genotype-guided prescription. The projected preemptive pharmacogenetic testing prescription impact was evaluated based on the patient prescription data of the public healthcare system in 2019, serving 7.5 million people. Twenty-nine actionable pharmacogenetic variants/ alleles were identified in our cohort. Nearly all (99.6%) subjects carried at least one actionable pharmacogenetic variant, whereas 93.5% of subjects harbored at least one rare deleterious pharmacogenetic variant. Based on the prescription data in 2019, 13.4% of the Hong Kong population was prescribed with drugs with pharmacogenetic clinical practice guideline recommendations. The total expenditure on actionable drugs was 33,520,000 USD, and it was estimated that 8,219,000 USD (24.5%) worth of drugs were prescribed to patients with an implicated actionable phenotype. Secondary use of exome sequencing data for pharmacogenetic analysis is feasible, and preemptive pharmacogenetic testing has the potential to support prescription decisions in the Hong Kong Chinese population.


Introduction
Pharmacogenetics is the study of variability in drug response caused by genetic variations [1]. It is estimated that over two million hospitalized patients develop severe adverse drug reactions in the United States annually, incurring a direct medical cost of 200 billion US dollars (USD) [2]. This indicates a huge potential in reducing healthcare costs even if only a small proportion of adverse drug reactions are preventable by genotype-guided prescription. The current clinical applications of pharmacogenetics are mostly limited to reactive pharmacogenetic testing, in which investigations are ordered only when certain high-risk medications are prescribed, or after an adverse drug reaction has occurred. In contrast, preemptive pharmacogenetic testing allows the optimization of dosing based on genotype information at the time of prescription, minimizing the risk of undesired outcomes [3].
The pharmacogenetics community has established open-access platforms to facilitate the implementation of clinical pharmacogenetics. The Clinical Pharmacogenetics Implementation Consortium (CPIC) provides peer-reviewed, evidence-based guidelines for individual genedrug combinations to assist clinicians to incorporate pharmacogenetics into daily practice [4]. The Pharmacogenomics Knowledgebase (PharmGKB) provides a multitude of pharmacogenetic information and assigns a level of evidence on variant-medication pairs based on the quantity and quality of available evidence [5,6]. PharmGKB level 1A/B represents the highest level of evidence, which is either supported by large-scale studies or is widely accepted by the pharmacogenetics community. Multiple large population pharmacogenetics studies revealed that nearly all subjects (96.2-97.0%) had at least one actionable pharmacogenetic variant, with a median of two to three [7,8]. PharmGKB level 2A/B annotations are assigned to variant-drug combinations with moderate evidence of association, whereas levels 3 and 4 annotations are assigned to variants with low and preliminary levels of evidence, respectively [6] . The study of pharmacogenetic variants is often limited by sample size. Many actionable pharmacogenetic variants are known because they are relatively common in the population, or their associated adverse reactions are severe. In contrast, the effects of rare variants are largely unknown, but they should not be neglected since they are consistently found in the population and have been predicted to account for nearly all inter-individual variability in more than half of the known pharmacogenes [7,[9][10][11][12]. Therefore, the study of rare pharmacogenetic variants is important as it can potentially improve the prediction of drug responses.
It is known that pharmacogenetic variations exist across different ethnicities. Currently, Chinese pharmacogenetic data is limited. For example, Asians account for 2% of the eMER-GE-PGRNseq cohort [7]. The first large-scale analysis of actionable pharmacogenetic variants in Chinese was published only recently, yet the study did not examine rare variants and its prescription pattern analysis was performed based on data from a children hospital [13]. Besides, the Southern Chinese sub-population accounted for 22.4% of the study subjects and was underrepresented [13]. To address these issues, we examined the spectrum of 133 actionable pharmacogenetic variants and rare deleterious variants in 108 pharmacogenes using an exome sequencing cohort consisting of 1116 Hong Kong (HK) Chinese subjects, who are representative of the Southern Chinese subpopulation. In addition, the potential prescription impact of actionable pharmacogenetic variants was projected on the HK population.

Subjects and exome sequencing (S1 Fig)
A total of 1,141 unrelated, self-reported Chinese were enrolled for exome sequencing for rare disease diagnosis or complex disease research from 2012 to 2019. Exome sequencing was performed on genomic DNA derived from peripheral blood or buccal mucosa by Illumina sequencing platforms, and different exome capture kits were used (S1 Table). The processing of raw exome sequencing data is described in detail in the Supplementary Methods (S1 Text). Briefly, variant calling was performed using a pipeline based on the Genome Analysis Toolkit (GATK), and human leukocyte antigen (HLA) typing was performed using HLA typing from High-quality Dictionary (HLA-HD) [14,15]. The exome sequencing dataset was subjected to stringent quality control (QC) procedures at the sample, variant, and genotype levels and the output data were annotated using wANNOVAR [16]. To avoid over-representation of disease-associated variants, the samples collected from subjects with respiratory diseases and neuromuscular disorders were removed for CFTR and RYR1 analysis, respectively. In this study, a rare variant was defined as a variant having a Genome Aggregation Database (gnomAD) global allele frequency (AF) <1%. A missense variant was considered deleterious when it possessed a Phred-scaled Combined Annotation Dependent Depletion (CADD) score �20 [17], or Rare Exome Variant Ensemble Learner (REVEL) score �0.7, or PREDICT score �0.6; whereas a loss-of-function (LoF) variant was considered deleterious when it possessed a Phred-scaled CADD score �20 or a Loss-Of-Function Transcript Effect Estimator (LOFTEE) of "high-confidence" [18][19][20].

Selection of actionable pharmacogenetic variants and high-confidence pharmacogenes
To examine the spectrum of known actionable pharmacogenetic variants, those with clinical annotations of PharmGKB level 1A or 1B were selected since they represent variants with the highest level of evidence for clinical actionability of gene-drug associations [5,6,21]. In addition, the literature was reviewed to identify Chinese-specific pharmacogenetic variants that were actionable based on the CPIC guideline [22,23]. This review resulted in a list of 133 actionable pharmacogenetic variants and HLA alleles in 19 genes (S2 Table). To analyze rare pharmacogenetic variants that are potentially actionable, genes with at least one PharmGKB level 1 or 2 clinical annotation were selected, resulting in a list of 108 genes that were considered "high confidence pharmacogenes" (S3 Table) [5,6].

Projected prescription impact of actionable pharmacogenetic variants in Hong Kong
HK is a city in Southern China with a population of 7.5 million, of which 92% are ethnic Han Chinese [24]. In HK, over 90% of all secondary and tertiary health services are provided by the public healthcare system under the management of the Hospital Authority (HA) [25,26]. All public healthcare service data, including drug prescriptions in the inpatient and outpatient settings, are available in the Clinical Data Analysis and Reporting System (CDARS) database in an unlisted and anonymous manner.
Based on CPIC gene-drug annotations, 44 drugs were matched with known actionable pharmacogenetic variants observed in the exome dataset. Under the public healthcare system in 2019, 36 of these drugs were dispensed by pharmacies and had available data, and eight drugs were not prescribed. All patients in HK who had a prescription record of one or more of the 36 drugs at one or more of the hospitals under the HA between January 1, 2019 and December 31, 2019 were identified from the CDARS database. The number of subjects, drug quantity, and unit cost of each drug were retrieved for the estimation of prescription impact and drug expenditure. Duplications from multiple prescriptions within the same year, different routes of administration and doses for each drug were removed to identify the unique number of subjects. Pharmacophenotypes were assigned based on the genotype definition stated in the CPIC guideline. Phenotypes with at least one prescription recommendation based on the CPIC guideline were considered as actionable. The frequency of each actionable phenotype was derived from the allele frequency of the Hong Kong Chinese exome dataset based on the Hardy-Weinberg equation, except for CYP2D6. The frequency of actionable phenotype of CYP2D6 was retrieved from a published Hong Kong study, since exome sequencing cannot detect haplotype and copy number variations accurately [27]. The projected prescription impact for each drug was estimated by multiplying the frequency of actionable phenotype by the total number of subjects for each individual drug retrieved from CDARS. Drug expenditures were converted from HK dollars (HKD) to USD based on the conversion rate of 7.8 HKD = 1.0 USD. Of all variants identified, 3,501 (26.6%) have never been reported in public databases including gnomAD, dbSNP, and ClinVar (S4 Table). A significant linear relationship between gene transcript length and total number of variants in each gene (p = 0.0073) was observed, with an increase of 0.17 variants per kilobase of gene length (Fig 1).

Spectrum of known actionable pharmacogenetic variants
The majority of the 129 known actionable pharmacogenetic variants and four HLA alleles were well-covered in the exome sequencing data, except for four variants which could not be detected by exome sequencing because they are located in non-coding regions (S2 Table). For more than 90% of the samples, depths of >8X and >30X were achieved in 121 (93.8%) and 62 (48.1%) variants, respectively (S4 Fig). In our cohort, 25 known actionable variants and all four HLA alleles were identified, accounting for 15 genes and 44 implicated drugs (S5 Table). 104 actionable variants are absent in the HK Chinese population (AF = 0). The most prevalent variant in our cohort was rs1065852 in CYP2D6 (AF = 60.95%), a marker single nucleotide polymorphism (SNP) of a markedly reduced or null allele, while the most prevalent HLA risk allele was HLA-B � 15:02 (AF = 9.68%; S6 Table). Analyzing using a per-sample approach, 1,111 (99.6%) individuals harbored at least one actionable variant, with a median of four (Fig 2A). At the gene level, CYP2C19 (57.21%), CYP3A5 (43.38%), and CYP2B6 (40.51%) were the genes with the highest frequency of actionable phenotypes (Table 1). In terms of individual drugs, the antiplatelet drug clopidogrel (57.21%), immunosuppressant tacrolimus (43.38%), and anticoagulant warfarin (43.13%) had the highest frequency of actionable phenotypes (S7 Table).

Projected prescription impact of actionable pharmacophenotypes in HK
Based on the available prescription data in CDARS in 2019, a total of 1,006,046 HK Chinese patients had received at least one of the 36 drugs with CPIC guideline recommendations, representing 13.4% of the HK population. In general, the percentage of people prescribed with drugs with CPIC guideline recommendations increased with ages (S8 Table). For elderly patients (aged >60), 616,553 received one of the 36 drugs, accounting for 31.9% of the total elderly population. Preemptive pharmacogenetic testing is projected to have the greatest impact for simvastatin (146,167 patients, frequency: 25.81%), clopidogrel (26,304 patients, frequency: 57.21%), and ibuprofen (12,000 patients, frequency: 5.39%; Fig 3). In terms of drug The relationship between variant count, gene transcript length, and constraint (o/e score reported in gnomAD) was analyzed using multiple linear regression analysis. There was significant association between gene transcript length and total variant count (P = 0.0073). In general, the number of variants increased by 0.17 for every kilobase increase in gene length, although outliers existed. In the highly polymorphic gene CYP2D6, 29.5 variants were observed for every kilobase of gene length.
https://doi.org/10.1371/journal.pgen.1009323.g001 costs, the 36 drugs had a total expenditure of 33,520,000 USD, which accounted for 3.6% of the annual drug expenditure of HA in the 2018-2019 fiscal year. It was estimated that 8,219,000 USD (i.e., 24.5% of drug expenditures among the 36 implicated drugs) worth of drugs were prescribed to patients with an implicated actionable phenotype. Remarkably, tacrolimus (4,301,000 USD), escitalopram (777,000 USD), and simvastatin (710,000 USD) accounted for 70.4% of the total expenditure of drugs that were prescribed to patients with an implicated actionable phenotype (Fig 3). The use of genotype-guided prescriptions was estimated to have the greatest impact for ibuprofen (1,417

Spectrum of rare, predicted deleterious variants that are potentially actionable
A total of 13,165 variants were identified in the 108 pharmacogenes and 3,586 of them (27.2%) were rare variants (gnomAD AF<1%; S4 Table). Five hundred and thirty-one rare variants were predicted to be deleterious, including 475 (89.5%) missense variants and 56 (10.5%) LoF variants ( S6 Fig and S9 Table). Among the 531 rare deleterious variants, 96 (18.1%) have never been reported in public databases including dbSNP, gnomAD, and ClinVar.

Discussion
As exome sequencing becomes increasingly available, it has the potential to facilitate personalized medicine without significant additional costs. This study demonstrated the secondary use of exome data for pharmacogenetic analysis, and we found that the majority of known actionable pharmacogenetic variants and the coding regions of most pharmacogenes were well-covered. To our knowledge, this is the second study to investigate actionable pharmacogenetic variations in the Chinese population and it contained the largest sample size of Chinese subjects to date. Our data are representative of Southern Chinese, a Chinese subpopulation that is underrepresented in the literature. To the best of our knowledge, we, for the first time, have provided information on rare, predicted deleterious pharmacogenetic variants in Chinese. Additionally, the potential prescription impact of actionable pharmacogenetic variants was projected in the HK population of 7.5 million.

Burden and projected prescription impact of known actionable pharmacogenetic variations
Consistent with other populations, nearly all HK Chinese (99.6%) harbored at least one pharmacogenetic variant, with a median of four variants [7,8] . Nonetheless, the spectrum of actionable genotypes was different compared to that of African and European populations (Fig 4). The highest actionable phenotype in Europeans and Africans was IFNL3 and CYP3A5, respectively, which had a frequency of more than 80% in their respective populations; however, less than half of the HK Chinese carried an actionable phenotype in these genes. In contrast, while NUDT15 ranked seventh among the drugs with highest actionable phenotypes in HK Chinese (frequency: 18.58%), only 1% of Europeans and Africans carried actionable phenotypes in this gene. It is therefore more important to consider defective NUDT15 alleles when azathioprine is prescribed in the Chinese population. In contrast to NUDT15, which is the key determinant of azathioprine-induced myelosuppression in Asians, defective TPMT alleles should be considered in Europeans instead [28,29]. This suggests that even when the same drug is prescribed, different pharmacogenes should be considered in each population. Furthermore, considering the same pharmacogene, the alleles contributing to actionable phenotypes could be different across different populations. For example, while G6PD deficiency is common in Africans and Chinese, the A allele alone explains nearly all G6PD deficiencies in Africans, whereas seven G6PD alleles contribute to the deficiency in HK Chinese [30]. Based on prescription data of the HK public healthcare system, 13.4% of the HK population (1,006,046 patients) received at least one of the 36 drugs with CPIC guideline recommendations. The total expenditure on CPIC actionable drugs in 2019 was 33,520,000 USD, and it was estimated that 8,219,000 USD (24.5%) worth of drugs were prescribed to subjects with an implicated actionable phenotype. Pharmacogenetic results would improve patient care and allocation of resources. For example, in order to reduce the risk of thiopurine-related myelosuppression, 1,354 (17.63%) patients taking azathioprine would require a starting dose reduction of 30-80%, and 66 (0.95%) patients would require alternative medications. Another example is the lipid-lowering drug simvastatin, which was found to have the greatest prescription impact in terms of headcount in our study. A previous study on SLCO1B1 reported an 18% cumulative risk of simvastatin-induced myopathy for the CC genotype (frequency in HK: 1.92%) and a 3% cumulative risk for the CT genotype (frequency in HK: 23.89%) [31]. Projecting from the frequency of actionable phenotype derived from our dataset onto the CDARS prescription data, it was estimated that 146,167 patients prescribed with simvastatin had an actionable phenotype in 2019. If all of these patients prescribed with a lower dose of simvastatin or another statin, it was estimated that 6,019 cases of simvastatin-induced myopathy can be prevented. Dosing can also be increased based on genotype-guided prescription. Tacrolimus, an immunosuppressive drug used by organ transplant recipients to lower the risk of organ rejection, was predicted to affect 1,813 (43.38%) people, accounting for the highest expenditure (4,301,000 USD) among all the CPIC actionable drugs in HK. Based on the pharmacogenetic results, these 1,813 people required a 1.5 to 2 times increase in starting dose, further increasing the prescription cost. Nevertheless, genotype-guided prescription in tacrolimus is known to achieve therapeutic concentrations earlier and had fewer out-of-range concentrations when compared to standard dosing [32]. With expansion of the CPIC guidelines to include other commonly prescribed drugs such as tramadol and proton pump inhibitors, the prescription impact of pharmacogenetic testing is likely to increase in the near future.

Spectrum of rare, deleterious pharmacogenetic variants
The effect of rare pharmacogenetic variants on drug responses should not be neglected, since they may account for nearly all inter-individual variabilities in more than half of the pharmacogenes [11]. At present, pharmacogenetic testing is performed by SNP arrays targeting specific alleles and hence, the detection of rare variants is impossible. In contrast, exome or genome sequencing would result in an abundance of rare variants, but determining the effect of rare variants on drug responses is difficult without functional data [19]. In the present study, we aimed to maximize the discovery of potentially deleterious pharmacogenetic variants. We found that 93.5% of subjects carried at least one rare, deleterious pharmacogenetic variant, with a median of two variants. This suggests that despite being individually rare, pharmacogenetic variants with AF <1% are collectively common. The rare, deleterious pharmacogenetic variants reported in this study can be prioritized for functional studies, especially for variants with consensus deleterious effects predicted across multiple bioinformatics tools and with known gene mechanisms. An example would be the CYP2C9 splice variant c.1291 +1G>T, which was predicted to be deleterious by both CADD and LOFTEE. Although this variant has not been reported in PharmVar, other LoF variants in CYP2C9 have been reported as "no function" [26]. In the future, saturation mutagenesis studies will likely aid in determining how deleterious rare pharmacogenetic variants are [33].

Study limitations
First, due to the technical limitations of exome sequencing, non-coding regions, copy number variations, structural variations, and loci with high genomic complexity either were poorly/not covered with exome sequencing or were difficult to detect with bioinformatics. For example, four actionable variants located in non-coding regions were not sequenced, and four variants, UGT1A1 � 28, IFNL3 rs12979860, CYP2C19 rs4244285 and CYP3A5 rs776746, were sequenced in less than 70% of the samples (S2 Table). The reported AF of these variants should therefore be interpreted with caution. Advancements in bioinformatics may have overcome some technical difficulties, such as using HLA-HD in this study for HLA typing. The AF of HLA identified in this study agreed well with that of the HK Bone Marrow Donor Registry (S6 Table), suggesting the accuracy of HLA-HD [34]. In the future, copy number variations and structural variations could be more reliably identified from exome data with bioinformatics. The second limitation is related to the projected impact of preemptive pharmacogenetic testing. The prescription data of private medical practitioners and over-the-counter medications were not available in this study. Therefore, our analysis was limited to prescription data in the public healthcare setting. Although our data were limited to public hospitals, the public healthcare system accounted for approximately 90% of all secondary and tertiary health services in HK. Lastly, indications for drug prescription could not be retrieved from the CDARS database. The CPIC guideline only recommended CYP2C19 genotype-guided prescription of clopidogrel in patients with acute coronary syndrome receiving anti-platelet therapy for percutaneous coronary intervention, therefore the projected impact of clopidogrel could be over-estimated [35].
In conclusion, nearly all individuals carried at least one actionable pharmacogenetic variant and one rare, deleterious pharmacogenetic variant in our cohort. It was estimated that oneseventh of the HK population received at least one of the 36 drugs with CPIC guideline recommendations, and 8,219,000 USD worth of drugs were prescribed to patients with an implicated actionable phenotype. This indicates that preemptive pharmacogenetic testing has the potential to improve patient care and allocation of resources significantly.
Supporting information S1 Text. Supplementary Methods. (DOCX) S1  Concatenated exome sequencing data was first run through sample-, variant-, and genotype-level quality control (QC) procedures. In the analysis of known actionable pharmacogenetic variants, we first extracted data based on a curated list of 129 variants and four HLA alleles, and subsequently projected the prescription impact in the Hong Kong public healthcare system. We further processed the dataset for analysis of rare variants of the 108 high-confidence pharmacogenes. The final list of rare, predicted deleterious variants included only missense and loss-of-function (LoF) variants with gnomAD allele frequency (AF) <1% and at least one deleterious prediction by bioinformatics algorithm. (TIF) S2 Fig. Coverage of the coding regions of the 108 high-confidence pharmacogenes. In general, exome sequencing covered the 108 high-confidence pharmacogenes well, with 104 of them having a mean coverage of at least 8X in over 75% of the samples. Genes that did not have a mean coverage of 8X included CCHCR1, TNF, IFNL4, and GSTM1. Mean coverages of 30X and 50X were achieved in over 75% of samples for 90 and 31genes, respectively. (TIF) rare (gnomAD global AF <1%) missense variants in the 108 high-confidence pharmacogenes, 475 variants were predicted to be deleterious by at least one of the three bioinformatics tools (CADD, REVEL, and PREDICT), and 89 variants had consensus deleterious predictions. There were 354 rare missense variants that were not predicted as deleterious by all of the bioinformatics tools. (B) Among the 63 rare LoF variants, 56 were predicted to be deleterious by either CADD or LOFTEE, and 43 variants had consensus deleterious predictions. There were 7 rare LoF variants that were not predicted as deleterious by both of the bioinformatics tools. AF, allele frequency; LoF, loss-of-function.