Drug addiction is a costly and recurring healthcare problem, necessitating a need to understand risk factors and mechanisms of addiction, and to identify new biomarkers. To date, genome-wide association studies (GWAS) for heroin addiction have been limited; moreover they have been restricted to examining samples of European and African-American origin due to difficulty of recruiting samples from other populations. This is the first study to test a Han Chinese population; we performed a GWAS on a homogeneous sample of 370 Han Chinese subjects diagnosed with heroin dependence using the DSM-IV criteria and 134 ethnically matched controls. Analysis using the diagnostic criteria of heroin dependence yielded suggestive evidence for association between variants in the genes CCDC42 (coiled coil domain 42; p = 2.8x10-7) and BRSK2 (BR serine/threonine 2; p = 4.110−6). In addition, we found evidence for risk variants within the ARHGEF10 (Rho guanine nucleotide exchange factor 10) gene on chromosome 8 and variants in a region on chromosome 20q13, which is gene-poor but has a concentration of mRNAs and predicted miRNAs. Gene-based association analysis identified genome-wide significant association between variants in CCDC42 and heroin addiction. Additionally, when we investigated shared risk variants between heroin addiction and risk of other addiction-related and psychiatric phenotypes using polygenic risk scores, we found a suggestive relationship with variants predicting tobacco addiction, and a significant relationship with variants predicting schizophrenia. Our genome wide association study of heroin dependence provides data in a novel sample, with functionally plausible results and evidence of genetic data of value to the field.
Citation: Kalsi G, Euesden J, Coleman JRI, Ducci F, Aliev F, Newhouse SJ, et al. (2016) Genome-Wide Association of Heroin Dependence in Han Chinese. PLoS ONE 11(12): e0167388. https://doi.org/10.1371/journal.pone.0167388
Editor: Brion Maher, Johns Hopkins University Bloomberg School of Public Health, UNITED STATES
Received: March 15, 2016; Accepted: November 14, 2016; Published: December 9, 2016
Copyright: © 2016 Kalsi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are contained in the manuscript and Supporting Information.
Funding: This research was supported by the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007-2013) under the REA grant agreement no 302040. Lilly UK, provided support in the form of salaries for authors DAC and GB, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: We have the following interests. Prof David A. Collier is employed by Lilly UK and Dr Gerome Breen is a consultant with Lilly UK through his work on schizophrenia. There are no patents, products in development or marketed products to declare. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Substance dependence, such as heroin addiction, can have a devastating impact on the lives of the affected individuals, their families and the wider society. According to the World Health Organization (UNODC, World Drug Report 2012; www.who.int), heroin use has increased two to three fold since the 1980s and the ensuing health concerns are severe; the risk of death in drug users is 20 to 30 times greater than in non-drug users, mostly from overdose or acquired infections. Heroin produces strong euphoric effects and addiction can develop rapidly in vulnerable individuals. Dependence on the drug represents a chronic and relapsing condition, characterized by compulsive consumption, craving, tolerance, withdrawal symptoms and negative behavioral effects. Although dependency is frequently a culmination of complex interactions between behavioral, cognitive, and physiological factors, genetic factors can contribute 30–80% of the liability to risk [1,2]. The combination of findings from genetic studies, model organisms and molecular studies in humans has led to the hypothesis of a biological underpinning to heroin dependence, with particular emphasis on the role of the central nervous system . This has inevitably led to an interest in genes encoding molecules in neural systems; these represent biologically plausible candidate genes and those involved in reward-processing, cognition, stress and anxiety have been studied most intensively . As opioid receptors are critical for modulating the euphoric effects of the drug, variants in the genes encoding the opioid receptors have been tested extensively and there is good evidence in support of the A118G polymorphism in the mu opioid receptor gene (OPRM1) [5,6], albeit with mixed results. A recent meta-analysis has conducted a thorough analysis of this polymorphism, by defining uniform phenotypes across a range of addictive substances, including heroin, and reported a modest protective effect of the G allele in European populations . Other neural systems have also been tested, including the dopaminergic, glutamatergic, GABAergic, and serotonergic [8–10] systems. The candidate gene approach relies on a prior hypothesis of functionally plausible genes, yet has yielded few robust or replicated findings, being limited by a number of flaws including a neglect of the false positive rate inherent in genetic studies. One can posit that other biological systems, hitherto untested, may be relevant and indeed provide information related to mechanisms underlying heroin addiction; the genome-wide approach which is agnostic to prior hypotheses offers the opportunity of finding such novel genetic variants.
Genome wide association studies (GWAS) testing heroin addiction [11,12] and opioid dependence [13,14] have produced evidence of the role of novel loci and even replicated evidence from candidate gene studies. In the very first GWAS on heroin addiction, , testing 104 methadone-maintained former heroin addicts of Caucasian ethnicity and 101 matched controls, the authors reported suggestive association with GABRA3 (gamma-aminobutyric acid, receptor subunit alpha 3), a candidate gene for heroin addiction. Despite the small sample size, one marker (rs965972; chr1q31.2) survived correction for multiple testing, however it was not located near any gene. Building on these early findings, the authors expanded the study by increasing the sample size as well as the number of markers tested ; 325 ethnically mixed, methadone-stabilized former heroin addicts were compared to 250 control individuals using a 100K Affymetrix array. Some overlap was observed between the two studies, whereby the top markers were located on chromosome 1q23, albeit about 30kb apart. In the African-American sub-sample (125 cases; 100 controls), the most significant SNP (rs950302) was located in the gene DUSP27 (cytosolic dual specificity phosphatase 27), with point-wise significance (p = 0.0079) for association with heroin addiction vulnerability.
Two recent GWAS are of particular relevance, one because of the larger sample size and the additional analyses for risk pathways , while the second one used exposed and non-exposed controls . Gelernter et al  performed a well-designed study utilizing an initial discovery sample of 5697 individuals, followed by two replication stages: stage 1, N = 4063 participants; stage 2, N = 2549 participants, all satisfying the criteria for opioid dependence. The increase in sample size was sufficient to produce results that were replicated across the different phases and interestingly, that were functionally relevant to the phenotype. Association analyses produced population-specific variants for the analyses using case-control status, symptom count and meta-analyzed results of all three phases. The African-American sample yielded the most significant results with variants in KCNG2 (potassium voltage-gate channel modifier subfamily G member 2). Subsequent pathway analyses captured the role of subthreshold results to reveal two potentially functional pathways, calcium signaling and synaptic long term potentiation. Neurotransmitter signaling plays a key role in drug dependence [3,15]; CAMK2B (calcium/calmodulin dependent protein kinase II beta) was shown to be a hub molecule in pathways relevant to drug addiction , whereas long term potentiation could modulate heroin relapse through the glutamate receptor NR2B (NMDA2b-containing receptor) . The groundbreaking study by Nelson et al  utilized a novel and highly valid design, the comparison of exposed and non-exposed controls with affected individuals. Most genetic studies in addiction tend to use unexposed controls which are useful for assessing dependence on drugs but tend to lead to reduced power when analyzing intermediate or later stages of addiction. Comparison of opioid-dependent subjects with opioid misusers, namely those individuals who had not progressed to dependence, revealed a protective role of variants in the gene CNIH3 (cormichon family AMPA receptor auxiliary protein 3). Identification of protective variants is useful as these could serve as biomarkers to prevent transition from opioid use to dependence, and thus help translational work.
The current study is the first genome wide study to test Han Chinese individuals for association with heroin dependence. The samples were hybridized to the Illumina HumanCoreExome-12v1_A microarray, developed to capture extensive genomic variation including rare single nucleotide variants and insertions/ deletions (indels). We assessed the data for risk variants as well as pathways that might be functionally relevant in heroin dependence. It would have been relevant to test exposed controls but these were not available hence the analysis was limited to dependence. Nonetheless, the results are of value due to the novelty of population tested as well as the results of the post hoc analyses, such as gene-based association tests, and polygenic risk scores.
Materials and Methods
Written, informed consent was obtained from all participants and peripheral blood samples were collected for DNA extraction. The study was conducted in accordance with ethics approval granted by Internal Review Committees of King’s College London, UK (No. 103/02) and West China Hospital, for conducting genetic studies using the Chinese sample of cases and controls. The UK National Health Service (NHS) Research Ethics Review Committee approved the use of the sample for large scale genetic studies.
Our initial sample of 567 individuals was comprised of 398 heroin addicts and 169 controls, with a mean age of 26–31 years, as shown in Table 1. The participants were predominantly northern Han Chinese from Sichuan Province; these are a more homogeneous population than the southern Han. Chinese ethnic minorities were not included in the sample to reduce the likelihood degree of population stratification. The cases were recruited from inpatient clinics attached to two psychiatric hospitals in Chengdu (Southwest China) and were diagnosed with DSM-IV criteria for heroin dependence using a semi-structured clinical interview. This interview included questions on (i) age at first use of heroin and duration of use, (ii) the quantity of drug consumed over this period, (iii) whether the patient had abused other addictive substances such as alcohol, cocaine, cannabis, etc., (iv) co-morbidity of other psychiatric conditions. Those subjects who were abusing other substances or suffered from a major psychiatric illness such as recurrent major depression, schizophrenia or bipolar disorder were not included in the study. To corroborate patient interview, case notes were examined and a family member was also used as an informant. As the abuse of drugs other than heroin is uncommon is Southwest China, it was possible to include most subjects interviewed. However, there were some subjects who admitted to excessive alcohol use and these were excluded. The control sample was recruited from college staff, medical students and acute medical inpatients in a general hospital, none of whom had any neurological or psychiatric disorders or a family history of psychiatric or addictive disorders. The control participants did not undergo a formal psychiatric interview but were asked if they had ever been told by a doctor that they suffered from a mental or neurological illness, or had been prescribed drugs or admitted to hospital for such an illness.
Genotyping and quality control
Genomic DNA was extracted from blood samples using standard phenol-chloroform procedures and was initially quantified using spectrophotometry. Prior to hybridization to the chip, all samples were re-quantified using pico-green fluorimetry and DNA quality was assessed using standard gel electrophoresis techniques; this ensured that only samples of high quality DNA were used. Automated procedures were used to hybridize the cases and controls to the HumanCoreExome-12v1_A Beadchip (Illumina Inc., San Diego CA, USA) and scanned on the Illumina HiScan platform, using standard protocols. This particular chip comprises of 547644 markers, including all the tag markers (264,909 markers) on the HumanCore beadchip, over 240,000 markers of the HumanExome chip as well as several rare variants.
Genotyped data was first assessed in GenomeStudio using the GenCall algorithm (Illumina), however as the program is better at examining common variants, rare variants were then assessed using zCall , a rare-variant caller specially designed for microarrays. In accordance with the analytic pipeline developed in-house , the data from the initial genotype calling was subjected to further stringent quality control in PLINK  and PLINK2 . The resulting dataset taken forward to imputation analysis consisted of 263084 autosomal SNPs, with MAF >5%, call rate >99% and which did not deviate from Hardy-Weinberg with p>1x10-5. A large number of SNPs were eliminated during this QC stage, as they were not polymorphic in the Chinese sample or were too rare; setting the MAF cutoff at 5% enabled us to include more of the common variants. Individuals with call rates of >99%, gender consistent with the heterozygosity of X chromosome SNPs and with genome-wide SNP heterozygosity within 2SD of the sample mean were retained, leaving a total of 504 samples (370 cases and 134 controls) available for genetic analyses.
Following QC, 370 cases were compared with 134 controls for association using logistic regression, adjusting for two ancestry-informative covariates, in PLINK . Power analysis using 400 cases and 150 controls with the online calculator, the Center for Statistical Genetics (CaTS) power calculator program (http://csg.sph.umich.edu//abecasis/cats/), indicates that at a significance level of p = 0.0025, such a sample size has 60% power to detect common alleles with frequency of 0.5. (Figure A(i) in S1 File), and expectedly has reduced power to detect alleles of lower allele frequency (Figure A(ii) in S1 File). To account for multiple testing issues, the standard GWAS significance threshold of α = 5x10-8  was used. Annotation of gene names for the SNPs was conducted using SeattleSeq http://snp.gs.washington.edu/SeattleSeqAnnotation137/) . For those markers not annotated by the software, gene names were searched on the UCSC Genome Browser, build 37/hg19 (http://genome.ucsc.edu/). For the 100 most significantly associated SNPs from logistic regression, flanking regions of 50kb on either side were also searched. Finally, in order to identify independent association signals in our data, we applied the clumping procedure in PLINK2 , taking all SNPs associated with our phenotype with p≤0.001, termed index SNPs, and identifying SNPs that are in LD with the index SNP (r2 > 0.5) within a sliding window of 250kb.
Genotypes were imputed to NCBI build 37 using Phase 1 of the 1000 Genomes reference data and selecting for the Asian population for ethnicity, as implemented on the Minimac server (http://imputationserver.sph.umich.edu/) . Following imputation, duplicate IDs corresponding to triallelic SNPs were removed. In accordance with our imputation pipeline, we removed SNPs with MAF < 0.01, imputation quality R2 < 0.9 and average call rate of <0.95. The imputation analysis produced a post-imputation analytic sample of 4,009,606 SNPs which was subjected to further QC. The imputed data, after removal of the major histocompatibility complex (26–33 Mb on chromosome 6), and pruning, was used to calculate two ancestry-informative covariates, using Multidimensional Scaling. These ancestry-informative covariates were used to adjust for any population structure.
Gene based association testing
We used VEGAS2 (Versatile Gene-based Association Study) , an updated version of VEGAS , to calculate gene-based p-values from the association results. This online software tool uses population-based estimates for linkage disequilibrium (LD) and SNP-based p-values from GWAS, to identify significant deviations from expected p-value distributions within genes, under the null. In VEGAS2, this information is derived using the 1000 Genomes phase 1 data, enabling improvements in the LD estimates and allows analysis of X-chromosome. The analysis provides a statistic for gene-based results that is sensitive to gene length and recombination hotspots, as well as identifying genes in which there are multiple independent signals across cases, each of which individually may not reach genome-wide significance. For our analysis we used the Asian reference data, selecting the Chinese reference panel to estimate LD within genes and calculate gene-based empirical tests of association. Margins of +/- 50kb were set for LD estimates and all chromosomes were included except chromosome Y and mitochondrial genes.
In silico replication
Possible replication of our association results was assessed using summary statistics for 1331 individuals of African-American ancestry and 1814 individuals of European-American ancestry, tested for genome-wide association with opioid dependence by Gelernter and colleagues . The study samples had been ascertained at five different US sites and all subjects had satisfied the DSM-IV criteria of opioid dependence with the Semi-structured Assessment for Drug Dependence and Alcoholism  and provided written, informed consent as described in Gelernter et al . GWAS had been performed on the Illumina HumanOmni1-Quad v1.0 chip. For our replication assessment, we prioritized the top 100 SNPs from our imputed association results and limited our replication to SNPs located within or close to the top genes, CCDC42 and BRSK2. Prior to the assessment, we checked for concordance of marker locations between the two datasets. A total of 49 SNPs were examined for replication, 24 markers in CCDC42 and 25 markers in BRSK2. In addition to the SNP-based replication, we were interested in assessing the association status in our data of candidate genes that had been previously tested in Chinese populations. A literature search of previously published studies provided us with a list of candidate genes, noted in Table C(i) in S2 File.
Functional and pathway assessment
Pathways were extracted from MSigDB v5.2 canonical pathways (CP) and Gene Ontology (GO) datasets; MSigDB  is distinguished for having the largest collection of gene sets, derived from diverse gene set sources. Only pathways containing 10–1000 genes were included, yielding a total of 7111 pathways (1309 CP, 5802 GO) for the analysis. We compiled gene sets in our association data using a 35kb upstream and 10kb downstream window to include gene regulatory regions and MAF ≥ 0.05. The genes were encoded by ENSEMBL identifiers (release 75, genome assembly h19). Pathways were assigned competitive p-values using MAGMA v1.05  which assesses whether a pathway is more associated with a trait than other pathways, and takes into account linkage disequilibrium (LD). The reference data used for LD was the Southern Han Chinese subset (CHS) of 1000 genomes phase III data . The gene and pathway p-values were adjusted using Benjamini-Hochberg FDR procedure  to obtain q-values. In silico tissue specific expression of the top genes from the association and VEGAS2 analyses, was examined using the freely available online database, Genotype-Tissue Expression (GTEx) Portal (http://www.gtexportal.org/).
Polygenic risk scores
Genome-wide association studies in neuropsychiatric disorders tend to produce small effect sizes; even the most significantly associated markers tend to have small effects, with odds ratios (OR) in the range of 0.8–1.2. These small effect sizes, and the resultant lack of power, mean that the majority of disease associated SNPs fall below genome-wide significance; markers rejected by GWAS can be combined into quantitative scores to examine the combined effects of the variants [32,33]. We calculated polygenic risk scores (PRS) using PRSice . PRSice calculates the best-fit PRS across 10,000 thresholds (from PT = 0.0001 to PT = 0.5 by increments of 0.00005) by regressing phenotype on score and two ancestry informative covariates. To calculate PRS in this study, we used publicly available GWAS results as base datasets; these are the five most recent analyses by the Psychiatric Genomics Consortium (PGC)—schizophrenia, depression, ADHD, autism and bipolar disorder [35–39]–and four phenotypes from the Tobacco And Genetics Consortium—ever smoked, quantities of cigarettes smoked, former smoker and age at starting smoking .
Following quality control and imputation, data was available on 4,009,606 markers genome-wide, in 504 individuals. We performed association testing between 370 cases and 134 controls, adjusting for population structure using two ancestry-informative dimensions generated using multidimensional scaling. We found no evidence for residual population structure after using these covariates, as shown in the QQ plot in Fig 1 (GWAS λ median = 1.026). In this study, the most significant markers demonstrated p-values of 10−7 (Table 2). Association testing identified suggestive association between heroin dependence and several markers located within genes that may be of relevance to heroin dependence; the 25 most associated independent markers are listed in Table 2. Among these were several markers located on chromosome 17p13.1; indeed the three most associated genotyped SNPS were all located either in or near the gene CCDC42 (coiled coil domain containing 42) on chromosome 17 (see Table A in S2 File for details on the 100 most associated genotyped SNPs). Many of the top markers are located within or close to the genes CCDC42, BRSK2 (BR serine/threonine kinase 2), ZNF546 (zinc finger protein 546), CHIT1 (chitinase 1) and PPP1R12B (protein phosphatase 1, regulatory subunit 12B) and NEK1 (NIMA-related kinase 1).
Genes are mapped based on chromosome and base pair position, using the program SeattleSeq.
We identified three distinct clusters of associated SNPs, on chromosomes 17, 11 and 8 (Fig 2). The two smaller clusters of markers on chromosome 11p15.5 and chromosome 8q21.12 yielded suggestive association with p < 10−6 and 19 markers with p < 10−7 were located either within or close to CCDC42 on chromosomes 17 (Table A in S2 File). The regions on chromosomes 8 and 11 contained a density of markers with p≤3.5 x 10−5 (Table A in S2 File). The genes PXDNL (peroxidasin-like), PCMTD1 (protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 1) and ARHGEF10 (Rho guanine nucleotide exchange factor (GEF) 10) all localized to the region on chromosome 8. The markers on chromosome 11 predominantly localized to the gene BRSK2.
Data for chromosomes X and Y was removed prior to imputation. Analysis highlighted three main chromosomal regions of association. The blue line denotes a p-value cutoff of 10−5 (suggestive significance) and the red line is at p-value = 5x10-8 (genome-wide significance).
Gene based association testing
Applying VEGAS2  to the results of our genome-wide association results identified two genes CCDC42 and SPDYE4 meeting the significance threshold of α = 2.85x10-6 (Table 3), as suggested by Liu et al . This threshold is likely to be overly conservative, meaning that the data provides good evidence for an association between these genes and heroin dependence; it should be noted that the two genes are located close to each other and may represent a single association signal. The VEGAS2 output represents non-significant regions from GWAS attaining significance in gene-based analysis; this is consistent with multiple relatively independent causal loci within the same gene. This is supported by multiple relatively independent signals in CCDC42 after clumping.
In silico replication
Assessment of replication using summary statistics demonstrated no evidence of replication; results are presented in the supplementary data, Table B(i) and (ii) in S2 File. We present comparison of the two genes CCDC42 and BRSK2 with the data for the two different populations, African-American and European-American. In general, the replication samples demonstrated p-values > 0.1 compared to p-values <10−5 in our dataset, for the selected markers. When examining the association status of candidate genes, we did not observe association with variants in the selected candidate genes. The results are presented in the supplementary data; in Table C(i) in S2 File we present the candidate gene list compiled using previously published studies and Table C(ii) in S2 File shows results in our study, highlighting p-values of the most associated SNPs in these genes.
Functional pathway analysis using MSigDB yielded top pathways related to regulation of vacuolar transport, regulation of skeletal muscle contraction and cellular processes relevant for cell division, however none of the pathways demonstrated an empirically significant result. Table D in S2 File in the supplementary data lists the top ten pathways, with the empirical P-value for the pathway. We examined the expression patterns of some of the top genes, using GTEx Portal (www.gtexportal.org). The gene CCDC42 is the top gene in the association analysis as well as in the gene-based analysis; expression patterns show that it is primarily expressed in the testes with no observable expression levels in any other tissue. The gene BRSK2 is expressed in several parts of the brain, with expression levels being highest in cerebellum tissues, followed by somewhat lower expression in the hippocampus and the hypothalamus. According to the database, there is some expression in the pancreas and the pituitary. Finally, ARHGEF10 shows expression mainly in tissues derived from the tibial nerve and the lung with smaller amounts of expression in a range of other tissues, including the brain.
Polygenic risk scoring
We used PRSice to calculate polygenic risk scores for liability to a number of psychiatric traits, and evaluated their ability to predict heroin addiction (Table 4). The base data included GWAS for smoking and related behaviors and datasets for diseases known to be comorbid with addiction, for example, schizophrenia and ADHD. We found significant evidence that genetic risk of schizophrenia could predict heroin dependence, (p = 0.0007) at PT = 0.0085. This is likely in part explained by the relatively higher power of the schizophrenia GWAS, compared to the other psychiatric traits investigated, leading to improved identification of any shared genetic factors between heroin dependence and any generalized liability to psychiatric disorders . We did not find evidence for a genetic overlap between heroin dependence and any of the other GWAS traits investigated here. The data from the Tobacco Addiction Genetics (TAG) Consortium may be the most relevant for the phenotype of heroin addiction; in a previous study , PRS for cigarettes per day had predicted the number of glasses of alcohol consumed per day and age at onset of smoking predicted age at regular drinking. In the current study, no smoking phenotypes studied (ever smoked, cigarettes per day, former smoker, age of onset for smoking) predicted heroin addiction, when using the suggested significance threshold of α = 0.001; instead, there was a suggestive relationship with ‘ever smoked’ and age at first cigarette. The proportion of variance in heroin use explained by polygenic risk score was very low or moderately low (Nagelkerke’s R2 = 0.001–0.06) for all the phenotypes tested (Figure B(i)-(iv) in S1 File).
Our study was aimed at conducting a genome-wide association study (GWAS) to identify genetic variants that may play a role in heroin dependence and, with the aid of bioinformatics tools, to assess their expression patterns as well as functional pathways involved. The GWAS was performed using an ethnically homogeneous sample of Han Chinese origin; examination for presence of possible population substructure confirmed homogeneity of our sample of cases and controls. Case-control association analysis with 370 cases and 134 control individuals did not identify any genetic markers reaching genome-wide significance, as may be expected considering the relatively small sample size. Nonetheless, our gene based association analysis with VEGAS2, which involves fewer tests, also supported the putative role of CCDC42, consistent with the SNP based genome-wide analysis. Following imputation, three distinct regions were highlighted, on chromosomes 17, 11 and 8; the four most associated genes in these regions were CCDC42, BRSK2, CHIT1 and ARHGEF10. The top ranked gene was CCDC42 but little is known about the functional role of CCDC42, other than that it interacts with the transcriptional repressor ZBTB1 (zinc finger and BTB domain containing 1), which in turn is functionally important for chromatin remodeling  (www.ncbi.nlm.nih.gov); this relationship suggests that CCDC42 might be involved in epigenetic processes. A more intriguing functional relationship would be its involvement in immunological processes, as suggested by the association of CCDC42 with Behcet’s disease . Characterized by recurrent inflammatory attacks, Behcet’s disease affects orogenital mucosa, eyes, joints as well as the nervous system and the gastrointestinal tract . A number of immunoregulatory pathways have implicated in Behcet’s disease; if CCDC42 belonged to any of these pathways, it might explain its role in both Behcet’s disease and heroin addiction, as heroin use is often associated with increased presence of infectious diseases. Our gene-based analysis also yielded significant results with CCDC42 as well as the neighboring gene, SPDYE4 (Speedy/RINGO cell cycle regulator family member E4) of which little is known in terms of function.
The second of the top genes, BRSK2 is one of a pair of kinases (BRSK1/2) highly expressed in the mammalian brain , in particular the cerebellum, hippocampus and hypothalamus tissues (www.gtexportal.org). The enzyme encoded by BRSK2 localizes to presynaptic sites and modulates structural and functional maturation of synapses; indeed functional studies in mice indicate that the serine/threonine kinases (also known as SAD kinases) are critical for specification of axons and axonal development, and for playing a critical role in cell proliferation, differentiation and cell death . The absence of SAD kinases does not prevent axon formation but does compromise maturation of axon terminals and mutations in SAD orthologs led to presynaptic defects in C. elegans  and Drosophila . Gene ontology processes associated with BRSK2 include actin cytoskeleton reorganization, apoptotic signaling pathways and establishment of cell polarity (www.uniprot.org).
Another one of the top genes in our study, ARHGEF10, is similarly involved in several cellular and actin cytoskeleton processes. As a member of the large family of rho guanine-nucleotide-exchange factors (GEFs), ARHGEF10 acts as a molecular switch in the regulation of signal transduction pathways , binding to G-protein coupled receptors to stimulate downstream binding with protein kinases to affect cell signaling and extracellular stimuli processed through Rho proteins to modulate the intracellular actin cytoskeleton and subsequently intracellular processing [51,52]. Genetic studies in dogs suggest its role in neuropathies; a mutation in the gene resulted in a severe form of juvenile-onset polyneuropathy, which bears clinical similarity to the group of neuropathies termed Charcot-Marie-Tooth disease in humans . Though expressed in multiple tissues, it has higher expression in the spinal cord and dorsal root ganglion . Another gene showing nominal association (p≤0.001), RASGRP2 (RAS guanyl releasing protein 2), encodes a brain-enriched nucleotide that contains an N-terminal GEF domain and may also play a role in cell signaling. The emerging functional themes related to the top genes in our data appear to be cellular processing and actin cytoskeleton restructuring; these are also identified in molecular studies investigating alcohol and cocaine addiction  and are therefore consistent with molecular models of addiction. It is already known that chronic and dependent drug consumption is correlated with structural plasticity in relevant neural circuits  and that such experience-dependent plasticity is primarily driven by changes in the shape and number of dendrites and dendritic spines [57–60]. Chronic administration of opioids can lead to reduction in the spine density . In addition to the drug-induced plasticity, regulation of the cytoskeleton alterations may be influenced by mutations in the genes encoding the cytoskeleton proteins .
When looking at the individual SNPs, it is likely that we did not find loci associated with heroin dependence at genome-wide significance due to our relatively small sample size. Whether the genes identified in our study can be considered to have any pathophysiological role in heroin abuse depends on replication of these results and future functional studies. Our limited replication assessment using summary data from a large GWAS in opioid dependence showed no evidence of replication at the SNP level. Similarly, when we looked at some of the common candidate genes [62–73], our results did not show significant association with any of these genes (Table C(i) and (ii) in S2 File). Nonetheless, among the SNP-based and gene-based results, there are individual genes implicated in addiction. ARHGEF10 has been discussed above, and GRIN3B (glutamate receptor, ionotropic, N-methyl-D-aspartate 3B) has been shown to be differentially expressed in heroin addicts compared to controls and methadone-maintained addicts , and was suggestively associated in gene-based analysis (p = 0.003; top SNP p = 0.6.8−5).
The strength of this study lies in the homogeneous and novel population sample, and one that is not confounded by addiction to other drugs of abuse. We have identified suggestive association with a number of functionally plausible genes that might play a role in heroin dependence. Although the genes did not reach genome-wide significance, their putative role in addiction has been implicated through a range of post hoc analyses, such as our analysis of polygenic risk scores which revealed suggestive evidence that smoking-related behaviours may be modestly predictive of heroin addiction. Specifically, we find suggestive association with age at first cigarette, and ‘ever smoked’, which may indicate an overlapping susceptibility to novelty-seeking behavior; this may be confirmed in the future using higher-powered GWAS. Other genetic studies using Han Chinese subjects have tested similar sample sizes, of about 300 individuals . Collecting large scale samples is difficult for substance use disorders such as heroin addiction; nonetheless, in order to achieve robust results, whether at the SNP level, gene or pathway level, a substantially larger sample size, in the range of tens of thousands of individuals, is required. Acknowledging this need, the Psychiatric Genetics Consortium (PGC) for Substance Use Disorders has been set up and data from the current study has been offered to the PGC-SUD, which is collecting data from other GWAS studies on substance use. The large number of samples will achieve the level of statistical power required and will provide much needed information on the etiological basis of opioid dependence. In conclusion, we have presented the results of the first genome wide study of heroin dependence in a Chinese sample, identifying novel associations between addiction and variation within genes that will be tested for replication in a large collaborative effort.
S1 File. Output from the power analysis calculator, CaTS (Center for Statistical Genetics), estimating power for 370 cases and 170 controls at significance level of 0.0025 and selected minor allele frequency of (i) 0.5 and (ii) at 0.1.
The y-axis shows statistical power for a range of sample s (x-axis) (Figure A). Polygenic risk scores, using data on smoking behaviors obtained from the GWAS by the Tobacco and Genetics Consortium, show suggestive prediction for heroin addiction (Figure B).
S2 File. Results of the genome-wide association with heroin dependence showing the top 100 SNPs, following imputation with the 1000 genomes phase 1 dataset, selected for the Asian population.
The imputation produced 4M SNPs with top results on chromosomes 17, 11 and 8 (Table A). Results of in silico replication of results for markers in BRSK2 using summary data from a GWAS in opioid dependence in two different populations, African American and European American. (i). In silico replication of results for markers in and around the top gene, CCDC42, using summary data from a GWAS in opioid dependence in African American and European American populations (ii) (Table B). Table showing common candidate genes previously tested in Chinese samples. The table shows the top result in these published studies and the type of polymorphism yielding the result. This list was used to assess gene-level replication in our data (i). Gene-based replication using the list of genes compiled from previous published reports, as shown in Table B(i) (above) and assessing their association status in the results from the VEGAS2 analysis performed in the current study (ii) (Table C). Top ten results of the pathway analysis using the MSigDB database. The p-value calculated for each pathway was adjusted using the Benjamini and Hochberg method to yield an empirical value. The number of genes in each pathway is noted (Table D).
The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007-2013) under the REA grant agreement n° 302040.
We are grateful to all the participating subjects and to the staff who contributed to this work.
We express gratitude to Dr Joel Gelernter and Dr Richard Sherva for providing the use of summary statistics of their published study on opioid dependence, to enable us to perform in silico replication of our results.
- Conceptualization: GK GB.
- Data curation: GK GB.
- Formal analysis: GK JE JRIC FA SJN GB.
- Funding acquisition: GK DAC FD.
- Investigation: GK.
- Methodology: GK DAC GB.
- Project administration: GK PA.
- Resources: DAC XL XM YW TL.
- Software: JE JRIC GB.
- Supervision: PA GB.
- Validation: GK GB.
- Visualization: GK.
- Writing – original draft: GK.
- Writing – review & editing: GK GB.
- 1. Kendler KS, Jacobson KC, Prescott CA, Neale MC (2003) Specificity of genetic and environmental risk factors for use and abuse/dependence of cannabis, cocaine, hallucinogens, sedatives, stimulants, and opiates in male twins. Am J Psychiatry 160: 687–695. pmid:12668357
- 2. Tsuang MT, Lyons MJ, Eisen SA, Goldberg J, True W, et al. (1996) Genetic influences on DSM-III-R drug abuse and dependence: a study of 3,372 twin pairs. Am J Med Genet 67: 473–477. pmid:8886164
- 3. Nestler EJ, Aghajanian GK (1997) Molecular and cellular basis of addiction. Science 278: 58–63. pmid:9311927
- 4. Levran O, Yuferov V, Kreek MJ (2012) The genetics of the opioid system and specific drug addictions. Hum Genet 131: 823–842. pmid:22547174
- 5. LaForge KS, Yuferov V, Kreek MJ (2000) Opioid receptor and peptide gene polymorphisms: potential implications for addictions. Eur J Pharmacol 410: 249–268. pmid:11134674
- 6. Lotsch J, Geisslinger G (2005) Are mu-opioid receptor polymorphisms important for clinical opioid therapy? Trends Mol Med 11: 82–89. pmid:15694871
- 7. Schwantes-An TH, Zhang J, Chen LS, Hartz SM, Culverhouse RC, et al. (2016) Association of the OPRM1 Variant rs1799971 (A118G) with Non-Specific Liability to Substance Dependence in a Collaborative de novo Meta-Analysis of European-Ancestry Cohorts. Behav Genet 46: 151–169. pmid:26392368
- 8. Kreek MJ, Bart G, Lilly C, LaForge KS, Nielsen DA (2005) Pharmacogenetics and human molecular genetics of opiate and cocaine addictions and their treatments. Pharmacol Rev 57: 1–26. pmid:15734726
- 9. Levran O, Londono D, O'Hara K, Nielsen DA, Peles E, et al. (2008) Genetic susceptibility to heroin addiction: a candidate gene association study. Genes Brain Behav 7: 720–729. pmid:18518925
- 10. Yuferov V, Levran O, Proudnikov D, Nielsen DA, Kreek MJ (2010) Search for genetic markers and functional variants involved in the development of opiate and cocaine addiction and treatment. Ann N Y Acad Sci 1187: 184–207. pmid:20201854
- 11. Nielsen DA, Ji F, Yuferov V, Ho A, Chen A, et al. (2008) Genotype patterns that contribute to increased risk for or protection from developing heroin addiction. Mol Psychiatry 13: 417–428. pmid:18195715
- 12. Nielsen DA, Ji F, Yuferov V, Ho A, He C, et al. (2010) Genome-wide association study identifies genes that may contribute to risk for developing heroin addiction. Psychiatr Genet 20: 207–214. pmid:20520587
- 13. Gelernter J, Kranzler HR, Sherva R, Koesterer R, Almasy L, et al. (2014) Genome-wide association study of opioid dependence: multiple associations mapped to calcium and potassium pathways. Biol Psychiatry 76: 66–74. pmid:24143882
- 14. Nelson EC, Agrawal A, Heath AC, Bogdan R, Sherva R, et al. (2015) Evidence of CNIH3 involvement in opioid dependence. Mol Psychiatry.
- 15. Hyman SE, Malenka RC, Nestler EJ (2006) Neural mechanisms of addiction: the role of reward-related learning and memory. Annu Rev Neurosci 29: 565–598. pmid:16776597
- 16. Li CY, Mao X, Wei L (2008) Genes and (common) pathways underlying drug addiction. PLoS Comput Biol 4: e2. pmid:18179280
- 17. Shen H, Moussawi K, Zhou W, Toda S, Kalivas PW (2011) Heroin relapse requires long-term potentiation-like plasticity mediated by NMDA2b-containing receptors. Proc Natl Acad Sci U S A 108: 19407–19412. pmid:22084102
- 18. Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, et al. (2012) zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Bioinformatics 28: 2543–2545. pmid:22843986
- 19. Coleman JR, Euesden J, Patel H, Folarin AA, Newhouse S, et al. (2015) Quality control, imputation and analysis of genome-wide genotyping data from the Illumina HumanCoreExome microarray. Brief Funct Genomics.
- 20. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. pmid:17701901
- 21. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, et al. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4: 7. pmid:25722852
- 22. Dudbridge F, Gusnanto A (2008) Estimation of significance thresholds for genomewide association scans. Genet Epidemiol 32: 227–234. pmid:18300295
- 23. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, et al. (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461: 272–276. pmid:19684571
- 24. Fuchsberger C, Abecasis GR, Hinds DA (2015) minimac2: faster genotype imputation. Bioinformatics 31: 782–784. pmid:25338720
- 25. Mishra A, Macgregor S (2015) VEGAS2: Software for More Flexible Gene-Based Testing. Twin Res Hum Genet 18: 86–91. pmid:25518859
- 26. Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, et al. (2010) A versatile gene-based test for genome-wide association studies. Am J Hum Genet 87: 139–145. pmid:20598278
- 27. Pierucci-Lagha A, Gelernter J, Feinn R, Cubells JF, Pearson D, et al. (2005) Diagnostic reliability of the Semi-structured Assessment for Drug Dependence and Alcoholism (SSADDA). Drug Alcohol Depend 80: 303–312. pmid:15896927
- 28. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, et al. (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27: 1739–1740. pmid:21546393
- 29. de Leeuw CA, Mooij JM, Heskes T, Posthuma D (2015) MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol 11: e1004219. pmid:25885710
- 30. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, et al. (2015) A global reference for human genetic variation. Nature 526: 68–74. pmid:26432245
- 31. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate—a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological 57: 289–300.
- 32. Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genet 9: e1003348. pmid:23555274
- 33. International Schizophrenia C, Purcell SM, Wray NR, Stone JL, Visscher PM, et al. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460: 748–752. pmid:19571811
- 34. Euesden J, Lewis CM, O'Reilly PF (2014) PRSice: Polygenic Risk Score software. Bioinformatics.
- 35. Cross-Disorder Group of the Psychiatric Genomics C (2013) Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet 381: 1371–1379. pmid:23453885
- 36. Major Depressive Disorder Working Group of the Psychiatric GC, Ripke S, Wray NR, Lewis CM, Hamilton SP, et al. (2013) A mega-analysis of genome-wide association studies for major depressive disorder. Mol Psychiatry 18: 497–511. pmid:22472876
- 37. Neale BM, Medland S, Ripke S, Anney RJ, Asherson P, et al. (2010) Case-control genome-wide association study of attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry 49: 906–920. pmid:20732627
- 38. Psychiatric GCBDWG (2011) Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet 43: 977–983. pmid:21926972
- 39. Ripke S, O'Dushlaine C, Chambert K, Moran JL, Kahler AK, et al. (2013) Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet 45: 1150–1159. pmid:23974872
- 40. Thorgeirsson TE, Gudbjartsson DF, Surakka I, Vink JM, Amin N, et al. (2010) Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior. Nat Genet 42: 448–453. pmid:20418888
- 41. Caspi A, Houts RM, Belsky DW, Goldman-Mellor SJ, Harrington H, et al. (2014) The p Factor: One General Psychopathology Factor in the Structure of Psychiatric Disorders? Clin Psychol Sci 2: 119–137. pmid:25360393
- 42. Vink JM, Hottenga JJ, de Geus EJ, Willemsen G, Neale MC, et al. (2014) Polygenic risk scores for smoking: predictors for alcohol and cannabis use? Addiction 109: 1141–1151. pmid:24450588
- 43. Kim H, Dejsuphong D, Adelmant G, Ceccaldi R, Yang K, et al. (2014) Transcriptional repressor ZBTB1 promotes chromatin remodeling and translesion DNA synthesis. Mol Cell 54: 107–118. pmid:24657165
- 44. Remmers EF, Cosan F, Kirino Y, Ombrello MJ, Abaci N, et al. (2010) Genome-wide association study identifies variants in the MHC class I, IL10, and IL23R-IL12RB2 regions associated with Behcet's disease. Nat Genet 42: 698–702. pmid:20622878
- 45. (1990) Criteria for diagnosis of Behcet's disease. International Study Group for Behcet's Disease. Lancet 335: 1078–1080. pmid:1970380
- 46. Inoue E, Mochida S, Takagi H, Higa S, Deguchi-Tawarada M, et al. (2006) SAD: a presynaptic kinase associated with synaptic vesicles and the active zone cytomatrix that regulates neurotransmitter release. Neuron 50: 261–275. pmid:16630837
- 47. Lilley BN, Krishnaswamy A, Wang Z, Kishi M, Frank E, et al. (2014) SAD kinases control the maturation of nerve terminals in the mammalian peripheral and central nervous systems. Proc Natl Acad Sci U S A 111: 1138–1143. pmid:24395778
- 48. Crump JG, Zhen M, Jin Y, Bargmann CI (2001) The SAD-1 kinase regulates presynaptic vesicle clustering and axon termination. Neuron 29: 115–129. pmid:11182085
- 49. Baas S, Sharrow M, Kotu V, Middleton M, Nguyen K, et al. (2011) Sugar-free frosting, a homolog of SAD kinase, drives neural-specific glycan expression in the Drosophila embryo. Development 138: 553–563. pmid:21205799
- 50. Etienne-Manneville S, Hall A (2002) Rho GTPases in cell biology. Nature 420: 629–635. pmid:12478284
- 51. Hall A (1998) Rho GTPases and the actin cytoskeleton. Science 279: 509–514. pmid:9438836
- 52. Sit ST, Manser E (2011) Rho GTPases and their role in organizing the actin cytoskeleton. J Cell Sci 124: 679–683. pmid:21321325
- 53. Ekenstedt KJ, Becker D, Minor KM, Shelton GD, Patterson EE, et al. (2014) An ARHGEF10 deletion is highly associated with a juvenile-onset inherited polyneuropathy in Leonberger and Saint Bernard dogs. PLoS Genet 10: e1004635. pmid:25275565
- 54. Verhoeven K, De Jonghe P, Van de Putte T, Nelis E, Zwijsen A, et al. (2003) Slowed conduction and thin myelination of peripheral nerves associated with mutant rho Guanine-nucleotide exchange factor 10. Am J Hum Genet 73: 926–932. pmid:14508709
- 55. Rothenfluh A, Cowan CW (2013) Emerging roles of actin cytoskeleton regulating enzymes in drug addiction: actin or reactin'? Curr Opin Neurobiol 23: 507–512. pmid:23428655
- 56. Robinson TE, Kolb B (2004) Structural plasticity associated with exposure to drugs of abuse. Neuropharmacology 47 Suppl 1: 33–46.
- 57. Harris KM, Kater SB (1994) Dendritic spines: cellular specializations imparting both stability and flexibility to synaptic function. Annu Rev Neurosci 17: 341–371. pmid:8210179
- 58. Kasai H, Matsuzaki M, Noguchi J, Yasumatsu N, Nakahara H (2003) Structure-stability-function relationships of dendritic spines. Trends Neurosci 26: 360–368. pmid:12850432
- 59. Lamprecht R, LeDoux J (2004) Structural plasticity and memory. Nat Rev Neurosci 5: 45–54. pmid:14708003
- 60. Nimchinsky EA, Sabatini BL, Svoboda K (2002) Structure and function of dendritic spines. Annu Rev Physiol 64: 313–353. pmid:11826272
- 61. Robinson TE, Gorny G, Savage VR, Kolb B (2002) Widespread but regionally specific effects of experimenter- versus self-administered morphine on dendritic spines in the nucleus accumbens, hippocampus, and neocortex of adult rats. Synapse 46: 271–279. pmid:12373743
- 62. Gao F, Zhu YS, Wei SG, Li SB, Lai JH (2011) Polymorphism G861C of 5-HT receptor subtype 1B is associated with heroin dependence in Han Chinese. Biochem Biophys Res Commun 412: 450–453. pmid:21839728
- 63. Hou QF, Li SB (2009) Potential association of DRD2 and DAT1 genetic variation with heroin dependence. Neurosci Lett 464: 127–130. pmid:19664686
- 64. Li T, Liu X, Zhao J, Hu X, Ball DM, et al. (2002) Allelic association analysis of the dopamine D2, D3, 5-HT2A, and GABA(A)gamma2 receptors and serotonin transporter genes with heroin abuse in Chinese subjects. Am J Med Genet 114: 329–335. pmid:11920858
- 65. Li T, Liu X, Zhu ZH, Zhao J, Hu X, et al. (2000) Association analysis of polymorphisms in the mu opioid gene and heroin abuse in Chinese subjects. Addict Biol 5: 181–186. pmid:20575833
- 66. Li T, Xu K, Deng H, Cai G, Liu J, et al. (1997) Association analysis of the dopamine D4 gene exon III VNTR and heroin abuse in Chinese subjects. Mol Psychiatry 2: 413–416. pmid:9322237
- 67. Li T, Zhu ZH, Liu X, Hu X, Zhao J, et al. (2000) Association analysis of polymorphisms in the DRD4 gene and heroin abuse in Chinese subjects. Am J Med Genet 96: 616–621. pmid:11054768
- 68. Loh EW, Tang NL, Lee DT, Liu SI, Stadlin A (2007) Association analysis of GABA receptor subunit genes on 5q33 with heroin dependence in a Chinese male population. Am J Med Genet B Neuropsychiatr Genet 144B: 439–443. pmid:17440936
- 69. Shao C, Li Y, Jiang K, Zhang D, Xu Y, et al. (2006) Dopamine D4 receptor polymorphism modulates cue-elicited heroin craving in Chinese. Psychopharmacology (Berl) 186: 185–190.
- 70. Shi J, Hui L, Xu Y, Wang F, Huang W, et al. (2002) Sequence variations in the mu-opioid receptor gene (OPRM1) associated with human addiction to heroin. Hum Mutat 19: 459–460.
- 71. Szeto CY, Tang NL, Lee DT, Stadlin A (2001) Association between mu opioid receptor gene polymorphisms and Chinese heroin addicts. Neuroreport 12: 1103–1106. pmid:11338173
- 72. Xu K, Lichtermann D, Lipsky RH, Franke P, Liu X, et al. (2004) Association of specific haplotypes of D2 dopamine receptor gene with vulnerability to heroin dependence in 2 distinct populations. Arch Gen Psychiatry 61: 597–606. pmid:15184239
- 73. Xu K, Liu XH, Nagarajan S, Gu XY, Goldman D (2002) Relationship of the delta-opioid receptor gene to heroin abuse in a large Chinese case/control sample. Am J Med Genet 110: 45–50. pmid:12116270
- 74. Wang TY, Lee SY, Chen SL, Chang YH, Chen SH, et al. (2013) The ADH1B and DRD2 gene polymorphism may modify the protective effect of the ALDH2 gene against heroin dependence. Prog Neuropsychopharmacol Biol Psychiatry 43: 134–139. pmid:23266708