Pathways and Networks-Based Analysis of Candidate Genes Associated with Nicotine Addiction

Nicotine is the addictive substance in tobacco and it has a broad impact on both the central and peripheral nervous systems. Over the past decades, an increasing number of genes potentially involved in nicotine addiction have been identified by different technical approaches. However, the molecular mechanisms underlying nicotine addiction remain largely unclear. Under such situation, a comprehensive analysis focusing on the overall functional characteristics of these genes, as well as how they interact with each other will provide us valuable information to understand nicotine addiction. In this study, we presented a systematic analysis on nicotine addiction-related genes to identify the major underlying biological themes. Functional analysis revealed that biological processes and biochemical pathways related to neurodevelopment, immune system and metabolism were significantly enriched in the nicotine addiction-related genes. By extracting the nicotine addiction-specific subnetwork, a number of novel genes associated with addiction were identified. Moreover, we constructed a schematic molecular network for nicotine addiction via integrating the pathways and network, providing an intuitional view to understand the development of nicotine addiction. Pathway and network analysis indicated that the biological processes related to nicotine addiction were complex. Results from our work may have important implications for understanding the molecular mechanism underlying nicotine addiction.


Introduction
Cigarette smoking is a worldwide epidemic, and one of the major preventable causes of morbidity and mortality [1][2]. Although there are some effective control policies and interventions, the negative effect of tobacco abuse on public health and social economy is still astonishing, highlighting the need for continuing efforts. According to World Health Organization (WHO), currently there are about 1.3 billion smokers worldwide, most of whom come from the low-or middle-income countries; and it is estimated that more than 5 million smokers die from smoking-related diseases every year [3][4]. If effective measures are not adopted, by 2020, smoking will become the biggest health problem worldwide, and the number of deaths caused by smoking will reach 10 million per year [5]. Besides the health problems, smoking also causes heavy economic burden on society. According to the Centers for Disease Control and Prevention (CDC), in USA alone, the economic burden made by smoking to society, including both the direct health care expenditures and the loss of productivity, can be as high as $193 billion a year [6]. Therefore, developing effective approaches and drugs for the treatment and prevention of smoking are of huge challenge in public health.
Nicotine, as the primary psychoactive component of tobacco smoke, binds to neuronal nicotinic acetylcholine receptors (nAChRs), a family of ligand-gated ion channels [7], facilitating various neurotransmitter release such as dopamine, glutamate, serotonin and γ-aminobutyric acid (GABA) [8][9][10] and thereby producing a number of neurophysiological and behavioral effects. Emerging evidence suggests that repeated exposure to nicotine can alter the level or types of genes expressed in multiple brain regions and such alteration ultimately mediates the functions of the related neurons and neural circuits. Numerous studies aiming to discover genetic variants or candidate genes, such as genome-wide association studies, genome-wide linkage scan, gene expression and candidate gene association studies, have found a large number of promising genes and chromosomal regions involved in the etiology of nicotine addiction [11][12][13]. Moreover, various neural pathways and transmitter systems have emerged as compelling candidates for the processing of addictive properties of nicotine, which provide a valuable resource to unravel the molecular mechanism underlying nicotine addiction. Through its interaction directly or indirectly with these genes and biological pathways, nicotine evokes multiple effects in the central nervous system.
During the past decade, rapid advances in high-throughput technologies have brought unprecedented opportunities for the large-scale analysis of the nicotine addiction-related genes/ proteins, leading to a rapid generation of large-scale nicotine addiction-related data. These datasets are often heterogeneous and multi-dimensional, which makes integrating and arranging such datasets to ascertain the key molecular mechanisms and to transform the data into meaningful biological phenomenon a major task and challenge. To meet the demand, pathway and network-based analyses have become an important and powerful approach to elucidate the biological implications underlying complex diseases [14][15][16]. Such a systems biology approach could be pivotal for better understanding of mental disorder at the molecular level [17]. Thus, a comprehensive analysis of the nicotine addiction-related candidate genes within a systematic framework may provide us important insights on the molecular mechanisms underlying nicotine addiction.
In this study, we performed a systematic analysis on genes potentially involved in nicotine addiction by identifying the enriched functional categories and pathways, as well as examining the crosstalk among the significantly enriched pathways. Then, we extracted a nicotine addiction-specific network and constructed a molecular network of nicotine addiction.

Data sources
In this study, the candidate genes for nicotine addiction included 220 genes prioritized via a multi-source-based gene approach [18]. Briefly, genes identified to be related to nicotine addiction or involved in the physiological response to nicotine exposure or smoking behaviors were collected from different sources, including genetic association analysis, genetic linkage analysis, high throughput gene/protein expression analysis and/or literature search of single gene/protein-based studies. Based on these resources, the 11,781 genes collected were scored and a weight value was determined for each category. The overall relation between a gene and nicotine addiction was measured by a combined score derived from its scores in the four categories. Then, the genes were ranked according to the combined scores with a larger score value indicating a potentially higher correlation between the gene and nicotine addiction. Based on the distribution of the combined score of all the genes, 220 genes on the top of the list (i.e., with the largest combined scores) were selected as the prioritized nicotine addiction-related genes (NAGenes).
The human protein-protein interaction (PPI) data were downloaded from the Protein Interaction Network Analysis (PINA) platform (May 21, 2014) [19], which integrated data from six major public PPI databases, namely IntAct, BioGRID, MINT, DIP, HPRD, and MIPS/MPact. Also, we downloaded the related annotation files from NCBI (ftp://ftp.ncbi.nlm.nih.gov/gene/) (May 24, 2014), including the Entrez gene information database of human (Homo_sapiens. gene_info.gz), the dataset specifying relationship between pairs of NCBI and UniProtKB protein accessions (gene_refseq_uniprotkb_collab.dz), and file containing mappings of Entrez Gene records to Entrez RefSeq Nucleotide sequence records (gene2refseq.gz). Then the human PPI data were mapped to NCBI human protein-coding genes and the unmapped proteins were discarded. After removing self-interactions and redundant interacting pairs, a final human PPI network containing 15,093 nodes and 161,419 edges was obtained.

Functional enrichment analysis
To examine the functional features of NAGenes, WebGestalt [20] and Ingenuity Pathway Analysis system (IPA; https://analysis.ingenuity.com) were applied for functional enrichment analysis, including Gene Ontology (GO) term analysis and pathway analysis. WebGestalt is a web-based integrated data mining system to evaluate the significance of GO terms enrichment in the candidate genes. IPA is designed to identify global canonical pathways from a given list of genes. Basically, the genes with their symbol and/or corresponding GenBank Accession Numbers are uploaded into the IPA and compared with the genes included in each canonical pathway. All the pathways with one or more genes overlapping the candidate genes are extracted, with each of them assigned a p value to denote the probability of overlap between the pathway and the input genes via Fisher's exact test. Then, the corresponding multiple testing correction p-value is calculated with the method of Benjamini and Hockberg, namely P BHvalue [21].

Pathway crosstalk
We further performed pathway crosstalk analysis to explore the interactions among significantly enriched pathways. To describe the overlap between any given pair of pathways, we introduced two measurements, i.e., the Jaccard Coefficient ðJCÞ ¼ j A\B A[B j and the Overlap Coefficient ðOCÞ ¼ jA\Bj minðjAj;jBjÞ , where A and B are the lists of genes included in the two tested pathways. To construct the pathway crosstalk, we implemented the following procedure: 1. Select a set of pathways for crosstalk analysis. Only the pathways with P BH -value less than 0.01 were used. Meanwhile, the pathways containing less than 5 candidate genes were removed because pathways with too few genes may have insufficient biological information.
2. Count the number of shared candidate genes between any pair of pathways. Pathway pair with less than 3 overlapped genes was removed.
3. Calculate the overlap of all pathway pairs and rank them. All the pathway pairs were ranked according to their JC and OC values.
4. Visualize the selected pathway crosstalk. To focus on the most important biological themes, we only chose those crosstalks with scores in the top 20% using the software Cytoscape [22]. Of note, this criterion was set up somewhat arbitrarily, but the results were well displayed with an appropriate number of nodes (pathways) and edges (crosstalk). We also used other criteria to select the crosstalks among pathways, but the results were similar.

Construction of nicotine addiction-specific network
Steiner minimal tree algorithm uses a greedy search strategy to merge the smaller trees into larger ones until only one tree connecting all input seeds is built [23]. We applied GenRev [24], a network-based software package to search the optimal intermediate nodes (genes) for the connection of input seed genes via the Steiner minimal tree algorithm, to extract a subnetwork from the human PPI network by using the 220 NAGenes as seeds. To test the non-randomness of this subnetwork, we first generated 1000 random networks with the same number of nodes and edges as the nicotine addiction-specific network using Erdos-Renyi model in R igraph package [25]. Then, we calculated the average values of the shortest-path distance and clustering coefficient. By counting the number of random networks with average shortest-path distance (n L ) smaller than that of the nicotine addiction-specific network and the number of random networks with average clustering coefficient (n C ) higher than the observed clustering coefficient, we were able to estimate the significance of non-randomness. Finally, we calculated the empirical p-value = n L /1000 and n C /1000, respectively.

Pathway enrichment analysis in NAGenes
Identifying biological pathways enriched in the candidate genes may provide important information for our understanding of the molecular mechanism underlying nicotine addiction. We searched for enriched pathways in the NAGenes using IPA and found 97 significant enrichment pathways (P BH 0.01) (S2 Table). The 20 most significantly enrichment pathways are shown in Table 2. Most of the pathways were related to neurotransmission system, consistent with the fact that nicotine addiction is a neuronal disease. Among them, several pathways associated with monoamine neurotransmitters stood out, e.g., dopamine receptor signaling (ranked 4 th ), serotonin receptor signaling (ranked 11 th ), glutamate receptor signaling (ranked 14 th ) and GABA receptor signaling(ranked 19 th ), all of which play important roles in signaling transduction. Moreover, two pathways, synaptic long term potentiation (P BH = 1.07×10 -3 ) and synaptic long term depression (P BH = 8.13×10-3) were enriched in the NAGenes (S2 Table). These two pathways were critical in synaptic plasticity development and have been reported to be involved in addiction [26][27]. In addition, we also highlighted other significantly enriched pathways, i.e., cAMP-mediated signaling, calcium signaling, G-protein coupled receptor signaling, neuropathic pain signaling in dorsal horn neurons and CREB signaling in neurons. This result was consistent with prior knowledge of nicotine addiction [28][29][30], providing valuable evidence for the study of molecular mechanism underlying nicotine addiction. Of note, we found many pathways that were related to immune system in the list, in line with previous reports that nicotine might have effects on organism's immune [2,31]. Moreover, three pathways  ADRA2A, ADRB2, AGTR1, AKAP13, CAMK4, CHRM1, CHRM2, CHRM5, CNR1, CREB1,  DRD1, DRD2, DRD3, DRD4, DRD5, GABBR1, GABBR2, GNAS, GRM7, HTR1F, HTR6 related to retinoid X receptor (RXR) were found to be enriched in the NAGenes, i.e., LPS/IL-1 mediated inhibition of RXR function, PXR (pregnane X receptor)/RXR activation and LXR (liver X receptor)/RXR activation. RXR, a kind of nuclear receptor, is a master regulator during ligand-induced transcription activities [32]. In summary, these results suggest that neurodevelopment, immune and metabolic systems play important roles in the pathogenesis of nicotine addiction.

Crosstalk among significantly enriched pathway
To take a further step beyond identifying lists of significantly enriched pathways and to understand how they interact with each other, we performed a pathway crosstalk analysis among the 97 significantly enriched pathways. The approach was based on the assumption that two pathways were considered to crosstalk if they shared a proportion of NAGenes [16]. There were 74 pathways containing 5 or more members in NAGenes, of which, 72 pathways met the criterion for crosstalk analysis, i.e., each pathway shared at least 3 genes with one or more other pathways. There were a total of 380 pathway pairs (edges) from the 72 pathways and then we ranked these edges according to the average scores of the JC and the OC. Ultimately, we chose the top 20% edges to construct the pathway crosstalk. Based on their crosstalk, the pathways could be roughly grouped into three major modules, each of which included pathways shared more crosstalks compared with other pathways and may likely be involved in the same or similar biological process (Fig 1). One module mainly consisted of neurodevelopment-related signaling pathways, such as glutamate receptor signaling, synaptic long term potentiation and CREB signaling in neurons. The second module was primarily dominated by immune systemrelated pathways, including role of cytokines in mediating communication between immune cells, T helper cell differentiation and others. Another module composed of the metabolic pathways of neurotransmitters or drug, such as nicotine degradation II, dopamine degradation and serotonin degradation, the roles of these pathways in nicotine addiction have not been fully explored. As indicated by the above results, pathway crosstalk analysis can provide important insights for understanding of nicotine addiction mechanisms.

Nicotine addiction-specific network
To distill insight into the interaction of NAGenes in a local environment, we extracted the specific subnetwork of nicotine addiction (NA-specific network) from the human PPI network using the Steiner minimal tree algorithm. Basically, this approach linked as many as possible members of NAGenes via the minimal number of connections. As shown in Fig 2, the subnetwork contained 252 nodes and 591 edges. Of the 220 NAGenes, 208 were included in the NAspecific network, which accounted for approximately 94.5% of the candidate genes and 82.5% of the genes in the NA-specific network, indicating a high coverage of NAGenes in the subnetwork. Of note, some of the 44 additional genes, e.g., calmodulin 2 (CALM2), calnexin (CANX), caveolin-1 (CAV1), glutathione S-transferase omega 1 (GSTO1) and protein phosphatase 1 (PPP1CA), had been reported to be associated with addiction in previously studies (Table 3) [33][34]. Moreover, to test randomness of the NA-specific network, 1000 random subnetworks were generated using the Erdos-Renyi model and their average shortest-path distance and average clustering coefficient were compared with the corresponding values of the NA-specific network. For these random subnetworks, the average shortest-path distance was 3.72, which was significantly larger than that of the NA-specific network (shortest-path distance, 2.87; empirical p value < 0.001). The average clustering coefficient of the random subnetworks was 0.02, which was significantly smaller than that of the NA-specific network (clustering coefficient, 0.25; empirical p value < 0.001). Thus, the NA-specific network extracted from the whole PPI network was a non-random network.

Molecular network of nicotine addiction
Summarizing the results from pathway analysis and network analysis, we were able to obtain a relatively comprehensive view about nicotine addiction (Fig 3). In such network, a number of key genes and pathways played important roles, e.g., the neurotransmitters receptor signaling transduction pathways such as dopamine receptor signaling and serotonin receptor signaling, and several intracellular signaling transduction cascades such as cAMP-mediated signaling, Gprotein coupled receptor signaling and calcium signaling. The network also included several feedback loops, among which the one from N-methyl-D-aspartate subtype glutamate receptor (NMDAR) to a-amino-3-hydroxyl-5-methylisoxazole-4-propionic acid subtype glutamate receptor (AMPAR) would be considered the shortest loops. We further observed that some loops Nodes represent pathways and edges represent crosstalk between pathways. Node size corresponds to the number of NAGenes found in the corresponding pathway. Node color corresponds to the P BH -value of the corresponding pathway. Darker color indicates lower P BH -value. Edge width corresponds to the score of the related pathways. Node shape indicates pathway categories, with ellipse for neurodevelopment, diamond for immune, triangle for metabolism, square for other pathways. interlinked with each other through CaM (also called CALM) and calcium/calmodulin-dependent protein kinase II (CAMKII). CaM and CAMKII both play important roles in the induction of long term potentiation and long term depression, indicating that they might make contribution to the synaptic plasticity development and they might provide clues to explain the irreversible features of nicotine addiction. In addition, we observed that mitogen-activated protein kinase (MAPK), nuclear factor of kappa light polypeptide gene enhancer in B-cells (NF-κB) and related pathways of NF-κB signaling and MAPK signaling also implicated in the process of nicotine addiction, which might be the valuable candidate genes or pathways.

Discussion
In the past decades, we have made considerable progress in understanding the molecular mechanisms underlying nicotine addiction, which is largely due to the identification of various neurotransmitter receptors, genes or pathways associated with addiction and the development of animal or cell models. Additionally, with the development of high throughput analysis technology, more and more genes/proteins have been suggested to be linked to nicotine addiction and provide a valuable resource to analyze candidate genes function, biochemical pathways and networks related to nicotine addiction. In this study, we provided a comprehensive analysis of the functional features and interaction network of nicotine addiction-related genes.  Function enrichment analysis revealed the specific biological processes involved by NAGenes. Our GO enrichment analysis indicated that these genes participated in neurodevelopment-related process and ion channel or neurotransmitters activity. For example, terms such as acetylcholine receptor activity, dopamine receptor activity and glutamate receptor binding were significantly enriched in NAGenes, indicating the importance of these neurotransmitters in the development of nicotine addiction. Of note, we found the GO terms of cognition and learning or memory were also in the enriched list, consistent with previous findings of the roles of nicotine in the regulation of various physiological processes, including learning and memory [35][36].
Pathway analysis revealed that pathways related to neurodevelopment were enriched in NAGenes, which further verified the existence of close relationship between the pathology of nicotine addiction and the signaling pathways of nervous system. Four pathways that were related to monoamine neurotransmitters were found to be enriched in the NAGenes, consistent with their central roles in the development of nicotine addiction. Stimulation of nicotinic acetylcholine receptors (nAChR) releases a variety of neurotransmitters in the brain, e.g., dopamine, serotonin, glutamate and GABA. Dopamine is critical for the reinforcing effects or rewarding behaviors of nicotine [37][38], glutamate and GABA are respectively major excitatory and inhibitory neurotransmitter and both of them play important roles in the development of nicotine addiction [39][40]. These neurotransmitters interact with the specific receptors, triggering a series of neuronal signaling pathways and then ultimately realize the regulation of various physiological processes. Of note, our analysis indicated that two pathways, synaptic long term potentiation and synaptic long term depression, were also enriched in genes associated with nicotine addiction. Repeated stimulation of nicotine to nervous system ultimately can modify the neural circuitry and thereby lead to addiction. As for many forms of experience-dependent synaptic plasticity, synaptic long term potentiation and synaptic long term depression play critical roles in the formation, maintenance, and appropriate functioning of neural circuits [41][42]. Therefore, these two pathways might be involved in the early stages of the development of nicotine addiction and facilitate the adaptation of body to changing environments. In addition, synaptic plasticity, as the molecular basis of learning and memory in the nervous system, has been extensively studied. Synaptic long term potentiation and synaptic long term depression have been reported to underline the cognitive and memory effects of the addictive potential of some drugs of abuse [43][44]. This further proved that nicotine could directly or indirectly modulate the physiological processes of learning and memory. Moreover, three signal pathways related to RXR were identified, i.e., LPS/IL-1 mediated inhibition of RXR function, PXR/RXR activation and LXR/RXR activation. Retinoic acid (RA), a class of natural or synthetic vitamin A analogs, exert profound effects on many biological processes, such as development, differentiation and maintenance of nervous system [45] and have been reported that it may serve as potential bridge between the genetic and environmental components of complex diseases [46][47], suggesting that environmental factors also play an important role in nicotine addiction. Interestingly, we found the circadian rhythm signaling was also in the enriched pathway list, supporting that there might be a link between nicotine addiction and abnormal or disrupted circadian rhythms [48]. As indicated by these results, the molecular mechanisms underlying nicotine addiction are quite complex and involve many genes, pathways and their interactions.
Of significance, in pathway crosstalk analysis we identified three main modules. One module was mainly dominated by the pathways associated with the activity of the nervous system. Among these pathways, dopamine-DARPP32 feedback in cAMP signaling, glutamate receptor signaling, neuropathic pain signaling in dorsal horn neurons, and CREB signaling in neurons have been well studied to be involved in neuron or central nervous system (CNS) [49][50][51]. For instance, CREBs, widely been accepted as prototypical transcription factors, play a critical role in biological processes such as neuronal plasticity, learning and memory. Meanwhile, several lines of evidence have pointed that alterations of the activity of CREB by drugs of abuse have a profound effect on behavioral manifestations of drug reward and withdrawal [52]. Subsequently, we collected the genes contributing to the crosstalk, and the most frequently shared genes included glutamate receptor ionotropic N-methyl D-aspartate 2A (GRIN2A), GRIN2B, GRIN3A, calcium/calmodulin-dependent protein kinase IV (CAMK4), CREB1, and glutamate receptor metabotropic 7 (GRM7), suggesting these genes might be more potential targets in the development of nicotine addiction. In addition, the pathway pair of cAMP-mediated signaling and G-protein coupled receptor signaling, which were not included in the three modules, was also noteworthy. In the pathway analysis, we found these two pathways stood out at the top of the list by the statistically significant level (P BH = 2.00×10 -14 , ranked 1 st and P BH = 3.16×10 -13 , ranked 3 rd , respectively) (S2 Table). And the score of this pathway pair was 0.94, ranking the first. Furthermore, these two pathways have been deeply studied for their functions in the nervous system, such as regulating pivotal physiological processes. It was worth noting that, several edges, linking any one of these two pathways and other significant pathways were not displayed in Fig 1 just because they did not meet our criteria, such as the link between cAMPmediated signaling and dopamine receptor signaling. In this study, we just empirically chose the pathway pairs whose crosstalk score fell within the top 20%, therefore some pathway pairs which we might be interested in were not shown in Fig 1. We further extracted the NA-specific network from the reference network. It was interesting to note that some additional genes not among the NAGenes but included in the human PPI network, were associated with nicotine addiction. For instance, CAV1, the structural protein of caveolae, can regulate the function of dopamine receptor D1 in glial cells and hippocampal neurons and may participate in G protein-coupled receptor signaling events [53][54]. CANX, a transmembrane protein in endoplasmic reticulum, can mediate intracellular Ca 2+ concentration and directly interact with dopamine and G protein-coupled receptors to regulate their expression (Table 3) [55][56]. As indicated by the results, network-based analysis could not only provide meaningful information about the organization and environment of NAGenes, but also be promising to identify novel candidate genes. Although the quantity and quality of PPI data have been greatly improved, the human PPI network is still far from complete. In such scenario, some proteins may simply have more interaction information than others because they are better studied, instead of they are biologically more important. Also, due to the limitation of current technology, there may be some false positives in the PPI data [57]. Such potential biases associated with human PPI network may affect our interpretation of the results.

Conclusions
The neurobiological processes that underlie nicotine addiction are complex which relate to multiple factors, such as genetic and environmental factors. In this study, we applied a systems biology framework for a comprehensive functional analysis of nicotine addiction using candidate genes prioritized via a multi-source-based approach. Through integrating the information from GO, pathway and pathway crosstalk analysis, we found neurotransmitters or neurodevelopment-related signal pathway and immune system play key roles in the molecular mechanism of nicotine addiction. Further, we extracted nicotine addiction-specific subnetwork, in which some of the additional genes had been reported to be involved in nicotine addiction. To distill the global view of nicotine addiction process, we preliminarily constructed a molecular network for it. Our results provide important information for the further analysis and suggest that system level analysis is promising for understanding the pathophysiology of nicotine addiction.
Supporting Information S1