DNA Repair Gene (XRCC1) Polymorphism (Arg399Gln) Associated with Schizophrenia in South Indian Population: A Genotypic and Molecular Dynamics Study

This paper depicts the first report from an Indian population on the association between the variant Arg399Gln of XRCC1 locus in the DNA repair system and schizophrenia, the debilitating disease that affects 1% of the world population. Genotypic analysis of a total of 523 subjects (260 patients and 263 controls) revealed an overwhelming presence of Gln399Gln in the case subjects against the controls (P < 0.0068), indicating significant level of association of this nsSNP with schizophrenia; the Gln399 allele frequency was also perceptibly more in cases than in controls (p < 0.003; OR = 1.448). The results of the genotypic studies were further validated using pathogenicity and stability prediction analysis employing computational tools [I-Mutant Suite, iStable, PolyPhen2, SNAP, and PROVEAN], with a view toassess the magnitude of deleteriousness of the mutation. The pathogenicity analysis reveals that the nsSNP could be deleterious inasmuch as it could affect the functionality of the gene, and interfere with protein function. Molecular dynamics simulation of 60ns was performed using GROMACS to analyse structural change due to a mutation (Arg399Gln) that was never examined before. RMSD, RMSF, hydrogen bonds, radius of gyration and SASA analysis showedthe existence of asignificant difference between the native and the mutant protein. The present study gives astrong indication that the XRCC1 locus deserves serious attention, as it could be a potential candidatecontributing to the etio-pathogenesis of the disease.


Introduction
Schizophrenia is a debilitating neuropsychiatric disorder that affects approximately 1.0% of the population worldwide with devastating consequences for the affected individuals and their families [1]. Previous investigations addressing the causative factors of the disease reveal that the genes possessing Single Nucleotide Polymorphism (SNP) in the neurotransmitter system could become aberrant, and afford susceptibility to the disease [2,3]. The SNPs of the serotoninergic system, for instance, have been shown to be associated with schizophrenia in various populations [4][5][6]. Lack of any disease-specific biomarker, however, is still a lingering problem that hinders proper diagnosis and treatment of the disease.
Previous investigators, in their quest to understand the etio-pathogenesis of the disease, have also been focusing on several putative targets other than the neurotransmitter and the neuroendocrine systems. The X-ray repair cross-complementation group (XRCC) of proteins, an essential component in the endogenous DNA repair system, encoded by the XRCC genes is now increasingly being considered a 'potential target' in this regard. The interest on the XRCC1 protein related to schizophrenia has been evolved out of its profuse presence (and the predominant mRNA levels) in the brains of rats and baboons [7][8][9]. Further, not only that the aberrations of the DNA repair system could restrain it from performing the repair function, but it would as well cause increased susceptibility to apoptosis, leading to the elimination of the damaged cell(s). Significantly, an increased rate of apoptosis has been observed in schizophrenia patients [10][11][12][13]. An interaction involving the SNP(s) of DNA repair system (XRCC1, for example) and increased apoptosis of nerve cells seems to exist, which in turn could play a vital role in the onset of several neuropsychological disorders, including schizophrenia. Further, there are also instances where polymorphism in the XRCC1 locus is suggested to be entrained with neurological disorders [14][15][16]. This (hypothetical) relation offers a potential rationale to consider the XRCC1 as a 'candidate susceptibility gene' for schizophrenia. Pertinently, the first report demonstrating an association between a variant at codon 399 of XRCC1 locus and schizophrenia was brought out by Sadaat et al. (2008) [17] in Iranian population. However, no association was found to exist between Arg194Trp polymorphism of XRCC1 and schizophrenia in the same population [18].
To our knowledge, however, apart from the studies on Iranian population, attempts have not been made to explore the possible association of XRCC1 with schizophrenia in any other population. It is at this juncture, that we present the results of our study on the frequency distribution of the SNP (Arg399Gln; rs25487) of XRCC1 in Tamil population, and its association with schizophrenia.
Located on chromosome 19q13.2-13.3 and spanning 31.9 kb of genomic DNA, XRCC1 is known to encode a protein of 633 amino acids, suggested to be involved in single-strand break repair, and base excision repair mechanisms [23]. The protein (XRCC1) has three functional domains, such as N-terminal domain (NTD) (1-151; 151 residues) that could interact with beta-polymerase (β-pol) [24], a central BRCT-I (breast cancer susceptibilityprotein-1) domain (315-403;88 residues), to interact with PARP [25], and a C-terminal BRCT-II (breast cancer susceptibilityprotein-2)domain (538-633; 96 residues), to interact with DNA ligase III [26], to facilitate the repair mechanism [27,28]. SNPs that occur in the conserved domain of XRCC1 could result in the conformational change in the protein it encodes, resulting in deleterious effect, leading to genetic disease risk [29].
The present study will be the first report on the existence of a significant association between Arg399Gln of XRCC1 and schizophrenia from Indian population. Also, we attempted to define the magnitude of deleteriousness of the variant protein through in silico prediction methods. Furthermore, molecular dynamics was initiated for the first time to reveal the impact of disease-contributing mutation Arg399Gln on the structure and function of the protein in question.

Sample collection
A total of 523 subjects (260 patients and 263 controls, with age ranging between 20 and 60) were recruited from among South Indian Tamil population for the present study with the following inclusion-exclusion criteria: The 260 case subjects, diagnosed with schizophrenia using Diagnostic and Statistical Manual for Mental Disorders 4 th edition (DSM-IV) criteria [30] by an experienced psychiatrist, were recruited at random from the Government Vellore Medical College. 263 Controls were chosen from healthy volunteers, based on the parameters (such as age, gender, sex and substance abuse), matching with those of the case subjects.
A questionnaire was designed to obtain the demographic data, including age, gender, marital status, mother tongue, status of current medication (for case subjects), age of onset of the symptoms (for case subjects), family history (if any) of any psychotic symptoms and smoking habits (if applicable) with regard to all the subjects (Table 1). While recruiting the subjects, a detailed enquiry was made for any possible existence of psychotic symptoms in the first degree and/or the second-degree relatives of all the subjects. It was ensured that those control subjects having the family history (in 1 st or 2 nd degree relatives) of any psychoticsymptoms were excluded from the study. Subjects who had any family history of carcinoma were also excluded from the study. While undertaking the present study, special care was taken to make sure that the subjects and the controls belong to only Tamil population. The homogeneity of the population was determined by the demographic data obtained from all the study subjects. It was ensured that the ancestors (for three generations) of the subjects, all of them living in villages in Tamil Nadu, were having Tamil as their mother tongue.
10 ml of the peripheral blood was collected in Ethylene-diamine-tetra-acetic acid (EDTA)coated vials from all the study subjects. All the subjects participated in the study voluntarily

Statistical analysis
The data on allelic and genotypic frequencies were subjected to χ2 analysis (GraphPad Prism5.0) with a view toassess the differences (if any) in the proportionate distribution between the case subjects and the controls. That the genotype distribution would conform to the Hardy-Weinberg equilibrium was also assessed (http://ihg.gsf.de/cgi-bin/hw/hwa1.pl). The odds ratio and confidence limits were computed through GraphPad Prism 5.0. The power of the study was calculated by CaTs power calculator (http://www.sph.umich.edu/csg/abecasis/ CaTS/). The sample size for the present study was estimated in such a way that the power of the study would be not less than 90%, for which the following parameters were considered: genetic relative risk = 1.5; prevalence of the disease = 0.01; risk allele frequency = value of the frequency observed in controls; α = 0.05 and multiplicative model of inheritance [32].
In silicopredictive mutational analysis iStable/indexSeq.php) integrates results from five different element predictors along with sequence information to predict the protein stability upon mutation in analmost accurate manner [34]. PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/), through evolutionary sequence and structure-based approach, classifies a variant as 'possibly damaging', 'probably damaging' or 'benign', with the values ranging between 0 and 1; while the score range of 0.85-1 is being designated as 'most likely damaging', the score of 0.15-0.84 is designated as 'possibly damaging', and that with less than 0.14 is designated as 'benign' [35]. Neural network-based method screening for non-acceptable Polymorphism (SNAP) (https://www.rostlab.org/services/SNAP/ ) predicts the functional effect of a variant as 'neutral' or 'non-neutral', based on evolutionary sequence and structure information [36]. Assessing the single amino acid substitutions, small insertions and/or deletions, the Protein Variation Effect Analyzer (PROVEAN)(http://provean.jcvi.org/index.php) predicts the functional impact of protein sequence variations, as 'deleterious' or 'neutral' [37]. In silicoassessment of sequence conservation Conservation pattern of a variant was calculated using an empirical Bayesian inference. Using the Consurf server (http://consurf.tau.ac.il/) that quantifies the degree of conservation at each aligned position; the extent of conservation of each residue is computed as a score range of 1-9; 1 denotes 'rapidly evolving' (variable) sites, 5 denotes sites that are 'evolving at an average rate', and 9 denotes 'slowly evolving' (evolutionarily conserved) sites [38].

Structural visualization of mutant
Project "Have your Protein Explained" (HOPE) (http://www.cmbi.ru.nl/hope/home), an automatic mutant web server, was used to analyse the impact of the mutation on the protein structure. The web server utilizes FASTA, BLAST, UniProt, PDB (3D structure), YASARA (homology modelling), DSSP (secondary structure), HSSP (Conservation score), ClustalW (multiple sequence alignment), and DAS server (solvent accessibility) to make its prediction on protein structure [39].

Structural analysis using molecular dynamics
For the structural analysis, the native (PDB ID: 2D8M) was retrieved from PDB database (http://www.rcsb.org/pdb/home/home.do) followed by mutation [Arg399Gln) modelling with SWISS-PDB viewer [40]. For thecomputational investigation, MD simulation analysis was performed through Gromacs 4.5.6 package [41], on the native-and mutant type proteins, to observe if these mutations might lead to changes in the surface property and induce structural changes, to be propagated to distort the orientation of the protein functional site. The protein molecule was solvated in a rectangular box with water molecules (TIP3P) at 10 Å marginal radiuses. As the protein structures were found to be positively charged at physiological pH, the system was made electrically neutral, by the addition of 6 chlorine ions [Cl -] in the simulation box using the 'genion' tool, resulting in the replacement of water molecules by negative ions at the position of the first atoms with the most favourable electrostatic potential or at random position.Emtol convergence criterion was set to 1000 kcal/mol, and subsequently, the whole molecular system was subjected to energy minimization by steepest descent algorithm implementing GROMOS96 43a1 force field. Berendsen temperature coupling method [42] was used to regulate the temperature inside the box. Isotropic pressure coupling was performed using Parrinello-Rahman method [43]. Electrostatic interactions were computed using the Particle Mesh Ewald method [44]. The ionization states of the residues were set to appropriate to pH 7.0, with all histidine having been assumed neutral. The pressure was maintained at 1 atm with the allowed compressibility range of 4.5e-5 atm. SHAKE algorithm was used to constrain bond lengths involving hydrogen, permitting a time step of 2 fs. Van der Waals and Coulomb interactions were truncated at 1.0 nm. The non-bonded pair list was updated every 10 steps, and conformations were stored every 0.5 ps. The position restraint molecular dynamics simulation was performed for 20000 ps to soak the insolvent protein molecules to restrain the atoms at a fixed reference position. Finally, systems were subjected to MD simulation for 60 ns. The structural deviations in native and mutant XRCC1 structure were subjected to comparative analysis. g_rms compares the two structures by computing the root mean square deviation (RMSD) and the g_rmsf compute the root mean square fluctuation (RMSF). g_rms and g_rmsfgromacsbuilt-in tools were used for protein trajectories and atomic interaction analysis. Sumof hydrogen bonds formed by specific residues to other amino acids from the protein during the simulation were calculated by using g_hbond that analysed the hydrogen bonds between all possible donors and acceptors. g_sas was performed to determine the solvent accessible surface area. Finally,g_gyrate was used to study the protein compactness.

Genotypic analysis
The genotypic and allelic frequency of rs25487 was assessed in all study cohorts. The frequency distributions of the genotypes were in accordance with Hardy-Weinberg Equilibrium (MAF>0.05; P>0.05) (in case subjects and the controls) (Tables 1 and 2). The genotypic analysis was confirmed by sequencing and proved to be consistent with that of the (restriction) digested PCR amplicons (Figs 1 and 2); the sequence information was deposited in NCBI Gen-Bank, and the Accession Numbers were obtained [JX682562-TT /Gln haplotype (S1 File for sequence information)and KF848660-CC /Arg haplotype(S2 File for sequence information)]. The genotype frequency analysis revealed an overwhelming presence of Gln399Gln (36%) in the case subjects than in the control group (28%) (P<0.0068) (afterBonferroni correction), implicating an association of Gln399Gln with schizophrenia. Comparing the allelic frequencies between the cases and controls, the Gln399 allele (61%) frequency was evidently more in cases, than in controls (52%) (p<0.003; OR = 1.448, 95% CI = 1.132 to 1.851) ( Table 2). The power analyses revealed that this study has a power of 90% to detect a genotype relative risk of 1.5 (α = 0.05) under a multiplicative model.

Nicotine substance abuse and the variant genotype frequency
Encouraged by the observation of heavy smoking prevalence among schizophrenia patients [the rate of smoking among schizophrenia patients has been reported to be at least three to five times more than in the general population and addiction to smoking has been suggested to be an aetiological factor for the disease] [45][46][47][48], the present study examines if there exists any association between addiction to smoking and the frequency distribution of Arg399Gln. Furthermore, the demographic data obtained from the case subjects in the present study also evinced the inclination of schizophrenia subjects towards nicotine abuse. However, our analysis of the data on the cigarette smoking did not reveal any significant association (χ 2 = 1.20; P = 0.27) between the frequency of the variant (Arg399Gln) and cigarette smoking.

In silico predictive functional analysis
Our stability and pathogenicity prediction analysis using the five tools (I-Mutant Suite, iStable, SNAP, PolyPhen2, and PROVEAN) revealed the mutation Arg399Gln to be more 'deleterious' and affecting the protein stability (Table 3). Structural stability analysis performed by I-Mutant Suite suggested a 'large decrease' in the stability of the mutant protein, borne out by the free energy change DDG value -2.23. The istable analysis also predicted a 'decrease' in the structure stability due to the mutant form. Furthermore, the functional analysis performed through PolyPhen2 classified the mutant (Arg399Gln) as "most likely damaging", judged by the score of 0.979. SNAP analysis based on the evolutionary sequence and structure showed that the mutant (Arg399Gln) is 'nonneutral', suggesting that it could be deleterious. The PROVEAN analysis also predicted the mutation as deleterious.

Sequence conservation analysis
The conservation pattern of mutation at 399 th position in XRCC1 was calculated using Bayesian classifier Consurf. This server provides the evolutionary conservation profile of amino acid by first identifying the conserved position using multiple sequence alignment and later estimates the evolutionary conservation rate using Empirical Bayesian. The analysis revealed a conservation score of 7.0, suggesting that the occurrence of mutation had taken place at a slowly evolving and evolutionarily conserved site (Fig 3).

Structural visualization and insight using Molecular Dynamics
Our analysis on the impact of mutation (through Project HOPE) revealed that the change in size, charge and hydrophobicity might lead to loss of interaction with other molecules or residues (Table 4), which in turn could result in undesirable effects. Taking into consideration of the parameters such as RMSD, RMSF, hydrogen bonds, Solvent-Accessible Surface Area (SASA) and radius of gyration, molecular dynamics simulation was performed for 60 ns with a view toexamine the change (if any) in conformation behaviour of the mutant protein Arg399Gln, as compared to that of the native. Both proteins experienced initial fluctuation due to the kinetic shock applied during the molecular dynamics simulation. Interestingly, the native and the mutant XRCC1 proteins showed distinctly different types of deviation throughout the entire simulation, resulting in backbone RMS fluctuation of~0.15 nm. Overall, RMSD ranges of~0.15-0.5 and~0.12-0.82 nm were observed for the native and the mutant respectively ( Fig  4). This magnitude of fluctuation, together with the difference between the average RMSD values after the relaxation period (20 ns), suggested that simulation produced stable trajectories, thus providing a suitable basis for further analyses. With the aim of determining whether mutation affected the dynamic behaviour of each residue, the RMSF values of Cα atom of native and mutant protein were calculated (Fig 5). The analysis revealed the existence of a higher degree of flexibility in mutant structure as compared to that of the native XRCC1 protein. Moreover, interestingly, the amino acid residues present in the region of 25 to 45 residues of the protein showed high fluctuation, indicating that the mutation has affected the protein conformation leading to an increased flexibility of residues in some regions while resulting in constraints on the flexibility of the residues in other regions. Results of the hydrogen bond analysis of the native and the mutant protein performed with respect to thetime indicated that the mutant Arg399Gln has significantly less number of hydrogen bonds formation during the entire simulation as compared to the native structure. The native protein has attained a maximum of~120 hydrogen bonds while the minimum hydrogen bonds of the mutant have fallen behind~60 (Fig 6). Solvent Accessible Surface of XRCC1 proteins in the native form showed relatively larger area (~35 to~40 nm) and maintained the same throughout the simulation period. In the mutant form, however, the area was much less than~35nm (Fig 7). Protein compactness was determined through the radius of gyration. The native R g was found to be almost stable whereas the mutant has experienced continuous fluctuation all the way through; the  fluctuation shown by the mutant could be due to the disturbance in compactness of the protein (Fig 8). This supports the results obtained through RMSD and RMSF.

Discussion
This 'population-based study', depicts a first time report from an Indian population, on the existence of perceptibly high percentage of Gln399Gln genotypic (p<0.0068) and Gln399 allelic (61%; p<0.003) frequencies in case subjects than in controls, signifying the association between the SNP (rs25487) and schizophrenia. Although the SNPs at the serotoninergic and dopaminergic systems were suggested to have asignificant association with the disease [6,[49][50][51][52], not much was known of the association between the polymorphism(s) at XRCC1 locus and schizophrenia. The maiden report from the Iranian population [17] wherein an association between aberrant XRCC1 locus and schizophrenia has been suggested, and the present study on the Indian Tamil population strongly indicate that the gene(s) in the DNA repair mechanism (XRCC1) could be a potential candidate to be monitored in connection with the etiophysiology of this elusive disorder. The question as to how mechanistically the aberration in XRCC1 could contribute to the pathophysiology of schizophrenia is still enigmatic. Firstly, mRNA expression was found to occur predominantly in the brain, as evidenced from studies using baboon and rat models. Secondly, in vitro and animal model studies revealed a preferential decrease in XRCC1 mRNA in response to the use of DNA-damaging agents, reiterating its role in DNA repair [7,8]. Further, there is sufficient reason to believe that a damaged DNA repair system could promote apoptosis; thus the cell cycle progression is suggested to be arrested upon DNA damage, leading to apoptosis and elimination of the cell in question [53]. A correlation existing between polymorphism in XRCC1Arg399Gln and increased rate of apoptosisare reported in patients of ulcerative colitis [54]; and there are reports of increased susceptibility to apoptosis in schizophrenia patients [55,56]. All these findings put together; it is tempting to hypothesize that a deficient DNA repair system (impaired XRCC1, for instance) could enhance apoptosis of the central nervous system (primarily the brain, evidenced by the reduced XRCC mRNA expression, as demonstrated in rats and baboons), leading to neuropsychological disorders, including schizophrenia.
It is worth noticing that the control subjects possessed 28% of the variant genotype (the susceptibility genotype for schizophrenia). However, despite this predisposition, this set of controls did not show any clinical symptom to the disease, the exact reason of which is not clear. Arguably, such carriers could have an increased susceptibility to the disease (than the rest of the controls), if exposed to the 'pathogenic' influence of the genetic and/or the epigenetic factors. A comparable situation has been reported in the Iranian population as well; the allelic frequency for the control population was 33% against the 40.8% in case subjects [17]. Pertinently enough, we are also encouraged to recall at this juncture, the instance of breast cancer studies in African Americans, wherein the variant 399 genotype showed significant association with breast cancer risk. Arg399Gln carriers, accounting for 26% of the controls (against 35% of the case subjects), however, did not clinically express the disease [57]. Here again, the authors suggest that such carriers could have an increased likelihood of developing the disease, apparently in combination with other alleles and/or environmental factors. In a complex genetic disease such as schizophrenia, the possible involvement of multiple factors in the etio-physiology of the disease cannot be ruled out.
This first ever in silico attempt to study the impact of Arg399Gln polymorphism on XRCC1 protein has provided us with valuable clues on the structural and functional discrepancies of the native and the mutant protein. The analysis has helped us understand the deleterious nature of the mutation (Table 3). A closer look at the mutant protein (Consurf Serverand Project HOPE) suggests that the differences in size, charge, hydrophobicity and surface property Arg399Gln of XRCC1 Associated with Schizophrenia in Indian Population (compared to the native), could induce structural changes to distort the orientation of the protein functional site which in turn could render the mutant protein abnormal and deleterious. These observations, together with the results of the molecular dynamics studies implicate that the mutation (Arg399Gln) dramatically alters the structural and functional properties of the XRCC1 protein, and cause its malfunction and thus could contribute to the vulnerability to the disease.
Significantly, there has been a resurgence of interest in the recent past on the prevalence of cigarette smoking among schizophrenia patients [45,58,59]. Although there are suggestions to relate between genetic/environmental factors and nicotine addiction, and schizophrenia [45,60](the exact reason for the heavy smoking dependence in schizophrenic patients, is still unclear. Not only that, cigarette smoking/nicotine exposure is suggested to alter the efficacy of the DNA repair gene [61], but meta-analysis studies [62,63] also revealed an association between the candidate polymorphism (Arg399Gln) and smoking behaviour. These interactions existing between schizophrenia and smoking behaviour, and the polymorphism in question have given us compelling reasons to examine if there exists any significant association between the aberrant XRCC1 and smoking behaviour, leading to schizophrenia. Hence, the present study attempted to address the question whether the association of genetic predisposition (rs25487) along with nicotine addiction will increase the risk of schizophrenia or not. Further, some of the recent reviews even implicate that increased tobacco use could lead to the early manifestation of psychosis (including schizophrenia) [64,65]. However, despite the fact that the case subjects of our present study have shown anoverwhelming inclination towards nicotine abuse, our genotype analysis did not reveal any significant correlation between tobacco use (cigarette smoking) and the distribution of the variant genotype. This observation suggests that the mutation of the DNA repair system (Arg399Gln) could primarily be inherent in the patient (but not related to the nicotine substance abuse).
Another interesting aspect that merits discussion is the reduced cancer risk reported from among schizophrenia patients [66][67][68]. Although the precise mechanism underlying such a 'cancer protection' is unclear, it is apparent that the increased rates of apoptosis (one of the major lines of defence against cancer) prevalent among schizophrenia patients [56] may lead to the elimination of potential pre-malignant cells. At this juncture, a relatively recent experiment by Catts et al. [69] invites our attention, wherein the authors could observe significant differences in DNA damage response signallingin the unirradiated and irradiated lymphoblasts Arg399Gln of XRCC1 Associated with Schizophrenia in Indian Population from the schizophrenia patients and the normal controls. In unirradiatedlymphoblasts, γH2AX levels (index of DNA double-stranded breaks) were significantly higher in the schizophrenia group, compared with those of the controls. In irradiated lymphoblasts, however, radiation-induced γH2AX levels were found to be significantly reduced in patients; resultantly, no difference was found to exist between the patients and the controls with respect to the rate of DNA repair or the cell cycle distribution. That the aberrant DNA damage response signalling could play any role in protecting the patients from cancer is still ambiguous, and would merit more research.

Conclusion
To conclude, the association between the variant XRCC1 and schizophrenia, as depicted in the present study, appears to be the rule at least in the context of (Indian) Tamil population, inasmuch as the mutant 399Gln is seen significantly prevalent in patients (evident from the genotype analysis), to impart deleterious influence (judged from the present in silico predictive mutational analysis). The present observation (on the deficient DNA repair system correlated with instances of schizophrenia) could attract further attention from researchers, as it could open up new candidate (gene), putatively pivotal to the etiology of the disease.