Use of Long Term Molecular Dynamics Simulation in Predicting Cancer Associated SNPs

Computational prediction of cancer associated SNPs from the large pool of SNP dataset is now being used as a tool for detecting the probable oncogenes, which are further examined in the wet lab experiments. The lack in prediction accuracy has been a major hurdle in relying on the computational results obtained by implementing multiple tools, platforms and algorithms for cancer associated SNP prediction. Our result obtained from the initial computational compilations suggests the strong chance of Aurora-A G325W mutation (rs11539196) to cause hepatocellular carcinoma. The implementation of molecular dynamics simulation (MDS) approaches has significantly aided in raising the prediction accuracy of these results, but measuring the difference in the convergence time of mutant protein structures has been a challenging task while setting the simulation timescale. The convergence time of most of the protein structures may vary from 10 ns to 100 ns or more, depending upon its size. Thus, in this work we have implemented 200 ns of MDS to aid the final results obtained from computational SNP prediction technique. The MDS results have significantly explained the atomic alteration related with the mutant protein and are useful in elaborating the change in structural conformations coupled with the computationally predicted cancer associated mutation. With further advancements in the computational techniques, it will become much easier to predict such mutations with higher accuracy level.


Introduction
Most genetic variations in human were portrayed by single nucleotide polymorphisms (SNPs), and most of them were thought to cause phenotypic differences amid individuals. Due to the use of high-throughput sequencing methods, the amount of identified variants in the human genome was rising rapidly, but studying their phenotypic outcome was a laborious task. Moreover, processing of these vast amounts of SNPs needs the improvement of regular annotation tools. Detecting pathological genetic variants were helpful in the organization of genome level study, establishing the hope on the way of target based therapies and tailored medicine. The ability to differentiate between pathogenic and neutral nsSNPs (non-synonymous Single Nucleotide Polymorphisms) using computational approaches could significantly aid in targeting the disease associated mutations. In last two decades, extensive computational efforts had been provided to study the functional and structural consequences of the SNPs [1]. The primary efforts to recognize the patterns of SNPs taking place in the coding regions of genes were studied by Cargill et al. [2] and Halushka et al. [3]. Cargill et al. characterized SNPs in 106 candidate genes having potential relevance to various human diseases [2]. Several reports had very precisely explained the occurrence and consequences of disease-associated alleles on the protein structures. Terp et al. [4] reported the structurally relevant features common in disease-associated mutations from the Human Gene Mutation Database (HGMD). Vitkup et al. [5] examined the frequency of mutations associated with diseases in detail. Furthermore, the detailed analysis of the biophysical and evolutionary distributions of the diseaseassociated mutations was demonstrated by Ferrer-Costa et al. [6]. Stitziel et al. [7] further analysed the locations of nonsynonymous disease-associated polymorphisms by organizing the mutational data into structural classes. Moreover, Mooney et al. [8] demonstrated the majority of occurrence of diseaseassociated mutations in the highly conserved genomic locations. Furthermore, Saunders and Baker [9] applied decision trees and a linear logistic regression to find that a protein structure-derived solvent accessibility term (Cb density) and an evolutionary term derived from a PSSM matrix (SIFT) to examine different features associated with intolerant mutations. Moreover, Krishnan and Westhead [10] applied decision tree and support vector machine (SVM) techniques to the in vitro mutagenesis data sets as well as to SNPs in the nematode worm species Caenorhabditis elegans to examine their effectiveness in SNP characterization. These efforts were further accompanied by several other researches that demonstrated the effective use of computational tools and algorithms for accurate characterization of SNPs [1]. Path to deleterious SNP prediction has now advanced from its previous form. Apart from reporting the deleterious nsSNPs, now the calculation of phenotypic changes had further raised the prediction accuracy level. The previous computational SNP analysis pathways lack the genotype-phenotype correlations which were considered as the major disadvantage of the result. To overcome these limitations, our lab studied the structural consequences of deleterious predicted mutations. Changes in amino-acid composition had wide role in affecting the native conformation of the protein structure.
The vibrant nature of proteins was closely linked to its function, with key conformational changes usually taking place on timescales, varying from microseconds to several seconds. At a scrupulous time step, protein attains particular conformation that occupies a least amount on its free-energy landscape. When energy fluctuates from one minimum to another, it leads to alterations in the protein structure which controls their structural fluctuations and function. Thus to examine these conformational alteration occurring due to predicted deleterious alleles, we further incorporated molecular dynamics calculations for native and mutant protein structures. Molecular dynamics (MD) simulation is an absolute method for modelling protein dynamic motions, the classification of which gives perception into the workings of macromolecular systems at spatial and temporal scales that are hard to access empirically. The use of molecular dynamics approach had been a leading platform to determine the phenotypic consequences induced by point mutations [11][12][13][14]. Our previous results had shown high accuracy of deleterious nsSNP prediction in CENPE and MCAK protein coding genes when MDS (Molecular Dynamics Simulation) was incorporated [15,16]. On the technical side, depending on the size of the protein, the simulation might take a very long time to converge (e.g. a hundred nanoseconds). Thus, even the most 'state-ofthe' art or computationally intensive methods might not give correct results. Moreover, folding was typically much slower than what presently simulated through direct application of allatom MD simulations. Most simulation studies of protein folding had relied on reduced models, implicit solvent models, protein unfolding or the inclusion of experimental data into the simulations [17]. Extended MD simulations were liable to play a vital role in modelling the long-timescale native and mutant state dynamics and their association with protein function. Thus, we carried out a long term molecular dynamics simulation to aid the computational SNP prediction results. Aurora kinases were chosen to detect the cancer associated SNPs due to the recent cases showing their involvement in cancer associated pathways.

Dataset collection
Human Aurora kinase protein sequence data was collected form national centre for biological Information (NCBI) protein sequence database [18]. SNP information for our computational analysis was obtained from NCBI dbSNP (http://www.ncbi.nlm. nih.gov/snp/) [19]. Structure of Aurora-A kinase was obtained from Brookhaven Protein Data Bank [20] (PDB ID: 1MQ4). The mutant structure was build using homology modelling technique through Modeller9v9 package [21]. Modelled structures were refined by means of loop refining, checking wrong bond contacts and adding hydrogen atoms. The best selected structure was energy minimized by charmm27 force field for 5000 iterations using Gromacs 4.5.3 package [22].

Disease-associated SNP prediction
There are likely chances of amino acid variants to cause pathological phenotype, which might directly or indirectly lead to cancers. We used SIFT [23], Polyphen-2 [24], PhD-SNP [25], Pmut [26], MutPred [27], Dr Cancer [28], Fathmm [29] and SNP Function Portal [30] to detect the amino acid variants which are likely to cause cancer. SIFT and Polyphen-2 is used to filter deleterious and harmful SNPs from the large pool of SNP dataset. Moreover, PhD-SNP and Pmut was used to filter disease associated SNPs. Furthermore, Dr. Cancer, Fathmm and SNP Function Portal was used to predict the cancer associated SNPs. MutPred was used to examine the molecular changes occurring in the protein induced by a particular amino acid variant.

Author Summary
The genetic mutations in human were shown by single nucleotide polymorphisms (SNPs), and several of them were supposed to cause deleterious and disease associated phenotypic effects. The ability to differentiate between pathogenic and neutral nsSNPs (non-synonymous Single Nucleotide Polymorphisms) using computational approaches could significantly help in targeting the disease associated mutations. In this work we applied computational tools to prioritize the most deleterious disease associated mutation in Aurora kinases. We used sequence and structural based approaches to refine cancer associated mutation and applied a long term simulation (MDS) to understand the changes in structural conformation and function of the aurora kinases upon mutation. From the 200 ns MD simulation, we suggested that the G325W mutation induced major phenotypic damages in aurora-A kinase protein and altered its structural behaviour in 3D space, which might play an important role in inducing hepatocellular carcinoma.

Phenotype analysis
SNPeffect4.0 tool was used to detect the phenotypic changes induced by the computationally predicted cancer associated mutation [31]. It reports the changes in TANGO, LIMBO, WALTZ and ddG scores for the given mutation. Furthermore, the meta-analysis tool of SNPeffect4.0 used to map these changes for different cancer type.

Cation-p sites and stabilizing residues
Cation-p interactions were calculated by using CaPTURE program [32]. The CaPTURE program identifies energetically significant cation-p interactions within proteins in the Protein Data Bank (PDB). Cation-p interactions in Aurora_A kinase structures are evaluated with default parameters which were mentioned in our previous work [33].  Identifying stabilizing residues SRIDE tool was used to identify the stabilizing residues in Aurora-A protein [16,34]. The prediction is done by calculating 4 essential criteria that involves surrounding hydrophobicity of a residue, the long-range order, stabilization centre defined by considering the conservation scores of residues and the contact map of a protein [16]. Two residues are in contact if there is at least one pair of heavy atoms with a distance less than the sum of the van der Waals radii of the two atoms plus 1.0 Å [16]. A contact is measured long-range if it is between residues that are separated by at least 10 residues in the protein sequence [16]. Two residues are stabilizing centers (SC) elements if they are involved in long-range contacts and if at least one supporting residue can be found in each of the flanking tetra-peptides of these residues, in such a way that at least seven out of the possible nine interactions are formed between the two triplets [16]. Combinedly, all these criteria are compiled to report the stabilizing residues [16].

Molecular docking analysis
We used molecular docking analysis to carry comparative analysis between TPX2 binding affinity of native and mutant Aurora A protein using HADDOCK tool [35,36]. HAD-DOCK is information driven flexible docking approach to perform the molecular docking simulation and it runs a series of scripts in combination with ARIA [37,38] and CNS [39]. The interacting active residues of Aurora-A and TPX2 for docking simulations were obtained from our previous work [40].

Molecular dynamics simulation
Molecular Dynamics Simulation was carried out by using Gromacs 4.5.3 package [22]. Structure of native and mutant Aurora-A kinase was used as starting point for MD simulations. The simulation parameters were set according to our previous work conducted on Aurora-A protein [40] and other proteins [15,32,40]. Systems were solvated in a rectangular box with TIP3P water molecules at 10 Å marginal radius. The systems were neutralized by added 3 sodium ions (Na + ) to the simulation box using the ''genion'' tool that accompanies with Gromacs package. Energy minimization was performed for 5000 iterations by conjugate gradient method implementing GROMOS96 43a1 force field. Emtol convergence criterion was set to 1000 kJ/mol/nm. Berendsen temperature coupling method [41] was used to control the temperature within the box. Electrostatic interactions were calculated using the Particle Mesh Ewald method [42]. The systems were subjected to position restraint simulation for 5 ns and after that unrestraint simulation for 200 ns. We then computed the comparative analysis of structural deviations in native and mutant structure. RMSD, RMSF, SAS and Rg analysis were carried out by using g_rms, g_rmsf, g_sas and g_gyrate tool respectively. Number of distinct hydrogen bonds (NHbonds) was calculated using g_hbond. Moreover, we implemented g_densmap to obtain the atomic density distribution of the native and mutant protein. All graphs were plotted using Grace GUI toolkit 5.1.22 version. Further we carried out principal component analysis using essential dynamics (ED) method according to protocol [43] within the Gromacs software package. This section is an abbreviated version of our previously published work [15,32,40].

Results/Discussion
Total 60 nsSNPs were computationally examined to detect their harmful and damaging properties. Out of 60 SNPs, 24 were calculated to be deleterious as well as damaging using SIFT and Polyphen2 servers (Table 1). Among these 24 nsSNPs, 7 were accounted to be exceedingly deleterious and damaging with tolerance index of zero and PSIC score 1. To further categorize the expected deleterious nsSNP's as cancer-associated, we used PhD-SNP, Pmut, MutPred, Dr Cancer, Fathmm and SNP Adenocarcinoma. This large number of cancer-associated predictions directly suggested that the G325W Aurora-A kinase mutation was likely to be associated with cancer cases. Fig. 1 showed the position of mutant residue in Aurora-A kinase protein.
Without studying the phenotypic evidences associated with a particular mutation, the conclusion derived from computational studies are not easily reliable. Thus we implemented SNPEffect4.0 to examine the phenotypic consequences of G325W mutation at molecular level. Significant fluctuation in the LIMBO score for G325W mutation was observed. For G325W mutation, dLIMBO equaled 2335.67 which meant that the mutation decreased the chaperone binding propensity of the protein. Fig. 2 represents the position of the chaperone-binding sites in the protein. In Fig 2, it could be clearly seen that the mutation had shifted the chaperonebinding sites from the 7 th stretch to the 6 th stretch. In table 2, we could see that the 6 th stretch had shown significant rise in the chaperone-binding tendency where in native, the LIMBO score of 6 th stretch was 5.66 whereas in mutant the LIMBO score increased to 82.59. Furthermore the 7 th stretch showed significant loss in the chaperone-binding tendency in mutant when compared to the native. Moreover, the 5 th chaperone-binding stretch was completely absent in the mutant protein, whereas it was shown to be active in native Aurora-A kinase. Molecular chaperones are protein molecules that support in the non-covalent folding or unfolding and the assembly or disassembly of other biomolecular structures. Furthermore, it disallowed both newly synthesised polypeptide chains and assembled subunits from aggregating into non-functional structures. Decrease in chaperone-binding tendency of Aurora-A might severely damage the folding pattern of the protein. Moreover, it would lead to the formation of nonfunctional protein aggregates which might cause disease-associated phenomena. Furthermore, major stability loss was observed in the case of mutant when FoldX predictions from SNPEffect4.0 were taken into account. The mutant showed rise in ddG value of 7.28 kcal/mol. This implied that the mutation severely reduced the protein stability. Based on the dLIMBO and ddG scores it was evident that the mutation had induced loss in the chaperonebinding tendency of protein which further led to severe conformational changes in the protein motif and reduced its stability. We subjected these scores for the meta-analysis in order to detect the type of cancer which was most likely to occur in the case of G325A Aurora-A kinase mutation. The dLIMBO and ddG scores were plotted for previously reported cancer cases. Our dLIMBO and ddG scores were shown to lie in the dLIMBO and ddG distribution range of hepatocellular carcinoma associated mutations (Fig. 3). For other cancer cases, no positive distributions for the above obtained scores were observed. This showed that the G325W mutation might play an important role in inducing hepatocellular carcinoma associated phenotypes, whereas no significant correlation was observed for other cancer cases. Other scores obtained from SNPEffect4.0 did not have much significance.
By the interaction with TPX2, the catalytic activity of Aurora-A protein increased up to 15-fold [44]. TPX2 bound to and localized Aurora-A to spindle microtubules along with microtubules on the periphery of spindle poles [45]. To examine the changes in interaction pattern of mutant Aurora-A with TPX2 when   compared to the native, we conducted molecular docking analysis.
In-depth analysis of the docked complexes revealed notable differences between interaction patterns of native and mutant Aurora-A protein with TPX2. As we described in [14], computation of interaction energy was very important to comprehend the attraction level of biological partners. Overall interaction energy amid Aurora-A (native/mutant) and TPX2 protein mainly contributed to their electrostatic and van der Waals interaction energy. In native, there were a major contribution of electrostatic and van der Waals energy of 2 97.7+/224.1 kcal/mol and 241.0+/25.6 kcal/mol respectively ( Table 3). The total protein-protein interaction energy of native complex was 2153.17 kcal/mol (Table 3). All the results obtained from docking analysis were confirmed from the results obtained in our previous work on Aurora-A kinase [40]. Electrostatic and van der Waals energy confirms large amount of surface complementarities amid native Aurora-A and TPX2. On the contrary, mutant complex showed less van der Waals and electrostatic energies of 217.9+/22.1 and 259.3+/214.8 kcal/ mol, respectively, and relatively less amount of total interaction energy 292.96 kcal/mol when compared to the native (Table 3). More negative value of interaction energy for native as compared to mutant indicated its high affinity towards TPX2. Approximations of van der Waals interaction energy were computed to provide a theoretical quantitative estimation of the proteinprotein non-bonded interactions [14]. Result showed that the mutation induced damaging consequences in native conformation of Aurora-A kinase that in turn reduced its overall binding affinity to TPX2 protein.
As we described in [14], buried surface area (BSA) is a criterion to assess protein surface which is not exposed to water and more BSA showed a compact macromolecular complex. In table 3, we have showed the BSA of native and mutant complexes as 1201.6+/232.9 and 758.5+/228.8 respectively. Consideration of desolvation component (the loss of interactions with the water phase) was important because it overcompensated the interaction energy and resulted in an opposite effect [14,20]. Restraints violation energy, desolvation energy and BSA showed good correlation with interaction energy and docking score of complex during docking simulation (Table 3). Major difference between docking score, interaction energy and BSA were found in mutant than native. The net values of native were significantly higher than mutant, indicating the loss-of-function due to mutation in Aurora A which disrupted the binding with TPX2 when compared to mutant (Table 3). Docking process solely depended on the position of binding residues (active residues) on the protein surface. It showed that divergence from the actual location of binding residues (position number 155, 157, 159 and 197) in mutant Aurora-A might be a reason for the loss of binding with TPX2. The three factors namely, score difference in docking process, alteration in interaction energy and in buried surface area of complex possibly corresponded to conformational variation of protein-binding surface due to mutation.
The ddG scores obtained from SNPeffect4.0 FoldX prediction implicated loss in the stability of protein. Thus we implemented MDS calculation to observe the molecular alterations associated with the stability loss. We explored RMSD, Rg, RMSF, SASA, NHbond, total energy and density distribution of mutant and native structure. RMSD for all the Ca atoms were calculated from the starting structure which was described as the central origin to compute the protein system (Fig. 4) [46]. In Fig. 4, native and mutant protein showed diverge fashion of deviation from starting, resulting in final backbone RMSD of ,0.172 nm and 0.252 nm respectively during the simulation. Mutant lingered distinguished till the end and exhibited higher aberration than native. This level of fluctuation together with a minute difference in average RMSD value after the relaxation time led to the outcome that the simulation produced stable trajectory, thus providing a suitable basis for further investigation. To observe the affect of mutation on the dynamic behaviour of residues, the RMSF values of mutant and native backbone residues were computed (Fig. 5). Inspection of fluctuation score exposed the occurrence of higher degree of flexibility in mutant. Mutant curve differed significantly and fluctuated at higher degree during the simulation time period, indicating that mutant conformation was flexible throughout the   simulation time and its structure acquired expanded conformation when compared to native.
As we described in [46], the radius of gyration (Rg) is the massweighted root mean square distance of group of atoms from their common centre of mass. Hence it provided an observation into global dimension of protein. Radius of gyration graph for alphacarbon atoms of protein vs time at 300 K is depicted in Fig. 6. We observed a major fluctuation in both native and mutant between a time periods of 0 to 200 ns (Fig. 6). Based on Rg graph, native structure was found stable than mutant. The alteration in SASA of mutant and native protein with time is depicted in Fig. 7. Mutant protein showed higher value of SASA with time, while native indicated lower SASA value. Longer variation in Rg plot showed that the mutant protein might be enduring a major structural transition. This was further maintained by SASA result in which the mutant was found to reveal higher values (Fig. 7). Hydrogen bond plays a major factor on controlling the steady conformation of protein. Time dependent NHbond formations during simulation were observed in native and mutant proteins in order to comprehend the relationship between hydrogen bond formation and flexibility. Mutant structure demonstrated significantly less number of NHbond formation throughout the simulation when compared to native (Fig. 8). Furthermore, the energy spectrum provided a detail insight into the stability of a particular structure. We investigated the overall energy variation of native and mutant protein throughout the simulation. A significant rise in energy level of mutant structure was observed (Fig. 9). It provided a clear indication of loss of the stability in mutant structure. Moreover, the observation of corresponding eigenvalues showed the level of variation and dynamic nature of protein molecule in the simulation system and was mostly restricted within the first two eigenvectors. Our result showed higher range of eigenvector trajectory covered by mutant in comparison with native (Fig. 10).
Further information on the structural plasticity of protein was found by the analysis of time-dependent secondary structures and structural fluctuations. Fig. 11 showed the secondary structural elements as a function of simulation time. Coils, b-sheets, bends and turns were found in both native and mutant protein during simulation time period and it is depicted in fig 11. In native Aurora-A kinase, the residues between position 238-275 appeared as alpha-helix, b-sheets and coils with few bend conformation till the end of the simulation but in mutant Aurora-A this region showed beta-sheets and turns with more traces of bend conformation. The severe conformational changes in this region supported our previous results obtained from SNPEffect4.0, where the loss in chaperone-binding tendency of this region was observed. In Comparison to native Aurora-A kinase, mutant showed significant structural changes between a region of residues 238-275, 324-334 and 348-360 during simulation. Between residues 324-334, native showed more a-helix whereas in mutant it attained bend with few traces of turns. Between the residues of 348-360, a-helix and coil conformation formed the majority of structure whereas in mutant, the coil conformation was only seen. In other regions, very slight changes in secondary structure conformation were observed.
In SNPeffect4.0 results, we observed that the chaperone-binding location shifted from the 7 th stretch to the 6 th stretch. Moreover, the chaperone-binding activity of 5 th stretch was shown to be completely lost. Thus we plotted the atom density distribution to check if these shifting had caused any major changes in the orientation and atomic distribution of 7 th and 5 th chaperonebinding stretches. The plot indicated major conformation loss in mutant as its atomic distribution differed significantly from the native. For the 5 th stretch the mutant structure confirmed a highest atomic density of 6.24 nm 23 whereas in native it was 7.76 nm 23 (Fig. 12). Moreover, for the 7 th stretch, the mutant structure exhibited a highest atomic density of 7.23 nm 23 whereas in native it was 10.3 nm 23 (Fig 13), which was in accordance to the DSSP results, further indicating major conformational loss in these segments.
The above results strongly indicated the loss of stability of the mutant structure. To examine the molecular causes associated with these outcomes, we further investigated the cation-p interactions and the stabilizing residues in Aurora A protein. Cation-p interaction has significant contribution in protein stability. In proteins, cation-p interactions occur between the aromatic side chains of phenylalanine (F), tyrosine (Y), and tryptophan (W) and the cationic side chain of lysine (K) or arginine (R). It was likely to have a important role in the folding or stabilization of the proteins. The significance of cation-p interaction had been observed in several researches for their corresponding role in retaining the stability of proteins. We observed a total of 5 energetically significant cation-p interactions (Fig 14). To scrutinize if the cation-p interactions were maintained in mutant structure, we further examined the bond length variations of the outlying cation-p  interaction. The bond length fluctuation showed that the Arg232-Phe238, Lys240-Tyr256 and Arg357-Phe348 cation-p interactions had undergone major interaction losses (Fig 14). After 32900 ps there was a major loss in Arg232-Phe238 cation-p interaction whereas after 28400 ps there was a significant loss in Arg357-Phe348 interactions (Fig 14). Mutant trajectory of Arg371-Trp313 bond length was comparatively higher when compared to the native and maintained a constant trajectory during the simulation. Arg232-Phe238 and Lys250-Tyr246 interaction were retained in the mutant structure. These results suggested that there was a major loss in cation-p interactions in the mutant structure which might have induced higher fluctuation and lower in stability of the structure. Furthermore, the protein structure contains certain SC that acted as an essential component to maintain the stable conformation of corresponding protein structure. Residues Ala160, Leu161, Lys162, Val163, Asn192, Leu210, Leu263, Lys271, Ile272 and Ser278 were identified as stabilizing centers. When the RMSF of these residues were investigated, we observed significant rise in their fluctuation levels (Table 4). This further indicated that apart from damaging the ATP binding domain and the cation-p interactions, the mutation had also introduced major deflection in the Aurora A protein SC, which might have significantly contributed towards phenotypic changes observed in mutant G325W structure. All these biophysical and biochemical factors obtained from 200 ns MD simulation had collectively suggested that the G325W mutation had strong evidence of inducing phenotypic damages in Aurora-A kinase protein, and might play significant role in inducing hepatocellular carcinoma. These results could be further implemented for wet lab investigations and could provide an evidence of Aurora kinase genetic mutation in association with cancer.

Conclusion
By the use of multiple computational platforms and longterm molecular dynamics simulation approaches, we identified Aurora-A G325W mutation as highly deleterious and associated with hepatocellular carcinoma. The RMSF, Rg, DSSP and energy fluctuation results provided a clear insight of    stability loss and conformational changes in mutant protein.
The results showed that the long-range computational simulations provided clear picture of conformational changes occurring in the computationally predicted disease associated mutant protein structure. Through the advancements in the computational resources, the use of MDS for higher time scale would become approachable. Dedicated computer for biomolecular simulation will most likely bring further speedups in future, allowing the length of each simulation to move well ahead of the large timescale. These advancements will aid in studying the changes in the dynamics behavior of computationally predicted disease-associated SNPs. Moreover, the advancements in high-performance molecular dynamics simulation of biomolecules will provide new prospects for the use of in silico simulations to observe mutationinduced changes in biophysicochemical properties of protein, especially for those parameters, which were well-known to occur at long timescales. Furthermore to the capacity to produce very long-range incessant simulation trajectories, these advances will also greatly enhance the accuracy of SNP predictions, ultimately allowing us to examine the changes occurring at the atomic state in a very precise manner [46].

Supporting Information
Table S1 Cancer-associated nsSNPs predicted using PhD-SNP, Pmut, MutPred, Dr Cancer and Fathmm server. g score, P score, molecular changes and prediction were obtained from MutPred server. SEQPROF results were obtained from Dr Cancer server. Allele highlighted in bold has been predicted to show Cancerassociated SNPs. (DOC)