Identification and In-Silico study of non-synonymous functional SNPs in the human SCN9A gene

Single nucleotide polymorphisms are the most common form of DNA alterations at the level of a single nucleotide in the genomic sequence. Genome-wide association studies (GWAS) were carried to identify potential risk genes or genomic regions by screening for SNPs associated with disease. Recent studies have shown that SCN9A comprises the NaV1.7 subunit, Na+ channels have a gene encoding of 1988 amino acids arranged into 4 domains, all with 6 transmembrane regions, and are mainly found in dorsal root ganglion (DRG) neurons and sympathetic ganglion neurons. Multiple forms of acute hypersensitivity conditions, such as primary erythermalgia, congenital analgesia, and paroxysmal pain syndrome have been linked to polymorphisms in the SCN9A gene. Under this study, we utilized a variety of computational tools to explore out nsSNPs that are potentially damaging to heath by modifying the structure or activity of the SCN9A protein. Over 14 potentially damaging and disease-causing nsSNPs (E1889D, L1802P, F1782V, D1778N, C1370Y, V1311M, Y1248H, F1237L, M936V, I929T, V877E, D743Y, C710W, D623H) were identified by a variety of algorithms, including SNPnexus, SNAP-2, PANTHER, PhD-SNP, SNP & GO, I-Mutant, and ConSurf. Homology modeling, structure validation, and protein-ligand interactions also were performed to confirm 5 notable substitutions (L1802P, F1782V, D1778N, V1311M, and M936V). Such nsSNPs may become the center of further studies into a variety of disorders brought by SCN9A dysfunction. Using in-silico strategies for assessing SCN9A genetic variations will aid in organizing large-scale investigations and developing targeted therapeutics for disorders linked to these variations.


Introduction
Single variant polymorphisms (SNVs) are segments of DNA with variations of only one base pair across individuals.The human genome has millions of SNPs that serve as biological markers due to the fact that individuals usually differ in one nucleotide out of every 1,000 or so [1][2][3].Over half a million SNPs in the DNA coding sequence have been correlated to the emergence of previously unknown disorders [4,5].Knowing which SNPs had an impact on the morphology was crucial for understanding the genetic basis of disease and phenotypic diversity, as well as deciding which markers to exploit in population-based association studies [6,7].Missense mutations (nsSNPs) have a significant probability of inducing phenotypic variation in humans by modifying protein expression [8,9].Many studies have found that nsSNPs account for about half of the DNA variations correlated with hereditary disorders [10].GWAS studies explore those SNPs which are more common in people with the disease in order to identify genes which may contribute to disease susceptibility [11].
Further, SCN9A gene polymorphisms have been linked to a spectrum of pain perception, from extreme insensitivity to intense hypersensitivity, by encoding the alpha monomer of NaV1.7 channels was attributed to variations in pain sensitivity such as primary erythermalgia, congenital analgesia, and paroxysmal pain disorder [12][13][14][15].Recent research have led to the discovery of variants that contribute to persistent pain disorders [15].It is evident that abnormal transcription of voltage-gated Na + channels (VGSCs) is a key mechanism causing a variety of diseases, notably migraine, pain, MS, and epilepsy [16].Over the past decade, cellular-level research has provided insight about transmembrane channels and involved in the body pain response, such as the capsicum-activated TRPV1 heat receptor [17].New studies have shown the catechol-O-methyltransferase (COMT) and tetrahydrobiopterinm role in pain perception that regulates nociceptors and chronic inflammation [18][19][20].The NaV1.7 channels, often termed as the VGSCs-IV type subunit are made up of active pore-forming αlpha-monomers and accessory βeta-subunits and are needed for the formation of electrical impulses in nerve and endothelial cells, respectively [21][22][23][24].In humans, nine distinct Na + channels have been identified, each of which is encoded by a single SCN1A-SCN9A gene [18,[25][26][27][28].
The SCN9A (NC 000002.12)gene encodes the Nav1.7 and spans 113.5 kb with 26 exons and mapped on human chromosome 2q24.3.Sodium channels produced by this gene and are mainly found in neurons of the dorsal root ganglion (DRG) and the sympathetic ganglia.Each domain of this Na + channel, which contains 1988 amino acid, is split into six membrane segments [18][19][20][29][30][31].A VGSC requires at least three subunits: an alpha-subunit and one or two βeta-subunits, which are much smaller.Each of the repetitive domains was tightly packed into the centre of the polypeptide chains, and the P-loop region between the S5 and S6 segment is essential for pore formation.There are four homology domains (DI-IV) in the alpha-subunit, and each of them splits the into six membrane segments (S1-6) [32].Over evolutionary period, 9 distinct variants of the genes that generate the alpha monomer have emerged.Such genes have a broad array of tissue specificity and structural characteristics.Several diseases were linked to variations in a certain amino acid that occurred as a result of these gene alterations.These instances are tremor, dystonia, pain perception disorders, epilepsy, paralysis, ataxia, arrhythmia, cardiac and skeletal muscles disorder, and so more.They may also be associated with specific psychological problems.Researchers have discovered that SCN9A genetic mutation can cause a diverse variety of pain sensitivity, from extreme insensitivity to extreme sensitivity [20,29,[33][34][35].In this paper, we considered various In-silico approaches for pinpointing the human SCN9A nonsynonymous polymorphisms.

Material and methods
The research methodology that was followed in this study is depicted schematically in Fig 1.

Retrieval of SNPs
Human SCN9A dbSNPs data was collected from a number of open-access databases, notably NCBI (http://www.ncbi.nlm.nih.gov) and the sequence was obtained from Uniprot (https:// www.uniprot.org).Studies explored the detrimental impacts of missense SNPs on the SCN9A gene.

Screening of disease-associated SNPs
To examine the association of screened nsSNPs with a disease, P-Mu, PhD-SNP, and SNPs &GO were performed.Any variant with a p-value of greater than 0.5 was classified as a disease-associated nsSNPs.P-MUT (http://mmb.irbbarcelona.org/PMut)allows users to access all single amino acid variants on human proteins.It predicts pathological characteristics linked with a mutation in people with an 80% accuracy [53].PhD-SNP (https://snps.biofold.org/phdsnp/phd-snp.html) with an accuracy of 78% and a score range of 0-9, this method may determine whether a mutation occurs is a benign polymorphic or is related to inherited mutations in humans [36,38,40,46,54].SNP &GO (https://snps-and-go.biocomp.unibo.it/snps-and-go)assesses amino acid changes at a given locus in a particular protein [36][37][38]46].Meta-SNP (https://snps.biofold.org/meta-snp)predicts disease when there are more than 0.5 mutations.Single predictor outputs are used as input in Meta-SNP, which achieves an overall accuracy of 79 percent [55].

Functional effects of SNPs on protein stability
To check the stability, I-Mutant (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi), a type of vector support machine-based web server that forecasts any modifications to protein stability upon being mutated [46].As input, the SCN9A sequence and variants, temperature (25˚C) and pH (7) were submitted.It predicts the reliability index of the outcomes on a range from zero to ten, where ten signifying the highest reliability [36,38,56].MU Pro (http://mupro.proteomics.ics.uci.edu)evaluates protein sequence variations using wild and mutant residues, and a number below 0 (mutation has an effect on protein function), whereas a value larger than 0 indicates that the modification enhances protein stability [36,57].

Phylogenetic conservation analysis of nsSNPs
Using gene sequence, we calculated the conservation of amino acid sites through evolution by using ConSurf (https://consurf.tau.ac.il/consurf_index.php)[36,38,58].The Bayesian approach resulted in a cutoff of conserved scores: Grading 1-4 as dynamic, 5-6 as moderate, and 7-9 as consistent [59,60].Conserved patterns were anticipated from an input SCN9A FASTA sequence, yielding a conservation score and color scheme.For further study, we selected high-risk nsSNPs found in the highly conserved region [43].

Homology Modeling of SCN9A gene
Using the SWISS-MODEL platform (http://swissmodel.expasy.org),we built the 3D configurations of both wild-type and mutated proteins to predict structural stability and variations [61].By homology modeling methods, the native SCN9A structure was modeled, and then subjected to a single point mutation in the pymol (https://pymol.org/2)[62].The structure refinement was done by using ModRefiner (http://zhanglab.ccmb.med.umich.edu/ModRefiner)[63], and five separate refined protein models are provided at the end and each model will be downloaded as a PDB file [64].

Structural validation and RMSD calculation
The structural model was selected and subjected for the structural validation using SAVES server (https://saves.mbi.ucla.edu)The SAVES incorporates PROCHECK, and ERRAT to verify the quality of the whole 3D model.[41,65].The model's quality was also evaluated by RAMACHANDRAN plot generated by ProCHECK [36].The 3D verification evaluates the concordance between a tertiary protein structure and its primary structure [66].Afterwards TM-align (https://zhanglab.ccmb.med.umich.edu/TM-align)was used to compare wild type protein structure with mutant protein structures.In this method, we use a superposition to calculate both the template modeling score (TM-score) and the root mean square deviation (RMSD).TM-score provides a numeric value between 0 and 1, where 1 indicates a precise match between the two structures.A greater RMSD value is indicative of greater structural divergence between wild-type and mutant forms [67][68][69].The RAMAHANDRAN Plot also considered the dihedral angle of atoms in amino acid residues to pinpoint the preferred region of amino acids [70][71][72].

Protein-Ligand docking analysis
Molecular docking was performed in order to find ligand protein interaction and for finding potential ligands.For this, we docked all selected ligands with SCN9A using PyRx program (https://pyrx.sourceforge.io).The Lamarckian genetic algorithm (LGA) which incorporates AutoDock and AutoDock Vina, was applied for virtual ligand screening [73][74][75].The 10 greatest exclusive values were calculated for each ligand, with the active parameters set to the grid size of the center (XYZ axis).The AutoDock tools were used to convert the PDB files to Pdbqt format and calculate the binding affinities [76].For virtual screening, Discovery Studio (https://discover.3ds.com/discovery-studio-visualizer-download) was used for 2D and 3D interaction of ligands with protein.They showed the size and location of bonding sites, hydrogen bond interactions, hydrophobic interactions, and bonding distances of a docked ligand [77,78].

Download SNPs datasets
According to the NCBI dbSNP database, the human SCN9A gene had 335369 SNPs.Exons are DNA fragments that undergo translation after introns are eliminated during splicing, and is often utilized to refer a protein-coding regions, which is erroneous, particularly in humans, where less than 30% of exonic sequences code for proteins.It is critical to comprehend that exons and introns may also be found in untranslated regions (UTRs), both in the 5' and 3' UTRs.The study of SNPs in such UTRs sheds light on the functional and structural repercussions of uncommon exonic SNPs in the human genome.The human SCN9A gene comprises of 26 coding exons, and the 3' UTRs of SCN9A are highly similar with around 80% sequence similarity between the human and mouse genes.It highlights the significance of controlling SCN9A gene across species.From Ensembl (335369), 2,782 SNPs found in UTRs, synonymous (4,056), non-synonymous (10418), exonic (2536), intronic (240596), 3'UTR (2243), 5'UTR (529), non-coding (64753), and 255,466 SNPs became identified within coding region as shown in Fig 2 .The SCN9A SNPs were assessed further in this study to anticipate their impact on target protein, stability, and activity and only nsSNPs were evaluated for further investigation.

Functional prediction of deleterious SNPs
The retrieved 66326 missense SNPs were analyzed first using SNPnexus and assigned each one an index value.In the SIFT algorithm, 5914 nsSNPs are found as deleterious out of 5544 missense SNPs that may have a functional effect on the protein.The output value for Polyphen varies from 0 to 1, with 1 being the most damaging and 0 shows neutral behavior.Among nsSNPs, 1693 nsSNPs are possibly damaging and 4339 were benign and 4179 nsSNPs are highly deleterious as shown in Fig 3 .We selected common nsSNPs that scored 0 in SIFT and 1 in PolyPhen to ensure that only the most detrimental SNPs would be studied.We found 18 nsSNPs out of 132 that met the criterion and categorized them as high risk of affecting protein function.The complete data of commonly found 18 nsSNPs are given in Table 1.With a range from -100 to +100, SNAP-2 evaluates the no influence of one amino acid residue.SNAP-2 indicated that 3 nsSNPs including C710W (rs1358344300), D743Y (rs1186212683) and C1370Y (rs1390718765) shows a highest score of 90, 89 and 86 as shown in Table 1.
In PPh-2, 18 nsSNPs were projected to be detrimental (PSIC > 0.5); 18 of these variants were anticipated to be highly damaging, with a PSIC score of 1. MutPred2 to assess the 18 prevalent SNPs for their possible effects and were expected to be deleterious to the SCN9A  protein based on the results, with scores ranging from 0.636 to 0.948 and P values less than 0.05.Scores with P < 0.05 and g > 0.5 indicated an actionable hypothesis, scores with P < 0.05 and g > 0.75 indicated a confident hypothesis, and scores with P < 0.01 and g > 0.75 were classed as very confident hypotheses based on the mechanistic disruption induced by the nsSNPs mutations.As a result, all of the selected nsSNPs were categorized as very confident or highly probable to be detrimental hypotheses.The PANTHER tool predicts whether the nsSNPs will influence the protein function.For 18 nsSNPs, the estimated score was equal to or less than 3, leading in a probability of adverse effect greater than 0.5.From CADD values range from non-deleterious to deleterious, with greater scores indicates the highly damaging polymorphism.In addition, ConDEL prediction shows 18 nsSNPs have deleterious effect on SCN9A gene and Table 2 shows the complete forecast outcomes.

Prediction of disease causing nsSNPs
The SNP & GO algorithm identified 17 nsSNPs that were related with disease, while the mutation T1291A (rs1173515969) was graded as neutral.The prediction results are summarized in Table 3. P-Mutant indicates a certain 15nsSNPs will cause disease, which classify them as TRUE, While P1706S, T1291A and V877F predict FALSE result.As shown in Table 3, Meta-SNP predicts 2 nsSNPs (P1706S, T1291A) shows neutral effect on SCN9A protein.The prediction score (<0.5) makes the SNP neutral and disease causing (>0.5).PhD-SNP found only 5 nsSNPs were neutral including E1889D, P1706S, T1291A, I929T and M358I and remaining were disease causing, as shown in the Table 3.

Prediction of SCN9A protein stability
The DDG forecasted by I-Mutant 3.0 identified 5 mutations were expected to increase the stability of the mutant protein, whereas the remaining 13 nsSNPs were predicted to decrease the stability of the protein, hence lowering protein activity.Disease-related nsSNPs were predicted using a sequence-based method from the I-Mutant 3.0 package.Findings for the estimated structural influence of 18 potential nsSNPs were obtained from Mu-Pro servers.The findings of the protein stability assessment are listed in Table 4.

Evolutionary conservation of deleterious nsSNPs
The ConSurf analysis revealed a high degree of structural and functional conservation among all SCN9A residues.On the other hand, we focused on solely on the 9-scoring residues that corresponded to the 14 high-risk nsSNPs we discovered.According to the findings, Y1248H are highly exposed as they are functional residues.It can be shown in Table 4 that C710W (rs1358344300) and W730L (rs1010654142) are projected to be structural residues, which indicates that they are deeply buried.Fig 4 displays the data that proved these 14 high-risk nsSNPs to be truly detrimental to the structure and/or function of the SCN9A protein.

SWISS modeling for the SCN9A protein
The prediction score indicates that 14 highly-conserved mutations on the SCN9A protein were evaluated to detect the protein conformational modifications induced by them.The list of those 14 nsSNPs are as follows; E1889D, L1802P, F1782V, D1778N, C1370Y, V1311M, Y1248H, F1237L, M936V, I929T, V877E, D743Y, C710W, D623H.For comparative homology modeling the generated sequences were selected of at least >30% similarities and identities.We got 50 templates with 64.41% STML ID 6Iqa.1.A identification for the query sequence.To learn how mutations drastically alter the stability of proteins, we modeled the three-dimensional structure of the SCN9A protein using the template PDB ID 6lqa.1.A (range: 2-1978aa).
The results obtained by using a template with a quality of 6lqa.The most favored, additional allowed, generously allowed, and disallowed regions are colored in red, yellow, light yellow, and white respectively.
The above listed proteins have been downloaded with the respective PDB files and mutated by PyMol.The mutants (L1802P, F1782V, D1778N, V1311M, and M936V) showing high RMSD value as shown in Table 5. Validation of the modeled framework was performed by SAVES and RAMACHANDRAN plot evaluation was used to examine at the secondary structure.All constraints imposed by potential energy calculations were respected by the resulting structure.On a RAMACHANDRAN plot, the majority of the amino acid residues in the SCN9A protein (82.90%) were found in a highly favorable region as depicted in Fig 5 .The complete predicted results can be found in Table 5.

Molecular Docking by PyRx
To discover ligand-protein interactions, molecular docking was used by PyRx tool; we docked all of the selected ligands with SCN9A.They created ten distinct conformations for each ligand, which are characterized by binding affinity (-Kcal/mol).The docking results of ligands indicates that these binding affinities relate to their level of activity, and all 20 compounds with binding affinities are given in Table 6.We selected 9 compounds with strong binding affinities, notably batrachotoxin, carbamazepine, flecainide, funapide, naloxone, phenytoin, ranolazine, saxitoxin, and vixotrigine, and docked them with all 6 of our native and mutated protein complexes further to investigate their interactions.Discovery studio, which offers a 2D representation of all docking interactions, was used for the research.
The binding free energy of all the chosen ligands is higher than -4Kcal/mol.The highest binding energy revealed that the SCN9A protein docked successfully with Batrachotoxin.The Batrachotoxin and Funapide shows highest binding affinity -7.7 and -7.5Kcal/mol, which are greater than other ligand-binding affinities.A Batrachotoxin ligand was fixed in the SCN9A binding pocket sites by forming the conventional hydrogen bond with residues TYR 405, GLN 408, and SER 969; and Vander wall interactions with ALA 965, ASN 412, LEU 866, ASN 409,  7.
According to Table 7, polymorphisms not only alter the protein's conformation because of modifications to the number of interacting residues, but also because of alterations in hydrogen bonds and hydrophilic associations.

Discussion
The prevalence of deleterious non-synonymous substitutions in the SCN9A gene was determined by evaluating 15 different In-Silico analyses.This difference highlights the need for a comparative evaluation to accurately identify the nsSNPs that often significantly impair SCN9A gene activity.Consequently, we used a meta-analysis of 14 different computational methods to determine how to properly classify nsSNPs, from benign to detrimental.We focused on the 18 nsSNPs identified as deleterious by SNPnexus and that were classified as harmful by the 2 tools (Table 1).More than five algorithms showed high consistency in forecasting of the nsSNPs were harmful, deleterious, or disease-associated.Despite the fact that most of the 18 most damaging nsSNPs identified here were not tested in vitro, and there is no information on the functional significance of these mutations in SCN9A protein, the findings indicate that they should be prioritized for further populational and laboratory studies.The nsSNPs in the SCN9A gene that have an impact in biological processes were studied using a method that combines the forecasts of various tools to identify the most genetic mutations.When the single sodium channel (Na1.7) is inactivated, all pain sensation is abolished (apart from nerve pain).Significant effects could be anticipated in the domain of analgesia if  antagonist for Na1.7 might be identified.[79] The Na1.7 channel generally serves as gatekeeper channels, amplifying weak impulses and pushing neurons to threshold voltages where the far more complicated Na1.8 channels became activated.Because Na1.8 channels generate the majority of the current needed to produce action potentials and transduce pain, they are a key player in this pathway [80][81][82][83].Moreover, the point mutation found in SCN9A enables the channels to be triggered by less significant depolarizing events and, as a result, have increased activity [84].The SCN9A (NC 000002.12)gene encodes the Nav1.7 and spans 113.5 kb with 26 exons and mapped on human chromosome 2q24.3[18,85].Sodium channels produced by this gene and are mainly found in neurons of the DRG and the sympathetic ganglia.Each domain of this Na + channel, which contains 1988 amino acid is split into six membrane segments.[18][19][20][29][30][31] Further, SCN9A polymorphisms have been linked to a spectrum of pain perception, from extreme insensitivity to intense hypersensitivity, by encoding the alpha monomer of NaV1.7 channels was attributed to variations in pain sensitivity [12][13][14][15].A new nonsense mutant R523X, which affects the first linker that joins domains 1 and 2, was discovered in exon 10 of a Pakistani family.The same linker, S459X has previously been associated with loss of function, indicating that this site is essential for the development of the channel [18].The patient had previously been found to be a homozygous carrier of the prevalent polymorphism in the SCN9A gene, which codes for the NaV1.7-dbSNPrs6746030 (R1150W).This SNP was originally believed to be part of quantitative changes in the pain threshold in various patient cohorts rather than being associated with erythromelalgia [86].
Fig 2 shows the outcomes of a query of the NCBI and Uniprot databases for pathogenic nsSNPs.The polymorphisms defined as SCN9A-associated or pathogenic in the dbSNP database including E1889D, L1802P, F1782V, D1778N, C1370Y, V1311M, Y1248H, F1237L, M936V, I929T, V877E, D743Y, C710W, and D623H were predicted as harmful consequences in 14 tools, while 5 SNPs (L1802P, F1782V, D1778N, V1311M and M936V) were thoughts to be highly deleterious.Homology modeling with SWISS Model was used to generate a 3D model of the protein sequences of both wild-type (SCN9A) and mutants, allowing us to investigate the effects of these 5 variations.High RMSD mutants, including L1802P, F1782V, D1778N, V1311M, and M936V, are well-known to be incredibly detrimental.Thus, PyRx was brought to use for protein-ligand interaction.Using PyRx, we docked 20 ligands to both wildtype and mutated SCN9A (L1802P, F1782V, D1778N, V1311M, and M936V).The binding affinity of ligand-receptor compounds was used to evaluate the index value at each site.Docking was developed to study how ligand binding activity correlates with three-dimensional protein structure.Research method used in this research common underlying on creating an association between the modifications and their molecular consequences on the protein.Since each program/tool performs on a distinct algorithm, the results are more reliable when multiple are utilized to accomplish a single purpose.

Conclusions
Genetic analysis for identifying phenotypic or disease-associated polymorphism is a complex situation that demands for advanced techniques.We investigated a variety of methods to determine the variations in the human SCN9A gene that are likely be damaging.The voltagegated sodium-channel type IX subunit Nav1.7, which is located in peripheral neurons and is encoded by the SCN9A gene, is vital to the generation of action potentials.The SCN9A mutations were associated with primary erythermalgia, channelopathy-associated insensitivity to pain, and paroxysmal intense pain condition.In this study, 14 nsSNPs with a mutational effect on the SCN9A function and structure were found to be highly deleterious, according to the trajectory analysis and stepwise prediction of pathogenicity of nsSNPs (SNPNexus > SNAP2 > PolyPhen 2> Mutpred2>PANTHER> CADD> ConDEL> P-Mutant> Meta SNP> PhD SNP> SNP & GO).Five nsSNPs have been examined as deleterious SNPs in the SCN9A protein using a structural homology-based method by the Swiss model.In the present study, we analyzed the L1802P, F1782V, D1778N, V1311M and M936V SNPs associated with the SCN9A gene was docked with selected 9 molecules with significant binding affinities including Batrachotoxin, Carbamazepine, Flecainide, Funapide, Naloxone, Phenytoin, Ranolazine, Saxitoxin and vixotrigine and docked with native and mutant structures and visualized by Discovery studio.Future genome association studies will be able to detect damaging SNPs linked to specific patients with pain according to the findings of this study.To characterize this data on SNPs, extensive clinical trial based research on a broad population are needed, as well as experimental mutational studies for the confirmation of the outcomes.