A computational in silico approach to predict high-risk coding and non-coding SNPs of human PLCG1 gene

PLCG1 gene is responsible for many T-cell lymphoma subtypes, including peripheral T-cell lymphoma (PTCL), angioimmunoblastic T-cell lymphoma (AITL), cutaneous T-cell lymphoma (CTCL), adult T-cell leukemia/lymphoma along with other diseases. Missense mutations of this gene have already been found in patients of CTCL and AITL. The non-synonymous single nucleotide polymorphisms (nsSNPs) can alter the protein structure as well as its functions. In this study, probable deleterious and disease-related nsSNPs in PLCG1 were identified using SIFT, PROVEAN, PolyPhen-2, PhD-SNP, Pmut, and SNPS&GO tools. Further, their effect on protein stability was checked along with conservation and solvent accessibility analysis by I-mutant 2.0, MUpro, Consurf, and Netsurf 2.0 server. Some SNPs were finalized for structural analysis with PyMol and BIOVIA discovery studio visualizer. Out of the 16 nsSNPs which were found to be deleterious, ten nsSNPs had an effect on protein stability, and six mutations (L411P, R355C, G493D, R1158H, A401V and L455F) were predicted to be highly conserved. Among the six highly conserved mutations, four nsSNPs (R355C, A401V, L411P and L455F) were part of the catalytic domain. L411P, L455F and G493D made significant structural change in the protein structure. Two mutations-Y210C and R1158H had post-translational modification. In the 5’ and 3’ untranslated region, three SNPs, rs139043247, rs543804707, and rs62621919 showed possible miRNA target sites and DNA binding sites. This in silico analysis has provided a structured dataset of PLCG1 gene for further in vivo researches. With the limitation of computational study, it can still prove to be an asset for the identification and treatment of multiple diseases associated with the target gene.


Introduction
Single nucleotide polymorphisms (SNPs) are the most common genetic variations found in humans (3-5 million) [1]. It is a type of polymorphism in which a single nucleotide differs between individuals. SNPs of coding region cause the change in amino acid sequences, resulting in an alteration of protein function and hence are termed non-synonymous SNPs a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 (nsSNPs). It has been proven that these mutations show molecular effects with actual phenotypes [1]. Half of the SNPs are nsSNPs and these nsSNPs can affect the protein, both structurally and functionally [2,3]. Moreover, mutations in the highly structured non-coding regions of the gene can have a significant impact on gene expression. Mutations in the 5' and 3' untranslated region can alter the secondary structure of the protein, and thus the binding of proteins and ligands to these regions [4].
Phospholipase C gamma-1 (PLCG1) gene has been found associated with noteworthy T-cell lymphomas like peripheral T-cell lymphoma (PTCL), angioimmunoblastic T-cell lymphoma (AITL), cutaneous T-cell lymphoma (CTCL) and adult T-cell leukemia/lymphoma [5][6][7][8][9]. It has also been linked to two subtypes of CTCL-Sezary syndrome and Mycosis fungoides (MF) [10,11]. Moreover, the mutation of this gene has been found responsible for diseases like bipolar disorder and angiosarcoma [12,13]. The protein, Phospholipase C gamma-1 (PLCγ1) encoded from the PLCG1 gene creates inositol 1,4,5-trisphosphate (IP 3 ) and diacylglycerol (DAG) from phosphatidylinositol 4,5-bisphosphate (PIP 2 ). It is located on chromosome 20 with eight domains. It is bound with calcium while catalyzing the reaction [14]. PLCγ1 also mediates DNA and mRNA synthesis in the process [15]. Epidermal growth factor receptor (EGFR) activates PLCγ1 and helps in cancer cell mitogenesis [16]. It is also suggested that the binding of EGFR-PLCγ1 through SH2 domain is needed for cell cycle progression [16]. An exciting fact is that PLCγ1 can also inhibit cancer cell proliferation by binding with JAK2 and PTP-1B. These two opposite characteristics of the protein make the study of the target gene much more intriguing [14]. The production of DAG and PIP 2 is in downstream signaling of the T-cell receptor (TCR) pathway, where mutation may cause AITL. S345F and G869E missense mutations have already been found in cases of CTCL and AITL [7]. R707Q missense mutation was found in angiogenesis based lymphoedema angiosarcoma. It is proposed that constitutive angiogenesis signaling driven by PLCγ1 may be the underlying reason for this [13]. K713N missense mutation was found in a sample of MF patient where NF-κB (nuclear factor kappalight-chain-enhancer of activated B cells), NFAT (Nuclear factor of activated T-cells), and STAT3 (signal transducer and activator of transcription proteins-3) pathways were activated together [11].
No in silico analysis of the gene PLCG1 has been done till now to find all the possible nsSNPs related to the functional and the structural change of the protein. Therefore, the primary purpose of this study was to find possible coding and non-coding SNPs, which can affect the protein function by utilizing various computational approach and bioinformatics tools. These tools find out the possible conserved residues, mutations with the chance of most functionality, possible altered molecular mechanism, structural change in the protein, decreasing protein stability, post-translational modifications (PTM), and other predictable changes to recognize the most significant SNPs [17,18]. Now-a-days such computational research has become popular to find pathogenicity of genes, such as CSN3, RETN, FOXC2, CHK2 and so on [17][18][19][20]. Through our study, it may be possible to identify and predict new SNPs that can be associated with possible diseases.
In this study, we have utilized a number of in silico tools to comprehensively characterize the coding and non-coding SNPs located at the PLCG1 gene. We have shortlisted the most significant nsSNPs and further validated their structural impact through structural analysis. We identified four potentially deleterious nsSNPs (R355C, A401V, L411P and L455F) through our analysis, which form a part of the catalytic domain of PLCG1. Among these, L411P L455F made significant structural changes in the protein structure. Our analysis will provide the framework for further in vitro and case-control studies to validate the structural and functional impact of the SNPS in the PLCG1 gene.

Dataset collection of SNPs
The nsSNP dataset of our target gene PLCG1 was collected from the dbSNP database (https:// www.ncbi.nlm.nih.gov/snp/). After searching for the gene, a missense filter was used to get the nsSNPs. The protein sequence for the gene (FASTA format) was retrieved from the UNIPROT database. To get unique results in our study, SNPs of protein ID ENSP00000362368 were selected. This isoform has been chosen as the canonical sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry [21][22][23]. For analyzing the non-coding region SNPs, the dataset was collected from ENSEMBL database for the above-mentioned protein ID.

Detection of deleterious and functional SNPs
Four tools were used to find out the deleterious functional nsSNPs of our dataset. Sorting intolerant from tolerant (SIFT) was adopted in the study to predict whether an amino acid substitution is deleterious or tolerant based on protein conservation with the homology sequence and physical properties of amino acid. Substitution with a probability score of less than 0.05 is considered to be deleterious or functional [24]. Protein variation effect analyzer (PROVEAN) predicts the functional effect of single or multiple amino acid substitutions, insertions, and deletions. The cutoff value for the substitution to be deleterious is -2.5. Anything above is counted as neutral or non-deleterious [25]. Polymorphism Phenotyping version 2 (PolyPhen-2) is a similar kind of tool which predicts damaging missense mutations using multiple sequence alignment and structural information [26]. Protein analysis through evolutionary relationship (PANTHER) does an evolutionary analysis of coding SNPs to find the damaging amino acid substitutions [27].

Disease related SNPs
The common nsSNPs found to be deleterious in all four previous tools, were then run in 3 disease-associated SNPs predicting tools. Predictor of human Deleterious Single Nucleotide Polymorphisms (PhD-SNP) uses support vector machine (SVM) method to predict whether a phenotype of nsSNP can be related to any disease associated conditions. The output of the result comes with a reliability index predicting if the SNP is disease-causing or neutral [28]. Pathogenic mutation prediction (Pmut) server uses a neural network-based predictor which is trained by a manual database SwissVar to predict if a mutation is associated with a disease or not. A prediction scoring from 0.5-1 is termed as disease-causing [29]. SNPS&GO is another SVM based tool which predicts a mutation to be disease-causing based on the protein sequence as well as the protein structure (when available) and gene ontology terms [30,31].

Prediction of change in protein stability
The common SNPs found to be disease-causing were then run to check protein stability. The deleterious nsSNPs with decreasing protein stability are considered as substantial. I-mutant 2.0 and MUpro were used to predict the change in protein stability due to the mutations. I mutant 2.0 is another SVM based tool that provides the free energy change (Delta Delta G) value and predicts the sign as increase or decrease. A Delta Delta G (DDG) (kcal/mole) value <0 means a decrease in the protein stability, whereas DDG (kcal/mole) value >0 means an increase in the protein stability [32]. MUpro uses two methods: SVM and neural networks. However, SVM method result is recommended. A confidence score <0 indicates a decrease in protein stability, and a confidence score >0 indicates an increase in protein stability with the mutation [33].

Prediction of the molecular mechanism of pathogenicity
The common SNPs found to be disease-related from PhD-SNP, Pmut and SNPS&GO were run in Mutpred2 server. It is a server that can predict the pathogenicity of the substitutions with a detailed molecular target and affected mechanisms. It uses multiple neural networks, and the final score is the cumulative results from all of them, ranging from 0 to 1. The closer the result is to 1, the higher the chance of the substitution to alter its stability. The threshold of p value was set at 0.05; only substitutions with p value equal or less than this were collected [34].

Prediction of post-translational modification
ModPred server was used to see if there were any post-translational modification (PTM) sites in our common target SNPs, which were found to be disease-causing. This server uses a collection of datasets containing 278,703 PTM sites. The tool then assesses those PTM sites for multiple protein sequences. The output gives potential PTM sites for each residue with a confidence score. Only high and medium confidence score PTMs were taken into consideration [35].

Sequence conservation and solvent accessibility
Again, the disease-causing SNPs' conservation and solvent accessibility were checked. Consurf predicts the evolutionary conservation of residues of a protein sequence. It estimates the evolutionary rate of the amino acids and further can anticipate if the substitution has any structural or functional effect along with a conservation score ranging from 1-9. Here score 9 indicates the most conserved amino acid, whereas scoring 1 to the most variable. It also provides solvent accessibility of the amino acid residues where the results show if the amino acids are buried or exposed. Generally, they evaluate their result from protein structure, but as our structure was not available on PDB, they predicted the result through protein sequence (Conseq) [36]. Netsurf 2.0 was also used to predict the solvent accessibility of the amino acid residues. It uses a neural network that has been used on protein structures and shows the buried and exposed regions of the protein [37].

Mutation cluster prediction
Mutation3D is a web-based tool to identify clusters of amino acids which can arise from somatic mutation. It can predict driver genes for mutation, separating the functional SNPs from the nonfunctional ones. The common SNPs found to be disease-related from PhD-SNP, Pmut, and SNPS&GO were put along with query sequence to identify possible clusters [38].

Structural analysis
Homology modeling by SWISS-MODEL. The target protein structure was not available on PDB, so the homology modeling of the protein was done by the SWISS-MODEL server. This server takes a query sequence as input, searches for closely related sequence template for the structure and aligns them [39]. Using that structure as template homology modeling was done, and the model was further validated by QMEAN value. It also provides Ramachandran plot to ensure the quality of the generated structure further. The template with maximum coverage and highest sequence identity was chosen. The native wild type protein structure and mutant protein residues' structures were generated. The mutant residues which had high conservation scores were generated for homology modeling.
Model validation. All the structures generated by homology modeling were validated by the tool PROCHECK. It is a standard tool to verify the stereochemical quality of a protein structure. It generates a Ramachandran plot to validate the structure with details of residues in the core and other allowed regions [40].
RMSD value and TM align value. The RMSD value associated with the mutant residues after superimposition with the native protein structure was calculated with PyMol, an opensource software to perform structural analysis [41].
TM align value is checked to see the structural dissimilarity between the native and wild type structure. A score of 1 means that there is no dissimilarity between the structures; a score < 0.2 means unrelated protein structures, whereas a score > 0.5 means the same fold [42].
Chemical property analysis by BIOVIA discovery studio visualizer. Further analysis of the mutant residue structures compared with the wild type structure was done by BIOVIA discovery studio visualizer, a structural analysis tool [43]. It is downloadable from the website (https://discover.3ds.com/discovery-studio-visualizer-download). It can help to visualize protein structures, residue solvent accessibility, polar and non-polar bonds, and analyze the difference between native and wild type residues. Specific SNPs were selected with a cumulative result of conservation score, solvent accessibility, structural/functional prediction by Consurf, RMSD value, TM align value, Mutation cluster prediction, respectively, and taken into further consideration.

Analysis of 5' and 3' UTR non-coding SNPs
Investigation of non-coding regions was done using the ENSEMBL database [44]. The 5' and 3' region SNPs were filtered out. Minor allelic frequency (MAF) value of � 0.001 was selected only. Later the SNPs were run in Regulome DB, which relates SNPs to regulatory elements of the human genome [45]. It gives a ranking based on DNA binding, and also provides Chip data, chromatin states, and motifs, if available. Information of our gene was checked in the PolymiRTS database-a server to predict naturally occurring DNA variation in miRNA target sites, mainly in the 3' UTR region [46]. The results are given in 4 classes D, O, C and N with a context and conservation score along with miR ID and miR target site.

Gene-gene interaction analysis
Gene MANIA server was used to correlate the target PLCG1 gene with functionally similar genes and further analyze the interactions among them [47]. Currently, it supports six organisms with datasets collected from GEO, BioGRID, Pathway commons, I2D etc. Ensemble was used as the primary identifier. Interaction data available between the genes include physical interaction, co-expression, co-localization, genetic interaction.
An outline of the methodology used in this study has been summarized (Fig 1).

SNP datasets
SNPs of the PLCG1 gene were retrieved from the dbSNP database. Primarily, 11096 SNPs were found, but after applying the missense filter, 745 SNPs were retrieved (https://www.ncbi.nlm. nih.gov/snp/?term=PLCG1). Later, protein isoform P19174-1 was selected for the current study, and its sequence was retrieved from the UniProt database to perform the analyses.

Detection of deleterious and functional SNPs
After running the 745 SNPs in the SIFT tool, the result was filtered with our target protein ID and 74 SNPs were obtained. The 74 SNPs were then run in PolyPhen-2, PROVEAN, and PAN-THER tools. After combining the results, 16 SNPs were found to be deleterious in all the tools (Table 1). SNPs having neutral or non-deleterious results in any of the tools were not selected for further analyses (Fig 2).

Disease related SNPs
The 16 deleterious SNPs were then run in three tools: PhD-SNP, Pmut and SNPS&GO to predict if the mutations can be related with diseases or disease associated conditions. Out of the 16 mutations, 13 SNPs showed disease-causing effects in all the three tools (Table 2). Again, any SNP having neutral result in any of the tools mentioned above was not selected for further analyses.

Prediction of change in protein stability
The 13 disease-associated SNPs were put in I-mutant 2.0 and MuPro to check their effect on protein stability. All the SNPs showed decreasing protein stability in MuPro server, but three SNPs Y210C, A401V and L455F showed increasing stability in the I-mutant 2.0 server ( Table 3).

Prediction of the molecular mechanism of pathogenicity
The 13 common SNPs were run in MutPred2 server to check protein stability alteration capability and molecular effect of the mutations. Out of them, 11 SNPs showed a satisfactory result

PLOS ONE
Approach to predict high-risk coding and non-coding SNPs of human PLCG1 gene within the threshold range. The functional impacts included altered stability, loss of DNA strand, altered metal binding, gain of helix, loss of phosphorylation sites, altered ordered interface, and gain of relative solvent accessibility. Details of the result along with p value and prediction score are given ( Table 4).

Prediction of post-translational modification
ModPred server provided significant results for two of the SNPs: Y210C and R1158H. Both the SNPs had post-translational modification in the native and mutant residue. Y210C showed proteolytic cleavage in the native residue and amidation modification in the mutant residue. R1158H had proteolytic cleavage in both mutant and native residues (Table 5).

Sequence conservation and solvent accessibility
All the 13 SNPs had good conservation scores in Consurf, but only the ones with score 8 and 9, and prediction of effect in MutPred2 server were chosen for further structural analysis. Six SNPs (L411P, R355C, G493D, R1158H, A401V and L455F) scored 9. Among them, L411P, G493D, A401V and L455F were shown to have possible structural effects as they were highly conserved and buried. The rest two SNPs were shown to have a possible functional effect as they were highly conserved exposed residues. Netsurf 2.0 showed contradictory results in three SNPs. R355C and R1158H were shown as buried residues instead of exposed shown by Consurf. In Netsurf 2.0, Y210C was shown as exposed residue, unlike Consurf where it was shown as buried. The details are given (Table 6), and the conservation score prediction figure is given (S1 Fig).

Mutation cluster prediction
The 13 disease-causing SNPs were used to predict the mutation clusters. Mutation 3D showed a cluster in the PI-PLC-X domain consisting of three substitutions L411P, L455F, and A401V. There can be other clusters but not shown in the prediction tool because of the unavailability of the whole structure in the tool's database. While doing structural analysis, this criterion was taken into account. The result is given along with other structural information (Table 7).

Structural analysis
Homology modeling. Eight SNPs (L411P, R355C, G493D, R1158H, A401V, L455F, A816P and R601Q) were chosen for structural analysis (Fig 2). 6pbc.1. A template (X-ray structure) was used to generate our native and mutant protein structures in the SWISS--MODEL server. It had 97.19% sequence identity and 91% coverage. All the targeted SNPs were in the covered region. The native structure of the protein is shown (Fig 3). Model validation. All the structures generated from SWISS-MODEL were given in the PROCHECK tool. It showed almost 90% residues in the core region for all the structures. The results of core region residues are given (Table 7), and the quality assessment of structure and data are given (S3-S11 Figs).

RMSD value and TM align value.
The RMSD values of the eight SNPs were calculated by PyMol software. Among them, five SNPs (L411P, L455F, R355C, G493D and A816P) showed a high deviation. Their TM-align value was also checked, and all five SNPs showed fluctuation in their property. The results are given (Table 7).
Chemical property analysis by BIOVIA discovery studio visualizer. A filtration of SNPs was done for further analysis of our protein structures. Total hydrogen bonds of all the eight SNPs were generated by the BIOVIA discovery studio visualizer (Table 7). Then taking account of RMSD value, TM align value, total hydrogen bonds, and mutation cluster prediction, three SNPs (G493D, L411P and L455F) were chosen for further chemical analysis. The three SNPs had RMSD values of 2.423Å, 0.096Å, and 1.973Å, respectively. They had TM align

PLOS ONE
Approach to predict high-risk coding and non-coding SNPs of human PLCG1 gene value showing differences in the structures and the hydrogen bonds compared to the wild structures. L411P and L455F showed mutation cluster in the prediction by Mutation 3D (Table 7). Three SNPs showed changes in hydrophobicity and number of hydrogen bonds, after further analysis by BIOVIA discovery studio visualizer ( Table 8). The mutant protein structures of the three SNPs are given (Fig 3b-3d). The intermolecular bonds generated by the wild type and mutant structures of the three SNPs-G493D, L411P and L455F, are shown respectively (Figs 4-6). Finally, the comparative superimposed structures showing hydrogen bonds and their difference in numbers and angles are shown (Fig 7).

Analysis of 5' and 3' UTR non-coding SNPs
After setting the MAF filter of �0.001, 65 SNPs were found in the Ensemble database. In Regulome DB only the SNPs with ranking < 4 were taken into consideration, and nine SNPs were chosen. The rankings along with probability score and Chip data are given (Table 9).

PLOS ONE
Approach to predict high-risk coding and non-coding SNPs of human PLCG1 gene In the S1 section, data of all the SNPs of Regulome DB generated from ENSEMBLE has been given (S7 Table). PolymiRTS database provided data with miRNA target sites for PLCG1 gene. Among the SNPs which provided results in Regulome DB, two SNPs rs139043247 and rs62621919 also provided result in the PolymiRTS database. rs139043247 has two alleles G and A in the database with class of D and C, respectively in all their target sites. All the target sites came with negative context scores and a high conservation score. rs62621919 has two alleles G and A with class of D and C, respectively in their target sites along with negative context scores (S4-S6 Tables).

PLOS ONE
Approach to predict high-risk coding and non-coding SNPs of human PLCG1 gene

Discussion
The target gene, human PLCG1 produces the protein PLCγ1, which consists of an N-terminal PH domain followed by EF hands, TIM barrel (X and Y), and a C-terminal domain C2. There is an insertion of two parts of another PH domain between the TIM barrel catalytic domain.

PLOS ONE
Approach to predict high-risk coding and non-coding SNPs of human PLCG1 gene The two parts of PH2 domain are further split into two SH domains and one SH3 domain [48]. It is a monomer with two isoforms found in the human body (P19174-1, P19174-2). Our selected isoform P19174-1 had 1290 amino acid residues in it, and all the SNPs predicted to be damaging or having mutations with functional effects are scattered in these domains. The prediction of nsSNPs has been very significant in recent years as these mutations have been related to several diseases, and computational approach has become a successful way to predict them quite efficiently [17][18][19][20]. As no in silico analysis has been done to date to predict deleterious nsSNPs and possible functional non-coding SNPs associated with our target gene PLCG1, the purpose of this analysis was to find out possible nsSNPs and non-coding SNPs which can affect the functionality of the protein molecule.
Several tools were used to predict the probable damaging effect of nsSNPs of PLCG1 gene. At first, the nsSNPs gathered from the dbSNP database were filtered out according to the prediction of their functionality. Using four tools SIFT, PROVEAN, PolyPhen-2 and PANTHER, 16 SNPs were considered to have deleterious effects. These tools generally use the idea of finding the more conserved residues to predict the effect. Similar tools that predict mutations associated with diseases are PhD-SNP, Pmut and SNPS&Go. These tools filtered out three SNPs, and thus, 13 SNPs were finalized for further study. From the Uniprot database, the following information was obtained: I109T substitution is from Domain PH1 (27-142); R355C, A401V, L411P and L455F substitutions are from PI-PLC X box domain (320-464); G493D substitution is from first part of PH2 domain (489-523); R601Q substitution is from one of the SH2 domains (550-657); A816P substitution is from SH3 domain (791-851) "R1105L, P1152A, D1075V and R1158H" substitutions are from the C terminal C2 domain (1071-1194). These are important details as different domains are associated with different activities, and nsSNPs of these domains may alter their structures and activities. The four nsSNPs of the X box domain are the most significant ones as this domain is part of catalytic activity [48]. On the other hand, the C2 domain is involved in calcium-binding of the protein and subcellular localization, so nsSNPs of this domain can be considered important also [49]. SH2 domain is crucial for cancer cell cycle progression, [16] as a result, R601Q can be a significant nsSNP. From the cBioPortal database, it was found out that G493D and R355C mutations were found in patient samples [50]. The G493D mutation was found in Uterine mixed endometrial carcinoma patient sample, and the R355C mutation was found in the Leiomyosarcoma patient sample. The link of the result is given. (https://www.cbioportal.org/results/mutations?cancer_ study_list=5c8a7d55e4b046111fee2296&case_set_id=all&gene_list=PLCG1).

PLOS ONE
After finalizing the 13 SNPs, the effect of these SNPs on protein stability was checked. Decreasing protein stability with the effect of substitution indicates the possible effect of SNPs on proteins [32]. The Gibbs free energy is directly related to protein stability. A value <0 indicates decreasing protein stability [51]. Almost all the proteins showed decreasing result for both the tools-I-mutant 2.0 and MuPro. ModPred predicted PTM sites in Y210C and R1158H for native and mutant residues. Y210C had proteolytic cleavage for its native residue, which is a very important modification as it can produce irreversible post-translational modification leading to a permanent alteration of protein function [52]. The server predicted amidation for its mutant residue cysteine, which can alter the localization and stability of the protein. It can also affect the sensitivity of the protein to surrounding pH, enhanced signaling, and binding to receptors [53]. R1158H had proteolytic cleavage prediction for both mutant and native residues, which does not vary, but cleavage in different positions in different cases may still change the type of alteration. Four SNPs (I109T, L411P, A401V, and R1158H) can lead to the altered metal-binding site according to MutPred2 server, which can be significant as this protein generally does not show the metal-binding property. A reason for the prediction can be their presence in N-terminal, C-terminal and catalytic domains of the protein. A gain of loop was seen for the substitution A816P which also has altered transmembrane function. A loop in structure can change the intrinsic functionality of protein along with their transmembrane property [54,55]. A816P also gains relative solvent accessibility becoming exposed from buried, making it more available to have active site activity [56]. Altered transmembrane property was also predicted with substitutions G493D, P1152A and R1158H. G493D had a loss of strand and a gain of loop property, which can explain this [54,55]. Y210C can lose its function with the loss of phosphorylation sites.
Conservation analysis further confirmed the pathogenicity of eight SNPs with high conservation score. Solvent accessibility analysis showed that L411P, G493D, A401V and L455F substitutions are both highly conserved and buried. Buried residues are generally located in the core protein, and substitution in them affects the protein function mostly [56].
Homology modeling was done with a template. Then the wild type structure was used to calculate RMSD value and TM align value to check the change in the 3D protein structures among the wild type and mutant residues. A higher RMSD value indicates more deviation in the structure between wild type and mutant protein structures [39]. TM align values of >0.5 and <1 show dissimilarity in the structures. Three substitutions G493D, L411P and L455F were finalized for further structural analysis by BIOVIA discovery studio visualizer according to these data. G493D and L411P showed a considerable number of changes in hydrogen bonds (Table 7). L411P and L455F were predicted to have a somatic mutation cluster according to Mutation 3D tool.
G493D is part of the PH2 domain, and glycine is changed to aspartic acid. Glycine is a strong hydrophilic amino acid, and analysis showed the change in hydrophobicity while it converts to aspartic acid (Table 8). Strong hydrophobicity can induce a change in binding capacity and interaction of the protein with other molecules [56,57]. There is an increase in the number of hydrogen bonds, which can be the reason for a change in the free energy value, thus changing protein stability [51]. In the glycine residue, there are two hydrogen bonds with phenylalanine with a distance of 2.87Å and 2.94 Å (Fig 4a). These distances are 2.86Å and 3.01Å, respectively, with the mutant residue aspartic acid, which will indeed affect the Gibbs free energy [51]. Two extra hydrogen bonds are created with isoleucine and tryptophan (Fig  4b). The two new bonds formed have a distance of 2.72Å (isoleucine) and 2.9Å (tryptophan) from glycine. L411P mutation showed the amino acid change from leucine to proline, indicating a huge change in hydrophobicity (Table 8). Being changed from hydrophobic to hydrophilic can be a reason for significant structural change in the protein. A decrease of alkyl hydrophobic bond can be a reason behind this (Fig 5). Also, a leucine-leucine hydrogen bond is lost, decreasing the number of hydrogen bonds from two to one (Fig 5). The distances of other hydrogen bonds with lysine residue are 2.84Å (native) and 2.96Å (mutant). Finally, in the L455F substitution, one hydrogen bond increases with the mutation (Fig 6). Bond formation with lysine and serine remains the same with the distance of 3.24 Å and 3.26 Å in the wild type respectively and 3.34 Å and 3.17 Å in the mutant structure respectively. A new bond with phenylalanine is seen (3.17 Å). There is an apparent change in the angles of all the hydrogen bonds shown (Fig 7). As L411P and L455F are from the catalytic domain, these SNPs can be highly damaging to protein function as well.
GENEMANIA results showed FYN and ITK genes interact with PLCG1. These genes are involved in the T cell mediated pathways, the same as PLCG1 and are related to diseases like Adult T cell leukemia [58,59]. Involvement of these two genes with PLCG1 can be a subject for further study. The results also showed that PLCG1 shares domains with the following genes: ITK, CBL, PLCG2, FYN and HCK, which may allow them to offer similar functions. PH domain is common between (PLCG1 and ITK proteins which show interactions in the GENE-MANIA analysis (S2 Fig). This conserved mammalian domain is responsible for the interaction between ITK and phosphoinositide 3-kinase (PI 3-kinase, PI3K) which in turn is the key player in lymphocyte differentiation and activation [60]. Computationally predicted functionally and structurally deleterious SNPs located in these regions could thus play an important role in this interaction (Fig 8). Protein-Protein interaction was seen between the SH2 domain of PLCG1 and ERBB2 which regulates protein tyrosine kinase. Catalytic domain PI-PLC-X box domain is also seen in PLCL1 protein which monitors GABA mediated synaptic inhibition [61]. FGFR1 which showed all possible interactions in GENEMANIA network with PLCG1, executes an engrossing complex activity with PLCG1. Through the binding of SH2, C2, and catalytic domain, they upregulate the status of these two proteins [62]. Our in silico analysis found that four SNPs located at the PI-PLC-X box domain (R355C, A401V, L411P and L455F, Fig 8) are functionally and structurally deleterious. Thus, these SNPs could potentially impact its functions. However, these findings should further be validated in laboratory experiments.
Among the non-coding SNPs, rs139043247 and rs543804707 showed the best result according to Regulome DB. They had a prediction of transcription binding sites, matched or unmatched motifs, and DNase footprint with DNase peak. rs139043247 also showed a significant result in the PolymiRTS database. Generally, the D and C classes with high conservation score and negative context score are the ones with the highest functional probable effect. Class D means the derived allele is disrupting a conserved site where class C means the creation of a new site [46]. This means there are high chances that the two SNPs rs139043247 and rs62621919 will affect the miRNA with probable mutations occurring in DNA.

Conclusion
Out of all the missense SNPs, 16 SNPs were found to have deleterious effects by SIFT, PolyPhen-2, PROVEAN, and PANTHER tools. Further, 13 SNPs were predicted disease associated with disease predicting tools-PhD-SNP, Pmut and SNPS&GO. Ten SNPs were predicted to decrease the stability of the protein. Six SNPs (L411P, R355C, G493D, R1158H, A401V, L455F) were predicted highly conserved. Among them, L411P, G493D, A401V, L455F were predicted as the most significant ones with possible structural effect. Two mutations Y210C and R1158H had post-translational modification prediction in both wild type and mutant residues. Three SNPs L411P, G493D and L455F showed a promising structural change in the protein structure. R355C and R601Q mutations can also be important as they are part of domains that have shown previous relations with diseases. Among the non-coding region SNPs, rs139043247, rs543804707, and rs62621919 showed possible pathogenicity to interact with certain diseases and affect the functions of miR-NAs. Further study of the gene PLCG1 is highly necessary with the help of the data generated from the current study. The mentioned SNPs can be related to specific diseases mentioned earlier, especially with specific types which have been found related to the gene. Nevertheless, this is a computational study, and there will always be limitations regarding the analysis. So, there needs to be more in vivo researches with these data to prove their authenticity. Albeit, the study provided salient information by shedding light on the high-risk coding and non-coding SNPs of the target PLCG1 gene to predict the possible diseases associated with the gene which will eventually help the researchers to find out a proper treatment plan to cure the disease-associated conditions. Supporting information S1