Comprehensive characterization of coding and non-coding single nucleotide polymorphisms of the Myoneurin (MYNN) gene using molecular dynamics simulation and docking approaches

  • Sadia Islam Mou,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft

    Affiliation Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh

  • Tamanna Sultana,

    Roles Formal analysis, Investigation, Methodology, Validation

    Affiliation Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh

  • Dipankor Chatterjee,

    Roles Data curation, Investigation, Methodology, Writing – review & editing

    Affiliation Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh

  • Md. Omar Faruk,

    Roles Supervision, Validation, Writing – review & editing

    Affiliation Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh

  • Md. Ismail Hosen

    Roles Conceptualization, Project administration, Supervision, Writing – review & editing

    ismail.hosen@du.ac.bd

    Affiliation Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh

Abstract

Genome-wide association studies (GWAS) identified a coding single nucleotide polymorphism, MYNN rs10936599, at chromosome 3q. The MYNN gene encodes the myoneurin protein, which has been associated with the pathogenesis of several cancers and with other disease development processes. However, the structural, functional, and molecular impact of this polymorphism, and of other coding and non-coding polymorphisms of MYNN, had not been characterized in detail. The current study addressed this gap by analyzing different properties of rs10936599 and of non-coding SNPs of MYNN through a thorough computational approach. The variant rs10936599 was predicted to be functionally deleterious by nine functionality prediction tools, including SIFT, PolyPhen-2, and REVEL. Following that, structural modifications were estimated through the HOPE server and Mutation3D. Moreover, the mutation was found in a conserved and active residue, according to ConSurf and CPORT. Further, the secondary structures were predicted, followed by tertiary structures, and there was a significant deviation between the native and variant models. Similarly, molecular dynamics simulation showed considerable differences in the dynamic behavior of the wildtype and mutant structures. Molecular docking revealed that the variant binds the ligand NOTCH2 with a better docking score than the wildtype. In addition, non-coding SNPs located at the MYNN locus were retrieved from the ENSEMBL database. These were found to disrupt transcription factor binding regulatory regions, whereas only two affect miRNA target sites. Eight non-coding variants were detected as eQTLs in the testis with normalized effect sizes, and HaploReg v4.1 provided annotations for the non-coding variants. In summary, this comprehensive in silico characterization of coding and non-coding single nucleotide polymorphisms of the MYNN gene will assist researchers working on MYNN in establishing the association of these variants with certain types of cancer.

Introduction

Single nucleotide polymorphisms (SNPs) are the most prevalent form of variation in the human genome, where multiple alleles exist in a population and the frequency of the least common allele is at least 1%. They occur approximately once every 300–400 base pairs [1]. It has been reported that SNPs are associated with disease markers, disease susceptibility, and genomic evolution [2]. A genome-wide association study (GWAS) is a high-throughput approach that sheds light on the relationship between the frequency of SNPs, or other forms of genetic variants, and specific phenotypes. In recent years, GWAS has led to the discovery of numerous genetic loci or regions associated with common diseases, including cancers [3, 4]. The GWAS Catalog [5] has revealed that a non-synonymous polymorphism (rs10936599) at chromosome 3q, covering the MYNN gene, is correlated with colorectal cancer [6], telomere length [7], multiple myeloma [8], bladder cancer [9], and other traits.

The MYNN gene, located on chromosome 3q26.1, encodes a 610-amino-acid protein called myoneurin (isoform A) [10]. This protein mainly functions as a transcriptional repressor and belongs to the POK (Poxviruses and Zinc-finger (POZ) and Krüppel) family [11]. It is characterized by an amino-terminal POZ/ Broad Complex, Tramtrack, and Bric a’ brac (BTB) domain in addition to eight Krüppel-type zinc fingers at the carboxy-terminal moiety [10, 11]. The BTB/POZ domain mediates protein-protein interactions with transcriptional co-factors (corepressors, histone deacetylases) through homo-dimerization and hetero-dimerization. The recruitment of transcriptional corepressors and histone deacetylases induces heterochromatin formation, followed by inhibition of transcription activation. The Krüppel-type zinc finger motifs, in contrast, are responsible for the DNA binding properties. This gene is associated with gene expression, cancer development, and tumorigenesis [11]. Additionally, it regulates BMP signaling [12], synaptic gene expression [13], and skeletal muscle growth [10], among other processes.

Reportedly, rs10936599 is associated with shorter telomere lengths and biological ageing [14]. Moreover, there may be a significant correlation between the polymorphisms for Telomerase RNA Component (TERC) (rs2293607) and MYNN (rs10936599), which is responsible for elevated risk of colorectal cancer, colorectal adenomas [15], and bladder cancer [16]. Additionally, it has been linked to an elevated risk of chronic obstructive diseases [17], chronic lymphocytic leukemia [18], cutaneous melanoma [19], and multiple sclerosis [20]. Despite the clinical significance of rs10936599, its molecular functions and structural mechanisms are not yet fully established. This study aimed to determine the effect of this single nucleotide polymorphism on the functional characteristics, structural mechanisms, and dynamic behavior of the myoneurin protein. The insights of this study can contribute to the research and development of personalized treatments and medications.

Materials and methods

Retrieval of Non-synonymous SNPs (nsSNPs)

MYNN gene was selected for in-silico analysis from the literature study as it has been reported to be associated with several cancer development processes [8, 16, 18]. Then, we investigated the human MYNN gene in the ENSEMBL genome browser [21] (https://asia.ensembl.org/index.html) and selected the ENST00000349841.10 transcript encoding 610 amino acids long myoneurin protein. Missense variants were filtered using the global minor allele frequency (MAF) value (0.05–0.5). Moreover, the protein sequence was retrieved from UniProt [22] (https://www.uniprot.org/).
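The MAF filter described above can be sketched in a few lines. This is only an illustration, assuming the variants have already been exported from ENSEMBL into simple records; the `Variant` class, the `filter_by_maf` helper, and the second record are hypothetical, while rs10936599 and its MAF of 0.27 are the values reported in the Results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str
    consequence: str
    maf: float  # global minor allele frequency


def filter_by_maf(variants, lower=0.05, upper=0.5):
    """Keep variants whose global MAF lies within [lower, upper]."""
    return [v for v in variants if lower <= v.maf <= upper]


# Hypothetical input: only rs10936599 (MAF 0.27) is taken from the study.
variants = [
    Variant("rs10936599", "missense_variant", 0.27),
    Variant("rs_example_rare", "missense_variant", 0.004),  # made-up rare variant
]

kept = filter_by_maf(variants)
print([v.rsid for v in kept])
```

Treating both bounds as inclusive is an assumption; the text only gives the range 0.05–0.5.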

Functional consequence analysis of nsSNPs

Sorting Intolerant From Tolerant (SIFT) (https://sift.bii.a-star.edu.sg/) was employed to detect the deleteriousness of nsSNPs. SIFT can distinguish the deleterious and neutral effects of amino acid substitutions in nsSNPs and missense mutations based on physical characteristics and sequence homology of amino acids [23]. It utilizes multiple sequence alignment to obtain normalized probability scores for all substitutions. A substitution with a score <0.05 is considered deleterious.

Polymorphism Phenotyping v2 (PolyPhen-2) (http://genetics.bwh.harvard.edu/pph2/) is a publicly accessible web server for predicting the structural and functional consequences of amino acid substitutions [24]. Variants with a PolyPhen-2 score of 0.0–0.15 are considered benign, 0.15–0.85 possibly damaging, and 0.85–1.0 probably damaging.

The Rare Exome Variant Ensemble Learner (REVEL) (https://sites.google.com/site/revelgenomics/) is an ensemble method for detecting the pathogenic nsSNPs based on tools, namely MutPred, PolyPhen, FATHMM, SIFT, MutationAssessor, PROVEAN, and several ensemble methods. REVEL score ranges from (0–1) with a cut-off of 0.5 [25].

MetaLR (https://wglab.org/) distinguishes between neutral and damaging SNPs using logistic regression by providing a score between 0 and 1, where a score >0.5 indicates a damaging effect [26]. MutationAssessor (http://mutationassessor.org/r3/) is a web server that estimates the functional effect of missense polymorphisms and mutations based on evolutionary conservation in protein homologs. It produces a score ranging from 0 to 1; nsSNPs with higher scores are more likely to be pathogenic [27].
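To make the score thresholds above concrete, the following sketch maps raw scores onto prediction labels. It is an illustration rather than the tools' own code: the function names are hypothetical, the PolyPhen-2 bins follow the commonly used convention (benign up to 0.15, possibly damaging up to 0.85, probably damaging above), and the handling of scores that fall exactly on a boundary is an assumption.

```python
def classify_sift(score):
    """SIFT: scores below 0.05 are treated as deleterious."""
    return "deleterious" if score < 0.05 else "tolerated"


def classify_polyphen2(score):
    """PolyPhen-2: three bins split at 0.15 and 0.85 (boundary handling assumed)."""
    if score > 0.85:
        return "probably damaging"
    if score > 0.15:
        return "possibly damaging"
    return "benign"


def classify_by_cutoff(score, cutoff=0.5):
    """REVEL and MetaLR: scores above the 0.5 cut-off are treated as damaging."""
    return "damaging" if score > cutoff else "neutral"


# Hypothetical scores, used only to exercise the functions.
print(classify_sift(0.01))        # deleterious
print(classify_polyphen2(0.50))   # possibly damaging
print(classify_by_cutoff(0.72))   # damaging
```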

MutPred2 (http://mutpred.mutdb.org/), a machine learning-based method, estimates the pathogenicity and molecular alteration of single nucleotide polymorphisms by integrating genetic and molecular data [28]. MutPred2 generates a general score from the mean scores of the neural networks. A score cut-off of 0.50 denotes pathogenicity. Protein ANalysis THrough Evolutionary Relationships (PANTHER) (http://www.pantherdb.org/tools/) is a comprehensive, freely available database that employs phylogenetics to analyze protein sequences and determine their evolutionary links to other proteins [29]. It employs PANTHER-PSEP (Position-Specific Evolutionary Preservation) to anticipate how nonsynonymous coding single nucleotide polymorphisms may affect the functionality of proteins [30].

ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) is a public database of genetic variants and their clinical significance that gathers data from a variety of sources, such as clinical testing facilities, research projects, and the scientific literature, and disseminates knowledge regarding the associations between genetic variants and diseases or other health issues [31]. PON-P2 (http://structure.bmc.lu.se/PON-P2/) is a machine learning-based tool that has been developed for the classification of amino acid substitutions in human proteins, utilizing the evolutionary conservation of sequences, the physical and biochemical properties of amino acids, Gene Ontology (GO) annotations, and functional annotations of variation sites [32].

Protein-protein interaction

NetworkAnalyst (https://www.networkanalyst.ca/) was employed for predicting protein-protein interaction. With the aid of NetworkAnalyst, generic PPI networks, cell-type or tissue-specific PPI networks, gene regulatory networks, gene co-expression networks, networks for toxicogenomics and pharmacogenomics studies, and networks for gene co-expression profiling can be built [33]. Additionally, gene ontology (biological process, molecular function, and cellular component) data were retrieved from NetworkAnalyst, and the gene ontology plot was generated using the ggplot2 package in R programming [34].

Structural analysis

To analyze the structural impact of missense variants, we used the HOPE web tool [35] (https://www3.cmbi.umcn.nl/hope/), an automatic mutant server. It integrates data from various sources, namely genetic annotations from the UniProt database, prediction models from DAS services, protein’s structural coordinates from WHAT IF web services, and homology models from YASARA.

Mutation3D (http://www.mutation3d.org/) is an algorithm and web server that uses a 3D clustering approach to analyze the distribution of amino acid substitutions within tertiary protein structures [36]. ConSurf [37] (https://consurf.tau.ac.il/consurf_index.php) is a publicly accessible bioinformatics tool to estimate the evolutionary conservation of amino acid positions, using either an empirical Bayesian method [38] or a maximum likelihood method [39]. The conservation scores provide a relative indicator of evolutionary conservation, where the lowest conservation score denotes the most conserved position in the sequence. The analysis was carried out with the default parameters.

CPORT (Consensus Prediction Of interface Residues in Transient complexes) (https://alcazar.science.uu.nl/services/CPORT/) is a consensus method that combines six interface prediction web servers to predict interface residues in protein-protein complexes [40]. It generates more stable and reliable predictions than individual predictors alone and competitive results with the ab initio methods. CPORT was employed to detect the active residues in protein-protein or protein-ligand complexes.

Secondary structure prediction

SOPMA (Self-Optimized Prediction method With Alignment) (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html), a bioinformatics tool, was utilized for predicting the secondary structure of the protein [41]. Based on the homologue model [42], it generates a secondary structure with 73.2% accuracy.

3D structure modeling

I-TASSER (https://zhanggroup.org/I-TASSER/), a KU-developed bioinformatics tool for predicting protein structure, was used to model tertiary structure [43]. Based on the significance score of various threading templates and clustering density, the program calculates the C-score to measure the accuracy of the predictions. The produced structures were refined using GalaxyWEB (https://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE) [44]. It is a server for refining protein structures based on the ab initio method.

Structural models assessment

The improved structures were validated by several structure validation programs, such as PROCHECK (SAVES v6.0) [45] (https://saves.mbi.ucla.edu/), ProSA-web [46] (https://prosa.services.came.sbg.ac.at/prosa.php), and Structure Assessment—SWISS-MODEL [47] (https://swissmodel.expasy.org/assess). A protein structure can be evaluated for its stereochemical quality using the PROCHECK suite. Besides, Z-score is displayed by the ProSA tool (Protein Structural Analysis) for model evaluation.

Further, the RMSD and TM-score between the wildtype and variant structures were estimated using TM-align (https://zhanggroup.org/TM-align/), a bioinformatics tool for protein structure alignment [48], and PyMOL [49] (https://pymol.org/2/).
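As a rough illustration of the two metrics, the sketch below computes RMSD and TM-score from per-residue Cα distances of two structures that are assumed to be already superimposed and aligned. The function names are hypothetical, and the sketch does not reproduce TM-align itself, which additionally searches for the alignment and superposition that maximize the TM-score. The length-dependent scale d0 = 1.24 (L − 15)^(1/3) − 1.8 is the standard TM-score normalization and is only meaningful for chains well above 15 residues.

```python
import math


def rmsd(distances):
    """Root mean square deviation over aligned residue pairs (same unit as input)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_d0(target_length):
    """Length-dependent distance scale used to normalize the TM-score."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for a fixed superposition, normalized by the target length."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical distances (in angstroms) for a 610-residue protein.
example = [2.0] * 500 + [12.0] * 110
print(round(rmsd(example), 3), round(tm_score(example, 610), 3))
```

Because each residue contributes at most 1 to the TM-score, a few badly placed regions inflate the RMSD far more than they lower the TM-score, which is why a large RMSD and a high TM-score can coexist.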

Molecular docking

The mutant and the wildtype structure were subjected to molecular docking with a target protein. As a negative control, two independent ligands were also docked against these protein structures. The docking was performed using the HDOCK server [50] (http://hdock.phys.hust.edu.cn/). This server is designed to estimate the protein-protein or protein-nucleic acid binding complexes based on a hybrid approach of ab initio and template-based modeling. The predicted complexes were visualized using PyMOL and Biovia Discovery Studio [51] (https://discover.3ds.com/discovery-studio-visualizer-download).

Molecular dynamics

GROMACS (version 2020.6) simulation software (https://www.gromacs.org/) was employed to conduct 100-nanosecond molecular dynamics simulations for both the wildtype and variant models [52]. The GROMOS96 43a1 force field was used. The spc216 water model was deployed to build a water box whose edges lay 0.5 nm from the protein surface. Appropriate counter-ions were added to neutralize the systems. Following energy minimization, isothermal-isochoric (NVT) equilibration, and isothermal-isobaric (NPT) equilibration of the system, a 100-nanosecond molecular dynamics simulation with periodic boundary conditions was carried out. Snapshots were saved at 100-picosecond intervals for trajectory analysis. Once the simulation was complete, the integrated rms, rmsf, gyrate, and sasa modules of GROMACS were used to calculate the root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), and solvent accessible surface area (SASA). The plots for each of these analyses were generated using the ggplot2 package in RStudio.
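The simulation protocol described above corresponds roughly to the following sequence of GROMACS commands, assembled here as Python lists so they can be inspected or passed to `subprocess.run`. This is a sketch under stated assumptions, not the authors' actual scripts: all file names, the `.mdp` parameter files, the cubic box type, and the ion names are hypothetical, and restraint or checkpoint options (such as `-r` and `-t` for `gmx grompp`) are omitted for brevity.

```python
def build_gromacs_commands(pdb_file, box_margin_nm=0.5):
    """Return gmx command lines for setup, simulation stages, and analysis."""
    setup = [
        # Topology with the GROMOS96 43a1 force field and SPC water.
        ["gmx", "pdb2gmx", "-f", pdb_file, "-o", "processed.gro",
         "-p", "topol.top", "-ff", "gromos43a1", "-water", "spc"],
        # Box with the stated 0.5 nm margin (cubic shape is an assumption).
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
         "-c", "-d", str(box_margin_nm), "-bt", "cubic"],
        # Fill the box with pre-equilibrated spc216 water.
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
        # Add counter-ions to neutralize the system.
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solvated.gro",
         "-p", "topol.top", "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ionized.gro",
         "-p", "topol.top", "-pname", "NA", "-nname", "CL", "-neutral"],
    ]
    stages = []
    previous = "ionized.gro"
    # Energy minimization, NVT, NPT, then the production run.
    for stage in ("em", "nvt", "npt", "md"):
        stages.append(["gmx", "grompp", "-f", f"{stage}.mdp", "-c", previous,
                       "-p", "topol.top", "-o", f"{stage}.tpr"])
        stages.append(["gmx", "mdrun", "-deffnm", stage])
        previous = f"{stage}.gro"
    analysis = [
        ["gmx", "rms", "-s", "md.tpr", "-f", "md.xtc", "-o", "rmsd.xvg"],
        ["gmx", "rmsf", "-s", "md.tpr", "-f", "md.xtc", "-o", "rmsf.xvg", "-res"],
        ["gmx", "gyrate", "-s", "md.tpr", "-f", "md.xtc", "-o", "gyrate.xvg"],
        ["gmx", "sasa", "-s", "md.tpr", "-f", "md.xtc", "-o", "sasa.xvg"],
    ]
    return setup + stages + analysis


for command in build_gromacs_commands("wildtype.pdb"):
    print(" ".join(command))
```

The run length (100 ns) and the 100 ps output interval would be set inside the hypothetical `md.mdp` file rather than on the command line.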

Functional analysis of non-coding SNPs

Non-coding SNPs (introns, 5’ UTR, 3’UTR) were retrieved from the ENSEMBL database by filtering a MAF value of 0.05–0.5. These non-coding SNPs were analysed in RegulomeDB (https://regulomedb.org/regulome-search), a database that provides comprehensive annotation of genetic variants in the non-coding regions of the human genome [53]. Furthermore, the annotated SNPs proceeded for analysis in GTEx Portal [54] (https://gtexportal.org/home/). The Genotype-Tissue Expression (GTEx) project is an extensive free-access repository to study tissue-specific gene expression and regulation.

Moreover, the functional importance of the non-coding SNPs was detected by employing HaploReg v4.1 [55] and PolymiRTS [56]. HaploReg (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php) is a publicly accessible bioinformatics tool to investigate non-coding genomic annotations at variations on haplotype blocks, like potential regulatory SNPs at genetic disorder loci. The polymorphism in microRNA target site (PolymiRTS) (https://compbio.uthsc.edu/miRSNP/) is a comprehensive database that provides information about genetic polymorphisms (SNPs) in microRNAs (miRNAs) and their target sites.

A schematic representation of the workflow of this study is provided in Fig 1.

Results

nsSNP data retrieval

From the ENSEMBL database, only one nsSNP (rs10936599) was obtained from the ENST00000349841.10 transcript with a MAF value of 0.27. Interestingly, this particular SNP has also been found for the MYNN gene in the GWAS Catalog [5], a curated genome-wide association study database. In this analysis, we focused on the G allele of this variant, where histidine is replaced with glutamine at position 6.

Results of functional consequence prediction

The functional impact of rs10936599 was assessed in nine bioinformatics-based web tools. All these tools predicted that this specific amino acid substitution at position 6 affects the function of myoneurin protein (Table 1). The prediction scores of these tools are represented in Fig 2.

Fig 2. Deleterious effect of rs10936599 in several web tools.

https://doi.org/10.1371/journal.pone.0296361.g002

Table 1. Functional predictions of rs10936599 from nine bioinformatics tools with threshold levels.

https://doi.org/10.1371/journal.pone.0296361.t001

Analysis of MYNN (myoneurin) interaction

NetworkAnalyst demonstrated that four proteins (UBC, PAK1, COPS5, and ELAVL1) interact with MYNN (Fig 3). These proteins are associated with numerous pathways, including gene expression, regulatory processes, cancer development, and cancer metastasis. It also revealed that this gene is significantly associated with 63 biological pathways, including the JNK cascade, MAPK cascade, cellular metabolic processes, hypoxia, etc. (S1 Table). Regarding molecular function, MYNN is involved in enzyme binding, kinase binding, nucleotide binding, etc. The most significant GO terms in cellular components are the nucleus, cytosol, sarcomere, etc. The top significantly enriched terms of biological process, molecular function, and cellular components of gene ontology analysis are visualized in Fig 3.

Fig 3.

A) Interaction of MYNN with other cellular proteins. B) Significant GO terms associated with MYNN.

https://doi.org/10.1371/journal.pone.0296361.g003

Effect of rs10936599 on the structure of the protein

Analysis of structural modifications.

The amino acid substitution from histidine to glutamine at position 6 was checked in the HOPE server. The server predicted that the variant residue is smaller than the wildtype residue, which can affect potential external interactions. Also, the wildtype amino acid appears to be highly conserved at this position, and this particular mutant residue is not present in homologous proteins. This suggests that the substitution is unlikely to be tolerated without affecting the protein. Furthermore, the MetaRNN score of the substitution is 0.827, indicating that rs10936599 is more likely to be pathogenic. The altered residue lies outside any annotated domain, in a region without known function, close to the Skp1/Btb/Poz Domain Superfamily. The residue is not known to interact with any annotated domain, but the substitution could still affect interactions with other molecules. The 3D structure gathered by the HOPE server is represented in Fig 4.

Fig 4.

A) Demonstration of the protein used in ribbon display. The side chain of the mutant residue is represented as little balls and is highlighted magenta, along with the protein, which is highlighted grey. B) Close-up of the substitution, where the protein is shaded grey along with the side of wildtype and mutant amino acid in green and red, respectively.

https://doi.org/10.1371/journal.pone.0296361.g004

Domain identification in tertiary structure.

Mutation3D revealed that the myoneurin protein consists of two known domains: the BTB domain and the zf-H2C2_2 domain. The BTB domain, involved in transcription regulation, ion channel function, cytoskeleton dynamics, etc. [57], spans from position 14 to 118. The other domain, a zinc finger associated with cancer development [58], stretches from amino acid 372 to 398. Additionally, our mutation of interest was found proximal to the BTB domain.

Analysis of conservancy and active residues of the protein.

According to ConSurf, position 6 of the MYNN protein sequence is a highly conserved, exposed, and functional residue (Fig 5). This indicates that a polymorphism at this position is likely to be deleterious for the function and structure of the protein. CPORT also disclosed that this position is among the active residues of the protein.

Fig 5. Visualization of conservational analysis in ConSurf.

https://doi.org/10.1371/journal.pone.0296361.g005

Impact of rs10936599 on protein secondary structure

SOPMA unveiled the comparative secondary structures of wildtype and nsSNP (S1 Fig). The wildtype structure consists of 30.16% (184 residues) alpha helix, followed by 16.39% (100 residues) extended strand and 6.89% (42 residues) beta-turn. However, the variant structure contains 30.66% (187 residues), 16.56% (101 residues), and 6.23% (38 residues) of alpha helix, extended strand, and beta-turn, respectively. Both of the structures contain 46.56% (284 residues) random coil. Also, the substituted amino acid is located at the alpha helix region. Apparently, there is a difference in both structures, which might cause some functional differences.
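The percentages above follow directly from the residue counts and the protein length of 610 amino acids. The short sketch below reproduces them and lists the per-class difference between the two models; the dictionary layout and function name are illustrative choices, while the counts are those reported by SOPMA in the text.

```python
# Residue counts per secondary structure class, as reported in the text.
wildtype = {"alpha helix": 184, "extended strand": 100,
            "beta turn": 42, "random coil": 284}
variant = {"alpha helix": 187, "extended strand": 101,
           "beta turn": 38, "random coil": 284}


def percentages(counts):
    """Convert residue counts into percentages of the full sequence."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


print(percentages(wildtype))
print(percentages(variant))

# Change in residue count per class (variant minus wildtype).
print({name: variant[name] - wildtype[name] for name in wildtype})
```

Both sets of counts sum to 610 residues, so the comparison amounts to a shift of four residues out of the beta-turn class into the alpha helix (three) and extended strand (one) classes.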

Tertiary structure analysis through model simulation

I-TASSER generated tertiary structures for wildtype and nsSNP, using fold recognition or protein threading method with C scores of -3.78 and -3.91, respectively (Fig 6). Usually, the C-score lies between [–5,2], where a higher C score implicates higher confidence [43].

Fig 6.

A) Model structure of wildtype protein. B) Model structure of nsSNP protein. C) Superimposed display of wildtype and variant structure, where wildtype is colored in green and variant in purple. D) Superimposition of the mutated amino acid position in both models. The wildtype structure is shaded in green and nsSNP in purple.

https://doi.org/10.1371/journal.pone.0296361.g006

After refinement in GalaxyWEB, these models were evaluated in PROCHECK, ProSA, SWISS-MODEL Structure Assessment, TM-align, and PyMOL. Several quantitative scores from these tools are listed in Table 2. The proportions of residues in Ramachandran favored regions are 81.1% and 80.7% for the wildtype and variant models, respectively. The Ramachandran plots and Z-score plots for both the native and variant models are provided in S2 Fig. Notably, no template or experimental structure for the myoneurin protein was found in RCSB PDB [59] or any other database. Hence, I-TASSER could not fulfill all requirements for protein threading. The RMSD value between the two models is 5.968, which implies a significant deviation between the structures. At the same time, the TM-score of 0.84197 indicates that the structures share roughly the same topology.

Table 2. Quantitative scores for evaluating modeled structures.

https://doi.org/10.1371/journal.pone.0296361.t002

Molecular docking analysis

Potential ligands for MYNN were retrieved from several databases [22, 60] and literature studies [61–63]. It was found that NOTCH2 potentially interacts with MYNN [63]. Hence, MYNN protein (myoneurin) was subjected to blind docking to estimate the change in protein-protein interaction. The PDB structure of NOTCH2 was retrieved from RCSB PDB under the PDB ID 5MWB. Following docking, the top 10 models for each complex were generated in the HDOCK server. From those models, two compatible models were selected for comparison (Fig 7).

Fig 7.

Visualization of the molecular docking complexes of A) wildtype with NOTCH2 B) nsSNP with NOTCH2. Here, variant and wildtype structures are shaded in grey, whereas NOTCH2 is highlighted in green. Ligand interactions in C) wildtype and NOTCH2 complex D) variant and NOTCH2 complex with hydrogen bond donor/acceptor surface.

https://doi.org/10.1371/journal.pone.0296361.g007

The docking results revealed that the docking scores for wildtype and mutants are -254.7 and -269.55, with confidence scores of 0.8901 and 0.9161, respectively (Table 3). It implies that the mutant binds with NOTCH2 with a higher affinity than the wildtype protein. Additionally, two independent ligands (Acetaminophen and Adderall) were docked with wildtype and variant structures to detect whether these models form non-specific interactions with random ligands. These ligands showed poor docking scores with low confidence scores, indicating these ligands are unlikely to bind with both protein structures.
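The relationship between a docking score and its confidence score can be illustrated with the empirical logistic formula documented for the HDOCK server, confidence = 1 / (1 + e^(0.02 × (score + 150))). The sketch below assumes that this formula was the one applied to the reported values; the function name is hypothetical.

```python
import math


def hdock_confidence(docking_score):
    """Empirical HDOCK-style confidence score derived from a docking score."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))


# Docking scores reported for the wildtype and variant complexes with NOTCH2.
for label, score in (("wildtype", -254.7), ("variant", -269.55)):
    print(label, round(hdock_confidence(score), 4))
```

Under this formula, more negative docking scores give confidence values closer to 1, and the results agree with the reported confidence scores to about three decimal places. Both scores are empirical ranking measures and should not be read as measured binding affinities.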

Analysis of dynamic characteristics

Root Mean Square Deviation (RMSD) is calculated to assess the systems’ stability. A higher RMSD value indicates the unstable nature of the protein. The variant seemed to stabilize the protein structure here since the wildtype had a greater RMSD than the variant.

The regional flexibility of the protein is evaluated using the Root Mean Square Fluctuation (RMSF). The higher the RMSF of a specific amino acid site, the more flexible that site is. Compared to the variant MYNN, the residues in the wildtype MYNN protein were generally more flexible.
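For reference, the per-residue RMSF is the square root of the time-averaged squared displacement of each residue from its mean position. The sketch below shows the calculation on a toy trajectory; it assumes that the frames have already been fitted to a common reference, that each residue is represented by a single point (for example its Cα atom), and it is not the GROMACS implementation. The function name and input layout are hypothetical.

```python
import math


def rmsf(frames):
    """Per-residue RMSF from a list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(frames)
    n_residues = len(frames[0])
    values = []
    for i in range(n_residues):
        # Mean position of residue i over the trajectory.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy trajectory: residue 0 is rigid, residue 1 oscillates along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy))
```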

The degree of compactness is measured using the radius of gyration. Protein folding is stable when the radius of gyration remains relatively constant, whereas fluctuation in the radius of gyration implies protein unfolding. With the mutant protein, the radius of gyration decreased drastically, suggesting that it folded quickly. The wildtype MYNN, on the other hand, had a much larger radius of gyration.
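The radius of gyration of a single frame is the mass-weighted root mean square distance of the atoms from their centre of mass. A minimal sketch follows, assuming coordinates are given as (x, y, z) tuples and defaulting to equal masses when none are supplied; the function name is hypothetical and this is not the GROMACS implementation.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration for one frame of coordinates."""
    if masses is None:
        masses = [1.0] * len(coords)  # equal weights if no masses are given
    total_mass = sum(masses)
    # Centre of mass of the structure.
    com = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the centre of mass.
    weighted_sq = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: two equal-mass points two units apart.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

Evaluating this quantity for every saved frame gives the Rg time series that is compared between the wildtype and variant trajectories.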

In MD simulations, Solvent Accessible Surface Area (SASA) anticipates the stability of proteins’ hydrophobic cores. The probability of protein instability due to solvent accessibility increases with increasing SASA score. SASA levels were higher in the wildtype MYNN than in the variant structure. The results of MD simulations are presented in Fig 8.

Fig 8. RMSD, RMSF, Radius of gyration, and SASA analysis of wildtype MYNN (blue) and variant MYNN (yellow) protein following molecular dynamic simulations.

https://doi.org/10.1371/journal.pone.0296361.g008

Analysis of functional consequences of non-coding SNPs

A total of 18 non-coding SNPs were retrieved from ENSEMBL. Among them, 14 were intron variants, and four were 3 prime UTR variants (S2 Table).

RegulomeDB generated regulome ranks and regulome scores for these polymorphisms to predict the functionality of these SNPs (Fig 9). Most of these SNPs were located at transcription factor binding or DNase peak (Rank 5), followed by motif hit (Rank 6) and transcription factor binding + any motif + DNase peak (Rank 3a).

Fig 9. Demonstration of the number of non-coding SNPs located in various regulome ranks.

Here, 3a, 5, and 6 denote TF binding + any motif + DNase peak, TF binding or DNase peak, and motif hit, respectively.

https://doi.org/10.1371/journal.pone.0296361.g009

These SNPs were then analyzed further in the GTEx Portal. Among these, eight SNPs were detected in the testis with normalized effect sizes ranging from 0.28 to 0.35 (Table 4). Single-tissue expression quantitative trait loci (eQTL) violin plots are illustrated in S3 Fig. Notably, genes other than MYNN also showed tissue-specific eQTLs.

Table 4. Single tissue eQTL prediction for non-coding SNPs.

https://doi.org/10.1371/journal.pone.0296361.t004

These non-coding single nucleotide polymorphisms were assessed in PolymiRTS to detect whether these variants affect any miRNA target site. Only two SNPs (rs1920123 and rs75277808) were found to affect miRNA target regions. rs1920123 appears to disrupt a conserved target site, whereas rs75277808 creates a novel target site.

HaploReg v4.1 was employed to analyze non-coding genomic annotations at variants. Annotations for a total of 11 variants were discovered for the MYNN gene. Among them, eight were intronic variants, and the remaining three were 3’-UTR variants. Annotations for all of these SNPs are reported in Table 5.

Discussion

The MYNN gene encodes the myoneurin protein, which is highly expressed in neuromuscular junctions and involved in regulating muscle attachment and neuromuscular networks [64]. The MYNN single nucleotide polymorphism rs10936599 has an impact on telomere length [14, 64], gene expression [11], developmental processes [12], and several cancer development processes [6, 15, 16, 18, 19, 65]. The C allele is the ancestral allele of rs10936599, whereas the minor alleles are the T allele, with a global MAF value of 0.27, and the G allele [66]. Previously, it has been reported that the CC genotype entails a higher risk of bladder cancer [9, 16], colorectal cancer [6, 15], and multiple myeloma [8], with higher odds ratios. In contrast, the T allele appears to be relatively protective, with decreased odds ratios for bladder cancer [16], colorectal cancer [67], and telomere length [7]. In this study, the objectives were to discover the functional and structural alterations in the myoneurin protein owing to rs10936599 (G allele) and how it impacts the susceptibility to associated diseases.

Bioinformatics tools and approaches are preferred for converting large-scale and complicated biological datasets into relevant and valuable information [68] because of the more straightforward and time-saving techniques [69]. To assess the functional impact of nsSNP, a comprehensive analysis was conducted by employing several in silico tools and methods. Each prediction tool uses an exclusive algorithm with a specified degree of precision for locating harmful SNPs, strengthening the prediction analysis. These tools address sequence homology, physiological features, and genetic, molecular, and statistical data and ensure the highest accuracy. A total of nine bioinformatics tools were used for predicting functional alterations, and all of the tools revealed that this amino acid substitution significantly disrupts the normal function of the protein.

For a better comprehension of the significance of the MYNN gene, protein-protein interaction was assessed in NetworkAnalyst. It revealed that myoneurin interacts with ubiquitin C, COP9 Signalosome Subunit 5, P21 (RAC1) Activated Kinase 1, and ELAV-like protein 1. Additionally, gene ontology analysis was performed to categorize the biological processes, cellular components, and molecular functions related to this gene. It was observed that myoneurin, majorly located in the nucleus, is significantly involved in numerous signaling and regulatory pathways, namely the JNK cascade, MAPK cascade, cell cycle, transcription, etc. It’s also linked to biological functions like enzyme binding, transcription regulation, translation initiation, etc. Hence, presence of single nucleotide polymorphisms might disrupt these cellular functions and processes.

Furthermore, to determine the general physicochemical and functional alterations due to the point mutation, the nsSNP was analyzed in the HOPE server. It revealed that the SNP of interest introduces a residue smaller than the wildtype residue, which may interrupt external interactions. The amino acid alteration modifies the structure of the protein, suggesting that this SNP is deleterious. Mutation3D was employed to investigate the amino acid change in the spatial pattern of the protein structure and to identify domains. This tool reported two main domains: the BTB domain (11–118) and the zf-H2C2_2 domain (372–398). It also revealed that the mutation of interest is located near the BTB domain.

The evolutionary rate of an amino acid position is significantly affected by its structural and functional relevance. Functionally and structurally critical amino acids are highly conserved because even minor alterations at these residues can modify the protein’s function [37]. ConSurf showed that position 6 in wildtype MYNN is a highly conserved, exposed, and functional residue. CPORT identifies binding-site amino acids that interact with the substrate or other proteins. According to CPORT, the mutation of interest was found among the active residues.

Due to the absence of a myoneurin tertiary structure in RCSB PDB, 3D structures were predicted using the I-TASSER server, which resulted in C scores of -3.78 and -3.91 for the wildtype and variant, respectively. The C scores were relatively low for these predicted structures. Considering that no tertiary structure of MYNN is available in RCSB PDB and that I-TASSER prediction is based on protein threading, these scores seemed reasonable. Moreover, this approach was also used in earlier research to predict the three-dimensional structure of proteins [70, 71]. GalaxyWEB was then employed for structure refinement.

The generated structure models were evaluated based on the Ramachandran plot, ERRAT score, MolProbity score, and Z score, produced by PROCHECK, the SWISS-MODEL Structure Assessment tool, and ProSA. In Ramachandran plots, atoms are treated as hard spheres with van der Waals radii. Any torsion angle that causes the spheres to collide is sterically unfavorable; hence, such conformations are disallowed. White areas indicate polypeptide conformations in which atoms are closer than their van der Waals radii allow. These areas are sterically hindered for all amino acids except glycine, which has no side chain. The acceptable alpha-helical and beta-sheet conformations are shown in red since they have no steric conflicts. Yellow sections indicate regions that become allowed if shorter van der Waals radii are used in the computation, letting atoms pack more closely; this brings out an additional region corresponding to the left-handed alpha-helix [72, 73]. The Ramachandran plot illustrates the protein backbone’s torsional angles (ϕ and ψ), and at least 90% of residues are expected to lie in the most favored regions [74, 75]. Here, 81.1% and 80.7% of the residues of the native and variant structures, respectively, were located in the Ramachandran favored region. These values are understandable given that no experimental tertiary structure exists for the MYNN protein sequence and that I-TASSER prediction is based on protein threading.
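As a rough illustration of how the backbone torsion angles behind a Ramachandran plot are obtained, the following sketch computes a signed dihedral angle from four atom positions and the percentage of residues that satisfy a user-supplied "favored region" test. The function names and the predicate are hypothetical and introduced only for illustration; PROCHECK and SWISS-MODEL use empirically derived regions, which are not reproduced here.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180 to 180) defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 form of the dihedral: stable near 0 and 180 degrees
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def percent_favored(phi_psi_pairs, is_favored):
    """Percentage of (phi, psi) pairs for which the supplied predicate is true.

    `is_favored` is a hypothetical caller-supplied function; real validation
    tools use empirically derived regions rather than a simple rule.
    """
    hits = sum(1 for phi, psi in phi_psi_pairs if is_favored(phi, psi))
    return 100.0 * hits / len(phi_psi_pairs)
```

For residue i, ϕ would be computed from the atoms C(i-1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i), N(i+1).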

MolProbity is a widely recognized technique for validating the tertiary structures of proteins and nucleic acids. It evaluates structure quality using all-atom contact analysis, and structure quality increases as the score approaches 0 [76]. The ProSA Z-score, in turn, estimates the deviation of the structure’s overall energy from an energy distribution derived from random conformations. A Z-score of -6.07 indicates model quality [46]. MolProbity scores of 1.93 and 1.85 and ProSA Z-scores of -4.12 and -4.3 for the native and variant structure models, respectively, suggest that these models are acceptable.
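To make the idea of a Z-score concrete: it expresses how many standard deviations a model's energy lies from the mean of a reference energy distribution. The sketch below is a generic Z-score calculation and is not ProSA's implementation; the function name and the choice of the population standard deviation are assumptions made for illustration.

```python
import statistics

def z_score(model_energy, reference_energies):
    """Deviation of model_energy from the reference mean, in standard deviations.

    Assumption: the population standard deviation is used; a validation
    server may define its reference distribution differently.
    """
    mean = statistics.fmean(reference_energies)
    sd = statistics.pstdev(reference_energies)
    if sd == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / sd
```

A more negative value means the model's energy lies further below the bulk of the reference distribution.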

The structural deviation between the wildtype and missense variant structures was estimated from the TM-score and RMSD values calculated by TM-align and PyMOL, respectively. The root mean square deviation (RMSD) between corresponding atoms of two superimposed protein chains is a widely used measure of similarity between protein structures, and a lower RMSD indicates more similar structures [77]. The RMSD value of 5.968 indicates a significant deviation between the two models. TM-score, another measure of protein structural similarity, ranges from 0 to 1, with 1 indicating a perfect match between two structures, values below 0.2 indicating a random match, and values above 0.5 indicating roughly the same fold [78]. The TM-score of 0.84197 suggests that, although the structures deviate from each other, they are not randomly matched and share roughly the same fold. In addition, the secondary structure prediction by SOPMA also revealed differences between the mutant and native models.
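The two similarity measures can be sketched as follows, assuming the two structures have already been superimposed and their residues paired (the job TM-align and PyMOL do internally). The function names are hypothetical. The TM-score expression follows the published definition with the length-dependent scale d0 = 1.24·(L − 15)^(1/3) − 1.8; the actual TM-score is the maximum of this quantity over superpositions, which is not searched for here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two coordinate sets are already optimally superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((a[k] - b[k]) ** 2
                for a, b in zip(coords_a, coords_b)
                for k in range(3))
    return math.sqrt(total / len(coords_a))

def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by TM-score."""
    if target_length <= 21:
        # Short chains are handled specially by the real program; not covered here.
        raise ValueError("this sketch only covers target_length > 21")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(pair_distances, target_length):
    """TM-score for ONE given superposition.

    pair_distances: distances (angstroms) between aligned residue pairs.
    target_length:  number of residues in the reference (target) protein.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in pair_distances) / target_length
```

Because each residue pair contributes at most 1 and the sum is divided by the target length, distant pairs are down-weighted rather than dominating the score as they do in RMSD, which is why a pair of models can show a sizeable RMSD and still have a high TM-score.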

Molecular docking was performed in the HDOCK server to study interactions with other proteins and ligands. When docked with NOTCH2, the wildtype and mutant received docking scores of -254.7 and -269.55, with confidence scores of 0.891 and 0.9161, respectively. This implies that the variant binds more strongly than the wildtype, as a more negative docking score represents a more likely binding model [50]. NOTCH2, a member of the NOTCH receptor family, is associated with a distinctive oncogenic process [79]. It is frequently upregulated in several cancers, including hepatocellular carcinoma [80], gastric cancer [81, 82], glioblastoma [83], medulloblastoma [84], and B cell malignancies [85]. This transmembrane receptor family contains an extracellular domain of epidermal growth factor (EGF)-like repeats along with several intracellular domains [86]. It has been reported that EGFR-BTB domain oligomerization activates the downstream signaling cascade without EGF [87]. Therefore, the better binding pose of the variant-NOTCH2 complex suggests overexpression of NOTCH2 signaling, followed by a greater risk of oncogenesis.
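The relation between the docking and confidence scores reported above can be illustrated with the empirical logistic expression that, to our understanding, the HDOCK server uses: confidence = 1 / (1 + exp(0.02 × (docking score + 150))). The constants below are an assumption taken from that expression rather than from this study, and the function name is hypothetical. With these constants the sketch gives values within about 0.001 of the two confidence scores quoted above.

```python
import math

def hdock_confidence(docking_score):
    """Approximate confidence score for a docking score.

    Assumption: logistic form with constants 0.02 and 150, as we understand
    the HDOCK server's empirical definition; verify against the server docs.
    """
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))

# Docking scores quoted in the text for the NOTCH2 complexes
for label, score in (("wildtype", -254.7), ("variant", -269.55)):
    print(label, round(hdock_confidence(score), 4))
```

The function is monotonic, so a more negative docking score always yields a higher confidence score.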

Two independent ligands (acetaminophen and Adderall) were also docked with the native and variant models as negative controls. The results showed that the wildtype and mutant models do not form non-specific interactions. To evaluate the change in the dynamic characteristics of the protein caused by the nsSNP, a molecular dynamics simulation was conducted for 100 nanoseconds using GROMACS. The analysis showed that the wildtype structure had a higher RMSD than the variant, and the same trend was observed for RMSF, radius of gyration, and SASA. The nsSNP (rs10936599) thus alters the stability, compactness, flexibility, and solvent accessibility of the protein. According to the RMSD, RMSF, radius of gyration, and SASA profiles, the variant structure seemed more stable than the wildtype.
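Two of the trajectory descriptors mentioned above can be sketched in a few lines. This is an illustrative calculation, not the GROMACS implementation: the function names are hypothetical, the radius of gyration is computed without mass weighting (an assumption; GROMACS weights atoms by mass), and the frames are assumed to be already fitted to a common reference so that RMSF reflects internal fluctuation only.

```python
import math

def radius_of_gyration(frame):
    """Unweighted radius of gyration of one frame (list of (x, y, z) tuples)."""
    n = len(frame)
    center = [sum(atom[k] for atom in frame) / n for k in range(3)]
    sq = sum((atom[k] - center[k]) ** 2 for atom in frame for k in range(3))
    return math.sqrt(sq / n)

def rmsf(frames):
    """Per-atom root mean square fluctuation over a list of fitted frames."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its average position
        msf = sum((f[i][k] - mean[k]) ** 2
                  for f in frames for k in range(3)) / n_frames
        result.append(math.sqrt(msf))
    return result
```

A smaller radius of gyration indicates a more compact structure, and lower per-residue RMSF indicates less flexibility, which is how the stability comparison between the wildtype and variant is read.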

nsSNPs can potentially modify protein structure and function [88]. Previous studies suggested that changes in protein stability are indeed connected to changes in function. However, stability changes alone cannot reliably predict how a protein’s function will be affected [88]. Even though the overall structure of the variant seemed more stable, the substitution might modify specific regions responsible for the protein’s function. Notably, the nsSNP is situated near the Skp1/BTB/POZ domain, which mediates protein-protein interactions. Hence, this variant potentially alters interactions with other proteins.

Non-coding SNPs of MYNN were also studied because a mutation in a non-coding region can ultimately affect transcription, translation, and phenotype [89]. According to GWAS, about 90% of all SNPs associated with phenotypes are located in non-coding regions [90]. The analysis focused on SNPs in the 3′ UTR, 5′ UTR, and intronic regions because functional variants are mostly found there [91]. The non-coding SNPs were subjected to RegulomeDB analysis to assess whether they disrupt regulatory transcription factor binding sites [92]. This analysis showed that most polymorphisms fell in the category "transcription factor binding or DNase peak", followed by "motif hit" and "transcription factor binding + any motif + DNase peak". The GTEx Portal was used to explore genetic variants, gene expression, and other molecular phenotypes in numerous reference tissues through eQTL, relative gene expression, and splicing quantitative trait loci [93]. Expression quantitative trait locus (eQTL) analysis is a simple method for identifying potential candidate genes at risk sites [94]. The GTEx analysis identified single-tissue eQTLs of SNPs in the testes, with normalized expression represented in violin plots. Further, the non-coding SNPs were analyzed in PolymiRTS to identify SNPs that affect miRNAs and their target sites [56], as these small non-coding RNAs control gene expression post-transcriptionally [95]. Two polymorphisms were found: rs1920123, which disrupts a conserved target site, and rs75277808, which generates a novel target site. Lastly, HaploReg v4.1 was used to annotate the non-coding polymorphisms and predict their associations with diseases [55].

This study suggests that the variant rs10936599 has a pathogenic role in the development of several diseases and cancers. This is also supported by the GWAS Catalog [5], which reports a higher odds ratio for the G allele of rs10936599, and by previously reported literature [8, 20]. However, these findings need further research and clinical evidence.

Conclusions

Through a comprehensive bioinformatics approach, this study characterized rs10936599 of MYNN by unraveling its functional outcomes, structural modifications, molecular interactions, dynamic behavior, and other properties. It also predicted a novel 3D structure of the complete protein sequence. This analysis can support further research in this field, enabling a better understanding of this SNP and aiding the development of therapeutic treatments and drug discovery.

Supporting information

S1 Fig. Illustration of the secondary structures of A) native protein B) nsSNP.

https://doi.org/10.1371/journal.pone.0296361.s001

(TIF)

S2 Fig. Schematic representation of the Ramachandran plots and ProSA-web Z-score plots.

A) Ramachandran plot of wildtype MYNN structure. B) Ramachandran plot of rs10936599 structure. C) ProSA-web Z-score plot of wildtype structure. D) ProSA-web Z-score plot of variant structure.

https://doi.org/10.1371/journal.pone.0296361.s002

(TIF)

S3 Fig. Presentation of single tissue eQTL violin plots of non-coding SNPs.

https://doi.org/10.1371/journal.pone.0296361.s003

(TIF)

S1 Table. List of gene enrichment terms of MYNN.

https://doi.org/10.1371/journal.pone.0296361.s004

(XLSX)

References

  1. Brookes AJ. The essence of SNPs. Gene. 1999;234(2):177–186. pmid:10395891
  2. Shastry BS. SNPs in disease gene mapping, medicinal drug development and evolution. Journal of Human Genetics. 2007;52(11):871–880. pmid:17928948
  3. Marczyk M, Macioszek A, Tobiasz J, Polanska J, Zyla J. Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies. Front Genet. 2021;12:2423. pmid:34956320
  4. Wu MC, Kraft P, Epstein MP, et al. Powerful SNP-Set Analysis for Case-Control Genome-wide Association Studies. The American Journal of Human Genetics. 2010;86(6):929–942. pmid:20560208
  5. Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2023;51(D1):D977–D985. pmid:36350656
  6. Houlston RS, Cheadle J, Dobbins SE, et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet. 2010;42(11):973–977. pmid:20972440
  7. Codd V, Nelson CP, Albrecht E, et al. Identification of seven loci affecting mean telomere length and their association with disease. Nat Genet. 2013;45(4):422–427. pmid:23535734
  8. Chubb D, Weinhold N, Broderick P, et al. Common variation at 3q26.2, 6p21.33, 17p11.2 and 22q13.1 influences multiple myeloma risk. Nat Genet. 2013;45(10):1221–1225. pmid:23955597
  9. Figueroa JD, Ye Y, Siddiq A, et al. Genome-wide association study identifies multiple loci associated with bladder cancer risk. Hum Mol Genet. 2014;23(5). pmid:24163127
  10. Guo X, Li M, Gao P, et al. Novel splice isoforms of pig myoneurin and their diverse mRNA expression patterns. Asian-Australas J Anim Sci. 2018;31(10):1581–1590. pmid:29747493
  11. Costoya JA. Functional analysis of the role of POK transcriptional repressors. Brief Funct Genomics. 2007;6(1):8–18. pmid:17384421
  12. Yang S, Ning G, Hou Y, et al. Myoneurin regulates BMP signaling by competing with Ppm1a for Smad binding. iScience. 2022;25(6). pmid:35712083
  13. Cifuentes-Diaz C, Bitoun M, Goudou D, et al. Neuromuscular expression of the BTB/POZ and zinc finger protein myoneurin. Muscle Nerve. 2004;29(1):59–65. pmid:14694499
  14. Michalek JE, Kepa A, Vincent J, et al. Genetic predisposition to advanced biological ageing increases risk for childhood-onset recurrent major depressive disorder in a large UK sample. J Affect Disord. 2017;213:207–213. pmid:28233563
  15. Jones AM, Beggs AD, Carvajal-Carmona L, et al. TERC polymorphisms are associated both with susceptibility to colorectal cancer and with longer telomeres. Gut. 2012;61(2):248–254. pmid:21708826
  16. Polat F, Yilmaz M, B Diler S. The Association of MYNN and TERC Gene Polymorphisms and Bladder Cancer in a Turkish Population. Urol J. 2019;16(1):50–55. pmid:30120764
  17. Tacheva T, Zienolddiny S, Dimov D, Notø HØ, Haugen A, Vlaykova T. The leukocyte telomere length, single nucleotide polymorphisms near TERC gene and risk of COPD. European Respiratory Journal. 2016;48(suppl 60):PA894.
  18. Speedy HE, Di Bernardo MC, Sava GP, et al. A genome-wide association study identifies multiple susceptibility loci for chronic lymphocytic leukemia. Nat Genet. 2014;46(1):56–60. pmid:24292274
  19. Liyanage UE, MacGregor S, Bishop DT, et al. Multi-Trait Genetic Analysis Identifies Autoimmune Loci Associated with Cutaneous Melanoma. J Invest Dermatol. 2022;142(6):1607–1616. pmid:34813871
  20. Sawcer S, Hellenthal G, Pirinen M, et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature. 2011;476(7359):214–219. pmid:21833088
  21. Howe KL, Achuthan P, Allen J, et al. Ensembl 2021. Nucleic Acids Res. 2021;49(D1):D884–D891. pmid:33137190
  22. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):D523–D531. pmid:36408920
  23. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 2012;40(W1):W452–W457. pmid:22689647
  24. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;(SUPPL.76). pmid:23315928
  25. Ioannidis NM, Rothstein JH, Pejaver V, et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. Am J Hum Genet. 2016;99(4):877. pmid:27666373
  26. Dong C, Wei P, Jian X, et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet. 2015;24(8):2125–2137. pmid:25552646
  27. Reva B, Antipin Y, Sander C. Predicting the functional impact of protein mutations: Application to cancer genomics. Nucleic Acids Res. 2011;39(17). pmid:21727090
  28. Pejaver V, Urresti J, Lugo-Martinez J, et al. Inferring the molecular and phenotypic impact of amino acid variants with MutPred2. Nature Communications. 2020;11(1):1–13. pmid:33219223
  29. Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou LP, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Science. 2022;31(1):8–22. pmid:34717010
  30. Tang H, Thomas PD. PANTHER-PSEP: predicting disease-causing genetic variants using position-specific evolutionary preservation. Bioinformatics. 2016;32(14):2230–2232. pmid:27193693
  31. Landrum MJ, Lee JM, Benson M, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–D1067. pmid:29165669
  32. Niroula A, Urolagin S, Vihinen M. PON-P2: Prediction Method for Fast and Reliable Identification of Harmful Variants. PLoS One. 2015;10(2):e0117380. pmid:25647319
  33. Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. NetworkAnalyst 3.0: A visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res. 2019;47(W1):W234–W241. pmid:30931480
  34. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Use R! series. Published online 2016:211.
  35. Venselaar H, te Beek TAH, Kuipers RKP, Hekkelman ML, Vriend G. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces. BMC Bioinformatics. 2010;11. pmid:21059217
  36. Meyer MJ, Lapcevic R, Romero AE, et al. mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome. Hum Mutat. 2016;37(5):447–456. pmid:26841357
  37. Ashkenazy H, Abadi S, Martz E, et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016;44(Web Server issue):W344. pmid:27166375
  38. Mayrose I, Graur D, Ben-Tal N, Pupko T. Comparison of Site-Specific Rate-Inference Methods for Protein Sequences: Empirical Bayesian Methods Are Superior. Mol Biol Evol. 2004;21(9):1781–1791. pmid:15201400
  39. Yang Z. Maximum likelihood methods. Computational Molecular Evolution. Published online October 5, 2006:100–144.
  40. de Vries SJ, Bonvin AMJJ. CPORT: A Consensus Interface Predictor and Its Performance in Prediction-Driven Docking with HADDOCK. PLoS One. 2011;6(3):e17695. pmid:21464987
  41. Geourjon C, Deléage G. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Bioinformatics. 1995;11(6):681–684. pmid:8808585
  42. Levin JM, Robson B, Garnier J. An algorithm for secondary structure determination in proteins based on sequence similarity. FEBS Lett. 1986;205(2):303–308. pmid:3743779
  43. Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics. 2008;9(1):1–8. pmid:18215316
  44. Ko J, Park H, Heo L, Seok C. GalaxyWEB server for protein structure prediction and refinement. Nucleic Acids Res. 2012;40(Web Server issue):W294. pmid:22649060
  45. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. 1993;26(2):283–291.
  46. Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35(suppl_2):W407–W410. pmid:17517781
  47. Waterhouse A, Bertoni M, Bienert S, et al. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46(W1):W296–W303. pmid:29788355
  48. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33(7):2302. pmid:15849316
  49. Schrödinger LLC, DeLano W. PyMOL. http://www.pymol.org/pymol
  50. Yan Y, Tao H, He J, Huang SY. The HDOCK server for integrated protein–protein docking. Nat Protoc. 2020;15(5):1829–1852. pmid:32269383
  51. Dassault Systèmes. BIOVIA Discovery Studio. Dassault Systèmes BIOVIA, Discovery Studio Modeling Environment, Release 2017. Published online 2016. http://accelrys.com/products/collaborative-science/biovia-discovery-studio/
  52. Abraham MJ, Murtola T, Schulz R, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25.
  53. Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–1797. pmid:22955989
  54. Carithers LJ, Ardlie K, Barcus M, et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv Biobank. 2015;13(5):311–317. pmid:26484571
  55. Ward LD, Kellis M. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res. 2016;44(Database issue):D877. pmid:26657631
  56. Bhattacharya A, Ziebarth JD, Cui Y. PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res. 2014;42(Database issue):D86. pmid:24163105
  57. Stogios PJ, Downs GS, Jauhal JJS, Nandra SK, Privé GG. Sequence and structural analysis of BTB domain proteins. Genome Biol. 2005;6(10):1–18. pmid:16207353
  58. Munro D, Ghersi D, Singh M. Two critical positions in zinc finger domains are heavily mutated in three human cancer types. PLoS Comput Biol. 2018;14(6):e1006290. pmid:29953437
  59. Burley SK, Bhikadiya C, Bi C, et al. RCSB Protein Data Bank: Powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 2021;49(1):D437–D451. pmid:33211854
  60. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34(Database issue):D535. pmid:16381927
  61. Bennett EJ, Rush J, Gygi SP, Harper JW. Dynamics of cullin-RING ubiquitin ligase network revealed by systematic quantitative proteomics. Cell. 2010;143(6):951–965. pmid:21145461
  62. Fasci D, Van Ingen H, Scheltema RA, Heck AJR. Histone Interaction Landscapes Visualized by Crosslinking Mass Spectrometry in Intact Cell Nuclei. Mol Cell Proteomics. 2018;17(10):2018–2033. pmid:30021884
  63. Huttlin EL, Bruckner RJ, Navarrete-Perea J, et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell. 2021;184(11):3022–3040.e28. pmid:33961781
  64. Do SK, Yoo SS, Choi YY, et al. Replication of the results of genome-wide and candidate gene association studies on telomere length in a Korean population. Korean J Intern Med. 2015;30(5):719. pmid:26354067
  65. Ye G, Tan N, Meng C, et al. Genetic variations in TERC and TERT genes are associated with lung cancer risk in a Chinese Han population. Oncotarget. 2017;8(66):110145. pmid:29299136
  66. Cunningham F, Allen JE, Allen J, et al. Ensembl 2022. Nucleic Acids Res. 2022;50(D1):D988–D995. pmid:34791404
  67. Fernandez-Rozadilla C, Timofeeva M, Chen Z, et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries. Nat Genet. 2023;55(1):89–99. pmid:36539618
  68. Tenenbaum JD. Translational Bioinformatics: Past, Present, and Future. Genomics Proteomics Bioinformatics. 2016;14(1):31–41. pmid:26876718
  69. Mustafa MI, Murshed NS, Abdelmoneim AH, Abdelmageed MI, Elfadol NM, Makhawi AM. Extensive in Silico Analysis of ATL1 Gene: Discovered Five Mutations That May Cause Hereditary Spastic Paraplegia Type 3A. Scientifica (Cairo). 2020;2020. pmid:32322428
  70. Mahalah S, Hamid Z, Ibrahim S, Babiker S. Computational Analysis of Functional Coding/Noncoding Single Nucleotide Polymorphisms (SNPs/Indels) in Human NEUROG1 gene. Journal of Biological Sciences. Published online 2022.
  71. Akhtar M, Jamal T, Jamal H, et al. Identification of most damaging nsSNPs in human CCR6 gene: In silico analyses. Int J Immunogenet. 2019;46(6):459–471. pmid:31364806
  72. Choudhuri S. Additional Bioinformatic Analyses Involving Protein Sequences. Bioinformatics for Beginners. Published online January 1, 2014:183–207.
  73. Sumitha A, Devi PB, Hari S, Dhanasekaran R. COVID-19—In Silico Structure Prediction and Molecular Docking Studies with Doxycycline and Quinine. Biomedical and Pharmacology Journal. 2020;13(3):1185–1193.
  74. Hollingsworth SA, Karplus PA. A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. Biomol Concepts. 2010;1(3–4):271. pmid:21436958
  75. Ho BK, Brasseur R. The Ramachandran plots of glycine and pre-proline. BMC Struct Biol. 2005;5(1):1–11. pmid:16105172
  76. Williams CJ, Headd JJ, Moriarty NW, et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Science. 2018;27(1):293–315. pmid:29067766
  77. Reva BA, Finkelstein AV, Skolnick J. What is the probability of a chance prediction of a protein structure with an rmsd of 6 Å? Fold Des. 1998;3(2):141–147. pmid:9565758
  78. Xu J, Zhang Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics. 2010;26(7):889. pmid:20164152
  79. Xiu MX, Liu YM. The role of oncogenic Notch2 signaling in cancer: a novel therapeutic target. Am J Cancer Res. 2019;9(5):837. Accessed March 8, 2023. PMCID: PMC6556604. pmid:31218097
  80. Wu WR, Zhang R, Shi X De, Yi C, Xu LB, Liu C. Notch2 is a crucial regulator of self-renewal and tumorigenicity in human hepatocellular carcinoma cells. Oncol Rep. 2016;36(1):181–188. pmid:27221981
  81. Demitrack ES, Samuelson LC. Notch as a Driver of Gastric Epithelial Cell Proliferation. Cell Mol Gastroenterol Hepatol. 2017;3(3):323. pmid:28462374
  82. Kim TH, Shivdasani RA. Notch signaling in stomach epithelial stem cell homeostasis. J Exp Med. 2011;208(4):677. pmid:21402740
  83. Sivasankaran B, Degen M, Ghaffari A, et al. Tenascin-C is a novel RBPJkappa-induced target gene for Notch signaling in gliomas. Cancer Res. 2009;69(2):458–465. pmid:19147558
  84. Fiaschetti G, Schroeder C, Castelletti D, et al. NOTCH ligands JAG1 and JAG2 as critical pro-survival factors in childhood medulloblastoma. Acta Neuropathol Commun. 2014;2(1):39. pmid:24708907
  85. Zhang X, Shi Y, Weng Y, et al. The Truncate Mutation of Notch2 Enhances Cell Proliferation through Activating the NF-κB Signal Pathway in the Diffuse Large B-Cell Lymphomas. PLoS One. 2014;9(10). pmid:25314575
  86. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40(Database issue):D130. pmid:22121212
  87. Nitsch L, Jensen P, Yoon H, et al. BTBBCL6 dimers as building blocks for reversible drug-induced protein oligomerization. Cell Reports Methods. 2022;2(4):100193. pmid:35497498
  88. Bromberg Y, Rost B. Correlating protein function and stability through the analysis of single amino acid substitutions. BMC Bioinformatics. 2009;10(SUPPL. 8):1–9. pmid:19758472
  89. Tatarinova TV, Chekalin E, Nikolsky Y, et al. Nucleotide diversity analysis highlights functionally important genomic regions. Scientific Reports. 2016;6(1):1–12. pmid:27774999
  90. Giral H, Landmesser U, Kratzer A. Into the Wild: GWAS Exploration of Non-coding RNAs. Front Cardiovasc Med. 2018;5:181. pmid:30619888
  91. Zhang F, Lupski JR. Non-coding genetic variants in human disease. Hum Mol Genet. 2015;24(R1):R102. pmid:26152199
  92. Cheng SJ, Jiang S, Shi FY, Ding Y, Gao G. Systematic identification and annotation of multiple-variant compound effects at transcription factor binding sites in human genome. Journal of Genetics and Genomics. 2018;45(7):373–379. pmid:30054217
  93. Lonsdale J, Thomas J, Salvatore M, et al. The Genotype-Tissue Expression (GTEx) project. Nature Genetics. 2013;45(6):580–585. pmid:23715323
  94. Lawrenson K, Li Q, Kar S, et al. Cis-eQTL analysis and functional validation of candidate susceptibility genes for high-grade serous ovarian cancer. Nature Communications. 2015;6(1):1–14. pmid:26391404
  95. Cannell IG, Kong YW, Bushell M. How do microRNAs regulate gene expression? Biochem Soc Trans. 2008;36(Pt 6):1224–1231. pmid:19021530