Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans and play a major role in human phenotypic variation and in the genetic basis of complex diseases. Recently, there has been considerable interest in the possible role of the CYP11B2 gene in corticosterone methyl oxidase deficiency, primary aldosteronism, and cardio-cerebro-vascular diseases. Hence, elucidating the function and molecular dynamic behavior of CYP11B2 mutations is crucial in current genomics. In this study, we investigated the pathogenic effect of 51 nsSNPs and 26 UTR SNPs in the CYP11B2 gene through computational platforms. Using a combination of SIFT, PolyPhen, I-Mutant Suite, and the ConSurf server, four nsSNPs (F487V, V129M, T498A, and V403E) were identified as potentially affecting the structure, function, and activity of the CYP11B2 protein. Furthermore, molecular dynamics simulation and structure analyses supported the impact of these nsSNPs on the stability and secondary properties of the CYP11B2 protein. Additionally, using UTRscan, MirSNP, PolymiRTS and miRNASNP, three SNPs in the 3′UTR were predicted to exhibit a pattern change in upstream open reading frames (uORFs), and eight microRNA binding sites were found to be highly affected by 3′UTR SNPs. This cataloguing of deleterious SNPs is essential for narrowing down the number of CYP11B2 mutations to be screened in genetic association studies and for a better understanding of the functional and structural aspects of the CYP11B2 protein.
Citation: Jia M, Yang B, Li Z, Shen H, Song X, Gu W (2014) Computational Analysis of Functional Single Nucleotide Polymorphisms Associated with the CYP11B2 Gene. PLoS ONE 9(8): e104311. doi:10.1371/journal.pone.0104311
Editor: Junwen Wang, The University of Hong Kong, Hong Kong
Received: February 27, 2014; Accepted: July 7, 2014; Published: August 7, 2014
Copyright: © 2014 Jia et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from Science Technology Department of Zhejiang Province of China [grant number 2011C23018], and grants from Research Fund for the Doctoral Program of Higher Education of China [grant number 20130101120038]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Single nucleotide polymorphisms (SNPs) are the most abundant class of genetic variation in the human genome, occurring approximately once every 100 to 300 base pairs. Given that there are millions of SNPs in the human genome, SNPs are important as markers for constructing genetic maps and have potential as direct functional variants associated with common and genetically complex diseases and drug responses. The vast majority of SNPs are neutral allelic variants; thus, one of the main goals of SNP research is the identification of functional SNPs, which is a crucial step toward understanding the molecular basis of complex traits and diseases in humans. However, identifying these functional SNPs can be a daunting task. Although experimental techniques provide the strongest evidence for the functional role of a genetic variant, it is not feasible to perform laboratory experiments for all SNPs in the human genome or even in a single gene. Hence, theoretical and computational methods are becoming indispensable for identifying and prioritizing SNPs with functional significance among an enormous number of non-risk alleles. Computational methods are sufficiently fast and flexible to predict functionally significant SNPs with a reported accuracy of 80–85% when sequence, structure, and phylogenetic information are combined.
The aldosterone synthase (CYP11B2) gene is located on chromosome 8q24.3 and encodes aldosterone synthase, the key rate-limiting enzyme for the terminal steps of aldosterone biosynthesis. Previously, Strushkevich N and colleagues determined the CYP11B2 structure by X-ray crystallography. In recent years, there has been considerable interest in the possible role of the CYP11B2 gene in the risk of corticosterone methyl oxidase deficiency (including CMO I and CMO II), primary aldosteronism, and cardio-cerebro-vascular diseases. However, most disease association studies have focused on just a few SNPs, particularly T-344C (rs1799998). Other SNPs in the CYP11B2 gene have not been studied, and in silico investigations of SNPs in the CYP11B2 gene remain scarce. Recently, Hui E et al. described a 33-year-old Chinese man whose presentation was compatible with type 2 aldosterone synthase deficiency and who carried a heterozygous mutation, c.977C>A (p.Thr326Lys), in exon 3; computational analysis also supported the deleterious nature of this missense variant. Bioinformatics thus offers distinct advantages for understanding the relationship between genes and diseases. In this study, we performed computational analyses of non-synonymous SNPs (nsSNPs) and UTR SNPs in the CYP11B2 gene to identify possible deleterious mutations and to propose modeled structures for the mutant proteins. We expect that the results of our study will provide a further understanding of the CYP11B2 gene in human diseases, as well as a guide for future experimental work.
Materials and Methods
The SNP information [SNP ID, amino acid position, mRNA accession number NM_000498.3, and protein accession number NP_000489.3] of the human CYP11B2 gene used in our computational analyses was retrieved from the National Center for Biotechnology Information (NCBI) database of SNPs (dbSNP; http://www.ncbi.nlm.nih.gov/snp/). The workflow, tools, and databases used to identify the potential functional SNPs in the human CYP11B2 gene are shown in Figure 1.
Assessment of nsSNP functionality
The functional context of nsSNPs was predicted using SIFT, PolyPhen and I-Mutant Suite.
SIFT (http://sift.bii.a-star.edu.sg/index.html) is a sequence-homology-based tool that predicts whether an amino acid substitution in a protein would be tolerated or damaging. We ran SIFT by submitting the query in the form of SNP IDs or chromosome positions and alleles to the nsSNVs tool. Variants at a position with a tolerance index score ≤0.05 are considered deleterious. A lower tolerance index indicates that the particular amino acid substitution is likely to have a greater functional impact.
PolyPhen (http://genetics.bwh.harvard.edu/pph2/) is an automatic tool that predicts the possible impact of an amino acid substitution using a number of features, including sequence, phylogenetic, and structural information. The query was submitted in the form of a protein sequence with the mutational position and substitution. The PolyPhen output comprises a score that ranges from 0 to 1, with zero indicating a neutral effect of the amino acid substitution on protein function. Conversely, a high score represents a variant that is more likely to be damaging.
I-Mutant Suite is a suite of support vector machine (SVM)-based predictors of protein stability changes according to Gibbs free energy change, enthalpy change, heat capacity change, and transition temperature. The analyses were performed based on the protein sequence combined with the mutational position and the corresponding new residue. The predicted free energy change (DDG) classifies each prediction into one of three classes: largely unstable (DDG < −0.5 kcal/mol), largely stable (DDG > 0.5 kcal/mol), or neutral (−0.5 ≤ DDG ≤ 0.5 kcal/mol). I-Mutant Suite is available at http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi.
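The SIFT decision rule described above can be sketched in a few lines. This is only an illustration of the ≤0.05 cutoff stated in the text; the function name `sift_call` and the output labels are assumptions made for this sketch and are not part of the SIFT server.

```python
SIFT_CUTOFF = 0.05  # cutoff stated in the text


def sift_call(tolerance_index: float) -> str:
    """Label a substitution from its SIFT tolerance index (0 to 1)."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("SIFT tolerance index is expected in [0, 1]")
    # Scores at or below the cutoff are treated as deleterious.
    return "deleterious" if tolerance_index <= SIFT_CUTOFF else "tolerated"
```

Note that the boundary value 0.05 itself is labeled deleterious, matching the "≤0.05" wording.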
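The three-class DDG rule can be written out explicitly. The following is a minimal sketch of the thresholds given in the text; the function name `classify_ddg` is hypothetical and does not correspond to any I-Mutant interface.

```python
def classify_ddg(ddg_kcal_per_mol: float) -> str:
    """Map a predicted free energy change (DDG, kcal/mol) to a stability class."""
    if ddg_kcal_per_mol < -0.5:
        return "largely unstable"
    if ddg_kcal_per_mol > 0.5:
        return "largely stable"
    # Both boundary values (-0.5 and 0.5) fall in the neutral class.
    return "neutral"
```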
Evolutionary conservation analysis of nsSNPs
An amino acid that plays an essential role, e.g., in enzymatic catalysis, is likely to remain unaltered despite random evolutionary drift. Hence, the level of evolutionary conservation is often indicative of the importance of the position for maintaining the protein’s structure and/or function. The ConSurf server is a bioinformatics tool for estimating the evolutionary conservation of amino/nucleic acid positions in a protein/DNA/RNA molecule based on the phylogenetic relationships between homologous sequences. After the 3D structure of the query protein is entered, the conservation scores are calculated based on the evolutionary relationships among the protein and its homologs. A conservation score between 1 and 4 is considered variable, a score of 5–6 is intermediate, and a score in the range of 7 to 9 indicates a conserved position. The empirical Bayesian method significantly improves the accuracy of the conservation score estimation, particularly when a small number of sequences are used for the calculations. ConSurf is available at http://consurftest.tau.ac.il.
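The grouping of ConSurf scores into three bands can be sketched as follows. The band limits come from the text; the function name `consurf_category` and the assumption that scores are integers from 1 to 9 are illustrative.

```python
def consurf_category(score: int) -> str:
    """Group a ConSurf conservation score (1-9) into the bands used in the text."""
    if not 1 <= score <= 9:
        raise ValueError("ConSurf conservation scores are expected in 1..9")
    if score <= 4:
        return "variable"
    if score <= 6:
        return "intermediate"
    return "conserved"
```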
Evaluation of the functional context of SNPs in the UTR region
The 5′ and 3′ untranslated regions (UTRs) of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleocytoplasmic mRNA transport, translation efficiency, subcellular localization, and message stability. The functional impacts of UTR SNPs were analyzed using UTRscan, MirSNP, PolymiRTS and miRNASNP.
The program UTRscan looks for UTR functional elements by searching user-submitted sequence data for the patterns defined in the UTRsite collection. UTRsite is a collection of regulatory elements located in the 5′ and 3′UTRs whose function and structure have been experimentally determined and published. If the sequences carrying the different alleles of a UTR SNP are found to have different functional patterns, that UTR SNP is predicted to have functional significance. A pattern change can occur in either direction: from “have pattern” to “no pattern”, or from “no pattern” to “have pattern”. UTRscan is available at http://itbtools.ba.itb.cnr.it/utrscan.
MirSNP is a database of SNPs used to predict whether an SNP within a target site would decrease/break or enhance/create a microRNA-mRNA binding site, based on information from dbSNP135 and miRBase 18. The output of a single search by gene name includes the mirSVR score, the effect of the different alleles, the predicted score, conservation information, and start, end and binding information. Combined with GWAS or eQTL data sets, MirSNP is highly sensitive and covers most experimentally confirmed SNPs that affect miRNA function. MirSNP is available at http://cmbi.bjmu.edu.cn/mirsnp.
PolymiRTS is a database of naturally occurring DNA variations in microRNA seed regions and microRNA target sites. By integrating data from CLASH (cross linking, ligation and sequencing of hybrids) experiments, the PolymiRTS database provides more complete and accurate microRNA–mRNA interactions. The polymorphic microRNA target sites are assigned to four classes: ‘D’ (the derived allele disrupts a conserved microRNA site), ‘N’ (the derived allele disrupts a nonconserved microRNA site), ‘C’ (the derived allele creates a new microRNA site) and ‘O’ (other cases when the ancestral allele cannot be determined unambiguously). Class ‘C’ may cause abnormal gene repression and class ‘D’ may cause loss of normal repression control, so these two classes are the most likely to have functional impacts. PolymiRTS is available at http://compbio.uthsc.edu/miRSNP/.
miRNASNP is a database that predicts the effect (loss or gain of function) of SNPs within pre-miRNAs, mature miRNAs, miRNA target sequences and flanking regions. Using the SNP IDs of the query gene as input, it produces a list of targets with the energy change, SNP-miRNA/target duplexes and the gain/loss effect of an SNP in the miRNA seed or gene 3′UTR. Focused on the potential effects of SNPs on miRNA biogenesis and target binding through both prediction and experimental validation, miRNASNP is a useful resource for guiding further experiments. miRNASNP is available at http://www.bioguo.org/miRNASNP/.
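The comparison logic described for UTRscan results can be sketched as a set difference between the functional patterns reported for the two allele-specific sequences. This is not the UTRscan program itself; the function names and the idea of passing pattern names as plain strings are assumptions for illustration only.

```python
from typing import Dict, Iterable, List


def pattern_changes(ref_patterns: Iterable[str],
                    alt_patterns: Iterable[str]) -> Dict[str, List[str]]:
    """Compare pattern names found for the reference and alternate alleles."""
    ref, alt = set(ref_patterns), set(alt_patterns)
    return {
        "lost": sorted(ref - alt),    # "have pattern" -> "no pattern"
        "gained": sorted(alt - ref),  # "no pattern" -> "have pattern"
    }


def has_pattern_change(ref_patterns: Iterable[str],
                       alt_patterns: Iterable[str]) -> bool:
    """True if the two alleles differ in at least one functional pattern."""
    changes = pattern_changes(ref_patterns, alt_patterns)
    return bool(changes["lost"] or changes["gained"])
```

For example, `has_pattern_change({"uORF"}, set())` evaluates to `True`, because the uORF pattern is present for one allele and absent for the other.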
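The PolymiRTS class scheme lends itself to a small lookup table. The sketch below encodes the four class definitions from the text and the rule that classes ‘C’ and ‘D’ are treated as most likely functional; the names `CLASS_DESCRIPTIONS` and `likely_functional` are hypothetical.

```python
CLASS_DESCRIPTIONS = {
    "D": "derived allele disrupts a conserved microRNA site",
    "N": "derived allele disrupts a nonconserved microRNA site",
    "C": "derived allele creates a new microRNA site",
    "O": "other cases (ancestral allele cannot be determined unambiguously)",
}

# Classes the text singles out as most likely to have functional impact.
LIKELY_FUNCTIONAL = {"C", "D"}


def likely_functional(site_class: str) -> bool:
    """Return True for PolymiRTS classes 'C' and 'D'."""
    if site_class not in CLASS_DESCRIPTIONS:
        raise ValueError(f"unknown PolymiRTS class: {site_class!r}")
    return site_class in LIKELY_FUNCTIONAL
```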
Molecular modeling and molecular dynamics simulation
A structural analysis was performed to evaluate the structural stability of the native and mutant proteins. The crystal structure of the CYP11B2 protein was acquired from the PDB [Protein Data Bank; PDB ID = 4DVQ (A chain)]. The Modeller 9.11 package was used to map the mutations onto the structure. Furthermore, we used energy minimization and molecular dynamics simulation (MDS) techniques, performed with the NAMD 2.6 package, to understand the structural variations in the mutant proteins with respect to the native structure. The native and mutant protein structures were solvated in a water sphere using the VMD 1.9.1 package. The cutoff for electrostatic and van der Waals interactions was 12.0 Å. The temperature was held constant at 310 K through the use of Langevin dynamics, which provides a means of controlling the kinetic energy of the system, with a damping coefficient (gamma) of 1/ps. The energy minimization and molecular dynamics simulations were performed using the CHARMM force field with 5000 iterations and a 1-ns timescale, respectively. The trajectory files were analyzed to obtain the root-mean-square deviation (RMSD), radius of gyration (Rg), and solvent-accessible surface area (SASA).
To determine the differences in the RMSD, Rg and SASA values between the native and mutant protein structures, statistical analyses were performed with SAS 9.1 software (SAS Institute, Inc., Cary, NC). If the quantitative data satisfied both normality and homogeneity of variance, Student’s t-test was used to compare the native and mutant groups; otherwise, the nonparametric Wilcoxon two-sample test was used. The parameters were summarized as medians and interquartile ranges (IQRs). All P-values are two-sided, and P < 0.05 was considered statistically significant.
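The median/IQR summary and the significance threshold can be sketched with the Python standard library. This is not the SAS procedure used in the study: the quartile convention (`method="inclusive"`) is an assumption and may differ from the SAS default, and the function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple

ALPHA = 0.05  # two-sided significance threshold stated in the text


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) of a sample."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to compute quartiles")
    # Assumed quartile convention; statistical packages differ in their defaults.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Apply the P < 0.05 rule to a two-sided P-value."""
    return p_value < alpha
```

A summary such as "1.21 (IQR: 1.18–1.26)" would correspond to the three values returned by `median_iqr` for the per-frame measurements of one structure.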
CYP11B2 database construction
The database at http://220.127.116.11 contains the results obtained from this work. The natural variants listed in the database come from dbSNP. For each nsSNP, we provide predictions of the functional effects using SIFT, PolyPhen-2, and I-Mutant Suite. We also list the UTR SNPs that were predicted to have functional significance by MirSNP, PolymiRTS and miRNASNP. In addition, PDB structure files of the native and mutant proteins, as well as the results of the molecular dynamics simulations, can be downloaded. This database is freely available and will be regularly updated.
SNP dataset from dbSNP
The human CYP11B2 gene contains a total of 358 SNPs, of which 51 (14.2%) are nsSNPs and 36 (10.0%) are coding synonymous SNPs. The non-coding region includes 166 SNPs (46.4%) in the intronic region, 79 (22.1%) SNPs in the “near gene” region, and 26 SNPs (7.3%) in the mRNA UTR region. The distribution of SNPs is shown in Figure 2. We selected the nsSNPs and UTR-region SNPs for our subsequent investigations.
Identification of deleterious and damaging nsSNPs
The identification of the nsSNPs that confer susceptibility or resistance to human diseases should become increasingly feasible with improved in silico tools. In this analysis, we employed three in silico tools to determine the functional significance of nsSNPs in the CYP11B2 gene. Table 1 presents the results obtained through the SIFT, PolyPhen-2, and I-Mutant Suite analyses of the CYP11B2 nsSNPs.
Through SIFT, 19 nsSNPs (37.3%) were predicted to be deleterious with a tolerance score of less than or equal to 0.05. Of these 19 SNPs, seven (R181W, F499C, Y275C, V129M, T185I, T498A, and V403E) were reported to be highly deleterious with a tolerance score of 0.00.
We further analyzed the nsSNPs using PolyPhen based on structural information and multiple sequence alignments. Of the 51 nsSNPs used in our analysis, 14 nsSNPs were predicted to be “probably damaging”, and nine nsSNPs were found to be “possibly damaging”. Consequently, 23 nsSNPs (45.1%) were characterized as damaging.
To improve the prediction accuracy of structure-based tools, we then used I-Mutant Suite. We found that 24 nsSNPs (47.1%) exhibit a DDG value of less than −0.5, which indicates that these are largely unstable.
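The reported proportions follow directly from the counts given for the 51 nsSNPs. The short check below reproduces that arithmetic; the variable names are illustrative.

```python
N_NSSNPS = 51  # total nsSNPs analysed

# Counts reported in the text for each tool.
flagged = {
    "SIFT": 19,            # tolerance index <= 0.05
    "PolyPhen": 14 + 9,    # probably damaging + possibly damaging
    "I-Mutant Suite": 24,  # DDG < -0.5 kcal/mol
}


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {tool: percent(n, N_NSSNPS) for tool, n in flagged.items()}
print(shares)  # {'SIFT': 37.3, 'PolyPhen': 45.1, 'I-Mutant Suite': 47.1}
```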
The power to predict the functional impact of a given nsSNP can be significantly increased by combining information from a variety of tools. Accordingly, we combined the SIFT, PolyPhen, and I-Mutant Suite programs to predict the influence of nsSNPs on protein function and structure. Figure 3 shows the distribution of deleterious and benign nsSNPs obtained using SIFT, PolyPhen, and I-Mutant Suite. Of all the nsSNPs, 37.3%, 45.1%, and 47.1% were flagged by SIFT, PolyPhen, and I-Mutant Suite, respectively. In addition, six nsSNPs (F499C, Y275C, V129M, T498A, F487V, and V403E) were predicted to be functionally significant by all three tools. Because each in silico tool relies on a different set of alignments and molecular characteristics, the results of the three tools differed slightly.
The black rectangular bar indicates the percentage of nsSNPs that were found to be deleterious by SIFT, damaging (Possibly/Probably) by PolyPhen, and largely unstable by I-Mutant Suite. The white rectangle indicates the percentage of nsSNPs that were found to be tolerated by SIFT, benign by PolyPhen, and largely stable/neutral by I-Mutant Suite.
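The consensus step, in which an nsSNP is retained only if all three tools flag it, can be sketched as a simple filter. The thresholds are those stated in the Methods; the record type, function names and the three example records are hypothetical and were invented only to exercise the filter (they are not study data).

```python
from typing import Dict, List, NamedTuple


class Prediction(NamedTuple):
    sift_score: float    # SIFT tolerance index (0 to 1)
    polyphen_class: str  # "benign", "possibly damaging" or "probably damaging"
    ddg: float           # I-Mutant Suite predicted DDG in kcal/mol


DAMAGING_CLASSES = {"possibly damaging", "probably damaging"}


def flagged_by_all(pred: Prediction) -> bool:
    """True if SIFT, PolyPhen and I-Mutant Suite all flag the substitution."""
    return (pred.sift_score <= 0.05
            and pred.polyphen_class in DAMAGING_CLASSES
            and pred.ddg < -0.5)


def consensus(predictions: Dict[str, Prediction]) -> List[str]:
    """Names of variants flagged by all three tools, in sorted order."""
    return sorted(name for name, p in predictions.items() if flagged_by_all(p))


# Hypothetical records (not taken from Table 1).
toy = {
    "variant_1": Prediction(0.00, "probably damaging", -1.2),
    "variant_2": Prediction(0.00, "benign", -0.9),
    "variant_3": Prediction(0.30, "possibly damaging", -0.7),
}
print(consensus(toy))  # ['variant_1']
```

Requiring agreement among all tools trades sensitivity for specificity: a variant missed by any single predictor is dropped from the shortlist.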
Analysis of nsSNPs in the conserved region
Disease-causing mutations often reside at highly conserved positions. Conservation analyses of the six nsSNPs that were predicted to be deleterious by all three tools above were performed using the ConSurf server based on the protein structure. Of the six nsSNPs, four (V129M, T498A, F487V, and V403E) were found to be located in highly conserved amino acid regions through homologous sequence alignment with the SWISS-PROT, UniProt, and UniRef90 protein databases. The main results are shown in Table 2 and Figure 4.
Colors of the ConSurf output indicate the level of sequence conservation. Purple indicates conservation and blue indicates variability. Residues are predicted to be exposed (e), buried (b), functional (i.e., highly conserved and exposed; f), or structural (i.e., highly conserved and buried; s). Numbers indicate the residue numbers of CYP11B2. The bold (black) arrows mark the V129M, Y275C, V403E, F487V, T498A and F499C mutations, respectively.
Functional SNPs in the UTR region
UTRs are known to play vital roles in the post-transcriptional regulation of gene expression, and their importance is emphasized by the finding that UTR variations can lead to serious pathology. All 26 UTR SNPs were analyzed using UTRscan. After comparing the functional elements for each UTR SNP, we predicted that three SNPs, namely rs61763988, rs35574522, and rs3097, exhibited a pattern change in the upstream open reading frame (uORF). Considering the extensive role of UTR SNPs in microRNA binding sites, which could affect the degradation or translational suppression of mRNA, we further analyzed the UTR SNPs with MirSNP, PolymiRTS and miRNASNP. The results showed that 19 SNPs were predicted by MirSNP and miRNASNP to change microRNA binding sites. In PolymiRTS, 11 SNPs were found to highly affect the microRNA binding targets. When the results of these three tools were combined, eight SNPs (rs188784518, rs117910248, rs61763989, rs61757284, rs28390200, rs7463238, rs3802228 and rs3097) showed the highest likelihood of significantly altering microRNA targeting of the sequence (Table 3).
Molecular dynamics simulation of native and mutant CYP11B2 proteins
To further understand the structural consequences of the prioritized deleterious mutations, molecular dynamics simulations were conducted to analyze the conformational changes in the native and mutant structures (V129M, V403E, F487V, and T498A). The trajectory files were produced after the molecular dynamics simulation, and we then investigated the RMSD, Rg, and SASA variations between the native and the four mutant structures.
We calculated the RMSD for all atoms relative to the initial structure, which served as the reference, to measure the convergence of the protein system (Figure 5). In all five structures, considerable structural changes were observed during the initial few picoseconds, leading to an RMSD of ∼1.2 Å, followed by notable structural deviations during the rest of the simulations. In the last 200 picoseconds of the simulation, the median RMSD was 1.21 (IQR: 1.18–1.26) Å for the native structure, 1.46 (IQR: 1.36–1.51) Å for V129M, 1.40 (IQR: 1.37–1.43) Å for V403E, 1.82 (IQR: 1.79–1.86) Å for F487V, and 1.47 (IQR: 1.41–1.50) Å for T498A (Table 4). The statistical analysis showed significant differences between the native structure and each of the four mutant structures (P<0.0001), with the largest deviation for F487V. Moreover, the small fluctuations in the average RMSD value after the relaxation period indicate that the simulation generated a stable trajectory and thus provides a credible basis for further analyses.
The ordinate is RMSD (Å), and the abscissa is time (ps). Black, blue, green, violet and red lines indicate the native structure and the V129M, V403E, F487V and T498A mutants, respectively.
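The RMSD measure used above can be written out explicitly. The following is a minimal sketch of the standard definition for two coordinate sets with matching atom order; it assumes the frames are already superimposed (no fitting is performed) and is not the trajectory-analysis code used in the study. The function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered coordinate sets."""
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    # Sum of squared atom-to-atom distances, without any prior alignment.
    squared = sum(
        (xr - xf) ** 2 + (yr - yf) ** 2 + (zr - zf) ** 2
        for (xr, yr, zr), (xf, yf, zf) in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))
```

Evaluating this function for every frame against the initial structure yields a curve of RMSD versus time like the one plotted in Figure 5.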
Rg is defined as the mass-weighted root mean square distance of a collection of atoms from their common center of mass. Hence, it provides insight into the overall dimension of a protein. The Rg plot for the Cα atoms of the protein as a function of time at 310 K is shown in Figure 6, and the results of the data analyses are shown in Table 4. The statistical analysis of the Rg values over the last 200 picoseconds of the simulation showed that F487V, V129M and T498A differed significantly from the native structure [native: 22.32 (IQR: 22.29–22.35) Å; V129M: 22.40 (IQR: 22.37–22.43) Å; F487V: 22.58 (IQR: 22.55–22.61) Å; T498A: 22.37 (IQR: 22.34–22.39) Å]. As reflected in Figure 6, the F487V mutant curve differed significantly and fluctuated at a higher rate during the simulation, indicating that the mutant conformation is flexible throughout the simulation and that its structure adopts an expanded conformation compared to the native structure. In contrast, no difference was found between the native structure and the V403E structure.
The ordinate is Rg (Å), and the abscissa is time (ps). Black, blue, green, violet and red lines indicate the native structure and the V129M, V403E, F487V and T498A mutants, respectively.
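The definition of Rg given above translates directly into code. The sketch below computes the mass-weighted root mean square distance from the center of mass for one frame; it illustrates the definition only and is not the analysis script used in the study. The function name is hypothetical, and for a Cα-only selection all masses are equal, so the weighting has no effect.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted RMS distance of atoms from their common center of mass."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and of equal length")
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    com = tuple(
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    )
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    ) / total_mass
    return math.sqrt(msd)
```

A larger value for a mutant than for the native structure, as reported for F487V, corresponds to a more expanded conformation.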
The SASA is the surface area of a biomolecule that is accessible to a solvent and can be related to the hydrophobic core. It is typically calculated using the ‘rolling ball’ algorithm developed by Shrake and Rupley in 1973. The SASA was calculated for the native and mutant trajectories and is presented in Table 4 and Figure 7. Data analyses showed that there were significant differences between all four mutant structures and the native structure [native: 24896 (IQR: 24830–24980) nm²; V129M: 24821 (IQR: 24753–24895) nm²; V403E: 24880 (IQR: 24827–24934) nm²; F487V: 24993 (IQR: 24931–25058) nm²; T498A: 24719 (IQR: 24667–24778) nm²]. Compared with the native protein, the F487V mutant protein exhibited a greater SASA over time, whereas V129M, V403E and T498A presented lower SASA values. An increase or decrease in SASA indicates changes in the exposed amino acid residues and could affect the tertiary structure of the protein.
The ordinate is SASA (nm²), and the abscissa is time (ps). Black, blue, green, violet and red lines indicate the native structure and the V129M, V403E, F487V and T498A mutants, respectively.
To properly visualize the crystal structure differences between the native and mutant proteins, we spatially superimposed the molecules (Figure 8). The results show that F487V and V129M exhibit a high displacement (5 Å; shown in red) and that T498A and V403E present a low displacement (0 Å; shown in blue).
Residues with a low displacement (0 Å) are shown in blue, those with a high displacement (5 Å) are shown in red, and those with a moderate displacement are shown in white. The CYP11B2 models are represented in NewCartoon, and the mutated amino acids are represented in CPK.
Furthermore, we ranked the four nsSNPs above based on the results for RMSD, Rg, SASA variation and spatial superimposition (Table 5). F487V had the highest likelihood of a deleterious effect, followed by V129M, T498A, and V403E in descending order of predicted severity.
During the execution of this project, the CYP11B2 database was created to show a more updated and complete set of in silico analyses per mutation. This database allows a user to quickly retrieve and rapidly analyse the predicted effects of protein variants. With its interactive interface, the CYP11B2 database allows dynamic utilization by enabling users to select only the results of the mutations and algorithms that are most important to them. The in silico analysis of CYP11B2 in this database will be helpful in the design of further experimental research. The CYP11B2 database is available at http://18.104.22.168/.
Because of the application of high-throughput sequencing technologies, the number of identified genomic variants, particularly SNPs, in the human genome is growing rapidly. The latest release of the NCBI dbSNP database (build 141) contains nearly 44 million validated human SNPs. A principal objective of studies in molecular biology and population genetics is to identify functionally deleterious SNPs and distinguish them from neutral SNPs. This is also an essential step in genetic association studies of complex genes and diseases. To the best of our knowledge, this study provides the first computational analysis of functional SNPs associated with the CYP11B2 gene. The value and novelty of this study lie in prioritizing SNPs with functional significance from an enormous number of non-risk alleles and in providing new insights for further genetic association studies. Moreover, the identified SNPs could contribute to aldosterone-induced cardiovascular disease and may represent novel therapeutic targets. Of the 358 SNPs, we selected the nsSNPs and UTR SNPs for our investigations; variants in the near-gene and intronic regions were not explored.
In this study, we evaluated the deleterious nsSNPs in three contexts: (1) identification of deleterious nsSNPs through both sequence- and structure-based methods (SIFT, PolyPhen and I-Mutant Suite), (2) calculation of the evolutionary conservation of amino acid positions through a conservation score (ConSurf server), and (3) measurement of alterations in the protein 3D structure due to deleterious nsSNPs through a molecular dynamics approach. Of the 51 nsSNPs associated with the CYP11B2 gene, four nsSNPs, namely F487V, V129M, T498A, and V403E, were finally identified as highly deleterious based on the comprehensive analyses above, particularly F487V.
A number of recent studies have focused mainly on the T-344C polymorphism, which affects CYP11B2 promoter activity, whereas the literature on coding substitutions that directly influence the structure of the protein is scarce. However, T498A, one of the four nsSNPs predicted above to be deleterious, was found to be strongly associated with CMO-II deficiency, which shows very low levels of aldosterone synthesis (0.5% or less compared with the wild-type enzyme). The in vitro analysis of the enzyme activities of the T498A mutant showed efficient 11β-hydroxylase activity but a loss of C18 activity, resulting in poor aldosterone synthesis. Hence, it appears reasonable to speculate that nsSNPs can disrupt the secondary structure of the enzyme, thereby impairing aldosterone synthase activity. It is worth noting that some patients, such as CMO-II deficiency patients who reach adulthood, can be asymptomatic and able to synthesize adequate amounts of aldosterone at the expense of elevated levels of aldosterone precursors. The existence of ostensibly asymptomatic individuals with significantly compromised aldosterone synthase function may reflect problems of ascertainment and may at least partly explain why few coding mutations in the CYP11B2 gene have been reported.
Because the translational regulation of gene expression is as important as transcriptional regulation for normal cell function, and because its dysfunction is related to the pathophysiology of various diseases, the UTR SNPs in the CYP11B2 gene were also evaluated with UTRscan, MirSNP, PolymiRTS and miRNASNP. In our study, we found that 7.3% of the SNPs are located in the UTR region. After comparing the functional elements for each UTR SNP using UTRscan, we found that three SNPs in the 3′UTR were predicted to exhibit a pattern change in their upstream open reading frames (uORFs). However, the uORF in the 3′UTR is hypothesized to have no functional importance.
Given the importance of translational regulation by microRNAs, we further studied whether the 3′UTR SNPs change the profile of microRNA binding to the CYP11B2 gene using MirSNP, PolymiRTS and miRNASNP. Of the 26 UTR SNPs, eight (rs188784518, rs117910248, rs61763989, rs61757284, rs28390200, rs7463238, rs3802228 and rs3097) were found to highly affect the microRNA binding targets according to MirSNP, PolymiRTS and miRNASNP. These SNPs can break, create, enhance, or decrease microRNA binding (i.e., a single SNP can break a microRNA binding site and also potentially create another site), with consequences for the regulation of the mRNA degradation pathway, thereby affecting mRNA turnover and microRNA function. Therefore, these UTR SNPs could disturb aldosterone biosynthesis. Mounting evidence suggests that aldosterone plays crucial roles in a variety of cerebrovascular, cardiovascular and renal complications. Nevertheless, validation and pathomechanism experiments for these predicted deleterious UTR SNPs remain few. Several studies have indicated that rs3802228 might be associated with atrial structural remodeling and the presence of coronary artery disease. As reflected in Table 3, rs3802228 could disturb the interactions between the mRNA and microRNA-331-5p. Consistent with this idea, one recent study demonstrated that upregulation of rno-miR-331* could serve as a biomarker of prognosis in the clinical therapy of heart failure. In addition, rs3097 (G5937C), one of the eight detrimental SNPs above, was also found to be associated with cardiac wall thickness. Collectively, these facts and speculations suggest a potential role of the identified UTR SNPs in the pathogenesis of aldosterone-induced cardiovascular complications.
It is therefore of considerable interest that variants in the CYP11B2 gene could underlie the pathogenesis of some cardiovascular diseases, not limited to primary aldosteronism, and that aldosterone may act as a central player in this pathological process. Accordingly, aldosterone antagonist treatment appears to be of considerable therapeutic value for controlling and limiting the progression of these diseases. This newly proposed pathway of CYP11B2 SNPs/aldosterone/cardiovascular disease opens new research insights and therapeutic avenues for cardiovascular diseases.
The CYP11B2 protein is a steroid hydroxylase cytochrome P450 enzyme involved in the biosynthesis of the mineralocorticoid aldosterone. It is the sole enzyme capable of synthesizing aldosterone in humans and plays an important role in electrolyte balance and blood pressure. Mutations in the CYP11B2 gene can disturb the biosynthesis of aldosterone, resulting in aldosterone synthase deficiency, also known as corticosterone methyloxidase deficiency. In addition, CYP11B2 gene variations can change gene expression and therefore play an important role in many diseases, such as hypertension, primary aldosteronism and heart failure. Nicod et al. also found that CYP11B2 is strongly associated with the rate of decline in renal allograft function. Our in silico studies identified various deleterious SNPs, the majority of which have not been reported experimentally so far. These findings highlight attractive screening targets for disease association studies involving the CYP11B2 protein and also provide a guide for future experimental work.
Although the prediction of deleterious SNPs becomes increasingly accurate as more informative data are integrated, several challenges remain. Computational tools can predict with strong confidence whether a variant is deleterious, but information about which disease the variant is related to, and with which disease it has a causal relationship, is still missing. In addition, variants in regulatory regions may alter the consensus sequences of transcription factor binding sites or promoter elements, and variants in introns and silent variants in exons may alter splicing efficiency. Predicting the effects of these variants from genomic sequence remains one of the most challenging tasks in bioinformatics. The biggest problem is over-prediction, for three reasons: (1) promoter predictions are expressed cryptically; (2) the vast majority of transcription factor binding sites lack distinctive characteristics in either length or sequence; (3) cis-regulatory splicing elements, such as ESE (exonic splicing enhancers), ESS (exonic splicing silencers), ISE (intronic splicing enhancers) and ISS (intronic splicing silencers) sites, are very poorly defined and may be located at almost any position within exons and introns. For these reasons, we did not attempt to predict the effects of variants in near-gene or intronic regions in the current study.
In summary, using a combination of in silico analyses, the current study identified four nsSNPs, denoted F487V, V129M, T498A, and V403E, as deleterious to the structure and function of the CYP11B2 protein. The molecular dynamics simulation analyses also indicated that these four predicted deleterious nsSNPs may alter the stability of the protein, as reflected in changes in RMSD, Rg, and SASA. In addition, three SNPs in the 3′UTR were predicted by UTRscan analysis to influence the translation pattern of the CYP11B2 gene, and eight 3′UTR SNPs may affect microRNA binding sites, as determined through MirSNP, PolymiRTS and miRNASNP analyses. Altered CYP11B2 function and protein expression due to mutations may play a critical role in determining susceptibility to complex diseases. This cataloguing of deleterious SNPs is essential for narrowing down the number of CYP11B2 mutations to be screened in genetic association studies and for a better understanding of the functional and structural aspects of the CYP11B2 protein.
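For readers less familiar with the trajectory metrics named above, the following minimal sketch shows how RMSD (root-mean-square deviation) and Rg (radius of gyration) are defined for a set of atomic coordinates. It is not the analysis pipeline used in this study. The function names and the toy coordinates are assumptions for illustration, the RMSD function assumes the two structures are already superimposed, and SASA is omitted because it requires a surface-sampling algorithm.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the structures are already superimposed; no fitting is performed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses are assumed if none given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Centre of mass along x, y and z.
    com = [sum(m * atom[i] for m, atom in zip(masses, coords)) / total
           for i in range(3)]
    squared = sum(
        m * sum((atom[i] - com[i]) ** 2 for i in range(3))
        for m, atom in zip(masses, coords)
    )
    return math.sqrt(squared / total)


# Toy two-atom "structures" (arbitrary units), purely illustrative.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, displaced), radius_of_gyration(reference))
```

In a simulation analysis these quantities would be evaluated frame by frame for the wild-type and each mutant trajectory, so that a sustained shift in either value can be read as a change in structural stability or compactness.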
Conceived and designed the experiments: MJ XS WG. Performed the experiments: MJ BY ZL. Analyzed the data: MJ HS. Contributed reagents/materials/analysis tools: XS WG. Wrote the paper: MJ XS WG.
- 1. Ke X, Taylor MS, Cardon LR (2008) Singleton SNPs in the human genome and implications for genome-wide association studies. Eur J Hum Genet 16: 506–515.
- 2. Shastry BS (2002) SNP alleles in human disease and evolution. J Hum Genet 47: 561–566.
- 3. Chen X, Sullivan PF (2003) Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics J 3: 77–96.
- 4. Xu H, Gregory SG, Hauser ER, Stenger JE, Pericak-Vance MA, et al. (2005) SNPselector: a web tool for selecting SNPs for genetic association studies. Bioinformatics 21: 4181–4186.
- 5. Chasman D, Adams RM (2001) Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J Mol Biol 307: 683–706.
- 6. Ferrer-Costa C, Orozco M, de la Cruz X (2004) Sequence-based prediction of pathological mutations. Proteins 57: 811–819.
- 7. Jordan DM, Ramensky VE, Sunyaev SR (2010) Human allelic variation: perspective from protein function, structure, and evolution. Curr Opin Struct Biol 20: 342–350.
- 8. Saunders CT, Baker D (2002) Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J Mol Biol 322: 891–901.
- 9. Yue P, Li Z, Moult J (2005) Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol 353: 459–473.
- 10. Brand E, Chatelain N, Mulatero P, Fery I, Curnow K, et al. (1998) Structural analysis and evaluation of the aldosterone synthase gene in hypertension. Hypertension 32: 198–204.
- 11. Strushkevich N, Gilep AA, Shen L, Arrowsmith CH, Edwards AM, et al. (2013) Structural insights into aldosterone synthase substrate specificity and targeted inhibition. Mol Endocrinol 27: 315–324.
- 12. Kayes-Wandover KM, Schindler RE, Taylor HC, White PC (2001) Type 1 aldosterone synthase deficiency presenting in a middle-aged man. J Clin Endocrinol Metab 86: 1008–1012.
- 13. Pascoe L, Curnow KM, Slutsker L, Rosler A, White PC (1992) Mutations in the human CYP11B2 (aldosterone synthase) gene causing corticosterone methyloxidase II deficiency. Proc Natl Acad Sci U S A 89: 4996–5000.
- 14. Jia M, Zhang H, Song X, Pang X, Ye W, et al. (2013) Association of CYP11B2 polymorphisms with susceptibility to primary aldosteronism: a meta-analysis. Endocr J 60: 861–870.
- 15. Tousoulis D, Androulakis E, Papageorgiou N, Miliou A, Chatzistamatiou E, et al. (2013) Genetic predisposition to left ventricular hypertrophy and the potential involvement of cystatin-C in untreated hypertension. Am J Hypertens 26: 683–690.
- 16. Ji P, Jiang L, Zhang S, Cui W, Zhang D, et al. (2013) Aldosterone Synthase Gene (CYP11B2) −344C/T Polymorphism Contributes to the Risk of Recurrent Cerebral Ischemia. Genet Test Mol Biomarkers 17: 548–552.
- 17. Androulakis E, Tousoulis D, Papageorgiou N, Miliou A, Chatzistamatiou E, et al. (2013) Effects of the C-344T aldosterone synthase gene variant on preclinical vascular alterations in essential hypertension. Int J Cardiol 168: 1605–1606.
- 18. Hui E, Yeung MC, Cheung PT, Kwan E, Low L, et al. (2014) The clinical significance of aldosterone synthase deficiency: report of a novel mutation in the CYP11B2 gene. BMC Endocr Disord 14: 29.
- 19. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29: 308–311. doi: 10.1093/nar/29.1.308
- 20. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4: 1073–1081. doi: 10.1038/nprot.2009.86
- 21. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11: 863–874. doi: 10.1101/gr.176601
- 22. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31: 3812–3814. doi: 10.1093/nar/gkg509
- 23. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7: 248–249. doi: 10.1038/nmeth0410-248
- 24. Capriotti E, Calabrese R, Casadio R (2006) Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22: 2729–2734. doi: 10.1093/bioinformatics/btl423
- 25. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, et al. (2003) ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19: 163–164. doi: 10.1093/bioinformatics/19.1.163
- 26. Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol 21: 1781–1791. doi: 10.1093/molbev/msh194
- 27. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18 Suppl 1: S71–77. doi: 10.1093/bioinformatics/18.suppl_1.s71
- 28. Mignone F, Gissi C, Liuni S, Pesole G (2002) Untranslated regions of mRNAs. Genome Biol 3: REVIEWS0004. doi: 10.1186/gb-2002-3-3-reviews0004
- 29. Flynt AS, Lai EC (2008) Biological principles of microRNA-mediated regulation: shared themes amid diversity. Nat Rev Genet 9: 831–842. doi: 10.1038/nrg2455
- 30. Grillo G, Turi A, Licciulli F, Mignone F, Liuni S, et al. (2010) UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 38: D75–80. doi: 10.1093/nar/gkp902
- 31. Liu C, Zhang F, Li T, Lu M, Wang L, et al. (2012) MirSNP, a database of polymorphisms altering miRNA target sites, identifies miRNA-related SNPs in GWAS SNPs and eQTLs. BMC Genomics 13: 661. doi: 10.1186/1471-2164-13-661
- 32. Bhattacharya A, Ziebarth JD, Cui Y (2013) PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res 42: D86–D91. doi: 10.1093/nar/gkt1028
- 33. Gong J, Tong Y, Zhang H-M, Wang K, Hu T, et al. (2012) Genome-wide identification of SNPs in microRNA genes and the SNP effects on microRNA target binding and biogenesis. Hum Mutat 33: 254–263. doi: 10.1002/humu.21641
- 34. Strushkevich N, Gilep AA, Shen L, Arrowsmith CH, Edwards AM, et al. (2013) Structural insights into aldosterone synthase substrate specificity and targeted inhibition. Mol Endocrinol 27: 315–324. doi: 10.1210/me.2012-1287
- 35. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, et al. (2006) Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics Chapter 5: Unit 5 6.
- 36. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, et al. (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26: 1781–1802. doi: 10.1002/jcc.20289
- 37. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14: 33–38, 27–38. doi: 10.1016/0263-7855(96)00018-5
- 38. Rajith B, George Priya Doss C (2011) Path to facilitate the prediction of functional amino acid substitutions in red blood cell disorders–a computational approach. PLoS One 6: e24607. doi: 10.1371/journal.pone.0024607
- 39. Conne B, Stutz A, Vassalli JD (2000) The 3′ untranslated region of messenger RNA: A molecular 'hotspot' for pathology? Nat Med 6: 637–641. doi: 10.1038/76211
- 40. Shrake A, Rupley JA (1973) Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J Mol Biol 79: 351–371. doi: 10.1016/0022-2836(73)90011-9
- 41. Zhu M, Zhao S (2007) Candidate gene identification approach: progress and challenges. Int J Biol Sci 3: 420–427. doi: 10.7150/ijbs.3.420
- 42. Cazzola M, Skoda RC (2000) Translational pathophysiology: a novel molecular mechanism of human disease. Blood 95: 3280–3288.
- 43. Reynolds PR (2002) In sickness and in health: the importance of translational regulation. Arch Dis Child 86: 322–324. doi: 10.1136/adc.86.5.322
- 44. Scheper GC, van der Knaap MS, Proud CG (2007) Translation matters: protein synthesis defects in inherited disease. Nat Rev Genet 8: 711–723. doi: 10.1038/nrg2142
- 45. Quinkler M, Born-Frontsberg E, Fourkiotis VG (2010) Comorbidities in primary aldosteronism. Horm Metab Res 42: 429–434. doi: 10.1055/s-0029-1243257
- 46. Huang H, Zhang L, Liu R, Chen YC, Li X, et al. (2011) Polymorphisms within micro-RNA-binding sites and risk of coronary artery disease in Chinese: an angiography-based study. Eur Heart J 32: 355–355.
- 47. Cao FF, Chen XD, Wang QS, Li L, Wang XF, et al. (2009) [Associations of the genetic polymorphisms in CYP11B2 gene with nonfamilial structural atrial fibrillation]. Zhonghua Liu Xing Bing Xue Za Zhi 30: 1069–1072.
- 48. Feng HJ, Ouyang W, Liu JH, Sun YG, Hu R, et al. (2014) Global microRNA profiles and signaling pathways in the development of cardiac hypertrophy. Braz J Med Biol Res 0: 0. doi: 10.1590/1414-431x20142937
- 49. Mayosi BM, Keavney B, Watkins H, Farrall M (2003) Measured haplotype analysis of the aldosterone synthase gene and heart size. Eur J Hum Genet 11: 395–401. doi: 10.1038/sj.ejhg.5200967
- 50. Nicod J, Richard A, Frey FJ, Ferrari P (2002) Recipient RAS gene variants and renal allograft function. Transplantation 73: 960–965. doi: 10.1097/00007890-200203270-00023
- 51. Wu J, Jiang R (2013) Prediction of deleterious nonsynonymous single-nucleotide polymorphism for human diseases. ScientificWorldJournal 2013: 675851. doi: 10.1155/2013/675851