Human Coding Synonymous Single Nucleotide Polymorphisms at Ramp Regions of mRNA Translation

According to the ramp model of mRNA translation, the first 50 codons favor rare codons and are translated more slowly. This study aims to detect translational selection on coding synonymous single nucleotide polymorphisms (sSNPs) to support the ramp theory. We investigated fourfold degenerate site (FFDS) sSNPs with A↔G or C↔T substitutions in the human genome for distribution bias of synonymous codons (SCs), grouped by CpG or non-CpG sites. Distribution bias of sSNPs between the 3rd∼50th codons and the 51st∼remainder codons was observed at non-CpG sites. In the 3rd∼50th codons, G→A sSNPs at non-CpG sites are favored over A→G sSNPs [P = 2.89×10⁻³], and C→T sSNPs at non-CpG sites are favored over T→C sSNPs [P = 8.50×10⁻³]. The favored direction of SC usage change is from more frequent SCs to less frequent SCs. The distribution bias is more obvious in the synonymous substitutions CG(G→A), AC(C→T), and CT(C→T). The distribution bias of sSNPs in the human genome, i.e. that the change from frequent SCs to less frequent SCs is favored in the 3rd∼50th codons, indicates translational selection on sSNPs in the ramp regions of mRNA templates.


Introduction
Synonymous DNA variations may affect mRNA function through changes in mRNA secondary structure, mRNA stability, synonymous codon (SC) usage, or co-translational protein folding [1][2][3][4]. There is empirical evidence that synonymous single nucleotide polymorphisms (sSNPs) in the COMT gene (encoding catechol-O-methyltransferase) may modulate pain sensitivity through effects on mRNA secondary structure and the efficiency of protein expression [5][6][7]. Examples of associations between sSNPs and human complex traits, such as the COMT sSNPs in pain sensitivity, are rare. Most probably, although sSNPs are not functionally neutral, their functional effects are largely minor, and such minor effects are not readily identifiable by traditional genetic association studies. SC usage bias is a widespread phenomenon across biological species [8]. An sSNP that changes codon usage may be expected to fine-tune translational efficiency, depending on the availability of rare tRNAs [9,10]. According to the ramp model of mRNA translation, the first 50 codons of mRNAs, except the second codon, tend to favor rarer codons and are translated more slowly [10][11][12]. This "ramp" mechanism is important in determining translation efficiency, preventing ribosome congestion, and allowing proper co-translational folding of proteins [3]. Based on the ramp theory, human sSNPs in ramp regions may be under selection pressure because of their functional effect on codon usage. Identifying the translational effect of an individual SNP is difficult. Instead, in this study we tried to identify the overall selection effect on sSNPs in the human genome. We compared the incidences of sSNPs in the 3rd∼50th codons with those in the remainder codons from the 51st codon onward.

Methods
Fourfold degenerate site (FFDS, i.e. a site at which any of the four nucleotides A/C/G/T encodes the same amino acid) sSNPs with A↔G or C↔T substitutions in the human genome were extracted from the NCBI dbSNP database build 134 (http://www.ncbi.nlm.nih.gov/projects/SNP/). Altogether, 39,276 sSNPs in 12,568 genes were collected. All SNP alleles corresponded to the nucleotides in coding sequences. Among these FFDS sSNPs, 20,122 were A↔G sSNPs and 19,154 were C↔T sSNPs. Of the 20,122 A↔G FFDS sSNPs, 43 located at second codons of coding regions were removed from further analysis; of the 19,154 C↔T sSNPs, 25 at second codons were removed. The FFDS sSNPs were annotated as N₁→N₂, where N₁ represents the ancestral allele and N₂ represents the variant allele. Ancestral alleles of sSNPs were inferred from human-chimpanzee genomic alignment according to SeattleSeq Annotation 134 (http://snp.gs.washington.edu/SeattleSeqAnnotation134/index.jsp). All sSNPs were classified as CpG or non-CpG sites, where a CpG site has the pattern YpG or CpR (Y represents a C↔T substitution, and R represents an A↔G substitution).
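The site classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the `alleles` representation, and the assumption that each SNP sits at the third codon position on the coding strand of a standard-genetic-code codon are all introduced here for illustration.

```python
# Codon prefixes (first two bases) whose third position is fourfold degenerate
# in the standard genetic code: Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly.
FFDS_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def is_ffds_codon(codon: str) -> bool:
    """Return True if the third position of this codon is fourfold degenerate."""
    return len(codon) == 3 and codon[:2].upper() in FFDS_PREFIXES


def is_cpg_site(codon: str, next_base: str, alleles: frozenset) -> bool:
    """Classify a third-position SNP as CpG or non-CpG (hypothetical helper).

    YpG: a C<->T SNP followed by G (first base of the next codon).
    CpR: an A<->G SNP preceded by C (second base of the same codon).
    """
    prev_base = codon[1].upper()
    if alleles == frozenset("CT"):
        return next_base.upper() == "G"
    if alleles == frozenset("AG"):
        return prev_base == "C"
    raise ValueError("only A<->G and C<->T substitutions are handled")


def substitution_label(codon: str, variant: str) -> str:
    """Format a substitution in the paper's notation, e.g. CG(G→A)."""
    return f"{codon[:2]}({codon[2]}→{variant})"
```

For example, `substitution_label("CGG", "A")` gives `CG(G→A)`, and under this rule a G↔A SNP in a GCG codon is always a CpG site because the polymorphic base is preceded by C.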

Results
Our results showed that the fraction of FFDS sSNPs is significantly lower in the ramp (the 3rd∼50th codons) than in the remaining regions (after the 50th codon) [0.23% vs. 0.32%, odds ratio OR (95% confidence interval CI) = 0.708 (0.684, 0.734), P = 1.60×10⁻⁸¹], corrected by the FFDS codon usages calculated from the European Molecular Biology Laboratory (EMBL) Human CDSs (coding sequences) Release 115 (ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/). We identified significant distribution bias of sSNPs between the 3rd∼50th codons and the 51st∼remainder codons at non-CpG sites (Table 1). This distribution bias at non-CpG sites is consistent with our previous study on the asymmetry pattern of complementary sSNPs at FFDS, which was seen only in non-CpG sSNPs and not in sSNPs at CpG sites. This context-specific distribution bias is related to lower mutation rates and longer periods of evolutionary selection at non-CpG sites [13]. In the 3rd∼50th codons, G→A sSNPs are favored over A→G sSNPs at non-CpG sites [OR (95% CI) = 1.353 (1.108, 1.652)], and C→T sSNPs are favored over T→C sSNPs at non-CpG sites [OR (95% CI) = 1.272 (1.063, 1.523)]. In both cases, G→A and C→T, the favored direction of SC usage change is from more frequent SCs to less frequent SCs. The reference data on human codon usage (Table S1) were calculated from the EMBL human coding sequences (CDS) data release 115 (ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/). On further investigation, the G→A bias was mainly seen in the synonymous substitution CG(G→A) at non-CpG sites [OR (95% CI) = 1.861 (1.020, 3.395)] (Table 2, Figure 1); the C→T bias was mainly seen in AC(C→T) [OR (95% CI) = 2.275 (1.255, 4.124)] and CT(C→T) [OR (95% CI) = 1.780 (1.053, 3.010)] at non-CpG sites (Table 3, Figure 2). In all three types of biased synonymous substitutions [i.e. CG(G→A), AC(C→T), and CT(C→T)], the favored change in the ramp region is from more frequent SCs to less frequent SCs.
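As an illustration of how an odds ratio and its 95% confidence interval can be obtained from a 2×2 table of substitution direction by region, the sketch below uses the standard log-odds (Woolf) interval. The text does not state which interval method was used, so this choice is an assumption; the function name and the example counts are hypothetical and are not data from this study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-odds) confidence interval.

    Assumed table layout (hypothetical, for illustration):
        a = ramp, direction 1 (e.g. G→A)      b = ramp, direction 2 (e.g. A→G)
        c = remainder, direction 1            d = remainder, direction 2
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the paper.
or_, lo, hi = odds_ratio_ci(120, 100, 900, 1000)

# A log-scale interval is symmetric around log(OR), so the geometric mean of
# its bounds equals the OR. The reported G→A interval (1.108, 1.652) around
# 1.353 has this property, which is consistent with a log-odds interval.
geometric_mean_reported = math.sqrt(1.108 * 1.652)
```

An interval that excludes 1, as in the reported G→A and C→T comparisons, corresponds to nominal significance at the 0.05 level.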
To further characterize the distribution bias of FFDS sSNPs, we examined their distributions stepwise by comparing the 3rd∼nth (n = 20, 21, …, 60) codons vs. the remainder codons (Table S2). The overall C→T bias at non-CpG sites was most significant in the first 46 codons. The codon-specific AC(C→T) bias at non-CpG sites was most significant in the first 50 codons, and the codon-specific CT(C→T) bias at non-CpG sites was most significant in the first 45 codons. The overall G→A bias at non-CpG sites was most significant in the first 55 codons, and the codon-specific CG(G→A) bias at non-CpG sites was most significant in the first 39 codons. Therefore, the ramp region may not have a clear border in terms of codon number. As a side note, the GG(G→A) bias at non-CpG sites also showed nominal significance in the first 57 codons (P = 0.021), and the CT(G→A) bias at non-CpG sites was nominally significant in the first 46 codons (P = 0.026). The codon usage change of CT(G→A) is also in the direction from a more frequent SC to a less frequent SC. The change of codon usage for GG(G→A) is not obvious. One exception is the statistically significant GC(G→A) bias (P = 1.85×10⁻³) in the first 25 codons. These GC(G→A) substitutions change codon usage from the less frequent GCG to the more frequent GCA. The GC(G→A) bias disappeared when more codons (≥45 codons) in the ramp region were considered.
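The stepwise comparison can be sketched as a scan over candidate ramp boundaries, testing each resulting 2×2 table with a χ² test. The sketch below is an assumption-laden illustration: the record format `(codon_index, direction)`, the direction labels, and the use of an uncorrected Pearson χ² with one degree of freedom are choices made here, not details confirmed by the text.

```python
import math


def chi2_p_2x2(a: int, b: int, c: int, d: int) -> float:
    """P value of the Pearson chi-square test (1 df, no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a zero margin carries no information about association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def stepwise_scan(snps, forward, reverse, n_values=range(20, 61)):
    """Compare codons 3..n with the remainder for each boundary n.

    snps: iterable of (codon_index, direction), codon_index being 1-based.
    forward / reverse: the two substitution directions being contrasted.
    Returns a dict mapping n to the P value of the direction-by-region table.
    """
    records = [(i, s) for i, s in snps if i >= 3 and s in (forward, reverse)]
    results = {}
    for n in n_values:
        a = sum(1 for i, s in records if i <= n and s == forward)
        b = sum(1 for i, s in records if i <= n and s == reverse)
        c = sum(1 for i, s in records if i > n and s == forward)
        d = sum(1 for i, s in records if i > n and s == reverse)
        results[n] = chi2_p_2x2(a, b, c, d)
    return results


# Toy records (hypothetical): the SNP at codon 2 is excluded by the scan.
toy = [(2, "G>A"), (5, "G>A"), (10, "G>A"), (30, "A>G"), (100, "A>G")]
p_by_boundary = stepwise_scan(toy, "G>A", "A>G", n_values=[20, 40])
best_n = min(p_by_boundary, key=p_by_boundary.get)
```

Taking the boundary with the smallest P value, as `best_n` does, mirrors the "most significant in the first n codons" statements above; because many boundaries are scanned, such minima are descriptive rather than corrected for multiple testing.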

Discussion
Our previous study showed a genome-wide discrepancy of human sSNPs between the two complementary DNA strands, and suggested widespread selective pressure due to functional effects of sSNPs related to gene transcription [13]. The asymmetry pattern of complementary sSNPs in the human genome may be related to transcription-coupled mutation and repair [13]. In this study, we identified another type of distribution bias of sSNPs in the human genome, related to mRNA translation. Biased directions of SC substitutions between the 3rd∼50th codons and the 51st∼remainder codons at non-CpG sites were observed. In the 3rd∼50th codons, G→A sSNPs at non-CpG sites are favored over A→G sSNPs, and C→T sSNPs at non-CpG sites are favored over T→C sSNPs. In both cases, the change from more frequent SCs to less frequent SCs is favored in the 3rd∼50th codons over the remainder codons. This finding supports the ramp model of SC usage in mRNA translation [10,11]. The change from more frequent SCs to less frequent SCs may enhance the function of ramp regions in preventing subsequent ribosome congestion and improving the efficiency of protein synthesis. On the other hand, a synonymous substitution that changes a less frequent SC to a more frequent SC may impair ramp function and cause ribosomal traffic jams during protein synthesis. Because of their potentially deleterious effect, such sSNPs may be subject to stronger evolutionary selection pressure and tend to be removed by purifying selection.
By investigating 13,798 common sSNPs genotyped by the HapMap3 project, Waldman et al. demonstrated evolutionary selection for translation efficiency on sSNPs [14]. By investigating all human sSNPs, our study identified an obvious bias in the ramp region for the synonymous substitutions CG(G→A), AC(C→T), and CT(C→T), indicating a codon-specific effect on gene translation efficiency. As a limitation of this study, the specific SC changes that we identified did not reach the significance level after Bonferroni correction for multiple testing, which warrants further study. On the other hand, codon-specific translation efficiency has been observed empirically in model organisms, e.g. the strongly inhibitory effect of the CGA codon in yeast [15]. The intriguing exception of the GC(G→A) bias may suggest that the GCG codon, which is hypermutable through methylation-induced deamination of 5-methylcytosine on the antisense strand [16], meets weaker negative selection in the first half of the ramp region but stronger negative selection in the second half, which offsets the GC(G→A) bias of the first half. The lack of negative selection on GC(G→A) in the first 25 codons may suggest functional heterogeneity of the ramp region, which warrants further study. In addition, Tuller et al. recently highlighted that stronger mRNA folding may also be involved in the ramp function [17]. The different effects of these SCs on mRNA secondary structure are an interesting issue deserving further inquiry.

Supporting Information
Table S1 Human codon usage calculated by the EMBL human coding sequences (CDS) data release 115.
(DOC)

χ² test of the difference of substitution direction between the first 50 codons and the remainder codons; *Uncorrected P<0.05; **Uncorrected P<0.01. By Bonferroni correction for multiple comparisons, the threshold for statistical significance is P<0.003125. The C→T bias was mainly explained by the AC(C→T) and CT(C→T) substitutions at non-CpG sites. doi:10.1371/journal.pone.0059706.t003
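The stated Bonferroni threshold is arithmetically equal to a nominal level of 0.05 divided by 16 comparisons. The count of 16 is inferred here from that arithmetic and is not stated in the text; it would match, for instance, eight fourfold degenerate codon families tested for two substitution types each.

```latex
\alpha_{\mathrm{Bonferroni}} = \frac{\alpha}{m} = \frac{0.05}{16} = 0.003125
```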