SNP and indel frequencies at transcription start sites and at canonical and alternative translation initiation sites in the human genome

doi:10.1371/journal.pone.0214816

Fig 1.

Definition of human genomic regions.

Definition of the basic genomic regions: intergenic region, promoter region, 5’ UTR, coding exon(s), 3’ UTR, all exons, intron(s), intragenic region and CpG islands (not shown here). The + strand is shown; the − strand is analogous.


Fig 2.

Sequence context around translation start sites.

Two example sequence contexts are shown to illustrate how the flanking sequence around translation start sites is defined. By definition, positions around the translation start site are given relative to the start codon, whose bases are denoted 1, 2 and 3. Position zero is omitted. Positions -3R (R = purine) and +4G were shown to be crucial for translation initiation [16, 17] and are therefore highlighted in red.
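The numbering convention can be made concrete with a minimal sketch. The function names `relative_position` and `kozak_flags`, and the example sequence, are hypothetical and are not taken from the article; the sketch only illustrates the rule that the start codon occupies positions 1 to 3 and that no position 0 exists.

```python
def relative_position(index, start_index):
    """Map a 0-based sequence index to a position relative to the start codon.

    The first base of the start codon is position +1; there is no position 0,
    so the base immediately upstream is position -1.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def kozak_flags(seq, start_index):
    """Return (purine at -3, G at +4) for the start codon beginning at start_index.

    A flag is False when the position lies outside the sequence.
    """
    minus3 = seq[start_index - 3] if start_index >= 3 else None
    plus4 = seq[start_index + 3] if start_index + 3 < len(seq) else None
    return (minus3 in ("A", "G"), plus4 == "G")


# Illustrative context only: ATG begins at 0-based index 6.
example = "GCCACCATGGCG"
flags = kozak_flags(example, 6)
```

For the illustrative sequence, index 3 maps to position -3 and index 9 maps to position +4, the two positions highlighted in the figure.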


Fig 3.

SNP and indel densities for all variant types and genomic elements considering 1000G data (European cohort).

Upper panel: distribution of SNP and indel densities, the horizontal line (−) represents the median value, the asterisk (⋆) denotes the mean value. In total, the 1000G European super population comprises 503 individuals; Lower panel: Tajima’s D statistic was applied to evaluate the neutral evolution hypothesis.
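For readers unfamiliar with the statistic in the lower panel, the following is a minimal sketch of the standard Tajima's D formula (Tajima, 1989). It is not the authors' pipeline: the function name `tajimas_d` and its input format (a list of derived-allele counts per site among `n` sampled chromosomes) are assumptions made for illustration.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived-allele counts among n sampled chromosomes.

    Returns None when the statistic is undefined (fewer than 4 chromosomes
    or no segregating sites).
    """
    # Keep only segregating sites (allele neither absent nor fixed).
    seg = [k for k in derived_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) for k in seg) / (n * (n - 1))

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimators, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants pushes D below zero, whereas an excess of intermediate-frequency variants pushes it above zero; values near zero are consistent with neutral evolution at constant population size.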


Fig 4.

SNP and dinucleotide distribution around the TSS.

A: Average SNP and indel density (1000G and GoNL data) around the TSS (±200 bp) of all RefSeq genes. The standard error of the mean is shown for every data point. B: SNP pattern in the direct vicinity (−15 to +12) of the TSS considering 1000G and GoNL data. Position 1 denotes the first intragenic nucleotide. C: Distribution of dinucleotides starting with cytosine in the flanking region of the TSS of all RefSeq genes. CpG and CpA dinucleotides are prevalent. D: Number of SNPs (1000G data) at individual dinucleotides. The majority of SNPs reside at CpG dinucleotides.
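Panel C rests on counting dinucleotides whose first base is cytosine. A minimal sketch of such a count for a single sequence follows; the function name `c_dinucleotide_counts` is hypothetical, and the article's actual counting procedure (window size, strand handling) is not specified in this caption.

```python
from collections import Counter


def c_dinucleotide_counts(seq):
    """Count overlapping dinucleotides that start with C (CpA, CpC, CpG, CpT)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1) if seq[i] == "C")
    # Report only the four unambiguous dinucleotides; pairs such as "CN" are ignored.
    return {d: counts.get(d, 0) for d in ("CA", "CC", "CG", "CT")}
```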


Table 1.

Number of start sites and SNPs in direct vicinity of TSS and CSS.


Table 2.

Functional annotation of genes with a CpG at TSS position −1.


Fig 5.

SNP distribution around the coding start site (CSS).

Average SNP density in a range of ±200 bp and 20 bp windows around the CSS (1000G and GoNL data). Upper panel: annotated translation start sites of RefSeq genes; Lower panel: alternative translation starts detected by ribosome profiling applied to HEK293 cells. The standard error of the mean is shown for every data point.


Fig 6.

SNP distribution in the flanking region of the coding start site (CSS).

SNP pattern in the flanking region (−15 to +13) of canonical and alternative starts. The applied permutation test provided the following p-values (curves from top to bottom): RefSeq+1000G: 0.0, RefSeq+GoNL: 0.0002, HEK293+1000G: 0.027, HEK293+GoNL: 0.244. At the applied significance threshold, the drop in the total number of SNPs at the CSS is significant only for canonical start sites. Upper panel: total number of SNPs; Lower panel: the number of SNPs normalized by the number of sequence contexts with at least one SNP (compare with Table 1).
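The caption does not describe the permutation scheme, so the following is only a generic sketch of how a drop at specific positions could be tested. It assumes that per-position SNP counts are shuffled across the window and that the test statistic is the summed count at the start-codon positions; the function name `permutation_p_value` and all parameters are hypothetical.

```python
import random


def permutation_p_value(counts, site_indices, n_perm=2000, seed=0):
    """One-sided permutation p-value for a low SNP count at the given positions.

    counts: SNP counts per position in the window.
    site_indices: 0-based indices of the positions of interest (e.g. the start codon).
    The p-value is the fraction of shuffles whose summed count at those
    positions is at most the observed sum.
    """
    rng = random.Random(seed)
    observed = sum(counts[i] for i in site_indices)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(shuffled[i] for i in site_indices) <= observed:
            hits += 1
    # This plain estimator can return exactly 0 when no shuffle is as extreme.
    return hits / n_perm
```

With this estimator, the smallest nonzero p-value that can be reported is 1 divided by the number of permutations, so the resolution of the test depends directly on `n_perm`.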
