Fig 1.
A sequence translated using three different genetic codes, illustrating changes in BLASTx scores.
The nucleotide sequence from comp102324 was translated as a query under three different genetic codes, with dashes (-) representing identically translated codons. The top amino acid translation uses NCBI genetic code 1, in which UGA, UAA, and UAG are all interpreted as stop (*). These codons are boxed and shown above the amino acid translation. NCBI genetic code 4 translates UGA as tryptophan (W), which in this example increases the number of pairwise identities by three and the number of positive matches (marked with + in the midline) by one. However, three UAA stop codons still interrupt the open reading frame. The third translation, using NCBI genetic code 6, interprets UAA and UAG as glutamine (Q). This does not increase the number of pairwise identities, but it does increase the number of positives (similar amino acids) by two. This example contains no UAG codons. The top BLAST hit (Candida tenuis ATCC10573, gi 575519749) and the pairwise identities and positives obtained when all three stop codons are translated as amino acids are shown below the different translations.
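To make the three translations concrete, the sketch below builds NCBI genetic codes 1, 4 and 6 from the standard codon table and marks residues that match the code 1 translation with dashes. It is an illustrative Python sketch, not the authors' pipeline: the helper names (`build_code`, `translate`, `dash_diff`) and the toy sequence are hypothetical, and it assumes the dashes in the figure mark positions identical to the code 1 translation. Only amino acid reassignments are modeled; alternative start codons are ignored.

```python
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1); codons enumerated UUU, UUC, UUA, ..., GGG
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(overrides=None):
    """Return a codon -> amino acid dict: table 1 with optional reassignments."""
    code = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    code.update(overrides or {})
    return code


CODES = {
    1: build_code(),                          # UGA, UAA, UAG = stop
    4: build_code({"UGA": "W"}),              # UGA -> tryptophan
    6: build_code({"UAA": "Q", "UAG": "Q"}),  # UAA, UAG -> glutamine
}


def translate(nt, code):
    """Translate frame 1 of a DNA or RNA string, ignoring a trailing partial codon."""
    nt = nt.upper().replace("T", "U")
    return "".join(code[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def dash_diff(reference, other):
    """Show '-' where `other` matches `reference`, else the residue from `other`."""
    return "".join("-" if a == b else b for a, b in zip(reference, other))


seq = "AUGUGGUGAUAAUAGCAA"  # toy sequence, not comp102324
ref = translate(seq, CODES[1])
print(ref)  # MW***Q
for table in (4, 6):
    print(table, dash_diff(ref, translate(seq, CODES[table])))
# 4 --W---
# 6 ---QQ-
```

Because each code is just table 1 plus a small dict of overrides, the inferred code with all three stops reassigned can be built the same way by passing all three overrides at once.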
Fig 2.
Proportions of BLASTx hits producing different raw pairwise scores when translated with three genetic codes.
A. BLASTx results using the high-AT-content putative parasite sequences as queries and either the NCBI reference sequence database (3,652 hits) or the translated transcriptome data from Amoebophrya sp. ex Akashiwo sanguinea (9,589 hits) as the subject. For the comparison with the reference sequence database, roughly half of the 3,652 queries with e-values ≤ 10⁻¹⁰ (unshaded portion) had equal scores under all three genetic codes. Of the remaining sequences, most contained UAA or UAG codons, which increased the BLASTx score when translated as glutamine (light grey shading). Many of these sequences also contained UGA codons and had increased scores when UGA was translated as tryptophan (darkest shading; these sequences contain both UGA and UAA or UAG). A small fraction of the UGA-containing sequences did not have increased scores when UAA or UAG were translated as glutamine (light grey shading).
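The partition in panel A can be approximated from raw scores alone. The sketch below is a hypothetical scheme, not the authors' procedure: it keeps hits with e-value ≤ 10⁻¹⁰ and labels each by which reassignment (code 4 for UGA, code 6 for UAA/UAG) raised its raw BLASTx score relative to code 1. The figure also takes codon content into account, which this sketch ignores; the function names and example scores are illustrative only.

```python
from collections import Counter


def classify_hit(s1, s4, s6, evalue, max_evalue=1e-10):
    """Label a hit by which reassignment raised its raw score over code 1.

    s1, s4, s6 are raw BLASTx scores under NCBI codes 1, 4 and 6.
    Returns None for hits that fail the e-value cutoff.
    """
    if evalue > max_evalue:
        return None
    if s1 == s4 == s6:
        return "equal"
    uga_gain = s4 > s1      # reading UGA as tryptophan helped
    uaa_uag_gain = s6 > s1  # reading UAA/UAG as glutamine helped
    if uga_gain and uaa_uag_gain:
        return "both"
    if uaa_uag_gain:
        return "UAA/UAG only"
    if uga_gain:
        return "UGA only"
    return "other"  # scores differ, but neither alternative code scored higher


def proportions(hits):
    """Fraction of passing hits per category; each hit is (s1, s4, s6, evalue)."""
    labels = [classify_hit(*h) for h in hits]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Made-up scores for illustration only
example = [(120, 120, 120, 1e-30), (80, 80, 95, 1e-15),
           (60, 72, 70, 1e-12), (50, 50, 50, 1e-3)]
print(proportions(example))
```

The last example hit is dropped by the e-value cutoff, so the remaining three hits each account for one third of the passing set.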
Fig 3.
The amino acids aligned to UGA, UAA and UAG.
A. Amino acids found in the top BLASTx alignments to the NCBI reference sequence protein database, for alignments whose scores increased relative to the standard genetic code when UGA was translated as tryptophan. The amino acids are grouped as hydrophobic, polar, negatively or positively charged, and special cases. UGA codons were commonly aligned to gaps, but these cases are not shown. With the scoring matrix used, only alignments to tryptophan, tyrosine, or phenylalanine count as BLASTx positives; however, all UGA codons within sequences with increased BLASTx scores are included in this comparison. B. Amino acids aligned to UAA and UAG codons in BLASTx comparisons where scores increased when UAA and UAG were translated as glutamine. Gaps were not included in this analysis. C. FACIL analysis of the genetic code. The four possible glutamine codons and the two likely tryptophan codons are shown. The amino acids most commonly aligned to these codons, when translated and compared with the protein family (Pfam) database, are shown as a sequence logo in which letter height is proportional to frequency as inferred by FACIL.
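Why only tryptophan, tyrosine, and phenylalanine count as positives can be checked against the tryptophan row of BLOSUM62, the BLASTx default; that this study used BLOSUM62 is an assumption here. A positive is an aligned pair scoring above zero. The sketch further assumes the matrix's stop (*) row, which scores -4 against every amino acid, so under that assumption reading UGA as W instead of stop never lowers the raw score at a position, even when the aligned residue is not a positive. The helper names are hypothetical.

```python
# Tryptophan (W) row of BLOSUM62 (assumed to be the matrix used)
BLOSUM62_W = {
    "A": -3, "R": -3, "N": -4, "D": -4, "C": -2, "Q": -2, "E": -3,
    "G": -2, "H": -2, "I": -3, "L": -2, "K": -3, "M": -1, "F": 1,
    "P": -4, "S": -3, "T": -2, "W": 11, "Y": 2, "V": -3,
}
STOP_SCORE = -4  # BLOSUM62 score of stop (*) against any amino acid


def positives_for_w():
    """Amino acids that count as positives (score > 0) when aligned to W."""
    return sorted(aa for aa, s in BLOSUM62_W.items() if s > 0)


def score_change_uga_as_w(subject_aa):
    """Raw-score change at one aligned position when UGA goes from stop to W."""
    return BLOSUM62_W[subject_aa] - STOP_SCORE


print(positives_for_w())           # ['F', 'W', 'Y']
print(score_change_uga_as_w("L"))  # 2
print(score_change_uga_as_w("P"))  # 0
```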
Fig 4.
Position and nucleotide context around UGA codons.
A. The amino acid position of the last (most 3’) UGA codon in the query sequence alignment was divided by the total length of the top BLASTx hit and plotted as a histogram, showing where UGA codons fall relative to the total length of the subject sequence. Comparisons with the P. marinus (red) and reference sequence (black) databases are shown for AT-biased sequences whose BLASTx scores increased under alternative genetic codes. B. Nucleotide composition upstream and downstream of UGA codons that apparently encode stop versus UGA codons that apparently encode tryptophan, the latter identified by increased BLASTx scores when UGA was translated as tryptophan. In-frame UGA stop codons from 457 non-redundant sequences from the parasite fraction were compared with 333 sequences with UGA as tryptophan using Two Sample Logo with a t-test. The height of each nucleotide shows its relative enrichment or depletion at each upstream and downstream position; only nucleotides enriched or depleted at p < 0.05 are shown (see S2 Fig for raw frequency values).
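The normalization behind panel A can be sketched as follows. This assumes the UGA positions have already been mapped to amino acid coordinates comparable to the subject length, and it uses ten equal-width bins; the bin count, the helper names, and the example hits are illustrative choices, not values from the study.

```python
def relative_position(uga_positions, subject_length):
    """Amino acid position of the last (most 3') UGA divided by subject length."""
    return max(uga_positions) / subject_length


def histogram(values, n_bins=10):
    """Count values in [0, 1] into equal-width bins; 1.0 falls in the last bin."""
    counts = [0] * n_bins
    for v in values:
        i = min(max(int(v * n_bins), 0), n_bins - 1)  # clamp to a valid bin
        counts[i] += 1
    return counts


# Hypothetical hits: (UGA positions in the alignment, subject length)
hits = [([12, 40], 400), ([250], 500), ([90, 275], 500), ([610], 610)]
rel = [relative_position(p, n) for p, n in hits]
print(rel)
print(histogram(rel))
```

Values near 1 indicate a UGA close to the subject's C-terminus; clamping keeps a value of exactly 1.0 in the final bin rather than overflowing the list.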
Table 1.
Differences between the genomic and expressed versions of mitochondrial genes found in Amoebophrya sp. ex Karlodinium veneficum.
Table 2.
Comparison of Dynein Heavy Chain (DHC) sequences from Symbiodinium kawagutii as queries to Amoebophrya sp. ex Karlodinium veneficum StringTie transcripts.
Fig 5.
Tryptophan and glutamine codons in the dynein heavy chain alignments.
Relative synonymous codon usage was inferred from the aligned regions of Dynein Heavy Chain (DHC) family members (as shown in Table 2), based on top tBLASTx hits. The left side treats UGG and UGA as the two codons for tryptophan (W); the right side treats CAA, CAG, UAA, and UAG as the four likely glutamine (Q) codons. The total number of tryptophan or glutamine residues in the aligned regions of each gene is shown at the end of its bar.
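Relative synonymous codon usage (RSCU) divides each codon's observed count by the count expected if all synonymous codons of that amino acid were used equally (the family total divided by the number of codons in the family). The sketch below applies this to the tryptophan and glutamine families described here. The codon counts are invented for illustration, the family definitions assume UGA encodes W and UAA/UAG encode Q, and returning 0.0 for an unobserved family is an arbitrary convention.

```python
from collections import Counter

# Synonymous families, assuming UGA -> W and UAA/UAG -> Q
FAMILIES = {
    "W": ["UGG", "UGA"],
    "Q": ["CAA", "CAG", "UAA", "UAG"],
}


def count_codons(seq):
    """Count in-frame codons in a coding sequence (DNA or RNA)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def rscu(counts, family):
    """Observed count / count expected under equal use of the family's codons."""
    total = sum(counts.get(c, 0) for c in family)
    if total == 0:
        return {c: 0.0 for c in family}
    expected = total / len(family)
    return {c: counts.get(c, 0) / expected for c in family}


# Invented counts for illustration
counts = Counter({"UGG": 30, "UGA": 10, "CAA": 20, "CAG": 5, "UAA": 10, "UAG": 5})
for aa, family in FAMILIES.items():
    values = rscu(counts, family)
    print(aa, values, "sum =", sum(values.values()))
```

The values within each family sum to the number of codons in it (2 for W, 4 for Q), which is the same property noted for the RSCU values in Fig 6.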
Fig 6.
Comparison of codon usage in transcripts with tRNA counts in the genomic assembly.
The relative synonymous codon usage (RSCU) of 28 codons in 1,466 transcripts is shown as a bar graph in the bottom portion, while the top portion shows the number of complementary tRNAs found in the genome survey. Black bars indicate codons ending in G or C, and grey bars codons ending in A or U. Codons ending in C or U are grouped in the left portion of the figure and those ending in G or A in the right portion. Because the genetic code is degenerate, each amino acid is encoded by one to six codons. The 28 codons comprise the 16 codons of amino acids with just two codons, whose third positions (C or U versus A or G) are informative, plus, for comparison, 12 codons from amino acids with three to six codons, marked with an asterisk. RSCU values sum to the number of codons for each amino acid: the likely glutamine codons (UAG, UAA, CAA, and CAG) sum to four, phenylalanine with two codons sums to two, and serine sums to six. All codons that were synonymous at third positions, as well as AUG, AUA, UGG, and UGA, were excluded.
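The selection of the 16 third-position-informative codons can be reproduced from the codon table. The sketch assumes the reassigned code implied by these figures (UGA as tryptophan, UAA and UAG as glutamine), finds amino acids encoded by exactly two codons, drops the excluded codons (AUG, AUA, UGG, UGA), and splits the remainder by third position. The 12 asterisked codons from larger families are not modeled, since they are not listed individually; the helper name is hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1); codons enumerated UUU, UUC, UUA, ..., GGG
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Assumed reassigned code: all three stop codons read as amino acids
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
CODE.update({"UGA": "W", "UAA": "Q", "UAG": "Q"})

EXCLUDED = {"AUG", "AUA", "UGG", "UGA"}


def two_codon_set(code, excluded):
    """Codons of amino acids with exactly two codons, minus excluded codons."""
    by_aa = defaultdict(list)
    for codon, aa in code.items():
        by_aa[aa].append(codon)
    return sorted(c for codons in by_aa.values() if len(codons) == 2
                  for c in codons if c not in excluded)


selected = two_codon_set(CODE, EXCLUDED)
ending_cu = [c for c in selected if c[2] in "CU"]  # left portion of the figure
ending_ga = [c for c in selected if c[2] in "GA"]  # right portion
print(len(selected), len(ending_cu), len(ending_ga))  # 16 12 4
```

Under this code, tryptophan (UGG, UGA) is also a two-codon family, so it is the explicit exclusion of UGG and UGA that brings the count to 16.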