Figure 1.
A universal schema for tRNA interaction networks.
tRNAs interact with varying degrees of specificity within a strongly conserved network of protein and RNA complexes. The simultaneous and conflicting requirements of “identity” and “conformity” on tRNAs create potentially deleterious pleiotropic effects when components of the network mutate or are transferred to foreign cells by HGT. They also facilitate the bioinformatic prediction of Class-Informative Features (CIFs) from tRNAs that function together in the same or similar networks.
Figure 2.
Function logos of structurally aligned tRNA data as calculated by LOGOFUN [36] for two groups of Alphaproteobacteria and overview of tRNA-CIF-based binary phyloclassification.
Function logos generalize sequence logos. They are the sole means by which we predict tRNA Class-Informative Features (CIFs), which form the basis of the scoring schemes of the classifiers reported in this work. A full derivation of the mathematics of function logos is provided in [36]. The tRNA-CIF-based phyloclassifier shown in Figure 3A sums differences in heights of features between two function logos for a set of genomically derived tRNAs. Complete source code and data to reproduce the function logos in this figure are in Dataset S1.
Figure 3.
Leave-One-Out Cross-Validation (LOO-CV) scores of alphaproteobacterial genomes under two different binary phyloclassifiers.
A. Score distribution of genomes under the binary tRNA-CIF-based phyloclassifier as sketched in Figure 2. The score of a genome in this classifier is the sum of differences in heights of the features of its tRNAs in the RRCH and RSR function logos in Figure 2. B. Scores under the “zero” total tRNA sequence-based phyloclassifier defined in Materials and Methods and conducted as a control. Here the score of a genome is just the sum of log-odds of its tRNA sequences in two class-specific sequence profiles from the RRCH and RSR clades. See Figure S2 for alternative treatments of missing data under other methods. Complete source code and data to reproduce these results and those in Figure S2 are in Dataset S2.
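The two scoring rules described in panels A and B can be illustrated with a minimal sketch. This is not the code in Dataset S2. The data layout is an assumption made for illustration: a feature is a (tRNA class, position, base) triple, a function logo is a dictionary from features to heights in bits, and a sequence profile is a per-class list of per-position base frequencies. The function names and the pseudocount `pseudo` are hypothetical.

```python
import math

def cif_score(trnas, logo_a, logo_b):
    """Sum, over every feature of a genome's tRNAs, the difference in
    feature height between two function logos (positive favors clade A).
    Assumed layout: logo[(trna_class, position, base)] = height in bits."""
    score = 0.0
    for trna_class, seq in trnas:
        for pos, base in enumerate(seq):
            key = (trna_class, pos, base)
            # Features absent from a logo contribute a height of zero.
            score += logo_a.get(key, 0.0) - logo_b.get(key, 0.0)
    return score

def log_odds_score(trnas, profile_a, profile_b, pseudo=1e-3):
    """Sum of per-site log-odds (base 2) of a genome's tRNA sequences
    under two class-specific frequency profiles.
    Assumed layout: profile[trna_class][position][base] = frequency.
    The pseudocount for unseen bases is a hypothetical choice."""
    score = 0.0
    for trna_class, seq in trnas:
        for pos, base in enumerate(seq):
            pa = profile_a[trna_class][pos].get(base, pseudo)
            pb = profile_b[trna_class][pos].get(base, pseudo)
            score += math.log2(pa / pb)
    return score
```

In both functions a positive score favors the first clade and a negative score favors the second. The difference is that the first rule counts only features present in a function logo, while the second uses every aligned site.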
Figure 4.
Breakout of class contributions to scores under the tRNA-CIF-based binary phyloclassifier.
Contributions of each functional variety of tRNA, or class, to the tRNA-CIF-based phyloclassifier scores in Figure 3A. Different SAR11 strain tRNAs are plotted separately by genome of origin. Complete source code and data to reproduce these results are in Dataset S3.
Figure 5.
Seven-way tRNA-CIF-based phyloclassification of alphaproteobacterial genomes by the default multilayer perceptron in WEKA.
Each test genome classified is assigned a probability of classification into each of the seven alphaproteobacterial clades indicated. Bootstrap support values under resampling of tRNA sites against (left) all tRNA CIFs and (right) CIFs with heights bits and model retraining (100 replicates). All support values correspond to the most probable clade as shown, except for Stappia and Labrenzia, for which they correspond to Rhizobiales. Complete source code and data to produce this figure, including the full WEKA model for classification of other alphaproteobacterial genomes and code to produce such models from scratch, are provided in Dataset S4.
Figure 6.
FastUniFrac-based phylogenetic tree of alphaproteobacteria using tRNA data computed according to the methods of [51].
The FastUniFrac algorithm was recently adapted as a phylogenomic method using tRNA genes. Like the supermatrix phylogenomic approach on tRNAs with results shown in Figures S3 and S4, this method uses unfiltered total sequence information of tRNAs. In contrast to Figure 5, both in this figure and in Figures S3 and S4, all SAR11 strains are affiliated with Rickettsiales. For reasons shown in Figure 7, we argue these results are artifacts of convergence in tRNA base contents. Complete source code and data to reproduce these results are in Dataset S5.
Figure 7.
Base compositions of alphaproteobacterial tRNAs showing convergence between Rickettsiales and SAR11.
A. Stacked bar graphs of tRNA base compositions by clade. B. UPGMA clustering of clades based on Euclidean distances of tRNA base compositions under the centered log ratio transformation [88]. tRNA base compositions alone are sufficient to group all SAR11 strains together with Rickettsiales as a clade. Most popular molecular evolutionary models in use today do not account for base content variation as a source of bias in phylogenetic estimation. Complete source code and data to reproduce these results are in Dataset S6.
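The clustering in panel B can be sketched as follows. This is an illustrative sketch, not the code in Dataset S6. It applies the centered log ratio transformation to each composition, measures Euclidean distances between the transformed vectors, and merges clusters by average linkage (UPGMA). The function names and the nested-tuple tree representation are hypothetical, and all base frequencies are assumed to be strictly positive.

```python
import math

def clr(composition):
    """Centered log ratio: log of each part minus the mean log of all parts.
    Assumes every part is strictly positive."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def upgma(names, vectors):
    """Average-linkage (UPGMA) clustering returning a nested-tuple tree.
    Each cluster is a (subtree, member_indices) pair."""
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def dist(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members.
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        return sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]
```

For example, `upgma(names, [clr(c) for c in compositions])` takes a list of clade names and a matching list of four-part base compositions and returns the tree as nested tuples.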