Translational Selection Is Ubiquitous in Prokaryotes

Figure 2

Predictive performance of the Random Forest classifier on datasets with and without codon frequencies.

Performance is measured on the task of discriminating ribosomal protein genes from the remaining protein-coding genes; each point represents a single run of four-fold cross-validation. Points above the diagonal line indicate an improvement in AUC score when codon frequencies are added. Such an improvement shows that ribosomal protein genes have a characteristic pattern of codon usage that cannot be derived from the composition of intergenic DNA, which represents the local nucleotide substitution patterns. The eleven genomes shown were cited as exhibiting no translational selection by each of the three previous multi-genome studies [23]–[25]; see Text S1, Appendix B. Figure S2 shows the same experiments, but with codon frequencies shuffled between genes.
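The comparison plotted in the figure can be illustrated with a minimal sketch, which is not the authors' pipeline. It assumes a classifier has already produced scores for each gene under the two feature sets. It computes AUC as the probability that a randomly chosen positive (ribosomal protein gene) is scored above a randomly chosen negative, then counts how many runs fall above the diagonal. The function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def points_above_diagonal(auc_pairs):
    """Count runs where AUC with codon frequencies (y) exceeds AUC without (x)."""
    return sum(1 for x, y in auc_pairs if y > x)


# Hypothetical toy data: 1 = ribosomal protein gene, 0 = any other gene.
labels = [1, 1, 0, 0, 0, 0]
# Hypothetical classifier scores from intergenic-composition features only.
scores_without = [0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
# Hypothetical scores after codon frequencies are added as features.
scores_with = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1]

run = (auc(labels, scores_without), auc(labels, scores_with))
print(run, points_above_diagonal([run]))
```

In the figure, each plotted point corresponds to one such pair of AUC values from a single cross-validation run.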

