Structural Phylogenomics Retrodicts the Origin of the Genetic Code and Uncovers the Evolutionary Impact of Protein Flexibility
Figure 6
Dipeptide makeup of ancient proteins.
A. The distribution of dipeptide compositions in proteins is remarkably conserved along the FF timeline. Stacked column charts describe the 400 possible dipeptides (ordered pairs of the 20 amino acids), grouped into the 9 sets defined by Group 1, 2 and 3 aaRS structures (1-1, 1-2, 2-1, etc.). The stacked columns on the right show the overall distribution of dipeptides across these sets for all 2,384 sequences, together with the set distribution expected under free permutation. Circles and asterisks mark dipeptide sets that are over- and underrepresented, respectively, according to chi-square (χ²) statistical contrasts.
B. Ancient FFs appearing before anticodon-binding domains (ndFF ≤ 0.2) were significantly enriched (P < 0.01) in dipeptides composed of amino acids specified by the ancient editing domains (Groups 1 and 2). The bar plot shows the amino acid frequencies of the 33 enriched dipeptides, the doughnut chart describes the composition of the enriched dipeptide sets, and the network displays dipeptide makeup, with peptide bonds (edges, weighted by the number of dipeptide types) connecting the participating amino acids (nodes, sized in proportion to their number of connections).
C. Mapping of enriched dipeptides onto protein structures. Box-and-whisker plots compare the distribution of the 33 dipeptides significantly enriched in early FFs (ndFF ≤ 0.2) with that of all dipeptides in regular and non-regular structural regions of the 2,384 protein sequences analyzed. Regular structures include helical regions (H) with α-helix (h), 3₁₀-helix (g) and π-helix (i) elements; strand regions (E) with β-strand (e) and β-bridge (b) elements; and turn/bend regions (T) with turns (t) and bends (s). Non-regular (unstructured) regions include loops (Ω). PBT amino acids can span different regions. Statistical differences between PBT were assessed using p-values from Mann-Whitney non-parametric tests.
Increases and decreases in central tendencies for the ancestral proteins are indicated with + and − signs, respectively, for structural sets with significant associations.
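Panel A contrasts observed dipeptide set counts with a free-permutation expectation. The caption does not specify the computation, so the Python sketch below is only one plausible reading, under stated assumptions: dipeptides are counted as overlapping adjacent residue pairs, the free-permutation expectation is approximated by the product of group frequencies (its large-sample limit), and each of the nine sets is tested against all other dipeptides with a 1-df chi-square contrast. The residue-to-group mapping `toy_groups` and all function names are hypothetical placeholders, not the actual Group 1, 2 and 3 assignments.

```python
from collections import Counter
from itertools import product
from math import erfc, sqrt

GROUPS = (1, 2, 3)  # aaRS structural groups -> 3 x 3 = 9 ordered dipeptide sets


def dipeptide_set_counts(sequences, group_of):
    """Count overlapping dipeptides by the ordered group pair of their residues."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # every adjacent residue pair
            if a in group_of and b in group_of:  # skip residues with no group
                counts[(group_of[a], group_of[b])] += 1
    return counts


def expected_set_counts(sequences, group_of, total):
    """Expected set counts if residues were placed independently (assumed free-permutation limit)."""
    residues = Counter(group_of[aa] for seq in sequences for aa in seq if aa in group_of)
    n = sum(residues.values())
    p = {g: residues[g] / n for g in GROUPS}
    return {(i, j): total * p[i] * p[j] for i, j in product(GROUPS, repeat=2)}


def contrast_sets(sequences, group_of, alpha=0.01):
    """Per-set 1-df chi-square contrast: one dipeptide set versus all others.

    Returns {(i, j): (observed, expected, p_value, call)} where call is
    "over", "under" or "ns" (not significant at alpha).
    """
    observed = dipeptide_set_counts(sequences, group_of)
    total = sum(observed.values())
    expected = expected_set_counts(sequences, group_of, total)
    result = {}
    for key, e in expected.items():
        o = observed.get(key, 0)
        if e == 0 or e == total:
            result[key] = (o, e, 1.0, "ns")  # contrast undefined; report as not significant
            continue
        # Two-cell chi-square (this set vs. the rest) simplifies to this expression.
        stat = (o - e) ** 2 * total / (e * (total - e))
        p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
        call = "ns" if p_value >= alpha else ("over" if o > e else "under")
        result[key] = (o, e, p_value, call)
    return result


# Toy usage with a placeholder mapping (NOT the real Group 1/2/3 assignment).
toy_groups = {"A": 1, "L": 2, "K": 3}
toy_calls = contrast_sets(["A" * 100 + "L" * 100 + "K" * 100], toy_groups)
for key, (o, e, p, call) in sorted(toy_calls.items()):
    print(f"{key[0]}-{key[1]}: observed={o} expected={e:.1f} {call}")
```

Testing one set against the rest mirrors the caption's per-set circles and asterisks; an exact permutation expectation (shuffling residues and recounting) could replace the product-of-frequencies approximation when sequences are short.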
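Panel C relies on Mann-Whitney tests. As an illustration only, the standard-library Python sketch below computes the U statistic with average ranks for ties and a two-sided p-value from the tie-corrected normal approximation; it is not necessarily the variant (exact or asymptotic, with or without continuity correction) used for the figure. The sample values and the hypothetical `direction` helper, which maps a significant shift to the caption's + and − signs, are assumptions for illustration.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (U for x, z score, p value). A positive z means x tends to be larger
    than y; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:
        return u, 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, z, erfc(abs(z) / sqrt(2))  # erfc(|z|/sqrt 2) = two-sided normal tail


def direction(x, y, alpha=0.05):
    """Hypothetical helper: '+' or '-' for a significant shift of x relative to y, else ''."""
    _, z, p = mann_whitney(x, y)
    if p >= alpha:
        return ""
    return "+" if z > 0 else "-"


# Hypothetical per-protein values, e.g. enriched-dipeptide frequencies in one structural class.
print(mann_whitney([0.10, 0.12, 0.15, 0.18], [0.20, 0.22, 0.25, 0.30]))
```

Rank-based tests like this compare central tendencies without assuming normality, which suits skewed per-protein frequency distributions such as those summarized by the box-and-whisker plots.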