Structural Phylogenomics Retrodicts the Origin of the Genetic Code and Uncovers the Evolutionary Impact of Protein Flexibility
Figure 8
Model of origin and evolution of archaic protein biosynthesis.
The flow diagram describes the evolutionary progression of protein biosynthesis and its diversification into ribosome-like processive and NRPS-like assembly-line systems. Translation starts with archaic non-specific synthetases capable of producing dipeptides and small peptides [7]. We assume that these primordial enzymes were originally peptides of fewer than 60 amino acid residues that emerged from a pool of small peptides (some of them ∼25 residues in length and loop-forming) through non-specific condensation reactions. These emergent molecules quickly gained structural properties and stable molecular functions, all of which were initially driven by enhanced persistence of the emerging cells [7]. The initial synthetases developed the ability to acylate a wide variety of cofactors (4′-phosphopantetheine, CoA, NADP and related derivatives, and short polynucleotides) in two-step catalytic reactions involving activated intermediates. Peptides could be further ligated into quasi-statistical proteins by the action of non-specific ligase derivatives of the synthetases. It is likely that prebiotic biases in dipeptide makeup resulting from amino acid chemical synthesis [47], [99] and prebiotic peptide formation [48], [49] acted as initial constraints on the emerging quasi-statistical system. In all cases, the quasi-statistical proteins that were formed achieved only Rossmannoid and bundle folded structures, constrained by primitive membranes, and were founders of the most basal fold structures of our phylogenomic timelines [7]. These included P-loop hydrolases and extended or tandem AAA-ATPase mechanoenzymes, oxidoreductases, chaperones and factors. The initial biases were then enhanced by fortuitous contacts between proteins that were beneficial for the primordial cellular system, including the protection of cofactors from degradation.
These contacts stabilized protein biosynthesis complexes that facilitated the initial enzymatic activities, and the resulting ensembles behaved very much like modules as the synthetases diversified and enhanced their catalytic toolkit. In some cases they went on to produce assembly-line complexes similar to modern NRPS systems. In other cases, some modules interacted with polynucleotides and specialized in aminoacylation reactions, leading to modern aaRS functions. Other modules specialized in processive functions, leading to the modern ribosome. In some cases, polynucleotides gained folded structures (minihelices, L-shaped conformations) that tuned the makeup of interacting protein structures. These initial chains became ancient genomes and important cofactors, and could also have gained functions as nucleic acid replicases, helicases, and ligases. The model that we propose here is fully compatible with a framework that explains the generation of modules and hierarchical structure in biology [100]. Under this framework, modules emerge through two phases of diversification of parts. In the first phase, parts interact weakly and associate diversely. As they diversify and compete, parts interact, and these interactions increasingly constrain their structure and associations, leading to modular structures. In the second phase of diversification, variants of the modules and their functions evolve and become new parts for a new cycle of generation of higher-level modules. In our model, parts are emerging proteins and modules are complexes that gain biosynthetic functions. The model highlights the biphasic patterns of diversification of the underlying framework, which we also see unfolding at the amino acid composition level (Fig. 5) and when studying protein flexibility [63].