The Emergence and Early Evolution of Biological Carbon-Fixation

The fixation of CO2 into living matter sustains all life on Earth, and embeds the biosphere within geochemistry. The six known chemical pathways used by extant organisms for this function are recognized to have overlaps, but their evolution is incompletely understood. Here we reconstruct the complete early evolutionary history of biological carbon-fixation, relating all modern pathways to a single ancestral form. We find that innovations in carbon-fixation were the foundation for most major early divergences in the tree of life. These findings are based on a novel method that fully integrates metabolic and phylogenetic constraints. Comparing gene-profiles across the metabolic cores of deep-branching organisms and requiring that they are capable of synthesizing all their biomass components leads to the surprising conclusion that the most common form for deep-branching autotrophic carbon-fixation combines two disconnected sub-networks, each supplying carbon to distinct biomass components. One of these is a linear folate-based pathway of CO2 reduction previously only recognized as a fixation route in the complete Wood-Ljungdahl pathway, but which more generally may exclude the final step of synthesizing acetyl-CoA. Using metabolic constraints we then reconstruct a “phylometabolic” tree with a high degree of parsimony that traces the evolution of complete carbon-fixation pathways, and has a clear structure down to the root. This tree requires few instances of lateral gene transfer or convergence, and instead suggests a simple evolutionary dynamic in which all divergences have primary environmental causes. Energy optimization and oxygen toxicity are the two strongest forces of selection. The root of this tree combines the reductive citric acid cycle and the Wood-Ljungdahl pathway into a single connected network. This linked network lacks the selective optimization of modern fixation pathways but its redundancy leads to a more robust topology, making it more plausible than any modern pathway as a primitive universal ancestral form.


Supporting Discussion
The phylometabolic analysis described in the main text suggests several important experiments. We outline four main sets here.

Confirmation of two alternative C1 reductive pathways in deep-branching, non-WL organisms
The first class of experiments concerns confirming the operation of the direct reductive pathway outside of WL organisms. We highlight several non-WL organisms here and in the main text with a complete gene complement in the reductive pathway that can be examined for this purpose, including members of the Thermotogae, Nitrospirae, and Chloroflexi. In particular, however, we focus discussion here on elucidating the mechanisms involved in the alternate form of the reductive pathway that we find in a large number of strains. In this alternate form the gene for reaction 2, the ATP-consuming synthesis of N10-formyl-THF (see Fig. S1), is the lone missing gene in an otherwise complete pathway. Remarkably, nearly a quarter of all bacterial strains (and in many clades a much higher fraction still) show the presence of this version of the reductive pathway. As shown in Supplementary Table 2, we find that a large majority of these strains instead possess a gene for N5-formyl-THF cycloligase, an enzyme whose origin and metabolic role remain very poorly understood. We also note the very broad distribution of this gene across all deep bacterial and archaeal clades.
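The distinction drawn above between a complete pathway and the alternate form can be sketched as a simple presence/absence check on gene profiles. The following is a minimal illustration only, not the procedure used to build Supplementary Table 2. The function name, the category labels, and the example reaction set are hypothetical, and the set of reactions that make up the pathway is supplied by the caller rather than assumed. The label 2A for the substitute gene follows the labelling used later in this discussion.

```python
from typing import Iterable


def classify_reductive_profile(
    present: Iterable[str],
    pathway: Iterable[str],
    gap: str = "2",
    substitute: str = "2A",
) -> str:
    """Classify one strain's gene profile for the direct reductive pathway.

    present    -- reaction labels for which the genome has an annotated gene
    pathway    -- reaction labels that make up the pathway (caller-supplied)
    gap        -- the reaction allowed to be the lone missing gene
    substitute -- the gene that may stand in for the missing reaction
    """
    have = set(present)
    missing = set(pathway) - have
    if not missing:
        return "complete"
    if missing == {gap}:
        # Lone gap at the named reaction: "alternate" if the substitute
        # gene is present, otherwise just a pathway with a single gap.
        return "alternate" if substitute in have else "gap_only"
    return "incomplete"


# Hypothetical example; this reaction set is illustrative, not from Fig. S1.
example_pathway = ["1", "2", "3", "4"]
print(classify_reductive_profile(["1", "3", "4", "2A"], example_pathway))
```

Counting the strains that fall into each category, clade by clade, would give the kind of fractions quoted in the text, provided the annotation of each gene is trusted.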
Based on these observations we propose a possible alternate route to formate incorporation and reduction on THF in deep-branching bacteria and archaea, through N5 rather than through N10 (see Fig. S1). This route would involve either a previously unidentified N5-formyl-THF synthase, or broader functionality of the N5-formyl-THF cycloligase than previously annotated (each shown as dashed lines in Fig. S1). The use of an ATP in the synthesis of N10-formyl-THF is necessary due to the low electron density and low pKa of the N10 in folates. The less stable nature of the resulting C-N10 bond likely explains the higher biosynthetic capacity of THF relative to H4MPT, where a higher pKa of the N10 stabilizes the C-N10 bond and (together with the action of the methanofuran cofactor) removes the need for ATP hydrolysis in its formation. Similarly, N5-formyl-THF cycloligase also requires an ATP in creating the C-N10 bond in the cyclization step. The proposed alternate route would thus conserve the total consumption of 1 ATP in the synthesis of N5,N10-methylene-THF from free formate and THF, and would only alter the order of the C-N bond formations.
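The ATP bookkeeping behind this argument can be laid out step by step. The sketch below is illustrative only and rests on two assumptions that the text does not spell out. First, both routes are taken to pass through N5,N10-methenyl-THF, as in standard folate chemistry. Second, the proposed formylation at N5 is taken to need no ATP, which is what the conserved total of 1 ATP implies. The class and variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    description: str
    atp: int                    # ATP hydrolysed in this step
    hypothetical: bool = False  # True for the proposed, unconfirmed step


# Established route: the C-N10 bond is formed first, at the cost of 1 ATP.
N10_ROUTE: List[Step] = [
    Step("formate + THF -> N10-formyl-THF (reaction 2)", atp=1),
    Step("N10-formyl-THF -> N5,N10-methenyl-THF", atp=0),
    Step("N5,N10-methenyl-THF -> N5,N10-methylene-THF", atp=0),
]

# Proposed route: the C-N5 bond is formed first (assumed ATP-free), and the
# ATP is spent by the cycloligase when it closes the C-N10 bond.
N5_ROUTE: List[Step] = [
    Step("formate + THF -> N5-formyl-THF", atp=0, hypothetical=True),
    Step("N5-formyl-THF -> N5,N10-methenyl-THF (cycloligase)", atp=1),
    Step("N5,N10-methenyl-THF -> N5,N10-methylene-THF", atp=0),
]


def total_atp(route: List[Step]) -> int:
    """Sum the ATP consumed along a route."""
    return sum(step.atp for step in route)


print(total_atp(N10_ROUTE), total_atp(N5_ROUTE))
```

Under these assumptions the two totals are equal, and the routes differ only in which C-N bond carries the ATP cost.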
Organisms in several clades could be studied for this alternate route, including members of the Aquificales (the Aquificaceae family), the Chlorobiales, or the Cyanobacteria. If an N5 incorporation route is active in these or any other deep clades, it may reflect an intermediary form between the N10 incorporation route in bacteria and the N5 incorporation route of methanogens. Experiments to elucidate this chemistry could thus lead to insights into the evolutionary diversification in this important direct incorporation pathway of cellular carbon.

Phenotype switching and evolution of C1 metabolism in Cyanobacteria
A related class of experiments is one that examines the possibility of phenotype switching in the Cyanobacteria. It is known that Cyanobacteria synthesize glycine through the glyoxylate pathways (reaction 12 in Fig. S1), but growth is nonetheless observed in mutants where the formation of glyoxylate is inhibited. More generally, it is also known that certain cyanobacterial strains can live under anoxic conditions, where 2-phosphoglycolate (the precursor to glyoxylate) is not formed. Furthermore, as outlined above, a majority of cyanobacterial strains we examined have gene profiles similar to Aquificales and Chlorobiales in the reductive pathway to glycine and serine, missing only the gene for reaction 2 but having a gene for reaction 2A.
This suggests that under anoxic conditions, or in mutation experiments, cyanobacteria may switch from the synthesis of glycine and serine in the glyoxylate pathways to the reductive pathway. This phenotype switch would in essence represent reverting from the modern autotrophic form in which all carbon is fixed through the Calvin-Benson-Bassham (CBB) cycle to the hybrid ancestral cyanobacterial form where a small fraction of carbon is fixed through the reductive pathway in parallel to the CBB cycle.

Diversification in Archaeal pterin-C1 chemistry
A third class of experiments involves elucidating in greater detail the pterin-C1 chemistry in archaea. Many archaeal genomes lack only the THF-interconversion genes associated with the direct reductive pathway, and, in addition, the diversity of pterins in archaea is greater than in bacteria. Variations arise both in the arylamine-appendage attached at the N10 position, and in the presence of methyl-groups at either of the R2 positions as seen in Fig. S1. The absence of a carbonyl group immediately adjacent to the arylamine group in H4MPT raises the electron density and pKa of the N10-nitrogen relative to THF, and this has been suggested to be the central factor in eliminating the use of ATP in the incorporation of formate and in the lower biosynthetic capacity of H4MPT in methanogens. However, it has also been noted that steric hindrance between the two methyl-groups on H4MPT results in a reduction of the entropic effects in reactions involving this pterin, which may also explain some part of the difference relative to THF-chemistry. We find a complete absence of the glycine cycle (reactions 5-8) only in methanogenic archaeal clades, with an otherwise broad presence across clades that possess pterins that share some, but not all, of the substitution characteristics of H4MPT. In addition, as mentioned above, we find a high abundance of N5-formyl-THF cycloligase across the archaeal domain, in many cases where all other THF interconversion enzymes are absent.
Characterizing archaeal pterin chemistry in greater detail is thus important in several ways for better understanding the evolution of carbon-fixation within the Archaea. First, it would confirm the use (or non-use) of the direct reductive pathway to glycine and serine, and thus the ancestral status of this route, in this domain. In addition, systematically examining variations in the arylamine-appendage and the substitution of methyl-groups in archaeal pterins, and how they relate to the synthesis of glycine and serine, would help in characterizing the relative contributions of entropic and electron-withdrawal effects to the biosynthetic capacity of the pterin group. This could in turn allow a better understanding of the greater diversity of pterin chemistry in the archaea relative to bacteria, as well as of the emergence of methanogens within the Euryarchaeota (as discussed in the main text).

Amino acid-specific isotope ratios and the decomposition of isotope effects in C1 reduction
A fourth, and last, class of experiments regards using the direct reductive synthesis of glycine and serine as a means to understand variations in the isotopic shifts that are observed in different autotrophic pathways. The WL pathway is commonly cited as producing the greatest depletion in 13C of all autotrophic pathways, with a shift of order -40 per mil relative to input CO2. Since the full WL pathway comprises 9 reactions, it is of interest to disaggregate this isotope shift, to assign contributions to specific reactions performed on pterin cofactors and to the acetyl-CoA (ACA) synthase reaction at the culmination of the pathway. Sensitivity to substrate mass may contribute to more precise characterization of the reaction mechanism in the Ni-substituted Fe-S cluster of the ACA synthase, including the interaction with its associated cobalt-tetrapyrrole cofactor, or may highlight the tuning of nitrogen-bonding properties on pterins. Since direct C1 reduction has been proposed to have mineral origins, knowledge of either type may provide clues to the transfer of reduction steps from mineral substrates to pterin/folate cofactors, and transfer of catalysis from mineral or chelated metal centers to an organometallic enzyme.
The broad and diversified role we have asserted for direct C1 reduction among deep-branching organisms, and particularly the occurrence of reduction on folates with or without the ACA synthase reaction, provide an opportunity for a low-cost comparative analysis to partially separate isotope effects. Identification of amino acid-specific isotope ratios would also further help refine the use of biomass compositions in characterizing newly observed species in terms of the carbon-fixation strategies used.
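As a rough illustration of what such a comparison could yield, the sketch below separates a total shift into two contributions by simple difference. This is first-order bookkeeping under a stated assumption, not a method taken from the analysis: it assumes that the shift from folate-based reduction and the shift from ACA synthase add linearly, and that both measurements refer to comparable carbon positions. The function name and the input values are hypothetical.

```python
from typing import Dict


def separate_shifts(shift_with_aca: float, shift_without_aca: float) -> Dict[str, float]:
    """Split a total 13C shift (per mil) into two contributions by difference.

    shift_with_aca    -- shift measured where folate reduction feeds ACA synthase
    shift_without_aca -- shift measured where folate reduction runs without it
    Assumes the two contributions add linearly (a first-order approximation).
    """
    return {
        "c1_reduction": shift_without_aca,
        "aca_synthase": shift_with_aca - shift_without_aca,
    }


# Hypothetical inputs: -40 echoes the order of magnitude quoted for the WL
# pathway, while -25 is an invented value used only to exercise the function.
print(separate_shifts(-40.0, -25.0))
```

If the additivity assumption fails, for example because fractionation depends on how completely each substrate pool is consumed, the difference would only bound the ACA synthase contribution rather than measure it.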
Any isotope shift that accumulates in the direct C1 reduction sequence prior to the transfer of methylene will be restricted to the α-carbon of glycine and the α- and β-carbons of serine, and to the serine-derived carbon of cysteine and tryptophan, in non-WL organisms that use the full reductive pathway to fix part of their carbon. These isotope shifts will be present in all fixed carbon of WL organisms. Isotope shifts specific to ACA synthase will be present in all carbon except the few positions in glycine, serine, cysteine and tryptophan just noted, in acetogens and other autotrophic WL organisms that use the direct reductive route to glycine and serine. The same ACA synthase isotope shift will be present in all carbon in methanogens and other autotrophic WL organisms that use the oxidative pathway to glycine and serine.

Figure S1. Summary of C1 metabolism. Red and numbered reactions represent (as we argue here) the ancestral route to glycine, serine, and methyl-group chemistry. In the Wood-Ljungdahl (WL) pathway these reactions couple to the synthesis of acetyl-CoA as the principal route to carbon-fixation, and to synthesis of either acetate (in acetogens) or methane (in methanogens) for energy metabolism. The difference in substitutions and sidechains (R1 and R2) on the basic pterin-moiety between acetogens and methanogens further results in the absence of the lipoic acid-based glycine cycle (reactions 5-8) in the latter. Dashed lines represent suggested alternate routes to formate incorporation through N5 of THF (see Supporting Discussion). Abbreviations: H4MPT = tetrahydromethanopterin; THF = tetrahydrofolate; Ni = nickel-center on acetyl-CoA synthase; CoA = coenzyme A; CoB = coenzyme B; CoM = coenzyme M; Fd = ferredoxin; Ad = adenosyl; XH2 = reductant; Pi = orthophosphate.

Clade b 1 2 3 4 5 6 7 8 9 10 11 12 2A HTH Aq