Conceived and designed the experiments: JFMR AW. Performed the experiments: JFMR. Analyzed the data: JFMR. Wrote the paper: JFMR AW.
The authors have declared that no competing interests exist.
Genome-scale metabolic networks are highly robust to the elimination of enzyme-coding genes. Their structure can evolve rapidly through mutations that eliminate such genes and through horizontal gene transfer that adds new enzyme-coding genes. Using flux balance analysis we study a vast space of metabolic network genotypes and their relationship to metabolic phenotypes, the ability to sustain life in an environment defined by an available spectrum of carbon sources. Two such networks typically differ in most of their reactions and have few essential reactions in common. Our observations suggest that the robustness of the
Understanding the fundamental processes that shape the evolution of bacterial organisms is of general interest to biology and may have important applications in medicine. We address the question of how bacterial organisms acquire innovations, including drug resistance, that allow them to survive in new environments. We simulate the evolution of the metabolic network, the network of reactions that can occur inside a living organism. The metabolic network of an organism depends on the genes contained in its genome and can change by gaining genes from other organisms through horizontal gene transfer or by losing gene activity through mutations. Our observations suggest that the robustness to gene loss in
Organisms, especially microbes, thrive on organic nutrients with bewildering diversity: the vast majority of organic molecules can mean “food” for some species. From a microbe's perspective, acquiring the ability to survive on a new carbon source can make the difference between life and death; such an acquisition can thus be an important evolutionary innovation. We here study the properties of metabolic systems that facilitate such innovations.
The evolution of biological macromolecules has received serious attention for decades
A genotype can be represented in different ways: (A) as a metabolic network, (B) as a node in a genotype network, or (C) as a binary vector listing the reactions catalyzed. Genotypes on the genotype network (B) that are connected differ by only one mutation. The color of the genotype circles indicates their metabolic phenotype. Metabolic phenotypes are computed using FBA applied to 101 environments with different carbon sources. They can be represented as a binary vector listing the environments a genotype is viable in (D). Random evolutionary walks can be seen as paths on a genotype network. Two independent random walks are shown with the same starting genotype (G1) and two final genotypes (GF and GF'), passing through intermediate genotypes (e.g., G2) that differ by one mutation. Mutations are chosen at random. They can be additions or deletions of individual reactions from the corresponding metabolic network but they must not change the phenotype. The neighborhood of each genotype can be analyzed by characterizing the phenotype of the one-mutant neighbor genotypes (approximately 5800 neighbors per genotype). The number of genotypes in the genotype space is 2^5800. Each genotype is able to catalyze approximately 1000 out of 5800 possible reactions.
The functions and phenotypes of biological macromolecules are robust to genetic change. Such robustness has important implications for the evolutionary plasticity of molecules, the ability of molecules to evolve new properties. Through mutations that do not affect a molecule's function, vast regions of phenotype space can be explored, regions in which molecules with novel phenotypes can lie
For our purpose, a metabolic
Metabolic phenotypes, as defined here, can be computed from metabolic genotypes using flux balance analysis. Flux balance analysis is a computational tool that relies on both stoichiometric information about the chemical reactions occurring in a cell and an objective function such as the production of biomass precursors. For a given nutritional environment, it computes allowable rates at which individual reactions proceed in a metabolic steady state, and these rates in turn determine whether all necessary biochemical precursors can be produced. Its qualitative predictions (growth or no growth) are in good agreement with experimental data for well-studied model systems
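The optimization problem behind flux balance analysis can be written compactly. The following is the standard textbook formulation, given here as a sketch rather than as the exact setup used in this study. The symbols are introduced for illustration: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction rates (fluxes), c selects the biomass-producing reaction, and the bounds on v encode the nutritional environment, for example which carbon source may be taken up.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

The constraint S v = 0 expresses the metabolic steady state. Under this formulation, a genotype would be called viable in an environment if the optimal biomass flux is positive; any numerical threshold used to decide "positive" is an implementation choice not specified here.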
We here study the evolution of metabolic networks in the space of the genotypes just defined. Genotypes can change through the elimination of chemical reactions caused by loss of function mutations in enzyme-coding genes. Many such mutations do not abolish a network's ability to sustain life
Mutations and horizontal transfer can sometimes affect more than one enzyme-coding gene (reaction), but we focus here on the individual reaction as the elementary unit of change. Each such change transforms a network into one of its immediate neighbors differing from it by one reaction. We refer to all of a network's neighbors as a network's neighborhood. Methodologically, our approach bears resemblance to that of an earlier study
In this context, we ask several fundamental questions about the organization of genotype space, and about the ability of metabolic networks to find evolutionary innovations in this genotype space. How different can the organization of two metabolic networks be while still preserving similar phenotypes? How many mutational steps are needed to get from a network with a given phenotype to one with a very different phenotype? How different are the new phenotypes that a network encounters in its immediate neighborhood during evolution? The answers to these questions can not only elucidate why metabolic networks are robust to mutations
We begin our analysis with a simple phenotype, a metabolic network's ability to produce all biochemical precursors from a single carbon source, glucose, in an aerobic minimal medium (see
Are metabolic networks that are very different from the
(A) Distribution of the fraction of essential reactions in 1000 random networks viable in minimal or rich glucose-containing medium. (B) Distribution of the fraction of essential reactions shared among pairs of these 1000 random networks. (C) Rank plot of reaction essentiality. Reactions essential in all of the 1000 random viable networks are given the lowest rank of one. (D) The average fraction of essential reactions (vertical axis) as a function of the number of carbon sources a network can sustain life in (horizontal axis). Each point is an average of 100 networks (whiskers: 95% confidence interval).
How different are the networks that can sustain life in this simple environment? We addressed this question in two complementary ways. First, we asked how many essential reactions differ between each network pair drawn from the 1000 random viable networks we had generated previously. Specifically, we represented the set of all essential reactions by a binary vector. For each of the 1000 random viable networks, this vector contained a ‘1’ for a reaction that was essential in the respective network, and a ‘0’ for a reaction that was nonessential. We calculated the normalized Hamming distance between these vectors for each pair, which is the fraction of entries at which these vectors have different values. This distance ranges from zero if a network pair has completely identical essential reactions to one if a network pair has no essential reactions in common.
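The distance described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name and the six-reaction example vectors are hypothetical, and the function implements only the plain per-entry definition (fraction of positions at which two vectors differ). Any additional normalization needed for two networks with disjoint essential sets to reach a distance of exactly one is not reproduced here.

```python
def normalized_hamming(u, v):
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        raise ValueError("vectors must be non-empty")
    # Count mismatching positions and divide by the vector length.
    return sum(a != b for a, b in zip(u, v)) / len(u)


# Hypothetical essentiality vectors over six reactions (1 = essential).
net_a = [1, 0, 1, 1, 0, 0]
net_b = [1, 1, 0, 1, 0, 0]

# The vectors differ at two of six positions.
distance = normalized_hamming(net_a, net_b)
```

The same function applies unchanged to genotype vectors that record the presence or absence of each reaction.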
We next ranked all reactions according to the number of networks (among 1000) in which they were essential. Reactions essential in all 1000 networks received the lowest rank, and reactions that were essential in successively fewer networks received increasingly larger ranks. This ranking indirectly estimates the abundance of alternative pathways around any given reaction in a random viable metabolic network. If there are many alternative pathways, then the reaction will rarely appear as essential; if there are no alternate pathways, the reaction will appear as essential in all metabolic networks. The majority (4550) of reactions were never essential. Among the 1420 reactions that were essential in at least one network, only a small minority of 7.3% (103) reactions were essential in all networks. As an example,
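The ranking procedure can be illustrated with a short Python sketch. It assumes that each network is summarized by the set of its essential reactions and that reactions with the same count share a rank (dense ranking); the text does not say how ties were handled, so that choice is an assumption, as are the function name and the toy reaction labels.

```python
from collections import Counter


def essentiality_ranks(essential_sets):
    """Rank reactions by the number of networks in which they are essential.

    Rank 1 goes to the most frequently essential reactions; ties share a
    rank (an assumption). Reactions that are never essential are omitted.
    """
    counts = Counter(r for s in essential_sets for r in s)
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank_of_count = {c: i + 1 for i, c in enumerate(distinct_counts)}
    ranks = {r: rank_of_count[c] for r, c in counts.items()}
    return ranks, counts


# Three hypothetical networks, each given by its set of essential reactions.
essential_sets = [{"r1", "r2"}, {"r1", "r3"}, {"r1", "r2", "r4"}]
ranks, counts = essentiality_ranks(essential_sets)
```

In this toy input, r1 is essential in every network and receives rank one, while r3 and r4 are each essential in a single network and share the largest rank.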
Color-coded map of reactions in central energy metabolism that appear rarely (blue) or frequently (red) as essential in 1000 random viable metabolic networks. The color is in logarithmic scale indicating that most reactions even in this most central part of metabolism are essential only in a small fraction of networks with a given metabolic phenotype.
To validate our analysis of reaction essentiality with empirical data, we tested the following prediction: If a reaction is frequently essential in our random viable metabolic networks, then its enzyme-coding genes should also occur in a large number of different genomes. This is indeed the case, as we show in
In a second effort to characterize the plasticity of network organization, we asked how distant from the
(A) Distribution of maximum genotype distance between 1000 networks that are the end-points of random walks leading away from the initial (E. coli) network while preserving the metabolic phenotype. (B) Maximum genotype distances (vertical axis) between initial metabolic networks able to sustain life on a given number of carbon sources (horizontal axis) and 1000 final random viable metabolic networks. For each number of carbon sources 100 random walks of 10^4 mutations were carried out starting from 10 different initial networks (whiskers: 95% confidence interval). (C) The distribution of minimal genotype distance between pairs of networks with different metabolic phenotypes required to sustain life on at least one carbon source. (D) Average minimal genotype distance (the mean of the distribution in (C)) as a function of the number of carbon sources. The error bars are too short to be visible in this plot.
An environment in which metabolic networks have to synthesize every single biochemical precursor is demanding. Thus, our observations might depend strongly on the nature of this environment. However, this is not the case. We also examined a rich medium in which 36 biochemical precursors are provided for the cell (see
Taken together, the following picture emerges from these observations. Networks that have the ability to sustain life on a particular carbon source have many neighbors in genotype space with the same ability. By mutationally stepping from neighbor to neighbor (through addition and deletion of chemical reactions) network organization can change fundamentally without losing this ability. Two networks with this ability can contain very different sets of reactions, and very different essential reactions. Because networks with the ability to sustain life in a given environment are connected through their neighbors in genotype space (see
We next turn to more complex phenotypes, namely the ability of a network to sustain life if any one of multiple carbon sources is provided in an otherwise minimal environment. We here focus on the 101 potential carbon sources annotated to have associated transport reactions in
We next studied several properties of metabolic networks that relate to their ability to evolve new phenotypes. The first such property regards the minimum genotype distance of two metabolic networks with arbitrary, different phenotypes. If this distance is typically large, then it would be very difficult to reach any one phenotype from a network with a different phenotype through a modest number of genetic changes. To determine this distance, we first created a pair (G1, G2) of metabolic network genotypes with randomly chosen different phenotypes, as described in the
Does the genotypic plasticity of metabolic networks facilitate the discovery of novel metabolic abilities? To address this question, we examined the novel metabolic phenotypes accessible to networks that are subject to phenotype-preserving evolutionary change. By phenotypes “accessible” to a network, we here mean all the phenotypes that can be found in the neighborhood of this network. These are novel phenotypes that can be easily reached through a single, small genetic change. Specifically, we first carried out a random walk starting from a network with a specific metabolic phenotype, and counted the cumulative unique number of phenotypes that occurred in the neighborhood of this random walker. That is, if a phenotype occurred twice, either in the neighborhood of the same network, or in the neighborhood of a network encountered previously during the random walk, we counted it only once.
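The counting rule just described amounts to maintaining one running set of phenotypes across the whole walk. A minimal Python sketch follows; it assumes that the phenotypes of each neighborhood have already been computed and are given as tuples, and the function name and the three-step toy walk are hypothetical.

```python
def cumulative_unique_phenotypes(neighborhoods):
    """Running count of distinct phenotypes seen in successive neighborhoods.

    `neighborhoods` holds, for each step of the walk, the phenotypes found
    among the one-mutant neighbors of the network visited at that step.
    """
    seen = set()
    totals = []
    for neighborhood in neighborhoods:
        seen.update(neighborhood)  # repeats within or across steps add nothing
        totals.append(len(seen))
    return totals


# Toy walk of three steps; each phenotype is a viability vector over three
# hypothetical carbon sources.
walk = [
    [(1, 0, 0), (1, 1, 0), (1, 1, 0)],
    [(1, 1, 0), (1, 0, 1)],
    [(1, 0, 0)],
]
totals = cumulative_unique_phenotypes(walk)
```

Whether the walker's own phenotype is excluded from the count is left open here; filtering it out before calling the function would restrict the count to novel phenotypes.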
(A) shows the average cumulative number of phenotypes (vertical axis) found in the neighborhood of an evolving network as a function of the number of mutations (horizontal axis) the network experienced during its evolution; (B) shows the fraction of the phenotypes in the neighborhood of the evolving network (
Second, we compared the phenotypes in the neighborhood of (i) an evolving network
Third, we examined the neighborhoods of multiple end points (orange circle in
Finally, we also examined how the accessibility of novel phenotypes depends on the phenotypic complexity of the evolving networks themselves, that is, on the number of carbon sources that they can support life on. In principle, all 2^101 phenotypes are accessible from any metabolic genotype through a single mutation, regardless of the number of carbon sources the genotype is viable in (see
Metabolic networks can evolve through the elimination of individual reactions by mutation, and through the addition of new reactions by horizontal gene transfer. We here explored a vast space of metabolic network genotypes through random changes of individual reactions that preserve a network's metabolic abilities. The ability of flux balance analysis to determine metabolic phenotypes (a network's ability to sustain life in a well-defined environment containing specific carbon sources) allowed us to characterize the relationship between metabolic genotypes and phenotypes. We find that metabolic networks with the same phenotype show enormous genetic plasticity, and that this plasticity aids in the evolution of novel metabolic abilities.
Multiple experimental and computational studies show that a large fraction of enzyme-coding genes are dispensable in genome-scale metabolic networks. These networks continue to sustain life even upon removal of many apparently central and important reactions
Our observations go beyond preceding work which showed that a reaction's essentiality may depend on the environment
Gene essentiality thus strongly depends on a network's genotype, which is highly malleable. Even organisms with similar metabolic abilities may thus show very different dispensable genes in a given environment. These observations have implications for the design of antimetabolic drugs that inhibit specific metabolic reactions. Specifically, an evolutionary approach like ours may be highly useful in identifying reactions that are essential in most networks with a given metabolic phenotype, as a precursor to rationally designing drugs inhibiting these reactions. The more frequently essential a reaction is, the smaller the likelihood that a cell can circumvent it through addition or deletion
Our analysis shows that vastly different networks with the same phenotype can be connected through paths of single mutations (reaction additions/deletions) in genotype space. Specifically, these paths can traverse more than three quarters of genotype space without destroying a given phenotype. This phenomenon does not depend strongly on the evolutionary constraints on a metabolic network, that is, on the number of carbon sources a network is required to sustain life on. These observations are reminiscent of genotype networks or neutral networks that have been characterized for RNA, protein, and transcriptional regulation circuits
We here provide two lines of evidence that genotype networks may also facilitate the evolution of new metabolic phenotypes, the ability to survive on previously not utilizable carbon sources. First, we show that networks with different and arbitrary phenotypes can be found close together in genotype space. This means that from any one network, only a small fraction of genotype space needs to be traversed to find any given, novel phenotype. Second, we also analyze the neighborhood of different neutral networks with the same phenotype. This neighborhood consists of all networks that differ in only one reaction from a focal network. They are thus accessible from this network through a single mutation. We find that the neighborhoods of different networks contain very different novel phenotypes. This means that by traversing a large fraction of genotype space without changing the phenotype, one can render different novel phenotypes accessible (
We next motivate the choice of metabolic network sizes for our work. Flux balance analysis has been used to show that a significant number of reactions in
Flux balance analysis has limitations in how precisely it can predict growth or by-product secretion after gene knockouts
The potential problem of limited and likely biased information about the set of biochemical reactions that occur in nature does not affect our results qualitatively. The reason is that any increase in the number of known biochemical reactions will cause the appearance of alternative pathways, lowering the number of essential reactions, and thus increasing the robustness and the plasticity of metabolic networks.
Aside from these caveats, the biggest limitation of the approach presented here lies in its computational demands. Determining the metabolic phenotypes of networks in the neighborhood of a single genome-scale network for 101 carbon sources requires the solution of 5.85×10^5 (≈ 101×5800) complex linear programming problems
In sum, the approach proposed here can provide various insights into the organization of metabolic networks. It demonstrates that the architecture of such networks shows high plasticity, even for single environments, a property that facilitates the evolution of new metabolic functions. It suggests a method to target metabolic reactions for rational drug design, and shows that the plasticity of metabolic networks creates both opportunities and constraints for the evolution of novel metabolic abilities.
We explore the vast space of metabolic networks by long random walks that leave a network's ability to synthesize all essential biomass components unchanged. Each step of the random walks we use has two parts. The first part consists of mutation, the deletion of a randomly chosen reaction from a network, or the addition of a new randomly chosen reaction from the global reaction set above. We constrain variation in the number of reactions in this random walk by means of a bias in the choice of mutation that depends linearly on the number of reactions in the metabolic network (see
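The mutation step and the requirement that viability be preserved can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation. The specific linear form of the size bias, the `slope` and `target_size` parameters, and the toy viability test (which stands in for a flux balance analysis computation) are all hypothetical.

```python
import random


def random_walk_step(network, universe, is_viable, target_size, rng, slope=0.001):
    """Propose one reaction deletion or addition; keep it only if viable."""
    # Assumed bias: the deletion probability rises linearly with the excess
    # of the network size over target_size, clamped to the interval [0, 1].
    p_delete = min(1.0, max(0.0, 0.5 + slope * (len(network) - target_size)))
    absent = sorted(universe - network)  # sorted for reproducibility
    if network and (not absent or rng.random() < p_delete):
        candidate = network - {rng.choice(sorted(network))}
    else:
        candidate = network | {rng.choice(absent)}
    # Reject any mutation that abolishes viability.
    return candidate if is_viable(candidate) else network


# Toy setup: 50 possible reactions, two of which are always required.
universe = {f"r{i}" for i in range(50)}
core = {"r0", "r1"}


def toy_viable(net):
    """Hypothetical stand-in for an FBA-based viability check."""
    return core <= net


rng = random.Random(0)
network = {f"r{i}" for i in range(20)}
for _ in range(1000):
    network = random_walk_step(network, universe, toy_viable,
                               target_size=20, rng=rng, slope=0.05)
```

With the toy parameters, the clamped bias keeps the network size between 10 and 30 reactions, and the rejection rule guarantees that the two core reactions are never lost.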
Methods are described in greater detail in the
Detailed description of simulation conditions and methods
(0.14 MB PDF)
Random walks in genotype space. a) Autocorrelation function of growth flux in an unbiased random walk of 10,000 generations starting from the E. coli metabolic network. The autocorrelation function was calculated for the last 5,000 generations. b) A sample trajectory of a random walk starting from the E. coli metabolic network, showing both the number of reactions in the evolving network, as well as the genotype distance (normalized Hamming distance) between the evolving network and the initial network. When the genotypes of both networks are represented by binary vectors indicating the presence or absence of reactions (see
(0.07 MB TIF)
The fraction of reactions essential in a complex environment decreases with environmental complexity. Average fraction of essential reactions (vertical axis) as a function of the number of carbon sources a network can sustain life in (horizontal axis). A reaction is called essential here, if it is essential in an environment that contains all of the carbon sources a network is required to grow on. For each number of carbon sources 10 different initial networks were generated, as described in
(0.06 MB TIF)
Networks that can grow on more carbon sources encounter more novel phenotypes during their evolution. The average cumulative number of phenotypes (vertical axis) found in the neighborhood of an evolving metabolic network at the endpoints of 100 phenotype-preserving random walks is shown as a function of the number of carbon sources the initial networks can grow on. For each number of carbon sources shown, the data are an average over 10 independently generated initial networks, and over 10 random walks starting from each of these 10 networks.
(0.06 MB TIF)
Reaction essentiality and gene appearance in prokaryotic genomes. Correlation of frequency of reaction essentiality in random metabolic networks and number of genomes carrying an enzyme-coding gene catalyzing that reaction. Pearson's r = 0.45; p = 2.2×10^−16. This analysis uses enzyme-coding genes from 875 prokaryotic genomes in the KEGG database
(0.10 MB TIF)
Reactions in tetrahydrofolate biosynthesis and their essentiality. We found that the reaction dihydropteroate synthetase, a target of sulfonamides, is essential in 41% of the metabolic networks we studied, while the other reaction producing dihydropteroate is essential in 56.1% of networks. In the remaining 2.9% of networks, both reactions appear, but neither is essential. These observations have a straightforward explanation. Dihydropteroate is an essential metabolite. Because only two alternative reactions exist to make dihydropteroate, whenever one of these reactions is missing, the other is an essential reaction. Whenever both reactions are present, neither reaction is essential. For the production of tetrahydrofolate from dihydrofolate, there exist, similarly, two parallel dihydrofolate reductase reactions. These reactions are the target of trimethoprim. The reactions are distinguished only by the molecule that acts as the electron donor, either NADH or NADPH. Individually, these reactions appear as essential in only 30%–40% of networks. In addition, only 66.2% of networks cannot tolerate the removal of both reactions. The reason is that there are alternative paths (not shown) that bypass the direct production of tetrahydrofolate from dihydrofolate.
(0.06 MB TIF)
The connectedness of metabolic networks with the same phenotype facilitates access to new metabolic phenotypes. The rectangle symbolizes genotype space, and the grey circles symbolize metabolic networks with a given metabolic phenotype. The colored circles stand for metabolic networks with a novel phenotype. Different novel phenotypes (different colors) are accessible from different networks (points) in genotype space with the same phenotype.
(0.08 MB TIF)
List of reactions that appear frequently as essential in random metabolic networks
(0.43 MB XLS)
JR thanks Jeremiah Wright for helpful discussions. We would also like to thank two anonymous reviewers for their thoughtful comments.