It is suggested that the degree distribution for networks of the cell-metabolism of simple organisms reflects a ubiquitous randomness. This implies that natural selection has exerted no or very little pressure on the network degree distribution during evolution. The corresponding random network, here termed the blind watchmaker network, has a power-law degree distribution with an exponent γ≥2. It is random with respect to a complete set of network states characterized by a description of which links are attached to a node as well as a time-ordering of these links. No a priori assumption of any growth mechanism or evolution process is made. It is found that the degree distribution of the blind watchmaker network agrees very precisely with that of the metabolic networks. This implies that the evolutionary pathway of the cell-metabolism, when projected onto a metabolic network representation, has remained statistically random with respect to a complete set of network states. This suggests that even a biological system such as the cellular metabolism, which has developed an enormous specificity through natural selection, can at the same time display well-defined characteristics emanating from the ubiquitous inherent random element of Darwinian evolution. The fact that completely random networks may also have scale-free node degree distributions gives a new perspective on the origin of scale-free networks in general.
Citation: Minnhagen P, Bernhardsson S (2008) The Blind Watchmaker Network: Scale-Freeness and Evolution. PLoS ONE 3(2): e1690. doi:10.1371/journal.pone.0001690
Academic Editor: Ji Zhu, University of Michigan, United States of America
Received: November 13, 2007; Accepted: January 27, 2008; Published: February 27, 2008
Copyright: © 2008 Minnhagen, Bernhardsson. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the Swedish Research Council through contract 50412501.
Competing interests: The authors have declared that no competing interests exist.
A network is a representation of who or what is connected to, or influenced by, whom or what. To characterize the structure of a network, researchers often measure its degree distribution N(k), the number of nodes with k links attached. Numerous studies have found that real-world networks often have very broad degree distributions for larger k, $N(k)\sim k^{-\gamma}$. These fat-tailed distributions approximate a power law. Such networks are also called scale-free networks because a power-law tail indicates the lack of an intrinsic characteristic degree size. Biological networks are particularly interesting because their structure has, directly or indirectly, arisen through the process of evolution by natural selection. These networks have been constructed as if by a blind watchmaker, through the interplay between random stochastic evolution and natural selection. So what can we learn from the observation that a biological network, such as the network of the metabolism of a cell, has a power-law degree distribution? In order to answer this question we define and investigate a blind watchmaker network. We demonstrate that the empirically observed degree distributions for networks of the cellular metabolism of simple organisms are good approximations of this random structure. Previous authors have ascribed the scale-free structure of biological networks to various aspects of the evolutionary process: either the scale-free network structure has been suggested to confer an evolutionary advantage, or the elementary mechanism for growth of the network has been suggested to generate a priori a scale-free network.
Our findings are to the contrary: The close correspondence between the blind watchmaker network and the structure of empirical networks implies that the evolutionary pathway leading to the construction of the cell-metabolism, when projected onto our network representation, is statistically random with respect to a complete set of network states (see Figs. 1 and 2). Thus our findings provide a new perspective on the origin of scale-free network structures.
Figure 1. The left half shows a network of nodes and links with the link-ends enumerated. The right half shows the equivalent balls-in-boxes model. The vertical position of the balls in a box gives the order in which they arrived in the box, or more generally a ranking.
Figure 2. Panels a-d show the most unbiased distributions n(k) obtained by the algorithm method. a) The variational solution $n(k) = A\exp(-bk)/k$ for the statistical states of the CBB-model compared to the average solution from the algorithm method for N = 1000 and M/N = 4. The fact that the two solutions agree reflects that the algorithm contains the least possible information compatible with the constraints. b) The same comparison for the blind watchmaker version of the CBB-model. In this case the variational solution is $n(k) = A\exp(-bk)/k^{2}$ and the agreement again reflects that no additional information is contained in the algorithm. The two curves in the panel correspond to N = 1000 and M/N = 2 and 4, respectively. As seen, the distribution becomes more power-law-like with increasing ratio M/N. c) The corresponding network solution. The most unbiased solution in this case gives the blind watchmaker network structure. The variational solution cannot be simply obtained for the network case because of the network constraints. By contrast, these constraints are easily incorporated into the algorithm method. The panel shows the average solution for M/N = 4 and N = 1000. The solution has a fat power-law tail $n(k)\sim k^{-\gamma}$ with γ≈2.1. d) A real network is just one realization n(k) and not an average. The panel shows a single network for N = 1000 and M/N = 4. A single network always contains large split-off nodes.
The cells living today have evolved into their present forms during some $4\times 10^{9}$ years. An important part of the cell is its metabolism, which provides the cell with substances necessary for sustaining life. It is a chemical mini-factory usually involving of the order of 500 to 1000 substances in a complex network of metabolic reactions. Some substances, like water and NAD+, take part in very many chemical reactions producing new substances either needed in the chemical mini-factory itself to produce other substances, or needed in some other function in the cell, or in fact both. Other substances, like iron, are just needed for some specific purpose in the cell.
One possible metabolic network representation is constructed as follows: Substrates and products in the cell metabolism are nodes. Two nodes are connected if the substance of one is a substrate in a metabolic reaction which produces the substance represented by the other node. This means that each node corresponds to a particular substance and that the links denote its connections to other substances. One characteristic feature of a network is its degree distribution N(k). In the present context this is the number of substances which are connected to precisely k other substances. Figure 3(b) shows the degree distribution for the metabolic network of the E. coli bacterium. Here the substance with the most connections is water with 302 connections, followed by NAD+ with 141 connections. In Fig. 3(b) these two substances correspond to the two split-off nodes with the largest k. The bulk part of the N(k)-distribution of E. coli is broad and power-law-like. The straight line in the figure corresponds to $N(k)\sim k^{-\gamma}$ with γ≥2 and illustrates that the node-degree distribution for metabolic networks is broad and power-law-like, as was first demonstrated in Ref. . This power-law-like distribution is even more pronounced when taking an average over many different metabolic networks, as shown in Fig. 3(a).
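As a minimal sketch of how a degree distribution N(k) is read off from a network, the following Python snippet counts, for an undirected edge list, how many nodes have exactly k links. The function name `degree_distribution` and the toy edge list are illustrative assumptions and do not come from the metabolic data sets discussed here.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: N(k)}, the number of nodes with exactly k links.

    `edges` is an iterable of (node, node) pairs of an undirected network.
    Nodes without any link never appear, in line with the rule that a
    node always contains at least one link-end.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))

# Toy example with hypothetical substances (not real metabolic data).
toy_edges = [("water", "A"), ("water", "B"), ("water", "C"), ("A", "B")]
distribution = degree_distribution(toy_edges)
for k in sorted(distribution):
    print(k, distribution[k])
```

In the toy example the hub "water" plays the role of a split-off node: it is the only node with the largest degree.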
Figure 3. The first two panels show the data for real metabolic networks (the data are taken from Ref. [ma03a][ma03b]). Panel a) shows the average distribution over 107 metabolic networks. This average distribution has a fat power-law tail with γ≈2.2. Note that the nodes with only one link are fewer (by roughly a factor of 5) than the nodes with two links. The average size of these 107 networks is N≈640 and M/N≈5.35. Panel b) shows the specific metabolic network distribution n(k) for the E. coli bacterium (N≈970 and M/N≈5.8). This network has 6 nodes with more than 100 links. Panel c) makes a direct comparison between the 107 metabolic networks in Panel a) and 107 blind watchmaker networks with the same N and M. The agreement implies a common origin. Panel d) makes the same comparison between the E. coli network and a single random blind watchmaker network for the same N and M. Apart from a general agreement (modulo the larger statistical spread inherent in comparing single networks), the distributions of the split-off nodes are similar. Panel e) compares the average of the metabolic networks with the corresponding average of the blind watchmaker networks including a constraint decreasing the abundance of the single-link nodes to the same average number, n(1), as for the metabolic networks (a = 0.07). The agreement is extraordinary. Panel f) makes the same comparison for the E. coli network (a = 0.08), again with a striking agreement.
As the complex metabolic system is projected and reduced into a network, the relevant possibilities are likewise reduced to the corresponding possibilities of the network. A possible state of a network corresponds to a possible way of assembling its parts, i.e. it includes both a description of which nodes a particular node is connected to and a description of the time order in which this node was connected to these other nodes. The blind watchmaker network is the network which is unbiased with respect to these different assembling possibilities. In the following we will briefly explain what the properties of the blind watchmaker network are and how they are obtained (with more details in Supporting Information Text S1).
We start from a simplified network model, the constrained-balls-in-boxes (CBB) model. The mapping between the CBB-model and the corresponding network goes as follows: a link is defined by its link-ends such that an enumeration of the links is given by (1,2), (3,4), …, (M−1,M), where the link-ends correspond to balls enumerated by 1,2,…,M, as is illustrated in Fig. 1: The nodes correspond to boxes and the link-ends to the enumerated balls. The balls in the boxes are given a ranking by the vertical position of the balls in a box. The time-order of connections gives the ranking: the earliest connection corresponds to the bottom position and the latest to the top. The existence of this time-order ranking is crucial in the following. The point is that the existence of an implicit time-ordering is an unavoidable consequence of any sequential process. Natural selection is but one example of such a process.
For simplicity we here consider undirected links. A network is then an association of link-ends where the associated link-ends form nodes. Note that this means that a node will always contain at least one link-end. This is the origin of “constrained” in the name of the model: a box always contains at least one ball. Obviously, any association of the balls corresponds to a network. However, it is customary to include additional constraints in the definition of a network. The most common for an undirected network are: 1) a network must be connected, 2) only one link between two nodes, 3) no self-loops (which means that the two link-ends of the same link cannot belong to the same node). We will first discuss the model without the additional network constraints. In order to make connection to standard statistical mechanics we will consider the case with a fixed number of balls M and a fixed number of boxes N.
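The mapping and the three network constraints can be illustrated with a short Python sketch. It assumes a particular data layout that is not prescribed by the text: a list `box_of_ball` whose entry i is the box (node) holding ball (link-end) i+1, with consecutive balls forming a link. The function names are hypothetical.

```python
def boxes_to_network(box_of_ball):
    """Translate a balls-in-boxes assignment into a list of links.

    box_of_ball[i] is the box (node) that holds ball (link-end) i + 1.
    Balls are paired into links as (1,2), (3,4), ..., (M-1,M).
    """
    if len(box_of_ball) % 2 != 0:
        raise ValueError("the number of link-ends M must be even")
    return [(box_of_ball[i], box_of_ball[i + 1])
            for i in range(0, len(box_of_ball), 2)]

def satisfies_network_constraints(links):
    """Check the three constraints: no self-loops, at most one link
    between two nodes, and a connected network."""
    if not links:
        return False
    if any(u == v for u, v in links):
        return False                      # constraint 3: self-loop found
    if len({frozenset(link) for link in links}) != len(links):
        return False                      # constraint 2: repeated link
    neighbours = {}
    for u, v in links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # constraint 1: depth-first search must reach every node
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(neighbours)

triangle = boxes_to_network([0, 1, 1, 2, 2, 0])
print(triangle, satisfies_network_constraints(triangle))
```

Every box that appears in `box_of_ball` automatically holds at least one ball, which is the "constrained" part of the CBB-model; the additional network constraints are checked separately.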
The total number of ways Ω in which M balls can be distributed into N boxes follows from elementary combinatorics. The expression contains the factors $N(k)!$ in the denominator, each being the number of identical ways in which the same balls can be placed in the same order into the N(k) boxes of size k. However, in the CBB-model one ball is assigned to each box to start with. This means that the number of distinguishable ways to distribute the remaining k−1 balls into a box which already contains one ball is k times smaller than the number of ways to put k balls into the same box. Thus, for the CBB-model, the total number of distinguishable ways to distribute the balls into the boxes is reduced by one factor of k for every box of size k, i.e. by the combinatorial factor $\prod_k k^{N(k)}$ (more details in Supporting Information Text S1).
The most unbiased estimate of Ω corresponds to the maximum, and this maximum corresponds to one particular distribution N(k). This is completely equivalent to the maximum entropy principle in statistical mechanics: Once the statistical states are identified, the most unbiased system corresponds to the maximum number of such states or, equivalently, to the maximum of the entropy S = lnΩ. This maximum is achieved when all statistical states are equally probable. The last statement is the counterpart of the postulate of a priori equal probabilities in statistical mechanics. From an information perspective, the maximum of S gives a measure of the maximum information which can be contained in the system. For the CBB-model two statistical states are equal provided that there exists a one-to-one mapping between boxes containing the same balls and, furthermore, that for each such mapped pair of boxes the time orders in which the balls arrived in the boxes differ by at most a cyclic permutation. The point is that this degeneracy of the statistical states is enforced by the constraint that the boxes always contain at least one ball: The statistical state of the CBB-model and its network counterpart contain a k-degeneracy.
The practical issue is to determine the distribution N(k) which maximizes Ω[N(k)] subject to the appropriate constraints. One way is to find an update algorithm which picks the statistical states with equal probability, since such an algorithm automatically yields N(k). For the CBB-model the obvious algorithm is as follows: Pick two balls randomly and then move one to the same box as the other. If the attempted move involves emptying a box, try again with two new randomly picked balls. If you start with a random distribution with at least one ball in each box, then the restriction of “always at least one ball in all boxes” is in this way implemented without any additional bias. The resulting distribution, N(k), is shown in Fig. 2(a). An alternative is to find N(k) directly using the standard mathematical method of variational calculus (see Supporting Information Text S1). Both alternatives give the same result (see Fig. 2(a)). The advantage of the algorithm method is that it directly gives a complete characterization of the unbiased situation: both the average distribution which maximizes Ω and the noise around this average. As will be shown below, this noise, or equivalently the spread of the data, provides important characteristics of the metabolic networks.
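The ball-moving update for the CBB-model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `cbb_simulation`, its parameters and the choice to track only box sizes (not the time-order of balls inside a box) are hypothetical, and a rejected move simply uses up one attempt before two new balls are drawn.

```python
import random

def cbb_simulation(n_boxes, n_balls, n_attempts, seed=0):
    """Run the random ball-moving update and return the final box sizes."""
    if n_balls < max(n_boxes, 2):
        raise ValueError("need at least one ball per box and two balls")
    rng = random.Random(seed)
    # Start with one ball in every box; place the remaining balls at random.
    box_of_ball = list(range(n_boxes))
    box_of_ball += [rng.randrange(n_boxes) for _ in range(n_balls - n_boxes)]
    size = [0] * n_boxes
    for box in box_of_ball:
        size[box] += 1
    for _ in range(n_attempts):
        i, j = rng.sample(range(n_balls), 2)   # two distinct random balls
        source, target = box_of_ball[i], box_of_ball[j]
        if source == target:
            continue                           # ball i is already in that box
        if size[source] == 1:
            continue                           # move would empty a box: try again
        box_of_ball[i] = target                # move ball i to the box of ball j
        size[source] -= 1
        size[target] += 1
    return size

sizes = cbb_simulation(n_boxes=100, n_balls=400, n_attempts=20000)
print(max(sizes), min(sizes))
```

Averaging a histogram of `sizes` over many runs and comparing it with $A\exp(-bk)/k$ would be the numerical counterpart of Fig. 2(a); the small run above only demonstrates the update rule and that the constraints are respected.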
The definition of the statistical state is a direct consequence of the maximum number of distinguishable ways you can distribute the balls in the boxes: The statistical states, when picked with equal probability, give the global maximum of the entropy. However, our requirement is that the relevant states are picked with equal probability: in our case the ranking of the balls in the boxes is relevant. As emphasized above, this is a direct consequence of the sequential element implicit in the natural selection process, which imposes a time-order ranking on the balls in the boxes. The crucial assumption is that the blind watchmaker network is random with respect to these relevant states. The key observation is that many such relevant states correspond to the same statistical state. Thus unbiased, or equivalently random, with respect to the relevant states inevitably means biased with respect to the statistical states. Furthermore, biased with respect to the statistical states means smaller entropy S. What bias is imposed on the statistical states? In order to see this we note that the entropy is related to the probability of obtaining a new statistical state when choosing a new relevant state. When the relevant states and statistical states are identical, this reduces to S = lnΩ. If several relevant states correspond to the same statistical state, the entropy is reduced by the logarithm of their number. In the present case each box of size k contributes a factor k to this number, because k is the number of different time-orders which give the same statistical state. Thus $S = \ln\Omega - \sum_k N(k)\ln k$. Consequently, the most unbiased situation in terms of the relevant states corresponds to the N(k) which maximizes $\ln\Omega - \sum_k N(k)\ln k$.
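The step from this maximization to the functional form of N(k) can be sketched as a short variational calculation. It rests on assumptions that go beyond the main text: that the CBB count has the form given in the first line below (the factors N(k)! in the denominator and one factor k per box of size k), that the bias towards the relevant states subtracts a further $\sum_k N(k)\ln k$, that Stirling's approximation applies, and that the only constraints are fixed N and M. The multiplier λ is introduced here for illustration; the full treatment is in Text S1.

```latex
\begin{align}
  \ln\Omega_{\mathrm{CBB}} &= \text{const} - \sum_k \bigl[\ln N(k)! + N(k)\ln k\bigr], \\
  F[N(k)] &= \ln\Omega_{\mathrm{CBB}} - \sum_k N(k)\ln k
             - \lambda\Bigl(\sum_k N(k) - N\Bigr)
             - b\Bigl(\sum_k k\,N(k) - M\Bigr), \\
  \frac{\partial F}{\partial N(k)} &\approx -\ln N(k) - 2\ln k - \lambda - b\,k = 0
  \qquad \bigl(\ln N(k)! \approx N(k)\ln N(k) - N(k)\bigr), \\
  N(k) &= A\,\frac{e^{-bk}}{k^{2}}, \qquad A = e^{-\lambda}.
\end{align}
```

Dropping the extra term $-\sum_k N(k)\ln k$ from F and repeating the same steps gives $N(k) = A\,e^{-bk}/k$, the form quoted for the statistical states of the CBB-model in Fig. 2(a).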
The corresponding least biased algorithm which achieves this maximization goes as follows:
- Pick two boxes (nodes) A and B randomly with probability $p\sim k^{2}$.
- Pick a random ball in A and move it to B.
- If the attempted move is forbidden by a constraint, choose another ball in A. Repeat until one ball is moved. Then choose two new boxes (nodes).
- If no ball can be moved from A, choose two new boxes (nodes).
The important point here is that this algorithm incorporates the constraints in the most unbiased way and consequently yields the optimal N(k). Figure 2(b) illustrates that the algorithm solutions have the functional form $N(k)\sim\exp(-bk)/k^{2}$ (see Supporting Information Text S1). Note that the distribution has an exponential decay for smaller values of the average degree but becomes very power-law-like for larger values.
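The box-picking steps listed above can be sketched for the balls-in-boxes case without the additional network constraints. In this reduced setting the only forbidden move is one that empties a box, and because only box sizes are tracked the choice of which ball in A to move does not matter. The function name `blind_watchmaker_boxes` and its parameters are hypothetical, and the two boxes are drawn independently, which is one possible reading of "pick two boxes with probability p∼k²".

```python
import random

def blind_watchmaker_boxes(n_boxes, n_balls, n_attempts, seed=0):
    """Move balls between boxes chosen with probability proportional to k**2."""
    if n_balls < n_boxes:
        raise ValueError("need at least one ball per box")
    rng = random.Random(seed)
    # Start with one ball in every box; place the remaining balls at random.
    size = [1] * n_boxes
    for _ in range(n_balls - n_boxes):
        size[rng.randrange(n_boxes)] += 1
    boxes = range(n_boxes)
    for _ in range(n_attempts):
        weights = [k * k for k in size]        # p ~ k^2 for every box
        a, b = rng.choices(boxes, weights=weights, k=2)
        if a == b:
            continue                           # same box drawn twice: nothing to move
        if size[a] == 1:
            continue                           # no ball can leave A: pick new boxes
        size[a] -= 1                           # move one ball from A ...
        size[b] += 1                           # ... to B
    return size

bw_sizes = blind_watchmaker_boxes(n_boxes=100, n_balls=400, n_attempts=20000)
print(sorted(bw_sizes, reverse=True)[:5])
```

Recomputing the weights on every attempt keeps the sketch short at the cost of O(N) work per move; a production version would update them incrementally. Printing the largest boxes gives a quick look at the broad tail that the k² weighting produces.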
In order to find the variational solution for the network case, one needs to introduce the network constraints. These constraints can be directly implemented in the corresponding algorithm method: Moves which violate the network constraints are discarded (see Supporting Information Text S1). When the network constraints are included, the distribution follows a power law over a large range with γ>2, as illustrated in Fig. 2(c). An important point to notice is that the average distribution is different from the individual network configurations N(k). The striking difference is that a single network configuration always contains large split-off nodes, as illustrated in Fig. 2(d). These split-off nodes constitute an essential characteristic of the network.
How does the blind watchmaker random network compare to real networks? Fig. 3 illustrates this for the case of metabolic networks: Fig. 3(a) shows the average distribution obtained from 107 such networks and Fig. 3(b) the metabolic network of the E. coli bacterium (the data are taken from Ref. [ma03a][ma03b]). In Fig. 3(c) and 3(d) these networks are compared to the corresponding blind watchmaker network: the only information contained in this latter network is the number of nodes and links and the number of networks involved in the average. We again stress that both the shape of the distribution and the spread of the data are important characteristics of a real network. As is apparent from Fig. 3(c), the overlap between the two data-sets is striking. From this we conclude that metabolic networks are to a large extent blind watchmaker networks. Figure 3(d) compares a single metabolic network (E. coli) with the corresponding random network. A particular feature in this comparison is the large split-off nodes. In the case of E. coli there are 6 nodes with more than 100 links, and for the corresponding blind watchmaker network there are on average 5.3±1.5.
Even if the blind watchmaker network explains the overall features, such as the fat tails, of biological networks like the metabolic ones, there are of course differences. One such difference is the low number of nodes with only one link in the case of metabolic networks. Such systematic deviations signal additional constraints in the real network. Whenever such constraints are present, the entropy of the distribution is lower than for the blind watchmaker network. To illustrate this, in Fig. 3(e) and 3(f) we compare the real networks with the corresponding blind watchmaker network including the least biased constraint which adjusts the number of single-link nodes (details in Supporting Information Text S1). Now the agreement is extraordinary considering the fact that only one data point, the number of single-link nodes at the very beginning of the distribution, has been adjusted. It clearly demonstrates that it is rather the deviations from the blind watchmaker network which need to be explained (like, in the present case, the smaller number of single-link nodes): These deviations contain information about the system in addition to the less system-specific information represented by the blind watchmaker network.
In the present work we focus on the different possible states a network can be found in. These network states distinguish between the time orders in which a node is connected to its neighbours. No a priori assumption of any growth mechanism or evolution process is made. We introduce the concept of random with respect to the network states and call the corresponding network the blind watchmaker network. It is found that the blind watchmaker network is scale-free and that metabolic networks are to a large extent blind watchmaker networks. This means that the evolutionary path of the cell-metabolism, when projected onto a metabolic network representation, is statistically random with respect to a complete set of network states. This randomness emanates from the inherent randomness (the blind element) in Darwinian evolution and suggests that natural selection has had no or little effect on the node degree distribution of metabolic networks.
Can these conclusions really be drawn from the agreement between the node degree distribution, the split-off nodes and the stochastic spread of the data when comparing the actual data for metabolic networks with the blind watchmaker model network? In our opinion they can: The key is the quality of the agreement in relation to the number of free adjustable parameters. Earlier attempts to reproduce the degree distribution of metabolic networks usually start from some assumption about the actual evolutionary path and have, to our knowledge, not been able to reproduce the data in the way demonstrated here. So why does the evolution of metabolic networks choose this particular stochastic node degree distribution displayed by the blind watchmaker model network? In our view, the answer is that this degree distribution is neutral with respect to natural selection and hence, in this sense, has not been chosen at all. The notion that the node degree distribution for metabolic networks could be neutral with respect to natural selection has also been suggested on the basis of a comparison between the node degree distribution in atmospheric chemical reaction networks (no natural selection) and metabolic networks.
Materials and Methods
The theoretical framework used in the analysis is classical statistical mechanics, both in its conventional variational form described in Text S1 and in its stochastic algorithm form described in Results.
Text S1 (0.05 MB PDF)
Conceived and designed the experiments: SB PM. Performed the experiments: SB PM. Analyzed the data: SB PM. Contributed reagents/materials/analysis tools: SB PM. Wrote the paper: SB PM.
- 1. Albert R, Barabási A-L (2002) Statistical mechanics of complex networks. Rev. of Mod. Phys. 74: 47–97.
- 2. Dorogovtsev SN, Mendes JFF (2003) Evolution of networks: From biological nets to the Internet and WWW. Oxford University Press.
- 3. Newman MEJ (2003) The structure and function of complex networks. SIAM Review 45: 167–256.
- 4. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D-U (2006) Complex networks: Structure and dynamics. Phys. Rep. 424: 175–308.
- 5. Newman MEJ, Barabási A-L, Watts DJ (2006) The Structure and dynamics of networks. Princeton: Princeton University Press.
- 6. Dawkins R (1985) The blind watchmaker: Why the evidence of evolution reveals a universe without design. Norton.
- 7. Wagner A (2003) Does selection mold molecular networks. Science STKE 41.
- 8. Albert R, Jeong H, Barabási A-L (2000) Error attack and tolerance of complex networks. Nature 406: 378–382.
- 9. Cohen R, Erez K, ben-Avraham D, Havlin S (2000) Resilience of the Internet to random breakdowns. Phys. Rev. Lett. 85: 4626–4628.
- 10. Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286: 509–512.
- 11. Solé RV, Pastor-Satorras R, Smith E, Kepler T (2002) A model of large-scale proteomic evolution. Adv. in Complex Systems 5: 43–54.
- 12. Vazquez A, Flammini A, Maritan A, Vespignani A (2003) Modeling of protein interaction networks. Complexus 1: 38–44.
- 13. Pfeiffer T, Soyer OS, Bonhoeffer S (2005) The evolution of connectivity in metabolic networks. PloS Biology 3: 1269–1275.
- 14. Ma H, Zeng A-P (2003) Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics 19: 270–277.
- 15. Ma H, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430.
- 16. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási A-L (2000) The large scale organization of metabolic networks. Nature 407: 651–654.
- 17. Minnhagen P, Bernhardsson S (2007) Optimization and scale-freeness for complex networks. Chaos 17: 0261171–0261177.
- 18. Jaynes ET (1957) Information theory and statistical mechanics. Phys. Rev. 106: 620–630.
- 19. Minnhagen P, Bernhardsson S, Kim BJ (2007) Scale-freeness for Networks as a Degenerate Ground State: A Hamiltonian Formulation. EPL 78: 280041–280045.
- 20. Wagner A (2005) Robustness and evolvability in living systems. Princeton, NJ: Princeton University Press.
- 21. Gleiss P, Stadler P, Wagner A, Fell D (2001) Relevant cycles in chemical reaction networks. Adv. Complex Systems 4: 207–226.