A Calculus of Purpose

Biological systems are so complex that we must ask: "What purpose does all this complexity serve?" Lander argues that computational biology may help provide answers.

Why is the sky blue? Any scientist will answer this question with a statement of mechanism: Atmospheric gas scatters some wavelengths of light more than others. To answer with a statement of purpose (e.g., to say the sky is blue in order to make people happy) would not cross the scientific mind. Yet in biology we often pose "why" questions in which it is purpose, not mechanism, that interests us. The question "Why does the eye have a lens?" most often calls for the answer that the lens is there to focus light rays, and only rarely for the answer that the lens is there because lens cells are induced by the retina from overlying ectoderm.
It is a legacy of evolution that teleology, the tendency to explain natural phenomena in terms of purposes, is deeply ingrained in biology, and not in other fields (Ayala 1999). Natural selection has so molded biological entities that nearly everything one looks at, from molecules to cells, from organ systems to ecosystems, has (at one time at least) been retained because it carries out a function that enhances fitness. It is natural to equate such functions with purposes. Even if we can't actually know why something evolved, we care about the useful things it does that could account for its evolution.
As a group, molecular biologists shy away from teleological matters, perhaps because early attitudes in molecular biology were shaped by physicists and chemists. Even geneticists rigorously define function not in terms of the useful things a gene does, but by what happens when the gene is altered. Molecular biology and molecular genetics might continue to dodge teleological issues were it not for their fields' remarkable recent successes. Mechanistic information about how a multitude of genes and gene products act and interact is now being gathered so rapidly that our inability to synthesize such information into a coherent whole is becoming more and more frustrating. Gene regulation, intracellular signaling pathways, metabolic networks, developmental programs: the current information deluge is revealing these systems to be so complex that molecular biologists are forced to wrestle with an overtly teleological question: What purpose does all this complexity serve?
In response to this situation, two strains have emerged in molecular biology, both of which are sometimes lumped under the heading "systems biology." One strain, bioinformatics, champions the gathering of even larger amounts of new data, both descriptive and mechanistic, followed by computer-based data "mining" to identify correlations from which insightful hypotheses are likely to emerge. The other strain, computational biology, begins with the complex interactions we already know about, and uses computer-aided mathematics to explore the consequences of those interactions. Of course, bioinformatics and computational biology are not entirely separable entities; they represent ends of a spectrum, differing in the degree of emphasis placed on large versus small data sets, and statistical versus deterministic analyses.
Computational biology, in the sense used above, arouses some skepticism among scientists. To some, it recalls the "mathematical biology" that, starting from its heyday in the 1960s, provided some interesting insights, but also succeeded in elevating the term "modeling" to near-pejorative status among many biologists. For the most part, mathematical biologists sought to fit biological data to relatively simple mathematical models, with the hope that fundamental laws might be recognized (Fox Keller 2002). This strategy works well in physics and chemistry, but in biology it is stymied by two problems. First, biological data are usually incomplete and extremely imprecise. As new measurements are made, today's models rapidly join tomorrow's trash heaps. Second, because biological phenomena are generated by large, complex networks of elements, there is little reason to expect to discern fundamental laws in them. To do so would be like expecting to discern the fundamental laws of electromagnetism in the output of a personal computer.
Nowadays, many computational biologists avoid modeling-as-data-fitting, opting instead to create models in which networks are specified in terms of elements and interactions (the network "topology"), but the numerical values that quantify those interactions (the parameters) are deliberately varied over wide ranges. As a result, the study of such networks focuses not on the exact values of outputs, but rather on qualitative behavior, e.g., whether the network acts as a "switch," "filter," "oscillator," "dynamic range adjuster," "producer of stripes," etc. By investigating how such behaviors change for different parameter sets (an exercise referred to as "exploring the parameter space"), one starts to assemble a comprehensive picture of all the kinds of behaviors a network can produce. If one such behavior seems useful (to the organism), it becomes a candidate for explaining why the network itself was selected, i.e., it is seen as a potential purpose for the network. If experiments subsequently support assignments of actual parameter values to the range of parameter space that produces such behavior, then the potential purpose becomes a likely one.
For very simple networks (e.g., linear pathways with no delays or feedback and with constant inputs), possible global behaviors are usually limited, and computation rarely reveals more than one could have gleaned through intuition alone. In contrast, when networks become even slightly complex, intuition often fails, sometimes spectacularly so, and computation becomes essential.

For example, intuitive thinking about MAP kinase pathways led to the long-held view that the obligatory cascade of three sequential kinases serves to provide signal amplification. In contrast, computational studies have suggested that the purpose of such a network is to achieve extreme positive cooperativity, so that the pathway behaves in a switch-like, rather than a graded, fashion (Huang and Ferrell 1996). Another example comes from the study of morphogen gradient formation in animal development. Whereas intuitive interpretations of experiments led to the conclusion that simple diffusion is not adequate to transport most morphogens, computational analysis of the same experimental data yields the opposite conclusion (Lander et al. 2002).
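The MAP kinase point can be made concrete with a toy calculation. The sketch below is not the Huang and Ferrell model; it simply assumes that each of three stages responds to its input through a mildly cooperative Hill function (the names `hill`, `cascade`, and `effective_hill`, and the values k = 0.5 and n = 2, are illustrative assumptions). It then estimates the apparent Hill coefficient of the whole chain from the span between the inputs giving 10% and 90% of maximal output, showing how stacking stages can sharpen a graded response toward a switch.

```python
import math


def hill(x, k=0.5, n=2.0):
    """Response of one stage: half-maximal at input k, Hill coefficient n."""
    return x ** n / (k ** n + x ** n)


def cascade(x, stages):
    """Feed the output of each stage into the next one."""
    for _ in range(stages):
        x = hill(x)
    return x


def input_for_fraction(stages, fraction, lo=1e-9, hi=1e6):
    """Input giving `fraction` of the cascade's maximal output.

    The response is monotonic, so bisection (on a log scale) is enough.
    The maximal output is approximated by the response to a very large input.
    """
    target = fraction * cascade(hi, stages)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if cascade(mid, stages) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def effective_hill(stages):
    """Apparent Hill coefficient: ln(81) / ln(EC90 / EC10)."""
    ec10 = input_for_fraction(stages, 0.1)
    ec90 = input_for_fraction(stages, 0.9)
    return math.log(81.0) / math.log(ec90 / ec10)


if __name__ == "__main__":
    for stages in (1, 2, 3):
        print(stages, "stage(s): apparent Hill coefficient",
              round(effective_hill(stages), 2))
```

In this toy, the apparent Hill coefficient rises with each added stage even though no single stage is more than mildly cooperative. The real pathway owes its steepness to additional mechanisms that this sketch does not attempt to represent.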
As the power of computation to identify possible functions of complex biological networks is increasingly recognized, purely (or largely) computational studies are becoming more common in biological journals. This raises an interesting question for the biology community: In a field in which scientific contributions have long been judged in terms of the amount of new experimental data they contain, how does one judge work that is primarily focused on interpreting (albeit with great effort and sophistication) the experimental data of others? At the simplest level, this question poses a conundrum for journal editors. At a deeper level, it calls attention to the biology community's difficulty in defining what, exactly, constitutes "insight" (Fox Keller 2002).
In yesterday's mathematical biology, a model's utility could always be equated with its ability to generate testable predictions about new experimental outcomes. This approach works fine when one's ambition is to build models that faithfully mimic particular biological phenomena. But when the goal is to identify all possible classes of biological phenomena that could arise from a given network topology, the connection to experimental verification becomes blurred. This does not mean that computational studies of biological networks are disconnected from experimental reality, but rather that they tend, nowadays, to address questions of a higher level than simply whether a particular model fits particular data.
The problem this creates for those of us who read computational biology papers is knowing how to judge when a study has made a contribution that is deep, comprehensive, or enduring enough to be worth our attention. We can observe the field trying to sort out this issue in the recent literature. A good example can be found in an article by Nicholas Ingolia in this issue of PLoS Biology (Ingolia 2004), and an earlier study from Garrett Odell's group, upon which Ingolia draws heavily (von Dassow et al. 2000).
Both articles deal with a classical problem in developmental biology, namely, how repeating patterns (such as stripes and segments) are laid down. In the early fruit fly embryo, it is known that a network involving cell-to-cell signaling via the Wingless (Wg) and Hedgehog (Hh) pathways specifies the formation and maintenance of alternating stripes of gene expression and cell identity. This network is clearly complex, in that Wg and Hh signals affect not only downstream genes, but also the expression and/or activity of the components of each other's signaling machinery.
Von Dassow et al. (2000) calculated the behaviors of various embodiments of this network over a wide range of parameter values and starting conditions. This was done by expressing the network in terms of coupled differential equations, picking parameters at random from within prespecified ranges, solving the equation set numerically, then picking another random set of parameters and obtaining a new numerical solution, and so forth, until 240,000 cases were tried. The solutions were then sorted into groups based on the predicted output, in this case spatial patterns of gene expression.
When they used a network topology based only upon molecular and gene-regulatory interactions that were firmly known to take place in the embryo, they were unable to produce the necessary output (stable stripes), but upon inclusion of two molecular events that were strongly suspected of taking place in the embryo, they produced the desired pattern easily. In fact, they produced it much more easily than expected. It appeared that a remarkably large fraction of random parameter values produced the very same stable stripes. This implied that the output of the network is extraordinarily robust, where robustness is meant in the engineering sense of the word, namely, a relative insensitivity of output to variations in parameter values.
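The sampling procedure and the notion of robustness described above can be sketched on a deliberately tiny stand-in system. The code below is not the von Dassow segment polarity model. It assumes two neighboring cells that repress each other through a Hill function, and every name, parameter range, and threshold in it (`simulate`, `classify`, `explore`, the ranges for `a`, `k`, and `n`, the 0.5 cutoff) is an illustrative assumption. It draws parameters at random, integrates the equations with a forward-Euler step, sorts each run by its qualitative outcome, and reports counts. The fraction of "patterned" runs plays the role of robustness.

```python
import math
import random


def simulate(a, k, n, x1=1.0, x2=0.9, dt=0.01, t_end=40.0):
    """Forward-Euler integration of two mutually repressing cells.

    dx1/dt = a / (1 + (x2/k)^n) - x1, and symmetrically for x2.
    """
    for _ in range(int(t_end / dt)):
        dx1 = a / (1.0 + (x2 / k) ** n) - x1
        dx2 = a / (1.0 + (x1 / k) ** n) - x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2


def classify(a, k, n, threshold=0.5):
    """Sort a run by outcome: do the two cells end up clearly different?"""
    x1, x2 = simulate(a, k, n)
    return "patterned" if abs(x1 - x2) > threshold * a else "uniform"


def log_uniform(rng, lo, hi):
    """Sample so that each order of magnitude is equally likely."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def explore(trials=100, seed=0):
    """Draw random parameter sets and count the outcomes."""
    rng = random.Random(seed)
    counts = {"patterned": 0, "uniform": 0}
    for _ in range(trials):
        a = log_uniform(rng, 0.5, 10.0)   # maximal production rate
        k = log_uniform(rng, 0.5, 2.0)    # repression threshold
        n = rng.uniform(1.0, 4.0)         # cooperativity of repression
        counts[classify(a, k, n)] += 1
    return counts


if __name__ == "__main__":
    counts = explore()
    total = sum(counts.values())
    print(counts, "fraction patterned:", counts["patterned"] / total)
```

A fixed seed keeps the exploration reproducible. A study of a real network would use a proper ODE solver, many more samples, and a far richer classification of spatial patterns than this two-way split.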
Because real organisms face changing parameter values constantly, whether as a result of unstable environmental conditions or of mutations leading to the inactivation of a single allele of a gene, robustness is an extremely valuable feature of biological networks, so much so that some have elevated it to a sort of sine qua non (Morohashi et al. 2002). Indeed, the major message of the von Dassow article was that the authors had uncovered a "robust developmental module," which could ensure the formation of an appropriate pattern even across distantly related insect species whose earliest steps of embryogenesis are quite different from one another (von Dassow et al. 2000).
There is little doubt that von Dassow's computational study extracted an extremely valuable insight from what might otherwise seem like a messy and ill-specified system. But Ingolia now argues that something further is needed. He proposes that it is not enough to show that a network performs in a certain way; one should also find out why it does so.
Ingolia throws down the gauntlet with a simple hypothesis about why the von Dassow network is so robust. He argues that it can be ascribed entirely to the ability of two positive feedback loops within the system to make the network bistable. Bistability is the tendency for a system's output to be drawn toward either one or the other of two stable states. For example, in excitable cells such as neurons, depolarization elicits sodium entry, which in turn elicits depolarization: a positive feedback loop. As a result, large depolarizations drive neurons to fully discharge their membrane potential, whereas small depolarizations decay back to a resting state. Thus, the neuron tends strongly toward one or the other of these two states. The stability of each state brings with it a sort of intrinsic robustness, i.e., once a cell is in one state, it takes a fairly large disturbance to move it into the other. This is the same principle that makes electronic equipment based on digital (i.e., binary) signals so much more resistant to noise than equipment based on analog circuitry.
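Bistability from positive feedback can be illustrated with a one-variable toy model. It is not Ingolia's model and not a model of a neuron. It assumes a quantity x that stimulates its own production through a Hill function and decays at unit rate, dx/dt = beta * x^n / (1 + x^n) - x, with the illustrative values beta = 2 and n = 4 and a hypothetical helper named `settle`. For these values the unstable threshold sits at x = 1: starting points below it relax to the "off" state at 0, and starting points above it are drawn to an "on" state near 1.84.

```python
def settle(x0, beta=2.0, n=4, dt=0.01, t_end=50.0):
    """Integrate dx/dt = beta * x^n / (1 + x^n) - x by forward Euler.

    Returns the value of x after t_end time units, starting from x0.
    """
    x = x0
    for _ in range(int(t_end / dt)):
        production = beta * x ** n / (1.0 + x ** n)  # positive feedback term
        x += dt * (production - x)                   # production minus decay
    return x


if __name__ == "__main__":
    # Small disturbances from rest decay; large ones flip the switch.
    for start in (0.2, 0.8, 1.2, 3.0):
        print("start", start, "-> settles near", round(settle(start), 3))
```

Once the system sits in either state, a disturbance has to carry it across the threshold at x = 1 before the outcome changes, which is the intrinsic robustness described above.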
Ingolia not only argues that robustness in the von Dassow model arises because positive feedback leads to network bistability, he further claims that such network bistability is a consequence of bistability at the single cell level. He strongly supports these claims through computational explorations of parameter space that are similar to those done by von Dassow et al., but which also use stripped-down network topologies (to focus on individual cell behaviors), test specifically for bistability, correlate results with the patterns formed, and ultimately generate a set of mathematical rules that strongly predict those cases that succeed or fail at producing an appropriate pattern.
At first glance, such a contribution might seem no more than a footnote to von Dassow's paper, but a closer look shows that this is not the case. Without mechanistic information about why the von Dassow network does what it does, it is difficult to relate it to other work, or to modify it to accommodate new information or new demands. Ingolia demonstrates this by deftly improving on the network topology. He inserts some new data from the literature about the product of an additional gene, sloppy-paired, in Hh signaling, removes some of the more tenuous connections, and promptly recovers a biologically essential behavior that the original von Dassow network lacked: the ability to maintain a fixed pattern of gene expression even in the face of cell division and growth.
Taken as a pair, the von Dassow and Ingolia papers illustrate the value of complementary approaches in the analysis of complex biological systems. Whereas one emphasizes simulation (as embodied in the numerical solution of differential equations), the other emphasizes analysis (the mathematical analysis of the behavior of a set of equations). Whereas one emphasizes exploration (exploring a parameter space), the other emphasizes the testing of hypotheses (about the origins of robustness). The same themes can be seen in sets of papers on other topics. For example, in their analysis of bacterial chemotaxis, Leibler and colleagues (Barkai and Leibler 1997) found a particular model to be extremely robust in the production of an important behavior (exact signal adaptation), and subsequently showed that bacteria do indeed exhibit such robust adaptation (Alon et al. 1999). Although Leibler and colleagues took significant steps toward identifying and explaining how such robustness came about, it took a subsequent group (Yi et al. 2000) to show that robustness emerged as a consequence of a simple engineering design principle known as "integral feedback control." That group also showed, through mathematical analysis, that integral feedback control is the only feedback strategy capable of achieving the requisite degree of robustness.
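The idea behind integral feedback control can be shown with a minimal linear sketch. It is not the chemotaxis model of Barkai and Leibler or of Yi et al., and its variables are abstract signals, not concentrations. The function name `run`, the set point `y0`, and all numerical values are assumptions made for illustration. With integral feedback, a second variable x accumulates the error y - y0 over time and feeds back on the output y. The only steady state is then y = y0, whatever the stimulus u or the gain. With proportional feedback alone, the steady output still depends on both.

```python
def run(u, gain, integral=True, y0=1.0, k=1.0, dt=0.01, t_end=60.0):
    """Forward-Euler simulation of an output y under a constant stimulus u.

    integral=True:  dy/dt = gain*u - x - y,  dx/dt = k*(y - y0)
    integral=False: dy/dt = gain*u - k*(y - y0) - y   (proportional only)
    Returns the output y at the end of the run.
    """
    y, x = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        error = y - y0
        if integral:
            dy = gain * u - x - y
            dx = k * error        # x integrates the error over time
        else:
            dy = gain * u - k * error - y
            dx = 0.0
        y += dt * dy
        x += dt * dx
    return y


if __name__ == "__main__":
    for u in (2.0, 5.0, 10.0):
        print("stimulus", u,
              "| integral feedback:", round(run(u, 1.0), 3),
              "| proportional only:", round(run(u, 1.0, integral=False), 3))
```

The sketch shows only that integral feedback returns the output to its set point across the stimuli and gains tried. It does not reproduce the mathematical argument that no other feedback strategy can do so.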
From these and many other examples in the literature, one can begin to discern several of the elements that, when present together, elevate investigations in computational biology to a level at which ordinary biologists take serious notice. Such elements include network topologies anchored in experimental data, fine-grained explorations of large parameter spaces, identification of "useful" network behaviors, and hypothesis-driven analyses of the mathematical or statistical bases for such behaviors. These elements can be seen as the foundations of a new calculus of purpose, enabling biologists to take on the much-neglected teleological side of molecular biology. "What purpose does all this complexity serve?" may soon go from a question few biologists dare to pose, to one on everyone's lips.