TJP, JJ, JR, and LG analyzed the data. JJ contributed reagents/materials/analysis tools. TJP, JJ, JR, and LG wrote the paper.

¤ Current address: Department of Zoology, University of Cambridge, Cambridge, United Kingdom

The authors have declared that no competing interests exist.

A fundamental problem in functional genomics is to determine the structure and dynamics of genetic networks based on expression data. We describe a new strategy for solving this problem and apply it to recently published data on early

Modeling dynamical systems involves determining which elements of the system interact and what the nature of each interaction is. In the context of modeling gene expression dynamics, this amounts to determining regulatory relationships between genes. Perkins and colleagues present a new computational method for fitting differential equation models of time series data, and apply it to expression data from the well-known segmentation network of

The segmented body pattern of

(A–C)

(D–H)

(I–L) Mean relative gap protein concentration as a function of A–P position (measured in percent embryo length) for Hb (I), Kr (J), Kni (K), and Gt (L). Expression levels are from images and are unitless, ranging from 0 to 255. Images and expression profiles are from the FlyEx database [

The gap gene network has been studied extensively by means such as the analysis of transcription factor binding sites on the DNA and the measurement of gene expression under wild-type and mutant conditions. These studies are the basis of qualitative regulatory models (e.g., [

The increasing availability of quantitative gene expression data raises the possibility of detailed mathematical modeling of the regulatory relationships between genes. Indeed, recent work has shown that model parameters and even the existence of regulatory relationships can be automatically inferred from expression time series data [

Unfortunately, fitting quantitative dynamical models can be computationally challenging. Only two techniques [

We use a novel strategy for fitting network models to spatio–temporal gene expression data that is compatible with a variety of modeling formalisms and, by combining function approximation techniques with simulation-based optimization, is vastly faster than previous methods [

Gap gene expression is established in the trunk region of the embryo, indicated by the black bars in

We represent gap gene dynamics using a reaction–diffusion partial differential equation [

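$$\frac{\partial v^{a}}{\partial t} = P^{a}(v) - \lambda^{a} v^{a} + D^{a} \frac{\partial^{2} v^{a}}{\partial x^{2}} \qquad (1)$$

where $v^{a}(x,t)$ is the concentration of protein $a$ at position $x$ and time $t$, $P^{a}$ is its production rate function, $\lambda^{a}$ is its decay rate, and $D^{a}$ is its diffusion constant.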

We fit a total of four models, resulting from two different choices for the form of the production rate functions, ^{a},

where ^{a}^{ab},^{ab}^{ab}^{a}

While our first model tests the feasibility of inferring regulatory relationships de novo, our second model tests the ability of an established regulatory model to reproduce the data. In our second gene circuit model, regulatory relationships are limited to those in the model of Rivera-Pomar and Jäckle [^{hb}^{hb}^{2}/255. We call the resulting model RPJ-GC.

Previous analyses suggest that the general regulatory principle of the gap gene system is that genes are activated in broad regions by maternal gradients, but that repression from other gap genes can locally overwhelm general activation [^{a}

states that Bcd is an activator of ^{hb}

For our logical models, fitting the production rate functions means finding values for the ^{a}

We fit all four models using the same general strategy outlined in

In Stage 1, protein production associated with each domain is assumed to fall within a quadrilateral region of space–time (A) (darkness indicates rate of production), whose boundaries are optimized so that simulated expression (B) matches observed data (

In Stage 2, regulatory parameters are estimated by trying to fit the quadrilateral production regions (A) based on the observed levels of transcription factors present at each space–time point (C).
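The quadrilateral production regions mentioned above can be illustrated with a minimal sketch. It assumes a constant production rate inside the region and boundaries that move linearly in time between a start and an end time. The function and parameter names (`quad_production`, `xs_a`, `xe_p`, and so on) are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import numpy as np

def quad_production(x, t, t_start, t_end, xs_a, xs_p, xe_a, xe_p, rate):
    """Production rate at positions x and time t for one expression domain.

    Assumption: the region is a quadrilateral in space-time whose anterior
    and posterior boundaries move linearly from (xs_a, xs_p) at t_start to
    (xe_a, xe_p) at t_end, with a constant rate inside and zero outside.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    x = np.asarray(x, dtype=float)
    if t < t_start or t > t_end:
        return np.zeros_like(x)
    frac = (t - t_start) / (t_end - t_start)
    anterior = xs_a + frac * (xe_a - xs_a)    # anterior boundary at time t
    posterior = xs_p + frac * (xe_p - xs_p)   # posterior boundary at time t
    inside = (x >= anterior) & (x <= posterior)
    return np.where(inside, rate, 0.0)

def two_domain_production(x, t, params_anterior, params_posterior):
    """For a gene with two domains, take the pointwise maximum of the two
    single-domain production functions."""
    return np.maximum(quad_production(x, t, *params_anterior),
                      quad_production(x, t, *params_posterior))
```

In Stage 2, such a region would serve as the target that the regulatory parameters try to reproduce from the observed transcription factor levels at each space–time point.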

In Stage 3, local search, starting from the parameter values estimated in Stage 2, is used to optimize a fully coupled partial differential equation model of gene expression, so that simulated expression (

(A) Observed expression of

(B) Observed expression of the trunk gap genes.

(C–F) Simulated expression produced by the models Unc-GC, Unc-Logic, RPJ-GC, and RPJ-Logic, respectively. The horizontal axis in each plot is A–P position, ranging from 35% to 92% of embryo length. The vertical axis represents relative protein concentration corresponding to fluorescence intensity from quantitative gene expression data [

The observed expression data are shown in

The exact parameters for the best-fitting models of each type can be found in

(A) Qualitative regulatory relationships in our four models after optimization (Unc-GC, Unc-Logic, RPJ-GC, and RPJ-Logic), a Combined model (see text for details), and the relationships posed by Rivera-Pomar and Jäckle (R-P & J) [

(B) Diagram of the Combined network model. Boxes represent trunk gap gene domains, with endings “-a” or “-p” denoting the anterior and posterior domains, respectively, for

(A) Simulated expression at five of the 10 times for which we have observed data.

(B–E) Production rates of Hb, Kr, Gt, and Kni, respectively, as a fraction of maximum (black curve) along with the production rate that would result when individual regulatory inputs are removed (colored curves). For example, in the plot for Hb at time

In each plot in columns B–E, the black bar indicates the spatial extent of production. Colored bars above the black bar represent regions in which the corresponding activatory input is above threshold (at least one activator must be above threshold for production to occur). Colored bars below the black bar represent regions in which the corresponding repressive input is above threshold (production only occurs if no repressors are above threshold).
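The on/off rule described in this caption (production occurs only where at least one activator is above threshold and no repressor is above threshold) can be written down in a few lines. The following is a minimal sketch; the function name, the dictionary layout, the threshold values, and the `max_rate` parameter are assumptions made for illustration.

```python
import numpy as np

def logical_production(levels, activators, repressors, max_rate=1.0):
    """On/off production rule over a spatial grid.

    levels:     dict mapping regulator name -> array of concentrations
    activators: dict mapping regulator name -> activation threshold
    repressors: dict mapping regulator name -> repression threshold
    Production is max_rate where at least one activator exceeds its
    threshold and no repressor exceeds its threshold, and zero elsewhere.
    """
    n = len(next(iter(levels.values())))
    any_activator = np.zeros(n, dtype=bool)
    for name, threshold in activators.items():
        any_activator |= np.asarray(levels[name]) > threshold
    any_repressor = np.zeros(n, dtype=bool)
    for name, threshold in repressors.items():
        any_repressor |= np.asarray(levels[name]) > threshold
    return np.where(any_activator & ~any_repressor, max_rate, 0.0)
```

Because the rule is strictly on/off, an input that never changes the outcome at any space–time point has no effect on the fit, which is one way regulatory inputs can become redundant in a logical model.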

The maternal proteins Bcd and Cad are largely responsible for activating the trunk gap genes, with Bcd being more important for the anterior domains and Cad more important for the posterior domains. Bcd is a primary activator of the anterior

There is a long-running debate about whether or not low levels of Hb activate

Another point of disagreement in the literature is what prevents the expression of

In all of our models, the posterior

Domain shifting was first observed by Jaeger et al. [

Repression of

Activating or repressing links that oppose the direction of the repressive chain were eliminated by optimization of the Unc-Logic, RPJ-GC, and RPJ-Logic models (

All four of our models include autoactivation by

The regulatory relationships proposed by Rivera-Pomar and Jäckle [

In contrast, the regulatory relationships in our Combined model and both the Unc-GC and Unc-Logic models are able to capture the wild-type gap patterns without gross defects. The relationships in the Unc-GC model are very similar to those obtained by Jaeger et al. [

Both of our gene circuit models fit the data better than the corresponding logical models. Although both types of models grossly simplify the complexity of gene regulation, this suggests that the gene circuit formalism is a better description of gap gene regulation than the logical formalism. However, the Unc-Logic model shows that the logical formalism can correctly capture the main features of gap gene expression. Of greater concern is that the strict on/off nature of the logical rules renders many regulatory inputs completely redundant, effectively eliminating them from the regulatory structure (

The RMS errors in simulated expression for the Jaeger et al. [

While our models, particularly Unc-GC, Unc-Logic, and the Combined model, capture the main features of gap gene expression dynamics, some failings are common to all the models. For example, none of the models adequately captures the shifting of the posterior

Our models largely capture the wild-type expression patterns, and the regulatory relationships they rely on are consistent with mutant studies. However, our models do not in general display correct mutant expression patterns (see

Our models, particularly the Unc-GC model, include a number of weak regulatory relationships, the significance of which is difficult to determine. Some of the links may arise from overfitting the data, or they may be compensating for incorrect modeling assumptions or for other missing or imperfectly modeled regulatory factors. However, it is difficult to say precisely which links should be ignored. Our data comprise a single space–time series, with strong correlations between datapoints. Resampling approaches for estimating significance, such as cross-validation or bootstrapping, are not useful in such cases. Jaeger et al. [

Quantitative gene expression data used in this study are available online in the FlyEx database

Here we describe our three-stage approach to fitting the parameters of the partial differential equations. Details of how this strategy was applied for each model can be found in

This is essentially the same as Equation 1, except that we have made explicit the regulatory parameters Θ^{a}^{a}^{a}

From the experience of Jaeger et al. [^{a}, λ^{a},^{a}

In the first stage, we estimate ^{a}, D^{a}

We use a set of seven parameters to describe the conditions of protein production associated with each of the six gap protein peaks: _{start},_{end},_{s,a},_{start}; x_{s,p},_{start}; x_{e,a},_{end};_{e,p},_{end}

For a two-domain gene, the production rate function is the maximum of two such functions. In either case, the dynamical model of the expression of protein

where ^{a}^{a}_{d}^{a}

Next, we generate an initial estimate of the regulatory parameters for each gene by searching for Θ^{a} that minimize the error function _{d}^{a}^{a} so that ^{a}^{a}

Although the Unc-GC model includes Gt activation of

Finally, we combine the decay and diffusion constants estimated in Stage 1 with the regulatory parameters estimated in Stage 2 in a fully coupled partial differential equation model (Equation 4). Starting from these initial parameters, we perform repeated first-improvement local search with randomized order of neighbor examination, seeking parameters that minimize the RMS error _{d}^{a}^{a}
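The Stage 3 search can be illustrated with a generic first-improvement local search that examines neighbors in random order and accepts the first one that lowers the RMS error. This is a sketch under stated assumptions: a neighbor is taken to be the current parameter vector with one entry moved by a fixed `step`, and `error_fn` stands in for a full model simulation. The names `rms_error`, `first_improvement_search`, `step`, and `max_sweeps` are hypothetical.

```python
import math
import random

def rms_error(simulated, observed):
    """Root-mean-square difference between two equal-length sequences."""
    total = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(total / len(observed))

def first_improvement_search(params, error_fn, step=0.1, max_sweeps=1000, seed=0):
    """Local search that accepts the first improving neighbor it finds.

    Assumption: neighbors differ from the current point by +step or -step
    in exactly one parameter. Neighbors are examined in a freshly shuffled
    order on every sweep; the search stops when no neighbor improves.
    """
    rng = random.Random(seed)
    current = list(params)
    best = error_fn(current)
    for _ in range(max_sweeps):
        moves = [(i, d) for i in range(len(current)) for d in (step, -step)]
        rng.shuffle(moves)
        improved = False
        for i, delta in moves:
            candidate = current.copy()
            candidate[i] += delta
            err = error_fn(candidate)
            if err < best:          # first improvement: accept and restart sweep
                current, best = candidate, err
                improved = True
                break
        if not improved:
            break                   # local minimum for this step size
    return current, best
```

A fixed step size can stall at a point that is not the true optimum, so in practice such a search is typically rerun from the result with smaller steps or different random orders.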

We simulate using a fixed time step of one minute and a spatial grid of 58 points (one space point for each 1% of embryo length between 35% and 92%). For each step, we calculate the production rates for each gene at each space point and add them to the expression values. This corresponds to one minute of constant-rate production with no decay or diffusion. For Equation 1 (or 4), in which production rates depend on the protein levels present, the simulated gap protein levels and the observed ^{a}
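One simulation step on the grid described above might look like the following sketch. The production update follows the description in the text (one minute of constant-rate production added to the expression values). The subsequent decay and diffusion update, the explicit finite-difference scheme, and the no-flux boundary condition are assumptions made for illustration, as are the names `simulate_step`, `N_POINTS`, and `DT`.

```python
import numpy as np

N_POINTS = 58   # one grid point per 1% embryo length, from 35% to 92%
DT = 1.0        # fixed time step of one minute

def simulate_step(v, production, decay, diffusion):
    """Advance protein levels by one time step.

    v, production: arrays of shape (n_genes, N_POINTS)
    decay, diffusion: arrays of shape (n_genes,)
    """
    # Step 1 (from the text): add one minute of constant-rate production,
    # with no decay or diffusion.
    v = v + production * DT
    # Step 2 (assumption): explicit update for decay and diffusion with
    # no-flux boundaries, implemented by repeating the edge values.
    padded = np.pad(v, ((0, 0), (1, 1)), mode="edge")
    laplacian = padded[:, 2:] - 2.0 * v + padded[:, :-2]
    v = v + DT * (diffusion[:, None] * laplacian - decay[:, None] * v)
    return v
```

With a grid spacing of one and a time step of one, this explicit scheme is only stable for diffusion constants of at most 0.5, so larger values would need a smaller internal step or an implicit solver.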


Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.

A–P, anterior–posterior

Bcd, Bicoid

Cad, Caudal

RPJ, Rivera-Pomar and Jäckle network structure

tll, tailless