Dynamic patterning by the Drosophila pair-rule network reconciles long-germ and short-germ segmentation

Drosophila segmentation is a well-established paradigm for developmental pattern formation. However, the later stages of segment patterning, regulated by the “pair-rule” genes, are still not well understood at the system level. Building on established genetic interactions, I construct a logical model of the Drosophila pair-rule system that takes into account the demonstrated stage-specific architecture of the pair-rule gene network. Simulation of this model can accurately recapitulate the observed spatiotemporal expression of the pair-rule genes, but only when the system is provided with dynamic “gap” inputs. This result suggests that dynamic shifts of pair-rule stripes are essential for segment patterning in the trunk and provides a functional role for observed posterior-to-anterior gap domain shifts that occur during cellularisation. The model also suggests revised patterning mechanisms for the parasegment boundaries and explains the aetiology of the even-skipped null mutant phenotype. Strikingly, a slightly modified version of the model is able to pattern segments in either simultaneous or sequential modes, depending only on initial conditions. This suggests that fundamentally similar mechanisms may underlie segmentation in short-germ and long-germ arthropods.

Stable states of the network can be easily calculated from the overall set of protein state  transcriptional output mappings; they are those where the output state for each component is unchanged from its input state. These stable attractor states can be thought of as representing particular cell "fates".
The network is updated synchronously, by discrete timesteps. The transcriptional output of the network at timepoint t+1 is simply calculated from the network protein state at time t. However, calculating the new protein state is more complicated, because of the time delays that need to be accounted for.
Each component in the network is associated with two positive integer parameter values, a "synthesis delay"(s) and a "decay delay" (d). The synthesis delay determines how long a component must be transcribed before the component turns on at the protein level, while the decay delay determines how long a component has to be transcriptionally silenced before the component turns off. Each component is also associated with two non-negative integer variables, "transcript age" and "protein age" respectively, that provide some "memory" to the system. if (protein_age < decay_delay) then new_protein  1; else new_protein  0, protein_age  0 Note: "transcript" = transcriptional output at timepoint t+1; "old_protein" = protein state at timepoint t; "new_protein" = protein state at timepoint t+1.
The simulation algorithm is thus set up so that a protein will only turn on if there is continuous transcription for the entire duration of the synthesis delay, and will only turn off is transcription is continuously absent for the entire duration of the decay delay, effectively ignoring very brief changes to transcriptional output.

Implementation
The simulation software is written in object-oriented Python (www.python.org), using the additional libraries NumPy and Matplotlib [7,8]. All the functions necessary for producing and visualising a simulation are contained in one .py file (S1 File). The user simply specifies a network model and simulation parameter values in a separate script (see S2 File for a template), and can thereby easily carry out bespoke simulations. Simulations can be run simultaneously for an arbitrary number of "cells", each of which behaves independently. Each cell has the same network model and parameter values, but can be assigned different initial conditions.
The things the user must specify are: 1) The network components (i.e. the various transcription factors in the system) 2) The control logic (i.e. regulation) of each component. This is done simply by defining a function that takes the current state of the network as its argument and returns either 0 or 1 as appropriate.
3) The synthesis and decay delays associated with each component. There are also functions to set default values for all components, if desired. 4) The number of "cells" to be simulated. 5) The initial conditions of each cell. Protein state, protein age, transcript state, and transcript age may all be specified. Values are set to zero by default. 6) The number of timepoints to simulate. 7) If visualising the output with an animation, the colour associated with each gene product, and other aesthetic preferences.
The user then simply has to call the simulation function to calculate the system state at each timepoint, given the chosen initial conditions, control logic, and parameter values. The user may then export or visualise the output data.
Visualisations represent individual cells as different columns (named C1, C2, C3…) and different gene products as different named rows. Both "transcript" and "protein" outputs are shown for each row. Depending on user preference, transcript and protein ages can also be represented in the visualisation, with "decaying" proteins shown as increasingly transparent before they turn off, and "nascent" transcripts shown as increasingly opaque until protein appears.
Visualisations can be shown as an animation, or the individual frames may be saved as image files. The frames can then be stitched together to produce video files, for example using ffmpeg (www.ffmpeg.org).

Pair-rule model details
Each simulation referenced in the main text models pair-rule gene expression along an idealised anteroposterior axis. The simulations show an array of 20 "cells", each of which evolves independently. This cellular autonomy is justified for the model by the apical localisation of real pair-rule transcripts, which precludes diffusion between nuclei during cellularisation of the blastoderm [9,10]. (Note also that all the pair-rule genes code for transcription factors, rather than e.g. components of signalling pathways.) Initial conditions for each simulation consist of a pattern that repeats every 8 cells, with each group of 8 cells representing an idealised double segment repeat.
"Early network" simulations (Movies 1-8) are the simplest, restricted to modelling the establishment and maintenance of the original pair-rule pattern observed at midcellularisation. They model the expression of the five primary pair-rule genes (hairy, eve, runt, ftz, and odd), as patterned ultimately by abstracted gap inputs (e.g. "G1", "G2").
"Whole system" simulations (Movies 9-11) are more complicated, additionally including the patterning of the secondary pair-rule genes, and the transition to single-segment periodicity. As well as the primary pair-rule genes and the gap inputs, they model the secondary pair-rule genes prd, slp, and the segment-polarity gene en. They also utilise two temporal signals, Cad and Opa. Cad controls the onset of secondary pair-rule gene expression, while Opa controls the switch between the early and the late pair-rule networks.
"Modified network" simulations (Movies 12 and 13) are based on the "whole system" simulations, but tweak the inferred Drosophila network so as to produce a more autonomous patterning system that can operate in the absence explicit gap inputs.
In all simulations, the seven pair-rule factors and En are all assigned identical synthesis and decay delays, both equal to 6 timesteps. Studies of pair-rule gene expression kinetics indicate that protein synthesis and turnover rates in the embryo are equally rapid, and are both processes occurring on the order of 10 minutes [11,12]. The time delays associated with the remaining system components (gap inputs, Cad, and Opa) are chosen in an ad hoc manner so as to provide appropriate spatiotemporal signals to the pair-rule genes.
The control logic, time delays, and initial conditions for each simulation are described below. Note that any unspecified initial conditions can be assumed to be set to zero.

"Early network"simulations (Movies 1-8)
The core control logic of these simulations is the "early network" shown in Fig 1A ( In each simulation, no initial pair-rule gene expression is specified (i.e. in each cell, for each of the five variables, the initial conditions are transcript=0, protein=0, transcript age=0, protein age=0).

Effects of shifting gap inputs (simulations 1-5)
Simulations 1 to 5 explore the effects of dynamic gap inputs on pair-rule gene expression. The only differences between them relate to the "gap inputs" G1 and G2.

Control logic:
G1 and G2 autoactivate in simulation 1 so as to produce stable domains, and autorepress in simulations 2-5 so as to produce dynamic domains.

Time delays:
In each of the simulations 2-5, the time delays and initial conditions of G1 and G2 are specifically chosen so as to produce a particular gap domain shift rate while preserving an 8 cell pattern repeat. Simulation 2 produces a shift rate of one cell every 6 timesteps (equal to the synthesis/decay delay of the pair-rule factors), which requires G1/G2 expression to go through a full oscillation every 48 timesteps. Simulations 3-5 have time delays and initial conditions equivalent to those of simulation 2, except multiplied by a scaling factor to give slower or faster oscillations of G1/G2 in each cell and hence slower or faster shifts. Simulation 3 is scaled by 2, Simulation 4 is scaled by 1/2, and Simulation 5 is scaled by 1/3. These three simulations model situations where runt and/or ftz/odd are initially controlled by gap inputs, before being taken over the pair-rule network later on. These simulations involve some extra components. Simulation 6 has an additional gap input, G3, which regulates runt, while simulation 7 has an analogous gap input, G4, which regulates ftz and odd. Both G3 and G4 are present in simulation 8. All three simulations additionally have a temporal signal, "G" which controls whether runt and/or ftz/odd are controlled by the gap inputs G3/G4, or alternatively by pair-rule inputs as normal. Details of any new control logic and initial conditions are listed below. Anything not explicitly detailed is the same as simulation 2.
Control logic: Note: the G decay delay effectively specifies how long runt and/or ftz/odd are controlled by gap inputs rather than the pair-rule network.

Control logic:
The structure of the pair-rule network as a whole is based largely on evidence from [13], and summarised in Fig 1A of the main text. The topology of the network is context dependent, with many genetic interactions affected by the transcriptional cofactor, Opa. Since Opa protein appears in the Drosophila embryo only at the end of cellularisation [14], segment patterning is sequentially directed by "early" and "late" incarnations of the overall network.
I therefore make the control logic of each segmentation gene in the network conditional on the presence or absence of Opa. If Opa is off, the control logic is a formulation of the early network, while if Opa is on, the control logic is a formulation of the late network. The specific regulatory rules for each pair-rule gene (plus en) are described below, supplemented with explanatory notes where appropriate. Notes: While Opa is off, the regulation of eve is the same as in Simulations 1-8. As soon as Opa turns on, eve expression is instead patterned by various repressors (Runt, Odd, Slp, En). The qualification in the model that Slp represses eve only if Eve is not already turned on is required in order to resolve overlaps between eve and slp expression that form in the simulations just before the switch to the late network occurs. Specifying that Eve expression "wins" this contest ensures that the anterior boundaries of the Eve stripes define the parasegment boundaries, as occurs in the embryo [15]. However, the overlap situation is not a scenario that occurs in real embryos: the eve boundaries stabilise at roughly the same time slp turns on (rather than afterwards), and therefore the eve and slp domains are mutually exclusive throughout the segmentation process. A more realistic model that avoided this issue would need to include more sophisticated temporal control of eve and slp regulation, and/or time-dependent gap domain dynamics. Note that for simulation 11, which models an eve mutant embryo, the control logic of eve is replaced by unconditional repression (eve  OFF). Notes: While Opa is off, the regulation of runt is the same as in Simulations 1-8. As soon as Opa turns on, runt expression is instead patterned by various repressors (Eve, Odd, En). However, I have specified that if Runt is coexpressed with either Eve or Odd, the Runt expression will "win" and runt transcription will not turn off. This is because Eve/Runt and Runt/Odd overlaps are both present during cellularisation (at the anteriors and the posteriors of the Runt stripes, respectively), and both resolve in favour of runt expression during gastrulation [13,16]. In contrast, En expression has the potential to supplant Runt expression, as occurs at the posterior borders of the runt primary stripes. Note that the inferred control logic is a phenomenological description of regulation within the embryo, and does not necessarily imply Runt autoregulation at a mechanistic level. Note also that the Opadependent control logic specified above reflects the late regulation of the runt "seven stripe element", and I have not explicitly modelled the expression arising from the "six stripe element" [13,17].  Fig  1A, since Slp protein is synthesised too late to have any practical effect on ftz expression while the early network is in operation.) When Opa turns on, ftz expression remains repressed by Eve and Slp, but is no longer regulated by Hairy. Instead, it requires autoactivation. Having this autoregulation in the model permits the anterior border of the Ftz domain to remain stable and define the even-numbered parasegment boundaries, as occurs in the embryo [15]. However, note that while direct Ftz autoactivation is well-documented [18,19], it remains to be tested whether Ftz activity is absolutely required for late Ftz expression.  if (Ftz=ON and Odd=OFF and Slp=OFF) or (Prd=ON and Odd=OFF and Slp=OFF and Runt=OFF)) then en  ON; else en  OFF Notes: While Opa is off, en expression is repressed. When Opa turns on, en is spatially patterned by both activators and repressors. en requires activation from either Prd or Ftz, and is always repressed by Odd and Slp. The odd-numbered en stripes (those activated by Prd) are additionally repressed by Runt. In contrast, the even-numbered en stripes (those activated by Ftz) are insensitive to Runt activity.
The remaining components of the model (Cad, Opa, G1/G2) provide broad or abstracted inputs into the periodically expressed components described above. Cad and Opa are extrinsic inputs that are "off" by default or "on" by default, respectively. G1 and G2 are abstracted, autoregulatory gap inputs, whose expression (and function) is restricted to early stages of patterning (i.e. when Opa is off). Note that G1 and G2 have different control logic in Simulation 9 (modelling static gap inputs) versus Simulations 10 and 11 (modelling dynamic gap inputs).