Canalization of Gene Expression in the Drosophila Blastoderm by Gap Gene Cross Regulation

Developing embryos exhibit a robust capability to reduce phenotypic variations that occur naturally or as a result of experimental manipulation. This reduction in variation occurs by an epigenetic mechanism called canalization, a phenomenon which has resisted understanding because of a lack of necessary molecular data and of appropriate gene regulation models. In recent years, quantitative gene expression data have become available for the segment determination process in the Drosophila blastoderm, revealing a specific instance of canalization. These data show that the variation of the zygotic segmentation gene expression patterns is markedly reduced compared to earlier levels by the time gastrulation begins, and this variation is significantly lower than the variation of the maternal protein gradient Bicoid. We used a predictive dynamical model of gene regulation to study the effect of Bicoid variation on the downstream gap genes. The model correctly predicts the reduced variation of the gap gene expression patterns and allows the characterization of the canalizing mechanism. We show that the canalization is the result of specific regulatory interactions among the zygotic gap genes. We demonstrate the validity of this explanation by showing that variation is increased in embryos mutant for two gap genes, Krüppel and knirps, disproving competing proposals that canalization is due to an undiscovered morphogen, or that it does not take place at all. In an accompanying article in PLoS Computational Biology (doi:10.1371/journal.pcbi.1000303), we show that cross regulation between the gap genes causes their expression to approach dynamical attractors, reducing initial variation and providing a robust output. These results demonstrate that the Bicoid gradient is not sufficient to produce gap gene borders having the low variance observed, and instead this low variance is generated by gap gene cross regulation. More generally, we show that the complex multigenic phenomenon of canalization can be understood at a quantitative and predictive level by the application of a precise dynamical model.


S1 Gap gene circuits
A gene circuit [1][2][3][4][5] determines the time evolution of protein concentrations in the syncytial blastoderm of Drosophila melanogaster. The circuits used in this paper comprise the gap genes hb, Kr, gt, and kni of the anteroposterior segmentation system. Their protein products are transcription factors that localize in the nuclei of the blastoderm [6][7][8][9]. Therefore, the state variables are the concentrations of the protein products of these genes inside nuclei. Anteroposterior (A-P) and dorsoventral (D-V) patterning systems are largely independent of each other in the presumptive germ band of the blastoderm embryo. This allows us to consider a one-dimensional row of nuclei along the anteroposterior axis of the embryo, from 35% EL to 92% EL. The modeled region extends over 58% of the A-P axis, from the peak of the third gt stripe to the posterior border of the posterior hb domain (Fig. S4B). The circuit functions according to three rules: interphase, mitosis and division. The first two rules describe the continuous dynamics of proteins during interphase and mitosis. During interphase the evolution of protein concentrations is determined by three processes: regulated protein synthesis, protein transport, and protein decay. During mitosis, transcription shuts down and nascent transcripts are destroyed [10]. Therefore, only protein transport and protein decay govern the dynamics in the mitosis rule.
The third rule, division, accounts for nuclear cleavages in the blastoderm. It models mitotic division as a discontinuous change in the state of the system. At the end of a mitosis, each nucleus is replaced with its daughter nuclei. The inter-nuclear distance is halved and the daughter nuclei inherit the state of the mother nucleus. The divisions are carried out according to a division schedule based on experimental data [11] (Fig. S2).
The gap gene circuits used in this study model events occurring during a period of time which begins at the onset of cleavage cycle 13 and ends at the onset of gastrulation at the end of cycle 14A [11]. Kr, gt, and kni are exclusively zygotic, and their expression at the protein level is first detectable at early cycle 13 [12][13][14][15][16][17]. hb, which is expressed both maternally and zygotically, shows a large increase in expression in cycle 13 [18], indicating commencement of its zygotic expression. Time t is measured in minutes from the start of cleavage cycle 13. The interphase of cycle 13 lasts for 16.0 min, and its mitosis from 16.0 to 21.1 min. At t = 21.1 min, the thirteenth division is carried out by applying the division rule. The interphase of cycle 14A starts immediately after division, and lasts until gastrulation at t = 71.1 min.

S1.1 Equations
The two continuous rules, interphase and mitosis, use a system of ordinary differential equations (ODEs) to describe the dynamics of protein concentrations. Let there be M nuclei in the modeled region during a particular cleavage cycle. Let i denote a particular nucleus, counting from anterior to posterior. We denote a particular segmentation gene by a ∈ 1, . . . , N , where N genes are represented in the circuit. v a i (t) is the protein concentration of gene a in nucleus i. The time evolution of the state variables v a i (t) is given by the solution of the system of M × N ODEs, (S1) The first term on the right hand side of Eq. (S1) represents protein synthesis, the second one protein transport through Fickian diffusion and the last term represents first-order protein degradation.
The protein synthesis rate for gene a is determined by the maximum synthesis rate R a and the regulatory input to a, u a ≡ N b=1 The rate of protein synthesis is the product of R a and the regulation-expression function g(u a ) = 1 2 u a / (u a ) 2 + 1 + 1 (see Fig. S1). u Figure S1: The regulation-expression function g(u). The dashed vertical lines are at values of total input u at which synthesis rate is 10% or 90% of maximum.
The regulatory input u a in turn accounts for the transcriptional regulation of gene a by transcription factors. Each term of u a corresponds to a distinct type of factor. The first term N b=1 T ab v b i represents the regulation of gene a by the genes b ∈ 1, . . . , N of the circuit. The elements of the ge-netic interconnection matrix T , T ab , characterize the regulatory effect of protein b on the synthesis of gene a. A positive value of T ab represents activation and a negative value represents repression. Spatially and temporally homogeneous maternal factors are represented in u a via its fourth term, h a . The second term of u a represents the action of Bcd, which has a concentration that is spatially inhomogeneous, but constant in time [16]. m a is the strength of the regulation of gene a by Bcd, while v Bcd i is the concentration of Bcd in nucleus i. Finally, we represent the effects of transcription factors which vary over space and time and regulate gap gene expression but are not themselves regulated by the gap genes through the third term of u a , N e β=1 E aβ v β i (t). N e is the number of such factors in the circuit, E aβ is the regulatory effect of the time varying external input β on gene a, and v β i (t) is the concentration of external input β in nucleus i at time t. The gene circuit used in this study has two such factors, Cad (v Cad and Tll (v Tll i (t)). cad is expressed from both the maternal and zygotic genomes. However, its expression pattern is not affected in mutants for hb, Kr, gt and kni, except for a slight expansion of the posterior stripe in hb − during late cycle 14 [19]. Similarly, tll expression is not affected in these mutants [20]. The concentrations of time varying external inputs Cad and Tll were determined by interpolation from data [16] (see Section S1.2 for details).
The second term of Eq. (S1) represents protein transport between nuclei as spatially discretized Fickian diffusion. The diffusion parameter D a is assumed to vary inversely with the squared distance between neighboring nuclei. The third term is protein degradation, in which λ a is the decay rate of the product of gene a. It is related to the protein half life of the product of gene a by t a 1/2 = ln 2/λ a .
Since Kr, Gt, and Kni proteins first appear only in cycle 13, they have initial conditions of zero, For hb, the expression data from cycle 12 is used as the initial condition. These data are the maternal component of hb since its expression intensifies only in cycle 13.

S1.2 Numerical implementation of time varying external inputs
In order to specify the right hand side of Eq. (S1) fully, the concentrations of the time varying external inputs, Cad (v Cad i (t)) and Tll (v Tll i (t)), must be supplied for any time in the duration of the model. Average concentrations for these genes are known at ten time points t k , k = −1, 0, . . . , 8. t −1 = 0 min, t 0 is the midpoint of cycle 13, and t 1 , . . . , t 8 correspond to the eight time classes T1-T8 in cycle 14 (Table 1). For nucleus i and external input β, The concentration of external input β in nucleus i is then determined at an arbitrary time t by piecewise linear interpolation, Square boxes indicate data points. Cad concentration at t = 0 min is determined from cleavage cycle 12 data. Tll concentration is zero at 0 min since Tll is first detected in cleavage cycle 13 [16]. The other time points are midpoints of time classes (Table 1). Tll data were only available for time classes T3-T8. Lines are interpolated concentrations.
Fig . S3 shows such interpolation at 50% EL for Cad, and at 92% EL for Tll. Higher order methods like cubic splines were not used because they gave rise to artifacts from experimental noise.

S1.3 Optimization and selection procedure for gap gene circuits
The Parallel Lam Simulated Annealing algorithm [21,22] produces many candidate circuits, and we selected a circuit using a three-step method [5]. First, only circuits with an RMS score less than 12.0 were considered. These circuits were screened further for patterning defects, and any circuit with major defects was discarded. Finally, experimental [23] and theoretical [24] investigations have shown that Kr represses hb, therefore, we only select circuits for further consideration if they show this property.
With this screening process, we obtained 23 circuits out of a total of 65 optimizations. This set of circuits have the same network topology (Table S3) as the circuits studied in earlier work [5,24]. The signs of the regulatory parameters are the same in all circuits with three exceptions, T gt←kni , T Kr ←gt , and m kni . Of these three, the last two change sign in only one circuit, k1_014 (Table S3).
This level of consistency suggests that network topology has been inferred at a qualitative level, in agreement with the conclusions of a recent study of parameter determinability of gap gene circuits [25]. Bcd and Cad are activators of hb, Kr, gt, and kni. tll is an activator of hb, and a repressor of the other gap genes. The interaction between hb, Kr, gt, and kni is one of mutual repression, with two exceptions: (1) gt is an activator of hb in all the circuits obtained. (2) kni is an activator of gt in about half the circuits, and a repressor in the other half. This network topology is discussed in depth elsewhere [5,24]. These results also hold true for circuits obtained using other Bcd profiles from the middle of the parameter scatter in Fig. 2A (black circles).
We chose one circuit (k1_007), which has RMS score 10.76, for further analysis. Its parameters are shown in Tables S1 and S2. The chosen circuit's gap gene patterns (Fig. S4A,B) are consistent with data except for two minor defects. The first one is a bulge on the anterior border of the posterior hb domain. The second is that the posterior border of the posterior hb domain does not form fully. Some circuits reported in previous work [5] also suffer from these defects. The first defect is due to very low levels of spurious tll expression in the middle part of the embryo stemming from imperfect background removal. The second is due to the omission of the gene hkb [20,26,27] Table S2: Kinetic parameters of the gap gene circuit chosen for analysis. h a was kept fixed at −2.5 during optimization. The parameters are explained in Section S1.      2B) and analyzed in this study. The other circuits were fit using the Bcd profiles highlighted in Fig. 2A with black circles. The first column shows the six borders that have low positional variation in the simulations of Bcd variation. Bcd variation was simulated in each circuit with two families of Bcd profiles. The first family consisted of raw, background-removed Bcd profiles (even-numbered columns). The second family consisted of the exponential fits of the profiles of the first family made using Eq. (2) (odd-numbered columns). The positional variation of the borders in simulations of circuit k1_007 using the second family of Bcd profiles is given in Table 2.

S3 Regulatory analysis of gap gene borders
This section describes the methodology of the regulatory analysis presented in Section 2.3. At a border of the expression domain of gene a, the regulation-expression function g(u a ) (see Eq. S1) changes from a value close to zero to a value close to one. In the regulatory analysis we determine the set of regulators of the gene a that are responsible for this change in the value of g(u a ). Here, h a is the total regulatory input to the gene a. Let i B be the indices of the nuclei over which the border of gene a forms. Since the sigmoid g(u a ) is approximately linear between 10% and 90% expression levels (Fig. S1), we can write for the purposes of this analysis. Consider the posterior border of the expression domain of a gap gene a. Let the expression level of protein a be at 90% of maximum at i = i 1 , and at 10% maximum at i = i M . The analysis is similar for anterior borders, but with i = i 1 at 10% maximum, and i = i M at 90% maximum. The total change in u a is u a i 1 − u a i M . Since this change is just a sum of regulatory contributions, we can divide it into individual regulatory terms By comparing the magnitude of the change in different regulatory terms ( , we can determine which regulators are driving the formation of the boundary. In Fig. 4A-C and 5A-C, the spatial derivatives of u a and individual regulatory terms are plotted so that the total change in u a is the area below or above the curve, and the total change in each regulatory term is the area between curves. It is possible to simplify this analysis by eliminating many regulators that cannot set the border. Autoregulation, for instance, cannot form the border [5]. It can only make a border sharper. Since this is a posterior border, the concentration of a reduces as one goes from anterior (i 1 ) to posterior (i M ). An activator whose concentration is increasing from i 1 to i M will tend to counteract the reduction in a's concentration. Thus an activator whose gradient falls in the opposite direction of the border cannot aid its formation. Similarly a repressor whose concentration is decreasing from i 1 to i M , that is, a repressor gradient in the same direction as the border, cannot cause the border to form.
Such simplification is also possible for anterior borders (increasing from i 1 to i M ). Activator gradients in the opposite direction to the border, and repressor gradients in the same direction as the anterior border cannot cause it to form.
Here we list the regulatory inputs that were eliminated in the regulatory analysis shown in Fig. 4A-C and Fig. 5A-C following the procedure described above. For the posterior border of the anterior hb domain (Fig. 4A), Hb autoactivation and Cad activation were eliminated. Of these two, Cad activation has a negligible contribution and hence only Hb activation is shown in red. In the analysis of the regulation of the posterior border of the central Kr domain (Fig. 4B), Kr autoactivation, Cad activation, Tll repression, and Hb repression were eliminated and their combined contribution is shown in red. For the posterior border of the posterior kni domain (Fig. 4C), Kni autoactivation, Kr repression, and Cad activation cannot set the border and their contribution is shown in red.
In the analysis of the regulation of the anterior border of the posterior kni domain (Fig. 5A), Kni autoactivation, Bcd activation, and Tll repression were eliminated and are shown in red. For the anterior border of the posterior gt domain (Fig. 5B), Gt autoactivation, Bcd activation, Kni activation, and Tll repression were eliminated and are shown in red. For the posterior border of the posterior gt domain (Fig. 5C), Gt autoactivation, Cad activation, and Kr repression cannot set the border and are shown in red.