Reverse-Engineering Post-Transcriptional Regulation of Gap Genes in Drosophila melanogaster

Systems biology proceeds through repeated cycles of experiment and modeling. One way to implement this is reverse engineering, where models are fit to data to infer and analyse regulatory mechanisms. This requires rigorous methods to determine whether model parameters can be properly identified. Applying such methods in a complex biological context remains challenging. We use reverse engineering to study post-transcriptional regulation in pattern formation. As a case study, we analyse expression of the gap genes Krüppel, knirps, and giant in Drosophila melanogaster. We use detailed, quantitative datasets of gap gene mRNA and protein expression to solve and fit a model of post-transcriptional regulation, and establish its structural and practical identifiability. Our results demonstrate that post-transcriptional regulation is not required for patterning in this system, but is necessary for proper control of protein levels. Our work demonstrates that the uniqueness and specificity of a fitted model can be rigorously determined in the context of spatio-temporal pattern formation. This greatly increases the potential of reverse engineering for the study of development and other, similarly complex, biological processes.


Supplementary Text S2: Structural identifiability analysis.
Structural parameter identifiability analysis is performed under the assumption of an ideal scenario where noise-free time-continuous experimental data are available. The objective is to answer the question whether under those ideal conditions the parameters can be given unique values within a given search space (see also main text). It is worth remarking at this point, that structural identifiability analysis is different to practical identifiability analysis in the sense that the actual experimental conditions and available data are not considered, only model structure and an idealised version of the experimental design is taken into account. Therefore, structural identifiability analysis answers the question whether model parameters can be determined at all, given a certain type of model and a certain type of experimental scheme (observed quantities and stimulation conditions).
The model in Eq.(1) in the main text can be rewritten in matrix form as follows: where y ≥ 0,ẏ ∈ R n correspond to the protein concentrations and their time derivatives; B ∈ R n×n is the control matrix, the identity matrix in this case; u(t − τ ) ≥ 0 ∈ R n corresponds to the vector of delayed inputs and A ∈ R n×n is the so called state matrix, defined as the following tridiagonal symmetric matrix: Given a reference model M (θ * ) defined as: Given the linear nature of the model (Eq.1) both in the states and the (delayed) inputs and thatẏ, y and u are piecewise continuous in time and of exponential order, the Laplace transform (L[.]) method may be used to assess whether the s.g.i. condition holds (see [1] and the references cited therein). The underlying idea is to compute the system transfer matrix , and to check whether: Note that Eq. 3 results in a set of equations binding θ and θ * . If for almost any θ this set of equations has a unique solution for θ * , M (θ) is structurally globally identifiable (s.g.i.), if the set of solutions is finite, M (θ) is said to be structurally locally identifiable (s.l.i).
Since the computation is symbolic, writing the transfer matrix in canonical form will simplify the analysis. A canonical form may be obtained by writing down each entry of the transfer matrix as the ratio of two polynomials ordered in s, provided that the numerator and the denominator are simplified by their greatest common divisor and that the coefficient of the denominator monomial with highest degree is set to one.
Taking into consideration that u(t − τ ) = u(t) * δ(t − τ ), the Laplace transform of the system 1 reads: by means of the Gershgorin circle theorem, for example, it is straightforward to get a sufficient condition for [sI − A] to be invertible and assuming that u(t)gt0 for some t values (before and after mitosis), the system transfer matrix reads: In order to write each of the components of the transfer matrix as the ratio of two polynomials the Padé formula is used to approximate the exponential term e −τ s by a rational function. For the sake of simplicity in the analysis a first order approximation will be used: e −τ s = (2 − τ s)/(2 + τ s).
Exploiting the fact that the matrix [sI − A] is tridiagonal and symmetric it is possible to analytically obtain its inverse (see, for example, [2]) and therefore the elements of the transfer matrix result: with: and H ij =H ji (the transfer matrix is symmetric). It should be noted that, even though the matrix [sI − A] is tridiagonal and symmetric, its inverse will be dense, thus H ij = 0, ∀i, j ≤ n. In addition the complexity of the corresponding elements increases with the size of the model, i.e. with n, due to the recursive nature of the expressions in Eq. 6.
We now limit the analysis to a simplified model with a reduced spatial domain of 3 nuclei. This will not affect general conclusions and simplifies the symbolic manipulation. The canonical form of the transfer matrix results: s 4 + s 3 (4Dτ +3τ λ+2) τ + s 2 (3D 2 τ +8Dτ λ+8D+3τ λ 2 +6λ) τ + s(3D 2 τ λ+6D 2 +4Dτ λ 2 +16Dλ+τ λ 3 +6λ 2 ) τ + 6D 2 λ+8Dλ 2 +2λ 3 τ Using symbolic manipulation it is possible to asses condition in Eq. 3 is satisfied. Therefore the model is structurally globally identifiable under a general input provided u i (t − τ ) = 0, i.e. all parameters may be given unique solutions provided protein are available for time points before or after mitosis (i. e. when the production rate is not zero.