A simplified modelling framework facilitates more complex representations of plant circadian clocks

The circadian clock orchestrates biological processes so that they occur at specific times of the day, thereby facilitating adaptation to diurnal and seasonal environmental changes. In plants, mathematical modelling has been comprehensively integrated with experimental studies to gain a better mechanistic understanding of the complex genetic regulatory network comprising the clock. However, with an increasing number of circadian genes being discovered, there is a pressing need for methods facilitating the expansion of computational models to incorporate these newly-discovered components. Conventionally, plant clock models have comprised differential equation systems based on Michaelis-Menten kinetics. However, the difficulties associated with modifying interactions using this approach—and the concomitant problem of robustly identifying regulation types—has contributed to a complexity bottleneck, with quantitative fits to experimental data rapidly becoming computationally intractable for models possessing more than ≈50 parameters. Here, we address these issues by constructing the first plant clock models based on the S-System formalism originally developed by Savageau for analysing biochemical networks. We show that despite its relative simplicity, this approach yields clock models with comparable accuracy to the conventional Michaelis-Menten formalism. The S-System formulation also confers several key advantages in terms of model construction and expansion. In particular, it simplifies the inclusion of new interactions, whilst also facilitating the modification of regulation types, thereby making it well-suited to network inference. Furthermore, S-System models mitigate the issue of parameter identifiability. Finally, by applying linear systems theory to the models considered, we provide some justification for the increased use of aggregated protein equations in recent plant clock modelling, replacing the separate cytoplasmic/nuclear protein compartments that were characteristic of the earlier models. We conclude that as well as providing a simplified framework for model development, the S-System formalism also possesses significant potential as a robust modelling method for designing synthetic gene circuits.


Model equations
In each model below, c (m) i (t) and c i (t) denote the cellular concentration of the mRNA and protein of gene i, respectively. For the models where cytoplasmic and nuclear protein are treated separately, these two forms are labelled as c  i (t), respectively. The different gene products labelled by i are listed in S1 Table. c P (t) (or c (n) P (t)) denotes the light-activated PIF3-like protein introduced originally in JL2005 [S1] and subsequently used in all the other clock models considered in this study. The light input, L I (t), is modelled as a periodic square wave that switches between 0 and 1 (see eq. (3) of the main paper).

AP2012S: extended S-System formulation
Note: For the ZTL protein equation, we have included exponents h i,k controlling the stabilisation of ZTL by GI protein. This is because in MF2016K, this process was modelled as a rational function of the two protein concentrations (cf. eqs. (S1.9)), rather than as a product of the concentrations, as is usually the case (cf. eq. (13)).

MF2016KSorig: original S-System formulation
The sine-sweeping method In sine-sweeping, the input to the system is a sinusoidal signal whose frequency is varied within the range of interest. By collecting the magnitude and phase values of the output responses, the transfer function of the system can be approximated. Here, a summary of the method is given. For more details see [S2]. Let the sinusoidal input signal be x(t) = U in sin(ωt), where U in and ω are the amplitude and frequency respectively. According to linear systems theory, if the system is linear time invariant, the output response will be a sinusoid of the same frequency, but with scaled amplitude and a phase shift. In practice, nonlinearities, transients, and disturbances V (t) will also affect the output. Thus, we can write the output y(t) as and G(iω) is the transfer function relating the input and output.
By neglecting the initial part of the data and assuming that the linear dynamics dominate, we can ignore the effects of transients and nonlinearities, respectively. Furthermore, the effect of disturbances V (t) can be reduced through the correlation method [S2], where the idea is to correlate the output response y(t) with a sine and cosine of the same frequency, and then average over the signal length (S9 Fig). Writing the sampled output signal as {y(t k ) : 1 ≤ k ≤ L}, the resulting 14/16 averages are given below: By substituting (S2.12) into (S2.13) and performing some algebraic manipulation, we obtain 14) The second terms in the expressions for Q S (L) and then it can be shown that the third terms in the expressions for Q S (L) and Q C (L) also converge to zero as L → ∞ (see [S2] for complete details). From the remaining terms in (S2.14), the magnitude, |G(iω)|, and phase, ∠G(iω), can therefore be estimated using the equations below: Plotting |G(iω)| and ∠G(iω) against frequency ω then yields the Bode plot representation of the sine-sweeping approximation to the transfer function G(iω).

Transfer function analysis of protein shuttling in JL2006
The Bode plots for JL2006 generated using sine-sweeping that are shown in Fig 6 of the main paper imply that the linear systems approximations to the functions relating input mRNAs to output nuclear proteins can be accurately represented as first-order transfer functions. In this section, we provide further evidence for the viability of this approximation using frequency response analysis, and also show that a first-order transfer function is equivalent to having an aggregated protein equation, rather than two separate equations modelling protein shuttling. We start by noting that the protein shuttling mechanisms used in JL2006 all have the general form shown below (cf. eqs. (S1.3)): dP j,cy dt = αG i − β cy P j,cy + β nu P j,nu − γ cy P j,cy K cy + P j,cy , dP j,nu dt = β cy P j,cy − β nu P j,nu − γ nu P j,nu K nu + P j,cy . (S3.1) Here, G i is the gene, P j,cy is cytoplasmic protein, P j,nu is nuclear protein, α is the translation rate, {γ cy , γ nu } are degradation rates, {K cy , K nu } are Michaelis constants for degradation and {β cy , β nu } are transport rates. Linearising the nonlinear degradation terms in eqs. (S3.1) and rescaling the degradation rates yields the following: dP j,cy dt = αG i − β cy P j,cy + β nu P j,nu − γ cy P j,cy , dP j,nu dt = β cy P j,cy − β nu P j,nu − γ nu P j,nu . (S3.2) 15/16 REFERENCES Taking Laplace transforms of (S3.1) and assuming zero initial conditions, we arrive at the following equations involving the transforms of G i (t), P j,cy (t) and P j,nu (t): P j,cy (s) = α s + β cy + γ cy G i (s) + β nu s + β cy + γ cy P j,nu (s), Some algebraic manipulation then shows that the transfer function relating input mRNA (G i (s)) to output nuclear protein (P j,nu (s)) is given by: αβ cy s 2 + (β cy + γ cy + β nu + γ nu )s + (β nu γ cy + β cy γ nu + γ cy γ nu ) .
We note that eq. (S3.4) is a second order-transfer function. The only way in which eq. (S3.4) can be approximated by a first-order transfer function, as suggested by the sine-sweeping results, is if its two poles (the roots of its denominator) are far apart; in other words, if eq. (S3.4) has one fast pole and one slow pole. We demonstrate that is indeed the case for component Y of JL2006, and show the similarity between the Bode plots for the full second-order transfer function and its first-order approximation (similar results can be demonstrated for the other components of the model). In order to compute the full second-order transfer function of Y, we take the values α = p 4 = 0.2485, β cy = r 7 = 2.2123 and β nu = r 8 = 0.2002 from S5 Table. For the nonlinear degradation terms, we use the best linear approximations (see S10 Fig Here, a and b represent translation and degradation, respectively. Taking Laplace transforms and assuming zero initial conditions as before results in the following first-order transfer function: The values of a and b in eq. (S3.7) can be inferred from a Bode plot by using the fact that b is the corner frequency -the frequency at which the magnitude trace starts to deviate from its plateau value A vdB = 20 log 10 (a/b). The Bode plot for eq. (S3.5) implies b ≈ 0.090 and A vdB = −14 dB, from which it follows that a ≈ 0.018. Comparing Bode plots in S11 Fig shows that the full second-order transfer function for Y (eq. (S3.5)) is quite accurately modelled by the first-order approximation (eq. (S3.7)) with these values of a and b, particularly in the lower frequency range.  Y , respectively. In each case, degradation rate is plotted for expression levels ranging between 0 and the maximum level observed in the synthetic training and validation datasets (see S1 Fig and S2 Fig). In these ranges, the nonlinear functions are well-approximated by linear fits (red lines), the gradients of which are taken as the values of γ cy and γ nu used to derive eq. (S3.5) in S1 Text.

S12 Fig. Variation in optimised parameter values for the extended S-System models.
A-D: Fits of JL2005S, JL2006S, AP2012S and KF2014S to synthetic data. E: Fits of MF2016KS to experimental data. Boxplots show parameter distributions obtained from six independent optimisation runs. In each boxplot, the horizontal line denotes the median value, the edges of the box are the 25th and 75th percentiles, the whiskers denote the most extreme datapoints not considered to be outliers, and outliers are plotted as red crosses. Model parameter indices are defined in S3 Table ( Table. Optimal parameter valuesΘ S LL for the extended S-System formulation JL2005S of JL2005, obtained by fitting the model to the synthetic training data. For each parameter, the number in brackets is the normalised median absolute deviation (nMAD). This is calculated using the value shown, together with those obtained from five additional, independent optimisation runs. The rightmost column shows the parameter indexing, counting left to right across rows, that is used in S12 Fig.

Training dataset
Validation dataset Gene/protein   Table. Optimal parameter valuesΘ S LL for the extended S-System formulation JL2006S of JL2006, obtained by fitting the model to the synthetic training data. For each parameter, the number in brackets is the normalised median absolute deviation (nMAD). This is calculated using the value shown, together with those obtained from five additional, independent optimisation runs. The rightmost column shows the parameter indexing, counting left to right across rows, that is used in S12 Fig.

Training dataset
Validation dataset Gene/protein  S9 Table. Optimal parameter valuesΘ S LL for the extended S-System formulation AP2012S of AP2012, obtained by fitting the model to the synthetic training data. For each parameter, the number in brackets is the normalised median absolute deviation (nMAD). This is calculated using the value shown together with those obtained from five additional, independent optimisation runs. The rightmost column shows the parameter indexing, counting left to right across rows, that is used in S12 Fig.

Training dataset
Validation dataset Gene/protein   Table. Optimal parameter valuesΘ S LL for the extended S-System formulation KF2014S of KF2014, obtained by fitting the model to the synthetic training data. For each parameter, the number in brackets is the normalised median absolute deviation (nMAD). This is calculated using the value shown, together with those obtained from five additional, independent optimisation runs. The rightmost column shows the parameter indexing, counting left to right across rows, that is used in S12 Fig Table. Optimal parameter valuesΘ E LL for the extended S-System formulation MF2016KS of MF2016K, obtained by fitting to the experimental training data. For each parameter, the number in brackets is the normalised median absolute deviation (nMAD). This is calculated using the value shown, together with the values obtained from five additional, independent optimisation runs. The rightmost column shows the parameter indexing, counting left to right across rows, that is used in S12 Fig