A Quantitative and Dynamic Model of the Arabidopsis Flowering Time Gene Regulatory Network

Various environmental signals integrate into a network of floral regulatory genes leading to the final decision on when to flower. Although a wealth of qualitative knowledge is available on how flowering time genes regulate each other, only a few studies incorporated this knowledge into predictive models. Such models are invaluable as they enable to investigate how various types of inputs are combined to give a quantitative readout. To investigate the effect of gene expression disturbances on flowering time, we developed a dynamic model for the regulation of flowering time in Arabidopsis thaliana. Model parameters were estimated based on expression time-courses for relevant genes, and a consistent set of flowering times for plants of various genetic backgrounds. Validation was performed by predicting changes in expression level in mutant backgrounds and comparing these predictions with independent expression data, and by comparison of predicted and experimental flowering times for several double mutants. Remarkably, the model predicts that a disturbance in a particular gene has not necessarily the largest impact on directly connected genes. For example, the model predicts that SUPPRESSOR OF OVEREXPRESSION OF CONSTANS (SOC1) mutation has a larger impact on APETALA1 (AP1), which is not directly regulated by SOC1, compared to its effect on LEAFY (LFY) which is under direct control of SOC1. This was confirmed by expression data. Another model prediction involves the importance of cooperativity in the regulation of APETALA1 (AP1) by LFY, a prediction supported by experimental evidence. Concluding, our model for flowering time gene regulation enables to address how different quantitative inputs are combined into one quantitative output, flowering time.


Introduction
Flowering at the right moment is crucial for the reproductive success of flowering plants. Hence, plants have evolved genetic and molecular networks integrating various environmental cues with endogenous signals in order to flower under optimal conditions [1]. Various input signals are received and transmitted by signal transduction pathways including the photoperiod pathway, the vernalization pathway, the ambient temperature pathway and the autonomous pathway [2]. Finally, the input from these pathways is integrated by a core set of flowering time integration genes ("integration network"). This regulation contributes to the adaptation of plants to different environmental conditions and facilitated the successful dispersion of flowering plants over the world [2].
The complexity of flowering time regulation is enormous, even when focusing on the network involved in integrating the various signals. To understand how gene disturbances influence flowering time, it is not only important to know which genes regulate each other, but also how strongly these genes influence each other. Hence, quantitative aspects of flowering time changes upon perturbations of input signals cannot be understood by merely assessing qualitatively which interactions are present. To this end, a quantitative model describing how different genes in the network regulate each other is needed. Indeed, other complex plant developmental processes have been subject to extensive modeling efforts [3]. This includes processes such as the circadian clock [4][5][6][7], auxin signalling [8][9][10][11], photoperiod regulation of flowering time genes [12,13] and the development of floral organs [14][15][16][17], which all have been investigated in detail by computational models. These models enable to formalize biological knowledge and hypotheses, and, importantly, to investigate how various types of inputs are combined to give a quantitative readout.
Flowering time regulation has been extensively studied experimentally in the plant model species Arabidopsis thaliana. Substantial qualitative information is available about the factors involved and how these interact genetically. However, the information that is needed for quantitative and dynamic modelling is missing to a large extent. This includes comprehensive and standardized quantitative data on flowering time under various conditions and in different genetic backgrounds [18], and time series of expression for key flowering time integration genes in those backgrounds. In line with the scarcity of quantitative information useful for modelling, the floral transition in Arabidopsis thaliana has been scarcely studied using modeling approaches. Recently a few promising mathematical modeling approaches appeared aimed at modeling the floral transition in various plant species [19][20][21]. Dong et al. modeled a network of four genes involved in the floral transition in maize [19], and Satake et al. modeled a twogene network in Arabidopsis halleri [21]. Earlier work on modelling Arabidopsis thaliana flowering time did not take genetic regulation into account or used a mainly qualitative approach [22]. Only very recently a first quantitative model of the Arabidopsis thaliana flowering time integration network was presented [20].
We aimed to obtain a mechanistic understanding of the Arabidopsis thaliana flowering time integration network, by investigating a core gene regulatory network composed of eight genes ( Network (FLV), and by the Netherlands Consortium for Systems Biology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. RvH is employed by Keygene N.V. Keygene N.V. provided support in the form of salary for author RvH, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific role of this author is articulated in the 'author contributions' section.
Competing Interests: Roeland van Ham is currently employed by Keygene N.V. Upon submission he is not aware of patents or patent applications covering the content of this manuscript to which he or his current employer is involved as inventor or applicant. This statement does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
APETALA1 (AP1), FLOWERING LOCUS T (FT), LEAFY (LFY) and FD. Although certainly more genes are involved in integrating the various signals influencing the timing of the floral transition [2,23], we focussed on these genes because a) we aim to model the core of the network responsible for flowering time regulation; and b) for these genes, the available experimental data renders a clear picture of their mutual interactions (see above; Table A in S1 File; Fig. 1). In the leaves, SVP and FLC repress the transcription of FT [24][25][26][27]. FT is produced in the leaves and moves to the shoot apical meristem (SAM) [28,29]. FT has the potential to interact with FD [30,31] and complex formation is supposed to occur at the SAM, leading to activation of SOC1 [32] and AP1 expression [30,33]. FLC and SVP are also expressed in the SAM, where they repress the expression of SOC1 [34][35][36]. SOC1 integrates signals from multiple pathways and transmits the outcome to LFY [37,38], which is supposed to act at least partially via a positive feed-back loop in which AGL24 is involved upon dimerizing with SOC1 [39]. In turn, LFY is a positive regulator of AP1 [40] and of FD [20]. The commitment to flower is ascertained by a direct positive feed-back interaction between AP1 and LFY. Once the expression of AP1 is initiated, this transcription factor orchestrates the floral transition by specifying floral meristem identity and regulating the expression of genes involved in flower development [41]. Importantly, in comparison with the recently presented model of the floral transition in Arabidopsis [20] we included the key floral integrator genes SOC1, SVP and AGL24 in our model. Red arrows represent repression, blue arrows activation. Most interactions were taken as given based on literature information, but for regulation of LFY by AGL24 and SOC1, different ways of combining the two inputs were tested (indicated by the light blue arrows). Dashed arrow represents FT transport. Junction symbol next to AP1 indicates cooperativity predicted for regulation of AP1 by LFY. As indicated, AP1 expression is used as a marker for the moment of the floral transition. This network was used to fit expression time-course data and to predict the effect of perturbations. Gene names are given in full in the text.
The above introduced interactions between the flowering time integration genes and the floral meristem identity genes at the end of the pathway allow to derive a set of Ordinary Differential Equations (ODEs) describing how genes in the network regulate each other. ODEs were chosen because they arise from continuum modelling of molecular interactions and allow quantitative analysis of the effect of perturbations on expression levels and finally on flowering time. Because of the above mentioned role of AP1 as orchestrator of floral meristem identity specification, the moment at which the AP1 expression level starts to rise is used as a proxy for flowering time in the model.
In order to build and validate an ODE model describing the network constituted by the eight selected genes, we obtained three quantitative datasets: i) gene expression time-courses of the selected eight genes in wild type; ii) flowering time of plants of different genetic backgrounds; and iii) expression data of the selected genes in the plants of these different genetic backgrounds. A key aspect of our approach is that we estimate model parameters using the dynamic gene expression time-course data for the components of the model, in combination with flowering time data (datasets i and ii). We validated our model by comparing predicted expression time-courses for mutants in components of the network with experimental data (dataset iii). Finally, we obtained detailed understanding of how genes are affected by perturbation in other genes, via the regulatory interactions that constitute the network.

Model building and parameter estimation
Given the importance of combining various input signals into a final decision to flower, a key question is how the integration network generates a quantitative response, i.e. how expression level perturbations of various magnitudes result in specified changes in other network components and finally in a change in flowering time. In order for the model to be able to link expression changes to changes in flowering time, we included AP1: expression of AP1 indicates that the switch from vegetative to reproductive growth has occurred [42]. As such, we use the moment at which AP1 expression rises above a certain threshold in our model as a proxy for the moment at which flowering starts (see Methods for details).
Our approach to investigate the network involves modelling by ordinary differential equations (ODEs), which describe how the expression level of each gene is influenced by the other genes. This regulation is described by Hill functions [43], which represent activation or repression by the various regulators. Genetic and molecular knowledge on the network structure is used as input to define these equations. Parameters in these equations represent interaction strengths and other biological or physical aspects of the system, and are estimated using wildtype gene expression time-course data. FLC and SVP are not known to be regulated by any of the genes included in our model, and for that reason, they are included as external input factors, that regulate one or more other genes in the model. In order to model transport of FT protein to the shoot apical meristem [44], we assumed that the FT produced in the leaves reaches the meristem with a delay. An optimal parameter set, which includes the FT transport delay, was identified by fitting the equations to qRT-PCR time-course data from leaves and SAM-enriched material obtained from Arabidopsis plants grown at 23°C under long-day (LD) conditions (Tables B-C in S1 File).
The genes in the core regulatory network of flowering time control cooperate to activate the flowering orchestrator AP1 [41]. This allows proper timing of AP1 expression and fine tuning of flowering time in response to different environmental cues. In wild type Arabidopsis, the AP1 level remains barely detectable in the SAM until about day 13 after germination and then sharply increases (Fig. 2). As mentioned above, we use the moment at which AP1 expression level rises as a marker to indicate that the transition to reproductive development is completed, which is interpreted as a predictor of flowering time. Based on that, we developed a fitting strategy that besides of aiming at a good fit, optimizes the correlation between predicted and observed flowering time. To be able to do so, we obtained a consistent set of flowering time measurements for mutants of six of the genes in our network (Figure A in S1 File). Flowering time was measured as the number of rosette leaves (RL) present at flowering. To compare model predictions for flowering time, expressed in units of days, these were scaled to RL (Methods).
A total of 35 parameters in six equations were estimated from the time series data containing 13 datapoints (expression levels) per gene (Tables B-C in S1 File; Fig. 2). Given the variability in the data, the fit is satisfactory, as indicated by the value of the normalized root mean square error (nrmse). For FT, for which the data shows highest variability, the highest nrmse (27%) was obtained. For SOC1, the overall fit was good, but does not capture the data point at day 9, which deviates from the general trend in the time series, resulting in a nrmse of 19%. For AP1 and FD the value of the nrmse was around 14%, and for AGL24 and LFY it was 7%. The FLC and SVP expression data were used directly as input to the model; these are shown in Figure B in S1 File. Interestingly, for the data describing AP1, we could only obtain a good fit by introducing a particular value of one parameter describing how AP1 is regulated by LFY. As Besides this exception, there is considerable agreement between data and predictions. Indeed, comparison of the Pearson correlation (R = 0.85, including the outlier) with correlation obtained using randomized data demonstrates the significance of this result (p<0.005), indicating a satisfactory model fit.

Model validation
A key issue in our model is the mechanism by which the network is able to give a quantitative response to specific perturbations. How are changes in a given gene expression level transferred to other components of the network, and how does this impact flowering time? In order to validate model predictions of how changes in expression propagate through the network, we simulated the expression time-courses for mutants and obtained independent experimental data for comparison. For that, microarray experiments were used, which were carried out for wild type and four mutant backgrounds (soc1, agl24, fd and flc). In these experiments, a flowering inducing shift from short-day to long-day conditions was used [45]. To account for the fact that these experimental conditions cannot directly be simulated, our comparison of the experimental microarray data with simulation results focusses on the overall effect of a mutation over the complete time-course (Methods). As indicated by the value of Pearson's R (0.69; p-value = 0.003), the predicted overall expression level changes of flowering time genes upon upstream mutations show a significant correlation with the experimental data (Figs. C-D in S1 File). Assessing the correlation per gene (across the different mutants) indicates similar correlation for each of the genes. However, assessing the correlation per mutation (across the different genes) indicates good predictive performance for SOC1, FD and AGL24 mutations, but not for FLC mutation. The latter could be due to the low expression and limited role of FLC in the Col-0 background due to the FRIGIDA (FRI) mutation [46]. The comparison with the microarray dataset constitutes an independent evaluation of the predictive performance of the model, demonstrating that the model allows predicting the overall magnitude of the effect of a perturbation in one gene on other genes in a quantitative manner.
To further assess the predictive performance of the model, we analysed five double mutants in which over-expression of one gene was combined with knock-out of a second gene. In all cases, both genes involved activators of flowering (Fig. A in S1 File), implying that it is intuitively difficult to predict whether the double mutant will be early or late flowering. These mutants were not used in the parameter estimation stage. The resulting prediction performance was satisfactory (Fig. 3A): for four out of five cases, the prediction was qualitatively correct ("early flowering"). Quantitatively, the correlation between experimental and predicted flowering times was reasonable as well, although not significant at the p = 0.05 level (Pearson R = 0.75; p = 0.1). It is good to realize that no perfect fit was expected in this case because of variable temporal and spatial overexpression levels due to the usage of the 35S promoter [47].

Spread of perturbations through the network
As a first example of quantitative understanding of flowering time regulation, we analysed the predicted expression changes in various mutant backgrounds (Fig. 3B). A key question here is how gene expression perturbations spread through the network. We found that the model predicts that the spread of a perturbation is not in all cases directly related to the position that different genes have in the network (Fig. 3B). For example, the effect of mutating SOC1 on LFY is smaller than its effect on AP1, although SOC1 regulates LFY and does not directly regulate AP1, but only indirectly via LFY. Analysis of the regulatory interactions and the associated parameters in the model allows rationalizing such differences. For the above-mentioned different magnitudes of the effect of soc1 mutation on LFY compared to its effect on AP1, it is relevant that the estimated expression activation strength (parameter β) for the influence of SOC1 on LFY (β 7 ) is much smaller than that for the influence of LFY on AP1 (β 9 ; Table C in S1 File). This means that the model predicts that a change in SOC1 will give rise to a relatively small change in LFY, which however will be amplified by LFY regulating AP1. This effect is visible in the experimental microarray data as well, where in the soc1 mutant background LFY expression is much less affected than AP1 expression (normalized AP1 expression change in the soc1 mutant compared to wild type is two times that of LFY; Figs. C-D in S1 File). This illustrates that the effect of perturbations can considerably grow in magnitude throughout the network.

Regulation of AP1 by LFY
As mentioned above, for the regulation of AP1 by LFY our initial analysis using the PCR timecourse data indicated that we needed to introduce DNA-binding cooperativity in the equations in order to get a reasonable fit of the data. There is indeed experimental evidence for cooperativity in the LFY-AP1 interaction, based on the LFY protein-DNA structure and additional experimental data [48]. In our modelling approach, cooperativity is defined by a Hill coefficient n>1 in the term in the differential equation describing the regulation of AP1 by LFY. For the regulation of AP1 by LFY, setting the value of n = 3 resulted in a markedly improved fit of the wild type time-course data (Fig. E in S1 File). No improvement of the fit could be obtained for the other interactions in the network by the introduction of a Hill coefficient larger than 1, meaning that the data does not contain evidence for cooperativity in those interactions. Cooperativity in the LFY-AP1 interaction provides an additional predicted mechanism by which a small change in LFY, can lead to a large change in AP1 expression. Experimental evidence indeed suggests that cooperativity in LFY binding to the AP1 promoter is important [48].

Regulation of LFY by AGL24 and SOC1
It has been suggested that SOC1 requires dimerization with AGL24 for binding to the LFY promoter. This is based on several sources of experimental evidence: (I) in yeast-two-hybrid assay, AGL24 and SOC1 form a heterodimer [49]; (II) SOC1 is only detected in the nucleus when AGL24 is present as well [39]; (III) LFY is expressed only in those tissues where SOC1 and AGL24 expression overlap [39]. Nevertheless, there is a significant difference between the flowering time observed for soc1 and agl24 mutants (Fig. 4A). If these two proteins would bind the LFY promoter as AGL24-SOC1 dimer only, then knockout mutations in either AGL24 or SOC1 would equally reduce the dimer concentration; therefore, one would expect the same effect on LFY.
Based on these considerations, in our final model, AGL24 and SOC1 have independent roles in regulating LFY. We tested an alternative model version in which AGL24 and SOC1 only regulated LFY as a dimer and not separately from each other. This resulted in a decreased goodness-of-fit in particular for LFY (nrmse 43% instead of 7%) and in this alternative model, indeed the effect of agl24 and soc1 mutation on LFY and on flowering time were comparable, which contradicts available experimental data.
In our model, in which AGL24 and SOC1 have independent roles in regulating LFY, the simulated LFY expression is reduced by only~25% in the agl24 knockout mutant relative to its time-course expression in wild type. In contrast, LFY expression is predicted to be reduced bỹ 65% in the soc1 mutant ( Fig. 4B; Fig. C in S1 File). These predicted changes are consistent with what is experimentally observed in the microarray data for LFY (Fig. 4C). If the expression level of SOC1 in wild type would be much higher than that of AGL24, a hypothesis to explain the observed difference between agl24 and soc1 could be that elimination of such more abundant factor would have a larger effect. However, in our expression data, expression levels of AGL24 and SOC1 are of the same order of magnitude. According to the model, two parameters are important in describing the regulatory effect of SOC1 and AGL24 on LFY: DNA binding efficiency (represented by parameter K) and expression activation strength (parameter β). A difference in any of these two parameters between SOC1 and AGL24 could lead to a difference in the effect of SOC1 versus AGL24 mutation. In the set of parameter values we obtained for our model, the DNA binding efficiency for AGL24 (K 10 ) and SOC1 (K 11 ) binding to the LFY promoter is quite similar. However, there is a substantial difference in activation strength (β 7 vs β 6 ), with SOC1 being much more able to activate LFY, resulting in a much larger effect of soc1 mutation compared to agl24 mutation. Analysis of predicted flowering times for a range of values of β for SOC1 and AGL24 confirms the dependency on the SOC1 activation strength (Fig. 4D). In addition, the flowering time observed for the double mutant soc1/agl24 suggests a small additive effect when both genes are simultaneously knocked-out (Fig. 4A). The difference between the flowering time of the double mutant and that of the single soc1 and agl24 mutants is significant (t-test; p-value 0.001). In agreement with this observation, the model simulation predicts a small additional reduction in LFY expression for the soc1/agl24 double mutant (~80% vs.~65% in single mutant; Fig. 4B). Overall, these examples demonstrate how we get quantitative insight in the spread of perturbations through the network. Moreover, this demonstrates that we can analyse how the quantitative output of the network as a whole is governed by specific molecular interactions that build up the network.

Discussion
An important reason to apply computational models to a biological system, such as the floral integration network, is that it allows investigating how the various interactions that together constitute the network, transmit perturbations into a final readout. Indeed, by integrating experimental data with modeling we analyse how different components of the flowering time regulation network react to changes in other components, finally leading to a specific flowering time. We specifically analysed the regulation of LFY by SOC1, the regulation of LFY by AGL24, and the regulation of AP1 by LFY. In these cases, the activation strength was found to be the most important cause of the observed differences in magnitude of effect of perturbations, according to the model. This could mean that the protein with the higher predicted activation strength itself is a stronger transcriptional activator than the other protein, or it could indicate involvement in a protein interaction with a partner (not explicitly included in the model), which is a stronger activator. In the case of the different effect of the soc1 mutation compared to the agl24 mutation, it is important to consider that both SOC1 and AGL24 are known to form additional complexes, and such dimers might also play a role in their differential functioning [49]. In addition, as a general note on our interpretation of parameter values, it is important to realize that we use a fixed conversion of mRNA levels to protein levels; this means that potential differences in e.g. translation rate could complicate the interpretation of the parameters.
In a recently published review, an overview is given of attempts to model plant reproduction Gene Regulatory Networks, including networks involved in flowering time regulation [50]. Previous work on modelling flowering time used the concepts of "photothermal units" or variants thereof as a way to computationally investigate flowering time and how it is influenced by the environment; as recently demonstrated, such models can in principle be connected to genetic information [51]. However, this does not provide a direct way to incorporate the regulatory interactions between genes, which are key towards a mechanistic understanding of flowering time regulation. Our work is more comparable to recent approaches, which start with defining interactions in a gene regulatory network and then develop a model based on this network [19,20]. Our approach extends the recently published Arabidopsis flowering time model [20] by fitting model parameters using dynamic expression data. The model by Jaeger et al. predicts a rather gradual upregulation of AP1, which does not reproduce the sharp transition from low expression to the on-state, as seen in our experimental data. This indicates that a model, in which parameters are estimated purely based on mutant flowering times, might miss important aspects of gene expression dynamics. Additional time course data could in the future be obtained at various experimental conditions (temperature, light) as a step towards including the effect of such conditions on the model. A direct advantage is that our parameters have a physical interpretation (e.g. activation strength, cooperativity, etc).
When analysing for which genes the model predictions were of better quality, the effects of an SVP overexpression mutant and an FT knock-out mutant on flowering time were predicted less accurately compared to other mutants (including SVP knock-out and FT overexpression mutants). FT and SVP are connected to each other in the network, which could indicate that in this part of the network the model needs refinement. In particular, given that SVP overexpression results in lower FT expression, the fact that both SVP overexpression and FT knock-out were not well predicted indicates that the effect of lower FT levels, either directly on AP1 or more indirect via SOC1, is not perfectly captured. It is however also important to consider that the FT levels used as input in our model are relatively low, which is related to the fact that they are not measured at the peak of diurnal expression of FT. Another aspect to consider is that FLC and SVP are present as external inputs in the model and are not directly modelled; if a mutation in one of these impacts the other as well, the model would miss such effect, which would deteriorate prediction performance. This might indeed be the case, according to ChIP-seq data [35,36].
Clearly, there are several directions to expand our work. We do not specifically represent protein and RNA separately; currently the state of the art in the proteomics field does not allow high-throughput and precise quantification of protein levels during the vegetative phase of plant development. Recent evidence indicates however that for at least one component in the model, SVP, the effect of protein stability is important [52]. In theory, for the differential effects of soc1 vs. agl24 mutation, for which we provide an explanation in terms of a difference in a specific parameter in the model, difference in protein levels in spite of similarity in RNA levels could also be relevant, although there is currently no experimental data that indicates this.
In general, the amount of detail in the model will always be a compromise. This holds as well for the type of interactions in the network. Currently, regulatory interactions are modelled, whereas protein-protein interactions are not explicitly included. Nevertheless, the way in which regulatory inputs are combined gives an implicit representation of the way in which proteins interact with each other. Although the importance of complex formation for the components of the network is clear [49,53], one reason why at our level of detail protein complexes can be excluded might be that they are mainly relevant for specifying the correct regulatory interactions (which are explicitly defined in the model equations) and less so for the kinetics of the model. Depending on the availability of proteomics data, it would however be straightforward to include e.g. protein dimerization explicitly in our equations. Another relevant addition could be to include post-translational modifications such as phosphorylation, which are known to be relevant for some of the components of the flowering time regulatory network [54].
Currently, we focused on a core set of genes involved in integrating various flowering time signals. Given that input from the environment converges on various components of the flowering integration network, an exciting follow-up step will be to incorporate environmental cues as the next layer of information in the gene regulatory network. This could include both direct environmental effects on some of the model components, or modelling complete upstream pathways. As an example of direct environmental influence that could be modelled, recent data indicates that the above mentioned effects of SVP protein stability as well alternative splicing of the flowering time regulator Flowering Locus M (FLM) depend on temperature [55]. To include the former, although protein levels are not explicitly present in our model, an effect of temperature on stability could be represented by changing the SVP decay parameter; for FLM, additional equations describing the two isoforms would be needed. As for the modelling of upstream pathways, in a recent overview of known effects of mutations,~150 genes were listed as being currently known to impact flowering time [56]. It remains to be seen which would be the best approach to include such genes and whether it is essential to include all of them for reliable predictions. Given sufficient time-course data it might be possible to use the same approach as presented here. However, it would also be an option to focus detailed modelling efforts on particular parts of the network; for example, for the influence of light on the circadian clock, models have already been developed [4][5][6] and these could be connected to our model. Other parts of the network could be treated in a more coarse grained, statistical approach.
To conclude, we present a dynamic and predictive model for flowering time regulation. Our work presents a framework for studying the mechanisms of flowering time regulation, by addressing how different quantitative inputs are combined into a single quantitative output, the timing of flowering. Quantitative qRT-PCR data RNA was isolated from the plant samples (max 100 mg of grinded plant material) using the InviTrap Spin Plant RNA Mini Kit. Subsequently, a DNAse (Invitrogen) treatment was performed, which was stopped with 1μL of a 20 mM EDTA solution and 10 minutes incubation at 65°C. Total RNA concentration was measured, and 1 μg RNA was used to perform cDNA synthesis by the Taqman MultiScribe Reverse Transcriptase kit (LifeTechnologies). qRT-PCR was performed with the SYBR green mix from BioRad using the gene specific oligonucleotides indicated in Table D in S1 File. YELLOW-LEAF-SPECIFIC GENE8 (YLS8) was implemented as reference gene for the analyses.

Plant materials and growth conditions
The relative gene expression was given by E target = 2 ΔCt , where Ct stands for the threshold cycle and ΔCt = Ct target -Ct reference . From that, the absolute abundance was estimated by A target-= E target × s, where s stands for a scaling factor obtained by dividing the average abundance that a transcript reaches in a cell by the highest E target value among all samples, and multiplying by an assumed maximal protein abundance. Since a linear relationship between abundances of RNA and protein is assumed in the model, the average transcript abundance was adjusted based on average abundance of a protein in cell. An available estimate for the range of protein abundance is between 400nM and 1400nM [57]. From this range, the average abundance for the flowering time gene products was arbitrarily chosen (500nM). This means that the maximum absolute expression among all samples is equal to 500nM (Fig. 2). Scaled expression values used in parameter estimation are available in S1 Dataset, which also contains model source code (see below).

Microarray data
Microarray time series experiments were performed as previously described [58] using RNA isolated from manually dissected shoot apices of Col-0, soc1-6, agl24, and fd-3. Briefly, biotinylated probes were prepared from 1 μg of total RNA using the MessageAmp II-Biotin Enhanced Kit (Ambion) following the manufacturer's instructions and hybridized to Arabidopsis ATH1-121501 gene expression array (Affymetrix). Arrays were washed on a GeneChip Fluidics Station 450 (affymetrix) and scanned on an Affymetrix GeneChip Scanner 7G using default settings. Expression data for Col-0, soc-6, agl24, and fd-3 have been deposited with ArrayExpress (E-MEXP-4001). Expression data for flc-3 (ArrayExpress: E-MEXP-2041) have previously been published [59]. The probe intensities were normalized and the gene expression estimates were obtained using the gcRMA library of R/Bioconductor [60].

The model
The regulatory interactions shown in Fig. 1 were modelled by equations based on Hill kinetics. It was assumed that spatial aspects could be ignored (except for FT transport); hence the interactions between the components are described by a set of ordinary differential equations (ODEs). Furthermore, only proteins were explicitly modelled, and a linear relationship between RNA levels and protein levels was assumed. The model is composed of the following equations: For FLC and SVP, gene expression is represented in the leaves (x FLC,l and x SVP,l ) and meristem (x FLC,m and x SVP,m ). For all the other genes, the variables correspond to expression in the meristem. Note that for SVP and FLC there are no equations; they act as external inputs in the model, and their regulation is not explicitly modelled. The parameters in the equations have the following meaning (see Tables B-C in S1 File for further details): parameters β and K stand for the maximum transcription rate and for the abundance at half-maximum transcription rate, respectively; d i stands for the degradation rate of the products of gene i; Δ stands for the time needed for transporting FT from the leaves to the meristem; x FT,t-Δ is the amount of FT in the meristem at time t which is assumed to be equal to that in the leaves at time t-Δ; and n is the Hill coefficient describing cooperativity in the regulation of AP1 by LFY.
Equations (1)(2)(3)(4)(5)(6) are based on the following specific assumptions: (I) SVP and FLC bind to FT and SOC1 promoters as a dimer. This is implicitly represented by the multiplication of the Hill terms associated to the SVP-and FLC-mediated regulations of FT and SOC1. (II) FD requires dimerization with FT in order to activate SOC1 expression. (III) FD can activate AP1 as a monomer. (IV) Recently it was shown that in rice the interaction between FT and FD is bridged by a 14-3-3 protein [61] and probably this is also the case in Arabidopsis; nevertheless, we did not include 14-3-3s in our model, because these proteins seem to be highly abundant and hence not limiting for floral induction. The specific form of the equations and the assumptions that they represent were adjusted by assessing the fitting and the flowering time predictions of variants for the five equations. In addition, to obtain a good fit for the equation associated to AP1, the degree of cooperativity (n) for the LFY-mediated regulation of AP1 was set to n = 3. Model source code is available as S1 Dataset.

Parameter estimation
In the model, the expression dynamics x i of a gene i depends on the parameter values associated to dx i dt and on the expression values over the time-course of the direct regulators of i. To independently fit an equation dx i dt to its corresponding time-course, the expressions of the direct regulators of i in the right-hand side of the equation were taken from the data, and interpolated with a polynomial fit. This decoupling method has previously been described in full detail [14]. By applying this method, it is possible to find the parameters for each equation without knowing the parameters associated to the other equations; thus, alleviating the high computational demand put on the search algorithm by the total number of parameters. This optimization step was carried out by the MultiStart solver implemented in MATLAB (R2012a, The MathWorks UK, Cambridge). The parameters were then input in the whole systems of equations as starting point for a second optimization step. In this second step, the equations were solved as a system and the expressions of the direct regulators of i were taken from their associated ordinary differential equation solutions. This was carried out by the lsqnonlin solver (implemented in MATLAB) to fine-tune the fitting obtained by the first optimization step.
To assess the goodness of fit for each gene, the normalized root mean square error (NRMSE) was used, which equals RMSE x max Àx min , with RMSE equal to x min are the maximum and minimum observed expression value; x i exp and x i pred are the experimental and predicted values at time i; and the sum is over all n timepoints.

Model simulations
The equations were solved using MATLAB, integrated with the stiff solver ode23s. For simulations of gene expressions of Arabidopsis wild type grown at 23°C/LD, the initial gene abundances were taken equal to the first expression time-points and the parameters were the same as described in Table C in S1 File. To simulate gene expression in mutants, the expression associated to a mutated gene i was fixed to a constant value x i = k mut . For the knock-out null mutants (ft-10, fd-3, flc-3), the values of k mut were adjusted to zero; and for the knockdown mutants (not null mutations), k mut values were adjusted to a small percentage of the expression of i observed in the first time-point from wild type Col-0 (Table E in S1 File). For the overexpression mutants, the values of k mut were set to five times the maximum absolute expression among all samples (2500nM). To assess the model predictions of changes in gene expression we compared predicted relative changes with relative changes obtained with microarray data. To do so, we calculated the predicted total amount of expression (integral of the predicted time-course from day 0 to day 20) using the trapz function in MATLAB. Subsequently, these values were scaled by subtracting the wild type value and then dividing by the wild type value. Similarly, the experimental relative change was calculated based on the microarray data. Note that comparing these values focusses on the effect of a mutation on dynamics of genes in the network over the complete time-course and as such takes into account the fact that the experimental conditions of the microarray experiment cannot directly be simulated (flowering-inducing shift from short-day to long-day conditions).

Model predictions of flowering time
The predictions of flowering time were based on AP1 expression. For that it was assumed that, at a molecular level, Arabidopsis undergoes the floral transition in the moment that AP1 expression initiates. Therefore, according to our experimental AP1 time series, for wild type Col-0, the floral transition takes place between days 12 and 13 after germination. For simplification, we take the exact day 12.6 because it corresponds to the average number of rosette leaves (RLs) observed at the onset of flowering for wild type Col-0. To estimate the flowering time from mutant simulations, we use the time in which AP1 expression reaches the same simulated expression value as obtained at day 12.6 for wild type Col-0. This implies that the AP1 expression threshold for triggering the floral transition is the same for different plant growth conditions and mutants. Because flowering times are usually reported in number of rosette leaves (RLs) we subsequently scaled the predicted days to RLs by assuming a linear relationship between the number of RLs observed at the onset of flowering and the time in days after germination that Arabidopsis thaliana undergoes the floral transition at a molecular level.
In addition to the set of mutants obtained in consistent conditions in this work, we also included existing mutant data. Wild type Col-0 flowering time in these experiments is somewhat different from that observed in our experiments. In addition, flowering times in literature are mostly reported in rosette leaves (RL), and not directly in days. To be able to integrate these data, we scaled existing mutant data with a linear factor which is chosen in such a way as to scale the wild type Col-0 flowering time to 12.6 RL.