Information integration and decision making in flowering time control

In order to reproduce successfully, plants must sense changes in their environment and flower at the correct time. Many plants use day length and vernalization, a mechanism for verifying that winter has occurred, to determine when to flower. Our study used available temperature and day length data from different climates to provide a general understanding of how this processing of environmental signals could have evolved in plants. For climates where temperature fluctuation correlations decayed exponentially, a simple stochastic model characterizing vernalization was able to reconstruct the switch-like behavior of the core flowering regulatory genes. For these and other climates, artificial neural networks were used to predict flowering gene expression patterns. For temperate plants, long-term cold temperature and short-term day length measurements were sufficient to produce robust flowering time decisions from the neural networks. Additionally, evolutionary simulations on neural networks confirmed that the combined signal of temperature and day length achieved the highest fitness relative to neural networks with access to only one of those inputs. We suggest that winter temperature memory is a well-adapted strategy for plants' detection of seasonal changes, and that absolute day length is useful for the subsequent triggering of flowering.


Source data cleaning
The data preparation is illustrated using data from Cologne. The temperature data retrieved from NOAA include recordings of daily maximal and minimal temperatures from two stations in Cologne. A snapshot of the customized CSV data file is shown in Fig. S1. Each station recorded daily temperatures independently. Thus, for Cologne, we have 98 years of daily temperature summaries (Table S1).
Noise from several sources is present in the temperature data, for instance missing dates or missing records within a day. We refer to the missing values as "holes" in the data. The dataset was cleaned as follows:
• For simplicity of computation, each year was trimmed to 360 days and each month to 30 days for the neural-network-based models. For the stochastic model, a yearly cycle of 365 days was used;
• Holes shorter than 10 days were filled by averaging the two neighbouring days or by copying a neighbouring date. Larger holes were filled with the same dates from a nearby station;
• Stations with more than 20% holes were discarded.

Figure S1: A snapshot of the customized CSV file retrieved from NOAA with 4 columns.

The day length data for Cologne were downloaded from ptaff.ca. The Earth's orbit around the sun is stable, so the day length variation remains the same from year to year; we therefore replicated one year of day lengths for Cologne across all 98 years. Meanwhile, to account for the influence of weather conditions on day lengths, Gaussian noise of adjustable strength was added to different years.
The same cleaning procedure was applied to other climates. The datasets of all retrieved climates are summarized in Table S1.
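The hole-filling rules above can be sketched as follows. This is a minimal stdlib version, not the cleaning code used for the paper: the function name `fill_holes`, the boundary-averaging choice for multi-day short holes, and the `fallback` series standing in for a nearby station are illustrative assumptions.

```python
def fill_holes(temps, fallback):
    """Fill missing daily values (None). Holes shorter than 10 days are filled
    from the two neighbouring days; longer holes (or holes touching the ends
    of the series) are copied from the same dates of a nearby station."""
    out = list(temps)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] is None:
            j += 1                               # [i, j) is one hole
        if j - i < 10 and i > 0 and j < len(out):
            avg = (out[i - 1] + out[j]) / 2      # average the two neighbours
            for m in range(i, j):
                out[m] = avg
        else:
            for m in range(i, j):
                out[m] = fallback[m]             # nearby station, same dates
        i = j
    return out
```

Short interior holes are smoothed from their neighbours; holes of 10 days or more, or holes at the edges of the series, fall back to the second station.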
The retrieved temperature series is decomposed as

T(t) = \bar{T}(t) + \delta T(t),

where T(t) denotes the real temperature, \bar{T}(t) is the deterministic temperature dynamics with a period of 365 days, and \delta T(t) = T(t) - \bar{T}(t) is the temperature fluctuation about the arithmetic average of the temperature. The daily average temperature is obtained by averaging the daily maximal and minimal temperatures.

Figure S2: Daily average temperatures of Cologne, with the fitted temperature curve based on a second-order Fourier series.

Fourier series based regression for \bar{T}(t)
To mimic the long-term adaptation of plants to local climates, averaging the 98 years of temperatures over a one-year period gives an estimate of the deterministic temperature dynamics and of the averaged temperature fluctuations. The temperature dynamics was fitted by a second-order Fourier series,

\bar{T}(t) = a_0 + a_1\cos(wt) + c_1\sin(wt) + a_2\cos(2wt) + c_2\sin(2wt), \qquad (1)

with the determined parameters

(a_0, a_1, c_1, a_2, c_2, w)^T = (0, -7.775, -2.669, -0.1552, 0.4052, 0.0172). \qquad (2)
The parameterization of the fit was performed using a Riemann sum [1]. The fitted result is shown in Fig. S2.
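Because the model is linear in the Fourier coefficients at a fixed frequency, the fit can also be sketched as an ordinary least-squares problem. The data below are synthetic (generated from the fitted coefficient values quoted in the text, plus noise), standing in for the averaged Cologne series:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily-mean temperatures standing in for the 98-year Cologne
# average; the Fourier coefficients are the fitted values from the text.
w = 2 * np.pi / 365          # ~0.0172, the fitted angular frequency
t = np.arange(365.0)
truth = (-7.775 * np.cos(w * t) - 2.669 * np.sin(w * t)
         - 0.1552 * np.cos(2 * w * t) + 0.4052 * np.sin(2 * w * t))
temps = truth + rng.normal(0.0, 0.5, t.size)

# Linear least squares on the second-order Fourier design matrix
# [1, cos wt, sin wt, cos 2wt, sin 2wt] recovers (a0, a1, c1, a2, c2).
A = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t),
                     np.cos(2 * w * t), np.sin(2 * w * t)])
coeffs, *_ = np.linalg.lstsq(A, temps, rcond=None)
```

With 365 daily points, the recovered coefficients land very close to the values used to generate the data.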
The exponential decay in the temperature time series

The autocorrelation of the temperature fluctuations is defined as

A(k) = \frac{1}{n-k}\sum_{t=1}^{n-k}\delta T(t)\,\delta T(t+k), \qquad (3)

where n is the total number of days and the lag k runs up to the presumed maximal correlated number of days, set to 80 in our calculation. Fig. S3a shows that the autocorrelations of the temperature fluctuations follow an exponential decay, which can be fitted as

A(k) = \sigma^2 e^{-k/\tau}, \qquad (4)

where \sigma^2 = 12.67 and \tau^{-1} = 0.22, with a fitting R^2 score of 0.934. The least-squares method implemented in SciPy [2] was used to fit the exponential decay. In addition, we performed a bootstrap analysis to estimate confidence intervals for the fitted coefficients. Since the fluctuations are correlated, the sampling used a block length of 50 days in order to preserve the correlation. From the bootstrap with 10000 samples, the confidence intervals for \sigma^2 and \tau^{-1} were estimated as (11.90, 13.42) and (0.199, 0.253) respectively; the fitted coefficients indeed lie within these intervals. The bootstrapped fitting results are shown in Fig. S3b.
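The decay fit can be sketched on synthetic data: an AR(1) process is the discrete-time analogue of a Gaussian Markov process with autocovariance σ²e^{-k/τ}, so a log-linear fit of the empirical autocovariance against lag should recover τ⁻¹ = 0.22 and σ² = 12.67. The values are taken from the text, but the series itself is simulated here, not the NOAA data, and the log-linear fit stands in for the SciPy least-squares routine:

```python
import math

import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-in for the fluctuation series delta T(t): an AR(1)
# process whose autocovariance is sigma^2 * exp(-k/tau).
inv_tau, sigma2, n = 0.22, 12.67, 50_000
rho = math.exp(-inv_tau)
x = np.zeros(n)
noise = rng.normal(0.0, math.sqrt(sigma2 * (1 - rho**2)), n)
for i in range(1, n):
    x[i] = rho * x[i - 1] + noise[i]

# Empirical autocovariance at short lags and a log-linear decay fit.
lags = np.arange(11)
acov = np.array([np.mean(x[: n - k] * x[k:]) for k in lags])
slope, intercept = np.polyfit(lags, np.log(acov), 1)
fit_inv_tau, fit_sigma2 = -slope, math.exp(intercept)
```

Only short lags are used so that the empirical autocovariance stays safely positive before taking the logarithm.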

Autocorrelation analysis for other climates
Similar to the calculation of the autocorrelation in temperature fluctuations for Cologne, the procedure was applied to the other selected climates, as shown in Fig. S4. It can be observed that the autocorrelations for Cologne, Oslo and Auckland show exponential decays, but with different lower bounds, which may be due to different noise levels in the temperatures. For regions with weaker seasonal changes, such as Hawaii and San Francisco, exponential decays were not observed.

Modeling the process of producing active cells

Figure S5: Temperature-driven birth-death process of cells with active histone modifications. n stands for the number of actively modified cells, the production rate βT(t) depends on the temperature T(t) and λ is the degradation rate.

The master equation and its analytical solution
Following previous work on modeling a simple birth-death process for one species [3,4,5], the process of having n cells with active modifications is simplified to a birth-death reaction whose production rate \beta T(t) depends on the temperature T(t), while \lambda is the degradation rate of active cells. The total number of cells N was reduced to 60 for computational feasibility. The master equation describing the reaction can be written as

\frac{\partial p(n,t)}{\partial t} = \beta T(t)\,[\,p(n-1,t) - p(n,t)\,] + \lambda\,[\,(n+1)\,p(n+1,t) - n\,p(n,t)\,]. \qquad (5)

To solve this equation, we first introduce the time-dependent probability generating function G(s,t) = \sum_{n=0}^{\infty} s^n p(n,t). Multiplying equation (5) by s^n and summing over n transforms the equation into

\frac{\partial G}{\partial t} = \beta T(t)\,(s-1)\,G - \lambda\,(s-1)\,\frac{\partial G}{\partial s}. \qquad (6)

First neglecting the time dependence of the temperature, the solution of equation (6) is

G(s,t) = e^{\frac{\beta T}{\lambda}(s-1)} \qquad (7)

(the derivation of the solution can be found in chapter 7 of [4]). Now considering the time dependence of T, we can assume the solution becomes

G(s,t) = e^{(s-1)F(t)}. \qquad (8)

To find F(t), substituting (8) into (6) yields

\frac{dF}{dt} = \beta T(t) - \lambda F(t). \qquad (9)

By solving the differential equation (9), we have

F(t) = c\,e^{-\lambda t} + \beta\,e^{-\lambda t}\int_{-\infty}^{t} T(t')\,e^{\lambda t'}\,dt'. \qquad (10)

Under the assumptions of c = 0 and T(t) = d + \bar{T}(t) + \delta T(t), where d is the constant for converting temperature from Celsius to Kelvin, \bar{T}(t) is the temperature average over years and \delta T(t) is the temperature fluctuation, equation (8) can be written as

G(s,t) = \exp\!\Big[(s-1)\Big(\tfrac{d\beta}{\lambda} + \beta e^{-\lambda t}\!\int_{-\infty}^{t}\bar{T}(t')e^{\lambda t'}dt' + \beta e^{-\lambda t}\!\int_{-\infty}^{t}\delta T(t')e^{\lambda t'}dt'\Big)\Big]. \qquad (11)

Considering the stationary state of G(s,t), which follows a yearly cycle and is thus still time dependent, we have

G^*(s,t) = e^{\frac{d\beta}{\lambda}(s-1)}\; e^{\beta(s-1)e^{-\lambda t}\int_{-\infty}^{t}\bar{T}(t')e^{\lambda t'}dt'}\; e^{\beta(s-1)e^{-\lambda t}\int_{-\infty}^{t}\delta T(t')e^{\lambda t'}dt'}. \qquad (12)

Noise-free temperature

Starting with the simple scenario, we assume the temperature dynamics are noise-free, i.e. \delta T(t) = 0. Consequently, the steady-state solution becomes

G^*(s,t) = \exp\!\Big[(s-1)\Big(\tfrac{d\beta}{\lambda} + \beta e^{-\lambda t}\!\int_{-\infty}^{t}\bar{T}(t')e^{\lambda t'}dt'\Big)\Big]. \qquad (13)

Zooming into equation (13), we need to deal with the part related to the temperature dynamics, e^{-\lambda t}\int_{-\infty}^{t}\bar{T}(t')e^{\lambda t'}dt'. For simplicity of notation, we define

D(\lambda, t) := e^{-\lambda t}\int_{-\infty}^{t}\bar{T}(t')\,e^{\lambda t'}\,dt'. \qquad (14)

For the data collected from Cologne, the temperature dynamics \bar{T}(t) can be numerically approximated by a second-order Fourier series (Fig. S2).
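As a simple consistency check on the birth-death picture, the reaction can be simulated with Gillespie's direct method. The sketch below uses a constant temperature and hypothetical rate values (not the fitted ones), for which the derivation predicts a Poisson stationary state with mean βT/λ:

```python
import random

random.seed(0)
beta, lam, T_const = 0.5, 0.5, 40.0   # hypothetical rates, constant temperature

def gillespie_mean(t_end=2000.0, burn=50.0):
    """Time-averaged copy number of the birth-death process
    n -> n+1 at rate beta*T_const, n -> n-1 at rate lam*n."""
    t, n, acc = 0.0, 0, 0.0
    while t < t_end:
        birth = beta * T_const
        total = birth + lam * n
        dt = random.expovariate(total)          # waiting time to next event
        if t >= burn:
            acc += n * min(dt, t_end - t)       # time-weighted accumulation
        t += dt
        if random.random() * total < birth:     # choose birth vs death
            n += 1
        else:
            n -= 1
    return acc / (t_end - burn)

m = gillespie_mean()   # should approach beta * T_const / lam = 40
```

After discarding a burn-in period, the time-averaged copy number settles near the predicted Poisson mean of 40.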
Using the parameters fitted in the approximation (1), we have

D(\lambda, t) = \frac{a_0}{\lambda} + \frac{a_1}{w^2+\lambda^2}\big(\lambda\cos(wt) + w\sin(wt)\big) + \frac{c_1}{w^2+\lambda^2}\big(\lambda\sin(wt) - w\cos(wt)\big) + \frac{a_2}{(2w)^2+\lambda^2}\big(\lambda\cos(2wt) + 2w\sin(2wt)\big) + \frac{c_2}{(2w)^2+\lambda^2}\big(\lambda\sin(2wt) - 2w\cos(2wt)\big),

where t \in [0, 2L] with 2L the period of the temperature cycle, and the coefficients were determined in section 2. By further defining b := \frac{d\beta}{\lambda} + \beta D(\lambda, t), the stationary generating function becomes

G^*(s,t) = e^{b(s-1)},

which is a generating function for a Poisson distribution. Therefore, in the case of noise-free temperatures, the stationary solution to the master equation (5) is

p^*(n,t) = \frac{b^n e^{-b}}{n!}.

Noisy real temperature

Now considering the real, noisy temperature, we need to deal with the part caused by the temperature fluctuations, \exp\!\big[\beta(s-1)e^{-\lambda t}\int_{-\infty}^{t}\delta T(t')e^{\lambda t'}dt'\big]. Since the fluctuations have been shown to have exponentially decaying autocorrelations for the temperatures in Cologne, by Doob's theorem [6] they follow a Gaussian Markov process. Averaging over the fluctuations therefore gives

\Big\langle \exp\!\Big[\beta(s-1)e^{-\lambda t}\!\int_{-\infty}^{t}\delta T(t')e^{\lambda t'}dt'\Big]\Big\rangle = \exp\!\Big[\frac{\beta^2\sigma^2}{2\lambda(\lambda+\tau^{-1})}\,(s-1)^2\Big].

With a := \frac{\beta^2\sigma^2}{2\lambda(\lambda+\tau^{-1})} and b = \frac{d\beta}{\lambda} + \beta D(\lambda, t), the stationary G(s,t) (12) can be written in a closed form as

G^*(s,t) = e^{a(s-1)^2 + b(s-1)}. \qquad (21)
Due to the quadratic term in the exponential of (21), finding the associated probability density function requires the introduction of an auxiliary generating function which can be expanded in terms of Hermite polynomials,

e^{2xt - t^2} = \sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!}. \qquad (22)

By changing variables a = -\alpha^2, t = \alpha s and x = \frac{b+2\alpha^2}{2\alpha}, we have the following deduction:

e^{a(s-1)^2 + b(s-1)} = e^{a-b}\, e^{a s^2 + (b-2a)s} = e^{a-b}\, e^{2x(\alpha s) - (\alpha s)^2} = e^{a-b}\sum_{n=0}^{\infty} H_n(x)\,\frac{(\alpha s)^n}{n!}. \qquad (23)

In short, the generating function (21) becomes

G^*(s,t) = e^{a-b}\sum_{n=0}^{\infty} \frac{H_n(x)\,\alpha^n}{n!}\, s^n. \qquad (24)

Substituting the variables a and b into (24) and reading off the coefficient of s^n, the probability density function is finally derived as

p(n,t) = e^{a-b}\,\frac{\alpha^n}{n!}\, H_n\!\left(\frac{b+2\alpha^2}{2\alpha}\right), \qquad \alpha = i\sqrt{a}. \qquad (26)
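The resulting distribution can be evaluated with a simple Hermite recurrence. This sketch assumes the final closed form p(n,t) = e^{a-b} (α^n/n!) H_n((b+2α²)/(2α)) with α = i√a, as follows from expanding the generating function (21), and uses small illustrative values of a and b rather than the fitted parameters:

```python
import cmath
import math

# Illustrative (hypothetical) distribution parameters; the fitted values in
# the text give a much larger mean b, which only changes the scale below.
a, b = 2.0, 8.0

alpha = 1j * math.sqrt(a)                 # a = -alpha^2
x = (b + 2 * alpha**2) / (2 * alpha)

def pdf(nmax):
    """p(n) = e^(a-b) * H_n(x) * alpha^n / n!, using the Hermite recurrence
    H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)."""
    probs = []
    h_prev, h = 0j, 1 + 0j                # H_{-1} (unused) and H_0
    coef = cmath.exp(a - b)               # carries e^(a-b) * alpha^n / n!
    for n in range(nmax + 1):
        probs.append((coef * h).real)     # imaginary parts cancel
        h_prev, h = h, 2 * x * h - 2 * n * h_prev
        coef *= alpha / (n + 1)
    return probs

probs = pdf(120)
mean = sum(n * p for n, p in enumerate(probs))
var = sum(n * n * p for n, p in enumerate(probs)) - mean**2
```

Differentiating (21) at s = 1 gives mean b and variance b + 2a, which the truncated series reproduces, along with normalization.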

Optimization
The model parameters can be uniquely determined by optimizing a suitable decision function, in the sense that the signal-to-noise ratio determines the degradation rate and the decision boundary determines the production rate. When the degradation rate λ is very small, the production rate β must also be very small in order to obtain a reasonable distribution over the number of active cells. In this slow-reaction regime the effect of the temperature dynamics is diluted, which makes it hard for the reaction to capture the useful information in the temperature. Conversely, if λ is very large, β must also be very large to maintain a certain number of active cells; in this fast production-and-degradation regime, every single temperature fluctuation drives the reaction in an undesirable manner. Neither extreme is practical for plants to rely on. Therefore, λ has to be tuned to a reasonable level by the signal-to-noise ratio in the temperature, while the optimal values of β and λ can be determined by setting the decision boundary at a suitable number of active cells.
With the obtained parameters d = 283.3 K, τ⁻¹ = 0.22 and σ² = 12.67 for the exponential decay of the temperature fluctuations, and the parameters of the Fourier fitting of the temperature dynamics in (2), the probability distribution p(n,t) (26) leaves only the reaction rates β and λ undetermined. By optimizing such a decision-based objective, the optimal values β = 6.45 and λ = 2.94 were obtained.

Artificial Neural Networks
A feedforward neural network with two layers is illustrated in Fig. S6. It can be mathematically described as

F_k(x, W) = \sum_{j=1}^{M} w^{(2)}_{kj}\, h\!\Big(\sum_{i=1}^{d} w^{(1)}_{ji}\, x_i\Big),

where F_k(x, W) denotes the kth output, d and M denote the dimension of the input x (the number of nodes in the input layer) and the number of nodes in the hidden layer (the second layer) respectively, and h(·) is the activation function. W^{(1)} is the weight matrix with w^{(1)}_{ji} connecting the ith input to the jth node in the hidden layer, and W^{(2)} is the weight matrix with w^{(2)}_{kj} connecting the jth hidden node to the kth output node. In the context of regression the output is usually one-dimensional, and the index k can be dropped.

Figure S6: A fully connected feedforward neural network.
For a given i.i.d. dataset D = {(x_i, y_i)}, x_i ∈ R^d, y_i ∈ R, i ∈ {1, 2, · · · , N}, assuming that the targets y have a Gaussian distribution centered on F(x, W) gives rise to

p(y \mid x, W) = \mathcal{N}\big(y \mid F(x, W), \beta^{-1}\big),

with β⁻¹ the variance of the Gaussian noise. Defining the squared error between the targets and their model estimates as

E_D(W) = \frac{1}{2}\sum_{i=1}^{N}\big(y_i - F(x_i, W)\big)^2,

the likelihood function can be written as

p(D \mid W) = \frac{1}{Z(\beta)}\exp\big(-\beta E_D(W)\big), \qquad (31)

where Z(β) = (2π/β)^{N/2}. Taking the negative logarithm of (31) gives

-\ln p(D \mid W) = \beta E_D(W) + c, \qquad (32)

where c is a constant related to β which can be neglected for the purpose of minimizing the negative log-likelihood. Equation (32) shows that minimizing the error function is equivalent to maximizing the likelihood function. Consequently, adding an L_2-norm regularization leads to the final loss function

L(W) = E_D(W) + \frac{\alpha}{2}\,\lVert W \rVert^2.

Classification for estimating memory length
The input features are (x_{l1}, · · · , x_{lk}, x_{h1}, · · · , x_{hk}), where k is the window length, and x_{li} and x_{hi}, i = 1, · · · , k, stand for the daily lowest and highest temperatures respectively. For example, when the window length is set to k = 30 (i.e. one month), April can without loss of generality be set to class "1" and the remaining months to class "0"; the targets for the twelve months of each year are then (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0). The same configuration can be applied to other window lengths. We tested lengths k ∈ {28, 42, 60} using a repeated cross-validation routine: for each model training, 80 years were randomly selected for training and the remaining 18 years were used for testing. The averaged testing results are shown in Fig. S7. For length k = 30, the false positive rate is 26.3%, although the overall accuracy is 93.3%. When the length is increased to k = 42, the false positive rate drops to zero, indicating that this length provides sufficient information for determining the season; it can therefore be used as the effective memory length of plants for recognizing seasons.

Figure S7: The confusion matrices for the classification of different temperature windows. The temperature data from Cologne were used. The typical window sizes of 4 weeks and 6 weeks were used to test the effective memory length. A window size of 60 days did not improve the classification performance.
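The window construction can be sketched as follows. Random numbers stand in for the cleaned station data, and `window_dataset` with its "window ends with the month" layout is an illustrative assumption, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for 98 years of cleaned daily lows/highs (360-day years).
years, k = 98, 42
lows = rng.normal(5.0, 3.0, (years, 360))
highs = lows + rng.uniform(3.0, 8.0, (years, 360))

def window_dataset(lows, highs, k, target_month=3):
    """One sample per 30-day month: the k days of lows+highs ending with that
    month; the target is 1 for the chosen month (April here), else 0."""
    X, y = [], []
    for yr in range(lows.shape[0]):
        for m in range(12):
            end = (m + 1) * 30
            if end < k:
                continue                      # month lacks a full window
            X.append(np.concatenate([lows[yr, end - k:end],
                                     highs[yr, end - k:end]]))
            y.append(1 if m == target_month else 0)
    return np.asarray(X), np.asarray(y)

X, y = window_dataset(lows, highs, k)
```

With k = 42, January lacks a full preceding window and is skipped, leaving 11 samples per year with one positive (April) each.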

Regression of idealized expression patterns for various climates
In this section, information from different climates was used to fit the idealized expression pattern of FLC. Based on the classification results, the input features and targets for regression were constructed as follows. Each day of a year has a corresponding expression value from the idealized expression curve as its target. The features for that day comprise the temperatures (daily maxima and minima) and/or the day lengths of the preceding days, with the number of days of temperatures taken as the effective memory length from the classification. For temperate regions, long-term memory of temperatures played a key role in the regression, and the addition of day lengths significantly improved it. Regions such as San Francisco and Kahului have very similar temperatures every year, which means very little information can be extracted from their temperatures. This also led to poor generalization in predicting the idealized expression patterns (Fig. S9). Moreover, the predictions showed no significant differences for different temperature memory lengths, e.g. 7, 20 and 40 days. These results agree with the fact that plants in tropical regions do not rely on long-term memory of temperature for flowering decisions. In order to better fit the FLC expression patterns of regions such as San Francisco and Kahului, the temperature memory was reduced to at most one day and longer histories of day lengths were added as input features. For both San Francisco and Kahului, the models trained with only temperatures as features generalized poorly, but the fitting accuracy could be improved by adding day lengths.
For the sake of comparison, the fitting results using only day length are shown in Fig. S10 for Cologne and Kahului, Hawaii. For both regions, 7 days of day lengths, with the same level of Gaussian noise added, were used. It can be seen that the fit for Cologne was better than that for Kahului, which may be due to the flatter day length dynamics in Hawaii compared with Cologne.

Evolution Simulation
We used neural networks as plant agents to simulate their adaptation to three information settings: only temperature, only day length, or both. Neural networks consist of multiple layers of neurons which can be regarded as decision makers: individual neurons weigh the evidence they receive from upstream neurons and output decisions through activation functions (e.g. sigmoid, linear or other functions) [7]. On the other hand, although gene regulatory networks in plants are very complex molecular systems, individual genes likewise act based on the regulatory inputs they receive. The relation between gene production rate and transcription factor concentrations can be modeled by a Hill function [8], and with suitable Hill coefficients the Hill function behaves similarly to the sigmoid function. We therefore hypothesized that neural networks can serve well as plant agents for investigating the information in temperature and day length data. The simulation algorithm is summarized in Fig. S11. The Kullback–Leibler divergence between the idealized expression pattern and the expression level predicted by the neural network was used as the fitness of an individual, and the individual reproduction rate depended on this fitness. Offspring were mutated by perturbing the weights of the neural network agents. More specifically, mutation followed three steps:
• Randomly select the specified number of weights from the individual network.
• Calculate the mean of all weights of the network.
• Apply the mutation rate to the calculated mean (mutationRate * Mean); the product is then used as the standard deviation of the Gaussian distribution from which mutations are drawn for the selected weights.
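The three steps above can be sketched as follows. This is a minimal version over a flat list of weights; the function name and defaults are illustrative:

```python
import random

def mutate(weights, n_mut=5, mut_rate=0.1, rng=random.Random(0)):
    """Gaussian-mutate n_mut randomly chosen weights, with the standard
    deviation set to mut_rate times the mean of all weights."""
    out = list(weights)
    chosen = rng.sample(range(len(out)), n_mut)      # step 1: pick weights
    mean_w = sum(out) / len(out)                     # step 2: mean of all weights
    sigma = abs(mut_rate * mean_w)                   # step 3: sd = mutRate * mean
    for i in chosen:
        out[i] += rng.gauss(0.0, sigma)
    return out
```

Exactly `n_mut` entries are perturbed while the rest of the network is left untouched.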
The key parameters in the simulation are explained as follows:
• indPerGroup: number of individuals per group (with access to temperature, day length, or both);
• evCyc: number of evolution cycles, fixed at 500;
• Ntrain: number of training years for the individual networks; 15 years was used in the final results;
• Ntest: number of testing years for evaluating the Kullback–Leibler divergence as the fitness measure; 3 years was used in the final results;
• Nmut: number of weights selected for mutation, drawn randomly from all weights of an individual network; 5 weights were used in the final results;
• benRate: the learning rate with which individual networks adapt to new input data after mutation; 0.05 was finally used;
• mutRate: the mutation rate setting the standard deviation of the Gaussian distribution used for mutating weights;
• rep: number of simulation repetitions; 50 was used.
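The overall loop can be sketched generically as fitness-proportional reproduction with mutation. The scalar "individuals" and function names below are toy stand-ins; in the paper the individuals are neural networks scored by Kullback–Leibler divergence:

```python
import random

def evolve(pop, fitness, mutate, n_cycles, rng=None):
    """Fitness-proportional reproduction with mutation (sketch of Fig. S11).
    `fitness` returns a score where LOWER is better (e.g. a KL divergence)."""
    rng = rng or random.Random(0)
    for _ in range(n_cycles):
        # Lower divergence -> larger reproduction weight.
        weights = [1.0 / (1e-9 + fitness(ind)) for ind in pop]
        pop = [mutate(rng.choices(pop, weights=weights)[0], rng)
               for _ in pop]
    return pop

# Toy usage: individuals are numbers, fitness is distance to 0, and
# mutation adds Gaussian noise (stand-in for mutating network weights).
final = evolve([5.0] * 20, abs, lambda w, r: w + r.gauss(0.0, 0.5), 50)
```

Selection steadily concentrates the population near low-divergence individuals, which is the mechanism the fitness comparison between input settings relies on.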
The simulation results for different mutation rates and values of indPerGroup are shown in Fig. S12.