Ensemble methods for stochastic networks with special reference to the biological clock of Neurospora crassa

A major challenge in systems biology is to infer the parameters of regulatory networks that operate in a noisy environment, such as in a single cell. In a stochastic regime it is hard to distinguish noise from the real signal and to infer the noise contribution to the dynamical behavior. When the genetic network displays oscillatory dynamics, it is even harder to infer the parameters that produce the oscillations. To address this issue we introduce a new estimation method built on a combination of stochastic simulations, mass action kinetics and ensemble network simulations in which we match the average periodogram and phase of the model to that of the data. The method is relatively fast (compared to Metropolis-Hastings Monte Carlo Methods), easy to parallelize, applicable to large oscillatory networks and large (~2000 cells) single cell expression data sets, and it quantifies the noise impact on the observed dynamics. Standard errors of estimated rate coefficients are typically two orders of magnitude smaller than the mean from single cell experiments with on the order of ~1000 cells. We also provide a method to assess the goodness of fit of the stochastic network using the Hilbert phase of single cells. An analysis of phase departures from the null model with no communication between cells is consistent with a hypothesis of Stochastic Resonance describing single cell oscillators. Stochastic Resonance provides a physical mechanism whereby intracellular noise plays a positive role in establishing oscillatory behavior, but may require model parameters, such as rate coefficients, that differ substantially from those extracted at the macroscopic level from measurements on populations of millions of communicating, synchronized cells.


Introduction
Gene regulation is an intrinsically stochastic process [1][2][3]. The low copy numbers of some molecules, such as genes, involved in gene regulation lead to a noisy time series of numbers of Ensemble methods solve this problem. Based on a Bayesian posterior distribution or likelihood function, they produce large samples of parameter values consistent with observed data that can then be model-averaged to capture the system behavior [27]. Consequently, they can produce confidence intervals of the model parameters [28]. More recently, these methods have evolved into Approximate Bayesian Computation (ABC) and have been used successfully in other biological contexts [27,29]. For large networks, ensemble-based parameter inference methods employ Markov Chain Monte Carlo (MCMC) simulations techniques to draw samples from the high-dimensional model parameter spaces [27,30,31]. These MCMC simulations are highly CPU time consumptive and one of the challenges is then to find efficient computational approaches to make these simulations feasible within reasonable computation time limits.
In modeling stochastic molecular time series data, it is important to notice that the individual random trajectories of molecule numbers cannot, in general, be compared directly to individual observed single-cell fluorescence time series. The stochastic variability of the individual trajectories in both model and experiment preclude a meaningful comparison. In the context of potentially oscillatory data, as expected in the biological clock system, it is also not useful to compare the average of model molecular time series to the average over the observed single fluorescent data from all cells. Both model and observed trajectories are typically randomly phase-shifted relative to each other, and averaging them therefore tends to cancels out the oscillatory part of the signal. Consequently, it is important to design meaningful summary statistics which preserve the information about the oscillatory signal, including oscillation periods, phases and amplitudes, when averaged over all cells and model trajectories, respectively [32][33][34].
One aim of this paper is to provide a fast, computationally feasible method for parameter inference in a stochastic oscillatory biochemical network and show its successful application to understanding one of the best studied biological clocks at the molecular level [35]. The method proposed uses different MCMC methods, such as Metropolis-Hastings and parallel tempering, to fit the average periodogram [36], also known as the power spectrum, of the model to the average periodogram of the data, where the average is taken over the periodograms of individual cell fluorescent trajectories. The periodogram is a summary statistic which preserves two of the relevant features of an oscillatory process, its amplitude and period. We use deterministic mass action kinetics to initialize the Markov chains with parameter values that produce small chi-squared values relatively fast on General Purpose Graphics Processor Units (GPGPUs) [37]. Thus, we can rapidly obtain model parameter sets that capture the important periods and amplitudes in the data. These models can be further used to match the observed phases to test the adequacy of the models.
A second aim of the paper is to explore whether the oscillations in such a network, might actually be caused by or reinforced by the molecular noise in the cell through a Stochastic Resonance-like phenomenon [38][39][40]. Stochastic Resonance is a theory that arose in physics [38] to explain the behavior of physical oscillators. Under the Stochastic Resonance hypothesis the stochastic intracellular noise is assumed to have a positive role in generating periodic behavior provided this noise is not too little or too large in magnitude. The key to the Stochastic Resonance hypothesis is that the presence of oscillations has a nonlinear relation with the level of stochastic intracellular noise.
It is therefore important in explaining oscillations in a stochastic network to infer the impact that noise has on cellular mechanisms and to quantify how these mechanisms respond to different noise levels. Our model and methods to fit a stochastic network (Fig 1) were developed with these purposes in mind.

Model
Our models simulate a well-stirred biochemical system with N molecular species {S 1 , S 2 , . . ., S N } having discrete-valued molecular numbers given by X = {X 1 , X 2 , . . ., X N }. These molecular numbers change in time through the firing of M reactions {R 1 , R 1 , . . ., R M }. The state of the system at a time t is given by the random vector X(t) = {X 1 (t), X 2 (t), . . ., X N (t)} with X i (t) being the number of molecules of species S i at time t, i = 1, . . ., N.
Knowing the state of the system at time t, X(t) = x, we assign to each reaction R j a propensity function a j (x) whose product with an infinitesimal time increment dt determines the probability that reaction R j fires in the next infinitesimal time interval [t, t + dt). These propensity functions a j (x), j = 1, . . ., M, are defined based on mass action kinetics, a j (x) = k j b j (x) where k j is a  [13,28]. While an exometabolite S external is being produced, there are no other cells in a droplet to sense this signal for the data described in Materials and Methods. The boxes outlined (in red) with dashed lines and labeled a through i in red, define regions of the network between which there is little or no net flow of molecules. These boxes are utilized to define scale factors for converting the concentrations of molecules in the deterministic network to molecular numbers in the stochastic network while preserving the network dynamics as described in the Materials and Methods. A few reactions are outside the boxes because in fitting the model WC-2 was assumed to be constant. The reaction of WCC to reaction A produces a small flow into a box, but to good approximation the role of WCC in the A reaction can be viewed as catalytic if the number of molecules of WCC is in the hundreds. (B) Cartoon of the stochastic network highlighting important features in panel A. Arrows are used to indicate a positive effect. A line with a bar (-|) is used to indicate a negative effect. kinetic constant specific to reaction R j and b j (x) counts the number of ways reaction R j can occur given state X. For instance, for mono-molecular, homo-bimolecular and hetero-bimolecular reactions b(x) takes the form x 1 , x 1 (x 1 − 1)/2 and x 1 x 2 , respectively. The sum of all propensities is denoted by a 0 (x).
The parameters we need to infer are initial molecular numbers and the kinetic constants, Θ = {X 1 (0), X 2 (0), . . ., X N (0), k 1 , . . ., k M }. Gillespie showed that from knowing these parameters we can build an exact sample trajectory of the network to find the molecular numbers at any later time. Thus, a parameter set Θ gives a model of the network's dynamics.
Our study is focused on a well-studied oscillatory network, the clock network of Neurospora crassa [28]. This network is presented in Fig 1. The species with superscript r denotes an mRNA; the ones in capital letters are proteins. The rest are genes. As shown in Yu et al. [28], we can consider the protein WC-2 to be constant, so we can ignore the species wc-2 1 , wc-2 r1 and the reactions in which they are involved. Also, wc-1 1 is constant. Thus, we reduce the network to 12 molecular species and 22 reactions, which makes our parameter space 34-dimensional. Earlier work [28] has shown this is a good approximation. This particular model is in one of two classes of negative feedback models for clocks in different organisms termed a Hilltype transcriptional repression model [41,42].
Some essential features of the model are captured in the cartoon ( Fig 1B). The genes white collar-1 (wc-1) and white collar-2 (wc-2) produce a heterodimer WCC = WC-1/WC-2, which activates the oscillator gene frequency (frq) and a clock-controlled gene (ccg). The FRQ oscillator protein in turn provides a negative feedback loop to deactivate WCC. The genes wc-1 and wc-2 encode the positive elements in the clock mechanism, and frq encodes a negative element [28]. The FRQ protein also appears to have a role in stabilizing the wc-1 mRNA (wc-1 r ) [28].

Single cell data of N. crassa
Two single cell data sets were used [13]. One data set has 868 single cells; the second one as a replicate has 1591 single cells. These two data sets were generated through time-dependent oscillatory fluorescent measurement on single N. crassa cells encapsulated in aqueous droplets of~100 um in diameter and physically separated from each other as a result. The measurements were through the use of a fluorescent recorder (mCherry) linked to a promoter on a clock controlled gene-2 (ccg-2) [43]. We obtained one data set including the time series of 868 single cells over ten days, and another data set including the time series of 1,591 single cells over ten days. The second data set is attached as a supplementary excel file. An improved celltracking method was used to bring the data set to 1,644 cells from 1,591 cells as originally described [13]. In the supplemental spread sheet each column is a different cell, and each row, a different time point. There are 563 time points per cell taken from time 0 to time 261.5 hours every half hour. Each single cell time series is Rhodamine B normalized, detrended, and biascorrected [13]. Only 61 st to 540th time points were used in the analysis to allow each oscillator the opportunity to reach a stable limit cycle and to maintain cell viability at the end of the series.

Rescaling from deterministic model units to stochastic molecular number units
profiling data was found. If we wish to use concentration results from these deterministic models as inputs in a stochastic framework, then we need to rescale the initial molecular concentrations in the deterministic model to molecular numbers in the stochastic model, that is non-negative integers counting molecules in Fig 1, while preserving the deterministic dynamics [44]. This conversion reduces to a change in measurement units for each species such that the total gene concentration of a species in the new molecular number units is 1 in a single cell, and the time-averaged mRNA and protein concentrations in molecular number units are equal to the observed RNA:DNA and protein:DNA ratio, respectively. The RNA:DNA and protein:DNA ratios were determined experimentally, as described below, and summarized in Table 1. The RNA:DNA and protein:DNA ratios used below were 128.7 and 412, respectively, averages from Table 1.
Following an established notation [28] are abbreviated here to u r0 , u r1 , u p , w, f 0 , f 1 , f r , f p , g 0 , g 1 , g r , g p , respectively, with constant total gene concentrations f G = f 0 + f 1 and g G = g 0 + g 1 . To convert the parameters of the network from deterministic model units to molecular number units of counts of molecules in a cell we do the following: 1. divide the network into separate components or "boxes" in such a way that two different boxes will be connected only through catalytic reactions (Fig 1). So, there will be no net flow of molecules between different boxes. Thus, the scales of measurement units between boxes can be varied independently without changing network dynamics. We obtain 9 boxes denoted by letters from a to i. They are, a: w, v p , u p , b: f 0 , f 1 , c: f r , d: f p , e: u r0 , u r1 , f: v r , g: g 0 , g 1 , h: g r and i: g p .
2. If we denote the model units by mu and real molecular number units by ru, then knowing the value of a concentration parameter expressed in mu, we need to convert it to a value that uses ru. We just need to find the ratio mu ru . Each box will have its own mu ru ratio. For an arbitrary box z, we denote by mu ru À Á z its conversion ratio. For the box containing a gene, like box b with species f 0 , f 1 , we have the values f 0 mu ; f 1 mu = (.356365, .0824576) from the deterministic model in Table 1 (column 2), so we know f G;mu ¼ f 0 þf 1 mu ¼ 0:4388226. We also know f 0 þf 1 ru ¼ 1, because there is just one frq gene in the cell. Then mu When applying this conversion factor, a gene is converted to the nearest whole gene so there isn't a fractional gene.
For boxes with an mRNA we take the average value of an mRNA concentration parameter over a simulated trajectory obtained using the deterministic model and compare it to the RNA:DNA ratio. If the simulated trajectory contains a transient signal, we discard the corresponding part of the trajectory. Since the deterministic models display sustained oscillations, we want our mRNA deterministic values to come from the purely oscillatory part of the solution (not the transient part).
Thus, for box c we find f r;mu ¼ f r mu ¼ 1 is the value of fr at time t in the deterministic simulation of the network between times t 0 and t 1 . In the time interval [t 0 , t 1 ) the deterministic trajectory traced 10 complete cycles. The ratios RNA:DNA and Protein:DNA were found experimentally from Table 1. Then, to convert a molecular concentration given by a deterministic model to a molecular number we just multiply the molecular number by the conversion ratio of the box to which the species belongs. To change from S mu to S ru , we need to multiply the former term by mu ru , but this ratio depends on the box in which the species S resides. The value S ru is then rounded to the closest integer. If, as above, the deterministic model contains a transient signal, we discarded it. We took s mu s t 0 ð Þ, that is concentration to be converted is the value of species S at the beginning of oscillatory part of deterministic trajectory of S.
3. To convert the reaction rates from model units to real units we use the law of mass action and the conversion ratios found at step 2.
We will show how the method works using L 3 , the translation reaction to FRQ.
We have f r where hr stands for hour, our unit of time, and mu d and mu c are the model unit of concentration for species in box d and c, respectively. L 3,mu and f r mu c are the values of L 3 and f r expressed in model units. They are found from the deterministic model. When L 3 is expressed in molecular number units, we have where L 3,ru is the value of L 3 expressed in molecular number units. Likewise, the other reaction rates can be converted from model units to molecular number units using the deterministic values and the conversion ratios found in step 2.
Note that when converting the reaction rates, we keep the ratios dS dt the same. Here S is the concentration of a species. We do not change the qualitative behavior of the system by conversion, just express it in different measurement units.

Method of determination of protein:DNA and RNA:DNA ratios specifying the scale of the stochastic model
The protein:DNA and RNA:DNA ratios in a cell were experimentally determined to set the scale parameters for the stochastic network. Protein, RNA, and DNA samples were extracted simultaneously from cultures of Neurospora crassa, strain FGSC 1858 "bd" (Fungal Genetics Stock Center, 4024 Throckman Plant Sciences Center, Kansas State University, Manhattan, KS 66506). The cultures were grown over 48 hours in the dark such that the total growth time was kept constant as previously described [45] under the "cycle 1" experiment. A kit from "Norgen Biotek Corporation,"(3430 Schmon Parkway, Throld, Ontario, Canada L2V 4Y6) was used to extract RNA, DNA and protein from the same sample. The kit used was Product # 47700, "RNA/DNA/Protein Purification Plus Kit." Their protocol was followed, including the step 1F, for cell lysate preparation for fungi. Samples were done from thirteen different time points spaced at 4 hour intervals over 48 hours. A total of three preps were done for each time point, with a useable sample detected in 2-3 of the preps. The DNA and Protein amounts from each of these preps, were determined on a "Qubit 2.0 Fluorometer" instrument (ThermoFisher Scientific, 168 Third Avenue, Waltham MA 02451). The RNA concentration was determined using an Agilent BioAnalyzer RNA 6000 Nano chip (Agilent, Palo Alto, California). The amounts were converted to nanomoles [46], averaged, and then ratios RNA:DNA and DNA: Protein were calculated (Table 1).

Stochastic simulation algorithm-direct method
For simulating exact trajectories of a network's temporal evolution, we used a variant of Gillespie's simulation algorithm called the direct method [13,20]. Gillespie showed that knowing the state of the network at a time t, we can infer the exact distribution of the time of next reaction, t + τ, and the probability of each reaction taking place at time t + τ. Thus, we obtain an exact distribution of the state of the network at time t + τ. The Direct method uses these distributions to sequentially sample the time of the next reaction and the reaction that occurs next. It works as follows.
3. Draw the random time step value to the next reaction, τ, as an exponential random variable with mean 1/a 0 (x) and the next reaction. Draw the type of the next reaction to be executed, j next , as a discrete random variable with probabilities  4. Update the state X assuming reaction R j next took place. Update the time, t = t + τ.
5. If t < T go to step 2, else stop.
The Direct method yields a trajectory of the network state {x(t 0 ), The trajectory can be thought of as belonging to a single cell. We refer to this trajectory as a Gillespie trajectory (of a single cell). Here 0 = t 0 < t 1 < Á Á Á < t k < Twith t i ,i = 1,. . .,k, being the reaction times of the reactions that fire before T. Such a trajectory completely identifies the network state at any time in the interval [0, T].

The fitting method
To analyze the initial behavior of the clock network we collected data on a CCG protein from 868 single cells. The fluorescence level of the CCG protein in each cell was recorded every half hour for 10 days [13].
As a first pass, we constructed a normalized periodogram for each cell, and then we calculated the average (over 868 cells) of these 868 periodograms. Through normalization we made the sum of normalized periodogram values equal to 1. Normalization enabled us to use periodogram values that are invariant to scaling [13]. We did not use this periodogram normalization in the more sophisticated analysis of the 1591 single cell data set where we applied a bias-correction to the average observed periodogram (See section below on removing the detection noise).
At the level of millions of cells the level of CCG is circadian, i.e. cyclical with a period of~24 hours. To obtain stationary time series we used the moving average method to remove the 24-hour linear trend from the original time series. The periodograms were calculated on detrended data [13].
We assume the average periodogram describes the dynamical behavior of CCG protein because it captures the periods and amplitudes in the system.
We selected this summary statistic in fitting the stochastic network because we looked for models with periodic behavior at the single cell level. This choice was first proposed to describe stochastic oscillatory networks near their Hopf bifurcation (i.e., a point in the parameter space where oscillations first appear), and the use of the periodogram as the statistic driving the fitting was successfully used in this context [36]. The periodogram captures two important features of an oscillation, amplitude and period. As we expect the oscillatory trajectory to be a mixture of sinusoids with only few of them being relevant, we think the important features of an oscillatory trajectory are embedded in its periodogram. Also, unlike other methods that try to match individual trajectories produced by a stochastic model [23] [47], we try to fit the average of these periodograms. Our view is that to compare two stochastic models it is better to use summary statistics that relates to an average of the stochastic trajectories rather than comparing individual trajectories for four reasons. One, the individual stochastic trajectories are very noisy. Two, averaging the periodogram of individual trajectories reduces this noise. Three, this fitting approach has already proven successful [36]. Four, fitting using 1000s of individual trajectories by the method of maximum likelihood has not proved computationally tractable.

Markov Chain Monte Carlo (MCMC) methods
We used MCMC methods [13,27,28] to find sets of parameters in the stochastic network ( Fig  1) that best describe the observed average periodogram of a collection of cells. In each Monte Carlo update we used Gillespie's direct method to simulate 1024 Gillespie trajectories of the system state using a given set of parameters. Here the parameters are the 12 initial molecular numbers and 22 reaction rates as described earlier in Materials and Methods. We calculated the average periodogram of simulated trajectories and determined how well it matched the cell average periodogram. Then we updated a randomly chosen parameter using the Metropolis-Hastings algorithm. The 1024 simulated trajectories were run in parallel on a GPU.
The ensemble Q used to fit the average periodogram is where Q cell f and Q sim f are average periodogram values of cell and simulated trajectories, respectively, calculated at frequency f. The parameter s 2 f is the variance of cell periodogram values at frequency f and is determined experimentally by bootstrapping the periodograms of single cells. The form of the likelihood entails invoking the Central Limit Theorem. The associated We initialized our Markov chains with working oscillatory networks describing the clock at the macroscopic level of 10 7 cells determined previously by the ensemble method [28]. For each chain, we took a set of parameters from the deterministic ensemble family given in Yu et al. [28] and converted it to a set of stochastic parameters as described above.
To make sure we are covering a broad region of the parameter space, we started 4 Monte Carlo chains, each one with a different set of parameters. After running these Monte Carlo chains for about 74,000 iterations we ended up with chi-squared values of different levels, see Fig 2A.
We concluded that each of these chains might have been trapped to a different local minimum. To avoid being stuck at a local minimum we introduced an alternate MCMC method, the parallel tempering algorithm [48,49]. Each set of parameters used to start a Ensemble methods for stochastic networks with special reference to the biological clock of Neurospora crassa Metropolis-Hastings chain was now used to start a parallel tempering algorithm. (see Materials and methods).
Our Metropolis-Hastings algorithms were designed as random walk algorithms. The target density was Q. At iteration k we randomly picked a parameter x k i of the parameter set x k ¼ ðx k 1 ; x k 2 ; . . . ; x k 34 Þ and updated it using a uniform proposal kernel Uðx k i À a i ; x k i þ a i Þ, i.e. the proposal density was q x kþ1 The proposed parameter set was y kþ1 ¼ ðx k 1 ; x k 2 ; . . . x kþ1 i ; . . . ; x k 34 Þ. Then, we set x kþ1 ¼ y kþ1 with probability rðx k ; y kþ1 Þ; It is well known that for random walk Metropolis-Hastings algorithms the step-widths α i must be fine-tuned to ensure the chain is converging in a manageable time. While we tried to optimize the choice of the α i 0 s, we noticed that it might take too long for some chains to converge (Fig 2A). Parallel tempering algorithm avoids the calibration of these 34 hyperparameters.

Parallel tempering as an ensemble method
The idea of a parallel tempering algorithm is to simulate K replicas of the original system, each replica being simulated at a different temperature. So, each replica is a Markov chain having a tempered target distribution of the form For high "temperature" T, the peaks of Q T become flatter and broader, making the distribution easier to sample via MCMC methods. High-temperature replicas can sample large volumes of parameter space, whereas low-temperature chains are usually sampling from a local region of the parameter space which may trap them to a local minimum. Parallel tempering achieves superior results by allowing different replicas to exchange their states. Thus, hightemperature replicas ensure that lower temperature chains can access different regions of the parameter space.
The way that a parallel tempering run is set up is as follows. To a set of K replicas we assign temperatures from a grid T 1 < T 2 < Á Á Á < T k , with T 1 = 1 corresponding to our target replica. Each replica explores its tempered distribution using an MCMC method. After a predetermined number of in-chain iterations, swaps between usually adjacent replicas are proposed. A proposed swap between replicas at temperatures T i and T j is accepted with probability where x (i) is the state of i th replica. When a swap is accepted, the replicas exchange their positions in the parameter space; replica i takes configuration x (j) and j assumes the position at x (i) .
Since hottest replicas can sample big regions of the parameter space, then, if their locations propagate to the coldest replica, they can help it explore different regions of parameter space. Thus, the goal in choosing an effective grid of temperatures is to make sure the hottest replicas can freely explore the parameter space, i.e choose T k big enough, and to choose the intermediate temperatures in such a way that ρ i,j 0 s are big enough to allow each replica to easily move between configurations sampled at different temperatures.

Choosing the grid (K) and temperatures in parallel tempering
Now we show how we chose K, the number of replicas, T K , the maximum temperature and the temperature grid Our method is based on the procedure described previously [50]. First, we chose the number of replicas K ¼ ffiffiffi d p , where d is the number of components of θ. Then we chose maximum temperature T K .
We took T k ¼ w 2 ðy 0 Þ 30 , i.e. we divided the chi-square value of our initial parameter set θ (0) by 30. The hottest replica will start with a chi-square value of 30. We wanted the hottest replica to have a high in-chain acceptance rate while having a not too flat distribution. The number of data points used in calculating chi-square values was 85.
Then we ran Metropolis-Hastings for a replica with this temperature for 200 iterations. If acceptance rate was outside the range (0.6, 0.75), then we changed T K and ran M-H again for 200 iterations. We did this until the acceptance rate fell within the range (0.6, 0.75).
We made a linear grid with K temperatures and set a target swap rate of 0.4 for any two neighboring replicas. Then we run the parallel tempering in the following way: 1. every replica does an update of its parameter set θ.

repeat steps 1), 2) and 3) 200 times
For every pair of neighboring replicas (i, i+1) we calculated where N ði;iþ1Þ swap is the number of attempted swaps between replicas i and i+1 and r l i;iþ1 is acceptance probability of the l th attempt at swapping i and i+1.
ln ð0:4Þ q h i > 0, then we add to the grid R i,i+1 temperatures, evenly spaced between T i and T i+1. We run this add-temperature process 3 times to make sure we have enough temperatures in the grid.
Then we shifted the temperatures between T 1 and T k as follows.
1. Run parallel tempering with the new temperature set doing the above steps 1), 2) and 3) 350 times.

For each temperature T i calculate the flow fraction
where n up (T i ) and n down (T i ) is the total number of replicas that were drifting upward, respectively downward, when they visited T i .
3. Linearly interpolate f between temperatures 4. Calculate g the inverse function of f

Change the temperature values from T i to T new
This process of shifting the intermediate temperatures was repeated 3 times. The shifting of temperatures was done to optimize the flow of replicas through the temperature grid.
After that, we ran the parallel tempering algorithm for about 60,000 Monte Carlo updates, where by update we mean the steps 1), 2) and 3) described in the add-temperature process. Some of the control parameters and summary statistics for the MCMC runs are also summarized in S1 Table. Removing the detection noise from the average periodogram A model to calculate the contribution of detector noise to the periodogram variance was derived under mild assumptions, and the detector noise, propagated to the periodogram in the supplement [13]. The assumptions in this calculation were that the total noise could be decomposed additively into stochastic intracellular noise and detector noise, that the stochastic intracellular noise component is independent of the detector noise component, and that the detector noise in the Rhodamine B level measurements used in normalization of fluorescence measurements can be neglected [13]. The detector noise was independently quantified by replacing cells with fluorescent beads in the microfluidics experiments. In this way a universal system of measurement of gene expression at the single cell level was developed and used here [13].
Suppose we observe the fluorescence values in an experiment with K cells and L equidistant observation times t j ¼ j À 1 ð Þ T L . Here j = 1, . . ., L and T is the duration of the experiment. Denote the frequencies of interest by f l ¼ l T , l = 0,. . ., [L/2]. Also assume the cells are treated with rhodamine B to reduce experimental noise.
Then, the detector noise contribution to the periodogram variance at frequency f l is given by [13]: where hQ(f l )i and hR(f l )i are the population means of, respectively, average periodogram and average squared Fourier transform of the observed rhodamine B-normalized, detrended fluorescence time series. The quantity s 2 is the variance of the fluorescence signal due to the detector noise averaged over all cells and time points. This variance was determined experimentally by varying the incident light intensity and measuring the resultant variance in fluorescence of fluorescent beads replacing cells in a microfluidics experiment identical to that used for cells [13]. The quantities γ Q (l) and β Q (l) are functions of the weights used in the moving-average detrending process [51], a standard for the literature. They do not depend of the observed fluorescence signals.
To compare the simulated average periodogram values with observed average periodogram values we need to remove the bias due to detection error.
The bias formula is given by When we try to fit the cell average periodogram, we compare Q(f l ) − Q bias (f l ) to Q model (f l ), with Q(f l ) being the average periodogram over K cells calculated at frequency f l and Q model (f l ) being the average periodogram of the simulated time series calculated at f l . Unlike the analysis of the 868 single cell data set, the analysis of 1591 single cell data set with the bias correction does not normalize the periodograms. The whole fitting process is done on an absolute scale without periodogram normalization. The ensemble to fit the average periodogram becomes with ðs c f l Þ 2 ¼ s 2 f l À ðs e f l Þ 2 . The Central Limit Theorem is being invoked to obtain the approximate distribution of the average periodogram over > 1000 single cells needed to write down the ensemble in (2). See [13] for details.

Parallel tempering as opposed to Metropolis Hastings (M-H) Monte Carlo is sufficient for fitting stochastic models with many parameters
As seen in Fig 2B the parallel tempering algorithms greatly improved the mixing of the chains and helped them escape local minima. These algorithms also converged much faster when compared to M-H algorithms. Their only downside is that they are slightly more computationally intensive and in general take longer to run when compared to the simple Metropolis-Hastings algorithms. However, parallelization can greatly reduce the additional time taken to run a parallel tempering algorithm when compared to Metropolis-Hastings algorithm. Better mixing and faster convergence more than compensate for the longer run time of each iteration. We see that, when using Metropolis-Hastings algorithm, computation time, convergence and mixing varies greatly with the initial parameters in Fig 2A. Even though the computation time when using parallel tempering algorithm was on average longer compared to the time used by the Metropolis-Hastings algorithm, we see that the parallel-tempering chains quickly set to what is likely the region of the parameter space with lowest chi-squared value. Beginning with iteration 20,000 the chi-square value varied between 82.87 and 125.5 for all chains. The mixing of these chains was excellent, with the swap acceptance rate between replicas at neighboring temperatures being larger than 0.5 for all paralleltempering chains. The maximum number of replicas used in a parallel-tempering algorithm was 7. The maximum temperature was 10.
The best model given by the parallel-tempering algorithm gave a good fit to the average periodogram of the cells as can be seen in Fig 3. The model captures the main frequencies in the data. It does not fit the data very well at high frequencies, but at high frequencies data are very noisy.

The fit of the stochastic model can be improved substantially by increasing the number of cells and subtracting the detection noise from the periodogram
To reduce the noise in the periodogram in Fig 3 we increased the number of isolated cells to 1591 in a replicate experiment and removed the detection noise in the periodogram. See Table 2. Without normalization of the periodogram the final χ 2 was 671.332 as opposed to 12,024.9, when the bias correction was not made. With 240 data points and 34 parameters, the chi-squared contribution per data point was 2.80, which is comparable to earlier work on a macroscopic scale [45].

There are strong similarities in the rate constants between the stochastic clock network and deterministic clock network
The ensemble averages of parameters and their standard errors across the ensemble are reported in Table 2 and in bar charts in supplementary S1 Fig. Some of the parameters are key to sustained oscillations in the deterministic model [28]. Some of these include the activation (A) and deactivation rate(Abar), the decay rate of the stabilized wc-1 mRNA (D7) [28], the decay rate (D6) of the FRQ protein [52,53].
The initial parameter values were computed by MCMC Metropolis Hastings Method [28] for a deterministic model, in which the protein WC-2 was treated as constant to good approximation. The best parameter values in this ensemble were then converted to the molecular number units of the stochastic network in Table 2 (column 3). For example, in the stochastic network the initial numbers of molecules in each cell in Table 1 are given as opposed to concentrations used in the deterministic model. This conversion is described in Materials and Methods. Generally there is good agreement between the estimated rate constants estimated from the trajectories of 1,591 cells (column 6) and the initial guess from the deterministic model (column 3), but no such agreement exists for the smaller experiment with only 868 single cell trajectories. All discussion below is for the larger single cell experiment with 1,591 cells. In this discussion below wc-1 and wc-2 and their products are positive elements in the clock, while frq and its products are negative elements providing negative feedback to wc-1 and wc-2 and their products [35] in  Ensemble methods for stochastic networks with special reference to the biological clock of Neurospora crassa Table 2

. Ensemble means and standard errors indicate that the parameters in stochastic network for single cells are tightly specified by the new fitting method.
The parameters including the initial numbers of molecules and the rate constants in Fig 1 are labeled in the first column. In the second column are the initial parameter values used from a deterministic model ensemble [28], in which WC-2 constant over time was used to initialize both the Metropolis Hastings MCMC runs (Fig 2A) and the parallel tempering MCMC runs (Fig 2B). In the third column the parameters from the deterministic model ensemble are converted into units appropriate for the stochastic network as described in Materials and Methods. The last four columns are the ensemble means and standard errors (across the ensemble) generated by parallel tempering (see Materials and methods) for a single cell experiment with 868 single cells or 1591 single cells. Ensemble methods for stochastic networks with special reference to the biological clock of Neurospora crassa

Parameter
The decay rate of FRQ, D6, is thought to determine the period of the clock oscillator [52] and be involved in the phenomenon of temperature compensation in the clock. As the FRQ decay rate D6 decreases, the period is expected to increase. This coupling of period and FRQ may be more complicated [53]. The stochastic network's decay rate (.194 +/-.002) is in quite good agreement with the macroscopic deterministic model (0.152).
The decay rate of the stabilized wc-1 mRNA, D7, in Fig 1 is thought to be critical determinant of clock oscillations [28]. The theory predicted (and experiment confirmed in previous work [28] that there should be small decay rate or a long half-life at the macroscopic level. The decay rate D7 in the stochastic network (2.131 +/-0.090) appears somewhat higher than measured in the deterministic model (0.138). One possible explanation is that the constraint on decay rates for isolated cells that experience stochastic intracellular noise in phase may be relaxed relative to that in a deterministic model at the macroscopic level. If the oscillations are actually supported by the noise, by way of some Stochastic Resonance mechanism [38], then this may impose less severe constraints on the decay rate D7.
Another critical parameter for oscillations to occur in the deterministic model is the activation (A) and deactivation rates (Abar) of the oscillator frq gene by WCC in Fig 1. These activation and deactivation rates in the stochastic model (2.56E-10 +/-7.31E-12 and 1.590 +/-0.036) tend to be qualitatively similar (6.06E-13 and .0.547), both being quite small.
Another critical parameter for oscillations in the deterministic model is the rate of deactivation of WCC (the P reaction) by FRQ [28]. The deactivation rate (P) in the stochastic model (2.7E-9 +/-4.8E-11) is similar to that estimated at the macroscopic scale (3.12E-11), both being small. To assist in comparing the fitted macroscopic model with the fitted single cell stochastic model, bar charts of the parameters in Table 2 can be found in supplementary S1 Fig. The new MCMC method for specifying the parameters from periodogram tends to produce standard errors across the ensemble that are one to two orders of magnitude smaller than the ensemble means. The parameter values are quite tightly specified by the new estimation method and the use of at least 1,500 cells.
It is interesting to see what this model actually looks like. Various views of a single Gillespie Trajectory are shown for one of the best fitting models (Fig 4). In panel A is shown the molecular counts of the positive element WCC. The counts are correlated with the activation of the FRQ gene in panel B, but the correlation is not perfect. Switching on the frq gene is a stochastic event in this model. The result of switching on the frq gene is transcriptional bursts in its mRNA in panel C. There are at least 9 such bursts in Panel C over a 240 h interval. The width of these bursts as shown is wider than the time that a frq gene is active in a cell. In turn there are resulting even broader peaks in the FRQ protein production in panel D. The noisiest trajectory is what we actually see, CCG-2 in panel E. These views of a Gillespie Trajectory give us a picture of the noise in a single cell under one fitted model in the model ensemble.

The stochastic intracellular noise level (i.e., the size of the cell) can be experimentally determined as a parameter in the model
The mRNA to DNA ratios and protein to DNA ratios in conidia have been previously determined to be~1:18:50 [54]. Our ratios from Table 1 tend to be higher as 1:129:412, although the protein/RNA ratio is similar to previous reports. Here we report a higher amplification in the DNA -> RNA step of the Central Dogma. These ratios were then used to set the noise in the stochastic network (Fig 5). The network was subdivided into independent blocks that were only linked by catalytic reactions. Then the ratios in Table 1 were used to convert concentrations in the deterministic model into molecular numbers within the cell (as described in Materials and Methods for each independent block) as illustrated in Table 2.
The ratios of RNA to DNA and protein to DNA set the level of noise in the Gillespie trajec-slightly better fit could be obtained by allowing the protein/DNA ratio to be slightly higher. The values of the chi-squared statistics across an ensemble would suggest that the fit of the model ensemble if fairly robust to variation in these ratios.
The robustness of the ensemble across different RNA/DNA and protein/DNA ratios can be understood by the fact that our stochastic models can be well approximated by a chemical Langevin equation, which in turn can be approximated by the ODE equations of the deterministic model. The chemical Langevin equation(CLE) [10,55] is a stochastic equation that describes the rate of change of the state vector of molecular numbers, X, as follows: The molecular numbers comprised in X are treated as continuous random variables. The first term on the right-hand side is just the rate function of the corresponding deterministic model, and the second term represents the noise due to the stochasticity of the reaction events.
The v j is the vector of changes in molecular numbers produced by the firing of reaction R j and Γ j 0 s are statistically independent Gaussian white-noise processes. The crucial point here is that the strength of the noise term in the CLE increases relative to the deterministic rate term in the CLE, as the molecule numbers for RNA and protein decrease.
The change of RNA/DNA and protein/DNA ratios in Fig 5 was effected by rescaling the RNA and protein numbers in such a way that the corresponding deterministic model remained unchanged and only the noise term in the CLE was affected. So, if the deterministic term dominates the stochastic term, the noise level will not matter, hence the robustness. independent of the periodogram and hence independent of the first two quantities [13]; therefore, the phase can be used as a test of the adequacy of the model. The phase is not used in the fitting (Fig 3). The Hilbert phase can be calculated for each single cell trajectory and each Gillespie trajectory and measures the amount of cycles completed in a fixed period of time (and hence is a function of time). So, for example, 4 tires on a car would complete the same number of cycles in a fixed period of time and be in phase. This phase measure does vary with time and is a well known measure of phase [56].
The histogram of Hilbert phases for single cells and Gillespie trajectories from the model ensemble are compared as a measure of goodness of fit (Fig 7). There are two sources of variation in the Gillespie trajectories, the random variation in phase between Gillespie trajectories of one model and the variation in phase between models in the ensemble of fitted models. The histogram of Hilbert phases in Fig 7 reflects both sources of variation. For each model, 1024 Gillespie trajectories were simulated, and each Gillespie trajectory has a Hilbert Phase. In addition the process was then repeated for over 1000 models in the fitted ensemble (Fig 3) to generate all the values for the model histogram (Fig 7). We see that the histogram for the Gillespie trajectories for over a thousand models in the fitted ensemble, covers the histogram measured on single cells over an 85 hour window. The difference between the two histograms is significant by a Kolmogorov-Smirnov (KS) nonparametric test. We carried out a KS-test of the difference of the two histograms (P < 0.0001) with the maximum difference in cumulative histograms being 0.1747 [57].
There are three possible reasons for the discrepancy between phase predicted and phase observed. The systems is experiencing Stochastic Resonance [38][39][40]. A second possible reason for the discrepancy is cell-to-cell synchronization by quorum sensing. It is unlikely that a quorum sensing mechanism is at work because the cells are physically isolated in droplets in Fig 7[13]. A third reason could be that the Hilbert phase results are dominated by noise fluctuations.

Fig 7. Goodness of fit for the model ensemble is tested with the Hilbert phase for 868 single cells (blue) and
Gillespie trajectories (red) under the model with smallest chi-squared statistic in the fitted ensemble (Fig 3). The computation of the Hilbert phase for each trajectory is described previously over a 30 to 115 hour window. [13]. The model histogram is that of the Hilbert phases for 1024 Gillespie trajectories on each of > 1000 models in the best fitting model ensemble (Fig 3). https://doi.org/10.1371/journal.pone.0196435.g007 Ensemble methods for stochastic networks with special reference to the biological clock of Neurospora crassa

Is there an intermediate optimum in the oscillatory signal as a function of the stochastic intracellular noise?
One possible explanation for the results on goodness of fit may be synchronization through the Stochastic Resonance mechanism acting on isolated single cells. Under this hypothesis there is a non-monotonic relation between peak height (i.e., signal strength) in the periodogram (Fig 3) and the estimated stochastic intracellular noise [38]. Here we examine how the periodogram, which captures the oscillatory signal, varies, as the noise is varied (Fig 8).
One of the clearest examples of the effects of stochastic resonance is in a simple two-dimensional system, in which the polar coordinates (r, θ) evolve according to the following dynamical system [39]: where the -terms are the noise terms. There are two fixed points to this system, and a limit cycle in the deterministic system without the -terms exists for b > 1 [39]. What is interesting that in the presence of sufficient noise and b < 1, there is directional flow between the fixed points and hence oscillations. If there is too little noise, the dynamical system cannot escape from the stable fixed points and does not oscillate; likewise, too much noise will also wipe out the oscillations. This model illustrates the hypothesis of stochastic resonance. Consistent with the Stochastic Resonance hypothesis, too little noise may not allow our dynamical system in Fig 1 to escape stable fixed points; likewise, too much noise may not allow There is a non-monotonic relation between the oscillatory signal strength in the normalized periodogram for the CCG protein species and the stochastic intracellular noise. The red curve is for the best fitting model ( Fig  3B), using the observed RNA/DNA and protein/DNA ratios of 128.7 and 412, respectively. The blue, green and yellow curves have a bigger stochastic intracellular noise than the best fitting model, by shrinking the protein/DNA and RNA/ DNA ratios 7-, 100-and 180-fold, respectively, relative to the observed values. The black curve has a smaller stochastic intracellular noise by increasing the protein/DNA and RNA/DNA ratios by a factor of 8. The underlying model was determined without bias correction using Eq (1) for 868 single cells. https://doi.org/10.1371/journal.pone.0196435.g008 Ensemble methods for stochastic networks with special reference to the biological clock of Neurospora crassa the dynamical system in Fig 1 to settle into a flow between stable fixed points. Yet, if there is the right level of noise, an oscillatory signal may emerge in the periodogram. Here we test this hypothesis in the N. crassa clock using the single cell data.
What we see in Fig 8 is exactly what we would predict under the Stochastic Resonance hypothesis. As the noise is increased above "normal", the peak in the periodogram is diminished, and the oscillatory signal is diminished. If the noise is decreased sufficiently from "normal", the peak in the periodogram is also diminished, and the oscillatory signal is diminished, consistent with the trapping of the real dynamical system in stable fixed points. Only at an intermediate level of noise do we see oscillations in single cells in Fig 8 in As a final note, in Fig 8 the protein/DNA and RNA/DNA ratios were varied to cause a change in the stochastic intracellular noise (Fig 5), while the reaction propensities were left constant. Some might argue from Eq (3) that the lead deterministic term in the CLE should be kept constant while varying the stochastic intracellular noise. In this way the limiting deterministic dynamics would be kept constant while the noise is varied. In comparing models with the same deterministic dynamics, it was necessary to vary the propensities so that the lead term in the CLE did not change using the rescaling method in the Materials and Methods. The result of this experiment was the same outcome as in Fig 8 with the only change that a much higher RNA/DNA and protein/DNA ratio (~3000) was needed to see the non-monotonic response to stochastic intracellular noise in Fig 8.

Discussion
In order to describe the stochastic behavior of single cell oscillators, a variety of methodological challenges needed to be surmounted. First and foremost, a scalable fitting method that would work with thousands of trajectories on single cells was needed. Existing methods do not operate on this scale. To overcome this challenge a fast scalable ensemble method for stochastic networks was developed using the periodogram or power spectrum of an average model trajectory to be compared with the average periodogram over single cells. The method is scalable to thousands or tens of thousands of cells on GPUs.
One of the limitations of this approach is that normal Metropolis Hastings ensemble methods [28] are not adequate for stochastic networks. Parallel tempering methods are shown to work well on the stochastic networks examined here (Fig 2). As more complicated alternatives to the model in Fig 1 are considered, surmounting the rate limiting step of generating a Gillespie trajectory may be achieved by other means than through GPUs alone. A promising avenue is generating approximations to the Gillespie trajectory with Quasi Steady-State Approximations (QSSA) to the full stochastic model considered here [58]. The right steady state approximation can help the identifiability of fitting methods for stochastic networks even in simple networks [59].
A second challenge in the fitting process is that there are two sources of error in single cell trajectories, the stochastic intracellular noise and the experimental error [13]. The latter introduces biases into the fitting process. We developed a statistical methodology to remove the experimental error from the fitting process to the periodogram of the model ensemble and thereby achieved a better specification of the model. This new procedure for removing the bias of the experimental error from the periodogram is presented and utilized.
Another challenge of stochastic network identification is characterizing the size of the cell, which sets the level of stochastic intracellular noise, a parameter missing in deterministic models [44]. We developed an empirical approach to identifying the cell size for a stochastic network from the protein/DNA and RNA/DNA ratios for the system under study. We showed that the fitted model was quite robust to variation in these ratios (Fig 6). A final challenge is developing protocols to make single cell measurements [60].
In the study of stochastic periodic phenomena we need both ways to fit such models with large amounts of single cell data as well as ways to test the success of these models. Three statistics provide useful summaries of periodic stochastic networks: period, amplitude, and phase of single cell trajectories. Fortuitously the periodogram is functionally independent of the phase of single cell trajectories [13]. A goodness of fit statistic was then developed using the phase, which included variation in phase across trajectories and the model ensemble.
We then applied these new methodologies to understand the oscillatory behavior of single cells in N. crassa. We found that parallel tempering was quite successful in identifying a stochastic network to describe the oscillator in single cells. In many cases the rate constants of macroscopic deterministic models based on millions of cells were similar to those in microscopic stochastic models of single cells (Table 2). Yet, there were also some key differences. For example, in macroscopic models the half-life of the wc-1 mRNA was measured to be quite long and found to be a critical feature in maintaining oscillations [28]. Yet at the single cell level the half-life was estimated to be much shorter. One possible explanation may be that single cells have other mechanisms to produce oscillations than those that operate at the macroscopic scale. For example, single cells experience stochastic intracellular noise that can move cells from one stationary state to another [39], and this behavior may generate oscillations. This phenomenon depends on not having too much or too little stochastic intracellular noise. We show this phenomenon of Stochastic Resonance [38,61] may play a role in seeing clock-like behavior in single cells (Fig 8). The kinetics of single cells may operate under a set of more relaxed rules for oscillation than those that apply to millions or tens of millions of cells.
Other hypotheses for explaining single cell oscillations need testing [61], such as quorum sensing [62] or cell-cycle gated circadian rhythms [63]. Cell-cycle gating was controlled by choosing a medium inhibiting cell division [13]. The quorum sensing hypothesis would require communication between cells. Yet, the cells examined here were isolated in droplets. In future work cells will be allowed to share the same droplet, and the associations between cells within droplets provide additional statistics to supplement the periodogram to distinguish between quorum sensing and stochastic resonance. Here we have created both an experimental and methodological arrangement to study the physical theory of Stochastic Resonance in living cells. For the first time, we have a window on a new set of dynamics at the single cell level that play by a different set of rules potentially than ensembles of millions of cells.