Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases AURKB, NEK1, TTK, and WEE1 causes simulated HeLa cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.


Introduction
Originally proposed by Conrad Waddington in the 1950s [1] and Stuart Kauffman in the 1970s [2], analysis of biological processes such as cellular differentiation and cancer development using attractor models -dynamical systems whose configurations tend to 1/18 evolve toward particular sets of states -has gained significant traction over the past decade [3][4][5][6][7][8][9][10][11][12].One such attractor model, the Hopfield model [13], is a type of recurrent artificial neural network based on spin glasses.It was designed with the ability to recall a host of memorized patterns from noisy or partial input information by mapping data directly to attractor states.A great deal of analytical and numerical work has been devoted to understanding the statistical properties of the Hopfield model, including its storage capacity [14], correlated patterns [15], spurious attractors [16], asymmetric connections [17], embedded cycles [18], and complex transition landscapes [19].Due to its prescriptive, data-driven design, the Hopfield model has been applied in a variety of fields including image recognition [20,21] and the clustering of gene expression data [22].It has also been used to directly model the dynamics of cellular differentiation and stem cell reprogramming [23], as well as targeted inhibition of genes in cancer gene regulatory networks [24].
Techniques for measuring large scale omics data, particularly transcriptomic data from microarrays and RNA sequencing (RNA-seq), have become standard, indispensable tools for measuring the states of complex biological systems [25][26][27].
However, analysis of the sheer variety and vast quantities of data these techniques produce requires the development of new mathematical tools.Inference and topological analysis of gene regulatory networks has garnered much attention as a method for distilling meaningful information from large datasets [28][29][30][31][32][33][34].But because life is a non-equilibrium phenomenon that can only be truly understood at the dynamical level, there is a growing need to develop new methods for analyzing time series data.As experimental methods continue to improve, more and more high-resolution time series omics and even multi-omics [35] data sets will inevitably become available.Here, we demonstrate that time series omics data (in this case, transcriptomic data) representing cyclic biological processes can be encoded in Hopfield systems, providing a new model for analyzing the dynamics of, and exploring effects of perturbations to, such systems.
The dynamics of cell cycle (CC) -the process in which a parent cell replicates its DNA and divides into two daughter cells -is both scientifically interesting and therapeutically important.Even relatively simple simulated systems such as an isolated, positively self-regulating gene subject to noise can exhibit rich dynamical behavior [36]; but like many biological processes, the proper functioning of CC requires the decentralized, coordinated action of hundreds of genes.CC thus provides researchers with a convenient case study of self-organization in a noisy environment.CC is also an upregulated process in many forms of cancer [37][38][39][40], and control of CC using pharmaceutical compounds such as kinase inhibitors is a critical goal in cancer research.The combinatorics of selectively inhibiting sets of genes makes exhaustive experimental searches difficult or impractical [41].However, network-based mathematical models such as the one presented here enable researchers to examine the effects of perturbations to complex systems [42,43] by testing potential inhibition targets in silico.The efficacy of these predictions can then be experimentally validated or invalidated, providing new information and insights to further refine models.
The remainder of this article is structured as follows.In the Models section we first discuss how periodic genes were identified in the time series gene expression data sets, and how Boolean attractors were extracted from the continuous data (explained in greater detail in the Methods section).We then introduce the Hopfield model and discuss the specific form of the coupling matrix used in this application.We discuss how to interpret the results of Hopfield simulations in the context of gene expression and cells.We also explain the objective function used by the genetic algorithm to identify potential inhibition targets, designed with the intention of disrupting CC.In the Results and Discussion section, we show that this model qualitatively recreates experimental gene expression data, and we demonstrate and analyze some dynamical properties of the cerevisiae) [44] and Dominguez et al. (HeLa, human cervical cancer) [45].For consistency and due to its higher resolution, the S. cerevisiae data set was chosen to produce all images and movies in this article, but both data sets were analyzed.In order to encode these CC data sets into the Hopfield model, periodic genes needed to be identified, their frequencies and phases computed, and their expression converted from continuous to Boolean form.As detailed in the Methods section, decaying sinusoids were fitted to the trajectory of each gene i, and genes with sufficiently high quality fits were kept.This resulted in 379 periodic genes in S. cerevisiae and 519 periodic genes in HeLa cells.Figure 1A shows a heat map of the expression of all periodic genes detected in the Eser data set sorted by their fitted phases, and Figure 1B shows the same genes with the fitted expression curves.These fitted curves were converted from continuous values x i (t) ≥ 0 to Boolean values ξ i (t) = ±1 (over/underexpressed) as shown in Figure 1C.Finally, one CC period was divided into eight uniformly spaced states These states, shown in Figure 1D, were used as the eight attractor patterns in the Hopfield model.

3/18
The Hopfield model The Hopfield model [13] is an Ising model whose configuration is defined by N spins σ i (t) at integer time t.The state of each node (gene) takes one of two values, σ i (t) = ±1 (over/underexpressed).The coupling matrix J ij defines the strength and sign of the signal sent from node j to node i, and its construction is discussed in the following subsection.The total field at node i at time t is given by where j J ij σ j (t) is the internal field at node i due to its coupling with all nodes j and h ext i is an optional external field applied to node i representing the action of therapeutic compounds, e.g.kinase inhibitors.The dynamical update rule is given by where the factor of 2 in the exponent is conventional and T is an effective temperature representing the level of noise (not a physical temperature).Biologically, this noise represents the effects of all kinds of biochemical fluctuations present in cells.Note that for h i (t) → ±∞, σ i (t + 1) = ±1; for T → ∞, σ i (t + 1) = ±1 with equal probability; and for T → 0, σ i (t + 1) = sign (h i (t)).
The update rule from Eq. 2 may be implemented in various ways.The synchronous scheme updates the state of all nodes in the system at every time step, but this is sensible only if the simulated system has a central pacemaker coordinating the activity of all nodes.A more appropriate choice for decentralized systems like gene regulatory networks is the asynchronous scheme in which the state of a randomly chosen subset of nodes is updated at each time step.Here, we use the asynchronous scheme with update probability 0.2 for each node.

Coupling matrix
In the canonical Hopfield model, the coupling matrix is constructed to store a set of p linearly independent (i.e.distinct) Boolean patterns ξ µ i = ±1 as point attractors, where i = 0, 1, . . ., N − 1 is the node index and µ = 0, 1, . . ., p − 1 is the pattern index.The point attractor coupling matrix J ij is given by where [15,46] With this coupling matrix and T = 0, if at some time t the configuration is given by σ i (t) = ξ µ i + δ i for a small perturbation δ i , then lim t→∞ σ i (t) = ξ µ i .Note that this formulation means ±ξ µ i are both attractors of the system.
A simple modification [19] to Eq. 3 produces a cyclic attractor coupling matrix J ij , constructed according to

4/18
At T = 0, this coupling matrix cyclically maps through the sequence of p patterns or their negatives.For the remainder of this article, all attractor indexing is understood to be modulo p.
A delayed cyclic Hopfield model may be constructed by combining the point and cyclic attractor matrices into one coupling matrix, for an adjustable transition strength parameter λ with 0 ≤ λ ≤ 1.If σ(t) = ξ µ i , λ 1, and T = 0, the point attractor term dominates and σ i (t) = σ(t + ∆t) for all ∆t = 1, 2, . ... If T > 0, however, stochastic fluctuations eventually push the configuration out of the basin of attraction of the µ th attractor and into the (µ + 1) th basin, then eventually to the (µ + 2) th basin, and so on.The dynamics of the delayed cyclic Hopfield model are thus governed by noise-induced transitions.
Due to the sinusoidal nature of the gene expression in these CC data sets, however, the attractors are structured such that ξ µ i = −ξ µ+4 i , making Q µν rank deficient and thus noninvertible.Because the definition of J ij automatically guarantees that if any sequence {+ξ µ i } is an attractor, then {−ξ µ i } is also an attractor, encoding the sequence automatically encodes the sequence In this special case of sinusoidal trajectories, the limits of summation in Eqs.3-5 need only run over the first four indices, µ = 0, 1, Finally, to reflect the fact that real gene regulatory networks are sparse, weak edges were removed by setting all elements of the coupling matrix with |J ij | < median(|J|) to zero, where |J| is element-wise absolute value.

Biological interpretation of the dynamics
Extracting biological meaning from this model requires defining some convenient coarse-grained quantities.The overlap of the state vector σ i (t) with the µ th pattern is given by where −1 ≤ m µ (t) ≤ +1.The overlap measures the similarity between the (discretized) experimental and simulated gene expression profiles, and m µ (t) = +1 means there is perfect agreement between the simulated cell's expression and pattern µ.
A single configuration vector σ i (t) represents the expression profile of a single cell.
For many cells κ, let σ ik (t) be the expression of gene i in cell k.Define as the overlap of cell k with attractor µ.Because the microarray and RNA-seq data used here report the gene expression averaged over many cells, it is appropriate to define the population-averaged (i.e.ensemble-averaged) expression, 5/18 Rather than work with a continuous vector quantity like m µ k (t), each cell can simply be identified as being in a discrete phenotypic state at any given time.Define the state of cell k as i.e. the index of the attractor with maximum overlap, which may be interpreted as cell k's phenotype.To better understand population-level dynamics, define the discrete probability distribution P µ (t) as the fraction of κ cells with s k (t) = µ; that is, P µ (t) is the probability that a randomly chosen cell is in state µ at time t.Finally, define the time-averaged distribution of states as for a window of time τ .
For each data set, T and λ were tuned to the "edge of chaos" [47] such that the cyclic attractor was preserved and the time between transitions was approximately constant, but the system was sensitive enough to perturbations that some targeted inhibitions produced noticeable changes in P µ T .See S1

Gene inhibition optimization
In this application, the goal is to identify perturbations that halt or retard the encoded cyclic attractor.A standard genetic algorithm (GA; explained in S1 Text) was employed to identify an optimal control field h opt i that maximized a given objective function f (h ext i ), where h ext i is the control vector given by for a fixed number of targets (nonzero elements) n targ .Only negative control fields are used here to simulate the effects of targeted gene inhibition from pharmaceutical compounds.The objective function used here is P µ T , meaning that the optimal control field maximizes the time-averaged number of cells occupying a particular attractor state µ.This search was conducted across all attractors µ to determine the controllability of each attractor state.

Results and Discussion
Note: The supporting information can be downloaded from this temporary Dropbox link: http://bit.ly/2tEekWK.For convenience, the four supplementary videos can also be viewed using the following unlisted YouTube links: 1. S1 Video: https://youtu.be/LOUjRftAeYMThe system began with the configuration σ i (0) = ξ 0 i and was allowed to evolve according to the Hopfield signaling rules with zero external field, mapping cyclically through the set of eight attractors.The pattern and cycle durations vary due to the system's stochasticity.

Dynamical behavior
Figure 2 shows the time evolution of s k (t) for a single simulated cell using the attractors derived from [44].As expected, the system progresses cyclically through the eight attractor states.The duration of each cycle varies somewhat due to the stochasticity in the update rule from Eq. 2.
Although the gene expression for each simulated cell k has σ ik (t) = ±1, the population-averaged expression has −1 ≤ σ i (t) K ≤ +1, and for many cells initially synchronized with σ ik (0) = ξ 0 i for all k, σ i (t) K successfully recovers the experimentally observed decaying sinusoidal gene expression.Figure 3 shows a comparison between the experimental expression x i (t) from the Eser data set and the mean simulated expression σ i (t) K with κ = 50 for i = SLD2 , one of the genes responsible for initiating DNA replication in S. cerevisiae [48,49].The simulation time t was rescaled by eye to align the simulated and experimental curves.
Trajectories can be visualized by projecting them onto the first two principal components (PCs) of the attractor configurations.Figure 4 shows the eight attractors as stars, and a single cell trajectory (left panel) and 100 cell trajectories (right panel) with random initial states as curves with line segments colored according to s k (t) (as computed in the full N -dimensional space).Although the cells begin nearly equidistant from all ξ µ i , they quickly relax into encoded cycle.S1 Video shows an animation of a system of κ = 50 cells with random initial conditions projected onto the same PCs, where cells (circles) are colored according to s k (t).As with the cells shown in Figure 4, all initially random configurations eventually converge to the cycle.S2 Video shows an animation of κ = 50 cell trajectories with σ ik (0) = ξ 0 i .As time progresses, the phases of the initially synchronized cells slowly decohere because cells stochastically and independently transition between attractors due to the finite temperature in Eq. 2.
S3 Video demonstrates the effect of temperature on the dynamics.50 cells were given random initial states, and the temperature was increased and decreased in steps.Cells rarely escape the eight attractor states for T = 0.045, and one cell becomes stuck near the center in a spurious attractor (unintentional metastable states that arise from the model's nonlinearity).At T = 0.06, fluctuations allow the cells to transition somewhat regularly through the encoded cycle, and the cell trapped in the spurious attractor eventually escapes and joins the cycle.At T = 0.09 the cells begin to 7/18 The measured expression of the S. cerevisiae gene SLD2 from [44] was scaled to the range [−1, +1] and is shown as a black beaded curve, and the population-averaged expression of the same gene as defined in Eq. 12 for κ = 50 cells is shown in green, with the t axis rescaled by eye to match experimental time.Transient points (red X's) were ignored when fitting Eq. 17.
noticeably diverge from the eight attractor states, but still collectively display a net clockwise flow.The noise is too great for the cells to follow the cycle for T = 0.15, but lowering the temperature again returns cells to the cycle.This illustrates the fact that the cycle is preserved only for intermediate temperatures: cells become "frozen" in intended or spurious attractor states at low temperatures, but at high temperatures the noise is too great and the couplings between genes become irrelevant to the dynamics.

Optimal control fields
The GA was used to identify some effective combinations of gene targets that slowed progress through the cyclic attractor for varying numbers of targets, n targ .Because each gene has one of eight discrete phases, there can be multiple equivalent optimal control sets.Here we present and discuss only some of the optimal sets.Extensive tables of results can be found in S1 File.
The GA found that inhibiting the set of eight S. cerevisiae genes HEK2, PRR1, QRI1, RFC4, STB1, TDA7, VPS17, and ZIM17 was sufficient to trap ∼95% of cells in the µ = 7 state.The effects of this control field on the time evolution of P µ (t) for κ = 50 and κ = 5000 are shown in Figure 5. Cells were given random, independent initial states at t = −200 (not shown), quickly settling into the cyclic attractor with evenly distributed phases so that P µ (0 ≤ t < 200) ≈ 1/8.The control field was activated at t = 200, causing the cells to accumulate in the µ = 7 state.The field was then disabled at t = 1000, allowing the cells to resume cycling with initially synchronized phases, as shown by the sequence of oscillations in P µ (t > 1000).The stochastic nature of the transitions causes the cells' phases to slowly spread so that , eventually returning the system to a desynchronized state.
The effects of this control field can also be visualized using a PC projection as shown in Figure 6 and S4 Video.The same set of κ = 50 trajectories from Figure 5 was projected onto the attractors' PCs, with cells colored according to s k (t).The control field manages to fix most cells near the µ = 7 state, but as shown in the t = 910 panel in Figure 6, fluctuations occasionally push individual cells out of the µ = 7 basin and back into the cycle.
Four further searches were constrained to inhibiting 2, 4, 6, and 8 out of the 24 periodic kinases [50,51] in HeLa cells.In all cases, the GA found µ = 2 (M phase) to be 8/18  The trajectories used to make the κ = 50 panel of Figure 5 were projected onto the the first two principal components (PCs) of the attractor array ξ µ i (labeled stars).Cells (circles) are colored according to the closest attractor as computed by Eq. 13.When the external field is activated, most cells become trapped in the µ = 7 state, although occasionally cells break from the group and complete another circuit before becoming trapped again.After the external field is removed, the cells eventually return to a desynchronized state.See S4 Video for an animation of these trajectories.1. HeLa kinase inhibition search results.The genetic algorithm was used to identify the best combinations of four kinases to inhibit in order to freeze cells in each attractor, along with the optimal score (the time-averaged fraction of cells occupying that attractor state).Phases marked with an asterisk (*) were not found to be significantly enriched for any CC phase, and so are labeled as being between the previous and next known phases.
The structure of the coupling matrix was probed using centrality measures from complex network theory [52] by taking the absolute value of the coupling matrix's elements as edge weights.Katz centrality and PageRank were found to be poor predictors of optimal target sets, but betweenness centrality proved to be a very effective predictor.BRD4, MAPK1, NEK7, and YES1 have the four highest betweenness centralities in the network (with a mean betweenness centrality of 2.4 × 10 −3 and a p-value of 4.7 × 10 −4 , using the inverse of the absolute value of the coupling strengths as weights), indicating that this set of kinases forms a kind of bottleneck in the transmission of signals through the network.Structural network measures, however, do not account for the time-dependent expression of targeted genes, nor how downstream gene expression reacts to upstream perturbations.Controlling nonlinear dynamical systems requires investigating both the structure of the underlying network and the specific form of interactions as defined through the signaling rules.

Conclusions
Above we presented a delayed cyclic Hopfield model designed to store CC time series gene expression data from synchronized S. cerevisiae and HeLa cells, and the behaviors of both individual cells and populations of cells were studied.The dynamics of populations of cells successfully recreated the experimental gene expression data, including the slow decoherence of initially synchronized cells due to the stochastic transitions between attractors.Optimal control fields that freeze or stall the cyclic attractor by inhibiting only a small number of genes were identified.These predictions could be experimentally validated or invalidated using kinase inhibitors or knockout studies.
Admittedly, there are several limitations to this model.The specific results reported here depend to some degree on the free parameters T , λ, p, and the node update probability.Tuning the system to match the behavior of the underlying data places some constraints on these parameters, but a more detailed study of the sensitivity of the results to these parameters could prove useful.Additionally, although using the temporal ordering of the time series gene expression samples provides more information 11/18 about potentially causal relationships than static samples, the Hopfield model is ultimately an effective model that builds gene-gene couplings from pairwise correlations in gene expression, thereby capturing direct, indirect, and spurious relationships between genes.Independently derived network information with experimentally confirmed molecular regulatory interactions could perhaps be used to refine the construction of the coupling matrix.
Our approach can be generalized and improved in many ways.This incarnation of the model causes simulated cells to continuously undergo CC with no G0 (resting) phase.Adding a relatively stable G0 attractor between the M and G1 phases could cause cells to pause between cycles.A GA search could then be conducted to find the best sets of inhibition targets to freeze cells in the G0 state, or to find the best sets of targets to stimulate entry into CC, mimicking the effects of environmental signals such as growth factors.
We chose to discretize the continuous gene expression data using a traditional two-state model, which assumes that each gene is either fully activated or fully deactivated.Using a multi-level Hopfield model [53] could better reflect the continuous nature of gene expression data and potentially improve the search results.This model can also incorporate additional omics information, e.g.proteomics and metabolomics, simply by increasing the number of nodes in the system.We plan to explore this option as more multi-omics time series data sets become available.Single-cell experimental techniques and analytical tools are also rapidly improving in quality, decreasing in cost, and gaining in popularity [54][55][56], and using techniques like pseudo-temporal ordering [57] could allow the Hopfield model to encode single-cell RNA-seq data as well.
Although the above simulated populations of cells exhibit intriguing dynamical and statistical properties, they behave as completely homogeneous, non-interacting particles.The importance of cell-cell communication and interactions in populations of cells has been demonstrated in a variety of systems including bacterial quorum sensing [58] and community spatial patterning [59], neuron synchronization in circadian rhythm [60], and various forms of cancer [61][62][63][64][65].As with many nonlinear systems, even seemingly minor changes can produce dramatically different outcomes.More complex extensions to our model could incorporate cell-cell communication by, for example, adding couplings between known signaling molecules and receptors between different cells, and could even allow for interactions between heterogeneous cell types.This would increase the computational complexity of the model, but could better reflect the underlying biology.

Gene expression fitting
In order to encode these CC data sets into the Hopfield model, periodic genes needed to be identified, their frequencies and phases computed, and their expression converted from continuous to Boolean form.SciPy's Trust Region Reflective method [66] was used to identify genes i with periodic expression x i (t) by fitting to the form for amplitude a i , decay rate b i , angular frequency ω i , phase φ i , and asymptotic mean expression x i0 .(Because the HeLa data set has fewer time points (14) than the S.
cerevisiae data set (41), analysis of the HeLa data set was preceded by a smoothing step using a simple box filter to aid in fitting.)The first several time points were ignored to avoid fitting the parameters of Eq. 17 to chemically perturbed (transient) states.A gene 12/18 was labeled periodic if the maximum relative uncertainty in its parameters from the fit, was less than the thresholds defined in S1 Table .Once all frequencies {ω i } for periodic genes were computed, the frequency was fixed to the mean frequency ω and the fits were recomputed for each periodic gene using the form thus producing the final set of continuous phases {φ i }. Figure 1A shows a heat map of the expression of all periodic genes detected in the Eser data set sorted by their fitted phases, and Figure 1B shows the same genes with the fitted expression curves.These fitted curves were converted from continuous values x i (t) ≥ 0 to Boolean values ξ i (t) = ±1 (over/underexpressed) by assigning as shown in Figure 1C.Finally, one CC period was divided into eight uniformly spaced states {ξ µ i } = {ξ 0 i , ξ 1 i , . . ., ξ 7 i }.These states, shown in Figure 1D, were used as attractors in the Hopfield model.

Determining cell cycle phase
To determine the approximate CC phases for each attractor µ in the HeLa data set, over-representation analysis was conducted using the hypergeometric distribution to calculate p-values with the Benjamini-Hochberg procedure [67] to correct for multiple hypothesis testing with false discovery rate < 0.05, using all genes i with ξ µ−1 i = −1 and ξ µ i = +1 as the µ th input set and using all detected cyclic genes as the background.In the Dominguez HeLa data set, µ = 0 was enriched for the gene ontology (GO) terms "negative regulation of cell proliferation" (p = 2.3 × 10 −2 ) and "DNA double-strand break repair" (p = 3.1 × 10 −2 ), corresponding to the G2/M checkpoint.µ = 2 was enriched for the GO term "nuclear envelope breakdown" (p = 4.2 × 10 −3 ), corresponding to the preparation of chromosome condensation and cellular mitosis.µ = 5 ("TP53 regulates transcription of DNA repair genes," p = 5.0 × 10 −3 ) and µ = 6 ("DNA strand elongation," p = 2.0 × 10 −3 ) correspond to the G1/S phase checkpoints and the elongation of DNA in S-phase respectively.The database yeastgenome.org[68] was used to determine the CC phases for the Eser S. cerevisiae data set.µ = 0 is enriched for the GO term "DNA replication" (p = 2.02 × 10 −12 ), indicating an attractor in the S phase of CC. µ = 2 is enriched for "mitotic spindle organization" (p = 2.3 × 10 −3 ) indicating the beginning of mitosis in S. cerevisiae.µ = 6 from Eser is enriched for the GO term "pre-replicative complex assembly involved in nuclear cell cycle DNA replication" (p = 3.0 × 10 −5 ), indicating an attractor at the end of G1 phase as the cells prepare for DNA replication.

Supporting Information S1 Video
50 cell trajectories with random initial conditions.Data was projected onto the first two principle components of the attractor array ξ µ i .Attractors are shown as stars, and cells are shown as circles.Cell colors are assigned using s k (t) as measured in the full N -dimensional space.All cells k were given random initial conditions σ ik = ±1 with equal probability for all i and k, but eventually converge to the cyclic attractor.

13/18
S2 Video 50 cell trajectories with identical initial conditions.See the caption of S1 Video for an explanation of the projection and colors.All cells k were initially synchronized with σ ik (t) = ξ 0 i , but progress through the cycle stochastically, causing the distribution of s k (t) to broaden.

S3 Video
Effects of temperature on 50 cell trajectories.See the caption of S1 Video for an explanation of the projection and colors.All cells were given initial random states, and the temperature was increased and decreased in steps as shown in the top panel.

S4 Video
Principal component projection of 50 cells being synchronized.See the caption of S1 Video for an explanation of the projection and colors.This video is an animation of the trajectories used in Figures 5 and 6.

S1 Text
Explanation of genetic algorithm.

S1 Table
List of parameters used for each data set.

S1 File
Zip file containing results of genetic algorithm's gene inhibition searches.

Figure 1 .
Figure 1.Obtaining attractors from expression data.(A) Heat map of the expression of all detected periodic genes from [44] sorted by their fitted phases.(B) Fitted gene expression.(C) Boolean form of fitted expression, separated into periods by dashed black lines.(D) Final set of p = 8 attractors taken from one period.

Figure 3 .
Figure 3. Measured and simulated gene expression.The measured expression of the S. cerevisiae gene SLD2 from[44] was scaled to the range [−1, +1] and is shown as a black beaded curve, and the population-averaged expression of the same gene as defined in Eq. 12 for κ = 50 cells is shown in green, with the t axis rescaled by eye to match experimental time.Transient points (red X's) were ignored when fitting Eq. 17.

Figure 4 .Figure 5 .
Figure 4. Principal component projection of unperturbed cell trajectories.The simulated single cell (left panel) and 100 cells (right panel) began with random initial states (projected near the center of the plot), but quickly settled into the encoded cycle.Line segments were colored according to s k (t), i.e. which of the eight attractors (labeled stars) had maximum overlap at time t.

Figure 6 .
Figure 6.Principal component projection of 50 cell trajectories.The trajectories used to make the κ = 50 panel of Figure5were projected onto the the first two principal components (PCs) of the attractor array ξ µ i (labeled stars).Cells (circles) are colored according to the closest attractor as computed by Eq. 13.When the external field is activated, most cells become trapped in the µ = 7 state, although occasionally cells break from the group and complete another circuit before becoming trapped again.After the external field is removed, the cells eventually return to a desynchronized state.See S4 Video for an animation of these trajectories.
Table for a list of parameters used for each data set.
Figure 2. Unperturbed cell state versus time.Boxes indicate s k (t), i.e. the index of the attractor with maximum overlap at time t.