Fig 1.
Schematic of the three stages of the Inherent dynamics pipeline in which each step uses different features of the input gene expression time series data.
In the node finding step, each gene expression time trace is used independently to score the strength of periodicity. Genes with stronger periodicity and higher amplitude are hypothesized to be part of the core network and are ranked correspondingly higher. Top-ranked nodes are passed into the edge finding step, where time series are used in pairs to score the likelihood of a positive or negative regulatory event in one or the other direction. Edges are ranked from highest to lowest likelihood, and the top-ranked edges are passed to the network finding step. There, subsets of gene expression data consisting of three or more time traces are compared to the global dynamics of network models to produce top-ranking networks that are consistent with the order of peaks and troughs across the time series. Statistics of these top-ranked networks are used to suggest experimental interventions at the node level.
Table 1.
Metrics table: Key terminology for scoring and ranking nodes, edges, and networks in the Inherent dynamics pipeline in order of computation.
Fig 2.
Schematic of the network finding step.
From upper left in a horseshoe to lower left: The node finding step produces a thresholded list of nodes that is passed into edge finding. Edge finding ranks all possible edges between these top nodes, and the highest-ranked of these are used to create a seed network. The seed network is the initial condition for a neighborhood search in network space. In this neighborhood, a collection of strongly connected networks is sampled and scored according to user-specified choices of the oscillation and pattern match scores in Table 1. The participation and prevalence of nodes and edges in the top-ranked networks that globally match the dynamics in the experimental data permit a reordering of nodes and edges that provides hypotheses for experimental guidance.
Fig 3.
Panel A. Ground truth regulatory network, where sharp arrows indicate activation (or positive regulation) and blunt arrows indicate repression (or negative regulation). Panels B-D. Synthetic time series from 3 different parameterizations of a single Hill model (see Methods Section 4.4.1). Panel E. The subnetwork formed from the network in panel A of this figure by removing the node D. Panels F-H. Synthetic time series from the same parameters as in panels B-D of this figure, but excluding node D.
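As a rough illustration of how synthetic time series can be generated from a Hill model, the sketch below integrates a small signed network with forward Euler steps. It is a minimal sketch under stated assumptions, not the model of Methods Section 4.4.1: the three-node repressor ring, the function names (`hill_activate`, `hill_repress`, `simulate`), the single regulator per node, and all parameter values are hypothetical and are not the network of panel A or the parameters of Table 2.

```python
def hill_activate(x, theta, n):
    """Increasing Hill function: near 0 for x << theta, near 1 for x >> theta."""
    return x**n / (theta**n + x**n)


def hill_repress(x, theta, n):
    """Decreasing Hill function: near 1 for x << theta, near 0 for x >> theta."""
    return theta**n / (theta**n + x**n)


def simulate(edges, x0, beta=1.0, gamma=1.0, theta=0.5, n=4, dt=0.01, steps=2000):
    """Forward Euler integration of dx_t/dt = beta * H(x_s) - gamma * x_t.

    edges maps each target node to one (source, sign) pair; sign > 0 means
    activation and sign < 0 means repression. This single-regulator form is an
    assumption made to keep the sketch short. x0 must contain every node.
    Returns a list of state dictionaries, one per time point.
    """
    x = dict(x0)
    trace = [dict(x)]
    for _ in range(steps):
        new = {}
        for target, (source, sign) in edges.items():
            if sign > 0:
                h = hill_activate(x[source], theta, n)
            else:
                h = hill_repress(x[source], theta, n)
            new[target] = x[target] + dt * (beta * h - gamma * x[target])
        x = new
        trace.append(dict(x))
    return trace


# Hypothetical ring of three repressors (not the network in panel A).
ring = {"A": ("C", -1), "B": ("A", -1), "C": ("B", -1)}
trace = simulate(ring, {"A": 0.9, "B": 0.1, "C": 0.2}, steps=500)
```

Changing `beta`, `gamma`, `theta`, or `n` gives a different parameterization of the same network. Excluding a node from the analysis, as in panels F-H, amounts to dropping its column from the simulated traces without re-simulating.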
Fig 4.
Histograms of the oscillation and pattern match scores for collections of 2000 networks using (panel A) the top-ranked LEM edges of a simulation for the parameterization in Fig 3B and (panel B) the bottom-ranked LEM edges from the same edge finding step.
Fig 5.
Local versus global node participation scores.
Mean ± standard deviation node participation scores for the simulations shown in Fig 3B (panel A), Fig 3C (panel B), and Fig 3D (panel C). Each synthetic dataset was run through the edge and network finding steps five times. The mean (blue dots) and standard deviation (blue bars) of the local node participation scores across the five runs are plotted against the mean and standard deviation of the global node participation scores. The node participation score for each simulation is computed only over the edges in the intersection of the top-ranked LEM edges of all five simulations. This excludes edges that do not have sufficiently high local edge ranks in all five simulations. Nodes above the red diagonal line have an improved global node participation score relative to their local node participation score. Note that node G is markedly downranked in all three panels.
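The bookkeeping behind this comparison can be sketched as follows. This is a minimal sketch, assuming the per-run node participation scores have already been computed as defined in Table 1. The function names (`shared_edges`, `summarize`, `improved_nodes`), the input layout, and the use of the sample standard deviation are assumptions, and the numbers are made up for illustration.

```python
from statistics import mean, stdev


def shared_edges(top_edges_by_run):
    """Edges that appear in the top-ranked edge list of every run."""
    return set.intersection(*(set(edges) for edges in top_edges_by_run))


def summarize(scores_by_run):
    """Per-node (mean, sample standard deviation) of a score across runs.

    scores_by_run is a list of dictionaries, one per run, mapping each node to
    its score. Whether the figure uses the sample or the population standard
    deviation is not stated, so the sample version here is an assumption.
    """
    nodes = scores_by_run[0].keys()
    return {
        node: (mean(run[node] for run in scores_by_run),
               stdev(run[node] for run in scores_by_run))
        for node in nodes
    }


def improved_nodes(local_summary, global_summary):
    """Nodes whose mean global score exceeds their mean local score."""
    return sorted(
        node for node in local_summary
        if global_summary[node][0] > local_summary[node][0]
    )


# Made-up example with two runs and two nodes.
runs = [
    [("A", "B"), ("B", "C"), ("C", "A")],
    [("A", "B"), ("C", "A"), ("G", "A")],
]
common = shared_edges(runs)

local_summary = summarize([{"A": 0.5, "G": 0.4}, {"A": 0.7, "G": 0.6}])
global_summary = summarize([{"A": 0.8, "G": 0.1}, {"A": 1.0, "G": 0.3}])
better = improved_nodes(local_summary, global_summary)
```

In this toy input, the edge ("G", "A") is dropped because it is top-ranked in only one run, and only node A has a higher mean global score than mean local score.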
Fig 6.
Summary of yeast cell cycle rank changes for substantiated edges in global edge ranking.
The table shows the amount of information provided in each scenario. The box plots show the distributions of the median local (blue) and global (orange) edge rankings for the subset of substantiated edges with nonzero edge prevalence scores. A lower median indicates a better ranking and therefore a better result. In both S− scenarios, the global edge ranking outperforms the local edge ranking.
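The quantity summarized in the box plots can be illustrated with a short sketch. It assumes that each ranking is a list of edges ordered best first, that ranks are 1-based, and that edge prevalence scores are available as a dictionary. The function name `median_rank`, the data layout, and the example edges are hypothetical.

```python
from statistics import median


def median_rank(ranking, substantiated, prevalence):
    """Median 1-based rank of substantiated edges with nonzero prevalence.

    ranking is a list of edges, best first. Edges missing from prevalence are
    treated as having prevalence 0. Raises statistics.StatisticsError if no
    edge qualifies.
    """
    keep = {edge for edge in substantiated if prevalence.get(edge, 0) > 0}
    ranks = [i for i, edge in enumerate(ranking, start=1) if edge in keep]
    return median(ranks)


# Made-up rankings: e1, e2, e3 are substantiated; x and y are not.
e1, e2, e3, x, y = ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"), ("B", "A")
substantiated = {e1, e2, e3}
prevalence = {e1: 0.5, e2: 0.2, e3: 0.0}

local_median = median_rank([e1, x, e2, y, e3], substantiated, prevalence)
global_median = median_rank([e1, e2, x, e3, y], substantiated, prevalence)
```

A lower value is better, so in this toy input the global ranking would be read as outperforming the local one.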
Fig 7.
Local versus global node participation scores for the yeast cell cycle network.
Mean ± standard deviation node participation scores for the four scenarios S+A+, S+A−, S−A+, and S−A−. Each scenario was run through the edge and network finding steps five times. The mean (blue dots) and standard deviation (blue bars) of the local node participation scores across the five runs are plotted against the mean and standard deviation of the global node participation scores. The node participation score for each simulation is computed only over the edges in the intersection of the top-ranked LEM edges of all five simulations. This excludes edges that do not have sufficiently high local edge ranks in all five simulations. Nodes above the red diagonal line have an improved global node participation score relative to their local node participation score.
Fig 8.
The true negative time series G for the synthetic network.
Table 2.
Synthetic network ODE parameters.
Table 3.
Edge finding hyperparameters for synthetic network and yeast cell cycle applications.
Table 4.
Network finding hyperparameters for synthetic network.
Table 5.
The list of 9 substantiated and 2 unsubstantiated yeast cell cycle genes.
Table 6.
Substantiated edges used in the yeast cell cycle results as determined from YEASTRACT and [35].
Every node acts both as a target and as a source. When annotations are specified, HCM1, NDD1, and SWI5 are activators only and NRM1, CLN3, and WHI5 are repressors only, as can be verified from the table. For example, HCM1 activates NDD1, NRM1, and WHI5, but has no repressing activity. All other nodes may be either activators or repressors.
Table 7.
Network finding hyperparameters for yeast cell cycle.