Hub-Centered Gene Network Reconstruction Using Automatic Relevance Determination

doi:10.1371/journal.pone.0035077

Table 1.

Evaluation of Predicted Networks.

Table 2.

Overview of analyses on network inference.

Table 3.

Overview of analyses on hub identification.

Figure 1.

Yeast Cell Cycle Core Network.

Core yeast cell cycle network, as derived by [33] from literature. There is one external checkpoint, cell size, which initiates progression through the cell cycle. Activations are shown in green, inhibitions in red, and self-regulations in yellow.

Figure 2.

ROC and PR results, simulated data.

The figure shows receiver operating characteristic (ROC) and precision-recall (PR) curves for network reconstruction on simulated data, for different network sizes and different numbers of time points. A, B: ROC and PR curves for the network with 11 genes; C, D: ROC and PR curves for the network with 100 genes; E, F: ROC and PR curves for the network with 1000 genes. Black: 20 time points used for network reconstruction; red: 40 time points; blue: 200 time points. Performance deteriorates with increasing network size and with a decreasing number of time points. Note that, because a three-class classification problem underlies the graphs, random guessing of network topologies would not yield a diagonal line in the ROC plots but a significantly lower line, with an area under the curve of approximately 0.33.

Figure 3.

Inferred degree density distribution on scale-free and random networks.

To test whether network inference generates artificial hubs because of the prior distribution used, we performed a comparative analysis on two different 1000-gene networks. The first network is the JumboSF network, a large scale-free network with central hub genes. The second network is a random Erdős-Rényi network, which does not contain any hubs. Network inference was performed with identical hyperparameter values on both data sets. The figure shows the degree distribution of the inferred networks as a function of the number of time points used for network inference (left: JumboSF, right: random network). The plot shows that, provided sufficient data are available, the prior distribution does not lead to artificial hubs. If only little data are used for network inference, however, the prior starts to dominate the results, as one would expect.
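The contrast between the two reference topologies can be illustrated with a minimal sketch. It does not reproduce the JumboSF network or the inference method: as an assumption, a preferential-attachment graph stands in for the scale-free network, an Erdős-Rényi graph with the same number of edges stands in for the random network, and both are treated as undirected. The function names, the seed, and the parameter `m` are hypothetical choices made for illustration only.

```python
import random


def preferential_attachment_degrees(n, m, rng):
    """Node degrees of an undirected graph grown by preferential attachment."""
    degree = [0] * n
    # Each node is listed in `stubs` once per incident edge, so a uniform
    # draw from `stubs` selects nodes proportionally to their degree.
    stubs = []
    # Start from a fully connected seed of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
    # Every new node attaches to m distinct, degree-biased targets.
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            stubs += [new, t]
    return degree


def random_graph_degrees(n, n_edges, rng):
    """Node degrees of an Erdős-Rényi G(n, M) graph with M = n_edges."""
    degree = [0] * n
    edges = set()
    while len(edges) < n_edges:
        i, j = rng.sample(range(n), 2)
        edge = (min(i, j), max(i, j))
        if edge not in edges:
            edges.add(edge)
            degree[i] += 1
            degree[j] += 1
    return degree


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_genes, m = 1000, 2
sf_degrees = preferential_attachment_degrees(n_genes, m, rng)
er_degrees = random_graph_degrees(n_genes, sum(sf_degrees) // 2, rng)

print("max degree, preferential attachment:", max(sf_degrees))
print("max degree, Erdős-Rényi:", max(er_degrees))
```

Both graphs have the same number of edges, so any difference in the largest degree reflects topology rather than density. The preferential-attachment graph develops a few highly connected nodes, whereas the random graph keeps all degrees close to the mean.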

Table 4.

AUC results for network reconstruction and hub identification on simulated data.

Figure 4.

Effect of Starting Point on obtained AUC values.

Shown are the distributions of AUC values (left: ROC, right: PR) of 1000 gradient descent runs with randomly chosen starting values for w, on the yeast core network. For the parameter vector w, values drawn at random from each of several ranges were used as starting points for the calculations with CG. This was done 1000 times for each of the suggested ranges, and AUC ROC and AUC PR values were computed. The boxplots compare the AUC values obtained in these calculations. Randomly sampled starting values close to zero clearly allow the approach to obtain better results for the optimal values of w. If the range of initial values for w is too large, the optimization ends in suboptimal local optima corresponding to overly complex networks with many non-zero edges.

Figure 5.

Multimodal Distributions in the Yeast Cell Cycle Core Network.

Shown are Dip scores for the distribution of sampled edge weights from the Markov chain. The Dip value measures the departure of an empirical distribution from the best-fitting unimodal distribution. Larger scores indicate a stronger deviation from unimodality. Rows in the diagram represent the source (regulating) genes of edges, columns the target (regulated) genes. Colors indicate the magnitude of the deviation from unimodality.

Figure 6.

Hub Genes in the Yeast Cell Cycle.

Histogram of reconstructed regulation strengths for the full yeast cell cycle dataset. Negative weights correspond to inhibitions, positive weights to activations. Weights in the vicinity of zero indicate no regulation between two genes. The plot shows the distribution of regulation strengths between any two genes and makes clear that only a few genes exhibit strong regulations. The inset shows a histogram of the corresponding hyperparameters (equation 6), which control the magnitude of the regulations exhibited by a particular gene. Most genes have only small importance, corresponding to low values of their hyperparameter, and only a few genes are assigned large hyperparameter values and correspondingly large weights on their outgoing connections.

Figure 7.

Receiver Operating Characteristic Analysis for the Prediction of Hub Genes in the Yeast Cell Cycle.

Genes were split into two groups, “hub” and “non-hub”, based on a threshold on the degree of the gene in the literature-derived network, and ROC curves were then computed by varying the prediction threshold. The ROC curves were summarized for each degree threshold using the area under the curve (AUC). The plot shows the AUC over the degree threshold. The red curve shows results for the network inferred with the method presented, the black dotted line shows results using the method with a Normal prior, and the brown dashed line shows results using an L1 “sparseness” prior distribution. The dashed blue line was obtained using Banjo, the dot-dashed green line shows results of ARACNE, and the dot-dashed pink line represents results of MRNet. The grey dashed line corresponds to the expected value for randomly guessing a network. Larger AUC values indicate better performance.
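The evaluation scheme described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the figure: the toy degrees and scores are invented, the function names are hypothetical, and treating a gene as a hub when its reference degree is greater than or equal to the threshold is an assumption (the caption does not say whether the comparison is strict). The AUC is computed through its rank interpretation, the probability that a randomly chosen hub scores higher than a randomly chosen non-hub.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, is_hub in zip(scores, labels) if is_hub]
    neg = [s for s, is_hub in zip(scores, labels) if not is_hub]
    if not pos or not neg:
        raise ValueError("need at least one hub and one non-hub")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hub_auc_by_threshold(reference_degree, hub_score, thresholds):
    """AUC of hub prediction for each degree threshold on the reference network."""
    result = {}
    for d in thresholds:
        # Assumption: a gene counts as a hub if its reference degree is >= d.
        labels = [deg >= d for deg in reference_degree]
        if all(labels) or not any(labels):
            continue  # AUC is undefined when one class is empty
        result[d] = auc(hub_score, labels)
    return result


# Hypothetical toy data: degree in a reference network and a predicted
# per-gene hub score (for example, a hyperparameter or an inferred degree).
reference_degree = [1, 2, 2, 5, 8, 1]
hub_score = [0.1, 0.3, 0.2, 0.9, 0.7, 0.4]

auc_by_threshold = hub_auc_by_threshold(reference_degree, hub_score, [2, 5, 9])
print(auc_by_threshold)
```

Thresholds that leave one of the two groups empty (here, a threshold of 9) are skipped, since no ROC curve can be drawn for them.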

Figure 8.

Sensitivity Analysis for the Network Inference Performance on Synthetic Data with Respect to Parameters a and r.

Plots comparing distributions of AUC values for ROC graphs for different a and r settings (x- and y-axis), for the synthetic networks of sizes 11, 100 and 1000, using data sets with 20 and 200 time points, respectively. The plots show that results are relatively insensitive over a large range of parameters. Smaller values of one hyperparameter correspond to a more peaked prior distribution, resulting in “sparser” networks; correspondingly, the figure shows that smaller values of this hyperparameter should be chosen for larger networks. Although the effect of changing the other hyperparameter seems less pronounced, larger values of it correspond to a narrower prior distribution and should therefore be used if fewer data are available, to avoid overfitting.

Figure 9.

Sensitivity Analysis for the Prediction of Hub Genes in the Yeast Cell Cycle with Respect to the Model Parameters.

To assess the effect of changes in the two model parameters, both were varied individually and together over a range of percentage changes. Network reconstruction was restarted for each combination of parameter values, and average AUC values were computed for the reconstructed networks in comparison to the STRING network. The figure shows the resulting AUC values over the parameter changes, indicating that results are relatively insensitive over a wide range of parameter values.
