A novel mutual information-based Boolean network inference method from time-series gene expression data

doi:10.1371/journal.pone.0171097

Fig 1.

Overview of a Boolean network inference problem.

An unseen target network G(V,A) produces a time-series gene expression dataset, which a discretization method converts to a Boolean time-series dataset. An inference algorithm takes the Boolean dataset as input and infers a Boolean network G′(V′,A′) as output. Inference performance is evaluated by structural accuracy, which compares the inferred connections A′ with the true connections A, and by dynamics accuracy, which compares the predicted Boolean time-series data V′(t) with the observed data V(t).
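The two evaluation measures in this caption can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the adjacency-matrix convention (`adj[i][j] == 1` means a link from gene `i` to gene `j`), and the choice to count every ordered gene pair are assumptions made here.

```python
from typing import List

Matrix = List[List[int]]


def structural_accuracy(true_adj: Matrix, inferred_adj: Matrix) -> float:
    """Fraction of ordered gene pairs whose link status (present or absent) agrees.

    Assumption: all n * n ordered pairs, self-loops included, are counted.
    """
    n = len(true_adj)
    agree = sum(
        true_adj[i][j] == inferred_adj[i][j]
        for i in range(n)
        for j in range(n)
    )
    return agree / (n * n)


def dynamics_accuracy(observed: Matrix, predicted: Matrix) -> float:
    """Fraction of (time step, gene) Boolean values that the prediction reproduces."""
    total = sum(len(row) for row in observed)
    agree = sum(
        o == p
        for obs_row, pred_row in zip(observed, predicted)
        for o, p in zip(obs_row, pred_row)
    )
    return agree / total


# Toy example with three genes (values are made up for illustration).
true_adj = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
inferred_adj = [[0, 1, 0], [0, 0, 0], [1, 1, 0]]  # one missed link, one spurious link
observed = [[0, 1, 1], [1, 0, 1]]                 # rows are time steps
predicted = [[0, 1, 0], [1, 0, 1]]                # one wrong value

print(structural_accuracy(true_adj, inferred_adj))  # 7 of 9 entries agree
print(dynamics_accuracy(observed, predicted))       # 5 of 6 values agree
```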


Fig 2.

Pseudocode of the two main subroutines in MIBNI.

(a) MIFS subroutine. It returns the k most informative variables in W for a given target variable v₀. (b) SWAP subroutine. It improves the dynamics accuracy by iteratively swapping a variable in the set of selected variables S with a variable in the set of unselected variables W.
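Since the pseudocode itself is not reproduced here, the two subroutines can be approximated as follows. This is a sketch under stated assumptions: the selection score is the classic greedy MIFS criterion (relevance to the target minus `beta` times redundancy with already selected variables), and the swap step measures quality by how many time steps no Boolean function of the selected inputs could reproduce. The names `mifs`, `swap`, `fit_error`, and `beta` are illustrative and may differ from the figure.

```python
from collections import Counter
from math import log2


def entropy(*cols):
    """Joint Shannon entropy (bits) of one or more equal-length Boolean columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    return entropy(x) + entropy(y) - entropy(x, y)


def mifs(target, candidates, k, beta=1.0):
    """Greedily pick k names from `candidates` (dict: name -> column)."""
    selected, remaining = [], dict(candidates)
    while remaining and len(selected) < k:
        def score(name):
            redundancy = sum(mutual_info(remaining[name], candidates[s]) for s in selected)
            return mutual_info(target, remaining[name]) - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


def fit_error(target, inputs):
    """Time steps that even the best Boolean function of `inputs` gets wrong."""
    patterns = list(zip(*inputs)) if inputs else [()] * len(target)
    groups = {}
    for pattern, out in zip(patterns, target):
        groups.setdefault(pattern, Counter())[out] += 1
    return sum(min(c[0], c[1]) for c in groups.values())


def swap(target, candidates, selected):
    """Exchange one selected variable for an unselected one while the error drops."""
    selected = list(selected)
    best = fit_error(target, [candidates[s] for s in selected])
    improved = True
    while improved and best > 0:
        improved = False
        for i in range(len(selected)):
            for w in candidates:
                if w in selected:
                    continue
                trial = selected[:i] + [w] + selected[i + 1:]
                err = fit_error(target, [candidates[s] for s in trial])
                if err < best:  # strict improvement guarantees termination
                    selected, best, improved = trial, err, True
                    break
            if improved:
                break
    return selected, best


# Toy data: candidate columns hold v(t); the target column holds v0(t + 1) = a AND b.
cands = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 1, 0, 1, 1, 1, 0, 0],
    "c": [1, 0, 1, 0, 0, 1, 1, 0],
}
v0_next = [x & y for x, y in zip(cands["a"], cands["b"])]

print(mifs(v0_next, cands, k=2))
print(swap(v0_next, cands, ["a", "c"]))  # starts from a deliberately poor set
```

Starting `swap` from a poor initial set shows its purpose: a greedy ranking can miss the right combination, and exchanges that lower the fit error can recover it.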


Fig 3.

Overall framework of the MIBNI algorithm.

Every Boolean variable with a single-step time lag (v_i(t + 1)) is specified in turn as a target variable. If the entropy of the target variable is zero, the target is constant and is taken to have no regulatory gene. Otherwise, for the given target variable v₀ and the set of candidate Boolean variables W = {v₁,v₂,…,v_N}, MIBNI selects an initial set S of the k most relevant variables in W using the MIFS subroutine. The SWAP subroutine can then replace some variables in S with the same number of variables from W\S to improve the identification of input variables. The process is repeated with increasing k until an optimal set S is found or until k equals a parameter K.
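The control flow described in this caption can be sketched as a single driver loop. The sketch below makes several simplifying assumptions that are not taken from the paper: `top_k_by_mi` is a stand-in for MIFS that ranks candidates by mutual information with the target only, `swap` is a simple one-for-one exchange, and "optimal" is interpreted as a set whose best Boolean function reproduces the target with zero errors. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*cols):
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def fit_error(target, inputs):
    """Time steps that no Boolean function of `inputs` can reproduce."""
    patterns = list(zip(*inputs)) if inputs else [()] * len(target)
    groups = {}
    for pattern, out in zip(patterns, target):
        groups.setdefault(pattern, Counter())[out] += 1
    return sum(min(c[0], c[1]) for c in groups.values())


def top_k_by_mi(target, cols, k):
    """Stand-in for MIFS (assumption): rank by I(target; w), ignoring redundancy."""
    def mi(w):
        return entropy(target) + entropy(cols[w]) - entropy(target, cols[w])
    return sorted(cols, key=mi, reverse=True)[:k]


def swap(target, cols, selected):
    """Stand-in for SWAP (assumption): accept any exchange that lowers the error."""
    selected = list(selected)
    err = fit_error(target, [cols[s] for s in selected])
    improved = True
    while improved and err > 0:
        improved = False
        for i in range(len(selected)):
            for w in cols:
                if w in selected:
                    continue
                trial = selected[:i] + [w] + selected[i + 1:]
                trial_err = fit_error(target, [cols[s] for s in trial])
                if trial_err < err:
                    selected, err, improved = trial, trial_err, True
                    break
            if improved:
                break
    return selected, err


def infer_regulators(series, K):
    """series: dict gene -> list of 0/1 over time. Returns dict gene -> regulators."""
    cols = {g: v[:-1] for g, v in series.items()}  # candidate values v_j(t)
    result = {}
    for gene, values in series.items():
        target = values[1:]                        # target values v_i(t + 1)
        if entropy(target) == 0:                   # constant target: no regulators
            result[gene] = []
            continue
        best_set, best_err = [], len(target) + 1
        for k in range(1, K + 1):                  # grow k until consistent or k == K
            selected, err = swap(target, cols, top_k_by_mi(target, cols, k))
            if err < best_err:
                best_set, best_err = selected, err
            if err == 0:                           # assumption: "optimal" = zero errors
                break
        result[gene] = best_set
    return result


# Toy series: c(t + 1) = a(t) AND b(t); d never changes.
series = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1, 0],
    "b": [0, 1, 0, 1, 1, 1, 0, 0, 1],
    "c": [1, 0, 0, 0, 1, 0, 1, 0, 0],
    "d": [0, 0, 0, 0, 0, 0, 0, 0, 0],
}
regulators = infer_regulators(series, K=2)
print(regulators["c"], regulators["d"])
```

Stopping at the smallest consistent k keeps the inferred input sets small, which matches the caption's description of increasing k only as far as needed.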


Fig 4.

Comparison of precision, recall and structural accuracy between MIBNI and other methods in BA random networks.

Results for (a) precision, (b) recall, and (c) structural accuracy. In each panel, 300 BA random networks of different sizes (|V| = 10,20,…,100) were used as target networks, and the 16,500 nodes in those networks were classified into nine groups by number of incoming links. The maximum time step was set to |V| + 10 when generating the artificial gene expression data. Y-axis values show the average precision, recall, and structural accuracy for the target genes in each group. Error bars denote standard deviations. MIBNI showed the best performance in terms of precision, recall, and structural accuracy.


Fig 5.

Comparison of dynamics accuracy between MIBNI and other methods in BA random networks.

(a) Results versus the number of incoming links. A total of 300 BA random networks of different sizes (|V| = 10,20,…,100) were used as target networks, and the 16,500 nodes in those networks were classified according to number of incoming links. (b) Results versus network size. For each number of nodes, 30 BA random networks were examined. Error bars denote standard deviations. In each panel, the maximum time step was set to |V| + 10 when generating the artificial gene expression data.


Fig 6.

Performance improvement due to the SWAP subroutine.

Two versions of MIBNI, one with and one without the SWAP subroutine, were compared with respect to dynamics accuracy. (a) Results versus the number of incoming links. Three hundred BA random networks of different sizes (|V| = 10,20,…,100 and |A| = 2 ∙ |V|) were used as target networks, and the 16,500 nodes in those networks were classified according to number of incoming links. (b) Results versus network size. For each number of nodes, 30 BA random networks were examined. The Y-axis value indicates the average dynamics accuracy of the nodes in each group, and error bars denote standard deviations. In each panel, the maximum time step was set to |V| + 10 when generating the artificial gene expression data.


Fig 7.

Comparison of the running times among the network inference methods.

The Y-axis values represent the ratio of the average running time of each of the six other methods to that of MIBNI. The Y = 1 line marks the baseline for comparison. The running times of Best-Fit, REVEAL, and BIBN were significantly longer than that of MIBNI, whereas those of RelNet, CST, and CLR were similar to it.


Fig 8.

Inference performance of MIBNI with two real biological network datasets.

The green, red, and blue interactions denote true positive, false positive, and false negative predictions, respectively. (a) Inference result of an E. coli gene regulatory network consisting of 10 genes and 11 interactions. The maximum time step of the gene expression data was 21. The predicted result shows 7 true positives, 4 false positives, and 4 false negatives. The structural and dynamics accuracies were 0.9200 and 0.9700, respectively. (b) Inference result of a fission yeast cell cycle network consisting of 10 genes and 23 interactions. The maximum time step of the gene expression data was 10. The predicted result shows 14 true positives, 3 false positives, and 9 false negatives. The structural and dynamics accuracies were 0.8800 and 0.9800, respectively.
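The counts in this caption can be turned into precision, recall, and structural accuracy with a short calculation. One assumption is made here: the number of possible links is taken to be all 10 × 10 ordered gene pairs. The caption does not state this, but under that assumption the computed structural accuracies agree with the reported 0.9200 and 0.8800. The function name `edge_metrics` is illustrative.

```python
def edge_metrics(n_genes, tp, fp, fn):
    """Precision, recall, and structural accuracy from link-level counts."""
    possible = n_genes * n_genes      # assumption: every ordered pair is a possible link
    tn = possible - tp - fp - fn      # remaining pairs are correctly predicted as absent
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "structural_accuracy": (tp + tn) / possible,
    }


# Counts as given in the caption.
ecoli = edge_metrics(n_genes=10, tp=7, fp=4, fn=4)
yeast = edge_metrics(n_genes=10, tp=14, fp=3, fn=9)

for name, m in (("E. coli", ecoli), ("fission yeast", yeast)):
    print(name, {key: round(value, 4) for key, value in m.items()})
```

Because most gene pairs are unconnected, true negatives dominate the structural accuracy, so precision and recall (7/11 for both in E. coli; 14/17 and 14/23 in fission yeast) give a sharper picture of how well the true links were recovered.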
