NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

doi:10.1371/journal.pone.0092709

Table 1.

Characteristics of the different datasets used for evaluation.

Figure 1.

Overview of the EFS approach to the network inference task.

The problem is split into independent regression subproblems, one for each gene in the network. Next, feature importance (FI) scores are calculated in each subproblem for all possible regulatory genes with respect to the target gene using an ensemble feature selection (EFS) method. These FI scores are then assigned as the weights of the edges in the network from each regulatory gene to the target gene. Finally, all weights are aggregated across the subproblems, creating a global confidence ranking of edges. We cast any feature selection (FS) method that can provide a ranking into an EFS method by taking random samples of varying size of both the experiments and the possible predictor genes and assigning a score of 1 to the top features in the ranking.
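
The procedure described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the correlation-based base ranker, the sampling bounds, the `top_k` threshold and the normalisation by the number of iterations are all assumptions made for the example.

```python
import random


def abs_correlation_ranking(X, y, features):
    """Hypothetical base ranker: order features by |Pearson correlation| with y."""
    def corr(col):
        n = len(y)
        mx, my = sum(col) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
        sxx = sum((a - mx) ** 2 for a in col)
        syy = sum((b - my) ** 2 for b in y)
        return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / (sxx * syy) ** 0.5
    return sorted(features, key=lambda f: corr([row[f] for row in X]), reverse=True)


def efs_scores(X, y, candidates, ranker, n_iter=200, top_k=2, rng=None):
    """Ensemble feature importance for one target gene.

    X is a list of experiments (dicts mapping gene -> expression),
    y is the expression of the target gene in the same experiments.
    """
    rng = rng or random.Random(0)
    scores = {g: 0.0 for g in candidates}
    for _ in range(n_iter):
        # Random sample of varying size of the experiments ...
        n_rows = rng.randint(max(2, len(X) // 5), len(X))
        rows = rng.sample(range(len(X)), n_rows)
        # ... and of the candidate predictor genes.
        n_feats = rng.randint(1, len(candidates))
        feats = rng.sample(candidates, n_feats)
        ranking = ranker([X[i] for i in rows], [y[i] for i in rows], feats)
        # The top features of this ranking each receive a score of 1.
        for g in ranking[:top_k]:
            scores[g] += 1.0
    # Assumed normalisation: fraction of iterations in which a gene was in the top.
    return {g: s / n_iter for g, s in scores.items()}


def infer_network(expr, genes, ranker=abs_correlation_ranking, **kw):
    """Solve one subproblem per target gene and aggregate into one edge ranking."""
    edges = []
    for target in genes:
        regulators = [g for g in genes if g != target]
        y = [row[target] for row in expr]
        fi = efs_scores(expr, y, regulators, ranker, **kw)
        edges.extend(((reg, target), w) for reg, w in fi.items())
    return sorted(edges, key=lambda e: e[1], reverse=True)
```

Any method that returns an ordered list of features could be passed as `ranker`; the correlation ranker here only stands in for the regression-based methods named in the article.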

Table 2.

Performance comparison of several algorithms on the DREAM4 in silico multifactorial dataset.

Table 3.

Performance comparison of several algorithms on the DREAM5 dataset.

Figure 2.

Boxplots of AUROC and AUPR scores on the three artificially created datasets.

Shown in blue is the performance of three individual algorithms: GENIE3 and the ensemble versions of support vector regression and the elastic net (E-SVR, E-EL). Shown in green are the results after rankwise merging of the individual methods, and in yellow the performance of TIGRESS. Also indicated in the figure are the results of Mann-Whitney U-tests between GENIE3 and GENIE3+E-SVR (sample size 20 for GNW-100 and SYNTREN-100, sample size 15 for GNW-200), showing that the AUROC scores are significantly improved.
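
One simple way to merge two or more predictions rankwise is to average the rank each edge receives in every individual prediction. The sketch below assumes this mean-rank scheme, which the caption does not spell out; the function names and the alphabetical tie-break are illustrative choices.

```python
def rank_positions(edge_scores):
    """Map each edge to its 1-based rank (the highest score gets rank 1)."""
    ordered = sorted(edge_scores, key=edge_scores.get, reverse=True)
    return {edge: i + 1 for i, edge in enumerate(ordered)}


def merge_rankwise(*predictions):
    """Average the per-method ranks; a lower mean rank means higher confidence."""
    ranks = [rank_positions(p) for p in predictions]
    # Only edges scored by every method are merged (an assumption of this sketch).
    shared = set.intersection(*(set(r) for r in ranks))
    mean_rank = {e: sum(r[e] for r in ranks) / len(ranks) for e in shared}
    # Ties in mean rank are broken by edge name to keep the output deterministic.
    return sorted(mean_rank, key=lambda e: (mean_rank[e], e))
```

Working on ranks rather than raw scores avoids having to put the confidence values of different methods on a common scale.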

Table 4.

Influence of the subsampling scheme parameter Z on the E-SVR AUROC score using the DREAM4 dataset.

Table 5.

Influence of the subsampling scheme parameters and on the E-SVR AUROC score using the DREAM4 dataset.

Table 6.

Influence of the subsampling scheme parameters and on the E-SVR AUROC score using the DREAM4 dataset.

Figure 3.

Boxplots of AUROC scores over ten runs with respect to the number of iterations.

The boxplots show the AUROC score over ten runs of the E-SVR algorithm on the first network of the DREAM4 dataset. The variance decreases as the number of subsamples is increased, reaching a stable result at about 1500 iterations.

Figure 4.

Node degree distribution of four network predictions selected across the different datasets.

Network predictions were interpreted in an undirected setting. The networks were created from the rankings by imposing a cut-off value close to the number of true links in the corresponding gold standard network. Although the figure indicates that the node degree distribution can vary between the different algorithm predictions, there is no consistent pattern across the expression sets.
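
As a rough illustration of how such a distribution can be derived from an edge ranking, the sketch below keeps the top `cutoff` edges, collapses them to undirected links and counts node degrees. Applying the cut-off to the directed ranking before collapsing is an assumption of this example, as are the function and parameter names.

```python
from collections import Counter


def degree_distribution(ranked_edges, cutoff):
    """Return a Counter mapping node degree -> number of nodes with that degree."""
    top = ranked_edges[:cutoff]
    # Interpret edges as undirected: (A, B) and (B, A) become one link.
    undirected = {frozenset(edge) for edge in top if edge[0] != edge[1]}
    degree = Counter(node for link in undirected for node in link)
    return Counter(degree.values())
```

For example, `degree_distribution(ranking, cutoff=len(gold_edges))` would use a cut-off equal to the number of true links, where `ranking` and `gold_edges` are placeholders for a prediction and its gold standard.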

Figure 5.

Comparison of the given rank at the edge level of two algorithm predictions.

In the figure on the left we plot the rank of the most confident links of the GENIE3 prediction versus the rank that these edges received in the E-SVR prediction. True positive links are indicated as green squares. Although the AUROC and AUPR scores of both methods are almost identical for this network, several of the top edges predicted by GENIE3, including true positives, appear much further down the ranking of E-SVR and vice versa.

Figure 6.

Boxplots showing the ability to predict the correct directionality of a true positive link.

For all predictions we counted the number of times a gold standard link was ranked before the opposite link, proportional to the total number of links in the gold standard network. We performed this analysis for all networks in the DREAM4, GNW-100, SYNTREN-100 and GNW-200 datasets. The boxplots show the results for all algorithms. GENIE3+E-SVR is significantly better at predicting the correct direction compared to GENIE3 (Wilcoxon rank sum test with continuity correction: GENIE3 and GENIE3+E-SVR, sample size 60, p-value = ).
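
The directionality measure described here can be sketched as a simple count over the gold standard links. The handling of edges missing from the ranking (treated as ranked last) is an assumption of this example, and the function name is illustrative.

```python
def direction_score(ranked_edges, gold_edges):
    """Fraction of gold standard links ranked before their opposite link."""
    position = {edge: i for i, edge in enumerate(ranked_edges)}
    worst = len(ranked_edges)  # assumed position for edges absent from the ranking
    correct = 0
    for regulator, target in gold_edges:
        forward = position.get((regulator, target), worst)
        reverse = position.get((target, regulator), worst)
        if forward < reverse:
            correct += 1
    return correct / len(gold_edges)
```

A value of 0.5 is what one would expect if the ranking carried no information about direction.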

Table 7.

Comparison of indicative running times of E-SVR, E-EL and GENIE3.
