
Fig 1.

Benchmark overview.

This work describes six performance metrics using two input streams (genetic association and drug-based genes) to predict drug target genes for 22 common diseases. 3-fold cross-validation (CV), repeated 25 times, was run under three CV strategies. The gene membership of each fold was determined using only the drugs data, regardless of the input: two of the validation strategies are complex-aware and therefore required these data to define the splits. 15 methods based on network propagation (including 4 baselines) were evaluated, using two networks with different properties, by modelling their performance (averaged over every CV round) with explanatory models. After obtaining the performance metrics, the explanatory models allowed hypothesis testing and a direct performance comparison between diseases, CV strategies, networks and methods, by setting them as the independent variables of the models. The latter is depicted by the pink (independent variables) and yellow (dependent variable) blocks, and should not be confused with the “model fitting” block, which refers to the network propagation prioritisers.

Fig 2.

Additive explanatory models for AUROC and top 20 hits.

Each column corresponds to a different model, whereas each row depicts the 95% confidence interval for each model coefficient. Rows are grouped by the categorical variable they belong to: method, CV scheme, network and disease. Each variable has a reference level, implicit in the intercept and specified in brackets: the pr method, the classic validation scheme, the STRING network and allergy. Positive estimates improve performance over the reference levels, whereas negative ones reduce it. For example, the data suggest that method rf performs better than the baseline on both metrics, and is the preferred method under the top 20 hits metric. Switching from the STRING to the OmniPath network, or from classic to block or representative cross-validation, has a negative effect on both performance metrics. Specific model estimates and confidence intervals can be found in Tables H and I in S1 Appendix.
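The dummy-coding logic behind these coefficients can be illustrated with a minimal sketch. Everything below is hypothetical (toy AUROC values, and a plain least-squares fit rather than the paper's actual model families): each categorical factor contributes one indicator column per non-reference level, so the intercept absorbs the reference levels (pr, STRING) and every coefficient is a shift relative to them.

```python
import numpy as np

def fit_additive_model(records, refs, response):
    """Fit an additive linear model with dummy-coded categorical factors.
    `refs` maps each factor to its reference level, which is absorbed
    into the intercept (as pr, STRING and allergy are in Fig 2)."""
    levels = {f: [ref] + sorted({r[f] for r in records} - {ref})
              for f, ref in refs.items()}
    cols = [(f, lv) for f in refs for lv in levels[f][1:]]  # drop references
    X = np.array([[1.0] + [1.0 if r[f] == lv else 0.0 for f, lv in cols]
                  for r in records])
    y = np.array([r[response] for r in records])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"intercept": beta[0],
            **{f"{f}:{lv}": b for (f, lv), b in zip(cols, beta[1:])}}

# Hypothetical toy data: one AUROC value per method/network combination.
records = [
    {"method": "pr", "network": "STRING",   "auroc": 0.80},
    {"method": "rf", "network": "STRING",   "auroc": 0.85},
    {"method": "pr", "network": "OmniPath", "auroc": 0.75},
    {"method": "rf", "network": "OmniPath", "auroc": 0.80},
]
coefs = fit_additive_model(records, {"method": "pr", "network": "STRING"}, "auroc")
```

On this toy data the fit recovers rf as +0.05 over pr and OmniPath as -0.05 relative to STRING, mirroring the sign pattern described in the caption.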

Fig 3.

Performance predicted for AUROC and top 20 hits through the additive explanatory models.

Each row corresponds to a different model and error bars depict the 95% confidence interval of the additive model prediction, averaging over diseases. In bold, the main network (STRING) and the main metric (top 20 hits). The exact values can be found in Table I in S1 Appendix.

Fig 4.

Pairwise contrasts on top 20 hits predicted by the main quasipoisson explanatory model.

Differences are expressed in the model space. Most of the pairwise differences are significant (Tukey’s test, p < 0.05); non-significant differences have been crossed out.

Fig 5.

Ranking of all the methods.

Ranking according to the predictions of the main explanatory models (left) and the reduced explanatory models within the STRING network and block cross-validation (right), in both cases on the drugs input and averaging over diseases. The main models serve as a global description of the metrics, whereas the reduced models are specific to the scenario of most interest. A column-wise z-score of the predicted means is depicted to illustrate the magnitude of the differences. Note how the top 20 hits and AUPRC metrics lead to similar conclusions, as opposed to AUROC.
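The column-wise z-score simply centres and scales each metric independently, so that methods can be compared on a common scale within each column. A sketch, assuming a hypothetical matrix of predicted means (rows are methods, columns are metrics):

```python
import numpy as np

# Hypothetical predicted means: rows = methods, columns = metrics
# (e.g. AUROC, AUPRC, top 20 hits).
pred = np.array([
    [0.80, 0.40, 12.0],
    [0.85, 0.55, 18.0],
    [0.70, 0.30,  6.0],
])

# Column-wise z-score: subtract each column's mean and divide by its
# standard deviation, so every metric is on a comparable scale.
z = (pred - pred.mean(axis=0)) / pred.std(axis=0)
```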

Fig 6.

Multi-view MDS plot displaying the preserved Spearman’s footrule distances between methods.

The differential rankings of their top 100 novel predictions, using known drug targets as input, are taken into account across all 22 diseases. Results are shown separately for the 2 networks considered in this study. Seed genes are excluded from the distance calculations.
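Spearman's footrule between two method rankings is the sum of absolute rank displacements of the shared items. A minimal sketch with hypothetical gene lists (the paper computes it on the top 100 novel predictions per method):

```python
def footrule_distance(rank_a, rank_b):
    """Spearman's footrule: sum of absolute rank displacements
    over the items shared by two ranked lists."""
    pos_a = {g: i for i, g in enumerate(rank_a)}
    pos_b = {g: i for i, g in enumerate(rank_b)}
    return sum(abs(pos_a[g] - pos_b[g]) for g in pos_a.keys() & pos_b.keys())

# Hypothetical top predictions from two methods (seed genes already excluded).
m1 = ["TP53", "EGFR", "BRCA1", "KRAS"]
m2 = ["EGFR", "TP53", "KRAS", "BRCA1"]
d = footrule_distance(m1, m2)  # every gene is displaced by one rank -> 4
```

The resulting matrix of pairwise distances between methods is what an MDS embedding, as in this figure, projects into two dimensions.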

Fig 7.

Disease performance in terms of input size and modularity.

Disease performance ranked by the number of known target genes and their modularity (obtained using the igraph package, see Figure F in S1 Appendix). Modularity measures the tendency of the known target genes to form modules or clusters in the network. Diseases have been ranked using their explanatory model coefficient from the top 20 hits metric with known drug targets as input (x axis) and their modularity (y axis). As discussed in the text, the best-predicted diseases tend to have longer gene lists and to be highly modular.
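The modularity in this figure is computed with the igraph package; as a dependency-free illustration, Newman's modularity can be written directly from its definition. The toy network below (two triangles joined by a single edge) is hypothetical:

```python
import numpy as np

def modularity(adj, membership):
    """Newman modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * [c_i == c_j]
    for an undirected network given as a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    k = adj.sum(axis=1)            # node degrees
    two_m = adj.sum()              # 2m: twice the number of edges
    same = np.equal.outer(membership, membership)  # same-community mask
    return ((adj - np.outer(k, k) / two_m) * same).sum() / two_m

# Hypothetical toy network: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
Q = modularity(A, [0, 0, 0, 1, 1, 1])  # high Q: the two triangles are modules
```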

Table 1.

List of methods included in this benchmark.

Table 2.

List of diseases included in this study.

Fig 8.

Input gene scores.

Two input types were used to feed the prioritisation algorithms: the binary drug scores in panel (A) and the binary genetic scores in panel (B). In both cases, the validation genes were deemed unlabelled in the input to the prioritisers. Cross-validation folds were always computed from the drugs input and reused on the genetic input.

Fig 9.

Cross-validation schemes.

Three cross-validation schemes were tested. (A): standard k-fold stratified cross-validation, which ignores the complex structure. (B): block k-fold cross-validation. Overlapping complexes were merged and the resulting complexes were shuffled; the folds were then computed as evenly as possible without breaking any complex. (C): representative k-fold cross-validation. Overlapping complexes were merged and, from each resulting complex, a unique representative was chosen uniformly at random. A standard k-fold cross-validation was then run on the representatives, excluding the non-representatives from both training and validation.
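Scheme (B) can be sketched as follows, under assumed data structures (complexes as sets of gene identifiers) and with a simplified deterministic assignment (largest merged complex to the currently smallest fold) standing in for the shuffling described above:

```python
def merge_overlapping(complexes):
    """Merge complexes that share any gene into disjoint groups."""
    merged = []
    for c in map(set, complexes):
        group, keep = set(c), []
        for m in merged:
            if m & group:
                group |= m      # absorb any overlapping group
            else:
                keep.append(m)
        merged = keep + [group]
    return merged

def block_kfold(complexes, k):
    """Assign merged complexes to k folds, balancing fold sizes so that
    no complex is ever split across folds (scheme B in Fig 9)."""
    folds = [set() for _ in range(k)]
    for group in sorted(merge_overlapping(complexes), key=len, reverse=True):
        min(folds, key=len).update(group)
    return folds

# Hypothetical complexes; gene "a" links the first two, so they merge.
complexes = [{"a", "b"}, {"a", "c"}, {"d", "e"}, {"f"}, {"g"}]
folds = block_kfold(complexes, 3)
```

Because folds are built from whole merged complexes, every complex ends up entirely inside a single fold, which is the property the complex-aware schemes are designed to guarantee.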
