Fig 1.
Overview of the network algorithm benchmarking workflow. All algorithms considered in this work required as input a set of genes identified as relevant to a disease, pathway, or treatment (i.e., “start nodes”); some also required fold changes and/or p-values.
The output differed by algorithm class: subnetwork ID algorithms returned highly connected subnetworks, node prioritization algorithms returned ranked lists of genes, and causal regulator algorithms returned ranked lists of hypotheses, each corresponding to a positive or negative effect of a given gene on the observed data. For node prioritization and causal regulator algorithms, we took the “output nodes” to be the top-ranked nodes, using a rank cutoff equal to the number of input start nodes for each data set. We also note that subnetworks could be constructed from the interactions among the most highly ranked genes in the output lists. For illustration in this figure, we used the top 100 hits (based on p-value) from a CRISPR survival screen in the KBM7 cell line [7]. Each output network contains genes that were included in the input start node list (blue) as well as genes that were identified by the algorithms (pink).
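The selection of output nodes described in this caption can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names, the plain-list representation of a ranking, and the edge-list representation of the network are all assumptions made for the example.

```python
# Hypothetical helpers illustrating the caption; names and data layout are assumptions.

def top_ranked_output_nodes(ranked_genes, start_nodes):
    """Keep the top-ranked genes, with the cutoff set to the number of start nodes."""
    cutoff = len(set(start_nodes))
    return ranked_genes[:cutoff]


def induced_subnetwork(edges, nodes):
    """Return the interactions whose two endpoints are both among the given nodes."""
    keep = set(nodes)
    return [(u, v) for u, v in edges if u in keep and v in keep]


def split_by_origin(output_nodes, start_nodes):
    """Separate output nodes into input start nodes (blue) and newly identified nodes (pink)."""
    start = set(start_nodes)
    from_input = [g for g in output_nodes if g in start]
    newly_identified = [g for g in output_nodes if g not in start]
    return from_input, newly_identified


# Toy example with made-up gene labels.
ranked = ["A", "B", "C", "D", "E"]            # best-ranked gene first
start = ["B", "D", "X"]                       # three start nodes, so the cutoff is 3
edges = [("A", "B"), ("B", "D"), ("C", "A")]  # undirected toy interactions

output = top_ranked_output_nodes(ranked, start)
subnetwork = induced_subnetwork(edges, output)
blue, pink = split_by_origin(output, start)
```

The ratio of `len(blue)` to `len(output)` gives the fraction of start nodes in the output for one data set.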
Table 1.
Algorithms evaluated.
Fig 2.
Characterization of algorithms by the average fraction of start nodes in the output, indicating the tendency to return start nodes (A, top left), and by degree, indicating the tendency to return nodes with many edges (B, top right). Cross-validation performance of algorithms, shown as the fraction of datasets for which the algorithm appeared in the top five when ranked by AUROC (C, bottom left) or by fraction recovered (D, bottom right). For the fraction recovered analysis, the top nodes were defined as the 200 top-ranked nodes for node prioritization and causal regulator algorithms, or as any node present in a subnetwork for subnetwork ID algorithms.
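The quantities summarized in panels C and D can be sketched as below. This is an illustration under stated assumptions rather than the evaluation code used in this work: we assume that held-out start nodes serve as the positives in cross-validation, that each algorithm assigns a score to every gene (higher is better), and that ties in the top-five ranking are broken by sort order. All function names are hypothetical.

```python
# Hypothetical metric helpers; the held-out-positives setup is an assumption.

def auroc(scores, positives):
    """Pairwise AUROC: the chance that a positive gene outscores a negative gene.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fraction_recovered(ranked_genes, held_out, k=200):
    """Fraction of held-out genes found among the k top-ranked genes."""
    held_out = set(held_out)
    return len(set(ranked_genes[:k]) & held_out) / len(held_out)


def top_n_frequency(metric_by_dataset, n=5):
    """Fraction of datasets in which each algorithm ranks among the best n."""
    counts = {}
    for per_algorithm in metric_by_dataset.values():
        for name in per_algorithm:
            counts.setdefault(name, 0)
        best = sorted(per_algorithm, key=per_algorithm.get, reverse=True)[:n]
        for name in best:
            counts[name] += 1
    total = len(metric_by_dataset)
    return {name: c / total for name, c in counts.items()}


# Toy example with made-up genes, scores, and algorithm names.
scores = {"A": 0.9, "B": 0.8, "C": 0.3, "D": 0.1}
held_out = {"A", "C"}
toy_auroc = auroc(scores, held_out)
toy_recovered = fraction_recovered(["A", "B", "C", "D"], held_out, k=2)
toy_frequency = top_n_frequency(
    {"d1": {"x": 0.9, "y": 0.5, "z": 0.7}, "d2": {"x": 0.2, "y": 0.8, "z": 0.6}},
    n=1,
)
```

For subnetwork ID algorithms, which return no ranking, the same `fraction_recovered` idea would apply with the full set of subnetwork nodes in place of the top `k`.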
Table 2.
Number of nodes ranked in top 200 when algorithms were run with 200 randomly chosen nodes as input start nodes.
Fig 3.
Connectivity Map target prediction in the composite network or metabase signed+directed.
Performance was characterized by the ability of the algorithms to rank known drug targets highly. (A, top left) Fraction of datasets for which the algorithm appeared in the top five when ranked by fraction of drug targets recovered. (B, top right) Fraction of datasets for which the algorithm appeared in the top five when ranked by AUROC.
Table 3.
Summary of Algorithm Characteristics and Performance.
“Tunable” indicates that the algorithm contains a tunable parameter directly related to the evaluated aspect. Bold italics indicate algorithms that perform well for the indicated metric, with flanking asterisks distinguishing the top performers.