I have read the journal's policy and the authors of this manuscript have the following competing interests: All authors were employed by Novartis when the work was completed, and some have equity interest in Novartis. AH is currently employed by Pfizer.
Current address: Rare Disease Research Unit, Pfizer, Cambridge, Massachusetts, United States of America
Computational approaches have shown promise in contextualizing genes of interest with known molecular interactions. In this work, we evaluate seventeen previously published algorithms based on characteristics of their output and their performance in three tasks: cross validation, prediction of drug targets, and behavior with random input. Our work highlights strengths and weaknesses of each algorithm and results in a recommendation of algorithms best suited for performing different tasks.
In our labs, we aimed to use network algorithms to contextualize hits from functional genomics screens and gene expression studies. In order to understand how to apply these algorithms to our data, we characterized seventeen previously published algorithms based on characteristics of their output and their performance in three tasks: cross validation, prediction of drug targets, and behavior with random input.
In 2000, Schwikowski et al. demonstrated the utility of the guilt-by-association principle to assign function to yeast genes by examining the functions of neighboring genes in a protein–protein interaction network [
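The guilt-by-association principle can be sketched in a few lines: score each candidate gene by the fraction of its network neighbors that appear in the start-node list. The following is an illustrative toy implementation, not the code used in this work; the network and gene names are invented.

```python
# Guilt-by-association sketch (illustrative): score each gene by the
# fraction of its network neighbors that are in the start-node list.

def guilt_by_association(adjacency, start_nodes):
    """adjacency: dict mapping gene -> set of neighboring genes.
    start_nodes: iterable of genes of interest.
    Returns dict mapping gene -> fraction of neighbors in start_nodes."""
    start = set(start_nodes)
    scores = {}
    for gene, neighbors in adjacency.items():
        if not neighbors:
            continue  # isolated nodes get no score
        scores[gene] = len(neighbors & start) / len(neighbors)
    return scores

# Toy network: B neighbors two start nodes, D neighbors none.
net = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}
scores = guilt_by_association(net, {"A", "C"})
```

Here B scores 2/3 (two of its three neighbors are start nodes), illustrating how genes not in the input list are prioritized by their neighborhood.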
In our labs, we aimed to use these algorithms to contextualize hits from functional genomics screens. The hits from a functional genomics screen represent a list of genes that affect a given cellular phenotype (e.g., survival [
While many of these network contextualization algorithms have been developed in academia in the context of specific biological questions [
This work evaluates the ability of seventeen algorithms to use a protein–protein interaction (PPI) network to contextualize and extend a list of genes of interest.
The output of the algorithms differed depending on algorithm class, with subnetwork ID algorithms returning highly connected subnetworks; node prioritization algorithms returning ranked lists of genes; and causal regulator algorithms returning ranked lists of hypotheses corresponding to a positive or negative effect of a given gene on the observed data. In the case of node prioritization and causal regulator algorithms, we considered the “output nodes” to be the top ranked nodes using a rank cutoff equal to the number of input start nodes for each data set. We also note that subnetworks could be constructed from the interactions among the most highly ranked genes in the output lists. For illustration in this figure, we used the list of the top 100 hits (based on p-value) from a CRISPR survival screen in the KBM7 cell line [
In this work, we considered seventeen algorithms (
Algorithm | Category | Network Requirement | Brief Description | Reference
Random Walk | Node Prioritization | — | Models the path of a random walker starting from nodes of interest and walking to other nodes based on edges in the network | [
Network Propagation | Node Prioritization | — | Random walk-based approach controlled for node degree | [
ToppNet KM | Node Prioritization | Directed | Random walk-based method with a limited number of steps | [
ToppNet HITS | Node Prioritization | Directed | Random walk-based method that also takes into account hubness and authority of nodes | [
Overconnectivity | Node Prioritization | — | Enrichment between start nodes and gene sets consisting of each network node's neighbors | N/A
Interconnectivity | Node Prioritization | — | Enrichment-based method that identifies nodes between other nodes | [
Hidden Nodes | Node Prioritization | — | Enrichment-based method that uses shortest paths to identify nodes between other nodes | [
GeneMania | Node Prioritization | — | Ranks nodes by topological closeness to start nodes in an integrated network | [
Guilt By Association | Node Prioritization | — | Fraction of neighbor nodes that appear in the start node list | [
Neighborhood Scoring | Node Prioritization | — | Guilt-by-association-based approach with optional weighting for start nodes | [
Causal Reasoning | Causal Regulator | Signed and Directed | Processes the network and calculates directional consistency and overconnectivity with start nodes | [
SigNet | Causal Regulator | Signed and Directed | Processes the network and calculates several metrics to infer relationships with start nodes | [
DIAMOnD | Subnetwork ID | — | Evaluates overconnectivity enrichment iteratively until it reaches a user-defined number of nodes | [
Pathway Inference | Subnetwork ID | — | Heuristic method that identifies subnetworks enriched in start nodes | [
Active Modules | Subnetwork ID | — | Memetic algorithm with the addition of an encoding/decoding scheme and a local search operator | [
CASNet | Subnetwork ID | Signed | Considers edge sign to determine relevance to provided start nodes | [
HotNet1 | Subnetwork ID | — | Diffusion-based method accounting for FDR | [
HotNet2 | Subnetwork ID | Directed | Extension of the HotNet1 approach that incorporates insulated diffusion and edge direction | [
Start Node Links | Subnetwork ID | — | Directly extracts connections between start nodes | N/A
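Several of the node prioritization methods above (Random Walk, Network Propagation, the ToppNet variants) build on random walk with restart. The sketch below shows that core iteration on a dense matrix; production implementations use sparse matrices and each algorithm's specific normalization, and the restart probability here is an arbitrary illustrative value.

```python
# Random walk with restart (minimal dense sketch, not the CBDD implementation).
import numpy as np

def random_walk_restart(adj, start_idx, restart=0.3, tol=1e-8, max_iter=1000):
    """adj: symmetric adjacency matrix (numpy array).
    start_idx: indices of start nodes.
    Returns steady-state visit probabilities for every node."""
    # Column-normalize so each column is a transition distribution.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = adj / col_sums
    p0 = np.zeros(adj.shape[0])
    p0[list(start_idx)] = 1.0 / len(start_idx)  # restart distribution over start nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:      # converged
            break
        p = p_next
    return p

# Toy path graph 0-1-2-3: walking from node 0 ranks node 1 above node 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
probs = random_walk_restart(A, [0])
```

Nodes topologically closer to the start set accumulate more probability mass, which is the basis for ranking candidate genes.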
When considering these algorithms, we noted they could be divided into three main categories: (1) node prioritization algorithms that prioritize network nodes that are near input nodes, where the definition of "near" varies depending on the specific algorithm; (2) causal regulator algorithms that prioritize network nodes that regulate input start nodes based on their network connectivity; and (3) subnetwork identification (ID) algorithms that identify regions of the network that connect input nodes, including additional nodes to connect them where warranted. In the case of subnetwork identification algorithms, we wanted to be able to compare to the simplest case of network connections between nodes. Thus, we include output from an algorithm called “Start Node Links”, which simply connects input start nodes to each other.
We applied the algorithms to hundreds of data sets from four sources, aiming to test the algorithms on a large selection of data sets of different types and confidences. Initial characterization was performed using three types of data meant to capture phenotype- or disease-relevant pathways: (1) KEGG and REACTOME pathway gene sets provide high-confidence, well-characterized data sets; (2) DisGeNET provides data sets describing curated disease–gene associations [
To determine which algorithms extended the list of interesting genes beyond the provided input list, we first measured the proportion of output nodes that were contained in the input start node list (
We also sought to understand which algorithms had a tendency to include high-degree nodes (i.e., “hub nodes”) in the output. Hub nodes are those with many edges (or connections) to other nodes. Across all algorithms, several returned extremely high-degree outputs: DIAMOnD, Interconnectivity, and Overconnectivity (
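A simple way to quantify this hub bias is to compare the median network degree of an algorithm's output nodes with the network-wide median. The helper below is a hypothetical illustration with made-up degree values, not the paper's analysis code.

```python
# Hub-bias check (sketch): median degree of output nodes vs. the network median.
from statistics import median

def median_output_degree(degrees, output_nodes):
    """degrees: dict gene -> network degree; output_nodes: genes returned
    by an algorithm. Returns the median degree of the output."""
    return median(degrees[g] for g in output_nodes)

# Toy degrees: B and E are hubs.
degrees = {"A": 1, "B": 120, "C": 3, "D": 2, "E": 95}
out_median = median_output_degree(degrees, ["B", "E", "C"])  # hub-heavy output
net_median = median(degrees.values())                        # network baseline
```

An output median far above the network median flags an algorithm that preferentially returns hubs regardless of the input.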
To assess performance, we performed 10 repeats of 10-fold cross-validation to determine how well the algorithms recovered nodes randomly excluded from the input lists. The excluded nodes were true positives in that they were related to the remaining input nodes on the basis of their membership in the original list. Thus, this test determined the ability of the algorithms to identify nodes biologically related to the input list. To summarize the results from cross-validation, the area under the receiver operating characteristic curve (AUROC) is often evaluated. This metric assumes a perfect gold standard and takes into account both true positives, via the sensitivity metric, and false positives, via the specificity metric. However, we noted that our input lists were not perfect gold standards: some nodes returned by the algorithms might appear to be false positives but actually be biologically related to the input list (i.e., nodes designated as false positives by the specificity calculation might actually be false negatives in the original input list). Thus, we also computed the fraction of excluded nodes that were recovered in the top 200 nodes returned by each algorithm (i.e., the fraction recovered). This metric does not take into account false positives and instead asks the question relevant to our intended use of the algorithms: if we were to follow up on the top 200 nodes returned by the algorithms, would nodes known to be biologically relevant to the initial input list be recovered? It is equivalent to the true positive rate (i.e., sensitivity) computed when the top 200 nodes returned by the algorithm are considered the output of the algorithm.
We calculated the AUROC and fraction recovered for each data set tested. To summarize across individual data sets, we noted that variability in the metrics across data sets made it difficult to determine which algorithms were performing better than others (
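The top-five summary used to compare algorithms across data sets can be sketched as follows; `top_five_frequency` is a hypothetical helper for illustration, not the code used in this work.

```python
# Cross-dataset summary (sketch): for each data set, rank the algorithms
# by a metric (e.g. AUROC) and count how often each lands in the top five.
from collections import Counter

def top_five_frequency(metric_by_dataset):
    """metric_by_dataset: dict data set -> dict algorithm -> metric value.
    Returns dict algorithm -> fraction of data sets where it ranked top five."""
    counts = Counter()
    n = len(metric_by_dataset)
    for scores in metric_by_dataset.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        counts.update(ranked[:5])                # tally top-five appearances
    return {alg: counts[alg] / n for alg in counts}

# Toy example: algorithm "b" is in the top five of both data sets.
freq = top_five_frequency({
    "d1": {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5, "f": 0.1},
    "d2": {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.7, "e": 0.6, "f": 0.5},
})
```

This summary is robust to the absolute metric level varying from data set to data set, which is exactly the variability noted above.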
In order to determine whether certain nodes, particularly hub nodes, would be highly ranked by a given algorithm regardless of the input list, we ran the algorithms on 10,000 randomly selected input start node lists. We then compiled the output and calculated the fraction of times that each node appeared among the most highly ranked nodes. For most algorithms, a few hundred nodes were ranked in the top 200 nodes in more than 5% of randomly generated lists (
Algorithm | Number of nodes highly ranked in 50% of random input tests | Number of nodes highly ranked in 5% of random input tests
Causal Reasoning (Pollard Rank) | 64 | 1129
Interconnectivity | 44 | 1042
Hidden Nodes | 0 | 559
SigNet | 200 | 375
Network Propagation | 0 | 309
ToppNet–HITS | 239 | 289
Random Walk | 4 | 200
Guilt by Association | 0 | 119
ToppNet–KM | 0 | 56
Causal Reasoning (Enrichment Rank) | 0 | 0
Overconnectivity | 0 | 0
Neighborhood Scoring | 0 | 0
GeneMania | 0 | 0
Because causal regulator algorithms were developed to identify upstream regulators of differentially expressed genes, we tested their ability to accomplish this goal using the Connectivity Map [
Performance was characterized by the ability of the algorithms to rank known drug targets highly. (A, top left) Fraction of data sets for which the algorithm appeared in the top five when ranked by fraction of drug targets recovered. (B, top right) Fraction of data sets for which the algorithm appeared in the top five when ranked by AUROC.
Taken together, our results clearly demonstrate the strengths and weaknesses of several algorithms (
“Tunable” indicates that the algorithm contains a tunable parameter directly related to the evaluated aspect. Bold italics indicate algorithms that perform well for the indicated metric, with flanking asterisks distinguishing the top performers.
Algorithm | Highly ranks start nodes | Output degree | Highly ranks nodes with random inputs (number of nodes in 50%/5% of test cases) | Number of data types for which algorithm is top for gene list extension (AUROC, FR) | Number of networks for which algorithm is top for target prediction task (AUROC, FR)
Network Propagation | tunable
Random Walk | Y, tunable
GeneMania | Y
Interconnectivity | High | 44, 1042
ToppNet–HITS | Y, tunable | 239, 289
Overconnectivity | High
DIAMOnD | tunable | n/a
ToppNet–KM | tunable | Low | 0, 0
Hidden Nodes
Guilt By Association | Low | 0, 0 | n/a, 0
Neighborhood Scoring | Y, tunable | Low | 0, 0
Pathway Inference | Y, tunable | n/a | n/a, 0 | n/a, 0
Active Modules | Y, tunable | tunable | n/a | n/a, 0 | n/a, 0
CASNet | Y | n/a | n/a, 0 | n/a, 0
HotNet1 | Y, tunable | n/a | n/a, 0 | n/a, 0
HotNet2 | Y, tunable | n/a | n/a, 0 | n/a, 0
Start Node Links | Y | n/a | n/a, 0 | n/a, 0
Causal Reasoning | Low | 64, 1129 (Pollard) | 0, 0 | n/a, 0
SigNet | High | 200, 375 | 0, 0
In this work, we have characterized the algorithms’ performance using a wide range of data sources in order to understand their broad behavior. However, it is possible that a specific dataset of interest will be better served by a different algorithm than the one recommended by these results. For this work, we limited ourselves to algorithms implemented as part of the CBDD collaboration, since the consistent interface resulting from this effort greatly facilitated our benchmarking study. However, we note that many additional network algorithms have been developed in the literature (e.g., [
The majority of these results were obtained using a large network containing PPIs from multiple sources. However, we note that we have run these same characterizations with multiple networks [
Finally, we did not explore individual algorithm parameters, instead relying on author recommendations. However, we note in
For each algorithm, parameters were chosen to moderate its behavior (
In the KEGG and Reactome data sets, all sets with 20 or more nodes were included, yielding 165 sets from KEGG and 307 from Reactome. We also used curated gene–disease associations from DisGeNET [
To test the algorithms using real experimental data, 43 pooled CRISPR screens from Novartis were used as an example set of experimental data with relatively low noise. For the CRISPR experiments, cells were transfected with a GFP-tagged target protein of interest and Cas9, then exposed to a pooled sgRNA library. Cells were FACS-sorted into high- and low-GFP populations, and sgRNA counts were used to calculate fold changes and RSA p-values for each targeted gene [
The causal regulator algorithms were originally developed to identify proteins upstream of observed gene expression changes. Since this approach was not specifically relevant to the pathway and screening data described above, we also used data from the Connectivity Map [
Three different network sources were used for this work: (1) the “Composite network”, consisting of high-confidence PPI or transcription factor–gene interactions from the Metabase manually curated network, STRING [
For the purposes of these calculations, “output nodes” were considered to be the top
Ten repeats of 10-fold cross-validation were performed for each data set to calculate the area under the ROC curve (AUROC). Each data set was divided into tenths, with one tenth left out each time; this process was repeated ten times, for a total of 100 lists each containing 90% of the original input list. Sensitivity and specificity were computed using the omitted 10% of nodes as the "true" nodes to be found by the algorithms. We also examined the fraction recovered, defined as the fraction of left-out nodes recovered among the top nodes (the top 200 nodes for node prioritization algorithms, or any node present in a subnetwork for subnetwork ID algorithms). When omitted input nodes were not present in the network, they were excluded from the list of "true" nodes, as the use of that network prevented them from appearing in the output regardless of the algorithm used.
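The fold construction and the fraction-recovered metric can be sketched as below. These are illustrative helpers, not the pipeline used in this work; `k=200` matches the node prioritization cutoff described above, and a full run would repeat the fold construction ten times with different seeds.

```python
# Cross-validation helpers (sketch).
import random

def fraction_recovered(held_out, ranked_output, k=200):
    """Fraction of held-out nodes found in the top k of a ranked output list."""
    top_k = set(ranked_output[:k])
    held = set(held_out)
    return len(held & top_k) / len(held) if held else float("nan")

def ten_fold_splits(nodes, seed=0):
    """Yield (train, held_out) pairs for one round of 10-fold CV;
    the paper performs 10 such rounds with different shuffles."""
    nodes = list(nodes)
    random.Random(seed).shuffle(nodes)
    for i in range(10):
        held = nodes[i::10]                       # every 10th node forms one fold
        held_set = set(held)
        train = [n for n in nodes if n not in held_set]
        yield train, held
```

Each `train` list (90% of the input) is fed to an algorithm, and `fraction_recovered` is evaluated on the corresponding `held` list.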
For connectivity map data, sensitivity, specificity, and fraction recovered were calculated based on ranking of known drug targets in algorithm outputs where known drug targets were determined as described previously [
To determine whether nodes were highly ranked based on network properties alone (irrespective of the input list), we generated lists of randomly selected input nodes. Fold changes were drawn from a random distribution with mean 0 and standard deviation 1, with corresponding p-values. Fold change and p-value pairs were randomly assigned to all possible nodes, and the nodes with the highest fold changes were used as the input list. We generated 10,000 random gene lists, each of length 200, and ran the algorithms on these input lists. We were thus able to determine, for each node and algorithm, the frequency with which each node was ranked higher than a chosen output rank.
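This random-input test can be sketched as follows. The helpers are illustrative: `algorithm` stands in for any of the ranking methods, and the Gaussian draw reflects the mean-0, standard-deviation-1 distribution described above (the exact distribution and p-value assignment in the paper may differ).

```python
# Random-input test (sketch): build random start lists from simulated
# fold changes and tally how often each gene lands in the top of the output.
import random
from collections import Counter

def random_start_list(genes, list_len=200, rng=random):
    """Assign a simulated fold change (mean 0, sd 1) to every gene and
    return the genes with the highest fold changes."""
    fc = {g: rng.gauss(0.0, 1.0) for g in genes}
    return sorted(genes, key=lambda g: fc[g], reverse=True)[:list_len]

def top_rank_frequency(genes, algorithm, n_trials=100, top_k=200, seed=0):
    """algorithm: callable mapping a start list to a ranked gene list.
    Returns, per gene, the fraction of trials it appeared in the top top_k."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        output = algorithm(random_start_list(genes, rng=rng))
        counts.update(output[:top_k])
    return {g: counts[g] / n_trials for g in genes}
```

Genes with high frequencies under random inputs are ranked for network-topological reasons rather than biological relevance to the input list.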
Causal regulator algorithms consider each node in two directions: positive (black points) and negative (red points).
(EPS)
Average fraction of start nodes in the output (A) and median degree (B) characterization of each algorithm. Cross-validation performance of the algorithms as indicated by the fraction of data sets for which the algorithm appeared in the top five when ranked by AUROC (C) or fraction recovered (D) from the CRISPR screen hits, Genetic Association, and KEGG/REACTOME data sets using HumanNet as the network. Note: because HumanNet contains no signed or directed edges, the causal regulator algorithms were not examined in this analysis.
(EPS)
(DOCX)
(CSV)
(CSV)
We wish to thank Alexander Ishkin and the team at Clarivate Analytics for their excellent implementation of the CBDD software. We also thank Douglas Lauffenburger for his guidance and support.