Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction
The (A) co-expression network and the (B) extended protein-protein interaction (PPI) network are compared with the (C) ligand-derived network for their ability to characterize gene function (as defined in the Gene Ontology, GO). We assessed performance as the cross-validated area under the ROC curve (AUROC) of a neighbor-voting algorithm. Each curve represents the distribution of AUROCs across 790 GO terms. The dark grey curves show the cross-validation scores in each network, the black curves show the AUROCs after permuting the network nodes, and the light grey curves show the scores obtained using node degree as a generic predictor across all functional categories. The PPI network has the highest performance (B, dark grey line, AUROC = 0.68), but this partly reflects node degree bias (light grey line, AUROC = 0.6). Co-expression has less bias (A, light grey line, AUROC = 0.52) but performs less well (dark grey line, AUROC = 0.62). The ligand network performs almost as well as the extended PPI network (C, dark grey line, AUROC = 0.67), with little node degree bias (light grey line, AUROC = 0.52). The randomly permuted networks (black) have AUROCs between 0.48 and 0.5.
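The evaluation described in the caption can be illustrated with a minimal sketch for a single GO term. It is not the authors' implementation: the toy 8-gene network, the 3-fold split over annotated genes, and the function names `auroc`, `node_degree`, and `neighbor_voting_cv` are all assumptions made for illustration. Each gene is scored by the fraction of its total connection weight that goes to genes known to carry the annotation, and node degree alone serves as the bias baseline.

```python
import random

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def node_degree(adj):
    """Total connection weight of each gene; used as a function-agnostic predictor."""
    return [sum(row) for row in adj]

def neighbor_voting_cv(adj, labels, n_folds=3, seed=0):
    """Mean AUROC over folds, hiding a subset of annotated genes in each fold.

    Assumption: folds are formed over the annotated (positive) genes only.
    """
    n = len(adj)
    degree = node_degree(adj)
    positives = [i for i in range(n) if labels[i]]
    random.Random(seed).shuffle(positives)
    folds = [positives[k::n_folds] for k in range(n_folds)]
    results = []
    for held_out in folds:
        if not held_out:
            continue
        hidden = set(held_out)
        # Annotations still visible to the predictor in this fold.
        train = [bool(labels[i]) and i not in hidden for i in range(n)]
        # Vote: weight to visible annotated neighbors, normalized by node degree.
        scores = [
            sum(adj[i][j] for j in range(n) if train[j]) / degree[i] if degree[i] else 0.0
            for i in range(n)
        ]
        # Evaluate hidden positives against unannotated genes only.
        eval_idx = [i for i in range(n) if not train[i]]
        results.append(auroc([scores[i] for i in eval_idx],
                             [i in hidden for i in eval_idx]))
    return sum(results) / len(results)

# Hypothetical toy network: two 4-gene cliques joined by one edge (3-4).
n = 8
adj = [[0.0] * n for _ in range(n)]
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i < j:
                adj[i][j] = adj[j][i] = 1.0
adj[3][4] = adj[4][3] = 1.0
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # the first clique carries the GO term

cv_auroc = neighbor_voting_cv(adj, labels)
degree_auroc = auroc(node_degree(adj), labels)
print(cv_auroc, degree_auroc)
```

On this toy graph the annotated genes form one clique, so neighbor voting separates them perfectly (AUROC 1.0), while node degree carries no information about the annotation (AUROC 0.5). Repeating the calculation for every GO term would give the kind of AUROC distributions the caption describes.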