An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae
Performance of the hypergeometric probabilistic score is shown for gene functional associations inferred from (A) protein-protein physical interactions measured by the high-throughput yeast two-hybrid (Y2H) screen of Ito et al., (B) affinity-purified complexes identified by mass spectrometry by Gavin et al., and (C) genetic interactions. Interactions are ranked by probability score, and performance is measured cumulatively for each successive bin of 200 interactions (A–C, red filled triangles). Recall and precision are calculated using the reference linkages derived from Gene Ontology “biological process” annotation, masking the term “protein biosynthesis”. The previously described Y2H core model (A, filled circle) is more precise than the complete data set (A, open circle), but has lower recall. Similarly, two different ways of inferring binary linkages from mass spectrometry-derived protein complexes, the spoke model (B, filled circle) and the matrix model (B, open circle), show differing trade-offs between precision and recall. The full set of binary genetic interactions (C, open circle) shows very low precision for functional inference, even though the false positive rate of genetic interactions is generally perceived to be low; in contrast, the hypergeometric probability identifies a functionally informative subset of linkages. In each of the three data sets, the hypergeometric probability score ranks interactions consistently with the linkages' functional informativeness.
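The quantities named in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The caption does not give the parameterization of the hypergeometric score, so the sketch assumes a common one: the negative log of the upper-tail probability of seeing at least k joint observations of two genes, given their individual observation counts. The scoring of unannotated pairs is also an assumption: pairs in the reference set count as true positives, other pairs whose genes are both annotated count as false positives, and all remaining pairs are ignored. All function and parameter names are hypothetical.

```python
from itertools import combinations
from math import comb, log


def spoke_pairs(bait, preys):
    """Spoke model: link the bait to each co-purified prey."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_pairs(bait, preys):
    """Matrix model: link every pair of proteins in one purification."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


def hypergeom_score(k, n, m, total):
    """Return -ln P(X >= k) for a hypergeometric X (assumed parameterization).

    k: observations linking genes a and b
    n: observations involving gene a
    m: observations involving gene b
    total: all observations in the data set
    """
    upper = min(n, m)
    if not 0 <= k <= upper:
        raise ValueError("k must lie between 0 and min(n, m)")
    tail = sum(comb(m, i) * comb(total - m, n - i) for i in range(k, upper + 1))
    return -log(tail / comb(total, n))


def cumulative_precision_recall(scored_pairs, positives, annotated, bin_size=200):
    """Cumulative precision and recall after each bin of ranked pairs.

    scored_pairs: iterable of (frozenset({gene_a, gene_b}), score)
    positives: set of reference linkages (frozensets of two genes)
    annotated: set of genes carrying a reference annotation
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    tp = fp = 0
    curve = []
    for rank, (pair, _score) in enumerate(ranked, start=1):
        if pair in positives:
            tp += 1
        elif pair <= annotated:  # both genes annotated, but not a reference linkage
            fp += 1
        if rank % bin_size == 0 or rank == len(ranked):
            precision = tp / (tp + fp) if tp + fp else 0.0
            curve.append((rank, precision, tp / len(positives)))
    return curve
```

For one purification with bait A and preys B and C, `spoke_pairs` yields the two bait-prey links while `matrix_pairs` also adds the B-C link. This is why the two models trade precision against recall differently.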