An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae
(A) Frequency histograms of the usage of 1,067 Gene Ontology “biological process” annotations, ranked by the number of genes annotated with each term (black bars) and by the number of reference linkages derived using that term (white bars). Functional annotation is highly biased towards genes with the term “protein biosynthesis”. This bias becomes more severe in the reference linkages, because linking all genes that share a given term increases the number of linkages combinatorially. As a result, linkages among protein biosynthesis genes compose >27% of total reference linkages. By contrast, the second most frequent term accounts for <5% of total reference linkages. (B) The likelihood of functional association between genes on the basis of the co-expression of their mRNAs across DNA microarray experiments (here, following heat shock) is significantly affected by the dominant reference term “protein biosynthesis”. For example, for the 1,000 most strongly co-expressed gene pairs, the likelihood of functional association between co-expressed genes is ∼30-fold higher than random chance (LLS∼3.4) (empty circles), but drops to ∼6-fold (LLS∼1.8) after masking the term “protein biosynthesis” in the reference set (filled circles). Thus, the high likelihood score from the biased reference set cannot be generalized to other functions. The black and red lines indicate sigmoid curve fits to the unbiased and biased reference analyses, respectively.
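The two quantitative points in this caption can be illustrated with a minimal sketch. It assumes that every pair of genes sharing a term counts as one reference linkage (so a term with n genes yields n(n-1)/2 linkages, ignoring pairs that share several terms) and that the LLS is the natural logarithm of the fold enrichment over random chance, which is consistent with the quoted values (∼30-fold and LLS∼3.4; ∼6-fold and LLS∼1.8). The term sizes and the function names `linkage_share` and `log_likelihood_score` are hypothetical and are not taken from the study.

```python
import math


def linkage_share(term_sizes):
    """Fraction of reference linkages contributed by each term.

    Assumption: all gene pairs sharing a term are linked, so a term
    annotating n genes contributes C(n, 2) linkages. Pairs that share
    more than one term are not deduplicated in this sketch.
    """
    pairs = {term: math.comb(n, 2) for term, n in term_sizes.items()}
    total = sum(pairs.values())
    return {term: count / total for term, count in pairs.items()}


def log_likelihood_score(fold_over_random):
    """LLS as the natural log of the fold enrichment (assumed definition)."""
    return math.log(fold_over_random)


# Hypothetical term sizes, chosen only to illustrate the combinatorial effect.
toy_terms = {"protein biosynthesis": 300, "term B": 150, "term C": 100}
gene_share = 300 / sum(toy_terms.values())
pair_share = linkage_share(toy_terms)["protein biosynthesis"]

print(f"share of annotated genes: {gene_share:.2f}")
print(f"share of reference linkages: {pair_share:.2f}")
print(f"LLS at 30-fold: {log_likelihood_score(30):.1f}")
print(f"LLS at 6-fold: {log_likelihood_score(6):.1f}")
```

In this toy setting the dominant term's share of linkages exceeds its share of annotated genes, which is the amplification described in panel A; the last two lines reproduce the conversion between fold enrichment and LLS used in panel B.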