Functional Genome Annotation by Combined Analysis across Microarray Studies of Trypanosoma brucei

Figure 1

Integration of microarray data for identification of functional linkages among genes.

(A) Correlation coefficients between genes were calculated for each T. brucei dataset separately, for the combination of the three datasets, and for a selected subset of the experiments. The probability density functions (PDFs) of correlation coefficients among functionally associated and non-associated gene pairs are shown in blue and red, respectively. The data from Kabani et al. [9] correlate poorly with functional linkages, whereas the other two datasets, from Queiroz et al. and Jensen et al. [10], [11], discriminate functionally linked gene pairs through the higher correlations of their expression profiles. Consequently, the procedure used to select the best subset of experiments automatically excluded the data from Kabani et al. [9] while retaining most of the experiments from the other two datasets (right panel). The enrichment of functional linkages at a given correlation coefficient, shown by the thick black line, was calculated as the ratio of the two PDFs. (B) Precision (positive predictive value, PPV) vs. ORFeome coverage for the prediction of functional linkages based on coexpression. ORFeome coverage is defined as the fraction of ORFs (open reading frames) with associated expression profiles that are coexpressed with at least one other ORF. Lowering the threshold for identifying coexpressed pairs includes more ORFs in the network but decreases the fraction of coexpression relationships that reflect functional linkages (i.e., precision). At a precision of 0.75, CoExp1Tbr and CoExp2Tbr include 10.7% and 55.4% of the T. brucei ORFeome, respectively. The correlation coefficient cutoff is 0.94 for CoExp1Tbr and 0.957 for CoExp2Tbr. (C) In CoExp1Tbr, functionally related genes cluster together. (D) A global view of CoExp2Tbr.
Stage-specific expression is indicated by node color: yellow for proteins specific to the procyclic form (PF) and blue for proteins specific to the bloodstream form (BF). These two networks are provided in Supplementary Dataset S1 and can also be downloaded at
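
The quantities plotted in panels (A) and (B) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that enrichment is the binned PDF of functionally associated pairs divided by that of non-associated pairs, that precision is the fraction of pairs at or above a correlation cutoff that are functionally linked, and that every pair carries a known linked or not-linked label. The function names, the histogram binning, and the toy inputs are hypothetical.

```python
import numpy as np


def enrichment_curve(r_linked, r_unlinked, n_bins=20):
    """Ratio of two binned PDFs of correlation coefficients.

    Assumption: enrichment = PDF(linked pairs) / PDF(non-linked pairs),
    estimated with equal-width histogram bins over [-1, 1].
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    pdf_linked, _ = np.histogram(r_linked, bins=edges, density=True)
    pdf_unlinked, _ = np.histogram(r_unlinked, bins=edges, density=True)

    # Leave the ratio undefined (NaN) where the denominator PDF is zero.
    ratio = np.full(n_bins, np.nan)
    nonzero = pdf_unlinked > 0
    ratio[nonzero] = pdf_linked[nonzero] / pdf_unlinked[nonzero]

    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, ratio


def precision_and_coverage(pairs, r, linked, n_orfs, cutoff):
    """Precision and ORFeome coverage of a coexpression network at one cutoff.

    pairs  : (n_pairs, 2) integer array of ORF indices
    r      : correlation coefficient of each pair
    linked : boolean label per pair (hypothetical gold standard)
    n_orfs : number of ORFs with associated expression profiles
    """
    pairs = np.asarray(pairs)
    r = np.asarray(r, dtype=float)
    linked = np.asarray(linked, dtype=bool)

    keep = r >= cutoff
    if not keep.any():
        return float("nan"), 0.0

    # Precision: fraction of retained pairs that are functionally linked.
    precision = float(linked[keep].mean())
    # Coverage: fraction of ORFs that appear in at least one retained pair.
    coverage = np.unique(pairs[keep]).size / n_orfs
    return precision, coverage


if __name__ == "__main__":
    # Toy data, purely illustrative.
    toy_pairs = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
    toy_r = [0.95, 0.90, 0.50, 0.85, 0.20]
    toy_linked = [True, True, False, False, False]
    for cutoff in (0.9, 0.8):
        print(cutoff, precision_and_coverage(toy_pairs, toy_r, toy_linked, 5, cutoff))
```

Sweeping the cutoff from high to low and recording each (coverage, precision) pair traces a curve of the kind shown in panel (B). In practice, precision can only be estimated on pairs for which a functional annotation is available.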

