Table 1.
Number of KEGG orthologs (KO) predicted for P. carinii and S. pombe in 56 pathways that correspond to basal metabolism and cellular processesᵃ.
Table 2.
Number of enzymes dedicated to the biosynthesis of amino acids identified in P. carinii and S. pombeᵃ.
Table 3.
Some features of free-living microorganisms and obligate parasites.
Table 4.
Proteomes investigated for the transfer of KEGG annotations to the predicted P. carinii proteome.
Figure 1.
Principle of the numerical experiment used to optimize the precision and recall of the annotation predictions.
The S. pombe proteome (right box) was searched with BLAST against an intermediary set of fungal proteins (the S. cerevisiae proteome in this example, middle box), and only the highest-scoring BLAST matches were retained. Using the S. cerevisiae mapping to the KEGG Orthologs (between the middle and left boxes), one can map the S. pombe proteins to the KEGG Orthologs through S. cerevisiae. This mapping can then be compared with the one actually provided by KEGG to compute precision and recall values. The experiment was repeated systematically with different proteomes as the intermediary data set (or several proteomes at once) to determine the optimal one.
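The transfer step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `transfer_ko`, the dictionary layout, and all protein and KO identifiers are hypothetical, and the best BLAST hit per query is assumed to have been selected beforehand.

```python
def transfer_ko(best_hits, intermediary_ko):
    """Map query proteins to KOs through their best hit in an intermediary proteome.

    best_hits: dict, query protein -> best-scoring intermediary protein (assumed precomputed)
    intermediary_ko: dict, intermediary protein -> set of KO identifiers
    Returns a set of (KO, query protein) association pairs.
    """
    pairs = set()
    for query, subject in best_hits.items():
        # A query whose best hit has no KO annotation receives no KO.
        for ko in intermediary_ko.get(subject, ()):
            pairs.add((ko, query))
    return pairs


# Hypothetical toy data: three query proteins, two annotated intermediary proteins.
best_hits = {"sp1": "sc1", "sp2": "sc2", "sp3": "sc9"}
intermediary_ko = {"sc1": {"K00001"}, "sc2": {"K00002", "K00003"}}
predicted = transfer_ko(best_hits, intermediary_ko)
```

Using several proteomes at once as the intermediary set would, under the same assumptions, amount to merging their proteins into one search set and their annotations into one `intermediary_ko` dictionary before the call.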
Figure 2.
Estimation of the quality of the mapping onto KEGG maps by re-predicting the annotation of the S. pombe proteome through an intermediary data set consisting of one, two, three, or 18 fungal proteomes.
The KO - S. pombe association pairs obtained by “blasting” an intermediary data set were evaluated a posteriori as true positives (TP) or false positives (FP) against the KO - S. pombe mapping provided by KEGG. KO - S. pombe pairs present in KEGG but missed by the prediction were counted as false negatives (FN). The overall quality of the resulting mapping is expressed as precision, TP/(TP+FP), and recall, TP/(TP+FN).
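The evaluation in this caption can be illustrated with a short sketch that treats both mappings as sets of (KO, protein) pairs. The function name `precision_recall` and the toy identifiers are hypothetical. Returning 0.0 when a denominator is zero is an assumption made here for convenience, not something stated in the source.

```python
def precision_recall(predicted, reference):
    """Compute precision TP/(TP+FP) and recall TP/(TP+FN) for two sets of pairs."""
    tp = len(predicted & reference)   # pairs predicted and present in the reference
    fp = len(predicted - reference)   # pairs predicted but absent from the reference
    fn = len(reference - predicted)   # reference pairs missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention for empty input
    recall = tp / (tp + fn) if tp + fn else 0.0     # assumed convention for empty input
    return precision, recall


# Hypothetical toy data: 2 true positives, 1 false positive, 2 false negatives.
predicted = {("K00001", "sp1"), ("K00002", "sp2"), ("K00003", "sp2")}
reference = {("K00001", "sp1"), ("K00002", "sp2"), ("K00004", "sp3"), ("K00005", "sp4")}
precision, recall = precision_recall(predicted, reference)
```

With this toy input, precision is 2/3 and recall is 1/2.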