Reconstruction and Validation of RefRec: A Global Model for the Yeast Molecular Interaction Network

doi:10.1371/journal.pone.0010662

Figure 1.

Overview of the presented analysis.

The RefRec reconstruction was integrated from selected databases. The reconstruction was converted into a set of model alternatives, which were used to assess the importance of different molecular -omes for accurate phenotype prediction (from top to bottom: RefRec; RefRec with KEGG replaced by iND750; RefRec without a metabolic network; RefRec without a protein-protein interaction network; RefRec without either a metabolic network or a protein-protein interaction network). All the model alternatives were analyzed using a single workflow that first estimates the gene knockout damage in each mutant strain and then trains a computational classifier to predict mutant viability.

Table 1.

Origin and number of the molecular species and interactions in the reconstruction.

Figure 2.

Structural details of RefRec.

The entire reconstruction is visualized at the top right, where the different types of molecular species are grouped into layers (from top to bottom: genome, transcriptome, proteome, protein complexes, and metabolome) and interactions are depicted by the connecting edges. Structural details from each layer are shown as follows: ovals represent interactions, and the other nodes represent molecular species of different types. The Ensembl database provides the source information for the protein synthesis pathways. Two genes (labeled G[•]) are transcribed to transcripts (Tr[•]) and further translated to proteins (P[•]). The KEGG database provides the knowledge for the metabolic network, including metabolites / compounds (C[•]) and metabolic reactions. The enzymatic activity of proteins is described by the Swiss-Prot ENZYME database, which is used to connect the proteins to the metabolic reactions. The protein-protein interaction network, including the protein complex assembly interactions and the protein complexes / macromolecular complexes (M[•]), is based on the IntAct database. The IntAct database does not provide information about the enzymatic activity of protein complexes, and therefore protein complexes do not catalyze any interaction in the reconstruction. A dashed arrow represents the control of an interaction by a molecular species. A solid arrow represents material flow.

Figure 3.

Size distribution of the damage estimates for single gene knockouts.

The damages are estimated by Boolean analysis of the RefRec reconstruction.
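
The caption does not spell out the Boolean rules, so the following is only a minimal sketch of how such a damage estimate could be computed on a toy network. The function name `knockout_damage`, the toy network, and both propagation rules (an interaction is blocked when any of its inputs or controllers is lost; a species is lost when every interaction producing it is blocked) are assumptions made for illustration, not the procedure used in the study.

```python
from collections import defaultdict

def knockout_damage(interactions, knocked_out):
    """Return (lost species, blocked interactions) after a knockout.

    interactions: dict mapping an interaction name to a tuple
    (inputs, controllers, outputs), each a set of species names.

    Assumed rules (illustrative, not taken from the paper):
      * an interaction is blocked if any input or controller is lost;
      * a species is lost if it has producers and all of them are blocked.
    """
    # Index which interactions produce each species.
    producers = defaultdict(set)
    for name, (_, _, outputs) in interactions.items():
        for species in outputs:
            producers[species].add(name)

    lost = set(knocked_out)
    blocked = set()
    changed = True
    while changed:  # propagate until nothing changes (fixed point)
        changed = False
        for name, (inputs, controllers, _) in interactions.items():
            if name not in blocked and (inputs | controllers) & lost:
                blocked.add(name)
                changed = True
        for species, made_by in producers.items():
            if species not in lost and made_by <= blocked:
                lost.add(species)
                changed = True
    return lost, blocked

# Hypothetical toy network: gene G1 -> transcript Tr1 -> protein P1,
# where P1 controls reaction_1 and C2 has an alternative producer.
toy = {
    "transcription_1": ({"G1"}, set(), {"Tr1"}),
    "translation_1": ({"Tr1"}, set(), {"P1"}),
    "reaction_1": ({"C1"}, {"P1"}, {"C2"}),
    "reaction_2": ({"C3"}, set(), {"C2"}),
}
lost, blocked = knockout_damage(toy, {"G1"})
damage_size = len(lost) + len(blocked)
```

In this toy case the knockout of G1 removes Tr1 and P1 and blocks reaction_1, but C2 survives because reaction_2 still produces it. The damage size counted here (lost species plus blocked interactions) is one possible definition among several.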

Table 2.

Statistics about damage estimates for single gene knockouts.

Figure 4.

Essentiality scores for molecular species and interactions.

The essentiality score is the relative frequency of the experimentally observed inviable phenotype among the cases in which the examined molecular species or interaction was blocked. The objects are sorted in descending order of essentiality score. In panel A, the graphs indicated in the legend are merged because the differences between them are negligible. In panel B, the 1,005 metabolic KEGG reactions with a score of zero are not shown.
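
As a hedged illustration of the score defined above, the sketch below computes, for each object, the fraction of mutants observed to be inviable among the mutants in which that object was blocked. The function name `essentiality_scores`, the input layout, and the toy data are hypothetical and serve only to make the definition concrete.

```python
from collections import Counter

def essentiality_scores(damages, inviable):
    """Score each object by the share of inviable mutants that block it.

    damages:  dict mapping a mutant id to the set of blocked objects
              (molecular species or interactions) in that mutant.
    inviable: dict mapping a mutant id to True if the mutant was
              experimentally observed to be inviable.
    Returns (object, score) pairs sorted by descending score.
    """
    blocked_count = Counter()
    inviable_count = Counter()
    for mutant, objects in damages.items():
        for obj in objects:
            blocked_count[obj] += 1
            if inviable[mutant]:
                inviable_count[obj] += 1
    scores = {obj: inviable_count[obj] / n for obj, n in blocked_count.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: three mutants, one of them inviable.
damages = {"m1": {"P1", "R1"}, "m2": {"P1"}, "m3": {"R2"}}
inviable = {"m1": True, "m2": False, "m3": False}
ranking = essentiality_scores(damages, inviable)
```

Here R1 is blocked only in the inviable mutant and scores 1.0, P1 is blocked in one inviable and one viable mutant and scores 0.5, and R2 scores 0.0.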

Table 3.

Experimental yeast mutant data used in the analysis.

Figure 5.

Gene Ontology enrichment analysis for the unconditionally essential genes.

The figure presents a representative partial list of the significantly enriched GO Biological Process categories associated with the unconditionally essential genes. Note the wide scope of the term metabolism in Gene Ontology, where metabolic processes are associated with all types of molecular species.

Figure 6.

Prediction performance for the inviable phenotype as a function of the number of training samples.

In each case, 1/6 of the training samples are inviable. The central mark represents the average over the repeated analyses, and the whiskers represent the standard deviation excluding outliers.

Figure 7.

Prediction performance for the inviable phenotype as a function of the fraction of inviable training samples.

The total number of training samples is fixed at 7,000, and the number of inviable samples among them is varied. The central mark represents the average over the repeated analyses, and the whiskers represent the standard deviation excluding outliers.
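
A training set with a fixed size and a chosen share of inviable samples, as described above, could be drawn along the following lines. This is a sketch under stated assumptions: the function name `sample_training_set`, the use of simple random sampling without replacement, and the toy sample pools are all illustrative, since the caption does not say how the samples were selected.

```python
import random

def sample_training_set(viable, inviable, total, inviable_fraction, seed=0):
    """Draw `total` samples, a given fraction of which are inviable.

    viable, inviable: lists of samples of each phenotype.
    Sampling is without replacement; the seed makes repeats reproducible.
    """
    rng = random.Random(seed)
    n_inviable = round(total * inviable_fraction)
    n_viable = total - n_inviable
    if n_inviable > len(inviable) or n_viable > len(viable):
        raise ValueError("not enough samples for the requested composition")
    chosen = rng.sample(inviable, n_inviable) + rng.sample(viable, n_viable)
    rng.shuffle(chosen)  # avoid ordering the set by phenotype
    return chosen

# Hypothetical pools of labeled samples: (sample id, is_inviable).
viable_pool = [(f"v{i}", False) for i in range(100)]
inviable_pool = [(f"i{i}", True) for i in range(30)]
training = sample_training_set(viable_pool, inviable_pool, 60, 1 / 6)
n_inviable_drawn = sum(1 for _, is_inviable in training if is_inviable)
```

Repeating the draw with different seeds gives the repeated analyses over which an average and a standard deviation can be reported.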

Figure 8.

Prediction performance for the inviable phenotype as a function of the available molecular -omes.

RefRec with iND750 refers to the modified RefRec in which the metabolic damage is estimated using the iND750 reconstruction instead of the KEGG reconstruction. The central mark represents the average over the repeated analyses, and the whiskers represent the standard deviation excluding outliers.

Figure 9.

Growth phenotype prediction performance in relation to damage involvement in various biological processes.

The selected Gene Ontology categories describe biological processes that involve multiple molecular -omes and cover a large number of genes (given in parentheses). For both reconstructions, bar lengths represent the number of single gene knockout predictions in proportion to the total number of genes in a category. The predictions with RefRec are produced using the unconditional damage estimates (this study), while the predictions with iND750 are produced using cellular growth estimates under seven cultivation conditions [16]. Note the more restricted scope of the term metabolism in RefRec and iND750 compared with Gene Ontology, which associates metabolic processes with all types of molecular species.

Table 4.

Number of unique molecular species and interactions in the iND750 metabolic reconstruction.
