Figure 1.
We have developed four gap filling workflows and used them to generate the results in this paper: targeted parsimony-based gap filling, targeted likelihood-based gap filling, iterative parsimony-based gap filling, and iterative likelihood-based gap filling. The individual steps are described in detail in the methods, and the technical details of running them using the web interface are described in the supplementary material. Green boxes represent inputs to the workflows. “Limit” is the user-defined time limit and tmax is a system-defined maximum possible time limit for gap filling (currently one day) to prevent overloading the compute servers.
Figure 2.
We computed the likelihood of all possible gene-reaction pairings from the ModelSEED database and compared the likelihoods of those pairings present in the iJR904 E. coli and iBSU1103 B. subtilis models (‘true positives’) to those that were not (‘false positives’). Each point on the curve represents the percentage of true and false positive linkages remaining at a given likelihood cutoff (labeled on each point). True positives were significantly enriched at high likelihood levels and false positives at low likelihood levels.
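The points on such a curve can be computed as in the following minimal sketch. It is illustrative only and is not the authors' code: the function names, the example likelihood values, and the choice to keep a pairing when its likelihood is greater than or equal to the cutoff are all assumptions.

```python
def percent_remaining(likelihoods, cutoff):
    """Percentage of pairings whose likelihood is at or above the cutoff.

    Using >= (rather than >) is an assumption made for this sketch.
    """
    kept = sum(1 for value in likelihoods if value >= cutoff)
    return 100.0 * kept / len(likelihoods)


def cutoff_curve(in_model, not_in_model, cutoffs):
    """Return one (cutoff, % false positives, % true positives) tuple per cutoff.

    in_model:     likelihoods of pairings present in the curated models
    not_in_model: likelihoods of pairings absent from the curated models
    """
    return [
        (c, percent_remaining(not_in_model, c), percent_remaining(in_model, c))
        for c in cutoffs
    ]


# Hypothetical likelihood values, chosen only to illustrate the calculation.
example_in_model = [0.9, 0.8, 0.5, 0.1]
example_not_in_model = [0.7, 0.2, 0.1, 0.0]
curve = cutoff_curve(example_in_model, example_not_in_model, [0.0, 0.5, 0.9])
```

Each tuple corresponds to one labeled point; raising the cutoff removes pairings from both groups, and enrichment appears as the true positive percentage falling more slowly than the false positive percentage.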
Figure 3.
Proof of principle: Gap filling highly-likely reactions in B. subtilis.
B. subtilis synthesizes lipids via the non-mevalonate pathway (blue) [37]. We removed this pathway from the B. subtilis genome-scale model and then tried to fill the gap using both the likelihood-based and parsimony-based approaches. The parsimony-based approach filled the gap with the mevalonate pathway (red), which is shorter but not supported by genetic evidence, whereas the likelihood-based approach filled the gap with the correct pathway. Black indicates reactions that were not knocked out (there was no explicit link to literature evidence in the B. subtilis model). The numeric labels are the computed likelihoods of gap filling reactions.
Figure 4.
Genes added to the model using likelihood-based and parsimony-based gap filling.
Likelihood-based gap filling produced more new gene annotations than post-processing the gap filled reactions generated by the parsimony-based approach. The plot shows the number of genes added uniquely by the likelihood-based and parsimony-based gap filling approaches (genes common to both approaches are omitted for clarity but tended to outnumber those unique to either approach). A) Number of genes added after targeted gap filling to activate biomass production. B) Number of genes added after iterative gap filling.
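The partition into uniquely added and shared genes amounts to set differences and an intersection, as in this small sketch. The function name and the gene identifiers are hypothetical and serve only to illustrate the bookkeeping.

```python
def partition_added_genes(likelihood_genes, parsimony_genes):
    """Split the genes added by two gap filling approaches into three groups."""
    likelihood_set = set(likelihood_genes)
    parsimony_set = set(parsimony_genes)
    return {
        "likelihood_only": likelihood_set - parsimony_set,
        "parsimony_only": parsimony_set - likelihood_set,
        "shared": likelihood_set & parsimony_set,  # omitted from the plot
    }


# Hypothetical gene identifiers for one organism.
groups = partition_added_genes(
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneC", "geneD", "geneE"],
)
unique_counts = {name: len(genes) for name, genes in groups.items()}
```

Only the two "only" counts would be plotted for each organism; the shared count is the quantity the caption describes as omitted.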
Figure 5.
Likelihoods of gene-reaction associations added using likelihood-based and parsimony-based gap filling.
The average likelihood of links between genes and reactions that were added using likelihood-based gap filling tended to be greater than the average likelihood of links resulting from post-processing the parsimony-based gap filling result. Note that it was not greater for all models (e.g., Pseudomonas aeruginosa) because the likelihood-based gap filling approach maximizes likelihood of reactions, not annotations, and as a result picks fewer reactions with 0 likelihood (no predicted gene associations). A) Targeted gap filling result. B) Iterative gap filling result.
Table 1.
Average phenotype consistency across all test organisms for models gap filled using the four evaluated algorithms.
Figure 6.
Knockout lethality accuracy for genes added in gap filling.
Gene knockout simulations were performed for models gap filled with each of the four workflows to assess the consistency between lethality predictions and knockout lethality data for genes added in gap filling. Likelihood-based gap filling produced the most candidate gene associations, with high specificity and low sensitivity in lethality predictions. The difference in accuracy between likelihood-based and parsimony-based gap filling was not statistically significant. A) Number of positive growth predictions. B) Number of negative growth predictions.
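Sensitivity and specificity of this kind can be computed from predicted and observed knockout outcomes as sketched below. This is an illustrative sketch, not the authors' evaluation code: it assumes that growth after knockout is the positive class, that genes without experimental data are skipped, and that outcomes are stored as booleans keyed by hypothetical gene identifiers.

```python
def confusion_counts(predicted_growth, observed_growth):
    """Count TP/FP/TN/FN, treating growth (True) as the positive class.

    Genes lacking an observed outcome are ignored (an assumption of this sketch).
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for gene, predicted in predicted_growth.items():
        if gene not in observed_growth:
            continue
        observed = observed_growth[gene]
        if predicted and observed:
            counts["tp"] += 1
        elif predicted and not observed:
            counts["fp"] += 1
        elif not predicted and not observed:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def sensitivity_specificity(counts):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    positives = counts["tp"] + counts["fn"]
    negatives = counts["tn"] + counts["fp"]
    sensitivity = counts["tp"] / positives if positives else None
    specificity = counts["tn"] / negatives if negatives else None
    return sensitivity, specificity


# Hypothetical outcomes: True means the knockout strain grows.
predicted = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": True}
observed = {"g1": True, "g2": False, "g3": False, "g4": True}
counts = confusion_counts(predicted, observed)
sens, spec = sensitivity_specificity(counts)
```

If lethality rather than growth is taken as the positive class, the two measures swap roles, so the convention should be fixed before comparing workflows.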