Traceability, reproducibility and wiki-exploration for “à-la-carte” reconstructions of genome-scale metabolic models

Genome-scale metabolic models have become the tool of choice for the global analysis of microorganism metabolism, and their reconstruction has attained high standards of quality and reliability. Improvements in this area have been accompanied by the development of some major platforms and databases, and an explosion of individual bioinformatics methods. Consequently, many recent models result from “à la carte” pipelines, combining the use of platforms, individual tools and biological expertise to enhance the quality of the reconstruction. Although very useful, introducing heterogeneous tools that hardly interact with each other causes a loss of traceability and reproducibility in the reconstruction process. This represents a real obstacle, especially when considering less-studied species whose metabolic reconstruction can greatly benefit from comparison with good-quality models of related organisms. This work proposes an adaptable workspace, AuReMe, for sustainable reconstructions or improvements of genome-scale metabolic models involving personalized pipelines. At each step, relevant information related to the modifications brought to the model by a method is stored. This ensures that the process is reproducible and documented regardless of the combination of tools used. Additionally, the workspace establishes a way to browse metabolic models and their metadata through the automatic generation of ad hoc local wikis dedicated to monitoring and facilitating the process of reconstruction. AuReMe supports exploration and semantic querying based on RDF databases. We illustrate how this workspace allowed handling, in an integrated way, the metabolic reconstructions of non-model organisms such as an extremophile bacterium or eukaryotic algae. Among relevant applications, the latter reconstruction led to putative evolutionary insights into a metabolic pathway.

Wiki-based exploration of metabolic networks: a novel method to explore and monitor GSM reconstructions and their associated metadata. GSM information can be displayed in the wiki according to its origin: orthology, genome annotation, gapfilling or manual curation.
Wikis powered by AuReMe display the two families of metadata described in the article. Biological metadata supported by wikis encompass i) all initial data related to an imported model (e.g. conservation of original stoichiometry of reactions, gene associations, data from the 'note' section of SBML model files, etc.), ii) the reasons that led to a curation of the model, i.e., why a reaction/metabolite was added or deleted (this information is stored when the user completes a form to curate the model), iii) the corresponding identifiers from different databases for most of the reactions and compounds and iv) the traceability of compounds used as seeds (e.g. growth medium compounds) and targets (compounds known to be produced or biomass components) during model simulations (Supp. Fig. A). GSM reconstruction process metadata include i) the source of each reaction and compound (output of a tool/reconstruction step or organism origin of the model for multispecies modeling), ii) the version of the metabolic database used for data standardization, iii) the steps and tools used during the reconstruction process and iv) the manual curation history, i.e., all the a posteriori modifications made to the model (Supp. Fig. A).
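As an illustration of the reconstruction process metadata listed above, the following minimal sketch shows one way such a per-reaction record could be represented. The class and field names are hypothetical and chosen for this example only; they do not reflect how AuReMe or the PADMet format actually stores the information.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CurationEvent:
    """One a posteriori modification (hypothetical structure)."""
    action: str  # "add" or "delete"
    reason: str  # free text, as entered in a curation form


@dataclass
class ReactionRecord:
    """Process metadata attached to one reaction (hypothetical structure)."""
    reaction_id: str
    source: str            # e.g. "orthology", "annotation", "gapfilling", "manual"
    tool: str              # tool or pipeline step that produced the reaction
    database_version: str  # version of the reference database used
    xrefs: Dict[str, str] = field(default_factory=dict)  # database name -> identifier
    history: List[CurationEvent] = field(default_factory=list)

    def curate(self, action: str, reason: str) -> "ReactionRecord":
        """Append a curation event so that the history is never overwritten."""
        self.history.append(CurationEvent(action, reason))
        return self
```

Keeping the curation history as an append-only list mirrors the idea that every manual modification remains traceable.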
Wiki pages related to genes, reactions, metabolites or pathways contain both static and linked information. Names, synonyms, formulas, etc., are displayed in accordance with the data of the original reference database (MetaCyc, BiGG) used to reconstruct the GSM. Links to the latter or other databases between two entities to enrich the relation (relation entities). The external metadata entities allow compounds, reactions and pathways to link to external databases such as MetaCyc, BiGG, UniProt and KEGG. The internal metadata entities allow reactions, genes, pathways and metabolic networks to link to information relative either to the reconstruction pipeline or to metabolic network characteristics.
Supplementary Table A. Impact of the various pipeline steps on the functionality of the built GSM. The Network column describes the methodology used at each step of the pipeline. The functionality of each intermediary network was tested according to the production of biomass (FBA growth rate) and, more generally, to the production of a predetermined family of compounds of interest (targets), which contains all the biomass components, with respect to each species.

[Figure labels: Reaction-Pathway relation (Reaction "is included in" Pathway); Gene-Reaction association (Gene "is linked to" Reaction).]

AuReMe environment user interface and customizability
For all reconstructed networks, the GSM reconstruction workflow was described in a configuration file (called Makefile), which handled the reconstruction process by running simple commands such as: make orthology-based, make annotation-based, make draft, and make gapfilling. The last two commands run the first two, provided that these have not been run yet and that the corresponding data are available.
The configuration file could be personalized in order to select the tools used for each step of the reconstruction workflow. The network analysis was handled through the commands make curation, make menecheck, and make fbacheck. The creation of the wiki pages was handled by make wikipage.
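The sequence of commands described above can be scripted. The following minimal sketch, which assumes that the Makefile lives in a working directory passed as workdir (a hypothetical parameter), chains the make targets named in the text and stops at the first failure. Only the target names come from the text; the helper functions are illustrative and are not part of AuReMe.

```python
import subprocess

# Make targets named in the text, in the order in which the workflow lists them.
RECONSTRUCTION_TARGETS = ["orthology-based", "annotation-based", "draft", "gapfilling"]
ANALYSIS_TARGETS = ["curation", "menecheck", "fbacheck"]


def build_commands(targets, workdir="."):
    """Return one `make` argument list per target; -C points make at workdir."""
    return [["make", "-C", workdir, target] for target in targets]


def run_workflow(targets, workdir=".", dry_run=True):
    """Process the targets in order and return the command lines as strings.

    With dry_run=True nothing is executed. Otherwise each command is run and
    a failing target raises subprocess.CalledProcessError, which stops the loop.
    """
    handled = []
    for argv in build_commands(targets, workdir):
        if not dry_run:
            subprocess.run(argv, check=True)
        handled.append(" ".join(argv))
    return handled


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for line in run_workflow(RECONSTRUCTION_TARGETS + ANALYSIS_TARGETS):
        print(line)
```

The dry-run default makes it possible to review the planned sequence before launching a reconstruction.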

Local and webserver wiki creation
The creation of the wikis for the E. siliculosus (http://gem-aureme.irisa.fr/ectogem) and T. lutea (http://gem-aureme.irisa.fr/tisogem) GSMs was handled in two steps. First, the command make wikipages in the AuReMe workspace launched the creation of the wiki pages for genes, metabolites, pathways and reactions in a local repository of the workspace. Second, the commands make build and make send-allpages launched the creation of a preconfigured Docker container hosting the wiki infrastructure, which could be accessed locally through a web interface. This local wiki was used to perform the manual exploration and curation of the metabolic reconstructions. Once the networks were curated, the command make web-send-pages uploaded the wiki pages to the webserver on which the MediaWiki technology had been previously installed.

Turning metabolic network information into a RDF triplestore
The script padmet-to-tsv from the module connection of padmet-utils was used to export, as TSV files, the relations between the entities of the T. lutea model that we obtained and of the MetaCyc database, both in padmet format. Based on the RDF graph shown in Supp. Fig. B, these files were transcribed into RDF triples, which were stored in a triplestore exposed through a SPARQL endpoint freely accessible at http://bit.ly/tisoSparql. This representation made it possible to perform complex and precise queries and to link the model to other databases such as MetaCyc, BiGG, KEGG and UniProt. Based on the latter, SPARQL queries were generated to exhibit pathways which contain exclusive reactions from different sources.
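The transcription of TSV relations into triples, and the kind of question asked of the resulting graph, can be sketched as follows. This is a minimal illustration under stated assumptions: the three-column TSV layout, the example.org namespace and the predicate names is_included_in and has_source are hypothetical and do not reproduce the actual padmet-to-tsv output or the RDF schema of Supp. Fig. B. The last function mimics in plain Python what a SPARQL query could compute, namely pathways that contain reactions supported by one source only, coming from at least two different sources.

```python
import csv
import io

# Hypothetical namespace; the RDF schema actually used may differ.
BASE = "http://example.org/metabolic/"


def tsv_to_triples(tsv_text):
    """Read (subject, predicate, object) rows from TSV text with a header line."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["subject"], row["predicate"], row["object"]) for row in reader]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every term as an IRI."""
    return ["<{0}{1}> <{0}{2}> <{0}{3}> .".format(BASE, s, p, o) for s, p, o in triples]


def pathways_mixing_exclusive_sources(triples):
    """Map each pathway to the sources of its single-source reactions.

    A pathway is kept only if these reactions come from at least two sources.
    """
    sources, pathways = {}, {}
    for s, p, o in triples:
        if p == "has_source":
            sources.setdefault(s, set()).add(o)
        elif p == "is_included_in":
            pathways.setdefault(o, set()).add(s)
    result = {}
    for pathway, reactions in pathways.items():
        exclusive = {next(iter(sources[r])) for r in reactions
                     if len(sources.get(r, ())) == 1}
        if len(exclusive) >= 2:
            result[pathway] = sorted(exclusive)
    return result


# Toy data, invented for the example.
EXAMPLE_TSV = (
    "subject\tpredicate\tobject\n"
    "RXN-1\tis_included_in\tPWY-A\n"
    "RXN-2\tis_included_in\tPWY-A\n"
    "RXN-3\tis_included_in\tPWY-B\n"
    "RXN-1\thas_source\tannotation\n"
    "RXN-2\thas_source\torthology\n"
    "RXN-3\thas_source\tannotation\n"
    "RXN-3\thas_source\torthology\n"
)
```

On the toy data, only PWY-A is reported, because it combines one reaction found by annotation alone with one found by orthology alone, whereas the single reaction of PWY-B is supported by both sources.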

The PADMet library and PADMet-utils
PADMet-utils is a suite of scripts based on the PADMet library that links admissible input data to the customized workflows and the various analysis tools available in the workspace. PADMet-utils contains four main modules for data management, connection to software, data exploration and manual curation assistance. For instance, pgdb-to-padmet from the module connection to software was used to compile the output of Pathway Tools (the PGDB folder) into a single file in PADMet format. In the same module, sbml-to-padmet was used to convert one or more SBML files into a single file in PADMet format, with or without a reference database. add-seeds-rxn from the module data management was used to add the exchange and transport reactions of a set of metabolites to a given metabolic network. fba-test from the module data exploration was used to perform FBA. The Makefile of AuReMe provides examples of how to use PADMet-utils. This toolbox only requires the PADMet library; it is available in the AuReMe workspace and can also be downloaded from GitLab (https://gitlab.inria.fr/maite/padmet-utils) and used in stand-alone mode.
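To make the role of a script such as add-seeds-rxn concrete, the following toy sketch adds one exchange and one transport reaction per seed metabolite to a network stored as a plain dictionary. It does not use the PADMet library: the dictionary layout, the compartment suffixes (_b, _e, _c), the reaction identifiers and the choice of reversible reactions are all assumptions made for this example.

```python
def add_seed_reactions(network, seeds):
    """Return a copy of `network` with exchange and transport reactions for seeds.

    `network` maps a reaction id to a dict with "reactants" and "products".
    Suffixes are illustrative: _b boundary, _e extracellular, _c cytosol.
    """
    updated = dict(network)  # shallow copy, the input dictionary is left untouched
    for seed in seeds:
        # Exchange: brings the compound from the boundary into the medium.
        updated["Exchange_" + seed] = {
            "reactants": [seed + "_b"],
            "products": [seed + "_e"],
            "reversible": True,
        }
        # Transport: moves the compound from the medium into the cytosol.
        updated["Transport_" + seed] = {
            "reactants": [seed + "_e"],
            "products": [seed + "_c"],
            "reversible": True,
        }
    return updated
```

For example, calling add_seed_reactions(network, ["WATER"]) adds the two entries Exchange_WATER and Transport_WATER while leaving the original dictionary unchanged.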

Exporting a model produced in the workspace to Pathway Tools
AuReMe can be used as an intermediary between the creation of a model in a major platform such as … To do this, we carried out the following steps:
1) In AuReMe, we exported an SBML file with the added reactions:
• Padmet-utils script: sbmlGenerator.py
• Output file: added_AureMe_PadMet.xml
• Step 2: In the SBML import window, we selected the "Create a New Database…" option.
• Step 3: In the pop-out window, we entered the required information to create our new PGDB (addedcyc).
• Step 4: Back in the SBML import window, we selected the "Import->Select and Read SBML File …" command and selected our SBML file with the AuReMe
• Step 7: We ran the command "Import->Merge SBML Reactions".
• Step 8: Finally, we saved our new addedcyc PGDB by running the command "Database->Save DB".
3) In Pathway Tools v21.0, we exported all reactions in our newly created PGDB (addedcyc) to a Lisp-format file. To do this, we started Pathway Tools through the Lisp interpreter with the following command: