Reconstruction of Danio rerio Metabolic Model Accounting for Subcellular Compartmentalisation

Plant and microbial metabolic engineering is commonly used in the production of functional foods and quality trait improvement. Computational model-based approaches have been used in this important endeavour. However, to date, fish metabolic models have only been scarcely and partially developed, in marked contrast to their prominent success in metabolic engineering. In this study we present the reconstruction of fully compartmentalised models of the Danio rerio (zebrafish) on a global scale. This reconstruction involves extraction of known biochemical reactions in D. rerio for both primary and secondary metabolism and the implementation of methods for determining subcellular localisation and assignment of enzymes. The reconstructed model (ZebraGEM) is amenable for constraint-based modelling analysis, and accounts for 4,988 genes coding for 2,406 gene-associated reactions and only 418 non-gene-associated reactions. A set of computational validations (i.e., simulations of known metabolic functionalities and experimental data) strongly testifies to the predictive ability of the model. Overall, the reconstructed model is expected to lay down the foundations for computational-based rational design of fish metabolic engineering in aquaculture.


Introduction
High blood levels of long-chain omega-3 fatty acids (n-3 FAs), eicosapentaenoic acid (EPA, 20:5) and docosahexaenoic acid (DHA, 22:6), decrease risk of human cardiovascular disease events [1]. Intake of fatty, cold-water fish such as salmon, mackerel, and sardines, or fish oil capsules are the most effective methods of increasing EPA and DHA levels in the blood [2]. Fish metabolic engineering predominantly involves fisheries, however the fish farming industry has been criticised for being a net consumer of marine resources, in the form of fishmeals and fish oils used in feeds, despite the efforts made to replace them with alternatives, such as vegetable proteins and oils [3].
Current challenges in using fish as factories for public health nutrition and nutraceuticals require predesigned and efficient strategies for metabolic engineering [4]. Currently, fish metabolic engineering mostly involves trial-and-error approaches, without the utilisation of computational modelling procedures to rationally design genetic modifications. The marginal contribution played by metabolic modelling in fish until now stands in marked contrast to its prominent success in plant and microbial metabolic engineering [4][5][6]. The reconstruction process is well established for metabolic networks [7]. Once assembled, the reconstruction can be readily converted into a mathematical format by adding balances (e.g., mass-balance constraints), steady-state assumptions [8]. The resulting model is condition-specific and can be used for phenotype simulations using various constraint-based reconstruction and analysis methods [8,9]. This approach has proven successful for numerous microorganisms [10] and eukaryotes for addressing various biological and biotechnological questions, such as the analysis of knowledge gaps [11], simulation of phenotype traits [12], analysis of evolution of metabolic networks [13,14] and metabolic engineering applications [15]. These analyses solely rely on simple physical-chemical constraints, thus overcoming the problem of missing enzyme kinetic data [5]. Numerous applications for large-scale plant and microbial networks have proven to be highly successful in predicting metabolic phenotypes in metabolic engineering and many other applications [5,16]. The reconstruction of metabolic network models for multicellular eukaryotes is significantly more challenging than that of bacteria, because of the larger size of the networks, the subcellular compartmentalisation of metabolic processes, and the considerable variation in tissue-specific metabolic activity [17]. A significant step forward in fish metabolic modelling was made with the publication of large-scale MetaFishNet model analysing high throughput expression data, as well as a framework for future systems studies [18]. This genome-wide model offered a significant expansion of the Kyoto Encyclopedia of Genes and Genomes (KEGG) zebrafish model [19]. In this work we present a computational reconstruction of genome-scale, subcellular compartmentalised metabolic network model for D. rerio (ZebraGEM), which relied on genomic and biochemical data extracted from various databases and literature. The validity of the model reconstruction steps is demonstrated via computational tests (cross-validation tests and simulations of known metabolic functionalities).
In this study, we present the reconstruction of the global D. rerio metabolic map. ZebraGEM is a comprehensive literature-based genome-scale metabolic reconstruction using manual curation, including extensive gap-assessment and filling. ZebraGEM has 9 compartments (cytoplasm, endoplasmic reticulum, extracellular space, Golgi apparatus, lysosome, mitochondria, mitochondrial intermembrane space, nucleus and peroxisome), 18 interfaces (surface orientations) and accounts for the functions of 4,988 proteins, 1,553 metabolites, and 2,824 metabolic and transport reactions. This network reconstruction was transformed into an in silico model and validated through the simulation of known metabolic functions found in a several tissue types.

Global Model Reconstruction
We adopted a reaction-centered view of D. rerio metabolism, where each metabolic gene is assigned to one or more reactions. We used Enzyme Commission numbers [20] to identify an initial set of metabolic genes from the KEGG database [19], National Center for Biotechnology Information's Gene [21], and Reactome Database [22]. These genes were mapped to a rudimentary network of 1,374 metabolic enzymes and 1,550 reactions (see Methods). In addition we collected zerbafish-specific reactions from MetaFishNET [18]. The final D. reria metabolic model (ZebraGEM) was then reconstructed based on biochemical and physiological knowledge by surveying literature and mining databases. We evaluated more than 15 years of biological evidence from more than 1,800 primary literature articles, reviews, and biochemical textbooks. The reconstruction was almost entirely constructed from zebrafish-specific data and includes many reactions directly extracted from the literature that are not described in any chart or database. Furthermore, it represents carefully formulated metabolites and reactions, which account for known reaction stoichiometry, substrate/cofactor specificity, and directionality, as well as overall conservation of mass and metabolite ionisation states at pH 7.2. Unambiguous metabolite names were verified using Chemical Identifier Resolver [23].

Compartmentalised Model Reconstruction
To extend the genome-scale D. rerio model reconstructed in the above step and to account for subcellular compartmentalisation of metabolic processes, we considered the known sub-localisation of the reactions taken from literature articles and reviews. Manual literature-based reconstruction ensured that the network components and their interactions were based on direct biochemical and subcellular evidence and reflected the current knowledge of D. rerio metabolism (see Methods).
The analysis resulted in a compartmentalised model Figure 1), with each enzyme localised between eight intracellular spaces (cytoplasm, endoplasmic reticulum, Golgi apparatus, lysosome, mitochondria, mitochondrial intermembrane space, nucleus and peroxisome) and the extracellular space Figure 1A. Furthermore, for each enzyme, the precise sub-cellular location and surface orientation on the membrane site or interface (when a reaction other than a transport is associated with the membrane; e.g., endoplasmic reticulum membrane, cytosolic side) was identified.

Gap Filling
First, we decided to test whether the model was able to produce biomass by optimising the corresponding reaction accounting for all known biomass precursors required for cell replication. The non-compartmentalised draft model was able to produce biomass without any modifications. In contrast, the compartmentalised draft model was unable to produce biomass. Therefore, we sought to gap fill the draft compartmentalised to identify transporters needed to be added to the model for production of all biomass components. We searched for the minimal number of transporters necessary to add to the model in order to enable the production of biomass. The results of the algorithm were checked manually for two criteria: (i) the result did not suggest a reversible reaction for a known irreversible reaction and (ii) the added reaction transports a metabolite from a compartment where it was produced to a compartment where it was used as reactant.

Compartmentalised Model
A detailed list of the transporters is provided in Table S1. A small but significant percentage of the reactions are occurring in multiple compartments, in accordance with the experimental data used for the model reconstruction 20%, 3%, and 2% of the reactions were localised to two, three, or a higher number of compartments, respectively; This described 2,286 biochemical reactions, 2,824 compartment-specific reaction (with associated enzymes) and 2,910 compartment-specific, surface-specific reactions (e.g., one cytosolic reaction can occur both on the cytosolic side of the Golgi apparatus and the endoplasmic reticulum membranes). However, even if some reactions occur in more than one compartment, enzymes are reaction-and compartmentspecific; No enzyme was described in more than one compartment. Table S2 describes the main pathways, their compartment, and the reaction count. Figure 1B outlines the model by summarising the reaction and gene content for each subcellular compartmentalisation and metabolite exchanges between compartmentalisation and with the environment. ZebraGEM contains 9 different compartments Figure 2A. 18 interfaces were used for transport or exchange with the environment, and orientate membrane-associated reaction sides Figure 2B. There are 47 nutrient uptakes, while oxygen, water and carbon dioxide can freely exchange with the environment. The synthesis of 42 biomass components is summarised in the biomass reaction (Table S3)

Model Validation
Following previous model reconstructions [25][26][27], we further validated the global model reconstruction using a set of simulations of known metabolic functions, such as amino acids, as well as secondary metabolites biosynthesis and degradation (Table S4). All simulations were successful, demonstrating that this model is indeed functional (see Methods).

Experimental Support
Examination of experimentally characterised pathway activity was used to validate the global metabolism model. For example, the de novo biosynthesis of taurine, our model suggests that the only path able to synthesis the taurine is controled by cysteine sulfinic acid decarboxylase. This finding is in accordance with the recently reported critical role of cysteine sulfinic acid decarboxylase in D. rerio taurine homeostasis and cardiac development [28]. Likewise, a large portion of the unique metabolites detected in the endoplasmic reticulum were fatty acid-related, including sterols, sphingolipids, and CoAs [29]. In the peroxisome, CoA metabolites were prominent [30][31][32][33].

Conclusion
System biology approaches to evolution are attractive as they bring the promise of quantitative analyses of trait evolution and behaviour, integrating phenomena from the molecular to the ecosystem level. As a result, they allow us to describe precise hypotheses as to how traits can be successively used for metabolic engineering.
Several limitations should be noted: (i) As the approach hinges upon different molecular data, its accuracy depends on the quality of the latter. (ii) Several assumptions are used in the reconstruction process, such as the assumption of minimal addition of reactions and directionality relaxation to resolve network gaps, and the assumption of minimal number of metabolite transmembrane transport used in the localisation assignment. These assumptions, although commonly used for similar purposes [34,35], may in some cases be oversimplified. (iii) The resulting models do not explicitly account for transcriptional and metabolic regulation; data that is still unavailable for most D. rerio metabolic functions, and thus cannot accurately predict some of the organism's functions. It should be noted, however, that further experimental validation should be performed to reaffirm the predictions.
A number of future refinements and extensions will improve the quality of the resulting network models. Such reconstructed models, in addition to those presented here for D. rerio, open up future opportunities for metabolic engineering of increased production rates of various nutraceuticals important for human and fish health.
In this work we present the reconstruction of a fully compartmentalised model of D. reria, a model fish and present a new feature in metabolic modelling: surface orientation. Despite its evolutionary divergence to farm fish, the genetic make-up of the D. reria model fish is the first step toward fish metabolic engineering, making its study highly useful to other fish, as well as to the fish industry, where metabolic engineering is commonly used for quality-trait improvement.

Reconstruction Procedure
An initial component list was assembled as described in the text (see Results and Discussion). Putative gene assignments were collected for KEGG build 39 [19], and verified based on evidence collected from genome annotation databases, namely EntrezGene [21], Ensembl 64 [36] D. rerio assembly Zv9, and scientific literature. Extra reactions were collected from MetaFishNET release 1.9 [18]. Substrate and cofactor preferences were identified from the literature and NCI/CADD Chemical Identifier Resolver beta 4 [23] or ChemSpider [37]. Metabolite formulae and charges were calculated based on their ionisation state at pH 7.2, which was assumed to be constant across all compartments. Reaction directionality was determined from thermodynamic data or inferred from empiric data and textbooks. For clarity, we have used only the apparent primary specificity of each enzyme in our network. Compartmentalisation was determined from protein localisation data; sequence targeting signals, and indirect physiological evidence. Transport reactions were entirely reconstructed based on literature reports and biochemistry textbooks because the current annotation of transporters is not sufficiently specific with regard to substrates and mechanisms. Each reaction in our network was manually curated with regard to reaction stoichiometry, subcellular localisation and directionality. By default each particular reaction was defined to be bidirectional, except if explicitly stated otherwise by the literature or the source databases. Based on the controlled nomenclature, the function of a reaction was used to generate descriptive statistics. The entire content of D. rerio ZebraGEM, with the references used to describe each reaction, is freely available Systems Biology Markup Language format [24] as EBI BioModel: MODEL1204120000.
Each mapped reaction is defined as a node in our network. Edges between these nodes are defined by shared metabolites between the reactions. The network is directed: for irreversible reactions if the product of one reaction is a reactant in the second, we defined a directed edge. Reversible reactions are treated similarly, except that both directions of the reaction were allowed and handled independently. The networks were visualised with Gephi v0.8 beta [38].

Exchange Reactions and Biomass Synthesis
Thirteen currency metabolites (H + , H 2 O, ATP, ADP, P i , PP i , Na + , coenzyme A, O 2 , NAD + , NADH, NADP + , and NADPH) were removed from all analyses (but the flux balance analyses) in whichever compartments they occurred [39]. Exchange reactions (Table S2, type 'import') allow metabolites to enter and/or leave the metabolic network and were formulated considering the minimum nutritional physiology of the cell. According to published dietary requirement from farm fish [40], the network considers 10 essential amino acids, 2 lipids, 9 carbohydrates, 6 macrominerals, 6 microminerals, 4 fat-soluble vitamins and 10 water-soluble vitamins nutrient uptakes. Likewise, four lipids can be used in fish nutrition; linolenic and linoleic acid, mainly for freshwater fish and eicosapentaenoic and docosahexaenoic acid for seawater fish; since the D. rerio in a freshwater fish and the model allows the synthesis of eicosapentaenoic and docosahexaenoic acid, only linolenic and linoleic acid were included. Oxygen, water and carbon dioxide can freely exchange with the environment. Metabolic network biomass is composed of all amino acids (protein), nucleic acids (DNA, RNA) and lipids, including triacylglycerol and cholesterol (Table S2, type 'biomass').

Gap Filling
Gap filling was done by importing the required metabolites from one compartment to another in order to allow all reactions to occur, and subsequently to produce all biomass components. Every metabolite that could not be produced or consumed was manually examined to identify possible reactions describing its degradation, production, or transport. Each added transporter was manually checked for applicability (such as the feasibility of the suggested directionality) and for being reported in at least one other species (fish when available, mammalian otherwise) before been included. We searched for the minimal number of transporters necessary to add to the model in order to enable the production of biomass.

Flux Balance Analysis
We used the Systems Biology Research Tool v2.0.0 [41] to perform flux balance analysis on the networks [34]. Exchange reactions were added to enable uptake and secretion of extracellular metabolites for the purpose of simulations (Table  S5). The optimum flux distribution is here defined as the flux distribution that minimises uptakes for a fixed rate of biomass synthesis. The biomass composition (Table S2) was taken from the published dietary requirement from farm fish [40]. Note that exchange fluxes are negative for influx and positive for efflux by definition.

Functional Validation
Functional validation was performed by using flux balance analysis. The test simulations validates that the targeted metabolite could be synthetised from a particular source metabolite without violating the other model constraints (i.e., reaction directions, model inputs).