Integration and Validation of the Genome-Scale Metabolic Models of Pichia pastoris: A Comprehensive Update of Protein Glycosylation Pathways, Lipid and Energy Metabolism

Motivation Genome-scale metabolic models (GEMs) are tools that allow predicting a phenotype from a genotype under certain environmental conditions. GEMs have been developed in the last ten years for a broad range of organisms, and are used for multiple purposes such as discovering new properties of metabolic networks, predicting new targets for metabolic engineering, as well as optimizing the cultivation conditions for biochemicals or recombinant protein production. Pichia pastoris is one of the most widely used organisms for heterologous protein expression. There are different GEMs for this methylotrophic yeast of which the most relevant and complete in the published literature are iPP668, PpaMBEL1254 and iLC915. However, these three models differ regarding certain pathways, terminology for metabolites and reactions and annotations. Moreover, GEMs for some species are typically built based on the reconstructed models of related model organisms. In these cases, some organism-specific pathways could be missing or misrepresented. Results In order to provide an updated and more comprehensive GEM for P. pastoris, we have reconstructed and validated a consensus model integrating and merging all three existing models. In this step a comprehensive review and integration of the metabolic pathways included in each one of these three versions was performed. In addition, the resulting iMT1026 model includes a new description of some metabolic processes. Particularly new information described in recently published literature is included, mainly related to fatty acid and sphingolipid metabolism, glycosylation and cell energetics. Finally the reconstructed model was tested and validated, by comparing the results of the simulations with available empirical physiological datasets results obtained from a wide range of experimental conditions, such as different carbon sources, distinct oxygen availability conditions, as well as producing of two different recombinant proteins. In these simulations, the iMT1026 model has shown a better performance than the previous existing models.


Results
In order to provide an updated and more comprehensive GEM for P. pastoris, we have reconstructed and validated a consensus model integrating and merging all three existing models. In this step a comprehensive review and integration of the metabolic pathways included in each one of these three versions was performed. In addition, the resulting iMT1026 model includes a new description of some metabolic processes. Particularly new information described in recently published literature is included, mainly related to fatty acid and sphingolipid metabolism, glycosylation and cell energetics. Finally the reconstructed model was tested and validated, by comparing the results of the simulations with available empirical physiological datasets results obtained from a wide range of experimental Introduction metabolite and reaction identifications and nomenclature. On the other hand, iLC915 incorporates more P. pastoris specific gene-protein-reaction associations and hence, a larger number of genes, but several extracellular and nuclear reactions are missing. In general terms, these models cover the same metabolic processes, but iLC915 is more detailed. Nevertheless, there still exist some critical issues in these models, such as missing and divergent information or reactions that require manual revision and curation.
Similar to other organisms, such as Saccharomyces cerevisiae, new versions and updated models integrate previous versions and incorporate new features and information from newly published literature. In the case of S. cerevisiae, despite the existence of other versions, a consensus metabolic model was developed [39] and it was further developed and upgraded, being expanded and revised up to the 7 th version [40][41][42][43].
In this work, we compare the models of P. pastoris and provide an upgraded version. As mentioned above two different strains were used to obtain these models. Nonetheless, there is a high degree of identity at the amino acid coding sequences level (93.7%) and functional annotation between the two genome sequences [32]. In addition, no differences were observed in reactions involved in the metabolization of the different carbon sources [44]. Therefore, a major objective of our study was to obtain a general model that can be applicable to both strains. Furthermore, an extensive analysis was performed on several pathways, comparing the three models and updating them with the newly published literature. Recent findings on P. pastoris physiology and metabolism enabled to complete sphingolipid biosynthesis metabolism and glycosylation pathways, as well as the oxidative phosphorylation electron transport chain. Furthermore, we included different biomass compositions specific for each of the alternative carbon sources used. Finally, the model accuracy was tested in a variety of physiological conditions.

Model merging
For the model comparison, an initial step of metabolite nomenclature unification was required. In PpaMBEL1254 only the identifier (ID) was available in the SBML file, i.e. neither the complete name of metabolites nor any association to a reference database was included. These metabolite IDs, were mostly the standard IDs most commonly used and therefore also included in both in BiGG [45] and The SEED [46] reference databases. After this first metabolite parsing and renaming step was done, MetaNetX [47] database was used in order crosslink information from KEGG [48], ChEBI [49] and MetaCyc [50]. This step not only allowed unifying metabolite names but also to include its molecular formula and charge at pH 7.2.
Once all metabolite names were unified, PpaMBEL1254 and iPP668 were compared using ModelBorgifier [51], thereby obtaining a first pre-merged model. In a second step, this merged model was compared to iLC915, generating a first draft of the consensus model. Due to important differences in model structure between iLC915 and the other models, a manual comparison was necessary. This was performed by analyzing the structure of each of the remaining pathways or subsystems. Differences were resolved according to the available literature, comparing the reactions with those included in two latest versions of the consensus S. cerevisiae GEMs [42,43] or in another recently published S. cerevisiae GEM [52]. Divergences in gene assignments were resolved using P. pastoris or S. cerevisiae literature. The P. pastoris high quality sequence annotation [30] and the automatic reconstructions for P. pastoris [34,35] were also used to verify annotations and gene-reaction assignments from the previous models. Finally, pathway revamping and addition of new reactions was performed based on available yeast literature and metabolic pathways/reaction databases [45,50,53,54].
Eventually, the network was loaded into a convenient environment for debugging [55]. Thus, both COBRA [56] and RAVEN [57] toolboxes were used in order to ensure pathway connectivity and biomass formation. Duplicated reactions in the final model were deleted and blocked reactions were connected (gap filling) to the network when few steps were required. In addition, the elemental mass balance of each reaction was checked and corrected when unbalanced. The final model (S1 and S5 Files) can also be obtained in SBML format from BIOMO-DELS database with accession number: MODEL1508040001 [58]. The SBML model was generated with the RAVEN toolbox [57] and validated with SBMLeditor [59].

Biomass and recombinant protein composition
The biomass reaction is defined by the sum of biomass components, grouped in macromolecules (carbohydrates, proteins, lipids, DNA, RNA), essential cofactors and ATP consumption associated to growth. This equation was adapted depending on culture conditions or carbon source used in accordance to the available literature experimental data [60,61]. In addition, composition of each macromolecule type, such as lipid and carbohydrate, was updated and extensively detailed due to the recently published detailed information of the specific composition [62,63]. See S2 File for details in composition and calculations.
The model was also tested for the expression of two different recombinant proteins under different growth conditions: i) the antibody fragment 2F5 (FAB), expressed constitutively under the GAP promoter [64] and, ii) a Rhizopus oryzae lipase (ROL), regulated by the methanol inducible AOX promoter [65]. The dataset from the FAB-producing strain was used for simulations in oxygen limiting growth conditions [26,66], whereas the dataset from the ROL producing strain was used in simulations for glycerol-methanol co-feeding experimental conditions [25,61]. Reactions for heterologous protein production were included considering different levels: DNA sequence, transcription and mRNA formation, as well as translation and protein formation. Similarly to PpaMBEL1254 and iLC915, a ratio of 1:100:10 5 between recombinant DNA (gene copies), mRNA and heterologous protein was assumed, as described in [32]. These equations also include energetic requirements for polymer formation [67]. Details of DNA, RNA and amino acid composition of each protein, as well as equations for the biosynthesis of their components can be found in S3 File.

Energy requirements
Before the validation step, classic energetic parameters were estimated using experimental data. These parameters are the growth-associated maintenance energy (GAME) and non-growthassociated maintenance energy (NGAME). Both are represented as ATP consumption in the model. For the NGAME calculation, a classical approach was used [68]. For the glucose-limited cultivations, the glucose uptake rate (mmol glucoseÁgDCW -1 Áh -1 ) was represented against the specific growth rate using data from [24,60,69,70]. The y-intercept of the linear regression line to this data corresponds to the amount of glucose needed for maintenance by non-growing cells. Using this value and the model, the NGAME can be calculated by maximization of the ATP turnover per mmol of glucose for the case of no biomass growth. Using the obtained value as fixed value for the non-growth associated maintenance, GAME is determined by adjusting the ATP consumption coefficient in biomass equation to fit biomass-substrate yields using experimental data (including CO 2 and O 2 constrains) from [60].
However, for the case of glycerol:methanol growth conditions NGAME was directly taken from the calculated values reported in [61]. This was necessary due to the range of cultivation conditions considered and the insufficient experimental data available. Using these values, and similarly to the glucose-only growth condition, the GAME values for glycerol:methanol conditions were calculated by fitting the predicted values to the range of experimental biomass-substrate yields reported in [61].

Model analysis and validation
Model analysis and validation were performed using both RAVEN [57] and COBRA [56] toolboxes as described below.
Carbon assimilation capabilities were determined maximizing growth rate and arbitrarily constraining the carbon source influx to 10 mmolÁgDCW -1 Áh -1 except were otherwise stated.
Reaction essentiality was determined performing an additional set of simulations. The procedure consisted on sequentially deleting each reaction of the model, maximizing biomass production and calculating the ratio of the resulting growth rate over the wild type result (GR KO = growth KO /growth WT ). The ratios obtained allowed to classify each reaction into three categories: i) essential (GR KO = 0), ii) partially-essential (0 < GR KO < 1) iii) non-essential (GR KO = 1).
Evaluation of the effect of oxygen limiting conditions on glucose cultures was performed constraining glucose and oxygen uptake rates to the measured values [66]. For the glycerol: methanol experimental conditions, only glycerol and methanol uptake rates were constrained to the experimental values [24] while O 2 uptake rate was left unconstrained. In all these cases biomass, CO 2 and by-products were left as unconstrained positive values and therefore appeared as calculated products were necessary. In all these cases biomass production was the maximized objective function.

Model merging
As described in Materials and Methods section and summarized in Fig 1, the generation of the new model consists of several steps. In the first step of reconciliation, PpaMBEL1254 and iPP668 were automatically compared using modelBorgifier [51]. After the initial pairing step, 75% of the complete set of reactions was identified as identical (exact coincidence) reactions. This pre-merged model was compared with iLC915 resulting in a low number of identical or equivalent reactions (36% of the complete set). Nevertheless, a larger number of reactions were comparable. Those mainly differ in having different stoichiometric coefficients, being assigned to different compartments, decomposed in multi-step reactions, or using alternative names for metabolites (different synonyms, generic names or corresponding to enantiomer compounds).
The iPP668 and PpaMBEL1254 models were the first models to be published, and they are more similar to each other than to iLC915. Approximately 83% of the reactions from PpaMBEL1254 are present in iPP668 and 89% of the reactions in iPP668 are shared with PpaMBEL1254. Overall, models iPP668 and PpaMBEL1254 have a 75% of reaction identity. Furthermore, similar nomenclature and abbreviations are used in these two models. In addition, model structure and detail are similar to iND750 [71], from S. cerevisiae, and those models in the BiGG database [45].
The third published model, iLC915, has many differences with the previous two models. Its nomenclature and structure is KEGG-based [48]. Therefore, its metabolites are fully protonated and many pathways include the same number of steps described in KEGG. That is, many condensed or simplified metabolic branches in the other two models appear decomposed as multi-step reactions in iLC915. Such reaction differences among the models are one of the major reasons for the low pairing of iLC915 with the other two models and seem to be the result of the main database or model scaffold used as basis for model reconstruction.

Updated pathways
As a result of the model comparison and merging step, some divergences in several pathways of the three existing models were evident. In addition, some of them were incomplete or misrepresented in all three models. Therefore we engaged in a curation step according to the recently published literature and database information. Nevertheless, some difficulties in gene and pathway verification were found. The fact that new genomes are usually automatically annotated, at least in an initial step, and that enzyme activities or functions are inferred by homology, propagates errors from already annotated sequences to the new ones [72]. This issue arises when there is limited biological knowledge of the organism. Furthermore, not only genome annotations are based in other organisms and sequences, but GEMs are also commonly developed from previous existing models. As a result, annotation errors or misrepresented pathways are also spread to the subsequent new/updated models. Moreover, as new GEMs are mostly based in pre-existing reconstructions, few new metabolic reactions are often incorporated in the novel GEMs versions. Consequently, the metabolic potential and biological diversity is often not fully reflected in the GEMs and the total number of enzymatic activities present in the existing models remains far below from the complete catalog of enzymatic steps described in the literature for each organism [3].
In the case of P. pastoris, the current annotation of its genome is rather limited [73], with most of all annotations being inferred by homology, mainly from S. cerevisiae. According to the best annotation available P. pastoris has 5040 protein-coding annotated genes of which only 3532 has been assigned an Ontology term and all but 21 annotations are automatically inferred [73]. Despite to the fact that P. pastoris and S. cerevisiae belong to the same family (Saccharomycetaceae), they present significant differences in their metabolic capabilities. Thus, besides P. pastoris well known additional pathways such as the methanol incorporation steps, other significant metabolic differences exist. More specifically, in this work, pathways such as sphingolipid biosynthesis, oxidative phosphorylation and glycosylation pathway, were adapted and redefined, as described below.

Fatty acid biosynthesis
Due to the limited information on specific fatty acid (FA) metabolic pathways in P. pastoris, it was assumed that most of S. cerevisiae fatty acid pathways were identical in the P. pastoris case. According to Hiltunen and co-workers [74], the latest version of yeast consensus model [43] and iTO977 model of S. cerevisiae [52], fatty acid biosynthesis takes place both in mitochondria and cytosol by fatty acid synthase (FAS) type II and I, respectively [75]. FAS type II has individual enzymes for each reaction in fatty acid de novo biosynthesis and elongation. Despite it is well known that mitochondrial FAS type II synthetizes at least up to C 8 fatty acid, some evidences suggest that this system can synthesize longer fatty acids [74,76]. While in S. cerevisiae 7 th version of the consensus model [43] mitochondrial biosynthesis is up to C 8 , this model also include reactions for up to C 18 biosynthesis.
On the other hand, cytosolic FAS is a complex formed by Fas1p and Fas2p within which the successive elongation reactions take place and only the final acyl-CoA is released [77]. The final products of this cytosolic complex are considered to be C 14 to C 18 acyl-CoAs, mainly because they are the main fatty acids found in P. pastoris [62]. The biosynthesis takes place inside the complex in a number of four step cyclic reactions for each acetyl-CoA added. Different number of cycles results in a range FA (C 14 , C 16 and C 18 acyl-CoAs). In addition to de novo biosynthesis, P. pastoris also has fatty acid elongation enzymes, which are able to extend C 12-14 fatty acids and generate very long chain fatty acids (up to C 26 ).
Activation of free fatty acids (FFA) was considered to take place in cytosol only for C 14 , C 16 and C 18 FFA, as well as their respective acyl-CoA hydrolysis. Finally, only acyl-CoA desaturations were included (that is not acyl-ACP or FFA) according to the pathway defined in S. cerevisiae [43].

Fatty acid oxidation
Two different transport mechanisms are commonly described depending on the FA chain length [78][79][80], both being closely coupled to its activation to acyl-CoA [81][82][83][84][85]: on the one hand, a simple diffusion and further activation of medium-chain fatty acids (up to C 12 chain length) and, on the other hand, long and very long chain fatty acids are translocated as acyl-CoA concomitant with the corresponding ATP hydrolysis [86]. For the active transport mechanism, the ATP has a cytosolic origin in ILC915, while in PpaMBEL1254 and iPP668 the required ATP is peroxisomal. According to [87,88], peroxisomal ATP is only required for medium chain fatty acid activation, therefore long and very long chain fatty acids transport should be dependent on cytosolic ATP.
Each cycle of β-oxidation is represented by 4 reactions. Nevertheless, for unsaturated fatty acid degradation, and due to its highly complex degradation steps which depend on the position of its double bounds [89,90], lumped reactions up to the generation of acetyl-CoA were taken from iPP668. However, the desaturation reaction of C 18:3 to C 18:2 was taken from iLC915.
None of the three models include GlcCer biosynthesis. Ternes and co-workers [97,98] identified the gene role in GlcCer pathway and described the fatty acid specific composition in GlcCer and other sphingolipids, as well as the main chain sphingoid bases in P. pastoris.
Sphingoid bases can be derived from palmitoyl-CoA and stearoyl-CoA. However, only a 5% of the detected species correspond to the last one. In fact, Ternes and co-workers [98] characterized sphingolipid composition assuming all the species were formed with a palmitoyl-CoA derived sphingoid base. This sphingolipid composition is in agreement with other literature sources [62,99,100]. As palmitoyl-CoA bases represents around 95% of sphingoid bases, only palmitoyl-CoA derived ones are taken into account in this model.

Glycosylation pathways
Protein glycosylation pathways are not accurately described in previous models of P. pastoris. Only iLC915 partially included N-glycosylation, O-glycosylation and glycosylphosphatidylinositol-anchor (GPI-anchor) biosynthesis pathways. However, compartmentalization of several reactions of this pathway also required revision.
The first part of the N-glycosylation process is highly conserved among eukaryotes [101,102]. It takes place in the cytosol up to the addition of 5 mannose residues (Man) forming (Man)5(GlcNAc)2(PP-Dol)1 oligosaccharide. At this point, the oligosaccharide is transferred to the endoplasmic reticulum (ER), where up to 9 Man and 3 glucose residues (Glc) are further added [103]. In the second and less conserved part of the pathway, the dolichol diphosphate attachment to the protein is represented by a pseudo-reaction forming the compound (Glc)3 (Man)9(GlcNAc)2(Asn)1 in which Asn represents an asparagine residue from the targeted protein. Once the oligosaccharide is attached to the Asn residue of the target protein, it is further modified by the removal of one Man. The resulting glycoprotein is transported to the Golgi Apparatus [104,105]. There, an heterogeneous pattern of glycosylation has been observed corresponding to the different heterologous proteins expressed in P. pastoris [19,106,107]. As an example, differences in Man residues range from 6 to 18 [108][109][110] and even may include hypermanosylation [111]. Due to its complexity and variability, in this model an average glycan is assumed to consist of (Man)9(GlcNAc)2(Asn). The resulting oligosaccharide contributes to the formation of a mannan (Man polymer represented by 1 mannose residue polymer) and a chitin (N-Acetylglucosamine polymer). Both contribute to the biomass formation as a specific component of the carbohydrate fraction.
Similarly to N-glycosylation in mannan formation, O-glycosylation is included assuming an average of 3 Man oligosaccharides [112][113][114][115]. O-glycosylation is also represented by a pseudo-reaction forming the compound (Man)1(Ser/Thr)1 in which Ser/Thr represents a serine or threonine residue of a protein.
Finally, GPI-anchor biosynthesis was also reviewed and compartments reassigned according to Orlean and Menon [116].

Oxidative phosphorylation
There are P. pastoris specific traits in respiratory chain that should be included in the GEM. As an example, while complex I is not present in S. cerevisiae [117], it is described in P. pastoris [118,119]. None of the previous P. pastoris models include respiratory complex I and so, an important proton translocation step was missing. In general, two main traits of oxidative phosphorylation were deeply analyzed: the proton stoichiometry and the complexes integrating the electron transport chain.
In this model the mitochondrial intermembrane space has been included. Hence proton translocation is assumed to occur from the mitochondrial matrix to the intermembrane space. Thus, protons pumped out from the mitochondria do not merge with the high amount of protons from the cytosolic space. Regarding the electron transfer chain reactions, each of the previous P. pastoris models shows a different stoichiometry for proton translocation. After a review of the stoichiometry and relevant literature it was decided to apply a stoichiometry that satisfies the H + balance of the metabolites' charged formula and including complex I stoichiometry considerations proposed by Wikström and Hummer [120]. This includes the complex I translocating 4 H + to the intermembrane space [120,121].
In the present model, reactions for non H + translocating outer and inner mitochondrial membrane NAD(P)H dehydrogenases (cytoplasmic side and matrix side) were also included. While inner NADH dehydrogenase appears in all three models, outer NADH dehydrogenase was only present in iPP668 and PpaMBEL1254, despite both dehydrogenases have been previously reported [118,122,123]. We also included PAS_chr1-4_0299 putative NADPH dehydrogenase homologue to Kluyveromyces lactis [124,125] and also included in its metabolic reconstruction [126].
On the other hand, complex III and IV are included in all three models, but several discrepancies exist on the details of the H + balance due to the consideration of alternative metabolite's molecular formulas or the result of using different criteria when considering chemical and translocated protons [127]. An additional trait for complex III equations is the complexity on the stoichiometric representation of Q-cycle [128,129]. Therefore, in this model stoichiometric coefficients of equations for complex III and IV reactions were chosen with special attention to the proton balance. The selected stoichiometry was of 2H + /2efor complex III and 4H + /2efor complex IV according to recent literature [121,128,129].
One additional characteristic of P. pastoris is the presence of an alternative oxidase that could bypass complex III and IV in the respiratory chain [117,130] which seems to be active only in certain growth conditions [117,122,131]. Although this oxidase was only included iLC915, our consensus model also incorporates this reaction in the electron transport chain module. Finally, the stoichiometry for ATP synthase was maintained as in PpaMBEL1254 and iPP668 (4H + /ATP, resulting in a final maximum theoretical stoichiometry of 2.5e -/ATP) as its H + balance is in agreement with the available literature [132].

Other reviewed pathways
Metabolization of some sugars was also updated. Sugars such as starch, maltose or cellobiose were able to be assimilated in some of the previous models. However, Kurtzman [44] and Naumov and co-workers [133] characterized substrate assimilation in P. pastoris and reported non growth for these carbon sources. Consequently reasons for their metabolization were revised and discarded reactions are detailed in S4 File. L-rhamnose assimilation was only possible in PpaMBEL1254. However the specific metabolic steps included for its metabolization were not typical of yeast species. Therefore alternative reactions and genes associated to the metabolization of L-rhamnose were added as suggested in [134], also described similarly for P. stipitis [135,136].
A full list of modifications from the original models, also including some additional pathways and subsystems not discussed above such as phosphatidylinositol synthesis or transport reactions, is provided in S4 File.

General characteristics of the model
After following all the steps described above, an extended model (iMT1026) is obtained that integrates previous P. pastoris' GEMs, which is provided in S1 and S5 Files and is also available at BIOMODELS database (MODEL1508040001) [58]. The general characteristics of the resulting model are described in the following.
In the first place this model includes an increased number of gene-protein-reaction relationships as can be seen in Table 1.
Our final model incorporated 185 new reactions that didn't appear in previous models and has 614 common reactions in all three models (Fig 2). Reactions appearing in Fig 2 as common to two or all three different models include those reactions that have been taken directly from the previous model. Therefore, they do not include those reactions that are either not the same but equivalent, have been decomposed in several reactions or are a combination of several other reactions. Thus, taking into account these multi-step, lumped or decomposed reactions, in the final model there are up to 721 equivalent reactions in common in all three models, 504 in two of the three models and 638 reactions in only one of the models, without any clear equivalence in any of the other two models.

Maintenance and growth-associated ATP calculations
As described in Materials and Methods, data from [24,60,69,70] was used to calculate the NGAME. As an initial step, a value of 32 mol ATP/ mol glucose, was obtained by maximizing Integration, Upgrade and Validation of the P. pastoris' GEMs the ATP turnover using 1 mmol/(gDCWÁh) of glucose. The y-intercept from a representation of glucose uptake rate versus growth rate was 0.0878 mmol glucose/(gDCWÁh) which corresponds to 2.81 mmol ATP/(gDCWÁh) using the glucose-ATP conversion factor obtained in the initial step. This calculated glucose NGAME value is similar to the 2.3 mmol ATP/(gDCWÁh), previously proposed by [31] for glucose, and close to the NGAME estimated for Pichia (Scheffersomyces) stipitis also growing in glucose [137].
On the other hand, GAME for glucose was estimated by fitting the calculated biomass-substrate yields to experimental data. This way a value of 72 mmol ATP/gDCW was obtained. This amount of ATP associated to cell growth is also similar to 70.5 mmol ATP/gDCW calculated previously for P. pastoris [33] and close to the experimentally calculated values for S. cerevisiae of 62-71 mmol ATP/gDCW [138,139], and to the 69.2 mmol ATP/gDCW, in silico estimated [5].
For the case of the glycerol and methanol co-feeding cultivations, ATP maintenance values calculated by Jordà and co-workers [61] were used as NGAME as available data was insufficient for a new determination. These values range from 4.5 to 5.7 mmol ATP/gDCW and are similar to 6 mmol ATP/gDCW, proposed in iLC915 [33]. For the different conditions tested, specific GAMEs were calculated by fitting the simulations to experimental data from [61]. These experimental data show that the ratio of the glycerol or methanol uptake rates with the growth rate is different for each pair of glycerol:methanol feeding conditions (80:20, 60:40 and 40:60, % w/w at 0.05 and 0.16 h -1 growth rates), therefore a specific GAME was calculated for each case. The obtained values ranged within the 69.8 and 125.6 mmol ATP/gDCW interval. These GAME values increase with the fraction of methanol in the mixed feeding and are in agreement with those calculated by Caspeta and co-workers [33], who estimated a maximum GAME for methanol as sole carbon source of 150 mmol ATP/ gDCW. Integration, Upgrade and Validation of the P. pastoris' GEMs

Carbon source assimilation
The model agreement with P. pastoris utilization of different carbon sources was tested and compared to experimental data [44,133]. A total of 47 carbon sources were evaluated (Table 2) using an in silico minimal medium, with ammonium, phosphate and biotin. The model successfully predicts carbon assimilation for all sources tested.

Reaction essentiality
An interesting trait of the obtained GEMs is the identification of those reactions critical or essential for biomass growth (essential reactions). Simulations to determine reaction essentiality were performed for glucose in normoxia, limited oxygen and hypoxia conditions and also for mixtures of glycerol and methanol at different growth rates. No significant difference in reaction essentiality was observed in all the conditions tested. Thus, similar patterns of distribution of essential reactions in each pathway are observed for all the cases. The results, grouped into major metabolic pathways, are summarized in Fig 3 (and S1 Fig). From a global point of view, essential reactions represent a 15-16% of the total reactions, while 76-79% of the reactions are classified as non-essential. The remaining 6-9% are partially-essential and its deletion causes a decrease in growth rate.
The results show that there are 312 condition independent essential reactions, which are essential in all the performed simulations. These essential reactions can be grouped in three main groups: in the first group, most of them are associated with lipid metabolism (40%), in the second group most reactions belong to the amino acid metabolism (18.6%) while the third group mostly includes cofactor related essential reactions as a 14.1% of all common essential reactions. Similarly to other GEM models, such as S. cerevisiae, extending the model and including more detailed biomass composition results in an increase of essential reactions directly linked to the biomass related metabolites. Nevertheless, this essentiality could be overestimated in silico, due to the fact that in vivo systems are able to replace de missing species with other similar biomass components.

Model validation
An additional step of model validation was performed by comparing the model predicted values with an additional set of experimental data including diverse combinations of glucose, glycerol and methanol chemostats [61,66].
For the glucose chemostats, simulations with different oxygen availabilities, growth rates and CO 2 production were successfully predicted with errors lower than 6% for both FAB expressing and non-expressing strains (Fig 4A, 4B and 4C). According to the available experimental data [60,66], when the oxygen availability decreases ethanol and other metabolites are secreted. The present model is also able to predict by-product formation when oxygen-limited conditions are simulated. However, the model predicts a slightly higher production of ethanol and none of the other by-products, such as arabitol or pyruvate. Nevertheless, when ethanol secretion flux is constrained to the experimentally measured value, arabitol secretion is also predicted (Fig 4C). In addition to arabitol, pyruvate is also secreted in the in silico predictions when both ethanol and arabitol are constrained to the experimental values. This discrepancy of the model to directly predict arabitol or pyruvate secretion if no additional constraint is imposed points to additional regulatory constraints other than those strictly stoichiometric. In addition, different cofactor utilization by combinations of isoenzymes [143,144], and their impact on NAD(P) + /NAD(P)H regeneration, could be one of the key factors for the production of those alternative products [145,146]. Constraining the model with additional 13 C-labelling data [147] as well as studying the impact of cofactor perturbation and analyzing these  a Experimental data from [44,140]. b Described in [141]. c Described in [142]. doi:10.1371/journal.pone.0148031.t002 Integration, Upgrade and Validation of the P. pastoris' GEMs cofactor demands for cell growth, as done in other organisms [148] would be interesting approaches to consider in future studies. For the second dataset, (glycerol:methanol mixtures), specific growth rate, together with specific O 2 consumption and CO 2 production rates were predicted within a 11% of deviation, as shown Fig 4E, 4F and 4G. Similarly to the above described glucose tests, arabitol was only produced when ethanol was constrained to the experimental values; otherwise, ethanol is the preferred product of the stoichiometric model. As in the previous glucose case these results point to another possible level of regulation for arabitol production not included in the model, as without it ethanol production appears as the most efficient way to regenerate NAD + for maximum biomass production.
In order to compare the accuracy of our model with the previous existing models a set of simulations were performed for glucose and glycerol:methanol cultivations (S5 File). The same constrains were set to all the models and growth rate, CO 2 production and O 2 consumption (only in the glycerol:methanol simulations) were compared to the experimental values [61,66]. As shown in S5 File, iMT1026 can predict the evaluated macroscopic cultivation parameters more accurately, i.e. with smaller deviations from experimental data. Moreover, our model is also able to describe byproduct secretion under respirofermentative conditions.

Conclusions
In summary, a consensus GEM of the yeast P. pastoris integrating the three preexisting models has been obtained. Importantly, the new GEM, iMT1026, is more complete and includes a comprehensive revision and upgrading of several metabolic processes (e.g. fatty acid and sphingolipid metabolism, protein glycosylation and energy metabolism) based on new information emerged from recent literature. Furthermore, the new GEM has been validated using different sets of experimental data corresponding to a wider range of physiological states than previous GEMs. From our point of view this GEM improves the capabilities in terms of accuracy of predictions/simulations in relation to previous models. Overall, we provide an improved tool to the P. pastoris community for the physiological analysis and understanding  Integration, Upgrade and Validation of the P. pastoris' GEMs of this yeast. It is expected that on-going efforts in the functional (re)annotation of the P. pastoris genome will allow for further improvements of its GEMs by all the P. pastoris community. From a wider perspective, it also has to be pointed out the importance of curating and manually revising new GEMs of non-model organisms that are based on GEM scaffolds from related model organisms. Despite the comprehensiveness of these scaffolds, an exhaustive analysis of specific metabolic traits of the non-model organism is still essential to construct a GEM describes/predicts its metabolic phenotype accurately.