Bioenergetics-based modeling of Plasmodium falciparum metabolism reveals its essential genes, nutritional requirements, and thermodynamic bottlenecks

Novel antimalarial therapies are urgently needed for the fight against drug-resistant parasites. The metabolism of malaria parasites in infected cells is an attractive source of drug targets but is rather complex. Computational methods can handle this complexity and allow integrative analyses of cell metabolism. In this study, we present a genome-scale metabolic model (iPfa) of the deadliest malaria parasite, Plasmodium falciparum, and its thermodynamics-based flux analysis (TFA). Using previous absolute concentration data of the intraerythrocytic parasite, we applied TFA to iPfa and predicted up to 63 essential genes and 26 essential pairs of genes. Of the 63 genes, 35 have been experimentally validated and reported in the literature, and 28 have not been experimentally tested and include previously hypothesized or novel predictions of essential metabolic capabilities. Without metabolomics data, four of the genes would have been incorrectly predicted to be non-essential. TFA also indicated that substrate channeling should exist in two metabolic pathways to ensure the thermodynamic feasibility of the flux. Finally, analysis of the metabolic capabilities of P. falciparum led to the identification of both the minimal nutritional requirements and the genes that can become indispensable upon substrate inaccessibility. This model provides novel insight into the metabolic needs and capabilities of the malaria parasite and highlights metabolites and pathways that should be measured and characterized to identify potential thermodynamic bottlenecks and substrate channeling. The hypotheses presented seek to guide experimental studies to facilitate a better understanding of the parasite metabolism and the identification of targets for more efficient intervention.

We inferred the function of the mitochondrial BCKDH-PDH in P. falciparum using as reference the experimental characterization of this complex in P. berghei. We performed a bidirectional homology measurement of the protein sequences in P. falciparum and P. berghei with the RAVEN Toolbox [1] and we have inferred the orthologous genes in P. falciparum with the enzymatic function of BCKDH-PDH complex (Table SII). Based on of the BLASTp, the sequence similarity between the orthologous sequences is lower than 1E-172 for the four genes involved in the BCKDH-PDH complex, which suggests this function is also present in P. falciparum. Table SI. Experimental annotation of mitochondrial BCKDH-PDH in P. berghei as defined by Oppenheim et al. [13].  Compartmentalization. The compartmentalization process of iPfa involved the localization of enzymes in five intracellular compartments (i.e. the cytosol, the apicoplast, the mitochondrion, the endoplasmic reticulum and the nucleus) and the definition of transport reactions between these compartments. Nearly 25% of the enzymes in iPfa were localized based on experimental evidence. The information on experimental localization of enzymes in P. falciparum was obtained from the online databases ApiLoc [14] (last time updated in 2011) and from MPMP [15] (last time accessed in March 2015). The remaining 75% of the enzymes were assigned to the intracellular compartments based on localization scores. These scores compare the protein sequences to be localized with the amino acid sequences that target the proteins to the specific intracellular compartments, such as the bipartite targeting sequence for the apicoplast in P. falciparum [16]. The software used to generate the localization scores were TargetP [17] (version 1.1), MitoProt II [18] (version 1.101), and ApicoAP [19] (version 2). The input to these software was the FASTA file obtained from PlasmoDB (version 11.1, strain 3D7) containing the information on the protein sequences of P. falciparum, as described in the generation of a draft metabolic network. The experimental localization and localization scores obtained for the enzymes and the associated genes in iPfa are summarized in S7 Table. P. falciparum has other intracellular compartments (e.g. digestive vacuole, Golgi apparatus) that were not explicitly included in iPfa. These compartments were a priori considered in the reconstruction of iPfa. However, very few enzymes in iPfa (i.e., four enzymes in the digestive vacuole and one enzyme in the Golgi apparatus) were experimentally localized in these compartments (S7 Table). We then decided to include the corresponding reactions in the cytosol. For example, some of the reactions that occur in the digestive vacuole, such as the hemoglobin digestion/degradation, are included in iPfa and located in the cytosol. Other processes in this organelle, such as the accumulation of hemozoin and Ca 2+ , are not considered in this study since a (quasi)-steady state condition for the metabolites should be satisfied within flux balance and thermodynamics flux analyses.
Intracellular transport reactions allow the exchange of metabolites between intracellular compartments. Under the absence of information or experimental evidence about the transport reactions in P. falciparum [20], we defined transporters for all cytosolic metabolites that were also present in other intracellular compartments, only if these metabolites fulfilled three conditions: they were not phosphorylated, they did not present a Coenzyme A (CoA) and neither a [acyl-carrier protein] (ACP) moiety attached (following the same considerations that had been applied in previous studies, such as ToxoNet1 [21]). In these cases, the transport mechanism considered was simple diffusion. We allowed the transport of phosphorylated molecules that has been previously characterized in P. falciparum, such as the transport of phosphoenolpyruvate and glycerone phosphate into the apicoplast [22].
Biomass definition. The biomass reaction in a genome-scale metabolic model (GEM) comprises all major biomass components of the cell. We defined in the biomass reaction of iPfa five major biomass macromolecule groups, i.e., DNA, RNA, Protein, Lipids and Carbohydrates and others. We defined the metabolic precursors or biomass building blocks and their quantity in each macromolecule group based on available experimental data in the literature. The frequency of DNA and RNA nucleotides and amino acids in P. falciparum has been experimentally characterized and reported [23]. In the same way, we obtained the frequency of the lipid components in the parasite, Table 1 in [24] and Table 2 in [25]. The distribution of free fatty acids was estimated from available experimental results [25,26]. The biomass precursors for which no experimental quantification in P. falciparum was found were grouped in Carbohydrates and others. They represent nucleotide sugars, polyamines, prosthetic groups, coenzymes and ions, which are commonly defined as essential biomass components [27]. In this case, we calculated their frequency assuming a distribution proportional to the number of carbons present in their molecular composition. We did not define other putative biomass precursors like glycoproteins, which are part of the GPI-anchors, because detailed knowledge about their production from the sugars nucleotides was not found for P. falciparum. Instead their precursors, i.e. the nucleotide sugars, where defined as biomass components.
When it was not specified, the frequencies obtained from the different sources were assumed to be in %-mol/mol. All frequencies were normalized to calculate the final stoichiometric coefficients, as defined in standard reconstruction protocols [28]. Growth associated maintenance (GAM), which accounts for non-metabolic processes, was calculated based on the biomass composition, as commonly performed [28].
Definition of the available substrates in iPfa: uptakes and secretions. iPfa has an in silico rich medium composed of 236 potential substrates. The medium in iPfa was primarily compiled based on the general assumption about the transportability of the metabolites in the cytosol: unless evidence was found to define a specific transport mechanism, we allowed the transport of all molecules that do not incorporate phosphate, coenzyme A (CoA) and acyl-carrier proteins (ACP) in their molecular structures, and we assumed that transports occur through passive diffusion. We further included in the medium metabolites that were not present in the cytosol of the draft metabolic network, and that are part of the the RPMI 1640 media formulation [2] and the serum composition [3]. These metabolites were accounted in our gap-filling approach to identify precursors of the biomass building blocks. Such an approach allowed the definition of precursors like pantothenate, thymidine and succinate in the in silico rich medium of iPfa.
In addition, iPfa integrates available experimental data on uptake and secretion rates in P. falciparum trophozoites. The efflux values integrated included the maximum uptake rate of L-isoleucine (set to 0.053 mmol/h-gDW cell, as calculated from the value 553 ± 27 µmol/(10 12 cells per hour) [8]) and of D-glucose (set to 0.62 mmol/h-gDW cell, as calculated from the value 120 ± 34 µM glucose/(10 9 parasitized red blood cells per 24 hour) [7]), and the maximum secretion rate of lactate (set to 0.76 mmol/h-gDW cell, based on the value 143 ± 47 µM lactate/(10 9 parasitized red blood cells per 24 hour) [7] and consistent with previous data [29,30]). To calculate these values we assumed a cell dryweight for P. falciparum of 1.05•10 -11 gDW/cell [23], which represents around 30% of the erythrocyte dry-weight [31].
The predicted growth rate in iPfa is 0.16 h -1 after integration of the L-isoleucine uptake rate. This value is comparable with the growth rates derived from the doubling time in P. falciparum trophozoites [32]. The number of malaria parasites in the blood stages increases exponentially [33], hence Eq 1 can be applied to estimate the growth rate of P. falciparum in the blood stages.
where N(t) and N(0) is the number of cells at time t and time 0 and gr is the growth rate. The coefficient of N(t) and N(0) is two in the calculation of the doubling time.
For a doubling time of 12.3 h [32], the specific growth rate of P. falciparum in the blood stages is around 0.06 h -1 . In this study, we assumed a broader physiological range of growth rate, i.e. between 0.05 and 0.14 h -1 . This range accounts for the development of 14 and 32 merozoites within 24 or 48 hours as observed in the in vitro cultivation of P. falciparum during one cycle of erythrocyte infection.
Thermodynamic curation of iPfa. Data integrated: pH, membrane potential and concentration data. Statistics. The Gibbs free energy of a reaction (∆ r G') is a measure of its driving force and determines thermodynamically feasible direction under which the reaction can operate, defined as reaction directionality [34][35][36]. We integrated the thermodynamic properties of the metabolites and reactions in iPfa in the form of thermodynamic constraints following the systematic approach defined within the framework of Thermodynamics-based Flux Analysis (TFA) [4][5][6]. Thermodynamic constraints determine the feasible range of ∆ r G' and hence reduce the uncertainty in the reaction directionalities and with it the feasible solution space that is characteristic of highly underdetermined problems like the analysis of metabolic networks with Flux Balance Analysis (FBA). These constraints in iPfa accounted for the intracellular conditions, like the pH, the membrane potential and the intracellular concentration ranges of the metabolites. TFA further allowed the integration of experimentally measured concentration ranges of metabolites [37][38][39][40].
We defined a pH of 7.3 in the cytosol, based on the pH value and the membrane potential reported in [41,42]. Under the absence of information, we assumed the same pH value in the endoplasmic reticulum and the nucleus. We estimated a pH of 7.5 in the mitochondrion of P. falciparum, based on measurements in Saccharomyces cerevisiae [43] and the similar mitochondrial membrane potential that both organisms present, as argued below and as it has been hypothesized before for similar pathogens like Trypanosoma cruzi: "it is able to build up and retain a membrane potential of a value comparable with that of mammalian mitochondrion" [44]. We defined a pH value of 6.8 in the extracellular medium, i.e., the parasitophorous vacuole or in silico rich medium in iPfa, which was obtained from interpolation of Fig 9 in [42] for a ±95 mV membrane potential, and it is consistent with the statement in [45] that the extracellular pH should be higher than 6.5. We defined a membrane potential of ±95 mV for the extracellular membrane of the parasite, as measured in [42]. Based on the experimental studies in P. berghei and P. yoelii trophozoites [46,47] and Toxoplasma gondii tachyzoites [48], the membrane potential in the mitochondrion of Plasmodium parasites should be higher than ±150 mV. We assumed a membrane potential of ±180 mV for the mitochondrial membrane, which is the value in S. cerevisiae [49].
We created a thermodynamically curated version of iPfa where we allowed the concentration of every metabolite to vary between 1 µM and 50 mM, which is the physiological range used in similar TFA studies [4,50]. The extracellular metabolite concentrations were allowed to vary between 0.01 µM and 100 mM. We also generated various thermodynamically curated versions of iPfa integrating one-at-a-time each of the ten metabolomics data set considered in this study [37][38][39]. Nine metabolomics data sets were measured with NMR: eight were obtained from different isolates of P. falciparum trophozoite-infected red blood cells [38] and one from isolated trophozoites [37]. The remaining data set was measured with LC-MS in isolated trophozoites [39]. The physiological range of concentration (defined above) was considered for a metabolite, if no data was available in the metabolomics sets. For the metabolites present in more than one intracellular compartment of iPfa, the same concentration range was defined in all of these compartments. We also generated a combined metabolomics data set [37][38][39][40], where one unique concentration range was calculated for each metabolite appearing in multiple data sets. This unique concentration range comprised all the measured concentration values (Fig SI A and B). The absolute concentration values and the concentration ratios of the metabolites determine the final ∆ r G'. For example, the range of the ratio NADP/NADPH in the combined metabolomics data set varies between 1.6 and 6.0 mol/L cell / mol/L cell .

Fig SI A and B. Absolute concentration ranges in logarithmic scale (LN of mol/L of cell) of the metabolites in the combined metabolomics data set.
We calculated the thermodynamic properties of the compounds in iPfa using the group contribution method (GCM) [5,6]. In this study, the R groups of large fatty acids and phospholipids were not specified and hence their thermodynamic properties were not considered.
In addition, we checked all reactions for correct atom balancing at the pH considered in the corresponding intracellular compartments. Overall in iPfa, thermodynamic properties were calculated for 70% of the total number of enzymatic and transport reactions and nearly 84% of them did not present pre-assigned directionality. The pre-assigned directionalities in iPfa are based on the directionalities defined in the KEGG database, which were reviewed using the enzyme database BRENDA [51]. We additionally prevented the hydrolysis reactions ( thermodynamic constraints due to the connectivity of the metabolic network and the growth requirement, as discussed in previous studies on network thermodynamics [34]. Metabolic tasks and gap-filling. Metabolic tasks are defined in this study as the production of biomass building blocks [1,21]. The draft metabolic network of iPfa did not contain all the necessary reactions that allowed simulation of growth. This is a common situation in the reconstruction process of GEMs [28] and requires the inclusion of reactions without gene association in a process called gap-filling [52]. We evaluated each metabolic task in iPfa and performed gap-filling when a metabolic task was not feasible, i.e. when a biomass building block could not be produced in iPfa. We developed a gap-filling approach based on a mixed-integer linear programming (MILP) formulation to look for alternative groups of minimal number of reactions (borrowed from another metabolic network) that determined the feasibility of all metabolic tasks. This formulation allows the integration of thermodynamic constraints.
In the draft metabolic network of iPfa 24 of the 73 total metabolic tasks were not feasible. We used ToxoNet1 [21] to gap-fill 15 of these metabolic tasks. The remaining 9 metabolic tasks were gap-filled with reactions from the KEGG-based metabolic network as of 2015, which comprises all to date known enzymatic reactions in any organism [11]. These gapfilling reactions and alternatives are reported in S1 Table. Part 2. Analyses of iPfa Studies on essential genes and reactions. Following standard procedures, single reaction and gene knockout simulations were performed in iPfa; the reactions and genes were deleted one-by-one and growth was simulated. In a similar fashion, double gene knockouts implied the pairwise deletion of non-single essential genes. Reactions or genes were defined as essential when simulation of their knockout resulted in a growth rate smaller than 10% of the optimal predicted value, i.e., 0.16 h -1 . This value corresponds to a 90% reduction of the maximum specific growth rate predicted with iPfa when experimental efflux values are integrated (see the step on the definition of uptakes and secretions in the reconstruction process of iPfa to learn about the specific growth rate of P. falciparum). The results of gene and reaction essentiality in iPfa reported in S2 Table and S3 Table were obtained using the in silico rich medium composed of 236 metabolites.
The optimal predicted growth value was the same in all scenarios, i.e. using FBA and TFA. The threshold of 10% used to identify essential genes and pairs of genes does not have any impact on the identification of essential genes and reactions in this study (see Fig SII  A and B), since no knockouts led to in silico growth reducing phenotypes. No additional filtering was applied to identify essential genes and pairs of genes.
The comparison of single gene and reaction essentiality allowed to understand what enzymes of iPfa rendered the genes essential (S2 Table). Two classes of genes were identified based on the essentiality of its associated reactions. The first class involves genes that are essential because one (or more) of its encoded enzymes is (are) alone indispensable for growth. In this case, the gene is essential in silico and one (or more) of its associated reactions is (are) also essential in silico. The second class involves genes that are essential because a group of its encoded enzymes is indispensable (and not any of its encoded enzymes alone). In this case, the gene is essential in silico and none of its associated reactions appears as single essential, which happens when the gene-toreaction associations is not one-to-one. In S2 Table the reactions and enzymes that are responsible for the essentiality of the in silico essential genes are presented. This information is relevant because genes are normally classified as essential or non-essential based on the outcome from experimental gene knockouts, but medical therapies are normally designed to target essential enzymes in the pathogens. Thus, it is of interest to know, which of the enzymes encoded by the essential gene performs the essential function in the metabolic network of the pathogen.

Fig SII A and B.
Predicted phenotype upon single (A) and double (B) gene knockout in iPfa. Predictions made with TFA with the combined metabolomics data set integrated.
Gene essentiality per metabolic task. This analysis allowed the identification of the metabolic tasks or biomass building blocks that could not be performed or produced, respectively, upon knockout of the essential genes. In addition to the analysis described in Methods, a MILP formulation was defined to identify the groups of building blocks that could not be produced at the same time due to stoichiometric requirements. The analysis was performed with FBA and TFA.
The study of the metabolic tasks using TFA provides additional understanding of the effect of the knockouts. TFA constrains further the metabolic network and thus there are less alternative pathways to fulfill a metabolic task. In turn, when thermodynamics is taken into account, the number of metabolic tasks impacted by the knockout of one gene might increase. This analysis also suggests a thermodynamic dependency between the metabolic subsystems.

Ranking of metabolites based on the Reduction of Uncertainty (RoU) in the Gibbs free energy of the reactions (∆ r G').
FBA defines mass balance constraints for each metabolite in the GEM and solves a linear problem to determine the intracellular fluxes. Thus, the FBA of a genome-scale model with a higher number of reactions than metabolites is underdetermined and has an inherent uncertainty in its solution, referred to as feasible solution space [53]. TFA, with the integration of thermodynamic constraints reduces this space and allows only the thermodynamically feasible solutions. However, there is still an uncertainty or feasible range in the fluxes and other thermodynamic properties like the ∆ r G'. The range in the ∆ r G' is directly related with the concentration ranges of the metabolites participating in the reaction.
In iPfa a reduction of uncertainty in the concentration ranges of the metabolites and hence in the ∆ r G' of the reactions was achieved by integrating metabolomics data (see the

A B
description of the thermodynamic curation of iPfa to know the concentrations used). Here, studies were performed on iPfa to identify the metabolites from the metabolomics data sets used that determined the highest reduction of uncertainty in the ∆ r G', referred to as Reduction of Uncertainty (RoU) and defined in Eq 2. The RoU identifies which reactions are impacted individually by each metabolite concentration, to which extent are these reactions impacted and hence which are the most impacting metabolites (S1 Dataset). This analysis is independent of the studies on bottleneck metabolites ( Table 2, Materials and Methods), and allows a ranking of the metabolites based on a different metric. Consistent with experimental observations [54], the results of both studies indicate that the phosphorylated nucleotides appear among the most impacting metabolites.
The RoU in ∆ r G' was calculated for a reaction i by comparing its range of ∆ r G' when physiological ranges were assumed for all metabolites (defined as ∆∆ r G' ref,i in Eq 1) with its range of ∆ r G' when the experimental concentration range of one metabolite j was integrated (∆∆ r G' [met j], i ). A global RoU was then calculated by adding the RoU i of all the reactions in iPfa to define the overall impact of each metabolite (S1 Dataset).
The metabolite concentration ranges of the combined metabolomics data sets were used for this study (see the description of the thermodynamic curation of iPfa to know the experimental data that comprised this set). The range of ∆ r G' was obtained with Thermodynamic Variability Analysis (TVA) [4], which follows the same principles as the Flux Variability Analysis (FVA) [55]. In TVA a growth rate of iPfa between the 80% of its optimal value and its optimal value, i.e., between 0.14 and 0.16 h -1 , was required (see the step on the definition of uptakes and secretions in the reconstruction process of iPfa to understand the source of these values).
This study falls within the scope of a well developed framework called Thermodynamicsbased Metabolite Sensitivity Analysis (TMSA) that has been developed by A. Kiparissides & V. Hatzimanikatis [56] for the ranking of metabolites that reduce the uncertainty of different parameters in the metabolic networks and allow a better definition of the internal states of the cells.

Studies of bottleneck metabolites and reactions impacted.
This study identified the metabolites responsible for the thermodynamic bottlenecks, which determined the directionality of a set of reactions and allowed the identification of eight genes as essential ( Table 2). iPfa was used with generic concentration ranges (1 µM -50 mM) or simultaneous integration of the experimental concentration ranges, and each of the eight genes was knocked out separately (S2 Table). These models were feasible with FBA, but not with TFA. A MILP formulation was defined to search for the minimal number of metabolites whose concentration ranges should be relaxed to make the model feasible in TFA. All the alternative solutions were obtained, and the minimal sets were formed by picking one metabolite from each alternative. The minimal set should involve at least one metabolite from each alternative. The metabolites that were shared among more alternatives appeared in the minimal sets.

Studies on in silico minimal media (IMM).
The IMM analysis was performed following the strategy defined before [21]. Here, TFA was applied with the combined metabolomics data set integrated in iPfa (see the description of the thermodynamic curation of iPfa to know the experimental data that comprised this set). In this analysis, the growth rate was set to 0.06 h -1 to study the nutritional requirements of normally growing P. falciparum (see the step on the definition of uptakes and secretions in the reconstruction process of iPfa to learn about the specific growth rate of P. falciparum). This means that the minimal nutritional requirements of iPfa were not underestimated (since a minimal growth was required) and were not overestimated (since growth was not allowed to reach extremely big and biologically irrelevant values).
The IMM analysis of iPfa suggested that iPfa requires as little as 23 substrates for growth and identified 10,032 alternative such IMM. There are 52 substrates that compose the alternative IMM and 16 of these substrates appear in all alternative IMM, here referred to as constitutive substrates. The remaining 36 substrates, here referred to as nonconstitutive substrates, combine in groups of size seven to form the alternative IMM. In this way, each IMM, which is defined by 23 substrates, is composed of 16 constitutive substrates and seven non-constitutive substrates. A priori one would expect all the combinatorial possibilities of 36 substrates in groups of size seven to form the alternative IMM: 36!/((36-7)!*7!) = 8,347,680 combinations. However, the relatively low number of alternatives found in the IMM analysis suggests that not all combinations of the 36 substrates are possible, which indicates that not all substrates can substitute for each other.

Identification of groups of substrates that can substitute for each other for growth.
Given the requirement of minimal utilization of substrates in the IMM studies, one would expect that some of the non-constitutive substrates that do not appear in the same IMM could substitute for each other. We created groups of non-constitutive substrates that did not belong to the same IMM and found that 26 of the 36 components were substitutable, suggesting that each of these components serve a specific biosynthetic requirement. However, ten can be substituted by one or more than one substrate in the IMM, which indicates that they serve multiple biosynthetic requirements, as discussed for S-adenosyl-L-methionine in the main text. This result demonstrated that the non-appearance of two substrates in the same IMM is a necessary but not sufficient condition to define the substrate substitutability. We next looked at the molecular structure of the 36 nonconstitutive substrates and identified the backbone moieties. Substrates were grouped if they never appeared in the same IMM and presented a common molecular substructure or backbone moiety.
To validate that the substrates with a backbone moiety were able to substitute for each other and support growth, we performed the following analysis: First, we tested that iPfa could not grow when all substrates in a group were removed from the in silico medium of 52 metabolites identified in the IMM. Second, we tested that under such conditions, the inclusion of each substrate of the group individually allowed simulated growth in iPfa. This analysis was also performed in the rich medium of 136 substrates to validate the groups of substrates that contained the three backbone moieties (the sources of carbon, phosphate and purine) indicated in Fig 3. The 10,032 alternative IMM can be regenerated by combining all the constitutive metabolites with one non-constitutive metabolite from each group reported in Table 2.
However, the following constraints should be considered: • There should be at least one molecule that provides a ribose as source of carbon.
In the IMM of iPfa, this ribose is provided by some molecules that also serve as source of pyridine ring or as source of DNA nucleotides (marked with * in Table 2). • The combination between orthophosphate and S-Adenosylmethioninamine is not possible.

Analysis of thermodynamic infeasibility in iPfa to utilize orthophosphate and S-Adenosylmethioninamine (SAM) as the only sources of phosphate and purines.
The TFA of iPfa with metabolomics data integrated suggested that P. falciparum cannot utilize orthophosphate and SAM as the only sources of phosphate and purines, respectively.
We designed two sets of analyses to investigate the reason for this observation. 1. Identification of biomass building blocks (BBBs) whose production is impacted and limit the parasite's growth. 2. Identification of thermodynamic bottlenecks responsible for the thermodynamic infeasibility of P. falciparum to grow on orthophosphate and SAM.
In these analyses orthophosphate and SAM were forced to be the only sources of phosphate and purines. This was done by blocking the uptake of the remaining substrates that are sources of these backbone moieties in the in silico rich medium, which are defined in Figure 3. We performed TFA with the combined metabolomics data set.

Analysis 1. Identification of biomass building blocks (BBBs) whose production is impacted and does not allow growth.
When orthophosphate and SAM were forced to be the only sources of phosphate and purines no biomass growth was predicted. We tested the production of each BBB at a time to identify the cellular processes that could not be fueled with these metabolites. Interestingly, all BBBs could be individually produced. In this case, no growth was predicted because some BBBs could not be produced at the same time in the needed stoichiometric amounts. This result suggests that all cellular processes can function individually when orthophosphate and SAM are the only sources of phosphate and purine. However, the utilization of orthophosphate and SAM does not allow the required distribution of resources (e.g., carbon, energy or redox power) between the metabolic pathways needed for growth.
We next aimed to quantitatively understand the impact on the BBB production when orthophosphate and SAM were utilized. We compared the production of each BBB when all sources of phosphate and purine were available (as defined in Figure 3) and when only orthophosphate and SAM were used. The maximum production of 22 BBBs was considerably reduced in the latter scenario (Table SIV A). The most impacted BBB was ATP. This result suggests that P. falciparum might not be able to grow on orthophosphate and SAM because these precursors do not allow enough production of ATP for the cellular processes required for growth.  We then investigated which BBBs could not be produced at the same time in the needed stoichiometric amounts. We developed a MILP formulation in which we aimed to minimize the number of BBBs whose production should be abolished to allow stoichiometric production of the remaining BBBs. The lower bound for the BBB demand reaction (LB) was set to 10% of the maximal production of each BBB (UB) at 0.06 h -1 : We identified three such sets of BBBs (Table IV B). Table IV B. BBBs whose production should be abolished to allow stoichiometric production of the remaining BBBs. All alternative solutions are reported. This result supports the previous observation of ATP limitation when P. falciparum grows on orthophosphate and SAM. And it further identifies the BBBs that are mostly impacted upon ATP limitation: the production of sphingomyelin, nucleotides and nucleotide sugars. The sets identified also indicate the parts of the metabolism that compete for the available resources such as ATP.

Alternative sets BBBs that imbalance production of biomass
Furthermore, we identified that an unlimited production of Ubiquinone-8 in the mitochondrion or Sphingomyelin in the cytosol would allow growth of P. falciparum on orthophosphate and SAM. In this case, the production of ATP and the nucleotides is increased and there is stoichiometric production of the BBBs. This result suggests that P. falciparum might use the electron transport chain to increase the ATP production.
Note: We also defined the MILP formulation such that the minimal (LB) and maximal (UB) bounds of the BBB demands equal the maximum production rate of the BBBs at 0.06 h -1 . In this case, 24 additional sets appear (compared with the sets defined in Table IV B). ATP is a component of all these sets and it is combined with BBBs such as CoA, FAD + , NAD + , NADP + , nucleotides, and nucleotide sugars. This result further supports the conclusion that there exits ATP limitation to produce biomass when P. falciparum grows on orthophosphate and SAM.

Analysis 2. Identification of thermodynamic bottleneck that are responsible for the thermodynamic infeasibility to grow on orthophosphate and SAM.
The infeasibility of iPfa to grow on orthophosphate and SAM was predicted with TFA and metabolomics data integrated. We applied the analysis of bottleneck metabolites to identify the metabolites responsible for the growth limitation.
Six bottleneck metabolites distributed in two sets of five bottleneck metabolites were identified (Table IV C). This result indicates that the concentrations of ATP together with other four metabolites render thermodynamic bottlenecks that do not allow growth on orthophosphate and SAM. The bottleneck metabolites identified participate in the production of phosphatidylcholine, sphingomyelin, nucleotides and nucleotide sugars. As suggested in the previous analysis (Table IV B), these metabolic processes might compete for the available resources. The results here presented further indicate that there might exist a thermodynamic coupling between the production of sphingomyelin, nucleotides and nucleotide sugars through the concentrations of the six bottleneck metabolites.
We next investigated where the thermodynamic bottlenecks were located in the metabolism of P. falciparum. We identified the reactions whose directionality was impacted upon integration of the bottleneck metabolite concentrations. We compared the directionality of the reactions in iPfa in two conditions: when the generic concentration ranges (1 µM -50 mM) were defined for all intracellular metabolites, and when the experimental concentration ranges of the bottleneck metabolites in each set was further integrated. The directionality of the reactions was identified through Thermodynamic Variability Analysis (TVA) [4] as explained in the Materials and Methods section. Both sets of bottleneck metabolites impacted the same reaction directionalities. In these analyses all sources of phosphate and purine were available (as defined in Figure 3).
We identified two reactions whose directionality changed upon integration of the bottleneck metabolite concentrations (Table IV D). The reactions impacted participate in the production of phosphatidylcholine and imply the consumption of ATP. Gene essentiality in the IMM. The results on essential genes that can become essential upon substrate inaccessibility reported in S5 Table were obtained by defining one-by-one the 10,032 alternative in silico minimal media (IMM), which are composed uniquely of 23 substrates. The composition of all IMM can be derived from the Table 2. The methods used for the identification of the essential genes and reactions in these analyses are reported in S2, S3 and S5 Tables. The studies on essentiality with TFA follow the same principles as with FBA and further account for thermodynamic constraints, which reduce the feasible solution space from FBA by allowing only thermodynamically feasible solution of fluxes (see the studies on Reduction of Uncertainty). The constraints in the reaction directionalities are responsible for the identification of the additional essential genes observed with TFA. At the same time, this implies that the genes identified as essential with FBA are always a subset of the genes identified as essential with TFA.
Although we expect that most of the genes that TFA predicted to be essential in the in silico rich medium (S2 Table) will also be essential in vivo, some of the genes identified as non-essential in this study could be essential under certain in vivo conditions. The model and the IMM analysis framework presented here can be used to further explore which substrates are not accessible by the cell or are not available in sufficient quantities to P. falciparum inside the host. For example, a future comparison of in vivo knockout highthroughput data and this essentiality analysis on the IMM will advance our understanding of the substrates that are accessible to the malaria parasite in the host.
Substrate channeling integration within TFA. Substrate channeling between two enzymes defines the process in which the intermediate produced by one enzyme is transferred to the next enzyme without complete mixing with the bulk phase [57]. This process might occur to facilitate the flux through a metabolic pathway. The presence of substrate channeling between two enzymes (E 1 and E 2 ) was simulated within the TFA framework. The reactions R 1 and R 2 catalyzed by E 1 and E 2 were lumped to define an overall reaction (L= R 1 +R 2 ). The overall reaction eliminates the common intermediates and recycled metabolites, since they are produced in the first reaction R 1 and consumed in the second reaction R 2 and vice versa, respectively. For example, if R 1 involves the transformation A + B ! I + C and R 2 is defined as I + D ! P + B, the final lumped or overall reaction L is defined as A + D ! P + C. The overall reaction L does not involve the intermediate metabolite I and the recycled metabolite B. It is important to note that reactions R 1 and R 2 should be first defined in the direction that satisfies mass balances and allows flux through the pathway of study. Then the lumped reaction can be formed. For example, if the pathway of study contains reactions R 1 and R 2 and normally the starting metabolite is A and the product of the pathway is P, the reactions R 1 and R 2 should be defined as A + B ! I + C and I + D ! P + B, respectively, and not as A + B ! I + C and P + B ! I + D. In this study, the reactions of the pyrimidine biosynthesis pathway and the Kennedy pathway were defined to allow production of UMP and PE, respectively. The overall reaction was then allowed to be bidirectional. Thermodynamic properties were calculated for the overall reactions and TFA and Thermodynamic Variability Analysis (TVA) [4] were performed to determine the thermodynamically feasible directionality of the overall reactions. Substrate channeling was suggested when the directionality of the overall reaction allowed flux through the biosynthetic pathway that would be otherwise thermodynamically infeasible. The directionality of the reactions was known from the sign of the range in the ∆ r G' obtained from TVA.
We hypothesize that substrate channeling might exist in three pathways (two of which are discussed in the main text): • It is important to note, that the presence of a channel in the Kennedy pathway, does not allow the correct identification of the gene PF3D7_0927900 as essential in iPfa using TFA with metabolomics data integrated. Instead, this gene appears in six additional synthetic lethal pairs (Table SV). The gene PF3D7_0927900 encodes a phosphatidyl serine carboxylase (E.C. 4.1.1.65) and has been observed as essential in the blood stages of P. falciparum using drugs [58]. We need to simulate different scenarios to identify PF3D7_0927900 as essential while channeling is present in the Kennedy pathway. For example, we can consider that there is a maximal allowable flux through E.C. 4.1.1.65 and the Kennedy pathway and therefore both pathways are required to maintain optimal growth. In addition, regulation might exist between these redundant pathways.
In the same way, the presence of a channel in the pyrimidine biosynthetic pathway implies that the backbone moiety in orotate and (S)-dihydroorotate can be synthesized by the cell. The uptake of this backbone moiety is then non-essential. Experimental studies have suggested that orotate uptake of in vitro extracellularly developing malaria parasites are a direct measurement of parasites growth [59]. The presence of orotate in the medium has been shown to inhibit the function of the DHOase enzyme [60]. This observation can be reflected in the model if we define a regulatory loop between orotate uptake and the pyrimidine biosynthesis pathway.
Using iPfa to study its metabolism in the blood and liver stages. iPfa can be implemented to study the metabolism of P. falciparum during the blood and liver stages.
Here, we suggest some modifications of iPfa that would render better context-specific predictions.
For modeling of blood-stage specific metabolism, the protein N6-(lipoyl)lysine in apicoplast (whose metabolite ID in iPfa is C16237_a) should be removed from the biomass. This metabolite is a precursor of the lipoyl-ACP, which is a cofactor of the pyruvate dehydrogenase complex in apicoplast, and its production requires the fatty acid synthesis II (FAS II) pathway in this compartment to be active. This pathway has been experimentally observed as dispensable during the blood stages but essential during the liver stages [61] (S2 Table).
For modeling the liver-stage specific metabolism, the uptake of hemoglobin (whose rxn ID in iPfa is T_c_to_e_C05781) should be prohibited and the reaction that represents the hemoglobin digestion (HBDG_c) should be blocked or erased, since this molecule is not supposed to be accessible inside the hepatocyte cell.

Part 3. Comparison between iPfa and the previous models of P. falciparum metabolism iTH366 [62] and PlasmoNet [63].
The model iPfa was compared at its initial and final reconstruction stages with previous models of P. falciparum metabolism. Here we provide the final comparison of the metabolic reactions in terms of E.C. identifiers (Fig SIII A), the BBBs (Fig SIII B) and the genes included (Fig SIII C).
The comparison of metabolic reactions shows that the models share a big set of enzymatic functions, i.e. 230 ECs. The model PlasmoNet involves the highest number of metabolic reactions, i.e. 475 ECs, followed by iPfa and iTH366. Since PlasmoNet does not include genes, it was not possible to perform a critical evaluation of the missing ECs in iPfa. If the unique metabolic reactions to PlasmoNet were not orphan, we would have considered their integration in iPfa. The unique metabolic reactions to iTH366 were not included in iPfa because they were orphan reactions or they were associated to genes with uncertain annotation.
The genes in iPfa could only be compared with iTH366, since no genes are provided with PlasmoNet. The models iPfa and iTH366 share a big set of 245 genes. The previous model iTH366 includes 122 genes that were carefully evaluated at the initial stages of the reconstruction of iPfa (July-December 2014). These genes were not included in iPfa because they were associated to putative annotations in PlasmoDB and no reference in the literature was found to support the suggested annotation.
We further compared iPfa with the previous models in terms of ad hoc reaction directionalities, blocked reactions and essential genes (Table SVI). The models iPfa and iTH366 share similar characteristics in terms of blocked reactions and essential genes. This occurs although iPfa is much more relaxed in terms of substrate availability, metabolite transportability and ad hoc reaction directionalities. To this end, iPfa represents an ideal scaffold for the integration of context specific information and the identification of an increased number of essential genes for specific life stages.