Genome-scale metabolic model of the diatom Thalassiosira pseudonana highlights the importance of nitrogen and sulfur metabolism in redox balance

Diatoms are unicellular photosynthetic algae known to secrete organic matter that fuels secondary production in the ocean, though our knowledge of how their physiology impacts the composition of dissolved organic matter remains limited. Like all photosynthetic organisms, their use of light for energy and reducing power creates the challenge of avoiding cellular damage. To better understand the interplay between redox balance and organic matter secretion, we reconstructed a genome-scale metabolic model of Thalassiosira pseudonana strain CCMP 1335, a model for diatom molecular biology and physiology, with a 60-year history of studies. The model simulates the metabolic activities of 1,432 genes via a network of 2,792 metabolites produced through 6,079 reactions distributed across six subcellular compartments. Growth was simulated under different steady-state light conditions (5–200 μmol photons m-2 s-1) and in a batch culture progressing from exponential growth to nitrate-limitation and nitrogen-starvation. We used the model to examine the dissipation of reductants generated through light-dependent processes and found that when available, nitrate assimilation is an important means of dissipating reductants in the plastid; under nitrate-limiting conditions, sulfate assimilation plays a similar role. The use of either nitrate or sulfate uptake to balance redox reactions leads to the secretion of distinct organic nitrogen and sulfur compounds. Such compounds can be accessed by bacteria in the surface ocean. The model of the diatom Thalassiosira pseudonana provides a mechanistic explanation for the production of ecologically and climatologically relevant compounds that may serve as the basis for intricate, cross-kingdom microbial networks. Diatom metabolism has an important influence on global biogeochemistry; metabolic models of marine microorganisms link genes to ecosystems and may be key to integrating molecular data with models of ocean biogeochemistry.


Introduction
Diatoms are unicellular photosynthetic eukaryotes derived from a secondary endosymbiotic event when a heterotrophic eukaryote engulfed a red algal cell and acquired a plastid [1]. They electron sink under these conditions. Next, we simulated growth in a nitrate-limited batch culture with dynamic Flux Balance Analysis (dFBA) using experimental biomass composition and PSII flux measurements from Liefer, et al. [25,26]. We found that sulfate reduction takes over the role of nitrate under these conditions. When biomass production is inhibited due to nutrient limitation, redox imbalances can be corrected by the excretion of organic carbon and sulfur compounds.

Network reconstruction and curation
A genome-scale metabolic model of Thalassiosira pseudonana CCMP 1335 was generated using iLB1027_lipid (the model of Phaeodactylum tricornutum CCAP 1055/1, [27]) as a starting point, based on similarities between the two diatoms. The T. pseudonana nuclear proteome was acquired from a dataset produced by Gruber, et al. [28], in which previous open reading frames (ORFs) were improved by ensuring that each gene starts with 'ATG', encodes an uninterrupted reading frame that ends in a stop codon, is less than 10 kb in length, and has EST support. The ORFs used in a T. pseudonana network reconstruction for BioCyc (including plastid and mitochondrial proteomes) were retrieved as well [29]; these proteins were re-annotated in 2012 using the JGI annotation pipeline for eukaryotes [30]. The P. tricornutum chromosomal proteome was downloaded from EnsemblProtists (ASM15095v2), the plastid and mitochondrial proteomes were downloaded from NCBI (acc no.: NC_008588.1, HQ840789.1). The authors of iLB1027_lipid provided a gene ID conversion table that we used to update the gene IDs in the published model. OrthoMCL [31] was used with the default 50% match and 1e-5 E-value cut-offs to identify gene orthologs of P. tricornutum and the two sets of T. pseudonana ORFs. A network of T. pseudonana reactions was generated by retaining reactions in iLB1027_lipid that contained gene orthologs from T. pseudonana and deleting reactions with no gene ortholog, except for spontaneous reactions. Protein localization predictions were performed on both T. pseudonana ORF sets (see Subcellular Protein Localization, below). BiGG IDs [32] were used for reactions and metabolites in the T. pseudonana network, and these were assigned to one of six different compartments: cytosol ('c'), mitochondria ('m'), peroxisome ('x'), plastid ('h'), thylakoid lumen ('u'), or endoplasmic reticulum ('r'). We implemented the guidelines from an established genome-scale reconstruction protocol [33] to refine the T. pseudonana model. All genes from Gruber, et al. [28] with orthologs in the reconstruction, all genes assigned to a reaction in BioCyc [29], and all T. pseudonana genes without an ortholog in P. tricornutum were annotated with InterProScan [34] and those annotations were used to verify gene-protein-reaction associations, and to detect missing genes, reactions, and pathways in the model. We used KEGG [35] and BioCyc [29] databases to aid in model curation and to make comparisons between organisms. The literature on T. pseudonana was examined for experimental evidence for the existence of different reactions and for protein localization data (see references and notes in S1 Dataset). When experimental protein localization data was available, it superseded the subcellular localization prediction. For each reaction, mass and charge balance were verified and links to external databases were added for each reaction and metabolite. TransportDB 2.0 [36] was used to generate a list of transporter gene annotations; transport reactions were added to the model in cases where the database included associated substrates with the annotation. Extracellular transport reactions were also added in cases where there is experimental evidence that a substrate is excreted or utilized by T. pseudonana or other diatoms (see references and notes in S1 Dataset).
Dead-end metabolites are metabolites present only in blocked reactions; blocked reactions cannot carry flux due to reactions missing in the network and dead-end metabolites cannot be produced by the model. These reactions were identified with the flux analysis module in COBRApy and the gapfilling module was used to identify gaps in the network [37]. Gaps were filled if a gene for the missing reaction could be identified, if there is physiological evidence that the reaction exists, if the majority of the pathway was otherwise present in the model, or if the reaction was required to produce biomass. In T. pseudonana, we first checked whether the reaction was present in another compartment and if there was any evidence that the subcellular localization prediction was too stringent (e.g., a low confidence prediction may be more likely based on the localization of other reactions in the pathway), if there is the possibility of dual-targeting, or if a different gene with the correct localization for the reaction could be identified. Occasionally, the JGI ORFs (rather than the ORFs from the Gruber proteome [28]) provided the missing gene for a reaction, or a more likely subcellular localization prediction. If there was no evidence that a protein in the model was incorrectly targeted, then transport reactions between compartments were added to connect the network. To identify and remove energy-generating cycles (EGC), we followed the method proposed by Fritzemeier, et al. [38], by using the GlobalFit algorithm [39] to suggest the minimum number of changes required to remove an EGC. The algorithm found that the reaction transporting water between the cytosol and the mitochondria ('H2Ot_m: h2o_c < = > h2o_m') and the ITP-apyrase reaction ('ITPA_c: h_c + idp_c + pi_c-> h2o_c + itp_c') created energy-generating cycles. EGCs were removed by setting the lower bound of H2Ot_m to zero and by setting the upper bound of ITPA_c to zero.
Broddrick, et al. [40] published a list of twenty-nine modifications to the P. tricornutum model in 2019. We evaluated whether these modifications should also apply to iTps1432 and adapted our model accordingly (S3 Dataset). iTps1432 is available in SBML format as S1-S3 Files and has been deposited in the BioModels database (acc no.: MODEL2010230001-3).

Subcellular protein localization
The protein localization pipeline developed by Levering, et al. [27] was updated for the T. pseudonana analysis (https://github.com/hmvantol/localization_pipeline). For plastid targeting predictions, TargetP [41] was replaced with ASAFind [28], a plastid proteome prediction tool developed for diatoms and other algae with plastids derived from a secondary endosymbiosis. All T. pseudonana proteins were used as input for SignalP [28], predictNLS 1. 3 [45], and scanProsite [46]. PredictNLS, a tool for predicting nucleus targeted proteins, was run in batch mode using a script that re-implements predictNLS 1.3 in Python (https://github.com/peterjc/pico_galaxy/ tree/master/tools/predictnls). ScanProsite was run to search for two peroxisomal targeting signals "[SAC]-[KRH]-[LM]>" and "S-S-L>" [47] and the PROSITE pattern PS00342 describing microbody C-terminal targeting signals, as well as the endoplasmic reticulum (ER) targeting signal "[KD]-[DE]-E-L>" [41,48] and the PROSITE pattern PS00014 describing other endoplasmic reticulum targeting sequences. All other programs were run with default settings. ERtargeted proteins were defined as 'Not plastid, SignalP positive' or 'Plastid, low confidence' by ASAFind and contained an ER-targeting signal identified by scanProsite. Plastid-targeted proteins include those identified by ASAFind as 'Plastid, high confidence' or 'Plastid, low confidence' with no recognized ER targeting signal. Mitochondria targeted proteins are SignalP negative and predictNLS negative and match one of the following criteria: (A) have a Mitoprot II score > 0.9, (B) have a Mitoprot II score > 0.8 and are mitochondria targeted according to HECTAR or have a mitochondrial targeting peptide according to TargetP, or (C) are predicted to be mitochondria targeted by HECTAR and have a mitochondrial targeting peptide according to TargetP. Peroxisome targeted proteins are SignalP negative, predictNLS negative, not mitochondria targeted, and contain a peroxisome targeting signal according to scanProsite. Proteins in the plastid and mitochondrial genomes were assigned to reactions in the plastid and mitochondria, respectively. All remaining proteins were assigned to the cytosol. A total of 408 sequences from the optimized gene catalog could not be run through this subcellular localization pipeline despite curation by Gruber, et al. [28] because they have internal stop codons or were either too long or too short for some of the programs.

Mechanistic model of light-harvesting
Using the Synechococcus elongatus model iJB785 [49] as a guideline, stoichiometric reactions were generated to represent light harvesting in T. pseudonana. The pigment weight and composition [50,51] of cells acclimated to each light-level were used in combination with the weight-specific absorption spectra for each pigment [52,53] to calculate the relative absorption of each pigment within 20 nm bins in the photosynthetically active radiation range (PAR range; 400-700 nm). Excitation energy transfer reactions were generated to account for energy loss in the transfer of excitation energy from different pigments in the fucoxanthin-chlorophyll a/c binding proteins (FCPs) to chlorophyll a in the reaction centers. Photon absorption was constrained for each wavelength using the absorption spectrum of T. pseudonana cells acclimated to different light levels [54][55][56] and the light intensity spectrum of a cool white fluorescent bulb according to the methodology in Broddrick, et al. [49]. We also extended the PSI and PSII reactions to include charge separation and recombination, as in iJB785. Photodamage of the D1 subunit was included as a component of the PSII reaction [49,57], and the metabolic cost of D1 repair was included as part of a non-growth associated ATP maintenance reaction as ATP-cost of phosphorylation and activation of the FtsH protease [58] and as ATP-and GTP-costs of biosynthesizing a D1 peptide. See S3 Dataset for calculations and references.

Flux balance analysis
Flux balance analysis (FBA) simulates the flow of metabolites through a network of reactions using the mass balance equation, where the change in metabolite concentration (dx) over time (dt) is equal to the stoichiometric matrix (S) describing a reaction network times a vector of fluxes (v), given that Intracellular metabolites are assumed to be at steady state, which allows v to be solved while maximizing an objective function within lower and upper bounds (LB, UB).
We used parsimonious Flux Balance Analysis (pFBA) to generate single solutions for each simulation. pFBA optimizes the model objective and then minimizes the total sum of flux to obtain a single solution. The second objective imitates a possible cellular objective of minimizing protein biosynthesis. The default GLPK solver in COBRApy [37] was used to optimize all FBA problems.

Simulation of steady-state light limitation
pFBA was used with the solver tolerance set to 1e-8 to simulate growth of iTps1432 in three chemostats maintained at three different light levels (5,60,200 μmol photons m -2 s -1 ) using photosynthetic production measurements from Fisher & Halsey [24] as constraints. The delivery rate of each nutrient was calculated based on the nutrient concentration in the media reservoir (0.25 mM NO 3 , 0.050 mM PO 4 , 0.106 mM Si(OH) 4 , 28.8 mM SO 4 ), the chemostat cell concentration, the gram dry weight per cell calculated from biomass composition, and the chemostat dilution rate. Cultures were continuously bubbled to avoid carbon limitation; we assumed the ambient CO 2 levels to be 390 ppm pCO 2 and calculated the resulting concentrations of CO 2 and HCO 3 in the media [57]. Michaelis-Menten parameters were calculated from the literature on T. pseudonana for CO 2 and HCO 3 [59]. The PSII reaction was constrained by fitting the Platt equation [60] (where P s and α are parameters of the hyperbolic tangent) GPP ¼ P s � ð1 À e À aI=P s Þ to each gross photosynthesis (GPP) curve generated by Fisher and Halsey ([24], data provided by K. Halsey) and calculating the 95% confidence interval of gross photosynthesis at each light level.
Photon absorption flux (PFA) was constrained for each 20 nm in the PAR spectral range using the method published in Broddrick, et al. [49], where the term r l 20 is the fraction of the light source's photon flux at a particular wavelength integrated within a 20 nm bin, I is the light intensity (μmol photons m -2 s -1 ), and a l 20 is the absorption of light at that wavelength (m 2 gDW -1 ) [54][55][56]. The linear relationship between irradiance and D1 protein damage in T. pseudonana [57] was used to calculate the number of inactivation events per photon at each light level. The quantum efficiency values measured by Fisher & Halsey [24] were used to convert the number of inactivation events per photon to events per oxygen molecule evolved at PSII (S3 Dataset).
For these simulations, sink reactions for chrysolaminarin and glyceraldehyde-3-phosphate were added to the model to simulate the respiration of a transient carbon pool. Utilization of these pools was calculated as the difference between gross and net carbon production [24]. Measurements of dissolved organic carbon were used to constrain a symbolic expression representing the sum of organic carbon secretion. Light dependent respiration and mitochondrial maintenance respiration were calculated by Fisher & Halsey for each chemostat [24]. Light dependent respiration is defined as the difference between 18 O 2 signals in the light and the dark and includes all light-driven respiration reactions (the reduction of oxygen to water). We constrained the sum of all respiration reactions with a symbolic expression in which the lower bound is equal to the measured light dependent respiration values. Mitochondrial maintenance respiration is defined as the difference between gross carbon production and net oxygen production (converted to C units) and includes all organic carbon driven respiration reactions. We constrained the sum of all CO 2 producing dehydrogenase reactions in the mitochondrial TCA cycle with a symbolic expression in which the lower and upper bounds are equal to the 95% confidence intervals of calculated mitochondrial maintenance respiration values. A Jupyter notebook of this analysis can be viewed at https://github.com/hmvantol/diatom-FBAnotebook.

Simulation of N-starvation
A dynamic FBA simulation was set up with the solver tolerance set to 1e-8 to simulate the progression in a batch culture of T. pseudonana from mid-exponential under NO 3 -limitation to mid-stationary phase under N-starvation using biomass composition and PSII photochemistry measurements from Liefer et al. [25,26]. Prior to the start of the experiment, cells were maintained at 85 μmol photons m -2 s -1 over a 12h: 12h light: dark cycle at 18˚C in filtered seawater amended with half the concentration of all f/2 nutrients except for NO 3 , which was reduced by~15 fold to 60 μM. At mid-exponential phase, the cells were diluted into fresh media with no added NO 3 [25,26] and measurements were taken 0, 1, 3, 7, and 10 days after the dilution to "N-free" media, corresponding to mid-exponential, late-exponential 1, lateexponential 2, early stationary, and mid-stationary growth phases, as defined in Liefer et al. [25]. Nutrient concentrations in the seawater used for the media were not presented. We assumed a background seawater nutrient concentration of 20 μM NO 3 , 0 μM Si(OH) 4 , and 1.86 μM PO 4 . This assumption fit well with most nutrient measurements made 1 day after transfer to the new media and with estimates of nutrient concentrations in surface seawater at Cape Tormentine (Canada) from the World Ocean Atlas (~28 μM NO 3 ,~47 μM Si(OH) 4 , 2 μM PO 4 ).
To simulate the ten-day experiment, the growth period was divided into eighty 3-hour intervals. Steady-state was assumed for each time interval and pFBA was used to solve for the growth rate and reaction flux distributions every 3 hours. Units were converted from mmol to μmol prior to the pFBA step to avoid numerical precision issues associated with small fluxes. Each time point of the simulation involved four major steps: (1) calculate the biomass objective function, (2) calculate the uptake rates for each metabolite or nutrient and set photosynthetic constraints, (3) solve the pFBA problem, (4) update biomass composition and the environment.
The biomass objective function (bof) used for each time interval was determined by calculating the relative difference between the simulated biomass composition at the current time point (t 1 ) and the next biomass composition measurement (t m ) provided by J. Liefer.
where each measured biomass component i is translated from pg/cell (bio i (t m ) 0 ) into g/L (bio i (t m )) using cell abundance measurements (c(t 2 )), and typically f = 1. During the dark period, pigment biosynthesis was not included in the biomass objective function since T. pseudonana lacks a light-independent protochlorophyllide oxidoreductase [61]. Chrysolaminarin and triacylglycerides were also excluded from the dark period biomass objective function because these biomass components are known to be consumed at night [62]. Production of these compounds was increased during the light period by setting f to the fold-change values measured by Jallet, et al. in P. tricornutum grown across a 12: 12-h light: dark cycle [62]. During time points where all Δbio i components are < 0, the objective function was switched to the ATP maintenance reaction (ATPM). At each time point, the upper bound of ATPM was recalculated for changes in mg chl a/gDW using the r NGAM equation [63]. Individual pigment components (total chlorophyll, chlorophyll a, and total carotenoid) were also measured over the course of the experiment [26]. Data from the literature shows that the metabolites making up bulk biomass components shift in relative abundance over the course of N-starvation and during growth in batch culture. These were similarly tracked and used as targets to construct new biosynthesis reactions for each time point. Protein [64] and free amino acid [65] composition measurements from nitrogen-replete and nitrogen-starvation conditions were used to calculate biomass targets for each growth phase; nitrogen-replete composition was applied as a target for mid-exponential and late exponential phase 1, while nitrogen-starved composition was applied as a target to late exponential phase 2, early and mid-stationary phases. Membrane lipid [66] and triacylglyceride [67] composition was measured during exponential, transition, and stationary phases in nitrogen-limited conditions. We used these measurements to calculate biomass targets for each growth phase: exponential composition was used as a target for mid-exponential and late exponential phase 1; transition phase composition was used as a target for late exponential phase 2; and stationary phase composition was used as a target for early and mid-stationary phase. See S4 Dataset for all calculations and references. Using the method described by Chiu et al. [68] and published by Noecker et al. [69], we determined the uptake limit for each metabolite j at each time point t, such that the lower bound of each exchange reaction is equal to either the flux predicted by the Michaelis-Menten equation or the concentration of each metabolite x j per gram dry weight (gDW) biomass (bio (t)) per time period Δt, whichever is closer to zero.
x j bioðtÞ � Dt ! Nutrient uptake rates were calculated using Michaelis-Menten parameters from the literature on T. pseudonana for CO 2 , HCO 3 [59], NO 3 [70], NH 4 [71], and PO 4 [72] (S3 Dataset). The V max was set to 0.2 and K m was set to 0.05 for all other uptake reactions, with the exception of H 2 O and H + which were assumed to diffuse freely and V max was set to 1000. The biomassdependent lower bound assumes that iTps1432 exists in a well-mixed environment where it can sense nutrient-availability and adjust its uptake rate accordingly [68].
PSII flux was calculated from fast repetition rate fluorometry (FRRf) parameters and culture data (electron transfer rate from PSII, PSII reaction centers per chlorophyll a) measured on days 0, 1, 3, 7, and 10 of the experiment (S3 Dataset, data provided by J. Liefer, [25]). The bounds of the PSII reaction were set as the 95% confidence interval of error propagated from the different measurements. To constrain the relative rate of CO 2 assimilation to O 2 production at each time point, we calculated the range of potential photosynthetic quotients (Q) by balancing the equation for the oxidation of new biomass to NO 3 or NH 4 with the chempy package [73]. Oxygen utilization and inorganic carbon evolution was not constrained during the dark period. After each time point, the concentration of O 2 , HCO 3 , and CO 2 was re-equilibrated with the atmosphere so that concentrations remained constant (well-mixed). We included a constraint for photorespiration where the oxygenase flux of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) ranges from 0.001-0.025 the carboxylase flux.
For each time point, constraints were converted from mmol mg chl a -1 h -1 to mmol gDW -1 h -1 using simulated chlorophyll a and biomass concentration (mg chl a L -1 /gDW L -1 ) ratios. Photon absorption flux was constrained for each time point as before, assuming a light absorption spectrum for cells acclimated to medium light [55] and the light intensity spectrum of a cool white fluorescent bulb (S3 Dataset). A new set of photon absorption (PHOA) reactions were constructed for each 20 nm wavelength at each time point using the simulated pigment composition.
We allowed for the mobilization of the biomass components chlorophyll a, chitin, polyphosphate, chrysolaminarin, protein amino acids, free amino acids, RNA, and triacyglycerides by including sink reactions for each of these components. The availability of each component was calculated as follows if Δbio j is < 0.
The degradation pathway for chlorophyll a is not characterized in diatoms, but chlorophyll a degradation was required to approximate the mg chl a gDW -1 factor. A demand reaction for chlorophyll a was included in the simulation and its lower bound was set to -1 � Δbio j .
We adapted the method described by Chiu et al. [68] to calculate changes in biomass composition and the concentration of metabolites or nutrients over time. To calculate biomass composition, we calculated biomass concentration bio(t 2 ) for the next time interval Δt using the exponential growth equation, where μ is biomass demand or v DM_biomass_c . Next we calculated the fraction of new biomass attributed to each component using the biomass objective function. To calculate the concentration of each metabolite x j in the next time interval Δt, where v j is the flux of each metabolite, and bio(t 1 ) is the biomass density or sum of all components at the current time point. Scripts for this simulation can be obtained from https://github. com/hmvantol/diatom-dFBA.

Reconstruction of the Thalassiosira pseudonana metabolic model
A genome-scale metabolic model of T. pseudonana CCMP 1335 (iTps1432) was generated using as a framework the Phaeodactylum tricornutum (CCAP 1055/1) models iLB1027_lipid [27] and iLB1034 [40]. Our model is distinct from a smaller, recently generated model of T. pseudonana (iThaps987) designed to explore production of industrially useful compounds [74] (see S1 Note for a discussion of differences between the two models). Differences between iLB1027_lipid and iLB1034 and their relevance to iTps1432 are listed in S3 Dataset. iLB1027_lipid has more reactions than iLB1034 due to the inclusion of detailed lipid metabolism. Additional elements incorporated into the T. pseudonana iTps1432 model include changes in the subcellular localization of reactions based on protein targeting sequences, improvements in modeled light absorption, and the inclusion of known metabolic differences between the diatoms. Several blocked reactions were resolved in iTps1432, and the number of dead-end metabolites in the network was reduced when compared to the P. tricornutum models. The T. pseudonana model contains 6,073 reactions that represent a network of 2,789 metabolites and 1,432 genes, approximately 10-12% of the T. pseudonana genome (S1 Table). The iTps1432 model contains reactions localized to six compartments representing the cytosol, mitochondria, plastid, thylakoid lumen, peroxisome, and the extracellular environment. As with the P. tricornutum model, our prediction pipeline did not localize any metabolic reactions to the endoplasmic reticulum, although other proteins were localized there. We evaluated the accuracy of our protein localization prediction pipeline by comparing our predictions to proteomics data from mitochondria and plastid fractions isolated from T. pseudonana [75]. Because the resulting fractions were not pure, we used peptide counts to determine if a protein was enriched in the mitochondrial or plastid fraction or in the cell lysate. A protein was considered enriched in one compartment if the proportion of peptides was at least twofold greater than both other fractions. Using this criterion, 107 proteins were enriched in the mitochondria, 86 proteins in the plastid, and 208 proteins in the cytosol or some other organelle. Overall, 57% of proteins enriched in the mitochondria and 57% of proteins enriched in the plastid were accurately localized by our prediction pipeline (S1a and S1b Fig). Seventy-one percent of the 208 proteins that were enriched outside of the plastid and mitochondria were assigned to other compartments by our pipeline (S1c Fig); one protein was predicted to be localized in the endoplasmic reticulum and seven were predicted to be localized in the peroxisome, suggesting that most proteins enriched in the cell lysate were probably from the cytosol. One hundred and forty-five of the proteins detected by Schober, et al. were present in iTps1432 and overall about 78% (113 out of 145) were accurately localized in the final model, 9 of which were manually curated (S1 Fig).
Two major metabolic differences distinguish the two diatom species. First, T. pseudonana has an absolute requirement for vitamin B 12 as it possesses only the B 12 -dependent methionine synthase gene METH, while P. tricornutum does not have a similar absolute requirement for B 12 as it has both METH and the B 12 -independent METE [76]. Adenosylcobalamin, methylcobalamin, and aquacobalamin were included in the biomass reaction as estimated millimole proportions of 1-gram dry weight ( [77], S2 Dataset), and the cofactors were included in the B 12 -dependent reactions. As a result, iTps1432 can only grow on media that contains cobalamin when the solver tolerance is set low enough. Second, T. pseudonana produces chitin [78]; iTps1432 was modified to include chitin biosynthesis and degradation pathways.
An orthoMCL comparison [31] of homologous genes between T. pseudonana and P. tricornutum guided additional modifications to iTps1432. These analyses confirmed that T. pseudonana lacks the enzymes guanine deaminase, tryptophanase, ATP citrate synthase, and βcarbonic anhydrase [79]. The citrate synthase reaction in iTps1432 was modified to deprotonate water rather than phosphorylate ADP, and a cytoplasmic carbonic anhydrase was added to the model [80]. The analysis also detected a few other differences between P. tricornutum and T. pseudonana. Reactions present only in P. tricornutum were eliminated from iTps1432; these include the isomerization of xylose to xylulose, transamination of 4-aminobutyrate, lysis of O-acetyl-L-homoserine to L-homocysteine and acetate, L-tryptophan deamination, agmatine hydrolysis, formamide hydrolysis, and guanine deamination. Reactions present only in T. pseudonana include chitin synthesis and hydrolysis, and cleavage of pyruvate into acetyl-CoA and formate (S2 Fig).
We also extended the iTps1432 model to include synthesis and respiration of the carbohydrate storage molecule chrysolaminarin, synthesis and hydrolysis of polyphosphate, exopolysaccharide biosynthesis, silicic acid condensation to a silica frustule, and pathways for the biosynthesis and respiration of 2,3-dihydroxypropane-1-sulfonate (DHPS) [22,81]. Lipid metabolism was re-configured in iTps1432 to reflect known differences in lipid composition [66,67]. We included transport reactions for those amino acids known to be excreted at high concentrations [82]. If genes encoding putative transporters for amino acids were identified, then additional transport reactions for chemically similar amino acids were added, with the assumption that these amino acids could use the same transporter. We included transporters for the following amino acids: L-glutamate, L-aspartate, L-isoleucine, L-leucine, L-valine, Lasparagine, L-glutamine, L-alanine, L-histidine, L-serine, L-threonine, glycine, and L-proline. Additional transport reactions for silicic acid, biotin, cyanocobalamin, aquacobalamin, DMSP, DHPS, glycine betaine, N-acetyltaurine, formamide, formate, uracil, acetate, choline, xanthine, ATP, AMP, triphosphate, and UDP-N-acetyl-alpha-D-glucosamine, were added based on information from the literature, gene annotations, and information from other algal models (S2 Table) [83]. Following the guidance of ref.
[33], all new transport reactions were set to reversible.
An important addition to iTps1432 was inclusion of a mechanistic model of photon absorption and electron transfer by fucoxanthin chlorophyll a/c binding proteins (FCPs). We included reactions that describe energy transfer efficiency from excited pigments to chlorophyll a in the photosystems, and pigment de-excitation reactions to dissipate excess energy as heat or fluorescence (see S3 Dataset). PSI and PSII reactions were modified to include charge separation and recombination reactions. Photodamage of the D1 subunit was included as a component of the PSII reaction [49,57], and the metabolic cost of D1 repair was included as part of the non-growth ATP maintenance reaction to calculate the ATP-cost of phosphorylation and activation of the FtsH protease [58] and the ATP-and GTP-costs of biosynthesizing a D1 peptide (S3 Dataset).

Development of biomass objective functions
A critical step in creating genome-scale metabolic models is development of accurate biomass production reactions in which precursor metabolites are converted into the cellular components that comprise the millimolar contribution to 1 g dry weight (gDW) of cell mass under specific growth conditions [33]. In iTps1432, the biomass reactions generate the DNA, RNA, protein, free amino acids, pigments, carbohydrates, lipids (phospholipids, sulfolipids, galactolipids, glycerolipids), triacylglycerides, chitin, chrysolaminarin, osmolytes, a silica frustule, polyphosphate, and a soluble pool of vitamins and cofactors that together define the total biomass of T. pseudonana under a particular environmental condition. Biomass reactions were developed for T. pseudonana cells acclimated to three different light levels (5, 60, 200 μmol photons m -2 s -1 ). Bulk biomass composition (carbohydrates, protein, total dry weight) and chlorophyll a concentration was measured in cells grown in chemostats maintained at 18˚C under 5, 60 and 200 μmol photons m -2 s -1 [24]. Pigment composition was derived from cells grown in a photobioreactor maintained at 18˚C at 30 μmol photons m -2 s -1 [50] and from exponentially growing cells maintained at 18˚C at 83 and 237 μmol photons m -2 s -1 [51]. The remaining biomass components were calculated from the literature, typically from exponentially growing cells, at temperatures ranging from 15-21˚C (optimal growth temperature is 21˚C); DNA nucleotide composition was calculated from the genome sequence data. A simplified siliceous frustule formation reaction was added to the model. The number of condensation reactions per Si atom was derived from NMR data on the degree of silica hydroxylation/condensation in T. pseudonana [84] and was used to calculate how much water should be released per gram of frustule formed. The weight of the frustule was calculated based on the expected degree of hydroxylation in the frustule and on either a linear relationship between growth rate and Si/C under light-limiting conditions (5, 60 μmol photons m -2 s -1 ) or a power law relationship under N-limiting conditions (200 μmol photons m -2 s -1 ) [85]. Calculations and references are provided in S2 Dataset.
Development of these biomass objective functions yielded information about the composition of T. pseudonana under different conditions and thus potential growth and acclimation strategies. The resulting total cell dry weights (scaled to the radius of each circle in Fig 2) is inversely related to light intensity (22.4, 16.6, 17.8 pg/cell for 5, 60, 200 μmol photons m -2 s -1 , respectively). Similarly, the silica frustule was the greatest component of biomass at the lowest light intensity (5.2 pg/cell vs 1.6 and 0.7 pg/cell, Fig 2), likely a consequence of the slower, light-limited divisions rates combined with non-saturable silicic acid uptake kinetics in T. pseudonana [86]. Protein contribution to biomass was inversely correlated with light intensity, with the least amount of protein at the highest light levels. In contrast, total carbohydrates increased with light intensity and represented the largest component at 200 μmol photons m -2 s -1 ([24], Fig 2). Pigments per cell were greatest at 5 μmol photons m -2 s -1 , resulting in an increased rate of photon absorption compared to the higher light levels. Photoprotective pigments (β-carotene, diadinoxanthin, diatoxanthin) were most abundant at 200 μmol photons m -2 s -1 ([50,51], Fig 2).

ATP maintenance cost calculation
Cellular energy requirements, in the form of ATP utilization, impact biomass production and metabolite excretion in metabolic models. ATP maintenance costs were calculated for the three light levels (5,60,200 μmol photons m -2 s -1 ) using our biomass objective functions and production measurements (provided by K. Halsey) from chemostats maintained at the same light levels [24]. Growth-associated ATP utilization represents the energy not accounted for in biopolymer formation reactions while non-growth associated ATP utilization accounts for energy utilization in the absence of growth. Growth-associated ATP maintenance calculations can be impacted by photodamage due to high light intensities [87]. In the chemostat studies [24], the maximum photochemical yield of PSII (Fv/Fm) remained constant (0.56-0.58), across the three light levels indicating a lack of photodamage. We therefore constrained biomass production using chemostat dilution rates and performed Flux Balance Analysis (FBA) for each light level by maximizing the ATP hydrolysis reaction. A linear relationship (R 2 = 0.94) between ATP utilization and growth rate allowed us to estimate the non-growth associated maintenance (r NGAM = -27 ± 52 SE) costs as the y-intercept and the growth associated maintenance (r GAM = 3809 ± 1221 SE) costs as the slope (S3 Fig). The small sample sizes resulted in large error bars (S3 Fig). Rather than constrain the ATP maintenance reaction with these bounds, we iteratively searched for individual r GAM values for each biomass objective function using the measured growth rate and calculated theoretical upper bounds for r NGAM .
The theoretical upper bound of non-growth associated maintenance (r NGAM ) was calculated for each chemostat using the compensation light level (I c ), which is the light intensity at which photosynthesis is equal to respiration and the growth rate is zero [63].
where a chl a is the chlorophyll a specific absorption coefficient (m 2 mg Chl a -1 ), r is the ratio of chlorophyll a to gram dry weight per cell (mg Chl a -1 gDW -1 ), ϕ m is the quantum efficiency of photosynthesis (mol O 2 mol photon -1 ), and e is the amount of ATP generated per oxygen. The parameter I k was estimated by fitting Chalker equation 1 [88] (where DR stands for dark respiration) to each net photosynthesis (NPP) curve generated by Fisher and Halsey ([24], data provided by K. Halsey). I k was then used to calculate the compensation light level (I c ) for each chemostat (Chalker equation 4).

Cyclic electron flow
Cyclic electron flow (CEF) and respiration both consume the reduced ferredoxin and downstream equivalents generated via linear electron flow (LEF), driving ATP generation and increased flux through PSII (Fig 1) and thus balance redox. Additionally, CEF helps generates a proton gradient across the thylakoid membrane through cytochrome b 6 /f. Constraining these reactions can improve the accuracy of flux predictions in iTps1432. Bailleul et al. [10] proposed that CEF is less than 5% of maximal total electron flow in diatoms. Maximal total electron flow occurs when light is saturating and increased flux of electrons cannot occur. To constrain CEF in iTps1432 we simulated the Bailleul et al. [10] light response experiment with FBA using the biomass equation we developed and absorption spectrum acquired for cells acclimated to medium light intensity (60 μmol photons m -2 s -1 ; their cultures were acclimated to 70 μmol photons m -2 s -1 ). The maximal total electron flow through a cell acclimated to medium light was estimated by setting the level of light exposure to~2000 μmol photons m -2 s -1 (saturating). PSII, oxygen exchange, and bicarbonate exchange reactions were constrained by the 95% confidence intervals of gross and net photosynthesis and carbon uptake, respectively, measured for cells acclimated to 60 μmol photons m -2 s -1 and exposed to 1952 μmol photons m -2 s -1 [24]. The rate of D1 damage for the PSII reaction was calculated based on the number of inactivation events per O 2 molecule evolved [57], using a quantum efficiency (ϕ m ) of 0.056 mol O 2 per photon [24] (S3 Dataset). Nutrient uptake rates were calculated using the Michaelis-Menten equation with f/2 nutrient concentrations. The upper bound of CEF was constrained with the following symbolic expression where CEF is set to 5% of total electron flow,

Effect of light limitation
The allocation of photosynthetic energy to biomass production and respiration was quantified using pFBA with iTps1432 at three different light levels [24], constrained by chemostat production data (provided by K. Halsey), the calculated ATP maintenance costs, and CEF, given the light-dependent differences in biomass composition (Fig 2). The pathways contributing to the dissipation of reducing equivalents generated by light energy include: CEF; respiration (ribulose-1,5-bisphosphate oxygenase, glycolate oxidase, plastid terminal oxidase, the Mehler reaction, alternative oxidase, cytochrome c oxidase); nitrogen assimilation (NO 3 reductase, NO 2 reductase, glutamate synthesis); sulfate assimilation (PAPS reductase, APS reductase, 2-aminoacrylate sulfotransferase, SO 3 reductase); carbon assimilation and biosynthesis of reduced metabolites (Fig 1). Some pathways also dissipate the electrons generated by transient pools of organic carbon respired to CO 2 . For example, rapidly dividing cells preferentially use storage polysaccharides such as chrysolaminarin, while slowly growing populations dominated by cells in G1 phase preferentially use newly formed glyceraldehyde-3-phosphate (G3P) [89]. Additional sink reactions for chrysolaminarin and G3P were included to simulate respiration of these transient organic carbon pools. From the chemostat dilution rates we calculated the fraction of cells dividing per hour (T. pseudonana cell division is not synchronized [90]). The sink reactions were constrained (proportionally to the fraction of cells dividing per hour) with the differences between gross and net carbon production (the carbon catabolism measurements) for each chemostat [24].
Biomass production and cyclic electron flow are the largest electron sinks (Fig 3a). G3P production by the Calvin cycle is a sink for electrons across all three light levels, with the lowest amount produced at the highest light intensity due to increased supply of G3P that is not entirely consumed by mitochondrial maintenance respiration. The biosynthesis of complex macromolecules, or further reduction of G3P, is another source of NADPH consumption, and becomes increasingly important at higher light intensities. Nitrate and sulfate reduction also consume reducing equivalents. At 5 μmol photons m -2 s -1 , nitrate reduction is a proportionally more important component of biomass production than at other light levels because of the high protein content of cells acclimated to low light (10.5 pg/cell vs. 4.3 and 3.3 pg/cell, Fig 2). Respiration is also more important at low light levels. Mitochondrial maintenance respiration was used to constrain respiration of organic carbon via the TCA cycle, and as a result cytochrome c oxidase is a constant proportion of TEF (Fig 3a). PFBA predicts that the Mehler reaction was the largest respiratory flux at 5 and 200 μmol photons m -2 s -1 , while at 60 μmol photons m -2 s -1 cytochrome c oxidase was the largest and there was also some respiration by plastid terminal oxidase (Fig 3a). To achieve a single solution, pFBA minimizes the absolute sum of fluxes with the objective of minimizing enzyme utilization. Given that this objective may not be relevant to photosynthetic organisms dealing with redox balance, we explored additional respiratory constraints in the simulation. First, we noticed that photorespiration was not a part of the original pFBA prediction and that it could be an important sink for reductants in photosynthetic organisms. Photorespiratory flux, or the specificity of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) for CO 2 versus O 2 , was calculated with the following expression [91] where v c is the carboxylase flux, and v o is the oxygenase flux through RuBisCO, V c K o /V o K c is the specificity factor of RuBisCO for CO 2 over O 2 , and [CO2]/[O2] is the ratio of CO 2 versus O 2 in the pyrenoid. We used a specificity factor of 79, as determined for the related diatom T. weissfloggii [92] because of an assumed similarity in function based on the predicted peptide level similarities between the RuBisCO enzymes from the two diatoms (rbcS is 97% identical, rbcL is 98% identical). The concentration of CO 2 in the pyrenoid, where most RuBisCO is located, is estimated at 100 μM [93]. The concentration of O 2 in the pyrenoid is unknown and is difficult to measure [94]. We therefore used the ambient concentration of O 2 in seawater at equilibrium with the atmosphere (200 μM).
Photorespiration was constrained with the following symbolic expression, where the oxygenase activity of RuBisCO is 2.5% the carboxylase activity. The addition of a photorespiratory constraint activates the oxygenase activity of RuBisCO as well as peroxisomal glycolate oxidase (Fig 3b). For a K m of 65 μM [92], pyrenoid CO 2 concentrations are likely higher than estimated because diatoms are not carbon-limited, and O 2 concentrations are thought to be lower (J. Young, pers. comm.) Thus, these simulations may overestimate the significance of photorespiration. In diatoms, a major component of redox balance is the flow of reducing equivalents from the plastid to the mitochondria [10], based on the observation that flux of electrons through PSII depends on mitochondrial respiration. In a third experiment, we included energetic coupling between the plastid and the mitochondria by redirecting reductants generated by linear electron flow to NADH ubiquinone oxidoreductase with the following symbolic expression, Little is known about the extent of energetic coupling in T. pseudonana and how it may change with acclimation to different light intensities, so the value of this constraint (NAD-HOR_m/PSI_u = 0.0015) was chosen for exploratory purposes. Flow of reductants from the plastid activates the mitochondrial alternative oxidase reaction in iTps1432 (Fig 3c). Larger values that increased the flux through alternative oxidase impacted the growth rate of iTps1432 because there was a trade-off with CEF which appears to be required for energy generation at these lower light levels.
The simulations under different light limiting conditions highlighted the interconnected pathways that diatoms rely on to balance reductant dissipation through biomass requirements and different respiratory pathways. These results raise the question of how the dissipation of reductants generated by photosynthesis occurs under nutrient-limitation conditions when biomass production cannot be used as an electron sink.

Effect of nitrogen starvation
The previous simulations highlighted the importance of nitrate reduction as an electron sink under steady-state conditions. We next explored the impact of nitrogen limitation and starvation on reductant flow using dynamic Flux Balance Analysis (dFBA) to simulate growth in batch culture where cells are not in steady-state. Throughout the simulation, we tracked simulated nutrient and excreted metabolite concentrations in the media as well as molecular and elemental biomass composition. The simulations relied on experimental data [25,26] derived from T. pseudonana cells grown at 85 μmol photons m -2 s -1 in a 12: 12h light: dark cycle at 18˚C in f/2 media modified to initiate N-starvation. T. pseudonana acclimated to medium light levels (60 μmol photons m -2 s -1 ) is the closest condition examined in the previous simulation. Biomass composition and macro-nutrient concentrations were determined experimentally at 0, 1, 3, 7, and 10 days after the start of the experiment. Measurements of biomass C, N, and P were made more frequently at 0, 1, 2, 3, 5, 7, and 10 days after the start of the experiment. Liefer, et al. [26] defined four different growth phases according to observed growth patterns: mid-exponential corresponded to N-replete steady-state growth (day 0); late exponential corresponded to reduced growth after dilution to N-free media prior to stationary phase (day 1-3); early stationary corresponded to one day after cessation of cell division (day 7); and midstationary corresponded to five days after cessation of cell division (day 10).
In batch culture, biomass composition measurements are equal to the composition of cells from all previous time points plus newly synthesized biomass minus the re-mobilized biomass components. Biomass objective functions were computed for each time point by comparing the simulated biomass composition at the current time point to the experimentally measured biomass compositions at the different time points (target biomass composition) (Fig 4a). As nitrogen is depleted, T. pseudonana uses up protein and accumulates carbohydrates and lipids; there is also accumulation of DNA as cells stop actively dividing, a decrease in RNA and pigments per cell, and accumulation of residual P likely corresponding to polyphosphate storage ( [25] , Fig 4a). Experimental measurements and simulations of biomass composition produce C:N or N:P ratios that indicate cell composition was impacted by nitrogen starvation after 3 days in culture (Fig 4b). After 10 days in culture, measurements of C:P decreased due to polyphosphate increasing as a cellular component in stationary phase. There is a similar change in C:P in the simulated iTps1432 biomass composition (Fig 4b). Small differences between measurements and simulated elemental ratios could be attributed to both error in the bulk biomass composition data taken from the literature to supplement the experimental measurements (osmolytes, chitin, vitamins), errors in the assumed compositional data making up the larger components (molecular composition of RNA, protein, free amino acids, protein, carbohydrates, EPS, and lipids), or over-production and under-production of various components at different points in the simulation (S4 Fig). Nevertheless, many measured biomass components were accurately simulated and the predicted timing of nitrogen starvation was accurate, giving us confidence that our model reflects the principle effects of nitrogen limitation and starvation in batch culture.
The added NO 3 (and NO 2 ) in the experimental media were depleted sometime between 1 and 3 days after the initiation of the experiment; low levels of background NH 4 in the media were depleted after 1 day. Simulated NO 3 was completely depleted from the media shortly after 1 day of growth whereas simulated NH 4 was released into the media concurrent with the NO 3 depletion and was subsequently taken up and re-excreted into the media between the different target days. The simulated PO 4 concentration in the media matched the observed gradual decrease over time (Fig 5a). The greatest discrepancy between the measured and simulated nutrients was for Si(OH) 4 concentrations. Experimental Si(OH) 4 concentrations were simultaneously drawn down with the NO 3 during the first day and then plateaued at~10 μM. The simulated drawdown of Si(OH) 4 instead plateaued at~25 μM shortly after 1 day in culture. This difference likely reflects errors in the weight of the cell frustule used in the biomass objective function. This value was not experimentally measured and so the value used for the biomass objective functions was extrapolated from the difference between dry weight measurements and the sum of other biomass components (S4 Dataset). Inaccurate simulation of silicate utilization will have little impact on the flow of reductants as the silicate condensation reaction to form the frustule is not a redox reaction. To simulate a diel pattern of biomass formation, we included over-production of chrysolaminarin and TAGs during the light period in the biomass objective functions and did not include production of TAGs, chrysolaminarin, and pigments in the dark period [62]. Biomass production was strongest during the light period and was typically followed by respiration of some biomass components in the dark to create a diel pattern of changing biomass concentration (Fig 5b).
The ability to accurately simulate uptake of nitrate and C:N under non-steady-state conditions motivated a more detailed examination of electron flow. Gross oxygen evolution from the PSII reaction was constrained based on the electron transfer rate and the number of PSII reaction centers per chlorophyll a measured on days 0, 1, 3, 7, and 10 of the experiment [25]. Per milligram of chlorophyll a, gross oxygen evolution decreased slightly on the first day and remained steady until the seventh day, with a sharp increase on the tenth day of the experiment (Fig 6a). There is good agreement between gross oxygen production measured at 85 μmol photons m -2 s -1 for cells acclimated to medium light intensity grown in chemostats [24] to the PSII flux calculated with data from FRRf for cells in mid-exponential phase (Fig 6a). On a gram dry weight basis, there is an initial increase in PSII flux followed by a gradual decline over the course of the experiment as a result of declining chlorophyll a concentrations relative to total biomass. PSII flux is only slightly higher on the tenth day of the experiment on a gram dry weight basis (Fig 6a). The simulation includes diel fluctuations in chrysolaminarin and TAG production which create regular fluctuations in the ratio of O:C of new biomass (Fig 6b). The  4 , and PO 4 were measured 0, 1, 3, 7, and 10 days after transfer to fresh media with low nitrate and compared to simulated media nutrient concentrations (b) Cell concentration (cells mL -1 ) was measured each day and converted to biomass concentration (g L -1 ) where measurements of cell dry weight were available and compared with the simulated biomass concentration. 12: 12 h light: dark cycles are depicted with white and grey stripes.
https://doi.org/10.1371/journal.pone.0241960.g005 elemental formula of newly produced biomass was used to calculate a range of possible photosynthetic quotients (PQ: the proportion of oxygen evolved to inorganic carbon assimilated in the light) given growth on nitrate or ammonia (Fig 6c). The combination of constraints on the PSII, oxygen exchange, and inorganic carbon assimilation reactions limit respiration during the light period, while respiration is left unconstrained in the dark period. In general, total respiratory flux increases with increased flux through PSII, although there is some influence by the photosynthetic quotient (Fig 6c and 6d). Diel changes in respiration are controlled by diel patterns in biomass formation in which the rates of chrysolaminarin and TAG production vary throughout the light period. During the light period plastid terminal oxidase and the Mehler reaction are the primary respiratory fluxes. During the dark period, respiration switches to peroxisomal glycolate oxidase and cytochrome c oxidase as organic matter is respired for energy (Fig 6d).
The photosynthetic quotient is the proportion of oxygen evolved to inorganic carbon assimilated in the light and its calculation assumes that all assimilated CO 2 is used to generate biomass. However, diatoms are known to excrete dissolved organic carbon during and the constraints affecting respiration for the transition of iTps1432 from exponential to stationary phase under nitrogen-limited conditions. (a) Flux through the PSII reaction is driven by photon absorption and is constrained by measurements taken 0, 1, 3, 7, and 10 days after transfer to fresh media with low nitrate (green, mmol O 2 mg chl a -1 h -1 , [25], S3 Dataset). The 95% confidence intervals were converted from mmol O 2 mg chl a -1 h -1 to mmol O 2 gDW -1 h -1 using simulated mg chl a / gDW and set as lower and upper bounds of the reaction. Steady-state N-replete ± 95% confidence interval (black) measurement of net oxygen production [24]  photosynthesis. For this reason, we explored a range of different PQ constraints (95% PQ-105% PQ) to see how this assumption impacts metabolite secretion (Fig 7). Metabolite excretion can occur if the by-product of a reaction cannot be assimilated by the organism; for example, cyanide is a by-product of cyanocobalamin utilization. Occasionally cyanide is used by 3-mercaptopyruvate sulfurtransferase in the de-sulfuration of 3-mercaptopyruvate, resulting in the excretion of thiocyanate. Cyanocobalamin is a form of vitamin B 12 frequently used in culture media, but is not widely available in the environment. Polyphosphate is also a by-product in vitamin B 12 metabolism and folate biosynthesis; excess is excreted and then re-assimilated at later time points as the preferred source of phosphorus [95]. In this simulation, metabolites are excreted predominantly as a result of biomass re-mobilization or to balance redox.
Formate and DMSP production were most strongly impacted by changes in the PQ. DMSP production increases with the photosynthetic quotient (or when respiration is low).

PLOS ONE
Conversely, formate production is highest when respiration is high (Fig 7). Similarly, more ethanol is excreted when the PQ is higher and more urea is excreted when the PQ is lower than predicted by biomass composition, and then both are re-assimilated at night. Other metabolites, proline, leucine, formamide, xanthine, and urate, appear to be by-products of remobilizing certain biomass components (protein, free amino acids, RNA) to achieve the targeted objective. Some metabolites including L-aspartate, acetate, glycolate, glycine, L-glutamate, and L-threonine appear to be more sporadically excreted and re-assimilated.
We evaluated reactions involved in N and S metabolism in simulations where DMSP is excreted and found increased sulfate assimilation after the onset of NO 3 depletion (Fig 8). NO 3 reductase uses 1 NADH to produce NO 2 in the cytosol, NO 2 reductase is localized in the plastid and uses 3 NADPH to produce NH 4 , and the GS-GOGAT cycle assimilates NH 4 while utilizing ATP and 2 reduced ferredoxin when localized in the plastid. After NO 3 is depleted, ammonia assimilation continues sporadically throughout the time course as a side-effect of the ammonia excreted due to the re-mobilization of different pools of organic nitrogen during biomass composition changes. SO 4 can be assimilated in the plastid by a plastid-localized ATP: sulfate adenylyltransferase that produces adenylyl sulfate (APS). APS reduction to SO 3 consumes a reduced thioredoxin, SO 3 reductase is plastid localized and consumes 6 reduced ferredoxin to produce H 2 S. SO 4 reduction to sulfide becomes more prevalent immediately after NO 3 is depleted and results in DMSP excretion (Figs 7 and 8). 2-aminoacrylate sulfotransferase also contributes to redox balance by consuming 1 NADPH in the production of L-cysteate which is thought to be an intermediate in DHPS production [22], a component of diatom biomass in iTps1432.

Discussion
The interdependence between biomass composition, photon absorption, nutrient utilization, and photosynthetic constraints is a distinguishing feature of iTps1432, and a critical component of photosynthetic modeling given the dynamic nature of biomass composition and its role in photosynthesis. These interconnections allowed us to model the different ways that diatoms may maintain redox balance. A critical first step was development of biomass objective functions for three different light levels (5,60,200 μmol photons m -2 s -1 ) and use of chemostat production measurements to constrain flux for different light intensities (Fig 2). Given an elemental formula for biomass composition, the degree of reduction can be calculated as the number of electron equivalents per gram atom C [96]. The inverse relationship between protein content of cells and light meant that the degree of reduction was also inversely related to light (5.66, low light; 5.47, medium light; 4.97, high light). More highly reduced biomass composition as result of increased nitrate assimilation, in addition to increased lightdependent respiration at low light could be part of a previously proposed strategy that T. pseudonana limits respiration of organic carbon by using alternative redox balance strategies in order to improve the efficiency of biomass production [24].
Photosynthetic organisms have evolved a variety of mechanisms to deal with the primary challenge of photosynthesis: how to capture light energy while evading potential damage. Over longer time periods, diatoms adjust their pigment composition to deal with incoming light energy, but short-term fluctuations in light intensity with inadequate sinks for electrons could theoretically cause a plastid to become over-reduced and damaged. Energy dissipation as heat or fluorescence (non-photochemical quenching) is one potential mechanism, summarized in iTps1432 as a photon loss reaction. The metabolic challenge of photochemical quenching is managing the mismatch between reduced ferredoxin produced by linear electron flow and the actual metabolic requirements for NADPH. Light-dependent respiration reactions in the plastid are one of the primary sinks for reductants. Alternatively, reduced metabolites can be exported from the plastid and oxidized by mitochondrial respiration. Finally, cyclic electron flow is another potential sink for reductants, as well as a non-reducing source of energy since the reaction drives proton pumping by cytochrome b 6 /f. Halsey & Fisher postulated that CEF could be more important at lower light levels, but did not measure it in their experiment [24]. Our FBA predicted that CEF is a major component of alternative electron flow, particularly at lower light levels (Fig 3a). This finding was unexpected as it has been shown that CEF is less than 5% of maximal electron flow and the transfer of reductants from the plastid to the mitochondria is more prevalent in diatoms when compared to plants [10]. The absolute flux of CEF remains relatively constant across all light levels and in multiple diatom species [10]. We used FBA to calculate an upper bound for CEF at saturating light levels (maximal electron flow). At low light levels, this flux value is relatively important given the decrease in total electron flow as light intensity goes down. In the data given by Bailleul et al. [10], CEF makes up about half of TEF at the lowest light level measured (~100 μmol photon m -2 s -1 ); so we may be underestimating its importance at these light levels due to the poorly constrained values of growth-associated ATP maintenance, and given the contribution of CEF to energy generation.
Respiration via cytochrome c oxidase is activated by the respiration of transient organic carbon pools by the mitochondrial TCA cycle. Some of the transient organic carbon that was calculated to be available (as the difference between gross and net carbon production measurements) is directly utilized, as evidenced by the decreased use of the Calvin cycle at 200 μmol photon m -2 s -1 (Fig 3a). Remaining respiratory flux was routed through the Mehler reaction or plastid terminal oxidase. In the Mehler reaction, oxygen reacts with reduced ferredoxin emerging from photosystem I and forms a superoxide anion. Superoxide dismutase neutralized two reactive anions into oxygen and hydrogen peroxide, and ascorbate peroxidase converts hydrogen peroxide into water. NAD(P)H is consumed through the glutathioneascorbate cycle (Fig 1). Chlororespiration results from the oxidation of the plastoquinone pool by plastid terminal oxidase (Fig 1) and was historically defined as an electron transport chain in the thylakoid membrane involving a proton-translocating NADH:plastoquinone reductase complex (NDH1) and a plastid terminal oxidase (PTOX) [97]. T. pseudonana and other unicellular algae lack an NDH complex [98], and likely rely on a non-electrogenic NDH2 or ferredoxin:plastoquinone reductase (CEF) to fuel the terminal oxidase [97,99]. Chlororespiration is thought to be only active in low light conditions or darkness [99,100]. Here FBA predicts that PTOX is active at the medium light levels, and the Mehler reaction is active at low and high light. Parsimonious FBA minimizes the absolute sum of fluxes, and therefore returns a single solution with the lowest possible number of active fluxes. As a result, only one light-dependent respiration reaction is active and this may not be realistic. We experimented with different constraints on respiration using the available information in the literature (Fig 1).
Photorespiration occurs whenever there is oxygen in the pyrenoid because the enzyme RuBisCO cannot always distinguish between O 2 and CO 2 . We activated photorespiratory flux by assuming a high concentration of O 2 relative to CO 2 in the pyrenoid. This constraint activated the oxygenase activity of RuBisCO as well as peroxisomal glycolate oxidase (Fig 3b). In diatoms, photorespiration is truncated and 2-phosphoglycolate is not recycled back to ribulose-1,5-bisphosphate [101]. Glycolate can be oxidized to glyoxylate and then converted to either malate or transaminated to glycine or is excreted under certain conditions [102,103]. Alternative oxidase was activated by imposing an energetic coupling constraint on PSI and NADH ubiquinone oxidoreductase to simulate energetic coupling [27] (Fig 3c). However, only a low level of energetic coupling (NADHOR/PSI = 0.15%) could be introduced before the constraint impacted biomass production. This observation supports the idea that CEF is more important at low light levels, and can be inhibited if reductants are diverted to the mitochondria. CEF helps generates a proton gradient in order to fuel ATP production; this could be an important reaction when light levels are low and cells are more energy starved, which is why our predictions are dependent on the GAM.
In addition to respiration, nitrate and sulfate assimilation also contribute to balancing redox reactions in the plastid, the relative impact of which depends on how cells adjust respiration rates in response to changing redox pressures [104]. We simulated the progression of a batch culture from nitrate limitation to N-starvation using dynamic FBA (Fig 5). When nitrate is depleted, diatoms continue to produce biomass by re-mobilizing internal sources of nitrogen and producing more of the carbohydrates [26]: chrysolaminarin and EPS (Fig 4a). Diatoms also experience decreased flux through PSII caused by pigment degradation [25], which decreases light absorption. Data on net oxygen evolution as nitrogen is depleted in batch culture was not available; we calculated a range of possible photosynthetic quotients from the new biomass equation at each time point based on biomass production from nitrate versus ammonia. This strategy only accounts for metabolites that are part of the objective function and does not account for the possibility of excreted metabolites or the re-mobilization of biomass components. We tested a range of different PQ constraints (95%-105% the original PQ value) and found a potential role for sulfate assimilation after nitrate is depleted (Fig 8). Secretion of the organic sulfur compound DMSP as a result of increased sulfate assimilation is in line with previous experimental work indicating that nitrate limitation causes the greatest increase in intracellular DMSP production [105] (Fig 7). DMSP is hypothesized to act as an osmolyte replacing stores of proline, under nitrate limitation [105]. Many of the compounds produced and excreted by iTps1432 are compatible solutes, a class of compounds known to be transported in and out of cells in response to changes in osmotic pressure. We could not account for the possibility of changing osmolyte composition as part of the biomass objective function as the only measurement of osmolytes are from N-replete conditions [22].
Experimental support for a role for nitrate uptake in response to fluctuating redox pressures comes from the observation that nitrogen-replete diatoms secrete ammonium during rapid increases in irradiance [106]. Additionally, enzymes within nitrogen and sulfur assimilation pathways in diatoms, are redox-sensitive [107]. Nitrate assimilation is likely to be more effective at dissipating reductants than sulfate assimilation because of low consumption of ATP relative to NADPH equivalents. However, sulfate is consistently available at high concentrations (28 mM, [108]) throughout the ocean whereas nitrate is typically limiting in both coastal regions [109] and in the subtropical gyres [110]. Although our focus was on growth limitation by nitrate, other types of nutrient-limitation (such as silicon, iron, zinc, or vitamin B 12 [111]) may impact redox balance by limiting biomass formation.
In our simulations, iTps1432 reduces ATP demand by limiting flux through the TCA cycle and instead secretes ethanol and formate (Fig 7), both of which are suggestive of some level of fermentation. Similarly, the eukaryotic green alga Chlamydomonas reinhardtii has the full complement of enzymes used to ferment pyruvate and is known to excrete fermentation products malate, ethanol, acetate, formate, and H 2 under dark anaerobic conditions [112]. In T. pseudonana, formate is a by-product of the methionine salvage pathway or DMSP biosynthesis, and is produced via pyruvate formate lyase, an alternative reaction to pyruvate dehydrogenase. In our simulations, formate is excreted primarily in the dark period as a result of chrysolaminarin utilization, while Chalmydomonas produces formate during degradation of starch reserves. In Chlamydomonas, ethanol is typically produced via alcohol dehydrogenase that detoxifies acetaldehyde generated by pyruvate decarboxylase during fermentation [112]. T. pseudonana lacks the gene for pyruvate decarboxylase, and instead produced acetaldehyde as a by-product of threonine aldolase. In iTps1432, acetaldehyde is either converted into ethanol and excreted or into acetate which may rejoin the TCA cycle as acetyl-CoA. In our simulations, ethanol is secreted during the light period and re-assimilated during the dark period. Examples of fermentative metabolism in photosynthetic organisms during both the light and dark period have been previously reported. Cyanobacteria in hot spring microbial mats are known to experience anaerobiosis at night and switch to fermentative metabolism [113]. Chlamydomonas cultures grown with limited aeration, under low nutrient or low light conditions, also experience anaerobiosis due to respiratory utilization of oxygen exceeding oxygen produced by photosynthesis [112]. When the PQ is constrained to 95% the calculated value, formate is produced during both the light and dark period, likely due to respiration exceeding photosynthesis. Amino acids are commonly secreted by a variety of diatoms [21,114]. We were surprised to predict the secretion of amino acids by iTps1432 under nitrate starvation conditions (Fig 7). Secretion of these amino acids occurs mostly during shifts to new biomass targets where protein re-mobilization is possible and are not impacted by changes in respiration. We did not include transport reactions for small peptides although these compounds are an important component of secreted metabolites in T. pseudonana, possibly as a by-product of protein turnover [115]. During N-starvation we would expect protein re-mobilization to contribute to biomass formation for other metabolites requiring nitrogen rather than result in excretion of amino acids. This observation is similar to the degradation of RNA nucleotides into urate and xanthine and the excretion of formamide (Fig 7). Perhaps these nitrogenous compounds are too highly reduced to be useful during nitrate starvation and are therefore released into the environment.
Most of the metabolites secreted by iTps1432 can be consumed by marine bacteria. For example, a subset of marine bacteria can utilize glycolate as a sole carbon source [116], and bacterial transcripts for glycolate oxidase were found to vary on a diel cycle during a phytoplankton bloom [117]. Many bacteria from the Roseobacter clade rely on organic nitrogen and sulfur compounds produced by phytoplankton or other bacteria as they are unable to reduce nitrate or nitrite and some cannot reduce sulfate [118]. Alphaproteobacteria and Gammaproteobacteria are known to degrade DMSP into methanethiol (CH 4 S) or dimethyl sulfide (DMS) [119]. Differences in the metabolic networks of phytoplankton as well as differences in respiration, and how different species react to fluctuations in environmental conditions and redox imbalances will impact the character and quantity of metabolites secreted. These are likely major factors that structure the bacterial community associated with phytoplankton and control bacterial succession over the course of a bloom [120].

Conclusion
Thalassiosira pseudonana CCMP 1335 was isolated from Moriches Bay, New York, in 1958, and the whole genome was sequenced in 2004 [121]. With availability of the genome, T. pseudonana has been studied from a systems-wide perspective using transcriptomics, proteomics, and metabolomics (eg. [115,[122][123][124][125]). The genome-scale metabolic model of T. pseudonana created here builds on previous modeling work [27], incorporates currently available physiological and genomic data, and will serve as a powerful tool to generate hypotheses about diatom metabolism and to interpret future experiments.
We simulated the metabolism of T. pseudonana under steady-state light limited conditions and found that cyclic electron flow plays an important role in generating energy and balancing redox under low-light conditions, while the role of energetic coupling between plastids and mitochondria is minimal. We also found that nitrate reduction plays a critical role in dissipating reductants in the plastid under nutrient-replete conditions, while sulfate reduction replaces nitrate assimilation during non-steady-state simulations of nitrate-limitation and N-starvation. The character and quantity of metabolites secreted during our simulations depends on the conditions of redox balance, the relationship between photosynthesis and respiration, and biomass re-modeling.
iTps1432 incorporates a large body of experimental and genetic knowledge. We caution, however, that our predictions also rely on a few network reactions for which there is little or no experimental or genetic evidence. Many enzymes are promiscuous, catalyzing a wider variety of substrates than represented here, and errors in annotation are possible. iTps1432 should be considered a preliminary version of a metabolic network that will improve with further curation and expansion as research on this diatom continues. iTps1432 could be extended in a variety of directions in the future. Reactions describing complex formation for metal-and cofactor-requiring proteins could be added to the model to better describe vitamin and trace metal utilization, as trace metal limitation is an important nutrient condition in the ocean [111] and vitamins play an import role in interactions with bacteria [126,127]. Additionally, an effort to better characterize transporters would significantly improve prediction of metabolite secretion and mechanisms of energetic coupling between the plastid and the mitochondria. The development of representative marine metabolic models, such as iTps1432, could allow us to integrate molecular data with models of ocean biogeochemistry in the future. measured growth rate and optimal ATP utilization in iTps1432, given the biomass composition at different light levels. The x error bars represent the 95% confidence interval of measured dilution rates, and the y error bars follow from the 95% confidence intervals of bulk biomass measurements (carbohydrates, protein, total dry weight, in pg C/cell). (PDF) S4 Fig. Comparison of simulated biomass components versus target components at the  onset of different growth phases (days 0, 1, 3, 7, 10). The best and worst (blue, red) estimates of biomass composition were marked for each biomass component.