A systems-biology approach to molecular machines: Exploration of alternative transporter mechanisms

Motivated by growing evidence for pathway heterogeneity and alternative functions of molecular machines, we demonstrate a computational approach for investigating two questions: (1) Are there multiple mechanisms (state-space pathways) by which a machine can perform a given function, such as cotransport across a membrane? (2) How can additional functionality, such as proofreading/error-correction, be built into machine function using standard biochemical processes? Answers to these questions will aid both the understanding of molecular-scale cell biology and the design of synthetic machines. Focusing on transport in this initial study, we sample a variety of mechanisms by employing Metropolis Markov chain Monte Carlo. Trial moves adjust transition rates among an automatically generated set of conformational and binding states while maintaining fidelity to thermodynamic principles and a user-supplied fitness/functionality goal. Each accepted move generates a new model. The simulations yield both single and mixed reaction pathways for cotransport in a simple environment with a single substrate along with a driving ion. In a “competitive” environment including an additional decoy substrate, several qualitatively distinct reaction pathways are found which are capable of extremely high discrimination coupled to a leak of the driving ion, akin to proofreading. The array of functional models would be difficult to find by intuition alone in the complex state-spaces of interest.


Introduction
The proteins and protein complexes known as molecular machines perform essential functions in the cell, including transport, locomotion, energy production, and gene expression [1]. Secondary active transporters, the focus of the present study, move ions and small molecules across a membrane driven by an electrochemical gradient of an ion [1]. For example, sodiumglucose transporters (SGLT) are of biomedical interest due to the vital role that SGLT1 and SGLT2 play in the uptake of glucose in the small intestines and reabsorption in the kidneys, respectively [2], which in turn has prompted biophysical scrutiny of their mechanisms [3][4][5][6]. Numerous other transporters have also been assayed on a quantitative basis [7][8][9][10][11][12].
The biological mechanisms of transporters as well as other molecular machines can be modeled using chemical reaction networks, typically along with mass action kinetics [13]. In a chemical reaction network, the system process is decomposed into discrete states connected by transition rates between states [13] forming a network of interconnected reactions in the state-space (Fig 1). These networks can be modeled using the chemical master equation: a set of differential equations describing the state probabilities and connected transition rates for each state [14]. Biochemical networks are generally Markovian [15], have a number of different control patterns [16], and typically adhere to specific design principles [17].
Despite the complex state-spaces accessible to molecular machines such as transporters, their mechanisms are often described using single-pathway, highly machine-like cartoon-like models [1,18,19], building on the seminal suggestions of Mitchell [20] and Jardetzky [21] (see Fig 1). While such models are helpful for a qualitative understanding of complex protein behavior and chemical networks, simple models may also build in unwarranted assumptions about the system. There is growing evidence that molecular machines may exhibit complexity beyond that embodied in typical 'textbook' cartoons models [7,8]. Recent experimental studies have shown that certain traditional model assumptions such as fixed stoichiometry [9,10], homogeneous pathways [11], and unique binding sites [12] may be incorrect.
As an example of mechanistic alternatives within a simple state-space, consider a hypothetical cotransporter motivated by the SGLT symporter which transports a single substrate and is driven by an ion gradient. The state-space is constructed using three state 'dimensions': conformational state, ion binding state, and substrate binding state. For this hypothetical transporter there are two conformations, each permitting four ion/substrate binding states: fully unbound, ion bound, substrate bound, and fully bound. Within this relatively simple statespace, we can construct four ideal kinetic pathways (Fig 1) that connect the minimum number of states to produce a symport cycle (i.e., intracellular transport of the substrate coupled to ion flow). However, there are numerous additional mechanistic possibilities: combinations of the ideal pathways, or even non-ideal pathways including, e.g., an ion leak. Note that antiport cycles can be similarly constructed using this same state-space [22].
Beyond the nominal functions of transporters, we also take note of one of the most remarkable properties of some molecular machines: the ability to perform "proofreading" or error-correction [23][24][25]. Specifically, certain network topologies promote enhanced selectivity (i.e., reduced error) in systems with a competing substrate [23][24][25]; this enhancement in selectivity incurs a free energy cost, typically paid via hydrolysis of a phosphodiester bond. While some aspects of proofreading networks have been examined-such as the speed, accuracy, and dissipation trade-offs [26,27], as well as non-equilibrium proofreading regimes [28]-the possibility that transporters might exhibit proofreading has not been explored to our knowledge.
Here we pursue a systematic exploration of mechanistic and functional diversity, building on strategies developed largely within the field of systems biology. Due to the challenges of modeling complex biochemical networks, such as enumerating combinatorically large statespaces, new approaches have been developed to keep these systems tractable. Genetic-algorithm sampling has been used to evolve complex biochemical networks such as metabolic pathways [29,30].
In related work applied to ion channels, models have been fit to experimental data using genetic algorithms and simulated annealing [31][32][33]. These reverse-engineering studies primarily sought optimal individual models, not the model-sets we pursue here, and also did not account for non-equilibrium constraints on transition rates [13]. Multiple mechanisms for ideal symport. The four "ideal" kinetic pathways of a hypothetical symporter that transports substrate using the available free energy of the driving ion are shown. This state-space contains eight states, and symport models include at least six connecting transitions. https://doi.org/10.1371/journal.pcbi.1007884.g001

PLOS COMPUTATIONAL BIOLOGY
Motivated by the possibility of discovering new potential mechanisms and the limitations of current "manual" approaches for analyzing molecular machines, we systematically explore the biochemical network model space for a given molecular machine or function. Our approach generates diverse models that are both thermodynamically consistent and testable. Due to their biomedical significance, we first examined transporters motivated by SGLT-type proteins. Our results suggest there is a diverse set of possible mechanisms for these cotransporters, including proofreading driven by an ion leak.

Methods
We have developed a custom software prototype, ModelExplorer, to study molecular machine behavior. Although this study is focused primarily on membrane cotransporters, the software is designed to be general enough for the exploration of other molecular systems. ModelExplorer automatically generates combinatorial state-spaces and then uses a modified Monte Carlo Metropolis algorithm to sample the model space with a user-defined "energy" or fitness function. This fitness function could embody experimental measurements via a weighted sum of residuals (loss function), but here we use functionality-motivated fitness functions. The software can also impose constraints motivated by structural or biochemical knowledge, such as prohibited states or a known order of binding events.

Model specification
We create a system model consisting of states and connecting rate constants. Systems states are created from all the allowed combinations of user-specified conformational and chemical substates, and placed into physically equivalent groupings (see S1 Text). The Monte Carlo sampling generates a trajectory in model space (Fig 2 and see below) that allows the selection of the fittest models, with a tempering procedure used to avoid trapping. Each model is assessed by its steady-state behavior in the current implementation, although transient information could be employed. The generated models may then be analyzed for kinetic pathways as well as flow stoichiometry over a range of chemical potential conditions, resulting in experimentally testable models.
The behavior of each model is determined by the rate constants governing transitions among the states. Only elementary transitions are allowed, i.e., single (un)binding transitions or single conformational changes. To ensure thermodynamic consistency among all rate constants [13], we use an energy-based formulation [34][35][36] where a free energy value is assigned to each state and transition state. To simplify equations, we use reduced units, with all energies expressed in units of k B T. For a conformational transition from state i ! j, the Arrhenius-like first-order rate constant is expressed using Hill's notation [13] as: The user-defined prefactor k 0 is set arbitrarily to 10 −3 s −1 , while E bar ij corresponds to the transition state (free) energy and E i < E bar ij is the free energy of state i. Because k 0 is the pre-factor for all rate constants, its value does not affect the mechanisms discovered, but rather it influences the overall rates as a scaling factor.
For a process i ! j involving binding or unbinding, the ion or substrate concentration is built into an effective first-order rate constant [13] given by:

PLOS COMPUTATIONAL BIOLOGY
with the same parameters as above, and with Δμ ij being the non-equilibrium difference in the chemical potential for the i ! j state transition: Here, Δμ x is the difference in the chemical potential across the membrane for species x (e.g. ion or substrate) [13]: where [x in ] and [x out ] are the intracellular and extracellular concentrations of species x. Note that by construction only one species can have a binding/unbinding event during an i ! j state transition. The factor of 1/2 in Eq 3 is an arbitrary choice for dividing the driving force and results in a symmetrical splitting of the binding chemical potential difference. Although in principle such splitting factors can affect driven processes [37], we found empirically that changing the factor had a minimal effect on the models discovered for the transporter systems of interest. Note that Eq 4 does not include the membrane potential for charged species (e.g., a sodium ion), which is excluded from this study to focus on the simplest cases. Models are initially at a high energy but quickly find local minima. A tempering schedule (see S1 Text) of alternating temperature increases and decreases prevents the simulation from being trapped in local minima. https://doi.org/10.1371/journal.pcbi.1007884.g002 We use a novel string-based approach to construct the state-space for a molecular machine in a combinatorial fashion, filtering out user-defined exclusions. This is best illustrated with an example for a hypothetical alternating-access transporter of substrate (S) driven by a sodium ion (N): the state "OF-Nb-Si" represents the outward-facing (OF) conformation with a sodium bound (Nb) and substrate inside the pertinent cell or organelle. The string OF-Nb-Si fully defines the system state in a sufficient way for our kinetic scheme explained below, with further details given in S1 Text.
In order to investigate Hopfield-like enhanced discrimination [23], the cotransport system with a decoy substrate has additional parameters and constraints. The decoy-bound state free energies differ from their equivalent substrate-bound states by a fixed amount (ΔΔG parameter), and have equal barrier energies. This constraint forces the enhanced selectivity to result from the difference in binding affinities alone (ΔΔG), and not "internal proofreading", which is consistent with Hopfield's kinetic proofreading network [23]. Note that the difference in binding affinities in our Hopfield-like scheme effectively changes the activation barrier height, as seen via Eq 2. See Discussion. We note that occluded states (open to neither inside nor outside) are omitted for simplicity, and that decoy and substrate are mutually exclusive binders.

Model sampling
We use the Monte Carlo (MC) Metropolis-Hastings algorithm [38,39] with tempering to sample model space, where a trial move consists of randomly adjusting a single state or transition energy, E i or E ij , along with the energies of its equivalent tied members. As noted in the SI, the tied members consist of states which are physically equivalent given a steady state where the outside and inside concentrations are fixed. Physically equivalent transitions are those occurring between physically equivalent states; see S1 Text. Adjusting all the "tied" states during a trial move prevents physically equivalent states in the model from having different free energies, maintaining thermodynamic consistency.
For each trial model generated using MC, we evaluate its MC "energy" or fitness based on its steady-state characteristics. To do so, we construct a rate matrix, K, for the new model using the equilibrium and non-equilibrium (binding) energies for each state/transition along with the modified Arrhenius equations, (Eqs 1 and 2). The rate matrix, K, can be written with the state probabilities column vector,P, to form the chemical master equation [14]: where K ij = −α ji for each transition with i 6 ¼ j, and K ii is the sum over outgoing rate constants, ∑ j6 ¼i α ij . The chemical master equation can be solved under steady-state conditions using the definition of steady-state, dP ss i dt ¼ 0, and the additional constraint that ∑ i P i = 1. This yields the steady-state probabilities for each state, from which we can calculate the steady flows between states when combined with the rates [40]: where j ij is the steady-state flow from state i to state j, P ss i is the steady-state probability of being in state i, and α ij is the transition rate between state i and j. The overall flux, J x , of a species (e.g. substrate or ion) is then determined from the sum of net flows of user defined transitions, such as the substrate binding/unbinding transitions in the 'inward' facing conformation (see SI):

PLOS COMPUTATIONAL BIOLOGY
Each model (set of rate constants between states) will have a Monte Carlo 'energy' E MC assigned (i.e. fitness score) based on a user-defined general function. In this study we use substrate flows into the cell, as specified in the Results section for different choices of the fitness function.
Each Monte Carlo step results in a model, and thus each simulation yields a trajectory in "model space" (Fig 2)-i.e., a sequence of models. By convention, lower fitness scores are more fit. The current model's 'energy' is compared to the previous model's energy and accepted/ rejected based on the Metropolis-Hastings selection criterion [38,39]. That is, the probability of accepting a trial move is the usual p ¼ min½1; e À bDE MC �, where ΔE MC is the change in the Monte Carlo 'energies' of the two models, and β is the effective inverse thermal-energy parameter. We emphasize, however, that our MC procedure does not generate a true Boltzmann-distributed thermal ensemble, and the MC β parameter is used only to aid sampling. In order to avoid trapping in deep "energy" (high fitness) basins we employ a tempering [41] procedure, described in S1 Text.

Model analysis
Models can be analyzed in several ways using ModelExplorer: characterizing overall stoichiometries, kinetic pathway analysis, and manual adjustment of selected transition rates. The ion and substrate flux of a model can be calculated over a range of chemical potential differences by incrementing the desired chemical potential difference and updating the non-equilibrium energy terms for each state. The rates, steady state probabilities, flows, and then fluxes are then recalculated for each chemical potential increment, allowing for the examination of stoichiometry. Since ModelExplorer calculates the net flows between states, a kinetic diagram of the network pathways can be made for a model at user-defined chemical potential differences. Note that the absolute flow (and flux) values are not physically relevant due to an arbitrary overall rate constant prefactor: all flows can be scaled by an arbitrary constant and remain consonant with the governing equations.
Furthermore, individual models may be perturbed by modifying specific state/transition energies, leading to insights on pathway characteristics (e.g. pathway with or without leaks). Combining the kinetic pathway diagram with the flux analysis (Fig 3) provides a means to investigate the dynamic behavior of molecular machines and provide testable mechanistic hypotheses. Since the flow and flux values are arbitrary (see above) the ion and substrate flux have been scaled by the maximum flux (i.e. ion flux at largest chemical potential difference).
In addition to single model analysis, we have developed tools for the meta-analysis of the data-i.e., a "systems" analysis. A simple data pipeline allows for the analysis of sampling parameters, run-to-run model distributions, and model clustering, based on the differences in model flows. Sampling efficiency can be evaluated based on the average time to find a sufficiently different model (i.e. cluster) during the simulation as well as the run-to-run comparison. The run-to-run comparison tool computes the minimum distance between models found in simulation 'A' compared to simulation 'B' and vice versa, generating a model distribution between the runs. This provides insight into unique models found under different simulation conditions. Models may also be clustered hierarchically [42], allowing for the discovery of different model classes. The model differences are calculated using the Euclidean distance between the vectors of the scaled (by vector magnitude) net flows between the states of a given model.

Computing details
The current prototype of ModelExplorer was written in Perl 5 with additional scripts for analysis and visualization written using Python 2.7. The software was used on a desktop PC running Windows 10 64-bit OS with an Intel i7-6700 CPU. Each simulation was run for 1e6 MC steps, storing models every 500 MC steps to reduce the run-time and memory requirements. Each run yielded 2000 models. For simulation parameters see S1 Table. All scripts and data for the manuscript are available at: https://github.com/ZuckermanLab/ModelExplorer.

Results
We present results for a range of systems demonstrating the ability of the computational approach to discover both expected and surprising mechanisms in single and competing-substrate environments. For a simple cotransporter system without a decoy, the results include validation of ideal symporter/antiporter mechanisms, selected network topologies, and substrate/ion fluxes for a range of ion chemical potential differences. For cotransporters with an additional decoy substrate, we analyze several features of four simulations: selected network topologies, substrate/decoy/ion flux for varying ion chemical potential differences, network heterogeneity via clustering, and possible alternative mechanisms.

Ideal and mixed-mechanism cotransport with a single substrate
We first studied a transporter system with a single substrate driven by an ion gradient. To generate symport models, the substrate chemical potential difference (Δμ substrate ) was set to 2k B T and the ion chemical potential difference (Δμ ion ) was set to −4k B T; see (4). The fitness goal as embodied in the Monte Carlo energy was set to: where J substrate is the flux of the substrate into the cell (with units of s −1 that are scaled out by the Monte Carlo β parameter) as given by (7). This promotes intracellular substrate flow against its own gradient but down the ion gradient. Note that negative Monte Carlo energies are more fit by convention. The simulation of 1e6 MC steps yielded the four idealized symporter cycles of Fig 1, as well as combinations of the idealized symporter cycles (Fig 3 and see S1  Fig). All of these models exhibit a 1:1 ratio of substrate and ion flux into the cell-consistent with the idealized predictions of an optimized symporter (Fig 1). Antiporter behavior in the same state-space was explored by modifying the substrate chemical potential difference (Δμ substrate ) to −2k B T and setting the fitness goal to: where J substrate is the flux of the substrate (with units of s −1 that are scaled out by the Monte Carlo β parameter). Because MC favors lower energy, the use of (9) promotes extracellular flow of the substrate opposite to the ion gradient and flow. The simulation of 1e6 MC steps yielded antiporter models with a 1:1 ratio of substrate flux out of the cell to ion flux into the cell (see S2 and S3 Figs)-consistent with theoretical expectations for an optimized antiporter.

Discriminative models in the presence of a competing substrate
A primary motivation for this work was the challenge of generating models with particular functions in complex state-spaces where a combinatorial number of possibilities preclude guessing of mechanisms based on intuition. Specifically, motivated by hints that vSGLT exhibited non-productive reversal events (substrate unbinding to the extracellular side) [43], we wanted to investigate whether slippage events [13] might be able to enhance selectivity in the presence of a "decoy" substrate.
To seek models capable of enhancing selectivity for one substrate over another (beyond that generated by their differing affinities, importantly), a competing decoy substrate was added to a transport state-space that included a driving ion gradient. The decoy substrate was set to have a weaker affinity by ΔΔG = 1k B T, but otherwise the substrates are treated identically. The substrate and decoy chemical potential differences (Δμ substrate , Δμ decoy ) were both set to 2k B T, and the ion chemical potential difference (Δμ ion ) was set to −4k B T. The fitness function (Monte Carlo energy) was set to: where, � = 1e−15 improves numerical stability, and J substrate and J decoy are the fluxes of the substrate and decoy, respectively (with units of s −1 that are scaled out by the Monte Carlo β parameter) as defined in (7). The first factor of Eq 10 promotes the intracellular flux of the substrate (negative sign used by convention), while the second factor promotes a high ratio of substrate to decoy flux (i.e. enhanced selectivity). Note that while the models are optimized at an ion chemical potential difference of −4k B T, we are also able to find enhanced discrimination at larger ion chemical potential difference (i.e. concentration) values.

Analysis of a single discriminative model
To highlight the power of the sampling strategy to discover non-trivial mechanisms, we examine the kinetic pathways of a model (Fig 4A) with enhanced selectivity. (Below this is referred to as "Model B"-MC index 29000.1, see S2 Text). This model can be described as a combination of several pathways: two pathways which symport both the ion and substrate (Fig 4C), and two ion leak pathways which only transport the ion (Fig 4B and 4D). Here we have defined an ion leak as a "futile" cycle in the state-space leading solely to dissipation of the ion gradient. We can intuitively understand the discrimination mechanism: the ion leak pathways (which include the central horizontal arrow in Fig 4B and 4D) drive the substrate and decoy to unbind in the outward facing conformation (on the left side of the diagrams). Due to the 1k B T

PLOS COMPUTATIONAL BIOLOGY
difference in binding affinities, the substrate is more likely to rebind and be transported into the intracellular region. In fact, under the conditions which lead to the flows shown in Fig 4A, there is negligible decoy flow into the cell, which is why no flow arrow is shown connecting the two decoy-and-ion bound states. The mechanism of this model directly echoes the driven tRNA unbinding from the ribosome analyzed by Hopfield [23]. Selectivity can be examined over a range of driving ion chemical-potential differences, Δμ ion , by examining the substrate and decoy substrate fluxes (Fig 5). The discrimination ratio of substrate to decoy, J substrate /J decoy , is essentially perfect at the value used to perform the MC sampling, namely Δμ ion = −4k B T, where the decoy flux vanishes while the substrate flux remains significant. Interestingly, in the range −4k B T < Δμ ion ≲ −3k B T, substrate is pumped into the cell, while the decoy flows out, down its gradient.
To further investigate the mechanism of enhanced selectivity, we compared the original model B (Fig 5A) to the same model with the ion leak removed (Fig 5B, note absence of central horizontal flow). The leak-free model-generated by increasing the ion-only-bound conformational transition barrier energy by 100k B T-has symmetric pathways for both substrate and toxin transport (Fig 5B inset). Removing the leak dramatically decreases discrimination, as shown in Fig 5C, especially near the ion chemical potential difference Δμ ion = −4k B T at which MC sampling was performed. In particular, the leak-free model exhibits a decrease in selectivity, approaching the expected equilibrium-like value of e ΔΔG � 2.7 with ΔΔG = 1 (see Fig 5B  and S4 Fig) [23]. This suggests that the ion leak is the prime mechanism driving enhanced selectivity for this model.

Meta-analysis of discriminative models
Our model-sampling approach yields numerous models. We examined them on a "systems basis" using a filtering and clustering procedure.
To start, we filtered for the highest-performing models found during four separate runs of 1e6 MC steps (see S2 Text). Specifically, models were filtered based on the ion-to-substrate flux ratio being greater than 0.1, and the substrate-to-decoy ratio being greater than 10e ΔΔG (ten times the expected discrimination ratio at equilibrium) [23], where ΔΔG = 1 is the difference in binding affinities between the substrate and decoy substrate. These filtering constraints ensured that the analysis only contained models with a sufficient stoichiometry of sodium ions to the substrate, as well as a minimum baseline for enhanced selectivity.
Filtering for performance resulted in 1783 of the aggregate 8000 models exhibiting enhanced selectivity, and we probed these via clustering based on a similarity metric. Clustering revealed four different mechanistic model classes (clusters) with differing kinetic pathways leading to enhanced selectivity (Fig 6). Procedurally, for each model in the filtered set, the flows on all edges were scaled by the Euclidean norm of edge flows in that model. The normalized edge flows define a flow vector used to calculate Euclidean distances and perform complete-linkage clustering [42] based on a distance threshold of 0.65, which we found empirically to reveal qualitatively different pathways.
All model clusters, which importantly were filtered for enhanced selectivity of substrate over decoy, contain an ion leak. As discussed previously (see Fig 4), the models shown in Fig 7  contain "futile" ion-leak cycles, which consist of all the paths that include the central arrow connecting the ion-only-bound outward-facing to inward-facing conformations. In many, but not all models (see models in clusters A, B, D), this ion leak is coupled to substrate and decoy unbinding in the outward-facing state. In other words, the molecules bound to the OF state of the transporter are effectively ejected back to the extracellular space in a process that expends free energy from the ion gradient.

PLOS COMPUTATIONAL BIOLOGY
To further confirm the role of the futile ion cycle in promoting selectivity, the ion leak was removed for each of the representative cluster models (Fig 7A-7D). This was done by raising the corresponding energy barriers as described previously for model B. The resulting leak-free models again exhibited a decrease in selectivity to the expected equilibrium value (e ΔΔG with ΔΔG = 1), indicating that the ion leak is indeed the driving mechanism for enhanced selectivity in our models as in Hopfield-like kinetic proofreading [23]. We note that in our formulation, the difference in binding affinities is effectively a kinetic parameter that adjusts the reaction rate, as seen from (2). Our results are thus consistent with recent theoretical arguments which suggest that only kinetic parameters adjust the ratio of stationary fluxes [26].

Model class analysis
It is instructive to examine all the model classes further to understand their differences. Models corresponding to clusters A and B (Fig 7A and 7B) share a similar network structure where the substrate (or decoy) tends to bind before the ion, followed by the unbinding of both substrate and decoy, which favors the stronger-binding substrate for transport. A and B differ slightly in that models in cluster A contain an ion-only binding transition in the outward conformation and have a single unbinding pathway for the substrate and ion in the inward-facing conformation (i.e. one symporting pathway). The models in cluster B have two unbinding pathways for the substrate and ion in the inward-facing conformation (i.e. two symporting pathways, see Fig 4C).
The model class of cluster C (Fig 7) embodies a much less intuitive mechanism. First, the model is fully connected: each allowed state transition has a non-zero flow. Unlike the other PLOS COMPUTATIONAL BIOLOGY three model classes, class C does not exhibit a net flow of substrate unbinding in the outwardfacing conformation. Model C also contains parallel futile cycles with no net transport for either the substrate and decoy in which the ion dissipates its gradient. In the inward-facing conformation, both the substrate and decoy are driven by the ion leak to bind and then unbind again, resulting in more substrate than decoy flux due to their different binding affinities.
Models in cluster D (Fig 4) contain unbinding steps for both the substrate and decoy in both OF and IF conformations, driven by an ion leak. On the OF side, ion driving appears to force all the decoy to unbind, since there are no horizontal transitions from outward-facing on the left to the corresponding inward-facing states on the right, whereas a fraction of the native substrate remains bound (stronger affinity by ΔΔG = 1k B T) during the OF-to-IF conformational transition; more precisely, there is an equal and opposite flow of conformational transitions for the decoy-bound states, while the substrate exhibits a net productive flow into the cell. The ion leak also drives the substrate and decoy to bind and unbind in the inward facing conformation, but these processes 'cancel out' and do not lead to net flux of the decoy substrate.
For all four models, at low ion chemical potential differences (similar to the magnitude of the substrate/decoy chemical potential difference), there is a regime where the ion and substrate are transported into the cell, and the decoy is transported out of the cell (see Fig 5A and Fig 6). Each of these models exhibits a high level of selectivity due to an ion leak, as discussed in the text. Note that the net flow values shown along edges are scaled by the maximum flow edge of the individual model. We can also probe the costs and restrictions on enhanced selectivity. Although our Monte Carlo sampling optimized substrate-to-decoy selectivity at a particular set of chemical-potential differences, the enhanced selectivity generally occurs over a range of thermodynamic driving forces ( S9 Fig). However, some models exhibit high selectivity at a low cost (i.e., low stoichiometric ratio of ion to substrate flux), without any added constraints on the flux ratios. Analysis of the representative models suggests that uniformly high discrimination over a range of thermodynamic conditions occurs with a correspondingly uniformly high cost (see S8 Fig), although this is not necessarily a representative sample. This issue will be of interest in future work.

Discussion
Overall we have seen that Monte Carlo sampling of model space can find diverse, testable models that provide insight into complicated cotransport systems. Unlike prior related work for ion channels [31][32][33], we have not sought to fit or optimize a single model to a set of experimental data. Rather, we have taken a systems approach in asking for sets of models capable of performing a given function and applied this to discover multiple transporter models with a discriminatory capability not previously envisioned in the literature, to our knowledge. Our main point is not that proofreading surely occurs in transport, but rather that an automated approach is required to consider mechanistic possibilities in a systematic way. First, the model-sampling approach systematically identified ideal and mixture pathways for both simple symporters and antiporters, validating the approach. Subsequent study of a more complex state space including the very realistic possible binding of a "decoy" substrate indicates the possibility of cotransporters that exhibit enhanced selectivity similar to Hopfield's and Ninio's kinetic proofreading models [23,24]. The enhnanced-selectivity models all exhibit an ion leak which, when removed, prevents the enhanced discrimination. This apparently unreported mechanism for secondary active transporters can enhance selectivity to a remarkable degree in a limited range of conditions. Clustering analysis of all models sampled shows a fairly diverse group of model classes that exhibit enhanced selectivity using different kinetic pathways.
Every kinetic model that is generated is also thermodynamically consistent with both equilibrium and non-equlibrium constraints. Since all biochemical reactions are governed by the laws of thermodynamics, this consistency is an important part of accurately modeling the mechanisms of molecular machines. The energy-based formulation of reaction rates α ij , including non-equilibrium effects where appropriate, combined with an automated statespace construction distinguishes our method from similar approaches. We believe the steadystate conditions studied here provide the best simple model for many cellular conditions which may change very slowly-over timescales of minutes, hours, or days-but undoubtedly transient effects are important for some physiological conditions such as release from starvation conditions. Also many experiments study transient phenomena by construction [7,8,10,11] and so it will be valuable to use our models accordingly in future work.
Although our approach was not developed for the study of evolutionary molecular biology, the method "discovers" working models starting from unproductive models akin to earlier work [29][30][31][32][33]. The changes to rate constants observed in some trajectories may be related to structurally feasible changes. In the future, a more sophisticated approach could attempt to build in additional structural constraints to develop candidate feasible pathways for transporter evolution. We note that the present study already included constraints on the similarity of substrate and decoy, as well as their mutually exclusive binding.
In addition, by utilizing additional 'base states' (i.e. Ni, No, Nb, see S1 Text) our method may be extended to include an arbitrary number of additional binding sites for a given species. This will generate a larger state-space to search for models with a range of possible substrate/ solute binding sites and pathways. This capability is particularly useful to study systems which may exhibit variable (or unknown) substrate/solute stoichiometries [10] and is a direction for active work.
All the enhanced-selectivity models sampled in this study, by construction, were of the 'Hopfield type' [23] where substrate discrimination results from the interplay of an external driving force (due to the ion gradient) and a difference in binding affinity between the two substrates. No additional differences between the substrate and decoy were permitted: specifically, all barrier energies were constrained to be identical for both species. Note that by Eqs (1) and (2), our barrier energies E bar are absolute energy levels and not the conventional "barrier heights," which are energy differences E bar ij À E i . Thus, in our Hopfield-like model with equal E bar ij values for substrate (S) and decoy (W), the effective barrier heights differ between S and W because the stabilities (affinities) E i differ. In other words, the kinetic parameters are coupled to the affinity/stability parameters, although formulations can be constructed in which they are independent [26].
Hopfield-type proofreading differs from what might be called "internal proofreading" where additional discrimination results from differing barrier heights [25]. Biological proofreading can be expected to mix both mechanisms to some degree [26], but we chose to focus on Hopfield proofreading because an arbitrary degree of discrimination can result from differing barriers-for example, if the decoy species has a negligible on-rate for the transporter. Our approach differs somewhat from Hopfield's and Nino's [23,24] in that no irreversible steps enter our models. The full reversibility appears to be a necessary ingredient in the ideal discrimination (unbounded ratio of substrate to decoy flux) that occurs in some models for specific concentrations.
As emphasized decades ago by Hill [44], in complex networks such as those explored here, it is the network as a whole rather than key steps which define the mechanism. The current networks are simple enough that we can point to intuitive processes coupling ion leaks to unbinding, but the mechanism is defined by the overall process. Undoubtedly, in more realistic networks including a fuller set of conformational states, it will become more difficult to describe an intuitive mechanism. Nevertheless, the principles uncovered in the simple systems can provide useful guidance for conceptualizing complex systems.

Conclusion
Motivated by evidence for the alternative behavior of molecular machines, we have developed a thermodynamically consistent approach to systematically explore a range of mechanisms, generating multiple experimentally testable kinetic models based on a predetermined fitness function. Prior work has focused on developing individual sets of optimal parameters [31][32][33] and not on generating model sets, which we believe is essential for developing precise mechanistic hypotheses. The approach was designed for complex state spaces which can be automatically generated, and which would be difficult to analyze by intuition alone. To our knowledge, the platform is the first to enable model sampling building in both equilibrium and non-equilibrium thermodynamic rules. This 'systems biology' approach to analyzing mechanisms of molecular machines was applied to a cotransporter state-space with and without a decoy substrate. After validating the method against 'textbook' symporter/antiporter models, we generated a variety of mechanisms that enhance selectivity-including Hopfield-like proofreading networks, which could have important biological implications and biotechnology applications.
The mechanisms discovered using an automated approach would be difficult to design on an ad hoc basis starting from a limited set of experimental structures.
Supporting information S1 Text. Detailed methods. Details regarding the string-based creation of states, state definitions, equivalent states/transitions, tempering procedures, and flux calculations. (PDF) S2 Text. Model clustering method. Details on the clustering procedure, parameters, and the chosen representative models. (PDF) S1 Table. Simulation parameter values. Table containing  Simulations were run for transporters in a 'competitive' environment with a decoy. Models were filtered based on a cost (ion to substrate flux ratio) below 10, and selectivity (substrate to decoy ratio) above 10e ΔΔG=1 . Clusters were determined using hierarchical clustering with complete-linkage and the Euclidean distance between the scaled flows of each model. The threshold of 0.65 was determined empirically to produce qualitatively different kinetic pathways. These graphs indicate that each run only finds a few model classes during the simulation-implying the need for improved sampling methods. Note that in run 4, models meeting the selection criteria (i.e. cost and selectivity) were not found until approximately 1.5e5 MC steps. (TIF)

S8 Fig. Cost of the representative models.
The cost of the representative model for each cluster, over a range of ion chemical potential differences. All of these models exhibit a cost above the ideal 1:1 stoichiometric ratio for a wide range of chemical potential differences. The extra ions transported relative to the substrate suggest a futile ion transport cycle-i.e. an ion leak. Note that the cost was not included as a constraint in the energy function. The stoichiometric ratio of the substrate to ion flux (selectivity), over a range of ion chemical potential differences. All the models demonstrate enhanced selectivity over a range of chemical potential differences, and unbounded selectivity at the optimized condition (at Δμ ion = -4k B T). Models A and D exhibit enhanced discrimination over a wide range of conditions. The expected equilibrium value (e ΔΔG=1 ) is shown as a reference. Note that this stoichiometric ratio was used as a primary constraint in our energy function. (TIF)