LASSI: A lattice model for simulating phase transitions of multivalent proteins

Many biomolecular condensates form via spontaneous phase transitions that are driven by multivalent proteins. These molecules are biological instantiations of associative polymers that conform to a so-called stickers-and-spacers architecture. The stickers are protein-protein or protein-RNA interaction motifs and / or domains that can form reversible, non-covalent crosslinks with one another. Spacers are interspersed between stickers and their preferential interactions with solvent molecules determine the cooperativity of phase transitions. Here, we report the development of an open source computational engine known as LASSI (LAttice simulation engine for Sticker and Spacer Interactions) that enables the calculation of full phase diagrams for multicomponent systems comprising of coarse-grained representations of multivalent proteins. LASSI is designed to enable computationally efficient phenomenological modeling of spontaneous phase transitions of multicomponent mixtures comprising of multivalent proteins and RNA molecules. We demonstrate the application of LASSI using simulations of linear and branched multivalent proteins. We show that dense phases are best described as droplet-spanning networks that are characterized by reversible physical crosslinks among multivalent proteins. We connect recent observations regarding correlations between apparent stoichiometry and dwell times of condensates to being proxies for the internal structural organization, specifically the convolution of internal density and extent of networking, within condensates. Finally, we demonstrate that the concept of saturation concentration thresholds does not apply to multicomponent systems where obligate heterotypic interactions drive phase transitions. This emerges from the ellipsoidal structures of phase diagrams for multicomponent systems and it has direct implications for the regulation of biomolecular condensates in vivo.

Many biomolecular condensates form via spontaneous phase transitions that are driven by multivalent proteins. These molecules are biological instantiations of associative polymers that conform to a so-called stickers-and-spacers architecture. The stickers are protein-protein or protein-RNA interaction motifs and / or domains that can form reversible, non-covalent crosslinks with one another. Spacers are interspersed between stickers and their preferential interactions with solvent molecules determine the cooperativity of phase transitions. Here, we report the development of an open source computational engine known as LASSI (LAttice simulation engine for Sticker and Spacer Interactions) that enables the calculation of full phase diagrams for multicomponent systems comprising of coarse-grained representations of multivalent proteins. LASSI is designed to enable computationally efficient phenomenological modeling of spontaneous phase transitions of multicomponent mixtures comprising of multivalent proteins and RNA molecules. We demonstrate the application of LASSI using simulations of linear and branched multivalent proteins. We show that dense phases are best described as droplet-spanning networks that are characterized by reversible physical crosslinks among multivalent proteins. We connect recent observations regarding correlations between apparent stoichiometry and dwell times of condensates to being proxies for the internal structural organization, specifically the convolution of internal density and extent of networking, within condensates. Finally, we demonstrate that the concept of saturation concentration thresholds does not apply to multicomponent systems where obligate heterotypic interactions drive phase transitions. This emerges from the ellipsoidal structures of phase diagrams for multicomponent systems and it has direct implications for the regulation of biomolecular condensates in vivo.

Author summary
Spatial and temporal organization of molecular matter is a defining hallmark of cellular ultrastructure and recent attention has focused on membraneless organelles, which are also referred to as biomolecular condensates. Of  Introduction that drive phase transitions are biological instantiations of associative polymers [44] characterized by a stickers-and-spacers architecture [45]. Stickers contribute to a hierarchy of specific pairwise and higher-order interactions that are either isotropic or anisotropic whereas spacers control the concentration-dependent inhomogeneities in the densities of stickers around one another. Stickers can be hot spots or sectors [46] on the surfaces of folded proteins [15,29] or short linear motifs within intrinsically disordered regions (IDRs) [15,24,47]. Spacers are typically IDRs that contribute through their sequence-specific effective solvation volumes to the interplay between density transitions (phase separation) and networking transitions that are better known as percolation [28,29]. Spacers can also be folded domains that are akin to uniformly reactive colloidal particles, although this has not yet been explored. Proteins can be mapped onto the stickers-and-spacers architecture as linear multivalent proteins, branched multivalent proteins, or some combination of the two [13,15]. Simple two-component systems comprise of the solvent (which includes all components of the aqueous milieu) and a multivalent protein / RNA molecule. For fixed solution conditions, one can generate phase diagrams [25] as a function of protein concentration, the valence of stickers, the affinities of stickers, the sequence-specific effective solvation volumes of spacers, and the lengths / stiffness of spacers. The phase diagram can be investigated by keeping the valence of stickers, the lengths of spacers, and effective solvation volumes of spacers fixed while varying the concentration of stickers and the affinities between stickers [29]. Changes to protein concentration will enable density fluctuations and above the saturation concentration, designated as c sat , the density inhomogeneities lead to separation of the system into coexisting phases. The concentration of multivalent proteins in the dilute and dense phases will be denoted as c sat and c dense , respectively. For a given bulk concentration c bulk that lies between c sat and c dense , the fraction of molecules within each of the coexisting phases is governed by the lever rule [48].
Stickers also form reversible physical crosslinks and these crosslinks generate networks of inter-connected proteins. The number of proteins within the largest network of the system grows continuously as the protein concentration increases. Above a concentration threshold known as the percolation threshold and designated as c perc , the single largest network spans the entire system and this phenomenon is called percolation [49][50][51]. If the percolated networks have the rheological properties of viscoelastic fluids, the fluids are referred to as network fluids [15,52].
Phase separation and percolation can be coupled to one another. The coupling will depend on the values of c sat , c dense , and c perc relative to c bulk . If c bulk is smaller than all of c sat , c dense , and c perc , the system is in a single dilute phase with no large molecular networks (Fig 1A). If c bulk > c perc and c perc < c sat , then a system-spanning percolated network forms without phase separation (Fig 1B). However, the system undergoes phase separation and a dense phase forms as a percolated droplet if c bulk > (c sat , c perc ) and c sat < c perc < c dense (Fig 1C). Recent studies, using three-dimensional lattice models designed to mimic the poly-SH3 and poly-PRM systems of Li et al. [16], show that sequence-specific effective solvation volumes of linkers / spacers between folded domains directly determine whether phase separation and percolation are coupled or if percolation occurs without phase separation for linear multivalent proteins [28,29]. The coupling between phase separation and percolation is controlled by the extent to which spacers / linkers preferentially interact with the surrounding solvent.
Theory [17,24,25,27,34,[53][54][55][56][57][58][59] and computations [28,29,43,[60][61][62][63][64][65][66][67][68] have important roles to play in modeling and describing the phase behavior of multivalent protein and RNA molecules. Theories provide analytical routes to explain experimental observations and to make testable predictions. On the other hand, simulations work around many of the simplifying assumptions that are needed to make theories analytically tractable. In doing so, they provide numerical routes to enable comparative assessments across different systems; they help in making testable predictions about phenomenology through what if calculations targeted toward specific systems; and they pave the way for designing systems with bespoke phase behavior.
Phase transitions are collective phenomena that involve highly cooperative transitions of large numbers of multivalent polymers. The collective interactions that drive phase transitions are captured in terms of a small number of order parameters that are similar across disparate systems and represent a generic coarse-graining of the underlying system that defines parameters such as the correlation length and the sizes of cooperative units. Accordingly, practical considerations of computational tractability and rigorous considerations of identifying the relevant collective coordinates mandate the use of coarse-grained models for simulations of wherein the polymer chains form a percolated, system-spanning network through physical crosslinks among stickers results. (c) Droplet wherein network formation also causes the polymers to form condensed phases. (d) Two-dimensional representation of the LASSI architecture. The beads with arms denote stickers where arms denote that the monomers are capable of orientational interactions, and the curved lines connecting the monomers represent phantom tethers, which are allowed to freely overlap (implicit spacer model). Different colors denote different sticker and spacer species respectively. Note that the physical bonds are allowed to overlap (dashed circle). For the rest of this work, physical bonds will not be labeled and will only be depicted as overlapping orientational arms. phase transitions driven by multivalent protein and RNA molecules. We focus here on multivalent proteins, although the methods we describe are readily adaptable to RNA molecules as well.
Coarse-graining, an essential aspect of making simulations of large numbers of multivalent proteins a tractable proposition, comes in different flavors [69]. For simplicity, we divide considerations that go into the development of a suitable coarse-grained model into three categories (Fig 2). These are (1) the type of model, (2) the types of interactions among the entities in the simulation, and (3) parameterization of the interaction potentials for the model of interest. Two distinct choices for the type of model are the choice between simulations being performed using lattice models versus off-lattice models. In either space, one or all of the molecules can be represented explicitly using architectures that represent coarse-grained mappings of the protein of interest. Next, the interactions among the units that make up each protein can be modeled as being isotropic or anisotropic. This is true of simulations where proteins of interest are modeled explicitly. In contrast, numerical instantiations of field theoretic models model can also be brought to bear where only a single chain is modeled explicitly [60,70]. The remaining protein and solvent molecules are modeled as fields whose fluctuations are concentration dependent [71]. The effects of all other molecules influence the phase behavior of the explicitly modeled single chain through interactions of the chain with the field. Finally, the choice of interaction potentials is the bedrock of every simulation. The functional forms and parameters for potentials can be derived using phenomenological considerations intended to enable calculations of the "what if" variety-an approach that is common practice in statistical and polymer physics. One can also obtain system-specific parameters using information gleaned from atomistic simulations of smaller-scale facsimiles of the system of interest. These system-specific parameters are derivable using force matching methods pioneered by Voth and coworkers [72][73][74][75][76][77] or by prescribing a functional form for the potential that describes interactions in the coarse-grained space and employs machine learning methods to derive the relevant parameters [74,78]. Finally, one can adopt approaches similar to the parameterization of molecular mechanics forcefields and develop a single transferable model that should be applicable to a large number of disparate systems.
Different coarse-grained simulations represent different combinations of model, interaction type, and parameterization. Two illustrative examples for deriving coarse-grained models for simulations of phase behavior of multivalent proteins come from the works of Ruff et al. [78] and Dignon et al. [64,65]. Ruff et al. show how one can generate off-lattice models, of bespoke resolutions and learned parameters for isotropic potentials derived using machine learning that leverages information gleaned from atomistic simulations of individual proteins and protein oligomers. Dignon et al. also use an off-lattice model based on isotropic potentials whose parameters are designed to be transferable across disparate intrinsically disordered proteins.
It is worth emphasizing that at this juncture, there is no valid reason to stipulate that one combination of approaches for deriving a coarse-grained model is superior to another combination. As noted by Das et al. [67,68], all models have distinct strengths and limitations. However, for specific applications, some methods afford quantifiable computational advantages over others. In our case, we are interested in uncovering conceptual nuances of phase diagrams for multicomponent systems that comprise of multivalent proteins characterized by anisotropic interactions among domains / motifs. As noted above, these systems can be mapped onto a stickers-and-spacers architecture. The questions we are interested in answering pertain to the order parameters that describe phase behavior, the impact of chain connectivity and spacer effective solvation volumes on phase behavior, and the determinants of the shapes of phase diagrams of multicomponent systems where phase transitions are driven by heterotypic as well as homotypic interactions. In this context, it is noteworthy that lattice models have been adapted to model phase transitions for systems comprising of different numbers of multivalent protein and RNA molecules [28-30, 43, 79-81].
In the present work, we provide a formal description of the design and implementation of system-specific lattice models for simulating phase transitions of multivalent proteins. The simulation engine, known as LASSI for LAttice simulation engine for Sticker and Spacer Interactions, formalizes the approaches that have been developed and deployed in recent studies [28-30, 79, 80]. Accordingly, LASSI combines a lattice model with anisotropic interactions among stickers and the model, at least in the current formalism, is derived based on phenomenological considerations (Fig 2). Ongoing work shows that a machine learning methodology known as CAMELOT [78] can be adapted for using LASSI as a tool to model sequence-specific phase behavior. We describe the design of LASSI, focusing first on the overall structure of the model, the Monte Carlo sampling, and their justification for generic multivalent proteins. We further describe the calculation of order parameters for quantifying phase separation and percolation. Then, using two specific examples of linear and branched multivalent protein systems, we illustrate the deployment of LASSI to two biologically relevant systems. In both systems, we make a priori assumptions regarding the identities of stickers and spacers, which is a requirement for the deployment of LASSI. Although we focus here on systems with a few components, it should be emphasized that the design of LASSI is able to handle a wide range of multicomponent systems.

Materials and methods
Considerations that go into the development of a suitable lattice model include (a) the choice of the mapping between a specific multivalent protein of interest and a lattice representation, (b) the parameterization of the strengths and ranges of interactions for all unique pairs of beads and vacancies, (c) the design of move sets and acceptance criteria for Monte Carlo simulations that enable the sampling of local and collective motions of large numbers of latticeinstantiated multivalent proteins, (d) the efficient titration of key parameters such as protein concentrations and interaction strengths, and (e) the extraction of phase boundaries in terms of known and hidden collective parameters, which become the relevant order parameters for phase transitions of interest.

Generating lattice representations of multivalent proteins
For a given linear or branched multivalent protein, we first choose a suitable mapping between the protein degrees of freedom and a lattice representation. The conformational space is a simple cubic lattice with periodic boundary conditions used to mimic a macroscopic system. Phase transitions represent the collective effects of large numbers of molecules, and simulations have to include at least 10 3 -10 4 protein molecules to observe facsimiles of these collective transitions in finite sized systems [82]. Further, we need to be able to test for the effects of finite size artefacts and this requires a titration of the effects of varying the numbers of molecules. Accordingly, the lattice has to be large enough to accommodate at least 10 3 molecules of each type for the most dilute concentrations. Often, we might need to increase the number of molecules to be of O(10 4 ). Accordingly, a one-to-one mapping between the protein degrees of freedom and a lattice representation would lead to a computationally intractable model. Instead, we adopt system-specific coarse-graining approaches, whereby the coarse-graining is guided by a priori rigorous or phenomenological knowledge of the identities of stickers versus spacers. For disordered proteins, the stickers within disordered regions often correspond to single amino acid residues or short linear motifs. For multivalent folded proteins, the stickers are either an entire protein domain or sectors on domain surfaces [28,29]. Residues corresponding to spacers may either be modeled explicitly, where one or more spacer residues are modeled by a single bead on the lattice site, or be modeled as phantom tethers, where the intrinsic lengths of tethers are calibrated in terms of the numbers of lattice sites [28,29]. In both cases, the tethers can stretch, bend, and rotate and these degrees of freedom contribute to density inhomogeneities that are the result of altered patterns of inter-sticker interactions.

LASSI and bond fluctuation models
The structure of LASSI is inspired by the bond fluctuation model (BFM) for lattice polymers [83]. This is a general lattice model for simulations designed to extract equilibrium conformational distributions and dynamical attributes of polymers in dilute solutions as well as dense melts. There are two versions of the BFM viz., the Carmesin-Kremer BFM or CK-BFM [84] and the Shaffer BFM or S-BFM [83]. Both models are based on the use of simple cubic lattices, which discretizes the conformational space for polymers.
In the CK-BFM [84], each repeating unit or monomer within a polymer is modeled as a 3-dimensional cube where the 8 corners of the cube occupy lattice sites and bond vectors connect pairs of monomers. Overlap of monomers is associated with an energetic penalty, and each bond vector can have up to 108 distinct directions. The choice of bond vector set encodes the geometry of the polymer and places constraints on the bond lengths and bond angles. All other interactions are governed by the inter-monomer potentials, and evolution of the system through conformational space is driven by changes to the overall potential energy. In contrast, the S-BFM places each monomer on a single lattice site. Covalently bonded monomers are connected by bonds that are constrained to be of three types, leading to chains that have bonds of length 1, ffi ffi ffi 2 p or ffi ffi ffi 3 p in units of lattice size. Monte Carlo moves with suitable acceptance criteria can be designed for both types of BFMs. The simulations are used to generate equilibrium conformational distributions of lattice polymers in either dilute or dense phases. The move sets control the overall polymer dynamics and the acceptance of different types of moves and the calculation of correlation functions allows one to compute dynamical quantities for lattice polymers [83]. If we were to use either of the established BFMs without modification, then each amino acid residue would be modeled as a monomer, and such an approach would be useful when the identities of stickers and spacers remain ambiguous. This approach underlies a different simulation engine known as PIMMS [43]. LASSI is a generalization of the S-BFM that also adapts features of the CK-BFM. Given a choice of the mapping for coarse-graining, each multivalent protein is described as a chain of non-overlapping monomers viz., beads that occupy sites on a 3-dimensional cubic lattice. Note that the choice of a single site per bead is similar to that of the S-BFM, although the bead, which is a sticker or spacer monomer, need not be the monomeric unit, i.e., an amino acid residue in the case of proteins. Each sticker monomer is linked to its adjacent sticker on the chain via either a phantom tether or a set of spacer beads that occupy individual lattice sites [28,29]. A spacer / tether length of unity implies that adjacent monomers are within ffi ffi ffi 3 p lattice units of one another (Fig 1D). The choice of the spacer length will be sequence-specific or more precisely, specific to the architecture of the protein of interest.
Inter-monomer (sticker-sticker, sticker-spacer, and spacer-spacer) interactions are modeled as contact-based pairwise interactions. A sticker monomer can bind to another sticker monomer that occupies an adjacent lattice site with an interaction energy that depends on the types of both monomers. Monomers are considered to be adjacent to one another if they are within a lattice distance of ffi ffi ffi 3 p . By this criterion, each lattice site occupied by a sticker monomer will have 26 adjacent lattice sites. This is reminiscent of the interaction geometry of a CK-BFM for each monomer. In the current implementation of LASSI, the interactions are mutually exclusive, implying that a sticker cannot interact simultaneously with more than one other sticker, even though there are 26 adjacent sites that the interaction partner can occupy. If the sticker in question is already engaged in another inter-monomer interaction with stickers or spacers, then the unoccupied sites of the sticker will be unavailable for interaction. The combination of the geometry of the interaction sites per monomer and the single occupancy constraint leads to anisotropic interactions between sticker interactions. This feature is unique to LASSI and it is not incorporated in other variants of BFMs; this allows us to deploy LASSI for modeling heteropolymeric systems. In the context of LASSI, we note that stickers are distinguished by their ability to participate in anisotropic or isotropic interactions. In contrast, explicitly modeled spacer sites only participate in isotropic interactions with other spacer or sticker sites. Furthermore, the interaction strengths involving spacers are typically weaker than those involving stickers. However, it is worth emphasizing that these distinctions only matter inasmuch as LASSI allows us to capture a numerical instantiation of the stickers-and-spacers model. For simplicity, one might simply think of LASSI as a model that has sites that are differentiated by whether or not they can involve themselves in anisotropic interactions, by their intrinsic site valence (a variable that we do not titrate in this work), and by the comparative magnitudes of site-site interaction strengths.

Setup of simulations
A system with n multivalent proteins is in reality an n+1 component system since the solvent is the implicit component. In LASSI, sites that are not occupied by protein units automatically represent solvent sites. Although the interaction potentials do not explicitly include terms between solvent and protein sites, the effective interaction strengths between pairs of protein units represent an averaging over protein-protein, protein-solvent, and solvent-solvent interactions. The solvent sites, i.e., the sites that are not occupied by protein units, represent contributions from the solvent to the overall translational and mixing entropies. Simulations are initiated by randomizing the positions of protein units, subject to the constraints of chain connectivity.
The parameters that are set at the start of each LASSI simulation include the total number of molecules n i of type i and the size of the lattice L, from which we can calculate the total number n of all protein components n ¼ X i n i and the concentration or number density of each protein c i = n i /L 3 . The setup also includes stipulations for the architectures of each protein such as specification of the number of monomers per chain, the overall topology of each protein (linear vs. branched), the lengths of spacers, and the types of spacers (implicit / phantom vs. explicit) [28-30, 79, 80]. The number of monomers per molecule will equal the sum of the number of stickers and spacers if spacer residues are modeled explicitly. Alternatively, if spacers are modeled as phantom tethers, then the number of explicitly modeled monomers will equal the number of stickers. Specification of the energetics of the system includes specification of the simulation temperature in normalized units, homotypic and heterotypic interaction strengths between pairs of stickers, the energetic cost for the overlap of stickers, and the interaction strengths between sticker and spacer sites if the spacers are modeled explicitly.

Design of monte carlo move sets
Our goal is to compute architecture-specific phase diagrams for systems comprising of one or more types of linear or branched multivalent proteins. This requires a simulation strategy that enables the sampling of the full spectrum of coexisting densities and networked states for multivalent proteins. Accordingly, the conformations of randomly initialized systems of proteins on a simple cubic lattice are sampled via a series of Markov Chain Monte Carlo (MCMC) moves that are designed to ensure efficient sampling of changes in protein density and networking while maintaining microscopic reversibility. We have developed and deployed a collection of moves and these are described below.

Monte carlo sampling with biases
In LASSI, we have independent contributions from two main energetic sources. Monomer units are not allowed to overlap, and this can be described by a position-dependent energy E pos where E pos = 0 or 1. On the other hand, inter-monomer pairwise interactions also contribute to the total energy, and E rot denotes the sum over all of the effective pairwise intermonomer interaction energies. The subscript "rot"(rotational) indicates the fact that for a pair of nearest neighbor stickers their interaction energies are actually governed by their mutual orientations. Accordingly, the total system energy in a specific configuration i is written as: The equilibrium probability associated with configuration i is given by the Boltzmann distribution as: In Eq (2), β is the inverse of the simulation temperature in units of the Boltzmann constant (effectively, k B = 1 energy unit / temperature unit). The frequency with which a transition from configuration i to j is proposed will be governed by the elements T ij of the targeting matrix T. The proposed transition is accepted / rejected based on the elements A ij of the acceptance ratio matrix A. A MCMC move that transitions the system from configuration i to j defines a flow in configuration space and this flow has to satisfy microscopic reversibility: If the targeting matrix is symmetric, then T ij = T ji and the acceptance ratios such as those prescribed by Metropolis et al. [85] will ensure the preservation of microscopic reversibility. However, if biases are incorporated into the targeting matrix, which is often essential to enhance the sampling of configurations that contribute to density inhomogeneities and the making / breaking of bonds in dense networks, then the elements of the acceptance ratio matrix have to be designed to ensure the preservation of microscopic reversibility. We deploy a general strategy of using biased moves to enhance the sampling of different mutual orientations among pairs of stickers. The incorporation of these orientational biases is accounted for by modifying the acceptance criterion of Metropolis et al. [85] whereby each element of A is written as: For a symmetric targeting matrix, we recover the standard acceptance ratio of Metropolis et al. [85] viz., Since the moves within LASSI generally involve orientational biases, the elements T ij are rewritten in terms of a Rosenbluth weighting factor W j [86,87] whereby: Substituting (6) into (4) leads to: The specific form for the weighting factors W i will depend on the type of move because the extent of asymmetry in the targeting matrix will depend on the nature of the bias incorporated into the biasing move that proposes a transition from i to j. The specific forms for weighting factors are discussed in the context of the move types that are introduced next.

Rotational moves
A monomer is in an associated or a dissociated state and in the associated state it has a specified binding partner. This defines the rotational state of a monomer. To change the rotational state, we randomly pick a monomer from the system, and exhaustively sample all 26 adjacent sites to construct a list of potential binding partners. The rotational state of the monomer is changed, at random, by drawing a random integer k from a uniform distribution between [0, b], where b is the number possible binding partners available to the monomer. If k = 0, the monomer is set to be in a dissociated state. Otherwise, the k th candidate bond (reversible physical crosslink) is formed and the state of the monomer is set to be in an associated state. If the monomer cannot be involved in a rotational interaction, as would be the case for an explicitly modeled spacer, the rotational move is rejected outright. The accessible volume for rotational interactions is within a cube of unit volume centered on the randomly chosen monomer (see S1 Fig for a 2-dimensional representation), and hence, each sticker monomer will have at most b max = 3 3 -1 possible sites as neighbors. Since this number is not large, we sample all 26 possible interaction sites.

Local moves
This move serves as the basic unit of local displacement of monomers-be they stickers or spacers. A randomly chosen monomer is moved from position r i to r j = r i + Δr. Acceptance of the move is predicated on the move not leading to an overlap with a site occupied by another monomer and the satisfaction of linker constraints. The choice for Δr is made by uniformly sampling each component from the interval [-2,2] such that jDrj � 2 ffi ffi ffi 3 p (shown in S2 Fig) moves if the selected monomer is in the interior of a molecule [88], and they become analogous to end rotation moves if end monomers are selected [89].
Local moves have a rotational bias in LASSI and the Rosenbluth factor is calculated as follows. Starting with Eq (7), we shall designate the chosen monomer by index k. In configuration i, assume that monomer k has a binding partner of index l. Typically in coarse-grained systems there are a finite number of unique monomer types, and thus it is more efficient to simply define interaction energies between different monomer types than between all monomer pairs. The energy associated with the bond between monomers k and l is written as ε t(k)t(l) , where t (x) indicates the type of monomer x. The local move causes a change in binding partner, whereby the monomer k now binds to monomer m. The local move leads to a bond swap that causes a change in rotational energy, which is written as: Use of Eq (8) in Eq (6) leads to: In Eq (9), each Rosenbluth weight has an additional index in the subscript to indicate that the change in configuration is achieved by a change in the binding partner for the monomer k.
To accelerate the creation of density inhomogeneities in supersaturated systems and facilitate the making and breaking of networks, we decompose W i;k as: The two terms in Eq (10) respectively represent the contributions to the Rosenbluth weights for monomers in associated (a) and dissociated states (d). First, we calculate the weight factors for the interacting monomers as a partition function over all nearest neighbor contributions, such that: In Eq (11), the summation runs over all potential binding partners l (nearest neighbors) for the monomer k. To illustrate how the Rosenbluth factor is calculated, we assume that the system has only one type of interaction with the pairwise energy designated as ε. If the number of nearest neighbors for monomer k in configuration i is designated as N k;i , then the Rosenbluth weight factor in Eq (11) becomes: Setting W ðdÞ i;k ¼ 1 to incorporate a bias towards associated states, and using the simplification that leads to Eq (12), we rewrite Eq (9) as a definition of the acceptance criterion for the move from configuration i to j via local move involving monomer k as: This choice for the acceptance criterion ensures that detailed balance is preserved while enhancing the sampling of configurations characterized by the breaking of old bonds and the forming of new ones.

Reptation-or slithering snake-moves
In dense configurations, it becomes difficult to realize large-scale translational or rotational motions of polymers. The slithering snake move is a Monte Carlo instantiation of reptation as first conceived by de Gennes [90]. In this move, a chain is chosen at random, and the monomer at one end of the target chain is moved to a new position. The remaining monomers within the target chain are then successively moved such that monomer m along the chain moves into the previous position of monomer m-1 (S3 Fig). This move relies on an inherent symmetry of chain molecules, because bond lengths between monomers are the same; if one swaps monomers across chains, the identity of the chain remains invariant. However, this move cannot be used if the molecule has heterogeneous bond lengths or if it is a branched polymer.
The reptation move is rotationally biased, and this is true for every monomer in a chain. The bias is independent for each monomer and accordingly, the Rosenbluth factor for a single reptation move can be calculated from the Rosenbluth factors for each monomer-specific local move. In configuration i we obtain: In Eq (14), the product runs over all monomers m within the chain of interest. The acceptance criterion for a reptation move takes the form: The inclusion of the bias for every interacting monomer, rather than just the end monomers, is to emulate how a real transiently bonded polymer would slither along its contours. Note that for strict detailed balance, the Rosenbluth factors for the two end monomers should be calculated and added to the acceptance criterion, but the current implementation of LASSI uses the first trial position that satisfies the position constraints.

Double pivot moves
These moves swap a part of a chain with the corresponding part of another chain of the same type. A monomer is picked at random; it is denoted as i m , where i is the monomer index within a chain, and m is the chain index. A search is then performed around the monomer within a prescribed distance for a monomer within the same type of chain. The requirement for the search is that the monomer of interest be one index ahead along its own chain, (i+1) n . Next a check is performed to ensure that the distance between i m and (i+1) n is within the bond constraint connecting i m to (i+1) m , and that the distance between (i+1) m and i n is within the bond constraint for i m and (i+1) m . Each monomer (i+1) n that satisfies each of these constraints is stored and one of these is randomly picked for the double pivot move (S4 Fig). The move is always accepted if there is a candidate because only connectivity changes unless bonds are modeled using elastic springs. The purpose of this move is to engender large configurational changes in dense polymer melts, which approximate the dense phases formed upon phase separation. In dense regions the rate of acceptance of local moves decrease precipitously. At high enough densities, polymers become entangled and local moves reduce to slithering-snake moves and polymers are restricted to motions along tubes around one another [91,92]. Therefore, rather than physically moving polymers to create a change in configurations, we incorporate move sets that break and make bonds while ensuring that monomers do not overlap, and that bond constraints are always satisfied. If two chains are close enough to each other that the bonds between two monomers can be swapped, then such the double pivot move results in a large configurational change for both chains, and for the system.

Chain and cluster translation moves
The chain translation move is designed to move single chains while forming new bonds at the proposed location. This move attempts to translate the center of mass of a chain i from r i to and L is the size of the simulation cell. Multiple trial displacements are proposed until a trial position that does not result in steric clashes for the entire molecule is generated. The move is then attempted. As with the slithering snake move, each monomer in the molecule that is translated will have an orientational bias. Accordingly, the Rosenbluth factors are calculated as in Eqs (14) and (15). The translational move results in large displacements for single chains and correctly biases the system for efficient sampling of configurations with alternate interaction patterns.
Translational moves can also be applied to clusters of molecules. A connected cluster refers to a collection of unique chains connected via rotational interactions. A proposed move only results in a translation, and the move is readily accepted if there is no steric clash. Since no new physical bonds are created at the proposed location, the cluster remains invariant and the move is accepted. Naively this move might seem unnecessary as this move simply moves clusters around. However, once a physical bond has formed between two molecules, it is highly unlikely for any of the non-cluster translation moves to move the centers-of-masses of clusters closer together.

Considerations for setting move set frequencies
The structure of each move set serves as a guide for selecting an optimal set of frequencies. This leads to a set of heuristics that are as follows: (i) in the cluster move we pick a random chain from the system, perform a networking analysis on that chain, and then propose a displacement of the cluster. As the cluster size grows it is more likely that a randomly picked chain will be part of the largest cluster which itself will result in a steric clash after the proposed move. Therefore, the frequency of the cluster move should be low, if not the lowest, in the entire set. (ii) In the translational move, we pick a random chain from the system for translation; as the size of the largest cluster increases it becomes less likely for a proposed translation move to be accepted. However, unlike the cluster move the translation move is rotationally biased and thus results in new interactions being formed. Hence, translational moves enable single-molecule to cluster-surface interactions. Therefore, this move should be proposed more often than the cluster move, although not as often as rotational or local moves. (iii) The rotation move is computationally inexpensive and it enables the switching of physical bonds and should thus be proposed fairly frequently. (iv) Similarly, local moves and slithering snake moves are also rotationally biased, and they help with the local rearrangements of physical bonds. Local moves are the primary route to enable local conformational changes, and to enable local physical bond rearrangements. Therefore, local moves should be proposed most frequently. The slithering snake move is particularly effective because it allows for large local physical bond rearrangements in dense configurations. Thus, this move should also be proposed frequently, less so than local moves but more so than translation moves. Note that in a system where some molecules are non-linear or have heterogeneous linker lengths, the frequency would need to be higher since the move is rejected if an incorrect molecule is picked at random. (v) The double pivot move allows for large-scale changes to conformations within dense configurations and accordingly, this move should be proposed more frequently than both cluster and translation moves. One can track the acceptance ratios of each move over a very rough initial sweep across the relevant system parameters. Moves that are always rejected do not enable any changes in configuration and only add computational costs. Therefore, the frequency for that particular move should be lowered. This is especially the case for the cluster move in high-density systems.

Identifying phase boundaries using measures for density inhomogeneities
In order to detect the onset of phase separation, we can calculate excess chemical potentials using the Widom particle insertion method [93] and equalize these chemical potentials across distinct phases. This process requires a priori knowledge of the densities of both phases. An efficient variant of this approach, based on fast Fourier transforms, was recently developed and deployed by Qin and Zhou [61]. They demonstrated their method for calculations of liquid-liquid coexistence curves for a patchy colloid model of γII-cyrstallin. Given that LASSI simulations are lattice-based, we instead rely on properties of pair distribution functions that help us diagnose the onset of phase separation and compute phase boundaries. Pair distribution functions are helpful because phase separation is the result of the system partitioning into phases of different densities. The pair distribution, which is a reduced-dimension partition function, serves as a rigorous thermodynamic and structural measure of the average local density and inhomogeneities of density. To first order, the density fluctuations are quantified by averaging over all orientations. Accordingly, the pair distribution function can be converted to a radial distribution function that allows us to probe local densities and local structural organization of molecules around one another. However, normalization of the pair distribution function requires some caution. The system contains polymer molecules and using a prior distribution that assumes an ideal gas of the chain monomers to normalize the pair distribution function is problematic because it does not accurately capture the effects of non-idealities due to chain connectivity. We leverage the efficient sampling of polymer fluids in LASSI and obtain suitable prior distributions by simulating the system of interest in the absence of sticker-sticker interactions.
The pair distribution function P (2) (r) quantifies the equilibrium distribution of distances between chain monomers, where r is the inter-monomer distance. If P ð2Þ 0 ðrÞ denotes the prior pair distribution function calculated from simulations where the inter-sticker interactions are ignored (see Fig 3A), then the normalized radial distribution function is written as: The functiong ðrÞ is a direct measure of the local density of the protein of interest. Since LASSI uses periodic boundary conditions, the maximal inter-monomer distance is ffi ffi Given this normalizedg ðrÞ, we note that if the system has short-range ordering as in a canonical liquid, the radial distribution function will oscillate around unity but eventually approach one as r ! 1. Conversely, if the system undergoes a density transition,g ðrÞ will have two distinct spatial regimes (Fig 3B): for small r,g ðrÞ will be characterized by a tall and broad peak such thatg ðrÞ > 1 untilg ðrÞ intersects theg ðrÞ ¼ 1 line; this region corresponds to the dense phase and we shall denote the value of r at this intersection to be r = r b . For r > r b ,g ðrÞ will be between 0 and 1, and for lattices that are large enough to avoid finite size artefacts,g ðrÞ will converge to a value lower than one and this corresponds to the density in the dilute phase region. Furthermore,g ðrÞ can be used to estimate the densities within the dense and dilute phases. To quantify the global density inhomogeneity we introduce a simple measure, � r, which is calculated as follows: In Eq (17), V(x), the volume element for the normalized radial distance x = r/L defined as: If � r � 0, the global density inhomogeneity in the system is small and this will be characteristic of a single homogeneous phase dominating the simulation volume. As � r increases beyond zero, the system accommodates density inhomogeneities. We construct coexistence curves using a cutoff value of � r = 0.025, which is universal to all systems, to delineate between a homogeneous system and one that has undergone phase separation.

Quantitative assessments of finite size effects
The pair distribution function is central to our calculation of density inhomogeneities and constructing coexistence curves for a system of multivalent proteins simulated using LASSI. At the start of this section we emphasized the importance of including 10 3 -10 4 distinct molecules within the simulation cell in order to avoid finite size artefacts. Prior to presenting detailed results that mimic specific systems, we present an analysis of finite size effects that we will confront if the requisite numbers of molecules are not included in the simulations. The data we present are for simulations of mimics of the protein FUS, specifically the A n + B n system introduced in the results section. The phenomenological mapping of this protein architecture onto a cubic lattice is discussed at the start of the results section. Here, we present an analysis that makes a crucial technical point about finite size effects.
First, we start with simulations for ideal polymers. The data shown in Fig 4 plots the pair distribution function P ð2Þ 0 ðrÞ extracted for simulations of ideal models of FUS-like proteins. Results are shown for simulations that use 20 chain molecules of A n + B n as an example of a small system. These results are compared to those from simulations with 100, 200, 1000, 1500, 2000, 3000, and 4000 A n + B n molecules, respectively. The pair distribution functions have a self-similar character and this is revealed by plotting P ð2Þ 0 ðr�Þ for all of the simulations, where r � is the reduced distance that accounts for the fact that for a similar concentration, the simulation cells are made larger (higher values of L, which is box size) as the numbers of molecules increase. This analysis shows that even for a truly ideal system, the smallest simulation comprising of only 20 molecules will generate noisy estimates of the pair distribution function. This clearly demonstrates the problems inherent to small systems where finite size effects are accentuated. Interestingly, for the ideal chain system, all simulations with 100 or more molecules yield similar pair distribution functions as assessed in Fig 4B. Next we assessed the impact of finite size effects with all of the terms in the potential being included in the simulation. There are three columns, one each for different values of the reduced temperature T � , in Fig 5. As discussed in the results section, these values of T � place the system of interest in the two-phase regime, with the quench depth into the two-phase regime increasing as T � increases. The first row (a to c) of The onset of phase separation should be manifest by the presence of a trough located between two peaks in the profiles for P (2) (r � ). This is evident for T � = 0.267 (Fig 5, panel (c)) for all systems providing the numbers of molecules are greater than or equal to 10 3 . This qualitative trend is preserved even for T � = 0.217, although sampling difficulties in large systems become obvious in the noisy estimates for P (2) (r � ). At the reduced temperature of T � = 0.167 we confront two problems: The small systems where the numbers of molecules are less than 10 3 cannot support the distinction between a proper dense phase that coexists with a dilute phase. This behavior is similar to that observed for lower quench depths i.e., higher simulation temperatures as shown in panels (b) and (c) of Fig 5. However, as the system size grows, an additional problem arises and this has to do with large clusters becoming frozen, and thus To overcome this broken ergodicity and obtain reliable converged pair distribution functions, we need additional biasing potentials and temperature sweep approaches used recently [43] to break up frozen clusters and enable their coalescence. In the results that we report here, we use the system size titration to identify the reduced temperatures below which broken ergodicity becomes evident. We do not include data from these simulations in our constructions of phase diagrams. Importantly, our analysis confirms the presence of finite size effects for small sys- Simulations of phase transitions of multivalent proteins

Estimating the percolation transition line that delineates percolated and non-percolated networks
Associative polymers form networks characterized by physical crosslinks among stickers. Accordingly, we use the concept of a cluster, viz., a collection of unique chains connected via rotational interactions, to define the extent of percolation. In polymer melt simulations, the extent of percolation, known as the gel fraction in the polymer literature, is defined as the fraction of polymers participating in a percolating network that spans the simulation box in at least one direction [94]. More generally, we can use the fraction of polymers that make up the single largest cluster to quantify the onset of percolation and the changes to the extent of networking beyond the percolation threshold [95]. A molecular network cannot percolate the whole simulation cell when dilute and dense phases coexist. Accordingly, we choose the second definition for the order parameter that describes the percolation transition, and we denote this as ϕ c [29]. Semenov and Rubinstein demonstrated that a percolation transition is purely a connectivity transition [45]. This implies that the identification of the percolation threshold is not achievable using a bona fide order parameter but instead relies on a suitable topological description. Here, we employ the midpoint of the ϕ c vs. concentration curve to assess the onset of percolation and the percolation line or curve is obtained as the locus of points in the phase diagram for which ϕ c = 0.5. In a system where finite size effects are minimized, the percolation transition is sharp having either a hyperbolic or sigmoidal shape as a function of concentration. Accordingly, the location of the percolation line will be relatively robust to the choice one makes for the percolation threshold.

Results
We demonstrate the use of LASSI by applying it to study two archetypal systems that are known to undergo phase separation [24, 31, 33]. The systems are instantiations of linear and branched multivalent protein systems, respectively. The simulation results obtained for linear multivalent proteins illustrate how phase diagrams are generated when protein concentration (at a fixed stoichiometry) and temperature are the independent variables. In the second example that includes a branched multivalent protein and a linear peptide, the temperature is fixed, and the concentrations of the individual components are varied. The simulation parameters for both systems are summarized in Table 1. For each system, we conducted 5 independent simulations, each of which consists of 5×10 8 MC steps after 5×10 6 equilibration steps. The data were taken over the last half of the trajectories at a frequency of 5×10 5 steps.  Simulations of phase transitions of multivalent proteins

A FUS-like system as an example of a linear multivalent protein
Wang et al. [24] recently uncovered the molecular grammar that contributes to the driving forces for phase separation of the protein FUS and a family of related proteins. They showed that, to first order, c sat /(s Y s R ) −1 , where c sat is the measured saturation concentration of the FUS family proteins and s Y and s R are the number of tyrosine (Tyr) and arginine (Arg) residues, respectively. In FUS and other proteins of similar architectures, the Tyr residues are located primarily within the N-terminal disordered prion-like domain (PLD), whereas the Arg residues are located primarily within the partially disordered C-terminal RNA binding domain (RBD).
Using mutagenesis experiments, Wang et al. established that Tyr and Arg residues are the stickers in the FUS family proteins. Accordingly, the zeroth-order stickers and spacers representation used to model FUS in LASSI comprises of two parts: An N-terminal mimic of the PLD encompassing Tyr residues as stickers and a C-terminal mimic of the RBD that encompasses Arg residues as stickers. Wang et al. also measured c sat for a 1:1 mixture of independent PLDs and RBDs interacting in trans. The c sat for this system is approximately twice that of the c sat for full-length FUS. Given the block copolymeric architecture of FUS, we denote the PLD and RBD as A n and B n , respectively for A and B-blocks of valence n. The model system of PLDs and RBDs interacting in trans is denoted as A n +B n (Fig 6A), whereas the system mimicking full-length FUS where the stickers can interact in cis and in trans is denoted as A n -B n (Fig 6B).
Within A n and B n blocks, spacers provide a uniform spacing of six lattice sites between stickers along the chain. We model spacers using a hybrid approach whereby a neutral spacer Fig 6. Architecture of the linear multivalent systems. (a, b) Cartoons to depict the A n +B n and A n -B n systems, respectively. Different colors of beads denote different species of stickers. Note that A n -B n can be simply considered as A n +B n where the two different sections of the proteins are joined together. (c) Linker lengths involved in the architecture (see also Table 1). Each sticker has a neighboring spacer bead that is 1 lattice site apart whereas the neighboring spacer beads are 4 lattice sites apart. This means that consecutive stickers are 6 lattice sites apart and also that the linkers connecting the two have a positive effective solvation volume. https://doi.org/10.1371/journal.pcbi.1007028.g006 Simulations of phase transitions of multivalent proteins monomer is attached to each sticker with spacing of a single lattice site (Fig 6C). This choice was made to provide a distinction between A n -B n and A n +B n . Accordingly, the relative concentration of neutral beads will be higher in A n -B n when compared to A n +B n . This allows us to study linker-mediated differences between the driving forces for phase separation for A n -B n vs. A n +B n Fig 7 shows phase diagrams for the A n +B n and A n -B n systems calculated using data from LASSI-based simulations. In panels (a) and (b), the ordinate quantifies the reduced temperature T � calculated as T � ¼ k B T ε where ε is the effective energy of pairwise interactions between stickers from the A n and B n blocks. Panel (a) shows results for the A n +B n system. The bulk concentration in the A n +B n system is quantified along the abscissa as c bulk ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi c A n c B n p where c A n and c B n are the bulk concentrations of A n and B n , respectively. Panel (b) shows the phase diagram for the A n -B n system where the abscissa represents the bulk concentration of this system.
Experiments show that the driving forces for phase separation are roughly twice as large for the full-length FUS compared to the system comprising of a 1:1 mixture of PLDs and RBDs [24]. This feature is recapitulated in LASSI simulations. For example, the width of the twophase regime is larger for the A n -B n system compared to the A n +B n system for all values of T � as shown in panel (e) of Fig 7. The critical temperature is higher for the A n -B n vs. A n +B n system (T � c � 0:56 vs. T � c � 0:36, respectively). The valence of stickers is the main determinant of the concentration at the critical point whereas the interactions mediated by spacers determine the density inhomogeneities and the critical temperature. The impact of longer chains and increased valence of stickers per chain is also evident from the percolation threshold, which is crossed at a bulk protein concentration that is two-fold lower for the A n -B n system when compared to the A n +B n system, across all the simulation temperatures. Differences between the two systems are also evident in the degree of cooperativity of phase separation and the percolation transition as shown in panels (d) and (e) of Fig 7. For each system, the intersection of the percolation threshold line with the two-phase regime shows that the dense phase predominantly forms a percolated droplet-panels (a) and (b) in Fig 7. Therefore, while phase separation without percolation is realizable, this is not the dominant scenario for associative polymers, where phase separation and percolation are typically conjoined to give rise to percolated droplets. The density of proteins in these percolated droplets is governed by the interaction strengths, modulated by T � and the effective solvation volumes of spacers [28,29]. Unlike homopolymers, which comprise entirely of stickers or spacers depending on the solvent quality, associative polymers encompass a mixture of stickers and spacers. Stickers provide the driving forces for networking via reversible crosslinks and spacers determine whether or not these driving forces lead to phase separation via condensation. Indeed, the importance of sticker-driven percolation is evidenced in the persistence of percolated networks for both systems at high values of T � .
The observation that dense phases form percolated droplets has several implications: (1) on timescales that are concordant with or smaller than the average lifetime of physical crosslinks between stickers, the condensates will have elastic properties; this will be replaced by viscous behavior on timescales that are longer than the average lifetime of physical crosslinks [96]; (2) accordingly, condensates will have an intrinsic tendency for viscoelasticity [97] and long-lived crosslinks will cause hardening behavior as has been observed in many systems [1,6,7,11,22,24,[39][40][41]; (3) the extent of crosslinking above the percolation threshold will change continuously with concentration [16,29], and this will govern the overall structure, internal dynamics, and porosity of condensates; (4) reactions within condensates are likely to be constrained by the network topology formed as a result of inter-sticker interactions [98]; these constraints can Simulations of phase transitions of multivalent proteins create a variety of interesting kinetic signatures for reactions [99], including temporal memories as has been demonstrated recently for a system that undergoes thermoresponsive phase behavior [66]. Clearly, any description of biochemical reactions within condensates has to account for the structural and dynamical attributes of percolated droplets that are best described as network fluids.

Move set frequencies and diagnostics of converged simulations
We used results from simulations of linear multivalent protein system to assess the design of LASSI. The frequencies of the different move sets for simulations of the linear multivalent protein system are summarized in Table 2. Considerations that go into the design of move sets include the achievement of converged equilibrium distributions, with maximal computational efficiency, for each bulk concentration. Details of these considerations were described in the methods section. Diagnostics from short simulations are often useful to optimize the move set frequencies especially if multiple short trials are performed using very different starting configurations. Fig 8 shows the concentration dependence of acceptance ratios for each of the move set types, diagnosed for simulations of the A n +B n and A n -B n systems. The acceptance ratios show similar qualitative trends for both systems, even though there are clear quantitative differences. The move with the highest acceptance ratio in the dense regime is the double pivot move, signifying that the systems are transitioning into a pure polymer melt. The second most accepted move is the local move; extrapolating from the higher concentrations it is expected that the acceptance of local moves should also become small and that the double pivot move will be the most dominant way to alter chain configurations, since even the move of an individual monomer will require that multiple monomers from multiple chains are moved simultaneously. Both systems have similar qualitative trends for the translation move where we see a transition from being accepted at low concentrations to not being accepted at higher concentrations. Since the proteins in the A n -B n system are twice long as the A n +B n system, the absolute acceptance ratio of the translation move is always lower in the A n -B n system. Analysis of acceptance ratios of different move sets within droplets will be helpful for estimating correlation lengths and amplitudes of conformational and concentration fluctuations within droplets. Cluster moves have high acceptance ratios in the dilute regime whereas the acceptance ratio nearly vanishes as the concentration increases. This is intuitive since the likelihood of steric clashes increases with a decrease in available volume and this is coupled to the simultaneous increase in the fraction of molecules in the largest cluster. We note here that the cluster moves have the most dramatic change in acceptance ratios from values near 1 to values near 0. However, the apparent inefficiency of cluster moves in dense configurations cannot be used as a rationale to quench such moves. In fact, as shown in panel (a) of Fig 9, phase separation, diagnosed in terms of � r, is suppressed if cluster moves are not part of the move set. This highlights the importance of cluster moves for generating bona fide phase separation as these facilitate coalescence that leads to condensation.

Mixtures of N130 and the rpL5 peptide as an example of a branched multivalent protein system
LASSI can also be deployed to study the phase behavior of branched multivalent proteins that undergo phase separation via obligate heterotypic interactions. Examples of branched multivalent proteins are molecules that form symmetric, stable oligomers such as nucleophosmin 1 (NPM1) and synthetic systems such as the corelets designed by Bracha et al. [100]. NPM1 is a key molecule within the granular component (GC) of the nucleolus [101]. Three coexisting layers define the nucleolus and the GC is the outermost layer. Within the GC, NPM1 appears to form facsimiles of percolated droplets in complex ribosomal proteins with Arg-rich motifs [17,30]. A minimalist system that mimics the phase behavior of the GC comprises of a truncated version of NPM1, referred to as N130, and an Arg-rich peptide, designated as rpL5 [31][32][33]. Both NPM1 and N130 form symmetric pentamers in the presence of Arg-rich peptides [102]. The pentamer formed by the association of folded domains serves as a scaffold for displaying disordered C-terminal tails that are defined by two distinct acidic tracts. The system also features an N-terminal disordered region with a well-defined acidic motif.
The FUS system is an example of a protein that undergoes phase separation via obligate homotypic interactions. This implies that the interactions necessary and sufficient for driving phase separation are encoded within the sequence of FUS and the strengths of these Simulations of phase transitions of multivalent proteins interactions can be modulated by changes to solution conditions. The N130 + rpL5 system is an example of a system that undergoes phase separation mainly via obligate heterotypic interactions that involve interactions between residues in the acidic tracts of N130 and the Arg motifs of rpL5. This could be viewed as an example of phase separation via complex coacervation, providing the heterotypic interactions are purely electrostatic in nature [31-33]. However, in general, there is likely to be combination of long-and short-range interactions that contribute to the spectrum of heterotypic interactions, and hence we refer to this class of molecules as drivers of phase separation via obligate heterotypic interactions.
In addition to demonstrating the applicability of the LASSI engine for simulations of branched systems, we use the analysis as an opportunity to highlight key conceptual features of multicomponent systems that undergo phase separation via obligate heterotypic interactions. There are three main features that are borne out in the analysis: (1) For fixed temperature, the order parameters are the concentrations of the proteins that drive phase separation via obligate heterotypic interactions. In such systems, the coexistence curves, i.e., the binodals, will have a closed loop form. These will be ellipses for two components and n-ellipsoids for systems that involve up to n obligate heterotypic interactions to drive phase separation. (2) The systems will support reentrant phase behavior as has been reported recently for protein-RNA mixtures that undergo phase separation via obligate heterotypic interactions [103]. (3) The apparent saturation concentration of a component molecule in a system that undergoes phase separation via obligate heterotypic interactions will show non-trivial dependencies on its bulk concentration. The dotted lines show the simulation results under the same system conditions but the frequency for cluster translation moves is set to zero. Not only do the systems phase separate and percolate at higher saturation concentrations, but also we can see that both percolation and separation are suppressed highly. Furthermore, errors are generally higher, due to the systems being highly dependent on the initial conditions of the system. These dependencies are governed directly by the slopes of the tie lines that pass through the point corresponding to the bulk concentration and intersect the binodal at coexisting concentrations that equalize the chemical potentials of all species in the dense and dilute phases. Here, we use the example of the N130 + rpL5 system to showcase the three central features of phase diagrams for systems that undergo phase separation via obligate heterotypic interactions.
In the LASSI representation, N130 pentamers with disordered tails are modeled using a five-armed structure. This approach follows the strategy of Feric et al. [30], which was based on the fact that pentamers do not dissociate under conditions where NPM1 / N130 undergo phase separation. Each arm comprises of two sticker sites to mimic the presence of the A1 and A2 acidic tracts within the disordered tails of NPM1 / N130. Therefore, each N130 pentamer displays a total of ten sticker sites. The spacers between each A1 tract and the N130 core as well as between each pair of A1 and A2 tracts on a disordered tail are phantom tethers, which means that their effective solvation volumes [29] are set to zero. Each rpL5 peptide has two sticker sites corresponding to the two Arg-rich motifs along the sequence. Schematic representations of the coarse-grained architecture used for N130 and rpL5 are shown in Fig 10, and the move set frequencies are summarized in Table 2.  Table 1). For the rpL5 molecule a linker length of 3 was chosen between the two stickers, and for the N130 molecule the first sticker (modeling the A1 tract) is 1 lattice site away from the hub spacer whereas the second sticker (modeling the A2 tract) is 3 lattice sites away from the first sticker. https://doi.org/10.1371/journal.pcbi.1007028.g010

Percolation lines have parabolic shapes
The percolation line, constructed as a function of the concentrations of two multivalent molecules, has an overall parabolic shape (panel (a) of Fig 11). This feature may be explained as follows: Let f 1 and f 2 denote the fractions of N130 and rpL5 molecules that are bound in a network; s 1 and s 2 will denote the valence of stickers on N130 and rpL5, respectively; for the current implementation of the N130 + rpL5 system, s 1 = 10 and s 2 = 2. Based on the Flory-  7). The phase-separated region has an elliptical shape and we have a closed loop, which demonstrates re-entrant phase behavior, whereas the percolation line has a conical shape extending into much higher densities. The solid black lines denote contours of constant total concentration where L1 is the lowest concentration and L3 is the highest concentration. Simulations of phase transitions of multivalent proteins Stockmayer theory [49,51], the percolation threshold is crossed when f 1 f 2 (s 1 −1)(s 2 −1) > 1. If we keep (s 1 −1)(s 2 −1) constant, the threshold concentration for percolation will be governed by the product f 1 f 2 . Accordingly, if there is a large excess of N130 (component 1) compared to rpL5 (component 2), then f 1 ! 0 and f 2 ! 1, and the system does not undergo a percolation transition. In this scenario, every rpL5 molecule is crosslinked to two sticker sites from one or two N130 molecules. However, since the relative stoichiometry favors N130 molecules, there is a vast excess of un-crosslinked N130 molecules and the network cannot grow. Percolation is also inhibited when the converse situation arises, i.e., when there is a large excess of component 2 with respect to component 1. Accordingly, the percolation line takes on a parabolic form in the plane defined by the concentrations of N130 and rpL5.

Binodals for systems defined by obligate heterotypic interactions will form closed loop ellipses
Given that the phase behavior of the N130 + rpL5 system is driven by heterotypic interactions involving the A1 / A2 tracts from the N130 tails and the Arg-motifs from rpL5, we constructed binodals by keeping the simulation temperature fixed and varied the concentrations of N130 and rpL5 molecules. Phase diagrams defined by N130 concentration along the abscissa and rpL5 concentration along the ordinate are shown in panel (a) of Fig 11. The general shape of the binodal is comparable with that of the experimentally determined phase diagram [33], even though direct comparison is not straightforward because the scarcity of experimental data points does not yield a full binodal.
The phase boundary, defined by the density transition, is an ellipse that forms a closed loop in the plane defined by the concentrations c 1 and c 2 of N130 and rpL5, respectively. In associative polymers, the phase behavior is governed by the affinity between stickers, the valence of stickers, and the effective solvation volumes of spacers [28,29]. For fixed c 1 that intersects the two-phase regime an increase in c 2 will lead to an entry into the two-phase regime caused by a density transition as c 2 approaches c 1 . However, as c 2 increases well beyond c 1 , the joint system exits the two-phase regime. This is because phase separation is driven by obligate heterotypic interactions and while there is a growing excess of rpL5 molecules there are not enough N130 molecules to drive the density transition via inter-sticker interactions. Similar reasoning applies to describe the reentrant behavior that will result by keeping c 2 fixed at a value that intersects the two-phase regime and increasing c 1 .
Taken together, the parabolic percolation lines and elliptic forms for two-phase regimes define conic sections that highlight reentrant phase behavior whereby fixing the concentration of component 1 and increasing the concentration of the second species can lead to phase separation and percolation at a low threshold concentration of component 2 and exit into the onephase, non-percolated regime beyond a second higher threshold concentration for component 2. This type of reentrant phase behavior, will be a general feature of multicomponent systems that undergo phase separation via obligate heterotypic interactions; indeed, reentrant phase behavior has been reported for a model protein + RNA system [103].

Apparent stoichiometric ratios can be different from effective stoichiometric ratios
Stoichiometry of molecules that drive phase separation is another key parameter that determines the functions of biomolecular condensates formed by multicomponent systems [104]. The apparent stoichiometric ratio is calculated as the ratio of the concentrations of stickers of types s 1 and s 2 for N130 and rpL5, respectively such that n app 12 ¼ c s1 c s2 . However, the effective stoichiometric ratio n eff 12 can be different from n app 12 if excluded volume effects modulate the effective concentration of stickers. We fit an ellipse to the two-phase boundary and determined the pseudo one-component system) comprising of a protein and solvent is realized when the bulk concentration of the protein denoted as c exceeds a saturation concentration denoted as c sat . The presence of a saturation concentration can be quantified by measuring the concentration of protein in the coexisting dilute phase as c increases. This value will not go above c sat . Strikingly, the presence of a saturation concentration has been confirmed in living cells for model disordered proteins by Brangwynne and coworkers using optogenetic tools in mammalian cells [100,105] and by Khan et al. [106] using yeast as a model system. If the protein whose phase behavior is being interrogated is part of a system where obligate heterotypic interactions drive phase separation, then whether or not the concept of a saturation concentration continues to be valid will depend on the nature of the binodals. We illustrate this by coopting the elliptic phase boundary from  Ligand-mediated shifts in saturation concentrations arise due to polyphasic linkage, a phenomenon first introduced by Wyman and Gill [107].

Slopes of tie lines within the elliptic binodals determine how [A] Sol varies with [A] in the two-phase regime
The main conclusion is that the concept of a saturation concentration, as defined for a pseudo one-component system, does not transfer over to multicomponent systems where phase transitions are driven by obligate heterotypic interactions. Instead, the slopes of tie lines or the geometries of tie planes in higher dimensional ellipsoids will have a direct bearing on inferences from measurements where the bulk concentration of a protein or RNA component is varied when condensates are observed and the concentration of the molecule of interest is quantified in the coexisting dilute phase. This insight emerges from our ability to deploy LASSI to compute full binodals for multicomponent systems.

Discussion
In this work, we have built on the connection between multivalent proteins and associative polymers [44,45,98] with their stickers-and-spacers architecture [15,17,24,28,29,40,43,47] to develop and deploy LASSI, a lattice-based open source computational engine that enables the simulation of system-specific phase diagrams of single and multi-component systems. The foundations of LASSI derive from the formalism of the bond fluctuation model [83,84,108]. We demonstrate how canonical ensemble Monte Carlo simulations with appropriately designed move sets and analysis of order parameters derived from the distribution functions allow us to calculate coexistence curves and percolation lines as a function of protein concentration and interaction strengths.
The choice of a lattice-based approach for coarse-graining and modeling phase behavior of multivalent proteins is guided by the advantages of lattice models [109] for polymeric systems. To titrate across the full range of volume fractions, one needs to balance considerations of finite size effects-which requires large numbers of molecules-with large simulation volumeswhich makes it difficult to observe density fluctuations that can grow into density inhomogeneities. On lattices the conformational space is discretized and the calculation of interaction Simulations of phase transitions of multivalent proteins potentials can be made to be very efficient through the use of look up tables. Importantly, we have generalized lattice-based simulations to incorporate anisotropic interactions.
LASSI allows us to query the impacts of overall and intrinsic valence of stickers, interaction strengths between stickers, the spatial ranges of these interactions, the effective solvation volumes and lengths of spacers, and protein concentrations. These titrations generate multidimensional phase diagrams. The approaches underlying LASSI have been applied to model a variety of multicomponent systems, including mimics of RNA molecules [24, 28-30, 43,79]. What is required is the development of approaches that enable systematic coarse-graining and adaptation of machine learning based methods to parameterize interaction potentials [78]. Engineering LASSI to be interoperable to cell-based modeling suites [110] will also allow for larger scale deployment of the overall framework. The calculation of pair and higher order distribution functions should afford multiscale descriptions of the structural organization of molecular components within condensates. The acceptance ratios associated with different move sets and the length scales spanned by distinct move sets open the door to analyzing the dynamics of phase separation, percolation, and the interplay between the two. Another major direction for future application of LASSI is to uncover the determinants of compositional specificity of condensates [1,12]. For a given randomly selected monomer, a new location is proposed by sampling ±2 lattice sites in each coordinate (brown box) and picking a lattice site that is empty. If an empty lattice site is found within a pre-determined number of trials, the numbers of interacting candidates are calculated at the old and proposed location (yellow boxes). Then the move is accepted or rejected using the modified Metropolis criterion that considers orientational bias (see text). (TIF) S3 Fig. Two-dimensional representation of reptation move. For a given randomly selected chain that has the same linker lengths between each monomer, an end is randomly picked. Then, a version of the local move is performed where the selected end is moved to a new random location that is an empty lattice site within 2 lattice sites in each coordinate (brown box). If an empty site is found within a predetermined number of trials, the number of orientational candidates is calculated for the whole chain in the old and the new configuration (yellow boxes). The modified metropolis criterion is then used to determine if the move is accepted or rejected. Note that since the whole chain is orientationally biased, monomers may have a different orientational state after the move is accepted, as shown in the figure. For a randomly selected monomer, a 2×2×2 cube around the monomer is searched for appropriate bridging candidates (brown box), where an appropriate bridging candidate is the next monomer from a different chain, is within a linker length of the selected monomer as shown by the dashed line connecting i m and (i+1) n . Furthermore, the distance between (i+1) m and i n must also be within a linker length as depicted by the upper dashed line. A list of all possible candidates is calculated and then a randomly chosen candidate is used to break and remake covalent bonds. This results in a large conformational change for both polymers. If the selected polymer is not linear, the move is rejected outright. (TIF) S5 Fig. Assessments of finite size effects analyzed in terms ofg ðr � Þ . The pair distributions from Figs 9(B) and 11 are used to compute the relevant radial distribution functions. This analysis is relevant because the radial distribution functions are used to extract the value of the order parameter that detects the onset of phase separation. For systems where the number of molecules A n + B n molecules is greater than 200, the radial distribution functionsg ðr � Þ start to deviate from one another only at the lowest temperatures where broken ergodicity becomes an issue. Therefore, for the A n + B n system studied in this calibration, it appears that the numbers of A n + B n molecules have to be greater than 200 in order to obtain reliable information about the phase behavior.