Motif-pattern dependence of biomolecular phase separation driven by specific interactions

Eukaryotic cells partition a wide variety of important materials and processes into biomolecular condensates—phase-separated droplets that lack a membrane. In addition to nonspecific electrostatic or hydrophobic interactions, phase separation also depends on specific binding motifs that link together constituent molecules. Nevertheless, few rules have been established for how these ubiquitous specific, saturating, motif-motif interactions drive phase separation. By integrating Monte Carlo simulations of lattice-polymers with mean-field theory, we show that the sequence of heterotypic binding motifs strongly affects a polymer’s ability to phase separate, influencing both phase boundaries and condensate properties (e.g. viscosity and polymer diffusion). We find that sequences with large blocks of single motifs typically form more inter-polymer bonds, which promotes phase separation. Notably, the sequence of binding motifs influences phase separation primarily by determining the conformational entropy of self-bonding by single polymers. This contrasts with systems where the molecular architecture primarily affects the energy of the dense phase, providing a new entropy-based mechanism for the biological control of phase separation.

where n(ξ, T ) is the combinatorial term for counting states with T A-B overlap bonds (given ξ total selfbonds) and the second sum is over all configurations where S = ξ. The parameter χ quantifies the strength of two-body nonspecific interactions, e.g. as appears in Flory-Huggins theory. We make the approximation that in the thermodynamic limit, Z is dominated by the largest term: where G(ξ) is the entropy associated with forming S = ξ self-bonds. First we calculate n(ξ, T ) = n steric × n trans . n steric is the number of allowed ways to place the polymers on the lattice and n trans is the number of ways to form T trans-bonds. To find n steric , we ignore chain connectivity and simply count the number of ways of choosing N ⟨l⟩ sites on a lattice with V sites, where ⟨l⟩ = L − s − t/2 (S1C) is the mean number of sites occupied by a polymer. We account for excluded volume using a semi-dilute approximation that the probability of placing monomer k successfully is the fraction of empty sites remaining: where V N counts the center-of-mass, or equivalently "polymer head," degrees of freedom. We find n trans by assuming that each protein sees the others as a mean-field cloud of motifs with which it can form A-B S1-1 overlap bonds depending on the overall motif density. Then where the first two terms count the number of ways to choose T A motifs and T B motifs, given that S of each are already in self-bonds. T ! is the number of ways to pair the chosen motifs, and the final term is the mean-field probability that two motifs are close enough to form a bond. (This is simply an extension of Semenov and Rubinstein's sticker model to two sticker types on a lattice [1].) Now we calculate F G (ξ) ≡ − log G(ξ), the entropy of having exactly S = ξ self-bonds. The difficulty arises from the restricted sum: we only want to count states with the correct total number of self-bonds. However, we can relax this restriction and require instead that ⟨S⟩ = ξ. Formally, this is equivalent to working in a "Grand Canonical Ensemble" for self-bonds, where a reservoir imposes a chemical potential w. In the thermodynamic limit, fluctuations vanish and all ensembles yield equivalent macrostates. Thus we can calculate βΩ = − log Z gc (where Ω is the grand potential and Z gc the grand canonical partition function), and use the Legendre transform F G (ξ) = Ω + wξ/β.
Calculating Z gc is relatively straightforward: (S1F) where ϕ is the monomer density N L/V . Combining this with Eqs. S1D and S1E, we obtain the full free-energy density: where (S1J) At every ϕ, we evaluate Eq. S1H with the average bond values (s * (ϕ), t * (ϕ)) which minimize f and the w which fixes ⟨s⟩ = s. (s * (ϕ) and t * (ϕ) are found by numerically solving ∂f ∂s = ∂f ∂t = 0 in Mathematica.) This yields f (ϕ) which we use to calculate the binodal and spinodal curves.
Regarding the nonspecific interaction parameter χ, density fluctuations make it difficult to map the simulation J to χ, so we simply use the mean-field relation χ = −V Jz/2, where z is the lattice coordination number. This yields theoretical T c values which differ numerically from the simulations but accurately reproduce the sequence hierarchy.

S1B Dense-phase correlations
From simulations, the ℓ = 1 sequence has a T c between that of ℓ = 3 and ℓ = 4, whereas the mean-field theory predicts that ℓ = 1 would have a T c very close to that for ℓ = 2. Why is the ℓ = 1 sequence better at phase separating than the mean-field theory predicts? In the theory, sequence only appears in g(s), the density of states for self-bonds. We thus assume that sequence does not directly affect inter-polymer interactions and that trans-bonds are uncorrelated. However, this assumption neglects the fact that a bond is between two polymers. We can quantify this correlation by looking at trans-bond "segments." Trans-bonds are considered to be in a segment of length λ if two polymers have λ trans-bonds, and all involved monomers are contiguous on both polymers (Fig A(a) Inset). Essentially, trans-bond segments form when two polymers are lying on top of each other. Fig A(a) shows the probability that each trans-bond is in a segment of length λ in an NVT simulation with ϕ = 0.3. For all sequences, the most probable segment length is 1. However, ℓ = 1 and ℓ = 12 both have relatively high probabilities of forming longer segments (these two curves overlap). As a result of these correlations, the dense phase is more favorable for these sequences than is predicted by the theory, and this leads to their higher T c values.
We can quantify a sequence's tendency to form correlated segment bonds by defining a correlation probability P corr . Consider two polymers which form a bond between monomers i and j. Now pair up neighboring monomers: the four unique possibilities are (i − 1, j − 1), (i − 1, j + 1), (i + 1, j − 1), and (i + 1, j + 1). P corr is the probability that these monomers will form a valid A-B bond instead of an invalid overlap. Fig A(b) shows examples for ℓ = 1 and ℓ = 2 sequences. Every possible initial bond (i, j) has its own P corr , and so we average this P corr over all possible bonds. This yields ⟨P corr ⟩, a sequence-specific metric for trans-bond correlations. Fig A(c) shows ⟨P corr ⟩ for the block sequences, and we observe that it is monotonic in block size except for ℓ = 1, which has a ⟨P corr ⟩ similar to ℓ = 12. This explains why these two sequences have similar segment probabilities in Fig A(a), and why ℓ = 1 is better at phase separating than expected from g(s) alone. In Condensation parameter Ψ below, we incorporate ⟨P corr ⟩ into a "condensation parameter" that successfully predicts the T c hierarchy observed in simulation. Fig A(d) shows the distribution of ⟨P corr ⟩ for 20,000 random sequences with a = b = 12. The distribution is strongly peaked at low values, comparable to the ℓ = 2 sequence. This suggests that the ℓ = 1 and ℓ = 12 block sequences are atypical in their tendency to form correlated trans-bonds, so the mean-field theory that neglects these correlations should perform well for generic sequences. S1C Condensation parameter Ψ Although our mean-field theory does a good job explaining sequence-driven patterns in T c , it would be convenient to have an order parameter that is simpler to compute but that retains some of the same predictive power. According to our results, such a metric should take into account the density of states g(s), the motif stoichiometry a, b, and the correlation metric ⟨P corr ⟩. Thus we propose as a metric the condation parameter Ψ: where the motif ratios are given by r A = a/L and r B = b/L. The role of g(s) is intuitive: the easier it is to form self-bonds, the less a polymer will tend to condense. The factor r b A r a B characterizes the probability of placing a A motifs and b B motifs in the dense phase without disallowed overlap. (The mean-field motif placement probability depends on the density ϕ, but this effect is not sequence-dependent.) Finally, we normalize g(s) by the tendency to form correlated trans-bonds in the dense phase. This tendency enhances the favorability of the dense phase, and we quantify it with ⟨P corr ⟩. The factor of 1/2 in s/2 is due to the fact that two trans-bonds/polymer are required to lower the energy by ϵ/polymer, and the factor of 4 is the number of pairs of bond-adjacent monomers (Fig A(b)). Although this metric is only heuristic, it successfully captures the T c patterns without multi-polymer simulations (Fig. 3(c)).
One limitation of the condensation parameter is that it still requires knowledge of g(s) for each sequence. Is it possible to characterize the tendency of a sequence to phase separate without any simulations? In Fig.  3(d) of the main text, we replace s g(s) with a theoretical calculation of g(1)/g(0) that uses established scaling relations for the number of self-avoiding walks and the number of self-avoiding loops [2]. This gives where ω walk (L − 1) is the number of self-avoiding walks when a polymer of length L forms a contiguous bond (shortening it by 1 monomer), and ω loop (N, L) counts the number of self-avoiding loops of length N . We model the entropy of the polymer outside the loop as a self-avoiding walk of length L − N . The sums are over all possible contiguous bonds and loops, which depend on the compatibility of motifs i and j. The exponents γ = 1.157 and ν = 0.588 are universal, and µ = 10.037 on the FCC lattice (this coefficient µ, which is standard notation, is not to be confused with the chemical potential µ in our simulations). The scaling amplitudes A walk and A loop are not universal, so we determine their relative magnitude by fitting to g(1) from the Monte Carlo g(s) for a single sequence. With this one fitting parameter, we can rapidly evaluate Ψ for new sequences with no additional simulations or calculations. Specifically, we perform a linear fit of Ψ to T c for the block sequences (Fig B) and obtain T c for any new sequence from its Ψ value. This procedure allows us to generate the T c distribution in Fig. 3(d) in seconds. A Python script to calculate Ψ and T c for arbitrary sequences is available at https://github.com/BenjaminWeiner/motif-sequence/ tree/master/condensation%20analysis. S1-5

S1D Measuring viscosity in MD simulations
Monte Carlo simulations allow us to efficiently sample equilibrium phase properties by using non-physical moves to transition between states (e.g. polymer insertions and deletions). However, the expanded move set obscures dynamic properties such as viscosity. We were able to estimate viscosity in Fig. 4(c) of the main text by modeling the droplets as viscoelastic polymer melts, which led to Eq. 5 [3]. This analysis yielded the intriguing result that trans-bonds, rather than density, are the main source of elastic memory in our system. We wanted to test this result directly, so we performed molecular-dynamics (MD) simulations using the software package LAMMPS (see Molecular-dynamics methods below). In each simulation we fix the density and LAMMPS automatically calculates the pressure tensor P ij , which is related to viscosity by the Green-Kubo formula [4]: (S1M) Fig C(a) shows the number of trans-bondst versus density for ℓ = 2 and ℓ = 12 sequences, and Fig  C(b) shows the viscosity. At each density, the ℓ = 12 droplet has more trans-bonds, and this corresponds to a higher viscosity as predicted by Eq. 5. It follows that ℓ = 12 droplets can be more viscous even when they are less dense than ℓ = 2 droplets. These results bolster our conclusions from the main text: sequence primarily controls the material properties of the droplet via trans-bonds, which implies that large-domain sequences lead to more viscous droplets.

S1D.1 Molecular-dynamics methods
Coarse-grained MD simulations were performed using the LAMMPS software package [5]. Each simulation contained polymers of a single sequence at fixed number, volume, and temperature (fixed NVT). Densities were chosen to be higher than the right side of the binodal, so as to simulate a homogeneous dense system. Polymers were modeled as linear chains of spherical beads of radius r 0 = 0.5nm, where each bead represents a sticker domain "A" or "B." The stickers were connected by stretchable bonds given by where R 0 = 5 nm and K = 0.56k B T /nm 2 . Specific bonds between motifs of different types were implemented through an attractive potential given by S1-6 where U 0 = 4k B T and R c = 2r 0 = 1 nm. Motifs of the same type interacted through a shifted Lennard-Jones potential given by where ϵ = 0.2k B T and σ = 2 5/6 r 0 = 2 −1/6 nm= 0.89 nm, with a cutoff at 2r 0 (the minimum of the potential). Between 550-575 polymers composed of 24 motifs were simulated in a 30×30×30 nm 3 box with periodic boundary conditions at T = 220K, where the system was evolved according to Langevin dynamics. Initially, the attractive potential was set to zero by setting U 0 = 0 and the system was allowed to equilibrate for 1.5 × 10 6 timesteps of ∆t = 0.01 ns. Then, U 0 was slowly brought up to the final value over the course of 1.5 × 10 6 time steps and the system was allowed to equilibrate for another 1.5 × 10 7 time steps. Afterwards, the position of each sticker and the pressure tensor were recorded over another 1.5 × 10 7 time steps. Fig D  shows a snapshot of the simulation, where "A" and "B" motifs are rendered in red and blue. The T c hierarchy is preserved across sequence lengths. Thus block size is a robust predictor of phase separation via its relationship with self-bond entropy. S1-9 Figure I: Using the "Sticky Rouse Model" for unentangled polymer dynamics in a melt with cross-links [3], the dense-phase diffusivity D = m 2 τ b t , where m is the monomer size and τ b = τ 0 exp(βϵ) is the bond lifetime, is plotted as a function of scaled temperature. For all sequences, lower temperatures correspond to higher densities and slower polymer diffusion. Importantly, the sequences with large block sizes and many trans-bonds (e.g. ℓ = 12 and ℓ = 6) have smaller D, in spite of their lower density. This coincides with the viscosity results in Fig. 4 of the main text, where the trans-bonds dominate the physical properties of the droplet. Color bar: droplet density. S1-10