Concerted regulation of npc2 binding to endosomal/lysosomal membranes by bis(monoacylglycero)phosphate and sphingomyelin

Niemann-Pick Protein C2 (npc2) is a small soluble protein critical for cholesterol transport within and from the lysosome and the late endosome. Intriguingly, npc2-mediated cholesterol transport has been shown to be modulated by lipids, yet the molecular mechanism of npc2-membrane interactions has remained elusive. Here, based on an extensive set of atomistic simulations and free energy calculations, we clarify the mechanism and energetics of npc2-membrane binding and characterize the roles of physiologically relevant key lipids associated with the binding process. Our results capture in atomistic detail two competitively favorable membrane binding orientations of npc2 with a low interconversion barrier. The first binding mode (Prone) places the cholesterol binding pocket in direct contact with the membrane and is characterized by membrane insertion of a loop (V59-M60-G61-I62-P63-V64-P65). This mode is associated with cholesterol uptake and release. On the other hand, the second mode (Supine) places the cholesterol binding pocket away from the membrane surface, but has overall higher membrane binding affinity. We determined that bis(monoacylglycero)phosphate (bmp) is specifically required for strong membrane binding in Prone mode, and that it cannot be substituted by other anionic lipids. Meanwhile, sphingomyelin counteracts bmp by hindering Prone mode without affecting Supine mode. Our results provide concrete evidence that lipids modulate npc2-mediated cholesterol transport either by favoring or disfavoring Prone mode and that they impose this by modulating the accessibility of bmp for interacting with npc2. Overall, we provide a mechanism by which npc2-mediated cholesterol transport is controlled by the membrane composition and how npc2-lipid interactions can regulate the transport rate.

in a polarizable continuum with a dielectric constant of 78.4 by setting solvent to water. Multiple conformations and orientations resp fitting was performed with the r.e.d. iii.5 software [5].
The bmp can exist as r-r, s-s, and r-s stereoisomers. We tested for the effect of stereoisomerization for charge calculations by performing the above protocol for 2,2'-bmp18:1 head groups conformations that are uniformly s-s, and also for r-r, r-s, and s-s mixtures. The estimated charges were found to be the same indicating that this stereoisomerization does not effect the charge estimation using the above protocol (Fig s1). Since s-s 2,2'-bmp is the most biologically active form and interacts with npc2 most favorably [7], we used this configuration in all subsequent simulations.

s1.2 Preparation of simulation systems and unbiased simulations
We prepared and equilibrated membranes and proteins separately before combining them to use in the biased and unbiased protein-membrane simulations. All bilayers listed in Table 1 were prepared in 0.1 M NaCl solution using charmm-gui [2]. The systems with 10 mol% cholesterol contain a total of 160 lipids, while those with 35 mol% cholesterol contain 200 lipids, so as to have a sufficiently large membrane area for protein-membrane binding simulations. bmp-containing systems were prepared by replacing the head groups of dopg-containing bilayers built using charmm-gui [2]. We minimized the bmp-containing systems in several stages, checking and correcting the stereochemistry of head groups at each stage. For all membranes, we first performed the staged minimization and equilibration protocols as prescribed by charmm-gui, and then replaced the charmm36 additive force field [1] with the Stockholm lipids (Slipids) force field [4] for lipids. We performed at least a 100 ns long N pT simulation of each membrane with the Slipids force field.
There are two available crystal structures for bovine npc2: apo (pdb id: 1nep [8]) and chol-bound (pdb id: 2hka [9], chains B and C). Intrinsic pK a of each residue was estimated for the crystal structure of the apo form and all three chains found in the crystal structure of using propka 3.1 [10,11]. Accordingly, E110, H31, and H56, which have intrinsic pK a > 5.0, were protonated. One of the cholesterol-bound chains (chain C of pdb id: 2hka [8]) and the apo form (pdb id: 1nep [9]) were placed in 0.1 M NaCl solution. After successive minimization and 100 ps long N V T and N pT simulations with the protein and ligand heavy atoms harmonically restrained (k = 100 kJ/mol/Å 2 ), we performed 100 ns long N pT simulations for npc2 apo and npc2 chol-bound .
The simulation systems presented in this work were prepared by stacking the final configurations from the equilibrated membrane and protein systems. After combining the systems, excess water was removed and the ionic concentration was re-adjusted to 0.1 M. Random initial orientations of the protein with respect to the membrane were obtained by rotating a sphere that contains the protein and a layer of solvent around a random axis by 180 • . These systems were then minimized and a 100 ps N pT simulation was performed with the protein heavy atoms harmonically restrained (k = 100 kJ/mol/Å 2 ). We then performed 400 ns long unbiased simulations for the systems indicated in

Simulation approach
All free energy calculations presented in this work use wt-mtd [12] with three collective variables (cvs). These three cvs capture different positions and orientations of the protein with respect to the membrane surface. The first cv, |z|, is the absolute value of the z-component of the distance between the protein center of mass (com) and the com of P atoms of the upper leaflet of the bilayer.
The other cvs, θ and φ, are defined as the angles between the membrane normal and the long and the short axes of the protein, respectively (Main text, Fig 2 A-B).
To limit the phase space, we biased each cv separately within overlapping windows (index w) flanked by half-harmonic flat-bottomed restraints on the |z| in the form of we performed separate wt-mtd [12] on |z|, θ, and φ cvs. Therefore, in each simulation (index i), a history-dependent bias on one of |z|, θ, and φ cvs, as well as a time-independent flat-bottomed half-harmonic potential on |z| (V b w ) were active. Metadynamics bias on |z| was applied only within the interval, |z| min w ≤ |z| ≤ |z| max w [13], whereas the metadynamics (mtd) biases on θ and φ were active throughout the cv space. The width of Gaussian hills was set to 0.05Å for |z| and 0.05 rad for θ and φ cvs. The frequency of hill addition was 1 ps −1 for all simulations. Bias factor (γ) was adjusted early in the simulations and finally kept at 40 in every simulation. The history dependent bias was stored on a grid with a spacing of 0.01 for computational efficiency [14]. The neighboring simulations i and i + 1 were coupled through bias exchange scheme [13,15] to improve sampling and phase space overlap. Bias exchanges were attempted between neighbors in alternating even-odd patterns every 1 ps as indicated by arrows in main text, Fig 2 C. We performed 51-54 (17-18 windows) 200 ns long simulations spanning a |z| range from the surface of the membrane to bulk solvent for each system. Each simulation was initialized using a random configuration from unbiased simulations that conform to the |z| limits of the window. In cases where no unbiased simulation was previously performed or a particular window could not be initialized from the unbiased simulation due to lack of sampling, we used short adiabatic bias molecular dynamics (abmd) [16] simulations with a ratchet-and-pawl like restraint along |z| (force constant of 150 kJ/mol/Å 2 ) to generate initial configurations starting from a nearby configuration. Initial 10 ns of each trajectory was discarded and configurations saved at every 100 ps were reweighted combining a time-independent free energy estimator for mtd [17] and a modified version of weighted histogram analysis method (wham) [18] as described in Section s1.3.2. All free energy surfaces were reconstructed with sample weights (p i,t ) used in adaptive two-stage weighted kernel density estimation [19], where the bandwidths for the pilot estimate were determined according to Silverman's rule of thumb [20]. The statistical errors were estimated with the Bayesian block bootstrap approach [21,22] as described in Section s1.3.4, using each independent simulation as a single block from 100 bootstrap samples.

Unbiasing the wt-mtd simulations
Here, we present a new protocol to unbias free energy simulations employing a combination of time-independent and mtd biases. Similar combinations of umbrella sampling (us) and mtd and protocols to obtain free energy surfaces was put forward elsewhere [23,24]. Our approach allows associating weights, p i,t , with each sampled configuration R i,t in simulation i at time t, simplifying the construction of pmfs as a function of arbitrary cvs and the estimation of equilibrium averages.
To this end, we use the time-independent free energy estimator for mtd recently introduced by Tiwary and Parrinello [17] to reweight sampled configurations, lifting the time-dependence due to mtd. We then use these weights in the non-parametric variant of the Weighted Histogram Analysis Method (wham) introduced by Bartels and Karplus [25] to combine all simulations. This approach can be easily generalized to any acting combination of mtd-type and time-independent biases for other applications.
We first estimate the mtd weights associated with each configuration R i,t based on ref. [17]: where V m i,t and c m i,t are the instantaneous mtd bias and the bias offset, respectively. Recently, Tiwary and Parrinello introduced a time-independent locally-converging free energy estimator for the mtd, allowing calculation of c m (t) [17]. Changing their notation [17], the free energy estimate for a mtd simulation on the collective variables s can be expressed as where the instantaneous free energy estimate F (s, t) = γ 1−γ V m (s, t) and γ is the mtd bias factor. Plugging Eq 3 into the expression for c m (t) [17] and simplifying, one can directly estimate the bias offset from the stored mtd kernels: . (4) Next, we reweight each configuration R i,t with the associated weight p m i,t before using the non-parametric wham equations introduced by Bartels and Karplus [25]. This can be achieved by modifying the equations in ref. [25] as follows and solving iteratively until convergence: where N j = Nj t=1 p m j,t , and V b j is the time-independent bias function in simulation j evaluated for configuration R i,t .
The pmf can then be estimated as a function of arbitrary cvs q using: where the average is taken over all sampled configurations. The pmfs were reconstructed for various cvs using weighted kernel density estimation and normalized with respect to the aqueous phase. Similarly, reported free energy values associated with each orientation in Table 1 are obtained by ∆G ori. = −β −1 ln Ri,t∈ori. p i,t /Ω ori.
Ri,t∈aq p i,t /Ω aq , where the sum in the numerator is taken over a subset of all sampled configurations forming a particular orientation (ori. := [Prone Supine]) and the sum in the denominator is taken over configurations in the aqueous (aq) phase. Prone and Supine modes were defined based on the z-φ space as marked on S1 Fig B. The Ω ori. and Ω aq are normalization factors equal to the volume of the collective variable space used in the definition of the binding orientation, ori., and the aqueous phase, respectively. Table 1 also shows the free energy contribution by membrane associated states defined as 5Å < z < 25Å collectively.

Normalized contact frequency analysis
In this application, we define the normalized contact frequency (ncf) as follows: N CF bmp/memb. r,ori.
= CF bmp r,ori. /CF memb. r,ori. , where CF bmp r,ori. and CF memb. r,ori. denote the contact frequency of residue r with bmp and any membrane lipid in a particular orientation, respectively. The ncf are calculated for Prone mode and Supine mode, separately, to get N CF bmp/memb. The contact frequency of residue r with any other group of atoms, g, can be calculated from the biased simulations as a weighted average where C g r,i,t is a number associated residue, r, and configuration R i,t and the sums are taken over all configurations assigned to a particular orientation. In this application, r is a protein residue, g is either bmp or any lipid. C r1,r2 i,t is a binary number equal to one if any atom of g is within 3Å of any atom of r in configuration R i,t , and is equal to zero, otherwise.

Error estimation using Bayesian block bootstrapping
The Bayesian block bootstrap protocol for error estimation can easily be implemented into Eq 5 by making the following assignment where w b i,t represents a set of random weights drawn according to the Bayesian bootstrap protocol for bootstrap sample b [26]. In the block variant, we first group the configurations R i,t into N blocks (each containing n block samples) that are presumably independent and identically distributed.
This can be done based on taking blocks of length equal to at least the autocorrelation time of the quantity of interest, or more conservatively taking each simulation as a single block. Then, N weights w b block are drawn randomly according to Bayesian bootstrap protocol and each configuration R i,t belonging to the block is assigned the weight w b i,t = w b block /n block . Finally, Eq 5 is solved with the modified p m i,t to estimate p b i,t associated with bootstrap sample b. With sufficiently many sets of p b , equilibrium averages and pmfs can be calculated for each set and their standard deviation gives the error estimate.