Atomic resolution mechanism of ligand binding to a solvent inaccessible cavity in T4 lysozyme

Ligand binding sites in proteins are often localized to deeply buried cavities, inaccessible to bulk solvent. Yet, in many cases binding of cognate ligands occurs rapidly. An intriguing system is presented by the L99A cavity mutant of T4 Lysozyme (T4L L99A) that rapidly binds benzene (~106 M-1s-1). Although the protein has long served as a model system for protein thermodynamics and crystal structures of both free and benzene-bound T4L L99A are available, the kinetic pathways by which benzene reaches its solvent-inaccessible binding cavity remain elusive. The current work, using extensive molecular dynamics simulation, achieves this by capturing the complete process of spontaneous recognition of benzene by T4L L99A at atomistic resolution. A series of multi-microsecond unbiased molecular dynamics simulation trajectories unequivocally reveal how benzene, starting in bulk solvent, diffuses to the protein and spontaneously reaches the solvent inaccessible cavity of T4L L99A. The simulated and high-resolution X-ray derived bound structures are in excellent agreement. A robust four-state Markov model, developed using cumulative 60 μs trajectories, identifies and quantifies multiple ligand binding pathways with low activation barriers. Interestingly, none of these identified binding pathways required large conformational changes for ligand access to the buried cavity. Rather, these involve transient but crucial opening of a channel to the cavity via subtle displacements in the positions of key helices (helix4/helix6, helix7/helix9) leading to rapid binding. Free energy simulations further elucidate that these channel-opening events would have been unfavorable in wild type T4L. Taken together and via integrating with results from experiments, these simulations provide unprecedented mechanistic insights into the complete ligand recognition process in a buried cavity. By illustrating the power of subtle helix movements in opening up multiple pathways for ligand access, this work offers an alternate view of ligand recognition in a solvent-inaccessible cavity, contrary to the common perception of a single dominant pathway for ligand binding.


Introduction
Measurements of affinities of small molecules to specific binding sites in proteins have become routine, especially in the context of drug discovery studies where such experiments are often among the first to be performed. These thermodynamic measurements are routinely supplemented by kinetic studies of drug-receptor interactions [1][2][3]. However, in contrast to these standard kinetic and thermodynamic measurements, experiments that provide atomic level insights into the kinetic pathways by which small cognate molecules find their binding sites and the roles that protein conformational dynamics play in this process are lacking. This is particularly the case for ligand binding sites that are deeply buried in the receptor protein core, precluding access, in the absence of structural rearrangements, even for bulk solvent molecules. Yet, many ligands recognize these deeply buried, solvent-inaccessible cavities very efficiently, and often at rates that approach those which are diffusion limited. Effective methodologies for the study of such binding processes in atomistic detail would have significant implications for the rational design of pharmaceuticals at a practical level and would provide important insights into molecular recognition and the role of protein dynamics in ligand binding, in general.
The current work provides an atomistic view of the complete kinetic processes by which a hydrophobic ligand binds rapidly to the occluded and solvent-inaccessible cavity of a wellknown system, namely the L99A mutant of lysozyme from the T4 bacteriophage (T4L L99A). Substitution of Ala for Leu at position 99 results in a 150 Å 3 cavity [5] that can accommodate a range of ligands, including benzene, (Fig 1) [4][5][6][7]. Over the past several decades this protein has been used as a model system for understanding how buried cavities affect protein stability and structure, in general, and how ligands might navigate the protein landscape to bind rapidly to their proper sites. To this end, high-resolution X-ray structures of both the apo form of the protein and the benzene-bound complex have been solved by Mathews and coworkers [4,5] and pioneering studies have been undertaken that relate cavity size to protein stability [6]. NMR measurements have shown that the binding of benzene to T4L L99A occurs with an on rate constant of approximately 10 6 M −1 s −1 [8]. While certainly slower than the diffusion-limited rate (~10 9 M −1 s −1 ), binding is more rapid than what is generally perceived for ligand-recognition in solvent-inaccessible buried protein cavities. Additional NMR studies, based on relaxation dispersion approaches [9], established that T4L L99A exchanges in solution between a major conformer similar to the crystal structure of the protein and a minor state that is populated to 3% with a millisecond lifetime at room temperature [10,11]. However, the structure of this rare state shows that the cavity is occluded by the aromatic side-chain of Phe114 that prevents the binding of benzene so that the dynamic process that has been characterized is not relevant to ligand binding [11,12]. Oxygen and xenon binding studies have also been performed that have helped to identify hydrophobic cavities in the protein [13][14][15]. Further, this system has served as a prototype for many binding free energy calculations [16,17].
Despite the availability of macroscopic kinetic rate constants for aromatic ligand binding, high resolution structures of the bound and free forms of T4L L99A, and NMR spin relaxation data that indicate conformational flexibility in the region surrounding the ligand binding site, a detailed description of how ligands bind to the protein remains elusive. Recent innovations in experimental and computational techniques, notwithstanding, the mechanism by which ligands bind to buried sites in a protein has remained difficult to address, because the process often involves the formation of transient metastable states that are challenging to characterize at high resolution and because the conformational fluctuations leading to binding can be rapid, complicating detailed characterization by most biophysical techniques. To this end, molecular dynamics simulations are emerging as a complementary tool to experiment for the study of molecular recognition processes because of recent innovations in GPU-based technologies, access to distributed computing, and the development of special-purpose computers [18][19][20][21], that have facilitated studies of many pertinent biochemical processes at atomic resolution. Benzene binds to a buried hydrophobic cavity in T4L L99A that is inaccessible from the surface. A) Ribbon representation of T4L L99A bound to benzene (PDB 3DMX) [4]. Bound benzene is shown in black CPK representation, helix 4 (residues 83 to 91) in navy blue, helix 5 (92-113) in dark green, helix 6 (114-123) in red, helix 7 (125-134) in orchid, helix 8 (136-142) in light blue and helix 9 (142-156) in gold. B) The benzene molecule is not visible in the surface representation of T4L L99A because the binding site is buried in the protein with no pathway to the surface in the crystal structure.
In this work, we elucidate the mechanism by which benzene binds to the solvent-inaccessible cavity of T4L L99A by capturing the entire binding process using unbiased molecular dynamics (MD) simulations. Despite the fact that the simulations contained no a priori knowledge of the T4L L99A binding site, the resulting bound conformations that were obtained in a series of microsecond-long simulations and independent trajectories matched that of the crystal structure. Underlying Markov state model analyses [22][23][24] of our simulation trajectories established multiple binding pathways, each with transient opening of a channel involving a distinct pair of helices in the C-terminal domain of T4L L99A. Using an enhanced sampling technique, infrequently biased metadynamics simulations [25,26], the ligand unbinding pathways from the bound state have also been established. The resulting estimates for ligand binding on-and off-rates as well as binding free energies are in reasonable agreement with the experimentally reported values, lending support to the conclusions of this study. Our work establishes that the fast binding of benzene to T4L L99A derives from its access to multiple pathways that are formed from low barrier concerted protein fluctuations that only occur in the mutated protein.
It also establishes the utility of molecular dynamics simulations, in general, in providing detailed descriptions of binding processes that are subsequently validated by comparison of the resulting calculated kinetic and thermodynamic parameters with those obtained via experiment.

Long unbiased simulations place benzene at the target-binding site of L99A T4L with atomic precision
We performed six independent all-atom unbiased MD simulations with lengths varying between 2 and 8 μs, resulting in a total simulation time of 29 μs. In each of the simulations a benzene molecule, starting from a random position in solution, was correctly placed in the target-binding site, corresponding to the hydrophobic and solvent-inaccessible cavity created by the L99A mutation in T4L. Ligands were initially positioned at least 4 nm away from the binding pocket in random orientations. As shown in three representative trajectories (Fig 2 and S1-S3 Movies), the ligand diffused extensively in the solvent, occasionally contacting different parts of the protein surface, before entering the binding pocket. Fig 2A quantifies the time-evolution of pocket-ligand distances en route to binding in three representative binding trajectories, where the distance between the binding cavity and benzene gradually decreases as the ligand finds its target. As depicted in Fig 2B, the MD derived structure converges to within 1-2 Å of the X-ray based model for the holo-form of the protein [4], as quantified by the rootmean-squared deviation (RMSD) of heavy atoms from residues defining the cavity and including the ligand (detailed in Materials and methods). The excellent agreement between the simulated and crystallographic binding regions, including the orientation of benzene (Fig 2C), provides confidence in the underlying force field used to model the T4L L99A benzene binding process. Unlike standard docking approaches which search for the best ligand orientation within a predefined binding site, our long and unguided atomistic MD simulations reproducibly identified and maintained the correct ligand-bound pose without user intervention or incorporation of any prior knowledge of the binding site. As shown in Fig A in S1 Supporting Information in each of the trajectories substantial dehydration accompanies benzene entry into the hydrophobic and solvent-inaccessible binding pocket.

Simulation trajectories reveal multiple binding pathways
One of the key findings of the current work is that there are multiple distinct pathways by which benzene binds the solvent-inaccessible cavity in T4L L99A. Contrary to the belief of a single dominant pathway in protein-ligand binding events, as exemplified by Shaw and coworkers in studies of GPCRs and kinases [18,19], our post simulation analyses of the six successful binding trajectories specifically identified three distinct pathways, each involving opening of channels between pairs of helices near the C-terminus of the protein. Fig 3A and S1-S3 Movies illustrate the three representative binding pathways via the time evolution of a benzene molecule as it reaches the binding pocket. The three pathways involve crucial conformational fluctuations of the protein and specifically, creation of crevices across certain inter-helical interfaces in the C-terminal domain. Notably, these helices were different in each of the three identified pathways as illustrated in Fig 3. For example, in trajectory 1, helices 4 and 6 temporarily move away from one another, creating a pathway to the binding site as seen in the snapshot at 7.388 μs where Ala 99 colored in orange is visible from the surface (Fig 3B). This is in contrast to the snapshot at 7.375 μs where the distance between the two helices is considerably shorter, prohibiting the entry of benzene. In trajectory 2 ( Fig 3C) benzene goes through a path created between helices 7 and 9. Benzene first associates with the surface as seen in the snapshot at 2.08875 μs. The subsequent transient displacement between helices 7 and 9 creates a tunnel through which benzene enters, as shown in the snapshot at 2.090 μs. After residing at an alternate site for~2 ns the ligand reaches the final binding site (Fig 3C). In trajectory 3, benzene enters through an opening created at the junction between helices 5, 6, 7 and 8 (compare snapshots at 3.874 and 3.876 μs, Fig 3D). After a period of approximately 10 ns the ligand localizes to the correct binding site, as illustrated in Fig 3D. In combination, these results show that key conformational fluctuations of T4L L99A lead to the formation of transient conformers where the distance between sets of helices becomes L99A and benzene shows that benzene diffuses through the solvent before finally binding. Protein heavy atoms within 5 Å of benzene in the holoprotein (PDB id: 3DMX) [4] are defined as the 'binding pocket' or 'cavity'. Radial distances were computed between the respective centers of mass of the pocket and the ligand. B) Time profile of the root-mean-squared deviation (RMSD) between the MD derived structure and the crystal structure of the benzene-bound form of the protein. Heavy atoms from cavity residues plus ligand of the MD derived structure are compared with corresponding positions in the crystal structure of the benzene-bound form of the protein (PDB id: 3DMX [4]) to calculate the RMSD values. Initial RMSDs are high because benzene is not in the binding site and not because the protein is distorted. C) Overlay of a snapshot of the benzene-T4L L99A complex from simulations with the crystal structure (PDB id: 3DMX). The crystal structure is shown in grey with the benzene in black CPK representation while the protein in the MD snapshot is shown in red with the benzene in red CPK representation. sufficiently large for benzene to reach the solvent-inaccessible cavity. As depicted in Fig B in S1 Supporting Information, the spatial density profile of benzene around the protein, obtained by combining the long trajectories, provides a cumulative picture of the different pathways that lead to binding. The localization of benzene near the cavity and near the C-terminal domain helices is evident from the time-averaged density profile. However, we also observe certain locations near the N terminus where the ligand resides for significant periods of time (Fig B in S1 Supporting Information). Whether these highly visited locations are ultimately important for ligand binding is not clear.

Markov state model identifies and quantifies transitions between key ligand-binding intermediates and pathways
We have used Markov State Model (MSM)-based analysis [24] of the MD trajectories to obtain mechanistic insights into the ligand binding process and to calculate binding on-and off-rates and thermodynamic parameters. Prior to building a meaningful Markov state model, an additional three hundred, 100 ns trajectories were obtained by initiating independent simulations from different intermediates in the long trajectories. Analysis of the cumulative 59 μs of simulated data (29 μs from long trajectories and another 30 μs from 300 short trajectories, see Materials and methods) using the MSM approach yielded estimates of the thermodynamic and kinetic parameters for the binding process which are in reasonable agreement with experiment ( Table 1). The calculated kinetic on and off rate constants (k on and k off, ) obtained by MSM- For clarity only the crystal structure of the protein is shown in A, with Ala 99 and Phe 114 in orange and blue CPK representations, respectively. All helices are colored as in Fig 1. B) In trajectory 1 helices 4 and 6 transiently move away from one another, temporarily creating a pathway resulting in Ala 99 (colored in orange) becoming visible from the surface in the snapshot at 7.388 μs. The separation between these helices returns to the starting value of~1 nm later in the trajectory (see Fig C in S1 Supporting Information). C) In trajectory 2 benzene (shown in black) enters through a pathway created between helices 7 and 9 (2.09 μs snapshot) after associating with the protein surface (2.08875 μs snapshot). D) In trajectory 3 benzene enters through an opening created at the helix 5, 6, 7 and 8 junction region. On the right side of panels B, C, D benzene entry is illustrated as in A, except that the protein is shown as a grey ribbon in each snapshot and the color of benzene changes from red to green to blue as time progresses and benzene reaches the binding site. Ala 99 is shown in orange in the structures.  Table 1 and Material and methods for equations), are, respectively, k on = 21 ± 9 x 10 6 M -1 s -1 (MFPT on = 5163 ns) and k off = 311 ± 130 s -1 (MFPT off = 3.2 x 10 6 ns). As compared in Table 1, the MSM-derived k on is larger than that measured experimentally, k exp on = 0.8-1 x 10 6 M -1 s -1 . We also note that the k on value calculated from the average binding times from six long unbiased multi-microsecond MD trajectories yields an on-rate constant of 25 x 10 6 M -1 s -1 , very close to the MSM-derived k on rate (21 ± 9 x 10 6 M -1 s -1 ). This implies that the underlying deviation of the simulated k on value from the experimental k exp on rate does not result from MSM analysis but is due to limitations in the force field. For example, the TIP3P [27] water model used in our MD simulations leads to water diffusion rates that are two to three times higher than those based on experiment. The ligand unbinding rate constant, i.e. the so-called off-rate constant k off , computed using both MSM (k off = 310 ± 130 s -1 ) and Metadynamics approaches (k off = 270 ± 100 s -1 ) (detailed later) is in reasonable agreement with experiment (k exp off = 950 s -1 ). In this context, we note that there are several reports highlighting systematic deviations of MSM-derived offrate constants from those determined experimentally [20,21]. However, it should also be emphasized that the MD estimate of k off is less precise than for k on because spontaneous unbinding events were not observed. Finally, we have also computed the standard binding free energy (ΔG 0 binding ) of benzene to T4L L99A using the MSM-derived stationary population of the unbound and bound conformations (see Table 2). The computed standard binding free energy of -6.9 ± 0.8 kcal/mol predicts slightly higher binding affinity than the experimentally measured ΔG 0 binding = -5.2 ± 0.1 kcal/mol. A simple four state MSM was constructed from our simulated data, as described in Materials and Methods. The four macrostates are illustrated in Fig 4, where MS0 and MS3 are identified with the unbound solvated benzene state and the final bound conformation, respectively, and MS1 and MS2 are two intermediates, with benzene localized near different entry points. Populations and lifetimes of each of the states, along with the committor values that measure the progress of the binding process, are summarized in Table 2. As shown in Table 2, the stationary population of the bound state, MS3, is highest and as described above, the binding free energy of -6.9 ± 0.8 kcal/mol, derived from the stationary state populations, is in reasonable agreement with the experimental measurement. The location of the benzene ligand is distinct Table 1. Thermodynamic and kinetic parameters estimated from MD simulations are in reasonable agreement with experimentally measured values. The thermodynamic and kinetic parameters were calculated from the four state MSM based on the mean first passage time (MFPT), k on = 1/(MFPT on C), k off = 1/MFPT off where C is the benzene concentration, 9.55 mM. The standard binding free energies were calculated based on the stationary populations (shown in Table 2) of bound and unbound macrostates as obtained from the MSM [20]. Unless specified, the experimental values refer to those from the NMR study by Dahlquist and coworkers [8] and are used for all comparisons.  ). Notably, benzene is bound to different positions of the protein in the MS1 state, potentially interconverting rapidly between the different sites. These include locations near a pair of entry points that are comprised of helices 7 and 9 or the junction between helices 5, 6, 7 and 8 that place the ligand close to the cavity. The positions of benzene in MS1 are, thus, as found in MD trajectories 2 and 3 ( Fig 3C and 3D), so that MS1 is a crucial intermediate. In contrast, benzene is further from the cavity in MS2, positioned instead on the surface of helices 4 and 6. MS2 is thus an intermediate of trajectory 1 that is elucidated from MD simulations ( Fig 3B). The computed committor probabilities from transition-path-theory based analysis suggest a higher (0.8) commitment of MS1 towards the bound state than that of MS2 (0.2) (

Insight into the low activation barrier for ligand binding to T4L L99A
The fact that benzene is able to rapidly bind to T4L L99A is consistent with a low activation barrier. The activation free energy ΔG Ã is not readily available from experimental data, as the temperature dependencies of measured rate constants provide only an estimate of the activation enthalpy ΔH Ã [28]. By assuming a diffusion controlled ligand-protein binding rate of~10 9 s -1 M -1 , and comparing this value with the experimentally measured k on~1 0 6 s -1 M -1 rate, Feher et. al. estimated the free energy barrier for binding to be relatively small, 4-5 kcal/mol (6.5 to 8.5 RT) [8]. Consequently, the barrier for the unbinding process, DG Ã unbinding , can be calculated to be~8-9 kcal/mol (13-15 RT), based on ΔG 0 binding = -4.2 kcal/mol ( Table 1). The MD simulations are also consistent with a small barrier as the observed binding rates are faster than the experimentally measured ones, even after accounting for the faster diffusion constant of the TIP3P-modeled water used here. The unbinding activation barrier can be estimated from the MD simulations using the relation k off ¼ ð1=t TPT Þe À DG Ã unbinding =RT [29], where the transition path time τ TPT is the time required for benzene to transition from the cavity to the unbound form. Note that τ TPT is obtained from simulation while k off can be measured experimentally or estimated by simulation. Values of τ TPT have been quantified directly from the six binding trajectories. In this approach, a benzene molecule was considered to be bound to T4L L99A if any of its carbon atoms was less than 0.38 nm from the Ala 99 Cβ position and, conversely, unbound when every carbon atom was greater than 0.6 nm from the Ala 99 Cβ. Based on this criterion, τ TPT values for the six trajectories vary from 0.2 to 24 ns with an average of 7 ns. Using this average value along with an experimental value for k off of 0.95x10 3 s -1 [8], DG Ã unbinding = 11.9 RT was estimated, in agreement with the barrier obtained from the analysis of Feher et al. The small value of DG Ã binding~5 RT, that is estimated from DG binding ¼ DG Ã binding À DG Ã unbinding , further emphasizes the small activation barrier to binding. In the above analysis barriers were estimated assuming a single binding pathway but the conclusion regarding small barriers will hold for the multiple paths as each of them occurred spontaneously during the simulations.
The MD trajectories described above provide insights into the origin of the small activation barrier for ligand binding. In all three binding routes identified in this study (Fig 3) the ligand reaches the cavity through the transient formation of pathways that are linked to the displacement of specific helices by 2-3 Å, without large scale changes in secondary structure or unfolding. We hypothesized that the L99A mutation not only creates the binding cavity but might also increase the flexibility of the cavity-containing C terminal domain in general, leading to an increased probability for the adoption of conformations that allow ligand binding. To test this hypothesis we calculated backbone order parameters squared, S 2 , from chemical shifts for both T4L L99A and T4L WT (referred to as T4L WT Ã in what follows, see Material and methods) [30]. S 2 values report on the amplitudes of ps-ns motions of amide bond vectors [31], range between 0 (isotropic motion)-1 (rigid), and are routinely used to identify flexible regions of proteins [32,33]. The similar S 2 vs residue profiles for both T4L L99A and WT Ã ( Fig  5A) indicate that fast timescale backbone dynamics are not affected by the L99A mutation, while the high S 2 values (0.8 to 0.9) show that both forms of T4L are rigid and that rapid backbone motions are not linked directly to ligand binding. In order to ascertain whether the L99A mutation increases the population of transiently formed high energy conformers that would allow benzene to move past helices and enter the cavity we have computed free energy profiles for both T4L L99A and WT Ã as a function of the distances between helices 4 and 6 ( Fig 5B) and between helices 7 and 9 ( Fig 5C). Interestingly, we observe that the free energy penalty for larger inter-helical distances in T4L WT Ã is greater than for T4L L99A, and thus the conformations that facilitate binding are less probable for T4L WT Ã . For example, the probability of a Helix 4/6 separation of 1.30 nm, that is required for benzene entry (Fig 2B and 2C) increases by *40 fold in the L99A mutant, as it is 2.21 kcal/mol more unfavorable to separate this helix pair by 1.30 nm for the WT Ã sequence (e ΔG/RT = 39 for ΔG = 2.21 kcal/mol). It is thus unlikely that a molecule of benzene could 'squeeze' between the helices in T4L WT Ã before reaching the occluded binding site due to the steric effect of residue L99. As the helices forming the C terminal domain of T4L are packed against one another, creating openings between helices that are required for binding (Fig 3) results in subtle positional changes to the other helices as well. A closer inspection of the trajectories showed that movement of helices 4 and 6 or 7 and 9 that is required to accommodate binding reduces the distance between side-chains of Ile78 and Ala99. A similar change in these inter-helical distances in the WT Ã protein would lead to a showing that a larger separation between helices is more likely in the T4L L99A mutant. D) Steric clashes between Leu99 and Ile78 destabilize conformers of the WT Ã protein when the inter-helical distance has increased to allow benzene access to the cavity. The snapshot from trajectory 1 at 7.388 μs (L99A) has a helix 4-helix 6 distance of 1.32 nm, with no steric clash between Ile78 (dark red) and Ala99 (orange). In contrast, in the WT Ã protein the larger Leu residue at the 99 position (yellow) leads to a steric clash between Ile78 and Leu99, destabilizing the conformation. The A99 to L99 mutation was performed by superimposing the backbone coordinates of residues 95 to 103 from the WT Ã crystal structure in grey on the snapshot from the simulation in brown.
https://doi.org/10.1371/journal.pcbi.1006180.g005 steric clash between Ile78 and the larger Leu99, as illustrated in Fig 5D in the case of Trajectory 1 where the helix 4-helix 6 distance transiently increases for only a few nanoseconds. We would like to emphasize that the population of the conformer where helices 4 and 6 are separated by 1.30 nm is extremely low, even for T4L L99A, (e −ΔG/RT *0.0046% with ΔG = 3.23 kcal/mol, Fig 5B) relative to the equilibrium conformer corresponding to a helix4-helix6 distance of~1 nm. Thus, this ligand-accessible state cannot be quantified through experiment, further emphasizing the importance of large scale MD simulations like those performed here. A similar scenario is also found for the separation of helices 7 and 9 (Trajectory 2, Fig 3) where increasing the inter-helical distance from 1.05 nm to 1.25 nm, that is required for ligand entry, is more unfavorable by 5.9 kcal/mol for WT Ã than for T4L L99A.

Unbinding trajectories validate and add to the ensemble of pathways generated from unbiased MD sampling
The computational expense of unbiased binding simulations prohibits the exhaustive survey of all of the possible kinetic pathways. Hence we explored a complementary approach of validating the identified pathways and of enriching the ensemble of binding trajectories using a recent implementation of metadynamics simulation [25] to enable the extraction of the kinetics of benzene unbinding from the cavity. Using the radial distance between the binding pocket and the ligand as the collective variable, this simulation technique infrequently deposits repulsive, history-dependent Gaussian bias along the pocket-ligand distance so as to efficiently accelerate the unbinding process. The corresponding acceleration factor, multiplied by the simulation time, provides an estimate of the time for unbinding. As depicted in Fig D in S1 Supporting Information, the p-value analysis [26], as suggested by Tiwary and Parrinello, provided a good Poisson fit with a high p-value of 0.73, when averaged over numerous independent trajectories. The computed off rate, 369 s −1 , is in reasonable agreement with the experimentally measured off-rate, Table 1, showing that the ligand-pocket distance is a relevant collective variable for exploring benzene-T4L L99A unbinding pathways. Significantly, as depicted in Fig D in S1 Supporting Information, the unbinding trajectories revealed two unbinding pathways of the ligand from the pocket. These involved benzene exiting through small openings between helices 7 and 9 and the juncture of helices 5, 6 7 and 8, pathways that are the reverse of the subset of those previously identified for binding.

Discussion
Although ligand binding to receptors is critically important for many biological processes an atomistic understanding of how this might occur in the context of occluded binding sites is lacking. This reflects the fact that while the main biophysical tools that are used to study molecular interactions are powerful for characterizing and providing atomic resolution structures of the end points of the binding process (free and bound states) they are often much less robust in generating a description of the binding mechanism. Improvements in both computer technology and MD simulation methodology have made it possible to obtain μs-ms MD trajectories of proteins in explicit solvent so that it is possible to study processes such as ligand binding, conformational exchange, and protein folding using MD simulations [12,18,19,34]. Here, using unbiased MD simulations, we have addressed the longstanding question of how hydrophobic molecules rapidly reach buried cavities in proteins by working with a model system in which benzene binds to a 150 Å 3 cavity mutant in T4L L99A. Central to every MDbased approach is that the force field used should be able to model the underlying free energy surface accurately. We chose to use the CHARMM force field [35] as previous MD studies had shown that the Phe114 buried to exposed conformational transition in T4L L99A and in the related protein, T4L L99A/G113A/R119P, is well modeled with it [12,36]. The structures of the benzene bound conformers obtained from the unbiased simulations are in excellent agreement with those determined by crystallography (backbone RMSD < 2 Å, Fig 2C), with the position of the benzene within 2 Å of the binding site residues established experimentally [4]. Further, kinetic and thermodynamic parameters obtained from the simulations are in reasonable agreement with the experimental values (Table 1), providing confidence in the results of the MD study.
Our unbiased μs timescale MD simulations establish that benzene, initially fully hydrated, reaches the internal cavity created by the L99A mutation via at least three different trajectories. Notably, the process does not involve a large-scale deformation to the T4L L99A structure, nor significant changes to secondary structure, but rather small perturbations to the positions of two or more helices in the C terminal domain of the protein that create pathways for binding (Fig 3). Similar tunnels from the protein surface to the cavity were also observed in a 30 μs MD simulation of T4L L99A using the AMBER force field [37]. The existence of multiple pathways coupled with a small net activation barrier explains how benzene can bind the occluded cavity rapidly.
Previously we had used combined MD simulations and NMR relaxation experiments to study how the side-chain of residue Phe114 fills the cavity created by the L99A mutation, that in some sense serves as a surrogate to hydrophobic ligands such as the benzene molecule considered here. It might be expected, therefore, that the binding pathways of Phe114 and benzene would share some similar features. Notably, in one of the trajectories (Trajectory 3) benzene enters the cavity in a manner similar to Phe114 [12] from its solvent exposed conformation.
The binding of ligands to occluded sites in other proteins, such as oxygen binding to the buried heme group in myoglobin, has been studied using a variety of experimental and computational techniques [38,39] In the case of myoglobin, the mechanism of binding also involves small barriers, multiple pathways and secondary binding sites in the protein [38,39]. Multiple pathways have also been observed by Dickson and Lotz in recent studies of an epoxide hydrolase inhibitor [40] and a trypsin-benzamidine system [41] that exploited a weighted ensemble based algorithm. The plasticity of T4L L99A allows the larger benzene molecule to reach the cavity in a manner similar to how oxygen connects with the heme group in myoglobin. The origin of the conformational plasticity in T4L L99A that facilitates this process is, in fact, the L99A mutation itself (Fig 5). Replacing the Leu99 residue with the smaller Ala not only creates the cavity but also allows helices to move with respect to one another to generate the pathways that are required for binding. In contrast, the presence of Leu99 destabilizes the binding competent conformations by clashing with Ile78 ( Fig 5). It remains of interest to investigate if the benzene binding rate can be modulated by mutating Ile78 to smaller hydrophobic residues. Similar to the binding of benzene, protein plasticity and cavities have also been implicated in oxygen binding to T4L L99A [14].
Preliminary observations into the mechanism of ligand unbinding from T4L L99A have recently been published and are in agreement with the results of this study. Kitahara et. al. [14] developed a simple and elegant 15 N NMR based method to detect O 2 binding sites in proteins using T4L L99A as a model system. In order to quantify the 15 N chemical shift changes that were observed upon oxygen binding MD simulations were performed to understand the rotational and translational diffusion properties of oxygen molecules in the T4L L99A cavities. Notably, an oxygen molecule escaped from the cavity via a pathway that is the reverse of Trajectory 1 (Fig 3). In MD simulations performed to understand how water interacts with the cavity at high pressures, water molecules were found to escape from the cavity due to transient openings formed at the junction of helices 5, 6 and 7 [42], as observed in the creation of a pathway to the cavity in Trajectory 3. In adiabatic-biased MD simulations of benzene unbinding from T4L L99A a pathway that is the reverse of Trajectory 3 was observed [36]. In a separate 27 μs MD trajectory of T4L L99A performed to understand the Phe114 exposed to buried conformational transition, tunnels leading to the cavity from solvent were transiently formed [37], as observed in the present work. A recent implementation of a biased simulation approach by Lindorf-Larson and coworkers also led to the extraction of the kinetics of the benzene recognition process [43]. McCammon and coworkers [44] have employed an accelerated MD-based biased simulation approach to explore the thermodynamics of benzene binding to T4L L99A cavity. Very recently, Arantes and coworkers [45] have reported weighted ensemble based biased simulations of unbinding of benzene from the T4L cavity and revealed multiple benzene exit pathways, in agreement with the current study. However, none of these simulationbased studies address the mechanism by which ligands spontaneously bind to buried protein cavities and consequently insights into multiple pathways and low free energy barriers were not obtained.
Recent studies suggest that diverse processes such as protein folding and protein interconversion between different compact conformers proceed via multiple pathways and/or small barriers [12,46,47]. More studies are required to determine if small activation barriers and multiple paths are common features of biomolecular recognition processes and to characterize at atomic resolution what these pathways are.

Materials and methods
Here, as in most biophysical studies involving T4 lysozyme, the cysteine free version of the protein [48] was used, T4L WT Ã , where Cys54 and Cys97 are replaced by Thr and Ala, respectively. In T4L L99A the L99A mutation is introduced into the T4L WT Ã background.

MD simulations and analysis
Unbiased binding simulations. The X-ray crystallographic structures of T4L L99A free and benzene bound forms (PDB code: 3DMX and 3DMV) were used as the starting points for all of the simulations [4]. All binding simulations were initiated by placing the apo form of L99A T4L at the center of a cubic box of dimension 7.2 nm with the empty space then filled with water and ion molecules. The system was solvated with 11613 triatomic water molecules and sufficient numbers of sodium and chloride ions were added to keep the sodium chloride concentration at 150 mM and render the system charge neutral. Two benzene molecules, corresponding to a concentration of 9.55 mM, that is less than the benzene solubility limit in water (20 mM) [49], were placed in random positions in the solvent. The two molecules were allowed to diffuse freely and no artificial bias was introduced throughout the simulation. The system included a total of 37541 atoms. The protein, benzene and ions were parameterized with the charmm36 force field [35] and the TIP3P water model [27] was used.
All MD simulations were performed with the Gromacs 5.0.6 simulation package [50], in most cases benefiting from usage of Graphics processing units [51]. During the simulation, the average temperature was maintained at 303K using the Nose-Hoover thermostat [52,53] with a relaxation time of 1.0 ps and an average isotropic pressure of 1 bar was maintained with the Parrinello-Rahman barostat [54]. The Verlet cutoff scheme [55] was employed throughout the simulation with the Lenard Jones interaction extending to 1.2 nm and long-range electrostatic interactions treated by Particle Mesh Ewald (PME) summation [56]. All bond lengths involving hydrogen atoms of the protein and the ligand benzene were constrained using the LINCS algorithm [57] and water hydrogen bonds were fixed using the SETTLE approach [58]. Simulations were performed using the leapfrog integrator with a time step of 2 fs and initiated by randomly assigning the velocities of all particles.
Initially six long independent unbiased trajectories were performed, differing in the assignment of initial velocity seeds. These ranged between 2.5 and 8.1 μs with a total simulation length of 29 μs. The simulations were only terminated after one of the two copies of the ligand settled into the target binding site. The binding process was ascertained via inspection and by checking (i) the radial distance between respective centers of mass of benzene and the cavity and (ii) the root-mean-squared deviation (RMSD) of the simulated conformation of the protein-ligand system from that of X-ray crystal structure of holo T4L L99A (PDB id: 3DMX) [4] that included benzene atoms and those of the protein cavity. The 'cavity' or the 'binding pocket' was defined by the protein heavy atoms within 5Å of benzene in the X-ray structure of holo T4L L99A (PDB id: 3DMX). For the calculation of RMSD, the cavity plus ligand of each frame of the trajectory was first translationally and rotationally aligned with that of the crystal structure of the benzene-bound form of protein (PDB id: 3DMX) and then the deviation was computed. The ligand was ascertained to be bound to the cavity when the RMSD remained below 0.5 nm and the cavity-ligand distance was below 0.2 nm for a simulation duration of at least 100 ns.
Apart from the six μs-long trajectories discussed above, we have also performed 300 short independent trajectories, each 100 ns in duration. These short trajectories were initiated from different intermediates observed from the long binding simulations and improved the statistics of the MSM that was derived from analysis of the MD trajectories (see below). Overall, an aggregate of about 59 μs of unbiased trajectory was recorded.
MSM analysis of the MD data. The generation of a MSM is a powerful way to automatically identify the kinetically relevant states and their interconversion rates from the simulated trajectories [59][60][61][62]. We employed PyEMMA [63] (http://pyemma.org) to construct and analyze the MSM obtained from the combination of all the recorded MD trajectories. The nearest-neighbor (cut-off 0.5 nm) heavy-atom contacts between benzene and protein residues were used as input coordinates. The time-lagged or time-structure-based independent component analysis (tICA) [64][65][66] with a correlation lag time of 20 ns was used for dimensionality reduction, which linearly transforms the high dimensional input data into a slow linear subspace (i.e. collective coordinates sorted by 'slowness'). The high dimensional input coordinates were subsequently projected onto 12 tICA components, covering more than 95% of the kinetic variance. We employed the k-mean clustering algorithm [67] to discretize the tICA data into 100 clusters. Then 100 microstates MSMs were constructed at variable lag times to identify the appropriate lag time. A lag time of 10 ns was used to construct the final MSM as the implied time scale leveled off at about 10 ns which ensures the Markovianity of the model (see Fig E in S1 Supporting Information). To better understand the dynamics of ligand binding a coarsegrained kinetic model with four metastable states was constructed using a hidden Markov model (HMM) [68].
The kinetic parameters reported in Table 1 were calculated using the mean first passage time (MFPT) obtained from the MSM analysis based on four derived macrostates. The MFPT was computed as the average time taken for the transition from the initial to the final macrostate [69]. The calculation included both the direct transition from the initial state to the final state and transitions through other intermediate states. The on-rate and off-rate constants are respectively calculated as k on = 1/(MFPT on C) and k off = 1/MFPT off where C is the benzene concentration, 9.55 mM. The binding free energies (ΔG) were calculated based on the stationary populations (shown in Table 2) of bound and unbound macrostates as obtained from the MSM [20], where π bound and π unbound are stationary populations of bound and unbound macrostates, k B is Boltzmann's constant, and T is the absolute temperature. We subsequently converted the result to a standard free energy ΔG 0 (one molar concentration) for comparison with experiment. Finally, the transition path theory as proposed by Vanden-Eijnden and coworkers [70] was employed to calculate the transition path fluxes. Free energies of helix opening. The umbrella sampling simulation technique [71] was used to map the free energy surface underlying the opening of a pair of helices to facilitate ligand binding to the cavity in T4L L99A. To this end, we performed two independent umbrella sampling simulations where the distance between centers of mass of Cα-atoms of helices 4 and 6 or of helices 7 and 9 was used as the collective variable (CV). The value of the CV ranged from 0.9 nm-1.6 nm for the distance between helices 4 and 6 and from 1 to 1.3 nm for helices 7 and 9. The starting configuration for each window was picked from the prior unbiased simulation trajectories, with spacing between consecutive windows of 0.05 nm. Each window was restrained to the desired value of the CV using a harmonic potential of force constant 8000 kJ/mol/nm 2 to ensure that the distribution of the CV for each window remains Gaussian with sufficient overlap in the distribution of two consecutive windows so that the entire space of the CV is sampled. Each of the windows was sampled by the umbrella sampling technique for 20 ns. Finally, the weighted histogram analysis method (WHAM) [72,73] was employed to reweight the umbrella-sampled windows and to obtain the unbiased potential of mean force as a function of the CV. In order to investigate the effect of the L99A mutation on the free energetics of helix opening, the entire series of umbrella samplings was repeated by insilico mutation of Ala99 in each of the starting configurations to Leu99.
Unbinding simulations using an enhanced sampling technique. The recent implementation of metadynamics by Tiwary and Parrinello [25] was used to compute unbinding rates of benzene from the hydrophobic cavity of L99A T4L and to extract unbinding pathways. In this novel method, by making the bias deposition slower than the time in bottlenecks, and by using a suitable CV capable of distinguishing between bound and unbound states, it is possible to keep the transition state relatively bias-free through the course of the metadynamics simulation.
This so-called 'infrequent metadynamics' approach allows one to accelerate the transition from ligand-bound to unbound states, without affecting the transition state. More importantly, this technique provides an estimate of the acceleration factor, which when multiplied with metadynamics simulation times, provides the true unbinding time. As elaborated in the original work of Tiwary and Parrinello [25], the acceleration factor is given by the time α = he βV(s,t) i t where s is the CV being biased, ß is the inverse temperature, V(s,t) is the bias experienced at time t and the subscript t indicates that averaging is performed under the timedependent potential. The above expression is valid even if there are multiple intermediate states and numerous alternative reaction pathways. In our current work, we employed the distance between the centers of mass of the cavity and benzene as the CV using a Gaussian width of 0.025 Å. One of the reasons for choosing the distance between the binding pocket and the ligand as the CV was that it can distinguish between bound and unbound states very well, while providing an estimate of binding free energies that agree with those obtained from experiment. During the 'infrequent metadynamics' simulation approach, the Gaussians were deposited every 10 ps with a starting height of 1.2 kJ/mol and gradually decreased on the basis of a well-tempered metadynamics biasing factor, Γ = 6. A total of 15 such 'infrequent metadynamics' simulations having different initial velocity distributions were spawned with the ligand fully bound to the cavity and simulations were stopped only when the ligand became fully unbound, reaching bulk-solvent i.e. the distance between the binding pocket and the ligand exceeded 3 nm. Acceleration factors, as described earlier and elaborated on in reference [25], were computed during the simulations and multiplied with the Metadynamics run time to get the true unbinding time.
The transition time statistics so obtained were then subject to a Poisson analysis [26] to ascertain their reliability (based on the Kolmogorov-Smirnov statistical analysis) and to obtain the average unbinding time, from which the unbinding rate is easily calculated. Our approach of performing metadynamics simulations using the cavity-ligand separation as a single CV was sufficient to pass the so-called p-value test which suggested that the frequency of bias was slow enough and the CV was appropriate to accelerate unbinding. All simulations were carried out using Gromacs 5.0.6 patched with the PLUMED plugin [74].
NMR experiments. Assignments for T4L WT Ã were obtained using standard triple resonance experiments [76,77], with assignments for T4L L99A published earlier [11]. All experiments were performed at 22˚C on a 700 MHz Bruker Avance III HD spectrometer equipped with a triple resonance cold probe.
NMR data processing and analysis. NMR data were processed using the NMRPipe software package [78] and the resulting spectra were visualized and analyzed using SPARKY [79]. To overcome complications due to chemical exchange, order parameters were estimated from chemical shifts according to the method of Berjanskii and Wishart [30], as implemented in TALOS+ [80].