Multiple Routes and Milestones in the Folding of HIV–1 Protease Monomer

Proteins fold on a time scale incompatible with a mechanism of random search in conformational space thus indicating that somehow they are guided to the native state through a funneled energetic landscape. At the same time the heterogeneous kinetics suggests the existence of several different folding routes. Here we propose a scenario for the folding mechanism of the monomer of HIV–1 protease in which multiple pathways and milestone events coexist. A variety of computational approaches supports this picture. These include very long all-atom molecular dynamics simulations in explicit solvent, an analysis of the network of clusters found in multiple high-temperature unfolding simulations and a complete characterization of free-energy surfaces carried out using a structure-based potential at atomistic resolution and a combination of metadynamics and parallel tempering. Our results confirm that the monomer in solution is stable toward unfolding and show that at least two unfolding pathways exist. In our scenario, the formation of a hydrophobic core is a milestone in the folding process which must occur along all the routes that lead this protein towards its native state. Furthermore, the ensemble of folding pathways proposed here substantiates a rational drug design strategy based on inhibiting the folding of HIV–1 protease.

where r i, j is the distance between the two atoms and r 0 is taken to be r 0 = 8.5 Å. 1 This definition of contact map is different from the one commonly used in literature, 1

which is discrete and
where r 0 is intended as a sharp cutoff. In order to be used as collective variables in a metadynamics simulation, contacts must be defined in terms of a function with continous derivatives. The definition in Eq. (1) has already been used successfully in studying the folding of small peptides 2,3 and

NVT simulation at room temperature
The NVT run was carried out for 512 ns at 300K using a Langevin thermostat with a damping coefficient of 5 ps −1 . The structure of the monomer appeared stable during the whole NVT simulation. The root mean square fluctuations of the C α atoms were within 0.5 and 1.5 Å along most of the chain ( Figure S1). Three regions displayed a grater mobility: the C and N termini and fragment 45-55, corresponding to β 2. A partial assembling of the N and C terminal into a beta structure was also observed ( Figure S2, top panel).
In the bottom panel of Figure S2 we monitor the time series of our six sets of native contacts.
In the time scale of the simulation a spontaneous breaking and reforming of the hydrogen-bonds pattern that stabilizes β 1-β 3 occurred. Atomistic details of the interaction between β 1 and β 3 are reported in Figure S3.

Unfolding analysis
The unfolding analysis was carried out on a set of 30 trajectories generated starting from the same equilibrated structure with different initial velocities. The temperature of 700K was enforced using a Langevin thermostat. All the thermal unfolding runs were simulated for 8 ns. The final configurations had a RMSD from the native structure that ranged from 11 to 22 Å, calculated on the C α atoms.
To identify common events and main unfolding routes, the ensemble of configurations produced in the unfolding simulations at 700 K was clusterized using the k-means algorithm. 10 This is a simple way to classify a set of N data in a certain number of cluster k fixed a priori. The algorithm aims at minimizing the error squared function: where x ( j) i − c j 2 is the distance between a data point x ( j) i and the cluster centre c j . Here we chose as distance between two configurations x 1 and x 2 the distance in the space of contact maps: where the index j runs over the six sets of contacts in which we classified the native map, C j i is the ith element of this set, as defined in Eq. (1), N j the total number of contacts belonging to this group.
Two clusters were connected by a link if a transition between them was observed during the unfolding simulations. To visualize the connectivity among the clusters found, we used Visone. 11 The method used to display and organize the network of clusters was the metric multidimensional scaling. The clustering algorithm used here is based on a choice a priori of the number of clusters S3 in which data are organized. We explicitly checked that the sequence of events and the different unfolding pathways found by our analysis were robust with respect to this choice ( Figure S4).

Structure-based potential simulations
Coarse grained simulations of the HIV-1-PR monomer were carried out using the all-atom structurebased potential introduced in Ref. 12 and GROMACS 4 13 together with PLUMED. 14 To build the topology, the web server of Onuchic research group (http://sbm.ucsd.edu/) was used.
A time step of 0.0005 in reduced unit and the stochastic thermostat of Bussi et al. 15 were used.
A thermostat that yields the correct energy fluctuations of the canonical ensemble is crucial in parallel tempering simulations. 16 For the PTMetaD simulation, 16 replicas were distributed with a geometric progression in a temperature range between 0.969 and 1.057 in unit of T f = 113.5K.
Exchanges between configurations were attempted every 200 steps. The total simulation time for each replica was 2 · 10 7 steps. As collective variable, we used the total number of native contacts Q without any discrimination among our six subsets. The ratio between the fictitious temperature of the collective variable T + ∆T 17 and that of the simulation T was kept constant across the different replicas: Gaussians of 1.0 kjoule/mol height and 5.0 width were deposited every 1000 steps. We monitored the convergence of the PTMetaD simulation by calculating at different times the free-energy difference between folded and unfolded states ( Figure S5): where β = (k B T ) −1 and V (Q,t) is the metadynamics bias potential acting at time t on the collective variable Q. The definition of folded (Q ≥ 0.5) and unfolded (Q < 0.5) is arbitrary.

S4
Convergence was accelerated by orders of magnitude with respect to standard PT. 18

Reweighting procedure
To calculate from the PTMetaD runs the multiple FES as a function of the fraction of native contacts of our six descriptors, we used the reweighting algorithm introduced in Ref. 19 This method is based on the evolution of the biased probability distribution P(R R R,t) during the metadynamics simulation: P(R R R,t + ∆t) = e −β (V (s s s(R R R),t)− V (s s s,t) )∆t P(R R R,t), whereV (s s s(R R R),t) is the time derivative of the bias potential and the average in the exponent is calculated in the biased ensemble. In the long time limit, the Boltzmann distribution P B (R R R) can be recovered using a standard umbrella sampling reweighting: