Folding kinetics of an entangled protein

doi:10.1371/journal.pcbi.1011107

Fig 1.

Cartoon native structures and corresponding Gaussian entanglement histograms of the proteins studied in this work.

(a) Cartoon native structure of the Type III antifreeze protein RD1 (Protein Data Bank (PDB) code 1ucs), which exhibits a mixed α/β structure. To illustrate the native topological complexity of RD1, one entangled loop (G′ = −0.97) is shown in red. It is closed by the native contact between residues L17 and M41 (in yellow), and its threading fragment, the first 14 N-terminal residues, is shown in blue (see the Gaussian entanglement subsection for the definition of entangled loops and threads). The most entangled contact (G′ = −1.03), between residues N14 and K61, is shown in green. (b) Cartoon native structure of the SH3 domain (PDB code 1srl), which has a mainly-β structure with five β-strands. (c) Histogram of the Gaussian entanglement values for the native loops in the RD1 protein. The unnormalized weight function used to evaluate the entanglement indicator 〈G′〉 = −0.68 for the whole RD1 native structure is also shown. (d) Histogram of the Gaussian entanglement values for the native loops in the SH3 domain. The unnormalized weight function used to evaluate the entanglement indicator 〈G′〉 = 0.08 for the whole SH3 native structure is also shown. The equation defining the unnormalized weight function is reported in the legends of panels c and d: it is a Hill activation function with threshold g₀ = 0.5 and cooperativity index m = 3. The weight function must be normalized to compute the weighted average 〈G′〉. Although the same unnormalized weight function is used in all cases, the normalization is specific to each distinct protein configuration (see the Gaussian entanglement subsection for details).
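
The caption relies on two quantities defined in the Gaussian entanglement subsection: the entanglement G′ between a loop and a thread, and the weighted average 〈G′〉 over loops. The sketch below is a minimal illustration under stated assumptions. It uses the standard discrete Gauss double sum over bond midpoints R and bond vectors ΔR, G′ = (1/4π) Σ_i Σ_j (R_i − R_j)·(ΔR_i × ΔR_j)/|R_i − R_j|³. It also assumes that the Hill weight w(g) = |g|^m/(|g|^m + g₀^m) is evaluated on |G′| of each loop and normalized by the sum of the weights over all loops of one configuration. The function names, the `ring` test helper and the (start, end) index convention are hypothetical; the paper's subsection gives the exact definitions.

```python
import numpy as np

G0, M = 0.5, 3  # Hill threshold g0 and cooperativity index m quoted in the caption


def gaussian_entanglement(coords, loop, thread):
    """Discrete Gauss double sum between two chain segments.

    coords: (N, 3) array of bead positions.
    loop, thread: hypothetical (start, end) index pairs, end inclusive.
    Each bond is represented by its midpoint R and its bond vector dR.
    """
    def bonds(start, end):
        pts = coords[start:end + 1]
        return 0.5 * (pts[1:] + pts[:-1]), pts[1:] - pts[:-1]

    r_l, dr_l = bonds(*loop)
    r_t, dr_t = bonds(*thread)
    diff = r_l[:, None, :] - r_t[None, :, :]              # R_i - R_j
    cross = np.cross(dr_l[:, None, :], dr_t[None, :, :])  # dR_i x dR_j
    dist3 = np.linalg.norm(diff, axis=-1) ** 3
    return float(np.sum(np.sum(diff * cross, axis=-1) / dist3) / (4 * np.pi))


def hill_weight(g, g0=G0, m=M):
    """Unnormalized Hill activation weight, evaluated on |G'| (assumption)."""
    a = np.abs(np.asarray(g, dtype=float)) ** m
    return a / (a + g0 ** m)


def entanglement_indicator(g_values):
    """Weighted average <G'> over the loops of one configuration."""
    g = np.asarray(g_values, dtype=float)
    w = hill_weight(g)
    total = w.sum()  # per-configuration normalization
    return float(np.sum(w * g) / total) if total > 0 else 0.0


def ring(center, axis_a, axis_b, n=200):
    """Closed polygon (first point repeated at the end), used only for testing."""
    t = np.linspace(0.0, 2 * np.pi, n + 1)
    return (np.asarray(center, dtype=float)
            + np.outer(np.cos(t), axis_a) + np.outer(np.sin(t), axis_b))
```

As a sanity check, two interlocked rings (a Hopf link) passed as a closed loop and a closed thread give a value close to ±1, the Gauss linking number, while two well-separated rings give a value close to 0.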

Fig 2.

Folding thermodynamics of the non-entangled SH3 domain.

Probability of native contact formation in the transition state ensemble at the folding temperature for the non-entangled SH3 domain. The inset shows the dimensionless free energy profile F(Q) as a function of the fraction of native contacts Q at the folding temperature; the shaded area highlights the interval of Q values used to define the transition state ensemble. (a) Energy function with a 12/6 Lennard-Jones potential. (b) Energy function with a 12/10 Lennard-Jones potential. The similarity of the two contact maps is quantified by the correlation coefficients between the respective contact formation probabilities: Pearson’s linear correlation coefficient r = 0.981 and Spearman’s rank correlation coefficient ρ = 0.985, with p-value < 10⁻⁷⁷ in both cases.
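
The two correlation coefficients compare the per-contact formation probabilities obtained with the 12/6 and 12/10 potentials. A minimal sketch, assuming the two contact maps have been flattened into equal-length arrays ordered by native contact: Spearman's ρ is computed as Pearson's r applied to the ranks, with tied values receiving their average rank. The function names are hypothetical.

```python
import numpy as np


def rankdata(x):
    """Ranks starting at 1; tied values share the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's linear correlation coefficient r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


def spearman(x, y):
    """Spearman's rank correlation coefficient rho (Pearson r on ranks)."""
    return pearson(rankdata(x), rankdata(y))
```

In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return the same coefficients together with their p-values.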

Fig 3.

Folding thermodynamics of the entangled RD1 protein at the folding transition temperature.

Contour plot of the dimensionless free energy surface in the (Q, 〈G′〉) plane for the entangled RD1 protein. The free energy is evaluated as the negative logarithm of the histogram of data collected from 8 long equilibrium trajectories at T = T_f. The negative log-counts are smoothed using KDE (see the Ensemble definition and pathway classification subsection for details) and shifted so that their minimum is 0. Contour levels are separated by approximately 0.4 in log-histogram units. The bottom horizontal inset shows the dimensionless free energy profile F(Q) obtained with the weighted histogram analysis method (WHAM; see the Weighted histogram method subsection). The left vertical inset shows the dimensionless free energy profile F(〈G′〉) obtained by projecting onto the entanglement indicator.
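
The free energy surface is a negative log-probability shifted so that its minimum is 0. A minimal sketch, assuming that a Gaussian kernel density estimate evaluated on a regular (Q, 〈G′〉) grid stands in for the smoothed histogram. The bandwidths `bw_q` and `bw_g` are hypothetical; the paper's actual KDE settings are described in the Ensemble definition and pathway classification subsection. Because of the final shift, the normalization of the density drops out.

```python
import numpy as np


def free_energy_surface(q, g, q_grid, g_grid, bw_q=0.02, bw_g=0.05):
    """Dimensionless free energy F = -ln P on a (Q, <G'>) grid, shifted to min 0.

    P is a Gaussian kernel density estimate of the sampled (Q, <G'>) points;
    the bandwidths are illustrative, not the values used in the paper.
    """
    q = np.asarray(q, dtype=float)[:, None, None]
    g = np.asarray(g, dtype=float)[:, None, None]
    qq, gg = np.meshgrid(q_grid, g_grid, indexing="ij")  # grid axes: (Q, <G'>)
    kern = np.exp(-0.5 * (((qq - q) / bw_q) ** 2 + ((gg - g) / bw_g) ** 2))
    density = kern.sum(axis=0)  # unnormalized density on the grid
    with np.errstate(divide="ignore"):
        free = -np.log(density)  # empty grid cells become +inf
    return free - free[np.isfinite(free)].min()
```

The kernel array holds one grid slice per sample, so long trajectories should be subsampled (or binned first) to keep memory bounded.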

Fig 4.

Time series from refolding trajectories for the RD1 protein.

Upper row: fraction of native contacts Q, with labels referring to the visited states. Lower row: entanglement indicator 〈G′〉. (a) Example of a “fast” folding trajectory. The two red arrows show that, during the folding transition, the entanglement indicator reaches its native values before the fraction of native contacts does; the interval between the arrows delimits the short visit to the IE₋ intermediate prior to folding. This trajectory also shows a visit to IE₋ followed by a return to the unfolded state. (b) Example of a “threading” trajectory. (c) Example of a “backtracking” trajectory. The red arrow marks the final folding transition.
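
The ordering highlighted by the red arrows in panel a can be measured directly from the two time series. A minimal sketch, assuming hypothetical thresholds `q_fold = 0.9` for folding and `g_ent = -0.6` for native-like 〈G′〉 (RD1's native value is −0.68, so this threshold is crossed from above). The hypothetical function returns how many frames 〈G′〉 had already been native-like, without interruption, when Q first reached the folding threshold. Earlier entangled episodes followed by a return to the unfolded state, like the one in panel a, are deliberately excluded.

```python
import numpy as np


def entangled_lead_time(q, g, q_fold=0.9, g_ent=-0.6):
    """Frames by which native-like <G'> precedes folding in Q.

    Returns None if Q never reaches q_fold, and 0 if <G'> is not
    native-like at the first folded frame (Q arrived first).
    Thresholds are illustrative assumptions.
    """
    q = np.asarray(q, dtype=float)
    g = np.asarray(g, dtype=float)
    folded = np.flatnonzero(q >= q_fold)
    if folded.size == 0:
        return None  # trajectory never folds
    t_fold = int(folded[0])
    entangled = g <= g_ent  # native <G'> is negative for RD1
    if not entangled[t_fold]:
        return 0
    t_start = t_fold
    # Walk back over the uninterrupted entangled stretch that ends at t_fold.
    while t_start > 0 and entangled[t_start - 1]:
        t_start -= 1
    return t_fold - t_start
```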

Table 1.

Different refolding pathways for the entangled RD1 protein.

Fig 5.

Folding pathways of the entangled RD1 protein below the folding transition temperature.

Log-scale histogram contour plot in the (Q, 〈G′〉) plane from 100 refolding trajectories for the RD1 protein at T = 0.9T_f. The negative log-counts of the histogram are smoothed using KDE (see the Ensemble definition and pathway classification subsection for details) and shifted so that their minimum is 0. Contour levels and the colour scale are the same as in Fig 3. Letters refer to the unfolded (U), folded (F), trap intermediate (IT) and entangled intermediate with negative chirality (IE₋) states, and to the positive-chirality configurations populated during refolding (IE₊). Representative snapshots of each state use the same colour scheme as the upper left panel of Fig 1. In IT, the incorrect threading of the N-terminal portion (in blue) through the loop (in red) that eventually becomes entangled in the folded state F is apparent. The arrows show the observed transitions: fast folding from U to F through IE₋, possible disentanglement from IE₋ to U, trapping from U to IT, folding by threading from IT to F, and backtracking from IT to U.
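
Given per-frame state labels, the transitions listed above suggest a simple way to assign each refolding trajectory to a pathway. This is a minimal sketch with hypothetical rules and labels ('U', 'F', 'IT', 'IE-', 'IE+', and None for frames not assigned to any state). The paper's actual classification is described in the Ensemble definition and pathway classification subsection and summarized in Table 1.

```python
def compress(states):
    """Drop unassigned frames (None) and collapse consecutive repeats."""
    out = []
    for s in states:
        if s is not None and (not out or out[-1] != s):
            out.append(s)
    return out


def classify_pathway(states):
    """Assign a refolding trajectory to a pathway from per-frame state labels.

    Hypothetical rules based on the transitions drawn in Fig 5, checked in order:
      'unfolded'     never reaches F
      'fast'         reaches F without visiting IT first
      'threading'    the first F is entered directly from IT
      'backtracking' visited IT, left it for another state, then folded
    """
    seq = compress(states)
    if "F" not in seq:
        return "unfolded"
    before = seq[:seq.index("F")]  # states visited before the first F
    if "IT" not in before:
        return "fast"
    if before[-1] == "IT":
        return "threading"
    return "backtracking"
```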

Fig 6.

Intermediate state ensemble contact maps for the entangled RD1 protein.

Probability of native contact formation for the entangled RD1 protein in the longer-lived IT ensemble (above the diagonal) and the short-lived IE₋ ensemble (below the diagonal). Both intermediate state ensembles are detected at T = 0.9T_f in Fig 5. The red arrows mark the location of the native contacts involving the first 6 N-terminal residues; these contacts are likely to be formed in IE₋ but less likely to be formed in IT. Among them, the “trap-avoiding” native contacts framed in red are those whose formation probabilities differ most between the two ensembles (see the Ensemble definition and pathway classification subsection for details). The shaded area framed in blue contains 28 of the 30 natively entangled contacts (G′ < −0.75). The native contacts framed in magenta are the “first-entangling” contacts, which are more likely to be formed in IE₋ (see the Ensemble definition and pathway classification subsection for details).
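
The contact sets highlighted in this figure can be selected from per-contact quantities. A minimal sketch: the first-6-residue window and the G′ < −0.75 cutoff come from the caption. Ranking N-terminal contacts by the signed difference p(IE₋) − p(IT) and keeping the `top` largest is an assumption about how "differ most" is put into practice. The function names and the value of `top` are hypothetical.

```python
import numpy as np


def trap_avoiding(contacts, p_ie, p_it, n_terminal=6, top=5):
    """N-terminal contacts with the largest p(IE-) - p(IT) difference.

    contacts: list of (i, j) residue pairs, 1-based, aligned with p_ie and p_it.
    """
    diff = np.asarray(p_ie, dtype=float) - np.asarray(p_it, dtype=float)
    idx = [k for k, (i, j) in enumerate(contacts) if min(i, j) <= n_terminal]
    idx.sort(key=lambda k: diff[k], reverse=True)  # most IE-favoured first
    return [contacts[k] for k in idx[:top]]


def natively_entangled(contacts, g_prime, cutoff=-0.75):
    """Contacts whose native loop has G' below the cutoff used in this figure."""
    return [c for c, g in zip(contacts, g_prime) if g < cutoff]
```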

Fig 7.

Examples of exponential fits to the contact formation probabilities for the entangled RD1 protein.

Two examples of native contact formation probabilities as a function of time, averaged over the 52 refolding trajectories that fold to the RD1 native structure at T = 0.9T_f through the fast channel, together with the corresponding exponential fits. Time is measured in MD steps. Both the average time series and the corresponding block averages (see the Exponential fit of contact formation curves subsection for details) are plotted (see legend). The block averages are fit to the exponential function reported in the legend, and the resulting fits are shown in the plots. The fit parameters are A, the saturation value of the contact formation probability in the folded state; B, the gain in this probability from the unfolded to the final folded state; and k, the contact folding rate (see the Exponential fit of contact formation curves subsection for details). Top row: native contact between V6 and E25, one of the “trap-avoiding” N-terminal thread contacts framed in red in Fig 6 and in the left panel of Fig 8. Bottom row: native contact between K23 and I37, one of the “first-entangling” contacts framed in magenta in Fig 6 and in the left panel of Fig 8.
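
The legend's fit function is not reproduced in this text. A form consistent with the parameter definitions is p(t) = A − B e^(−kt), so that p(t) saturates at A and p(0) = A − B is the unfolded-state probability. A minimal sketch, assuming that form and non-overlapping block averages of a hypothetical size `block`. For fixed k the model is linear in A and B, so the fit scans a grid of k values and solves for A and B by ordinary least squares. This grid scan is a swapped-in strategy for illustration; a nonlinear optimizer such as `scipy.optimize.curve_fit` would serve equally well.

```python
import numpy as np


def block_average(t, p, block):
    """Average consecutive, non-overlapping blocks; a trailing remainder is dropped."""
    n = (len(p) // block) * block
    tb = np.asarray(t[:n], dtype=float).reshape(-1, block).mean(axis=1)
    pb = np.asarray(p[:n], dtype=float).reshape(-1, block).mean(axis=1)
    return tb, pb


def fit_saturating_exponential(t, p, k_grid):
    """Fit p(t) = A - B exp(-k t) by scanning k and solving for A, B linearly."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    best = None
    for k in k_grid:
        # Columns multiply A and B, respectively: A * 1 + B * (-exp(-k t)).
        X = np.column_stack([np.ones_like(t), -np.exp(-k * t)])
        coef, *_ = np.linalg.lstsq(X, p, rcond=None)
        sse = float(np.sum((X @ coef - p) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], k)
    _, a, b, k = best
    return float(a), float(b), float(k)
```

The resolution of `k_grid` bounds the precision of k; a finer grid, or a local refinement around the best value, can be used when needed.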

Fig 8.

Contact formation probabilities in the unfolded state and contact folding rates for both the entangled RD1 protein and the non-entangled SH3 domain.

The parameters p_U, the contact formation probability in the unfolded state U, and k, the characteristic rate at which this probability increases from p_U to its final value in the folded ensemble F, are shown for each native contact. p_U and its standard deviation are computed from 1500 unfolded configurations sampled with the same procedure used to select the initial configurations for the refolding simulations (see the Ensemble definition and pathway classification subsection for details). Contact folding rates k are obtained through exponential fits of the contact formation probabilities as a function of time, averaged over different refolding trajectories (see Fig 7 for specific examples of such fits). Where not shown, standard deviations are smaller than the marker size for both observables (see the Exponential fit of contact formation curves subsection for details). For each contact, the colour scale indicates |G′| of the corresponding loop: the darker the colour, the more entangled the loop. (a) RD1 protein. Only the 52 trajectories that refold through the fast channel are included in the average. The “trap-avoiding” and “first-entangling” contacts identified in Fig 6 are framed in red and magenta, respectively; the magenta arrow marks the position of the latter set in the plot. (b) SH3 domain. All 100 refolding trajectories are included in the average.
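
A minimal sketch of how p_U can be estimated from a set of unfolded configurations. It assumes that a contact counts as formed when its distance is below a hypothetical factor of 1.2 times the native distance; the paper's contact criterion may differ. The returned error is the binomial standard error of the mean, sqrt(p(1 − p)/n); whether this matches the standard deviation reported in the caption is also an assumption.

```python
import numpy as np


def contact_probabilities(conformations, contacts, native_dist, factor=1.2):
    """Fraction of conformations in which each native contact is formed.

    conformations: (n_conf, n_res, 3) coordinates.
    contacts: (n_contacts, 2) residue index pairs, 0-based.
    native_dist: (n_contacts,) native distances.
    A contact is counted as formed if its distance is below factor * native
    distance (the factor is an illustrative assumption).
    Returns p and its binomial standard error sqrt(p (1 - p) / n_conf).
    """
    x = np.asarray(conformations, dtype=float)
    c = np.asarray(contacts, dtype=int)
    d = np.linalg.norm(x[:, c[:, 0], :] - x[:, c[:, 1], :], axis=-1)
    formed = d < factor * np.asarray(native_dist, dtype=float)
    p = formed.mean(axis=0)
    err = np.sqrt(p * (1 - p) / x.shape[0])
    return p, err
```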
