Hydrogen-Bond Driven Loop-Closure Kinetics in Unfolded Polypeptide Chains

Characterization of the length dependence of end-to-end loop-closure kinetics in unfolded polypeptide chains provides an understanding of early steps in protein folding. Here, loop closure in poly-glycine-serine peptides is investigated by combining single-molecule fluorescence spectroscopy with molecular dynamics simulation. For chains containing more than 10 peptide bonds, loop-closing rate constants on the 20–100 nanosecond time range exhibit a power-law length dependence. However, this scaling breaks down for shorter peptides, which exhibit slower kinetics arising from a perturbation induced by the dye reporter system used in the experimental setup. The loop-closure kinetics in the longer peptides is found to be determined by the formation of intra-peptide hydrogen bonds and transient β-sheet structure, which accelerate the search for contacts among residues distant in sequence relative to the case of a polypeptide chain in which hydrogen bonds cannot form. The hydrogen-bond-driven polypeptide-chain collapse found here in unfolded peptides under physiological conditions is not only consistent with hierarchical models of protein folding, which highlight the importance of secondary structure formation early in the folding process, but is also shown to speed up the search for productive folding events.


Introduction
The formation of contacts between pairs of residues in an unfolded polypeptide chain is one of the earliest steps in in vitro protein folding and is considered to determine the so-called protein folding speed limit [1]. Evidence exists for unfolded states being compact under native conditions [2][3][4][5][6][7][8][9][10][11], although it is unclear whether these states contain specific secondary structures [5,9] or whether compaction is a nonspecific hydrophobic-driven effect [10,12]. The former scenario is consistent with a hierarchical mechanism of folding [13][14][15], in which secondary structures that are local in sequence form first (e.g., the diffusion-collision model [16]). Non-specific compaction is non-hierarchical, requiring condensation for the formation of secondary structures (e.g., the hydrophobic-collapse [12,17,18] and nucleation-condensation [19] models). The hierarchical and non-hierarchical models may represent two extremes of a continuum of mechanisms [20,21], and the position of any given protein on the continuum may depend on, for example, the intrinsic propensity of its amino acid sequence to form secondary-structural elements.
The molecules studied here are glycine-serine (GS) based peptides. Due to their high chain flexibility, their solubility and the absence of a stable folded structure [23,25,27,30,52], these have been shown to be valuable model systems for studying end-to-end contact formation in "unstructured" polypeptide chains under native conditions, hence providing insight highly relevant to our fundamental understanding of the very first steps of protein folding. Recent FRET experiments and MD simulation have provided evidence for intrachain interactions in these systems in water [28,39]. However, whether and how the loop-closure kinetics is affected by these interactions is still an open question.
Poly-GS peptides exhibit exponential kinetics for end-to-end loop closure with time constants in the 10–100 nanosecond time range depending on the number of GS units [23,25,27,30]. The end-to-end loop-closure rates in peptides with more than 10 peptide bonds exhibit power-law scaling as a function of the number of peptide bonds. However, this scaling breaks down for shorter peptides, which exhibit slower rates than obtained by extrapolation of the longer-chain behaviour [25,27,30]. The origin of this "rollover" to slower kinetics is unclear. It has been suggested that the rollover is due to the shorter chains being intrinsically stiffer than the longer ones [27,53], although it has not been ruled out that the rollover might be an artefact due to perturbation by the extrinsic reporter system [33,35,36].
Here, a combined experimental and computational study is presented with a twofold aim: understanding the role of intrapeptide hydrogen bonds in the loop-closure kinetics and unveiling the origin of the observed rollover to slower kinetics for the shorter peptides. Loop-closure kinetics in GS peptides of various lengths, labelled with the oxazine derivative MR121 (the fluorescent dye) and tryptophan (the specific quencher) at the terminal ends (MR121-(GS)nW, with n ranging from 2 to 15), are investigated using fluorescence correlation spectroscopy (FCS) at the single-molecule level with nanosecond time resolution and MD simulation on the μs timescale. Excellent agreement is found between the simulated and experimental rate constants, allowing the loop-closure processes to be understood at atomic detail and the role played by intra-peptide hydrogen bonds to be determined.

Results/Discussion
End-to-end contact formation is characterized experimentally here by measuring selective fluorescence quenching of the MR121 dye by a tryptophan residue, the two groups being attached to the opposite termini of a series of poly-GS peptides [30]. The chemical structure of the MR121-(GS)2W peptide is depicted in Figure 1A. MR121/Trp contact formation and dissociation result in switching the fluorescence "off" and "on", respectively. The quenching process has been shown in previous work to be diffusion limited [30,40,54] and, thus, the underlying kinetics can be revealed by FCS [30].
To aid in interpretation of the experimental results, a series of MD simulations of MR121-(GS)nW peptides with n = 2, 3, 5, 7 and 9 was performed in explicit solvent at the same conditions as in the experiments, i.e., in aqueous solution at T = 293 K. Four examples of the second-order autocorrelation functions, G(t), of the fluorescence signal I(t) (Eq. 1) obtained from the experiments and simulations are presented in Figure 1B. The time cut-off is given by the resolution of the FCS experiments (i.e., 6 ns). The profiles are in agreement, justifying the force field and simulation methodology applied.
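The way a simulated G(t) can be obtained from a trajectory may be illustrated with a minimal Python sketch. It assumes the common normalization G(t) = ⟨I(t0) I(t0+t)⟩ / ⟨I⟩², which is an assumption here because Eq. 1 is not reproduced in this text, and it uses the 0.45 nm dye-quencher contact cutoff given in the Data analysis section to build the binary signal I(t). The function names and the toy input are hypothetical.

```python
def intensity_from_distance(min_dist_nm, cutoff_nm=0.45):
    """Binary fluorescence signal: 0 (quenched) if the minimum dye-quencher
    distance is <= cutoff, 1 (fluorescent) otherwise."""
    return [0.0 if d <= cutoff_nm else 1.0 for d in min_dist_nm]


def second_order_autocorrelation(intensity, max_lag):
    """G(lag) = <I(t0) I(t0+lag)> / <I>^2, averaged over all starting frames t0.

    The normalization by the squared mean is an assumed (standard) convention.
    """
    n = len(intensity)
    mean = sum(intensity) / n
    g = []
    for lag in range(max_lag + 1):
        pairs = n - lag  # number of (t0, t0 + lag) frame pairs available
        corr = sum(intensity[i] * intensity[i + lag] for i in range(pairs)) / pairs
        g.append(corr / mean ** 2)
    return g


# Toy usage: hypothetical distances (nm), one value per trajectory frame.
distances = [0.30, 0.40, 0.90, 1.10, 0.80, 0.35, 0.42, 1.00]
signal = intensity_from_distance(distances)
g_of_t = second_order_autocorrelation(signal, max_lag=3)
```

In practice the lag index would be multiplied by the trajectory frame spacing to obtain t, and only the part of G(t) beyond the experimental 6 ns cut-off would be compared with the FCS data.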
Assuming a two-state model for the equilibrium between the fluorescent (open) and non-fluorescent (closed) conformational states, the time constants (on timescales longer than 6 ns) for end-to-end loop closure (τ+) and opening (τ−) can be calculated from the relaxation time and the amplitude of the exponential function fitted to the data (Eqs. 2 and 3) and are reported in Table 1. Again, τ+ and τ− show agreement between simulation and experiment, and the τ+ values can be compared to previous experiments, showing good agreement [27]. The long time constants for opening (τ− ≈ 30–80 ns) were found to originate from the dissociation of stacked geometries of the aromatic moieties of MR121 and Trp. Assuming that there is no faster process occurring (an assumption imposed by the time resolution of the experiment), the K values yield fractions of open conformations (K/(K+1)) of ≈20–40%

Author Summary
In studies of protein folding, evidence exists for early compaction in the unfolded state, although it is unclear whether these compact conformations contain specific secondary structures (through hydrophilic interactions) or whether compaction is a non-specific hydrophobic-driven effect. Here we combine single-molecule fluorescence spectroscopy and molecular dynamics simulation to demonstrate peptide hydrogen-bond-driven polypeptide-chain collapse involving secondary structure formation as the key process in the early stage of folding. Partial structuring in unfolded polypeptide chains is shown to lead to faster contact formation kinetics than would be expected if the unfolded state were populated by featureless random coils.
for peptides with n = 2–9, both in experiment and simulation, reflecting the stability of the stacked structures.
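The conversion from the fitted amplitude and relaxation time to closing and opening times follows directly from the two definitions given in the Data analysis section, K = k+/k− and τrel = 1/(k+ + k−). Solving these for the two rates gives k− = 1/[τrel(1 + K)] and k+ = K k−, so that τ− = τrel(1 + K) and τ+ = τrel(1 + K)/K. A minimal Python sketch of this arithmetic is given below; the function name and the numerical inputs are hypothetical and are not values from Table 1.

```python
def time_constants_from_fit(K, tau_rel):
    """Return (tau_plus, tau_minus) from a two-state fit.

    Uses K = k_plus / k_minus and tau_rel = 1 / (k_plus + k_minus),
    as defined in the Data analysis section.
    """
    k_minus = 1.0 / (tau_rel * (1.0 + K))  # opening rate constant
    k_plus = K * k_minus                   # closing rate constant
    return 1.0 / k_plus, 1.0 / k_minus


# Hypothetical inputs: amplitude K = 3 and relaxation time of 10 ns.
tau_plus, tau_minus = time_constants_from_fit(K=3.0, tau_rel=10.0)
# tau_minus = 10 * (1 + 3) = 40 ns and tau_plus = 40 / 3 ns
```

Both time constants are always longer than τrel, because the observed relaxation rate is the sum of the closing and opening rates.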
Concerning the closing rates, experimental [24,27,30] and theoretical [31,53,55] work has shown that for random-coil chains a scaling law k+ ∝ N^β exists for the end-to-end loop-closure rate constants as a function of the number of peptide bonds N, with β ranging from −1.5 to −2.1 [31,55]. The k+ values for the MR121-labelled peptides are reported as a function of N in Figure 2 in a double-logarithmic plot. The rate constants for the MR121-(GS)5W and longer peptides show a power-law dependence resulting in a scaling law k+ ∝ N^(−1.4±0.2), in agreement with the prediction for Gaussian chains. However, for shorter chains, in agreement with previous experimental results [25,27], the scaling law breaks down. It has been argued that the breakdown reflects a pure peptide backbone property, i.e., stiffness of the shorter peptides [27]. However, the breakdown may, in principle, instead arise from perturbation due to the MR121 dye reporter system. Simulations of the same peptides but without the extrinsic MR121 dye attached to the chain end allow the origin of the scaling breakdown to be understood. To this end, a series of simulations was performed for (GS)nW peptides with n = 1, 2, 3, 5, 7, 9 and 12 but without the MR121 dye. Again, for these systems the autocorrelation functions G(t) were calculated and analysed. Corresponding time constants, and a description of the fitting procedure used, are reported in Table 2 and G(t) is reported in Figure 3A. The shorter peptides (n = 1–3) do not show any relaxation process on the experimentally-detectable timescale, i.e., for times longer than 6 ns. Only faster relaxations, k2,+ = 1/τ2,+, below the experimental time resolution, are present (k2,+ are reported in the upper half of Figure 3B). Peptides with n ≥ 5 do possess slower closing rate processes, k3,+ = 1/τ3,+, which are within the experimental time resolution and can therefore be compared with experiment (k3,+ are reported in the lower half of Figure 3B). In addition to this experimentally-detectable relaxation, the longer peptides also possess a faster decay, k2,+ = 1/τ2,+ (upper half of Figure 3B).
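A scaling exponent of the kind discussed above corresponds to the slope of a straight line in a double-logarithmic plot of the rate constant against N. The following Python sketch shows one way such an exponent could be estimated, by ordinary least squares on ln k+ versus ln N. It is an illustration only: the function name is hypothetical, the input data are synthetic (generated from an exact N^(−1.5) law), and the fitting procedure actually used for Figure 2 is not described in the text.

```python
import math


def fit_power_law(n_values, k_values):
    """Least-squares fit of k = A * N**beta on a log-log scale.

    Returns (beta, A). Assumes all inputs are strictly positive.
    """
    xs = [math.log(n) for n in n_values]
    ys = [math.log(k) for k in k_values]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                            # slope = scaling exponent
    prefactor = math.exp(y_mean - beta * x_mean)
    return beta, prefactor


# Synthetic data (not from the paper): k = 2 * N**(-1.5).
chain_lengths = [11, 15, 19, 25]
rates = [2.0 * n ** -1.5 for n in chain_lengths]
beta, prefactor = fit_power_law(chain_lengths, rates)
```

Only the chain lengths that lie in the scaling regime should enter such a fit; including the shorter peptides, for which the scaling breaks down, would bias the exponent.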
The k3,+ for the longer unlabelled peptides (n = 5, 7, 9, 12) almost coincide with the experimental and simulation-derived k+ of the labelled peptides (see lower panel of Figure 3B). Therefore, there is no detectable effect of the dye on the closing kinetics of peptides with more than 10 peptide bonds (in the Supplementary Information, Text S1, evidence of the similarity between the open states in labelled and unlabelled peptides with more than 10 peptide bonds is provided, confirming that the agreement between the corresponding closing rates is not fortuitous). In contrast, however, the labelled peptides with n = 2, 3 contain an experimentally-detectable slow component both in experiment and simulation, which is absent in the unlabelled n = 2, 3 peptides. This is found to arise from open conformations stabilized by hydrogen bonds between the MR121 dye and the backbone (details are given in the Supplementary Information, Text S2). Therefore, the presence of the dye is responsible for the rollover observed for labelled peptides with fewer than 10 peptide bonds. The above results do not rule out that the shorter peptides might be intrinsically dynamically different from the longer ones. Indeed, a closer look at the closing processes on the faster, experimentally non-detectable, timescale reveals that the closing rate constants of the longer chains (n ≥ 5) show a power-law scaling, but again the scaling breaks down for the shorter peptides with n = 1, 2 and 3 (upper half of Figure 3B). The rollover to slower closing kinetics in the unlabelled peptides suggests that there is, indeed, an intrinsic effect for peptides with fewer than 10 peptide bonds.

Table 2. Loop-closure (τ+) and relaxation (τrel) time constants for the unlabelled peptides. The autocorrelation functions G(t) (Eq. 1) of the unlabelled peptides decay by ≈10–15% in the first 3 ps, in agreement with femtosecond-timescale spectroscopic data [29] and previous MD results [70]. Beyond 3 ps the curves are fitted using the following function: [equation missing in source]. Correlation coefficients were higher than 0.95 and the χ² lower than 15. On the 3–500 ps timescale a complex (non-exponential) decay is observed (K1, τ1,rel), again in agreement with experiment [29], corresponding to a distribution of relaxation times which is found to result from a range of processes, including rotation around single bonds and breaking and forming of intra-chain hydrogen bonds (space restrictions preclude a detailed description of these processes). On the nanosecond timescale one exponential relaxation (K2, τ2,rel) for chains with n = 2 and 3 and two exponential relaxations (K2, τ2,rel and K3, τ3,rel) for chains with n = 5, 7, 9 and 12 are observed. τ2,+ and τ3,+ are obtained using Eq. 3. Errors reported in parentheses are one standard deviation as obtained by dividing each trajectory into two halves. doi:10.1371/journal.pcbi.1000645.t002
A structural explanation for the experimentally-detectable closing kinetics and for the differences observed between the short (n = 1, 2, 3) and long (n = 5, 7, 9, 12) unlabelled peptides (i.e., the absence of k3,+ rate constants for the short peptides and the presence of a rollover in the k2,+) was found in an analysis of the peptide hydrogen bonds. While for the shorter chains (n < 5) less than 20% of the structures in the open state possess intra-backbone hydrogen bonds, this value abruptly increases to almost 100% of the structures for peptides with n ≥ 5 (see Figure 3C). Moreover, 45%, 52%, 62% and 65% of the structures populating the (GS)5W, (GS)7W, (GS)9W and (GS)12W open states, respectively, contain short β-sheet segments, i.e., with two to six consecutive inter-strand hydrogen bonds formed. Hence, in peptides with more than 10 peptide bonds closure occurs from open structures possessing peptide hydrogen bonds, some of which are involved in secondary structure elements, and some not, whereas in peptides with fewer than 10 peptide bonds, closure occurs from open structures with no hydrogen bonds. The observation of a rollover to slower kinetics and the absence of intra-peptide hydrogen bonds for the shorter unlabelled peptides clearly show that there is an intrinsic stiffness in the short polypeptide chains.

Figure 3 (caption fragment; beginning missing in source). [...] Table 2 (for clarity, the fit is shown only for the (GS)5W peptide, dashed black line). B) Corresponding loop-closure rate constants, k2,+ = 1/τ2,+ and k3,+ = 1/τ3,+, are reported in the upper and lower half, respectively, as a function of the number of peptide bonds, N. For comparison, the k+ of the labelled peptides evaluated from the experiment (circles) and simulation (open squares) are shown in black. Note that fitting the G(t) of the unlabelled peptides as in the experiment, i.e., in the 6–300 ns time range, yields k+ values within the error of the k3,+ evaluated from the multiexponential fit described in Table 2.
Examination of the time dependence of the end-to-end distance and of the number of intra-peptide hydrogen bonds allows the contributions to the end-to-end closing process of the two kinds of peptide hydrogen bonds in open conformations (i.e., those involved in secondary structure formation or not) to be determined. Examples of the time series of the end-to-end distance and of the number of intra-peptide hydrogen bonds for the (GS)5W peptide are given in Figure 4A and 4B, respectively; these plots are also representative of the longer peptides. Closing events with τ2,+ ≈ 2 ns occur from structures with one or more peptide hydrogen bonds not involved in secondary structure, while the slower closing processes, with τ3,+ ≈ 20 ns, occur from structures with multiple hydrogen bonds involved in short β-sheet segments (examples of these structures are shown in Figure 4A).
Average lifetimes of the hydrogen bonds were calculated using the hydrogen bond existence autocorrelation function, C(t), which is the probability that a given hydrogen bond which was intact at the initial time (t = 0) is also found intact at a later time, t [56]. The C(t) functions were grouped (and averaged) into two groups: one comprising hydrogen bonds involved in secondary structure formation and the other not (examples for the (GS)5W peptide are given in Figure 4C). Both C(t) are found to exhibit a fast relaxation on the picosecond timescale that is clearly non-exponential, and a slower, exponential, relaxation. The exponential process has relaxation times of ≈1–2 ns for the hydrogen bonds not involved in β-sheet formation, and 7.5 ns, 14.8 ns, 19.5 ns and 22.3 ns for peptides with n = 5, 7, 9 and 12, respectively, for the hydrogen bonds involved in β-sheet structure.
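The definition of C(t) given above can be made concrete with a short Python sketch operating on a binary time series h(t) for a single hydrogen bond (1 if the bond is intact in a frame, 0 otherwise). The sketch assumes the intermittent form C(t) = ⟨h(t0) h(t0+t)⟩ / ⟨h(t0)⟩, in which a bond that breaks and reforms between t0 and t0+t still counts as intact at t; the text does not state whether this or the continuous variant was used, so this choice, like the function name and the toy input, is an assumption.

```python
def hbond_existence_autocorrelation(h, max_lag):
    """Intermittent hydrogen-bond existence autocorrelation.

    h is a sequence of 0/1 values, one per frame. For each lag the function
    returns the fraction of frames with an intact bond at t0 for which the
    bond is also intact at t0 + lag.
    """
    n = len(h)
    c = []
    for lag in range(max_lag + 1):
        origins = range(n - lag)
        intact_at_origin = sum(h[t0] for t0 in origins)
        intact_at_both = sum(h[t0] * h[t0 + lag] for t0 in origins)
        # Guard against a bond that is never intact in the usable window.
        c.append(intact_at_both / intact_at_origin if intact_at_origin else 0.0)
    return c


# Toy usage: a bond that is intact for two frames, then broken for two frames.
h_series = [1, 1, 0, 0] * 25
c_of_t = hbond_existence_autocorrelation(h_series, max_lag=4)
```

A lifetime could then be extracted by fitting an exponential to the slow part of C(t), and the curves of individual bonds could be averaged within each group (β-sheet and non-β-sheet) before fitting, in the spirit of the grouping described above.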
The above results clearly show that the experimentally-detectable slow component of the end-to-end closing kinetics found exclusively in the longer peptides arises from the existence in the open state of transient β-sheet structures, the probability of occurrence and lifetimes of which increase with chain length. Before closing, the chains explore one or a few of these conformations, giving rise to closure dynamics on the 20–100 ns timescale. Interestingly, in the open structures not possessing β-sheet segments, i.e., those from which the faster loop-closure events take place, a common backbone conformation is observed, namely the polyproline II, PPII (see Figure 5). These data confirm previous results [57] showing that the PPII is a dominant conformation in unstructured peptides.
Finally, the question arises as to whether the presence of transient secondary structures in the unfolded peptides actually accelerates or slows down the loop-closure kinetics relative to the hypothetical system in which no hydrogen bonds, and thus no secondary structure, can be formed. To answer this question a 1.5 μs simulation of the (GS)5W peptide was performed under the same conditions as the previous simulations, but with all the charges of the backbone atoms set to zero (except for the two termini), thus rendering impossible the formation of intra-backbone hydrogen bonds. The end-to-end autocorrelation function G(t) (Eq. 1) was also calculated for this simulation and the resulting average closing times compared with those obtained for the reference simulation (see Figure 6). The average end-to-end loop-closure time on the nanosecond timescale increases by a factor of four (τ3,+ = 20.6 ns vs. τ3,+ = 85.2 ns) if no hydrogen bonds can be formed. This shows that the formation of hydrogen bonds accelerates the end-to-end loop closure.
To address the possible origin of the acceleration of the closure kinetics by the formation of intra-peptide hydrogen bonds, further analyses were performed. The probability-density-based free energy profile along the end-to-end distance is calculated from simulation for the unlabelled (GS)5W peptide and for the corresponding uncharged analog (see Figure 7). Both free energy landscapes show two minima, one corresponding to closed conformations (at around 0.4 nm) and the second to open, compact structures (at around 0.7–0.8 nm). The free energy barrier on going from the open to the closed state is much smaller for the (GS)5W than for the analog, being ≈1.5 and ≈4 kJ/mol, respectively. A lower barrier leads to faster closing kinetics, as indeed was found here. The effect of hydrogen bond formation on the relative stability of closed and open, compact conformations is, hence, to lower the free energy barrier to closure.
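As an illustration of a probability-density-based profile of the kind described above, the following Python sketch converts sampled end-to-end distances into a free energy profile via F(r) = −RT ln[P(r)/Pmax]. The function names, the bin width and the sample values are hypothetical and are not taken from the simulations. No Jacobian correction is applied, which is an assumption, since the text does not state whether one was used. The second helper gives the rate ratio implied by a barrier difference under the additional assumption of equal pre-exponential factors; this is a rough estimate and not the analysis performed in this work.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def free_energy_profile(distances_nm, bin_width_nm, temperature_k=293.0):
    """Histogram the distances and return [(bin_center_nm, F_kJ_per_mol), ...].

    F is shifted so that the most populated bin has F = 0. Empty bins are
    omitted because their free energy is undefined.
    """
    counts = {}
    for d in distances_nm:
        index = int(d // bin_width_nm)
        counts[index] = counts.get(index, 0) + 1
    max_count = max(counts.values())
    rt = R_KJ_PER_MOL_K * temperature_k
    return [((index + 0.5) * bin_width_nm, -rt * math.log(count / max_count))
            for index, count in sorted(counts.items())]


def rate_ratio_from_barriers(barrier_low, barrier_high, temperature_k=293.0):
    """exp[(barrier_high - barrier_low) / RT], assuming equal prefactors."""
    rt = R_KJ_PER_MOL_K * temperature_k
    return math.exp((barrier_high - barrier_low) / rt)


# Toy usage with hypothetical samples (nm) and a 0.1 nm bin width.
samples = [0.42] * 4 + [0.55] + [0.75] * 2
profile = free_energy_profile(samples, bin_width_nm=0.1)
ratio = rate_ratio_from_barriers(1.5, 4.0)
```

With the barriers quoted above (≈1.5 and ≈4 kJ/mol at 293 K), this crude estimate gives a ratio of roughly 2.8, which is of the same order as the approximately fourfold slowdown obtained from the fitted time constants.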
The present results point to a kinetic role played by intrapeptide interactions in driving the end-to-end contact formation in small- to medium-sized polypeptide chains. In previous studies it has been shown that a high percentage of proteins have their N- and C-terminal elements in contact, more than expected on a random probability basis [58][59][60]. A possible rationale for this bias was found in structural and functional aspects, rather than in the kinetics of folding. For example, it was suggested that the terminal regions stabilize tertiary and quaternary structure to provide a framework for the active site [58] and that the N-C motif was evolutionarily selected for some functional advantage and is therefore now built into the structural design of many proteins [60]. This can be contrasted with the present work, which points to a kinetic role played by intra-peptide interactions in driving the end-to-end contact formation in peptides.

Figure 6. Effect of intra-backbone hydrogen bonds on the loop-closure kinetics. Autocorrelation functions, G(t), evaluated from simulation for the unlabelled (GS)5W peptide and for a (GS)5W-analog that was simulated in the same conditions as the (GS)5W peptide, but with all the charges of the backbone atoms set to zero (except for the two termini). The curves were fitted as described in the caption of Table 2. doi:10.1371/journal.pcbi.1000645.g006

Conclusions
Understanding the loop-closure dynamics of unfolded peptides provides valuable insight into early steps in in vitro protein folding. Here, loop closure of poly-GS peptides is characterized by combining fluorescence correlation spectroscopy with atomistic molecular dynamics simulation.
The experimentally-derived end-to-end loop-closure rate constants are found to decrease with increasing chain length in longer peptides (N ≥ 10), while they become almost independent of chain length for the shorter peptides, as has been previously observed in other experiments that make use of extrinsic probes [25,27]. Analysis of the simulations reveals that the observed rollover at short chain lengths is due to a perturbation by the extrinsic reporter system. The experimental rate constants of the short chains are found to be determined by transitions to the closed state from open-state conformations containing hydrogen bonds between the MR121 fluorophore and the backbone. However, for peptides with N > 10 negligible perturbation of the chain dynamics on the experimentally-detectable timescale by the reporter system is seen, as demonstrated by the very good agreement between loop-closure rate constants in peptides with and without the dye reporter system and by the similarity of the corresponding open states.
These results resolve the existing ambiguity regarding the experimentally-determined rollover at short chain lengths in favour of a perturbation effect by the extrinsic reporter systems. Nevertheless, evidence for an intrinsic stiffness of the shorter chains is also provided. The observation of a rollover to slower kinetics and the absence of intra-peptide hydrogen bonds for the shorter unlabelled peptides (i.e., the GS repeats without the extrinsic MR121 dye attached) clearly show that there is, indeed, an intrinsic stiffness in the short polypeptide chains.
The MD simulations allow the dynamical processes driving the end-to-end loop closure to be determined. The nanosecond closing time constants for peptides containing more than 10 peptide bonds correspond to transitions to the closed conformations from open-state configurations possessing intra-backbone hydrogen bonds with a broad range of lifetimes. As the chain length increases, the probability of formation of β-sheet elements increases, leading to the experimentally-detectable length-dependent end-to-end loop-closure time constants on the 20–100 ns timescale, which are determined by the lifetimes of the secondary-structural elements. Early secondary-structure formation in unstructured chains, as found here, is in principle not restricted to β-sheet formation and could also, possibly, involve formation of α-helices, depending on the amino acid sequence.
The scaling with length of the loop-closure rate constants for chains with more than 10 peptide bonds is found here, as well as in previous studies [24,25,27], to be consistent with predictions for Gaussian chains. However, again in line with previous work [34,39,61], the presence of partial structuring in unfolded states found here shows that random-coil statistics are not a unique signature of structureless polypeptide chains.
Partial structuring of unfolded states of peptides and proteins has potentially dramatic consequences for the thermodynamics and kinetics of folding [15]. The results presented here reveal structuring in unfolded polypeptide chains driven by backbone hydrogen bonding, also involving transient (of the order of a few nanoseconds) β-sheet segments. What is most striking, however, is the finding that formation of these peptide hydrogen bonds accelerates end-to-end contact formation by lowering the free energy barrier to closure. In an unfolded polypeptide chain this corresponds to the acceleration of the search for "productive" folding contacts between distant residues. Structuring in poly-GS peptides found here is thus not only consistent with hierarchical models of protein folding, which highlight the importance of secondary structure formation early in the folding process [13][14][15][16], but is also shown to speed up the search for productive folding events.

MD simulation protocol
MD simulations of a set of MR121-(GS)nW peptides (n = 2, 3, 5, 7 and 9) in water were performed with the GROMACS software package [62] and the GROMOS96 force field [63]. Partial atomic charges for the dye MR121 were taken from Ref. [54]. One peptide molecule was solvated with water and placed in a periodic rhombic dodecahedral box large enough to contain the peptide and at least 1 nm of solvent on all sides at a liquid density of 55.32 mol/l (1 g/cm³) (the starting peptide conformation was taken at the end of a 10 ns-long MD simulation in explicit water in which the peptide was initially modelled in an extended conformation). Water was represented with the simple point charge (SPC) model [64]. Simulations were performed at the experimental temperature of 293 K in the NVT ensemble and isokinetic temperature coupling [65] was used to keep the temperature constant. The bond lengths were fixed [66] and a time step of 2 fs for numerical integration was used.
Periodic boundary conditions were applied to the simulation box and the long-range electrostatic interactions were treated with the Particle Mesh Ewald method [67] using a grid spacing of 0.12 nm combined with a fourth-order B-spline interpolation to compute the potential and forces in between grid points. The real space cut-off distance was set to 0.9 nm. The C-terminal end of the peptides was modelled as CO2−, consistent with the experimental pH of ≈7 [30]. No counter ions were added since the simulation box was neutral (one positive charge exists on the MR121).

A second series of simulations was performed for the unlabelled (GS)nW peptides (n = 1, 2, 3, 5, 7, 9 and 12) under the same conditions as the labelled peptides. For these simulations the MR121 dye was replaced by an N-terminal NH3+ group. Simulation lengths of the different systems are 1.2, 1.5, 2.5, 3.2 and 3.8 μs for MR121-(GS)nW peptides with n = 2, 3, 5, 7 and 9, respectively, and 0.6, 0.8, 1.0, 1.9, 2.5, 3.3 and 4.0 μs for (GS)nW peptides with n = 1, 2, 3, 5, 7, 9 and 12, respectively. The total number of atoms in the simulation box varies between 1366 and 8643, the number of water molecules between 443 and 2821 and the volumes between 13.3 and 84.7 nm³ depending on the peptide length.

Figure 7. Free energy profile along the end-to-end distance calculated from simulation for the unlabelled (GS)5W peptide (black) and for the uncharged (GS)5W-analog (red). The error bars correspond to one standard deviation obtained from 2 independent trajectories. doi:10.1371/journal.pcbi.1000645.g007
To test the dependence of the sampled backbone conformations on the force field used, two additional simulations of 500 ns of the (GS)3W and (GS)5W peptides were also performed with a different force field, namely the OPLS-AA [68] force field. Agreement between the two force fields is found in the hydrogen bonding properties, namely ≈35% of the structures populating the open state of the (GS)5W peptide contain short β-sheet segments, while these are absent in the shorter (GS)3W peptide.
Details on the experimental methods, setup and some of the results are reported elsewhere [30].

Data analysis
The relaxation process of the radiative emission of a fluorescent probe can be analysed via the second-order autocorrelation function of the fluorescence signal I(t) [30]:

[Eq. 1: equation missing in source]

where the angle brackets denote the average over all starting times. Assuming an all-or-none transition between the fluorescent and non-fluorescent states, the following model was used to fit both the experimental- and simulation-derived autocorrelation functions in the labelled peptides:

[Eq. 2: equation missing in source]

where K = k+/k− is the equilibrium constant between the open and closed states and τrel = (k+ + k−)^(−1) is the mean relaxation time. From K and τrel, as obtained by Eq. 2 in both the experiments and simulation, average opening, τ−, and closing, τ+, times can be derived as follows:

τ− = 1/k− = τrel (1 + K),    τ+ = 1/k+ = τrel (1 + K)/K    (Eq. 3)

The criterion for quenching employed in analysing the present simulations is that a non-fluorescent state occurs when the minimum distance between an atom of the conjugated rings of the MR121 and an atom of the rings of the Trp is ≤ 0.45 nm (I(t) = 0), while the state is fluorescent otherwise (I(t) = 1). For the non-labelled peptides the minimum distance between the Trp (the rings and the C-terminal CO2− group) and the N-terminal NH3+

Supporting Information
Text S1 Characterization of the open state in labelled and unlabelled peptides.