BOBA FRET: Bootstrap-Based Analysis of Single-Molecule FRET Data

Time-binned single-molecule Förster resonance energy transfer (smFRET) experiments with surface-tethered nucleic acids or proteins permit to follow folding and catalysis of single molecules in real-time. Due to the intrinsically low signal-to-noise ratio (SNR) in smFRET time traces, research over the past years has focused on the development of new methods to extract discrete states (conformations) from noisy data. However, limited observation time typically leads to pronounced cross-sample variability, i.e., single molecules display differences in the relative population of states and the corresponding conversion rates. Quantification of cross-sample variability is necessary to perform statistical testing in order to assess whether changes observed in response to an experimental parameter (metal ion concentration, the presence of a ligand, etc.) are significant. However, such hypothesis testing has been disregarded to date, precluding robust biological interpretation. Here, we address this problem by a bootstrap-based approach to estimate the experimental variability. Simulated time traces are presented to assess the robustness of the algorithm in conjunction with approaches commonly used in thermodynamic and kinetic analysis of time-binned smFRET data. Furthermore, a pair of functionally important sequences derived from the self-cleaving group II intron Sc.ai5γ (d3'EBS1*/IBS1*) is used as a model system. Through statistical hypothesis testing, divalent metal ions are shown to have a statistically significant effect on both thermodynamic and kinetic aspects of their interaction. The Matlab source code used for analysis (bootstrap-based analysis of smFRET data, BOBA FRET), as well as a graphical user interface, is available via http://www.aci.uzh.ch/rna/.


Introduction
Förster Resonance Energy Transfer (FRET), distance-dependent energy transfer via a long-range dipole-dipole interaction, occurs between a donor fluorophore and an acceptor, which is typically (but not necessarily) also a fluorophore [1]. FRET results in a decrease in both donor emission intensity and lifetime, as well as the appearance of acceptor fluorescence [2]. Monitoring FRET between a single pair of dyes (smFRET) attached to a biomolecule can resolve both static and dynamic heterogeneity within a sample, i.e. differences between molecules and time-dependent conformational changes of individual molecules, both of which would otherwise be hidden through ensemble averaging [3,4]. smFRET experiments are performed either on freely diffusing or surface attached molecules, the latter approach allowing for observation over an extended period of time. Technically, experiments with diffusing samples are implemented using a confocal microscope suitable for single-photon detection (timecorrelated single photon counting, TCSPC). Experiments involving surface-tethered molecules can also be conducted with the aforementioned confocal microscope setup [5], although a widefield or total internal reflection geometry is typically used for excitation, followed by detection with a CCD camera, resulting in time-binned FRET trajectories [6,7]. Statistical analysis of such time-binned data is the objective of this article.
As smFRET data are generated from the emission of single fluorophores, the signal-to-noise ratio (SNR) is generally an issue, and considerable effort has been geared towards the development of tools to analyze noisy time traces. Ideally, such tools should permit to determine the number of conformational states in the system, their relative occurrence, and the rates at which they interconvert [8]. Cumulated FRET histograms have proven useful for simple two-or three-state systems, in which the approximation of individual FRET distributions with a normal distribution leads to minimal discrepancies [2]. When there is no or minimal overlap between the FRET distributions, the relative occurrence of the states is quantified by defining arbitrary cutoff values between FRET distributions (thresholding, Figure 1) [9]. In the case of moderate overlap, multiple Gaussian fits are typically performed to extract quantitative information (Figure 1) [10]. Under these circumstances, dwell times, i.e. the time spent in a certain FRET state until a conformational change occurs, can also be easily determined by thresholding, typically followed by fitting the dwell time histograms to exponential decay models to extract the rates of conformational rearrangement (Figure 1) [11][12][13][14]. However, when the SNR deteriorates (short exposure times or fluorescence quenching) and/or the centers of FRET distributions come close (similar interdye distances or modest conformational dynamics), these straightforward approaches can no longer be sensibly applied (Rayleigh criterion, Figure 1). Noise in smFRET time traces can be reduced through smoothing, i.e. by averaging out the inherent noise of the data collection process and hence emphasizing the discrete nature of the FRET levels [15]. While linear rolling point averaging (also: moving or sliding averaging) is known to obscure transitions with dwell times shorter than the averaging window, the more sophisticated non-linear forward backward filter initially proposed by Chung and Kennedy and adapted by Haran partly overcomes this problem [16,17]. Nevertheless, it also tends to average out very brief excursions to conformational intermediates in our hands. Taylor et al. recently presented an implementation of wavelet shrinkage to denoise smFRET time trajectories (Figure 1) [18,19]. Here, the observed time series are transformed into a frequency component, followed by suppression of the noise assumed to lie within the high-frequency region of the transformation and inversion of the transformation that yields (in theory) a denoised dataset [18,20]. It should be noted, however, that noise and signal often overlap in smFRET data, and thus such transformations may lead to spurious oscillations close to the transition (Gibb's phenomenon) [21]. A further application of wavelet transformation is termed change-point identification and has recently been implemented to denoise smFRET data [22]. An extensive overview of strategies for noise removal in so-called piecewise constant signals (constant signal levels connected by abrupt transitions) has been given elsewhere [21].
Hidden-Markov modeling (HMM, Figure 1) was first applied on TCSPC data by Yang and Xie [23,24], and later utilized for analyzing time-binned FRET trajectories by the groups of Ha (''HaMMy'', [8]), Gonzalez Jr. (''vbFRET'', [25]), Herschlag (''SMART'', [26]), and Dillingham (''CSSR'', [27]), as well as groups from other research fields (''QuB'', [28]). Briefly, a Markov process is a sequence of state-to-state transitions, becoming ''hidden'' because of the experimental noise [8]. Consequently, HMM attempts to reconstruct the underlying time trace based on transition probabilities of a molecule from a state A to a state B, and emission probabilities, i.e. the likelihood of observing a FRET value when the system is in a discrete state l assuming the noise can be modeled by a given statistical distribution [10,29]. Different approaches have been employed to determine the exact number of states: (i) deliberate overfitting followed by model selection using the Bayesian information criterion (BIC) or the Akaike information criterion (AIC) [8,25,27], or (ii) a maximum evidence approach for both model selection and determination of the model parameters [25]. Hidden Markov approaches enjoy great popularity nowadays such that an extensive body of literature has been published on this topic, including implementations for short time traces [30,31] and multivariate HMM dealing with more than one time trace at a time [5,32]. Nonetheless, it should be mentioned that the basic assumptions do not always hold true for single-molecule processes (single-exponential kinetics, vide infra), especially when memory effects or large variations in folding kinetics are observed that go beyond the scope of classical kinetics [18,33].
With the cumulated histograms and/or the dwell times at hand, both the thermodynamic equilibrium and the kinetics associated with the conformational changes can be characterized. To this end, the corresponding error is typically estimated via the goodness of the fit to the data (GOF) [34,35]. The GOF reports on how well the model describes the experimental data and is mainly determined by the SNR. Important contributions to the noise are made by the stochastic nature of photon emission (shot-noise), background noise, electron multiplier noise, read-out noise, dark noise, resolution-induced noise [3,[36][37][38][39][40][41], as well as photophysical effects like quantum yield fluctuations and spectral changes or technical aberrations such as focal drift or fluctuations in laser intensity [3,41,42]. In turn, this approach neglects crosssample variability (differences between single molecules) as it relies on building an ensemble from all smFRET time traces at once. Single-molecule data are however known to frequently display intermolecular heterogeneities that may originate from limitations with regard to the observation time (photobleaching) or technical issues. These frequently manifest as pronounced differences regarding the relative population of conformational states, and as differences in the absolute FRET values observed between individual smFRET time traces (heterogeneous broadening) [33,43]. Consequently, approximation of the error by the GOF is expected to underestimate the variance at the expense of the robustness of data interpretation. It must be emphasized that precise estimation of the variance of the sample is crucial in order to assess whether a difference between different treatment groups is real or has occurred solely by chance, for example a change in the relative population of the conformational states in response to the addition of a small molecule. Such statistical testing has, to the best of our knowledge, not been reported in the field of singlemolecule FRET.
Pioneered by Efron [44], the bootstrap scheme is a resampling method to assess the accuracy of sample estimates that has since been applied in numerous branches of biological research including phylogenetics [45], environmental science [46], forcebased single-molecule biophysics [47,48], or molecular dynamics simulations in conjunction with smFRET experiments on freely diffusing molecules [49]. In bootstrapping, the distribution of the whole population, including measures of variance, is estimated from a sample distribution of the size n (n replicates) [51]. During the resampling process, N values of the sample distribution are randomly selected with an equal probability of 1/N and multiple selections are allowed (resampling with replacement) [50]. Typically, N = n to avoid pseudoreplication and the resampling procedure is repeated M times to compute the variance, where 100#M#500 is usually considered sufficiently robust in phylogenetic research, though more conservative approaches may involve several thousand rounds of bootstrapping [46].
To meet the challenge of making smFRET data analysis more robust, we have designed a software package called BOBA FRET (BOotstrap-BAsed analysis of smFRET data) to estimate the crosssample variability associated with time-binned smFRET measurements using Efron's bootstrap ( Figure 1) [44]. The program is freely available and its implementation is straightforward. Herein, we illustrate its workflow to perform both thermodynamic and kinetic analysis of smFRET data: First, the algorithm is shown to be compatible with well-established approaches to analyze smFRET time traces and characterize its robustness using a set of simulated data. Second, BOBA FRET is applied to an experimental dataset, the cation-dependent interaction of the exon-binding sequence 1 (d3'EBS1*) and the intron-binding sequence 1 (IBS1*), which are derived from a crucial part of the 59splice site recognition complex in the group II intron Sc.ai5c found in Saccharomyces cerevisiae ( Figure 2). With the bootstrapped errors at hand, we perform statistical hypothesis testing to assess whether cation-induced effects on interaction kinetics and shifts conformational equilibrium are statistically significant [51][52][53].

Materials and Methods
Simulations smFRET time traces were simulated for an intramolecular twostate system. First, discretized time traces were created under the assumption that state-to-state transitions are governed by singleexponential kinetics, followed by addition of Gaussian noise. Standard parameters were based on previous simulations and defined as follows: FRET A = 0.3 (undocked state), FRET B = 0.7 (docked state); SNR = 3.5 (average total intensity = 24.5 photons bin 21 s 21 ); SNR distribution width = 0; observation time = 4000 s; k docking = 0.1 s -1 , k undocking = 0.04 s -1 (average number of transitions = 114 per time trace) [8]. For each set of parameters, 100 time traces were analyzed, followed by an estimation of the cross-sample variability (vide infra). All simulations were performed using a home-built script written in MATLAB.

Oligonucleotides
The RNA sequence pair was derived from the exon-binding site 1 (EBS1) and the intron-binding site 1 (IBS1) found in the primary cox1 transcript in cerevisiae. They are referred to as d3'EBS1* and IBS1* according to the nomenclature used in previous studies ( Figure 2) [33,51]. Labeled oligonucleotides were purchased PAGE-purified from IBA AG (Göttingen, Germany) and additionally HPLC purified [54]. All chemicals were purchased from Sigma-Aldrichs (Buchs, Switzerland).

Data Analysis
smFRET movies were analyzed with a home-built Matlab software (Matlab version 8.20.701, license 49040, MathWorks, Nattick, MA). Briefly, the local level of background noise was determined and subtracted from dye emission profiles by creating a sub-image (20620 pixel), followed by calculating the mean photon count rate of the 20 darkest pixels within this area, a method to locally determine background noise adapted from the commonly used aperture photometry approach [3,59]. Fluorescence time traces were further corrected for leakage of Cy3 emission into the Cy5 channel (,7%, determined experimentally). Emission time traces were manually selected for anticorrelation and stable acceptor emission to calculate time-dependent apparent FRET efficiencies FRET(t) as.

Characterization of the Thermodynamic Equilibrium
To characterize the thermodynamic equilibrium, n individual FRET time traces FRET(t) i were binned to 1D histograms h i (FRET) using a binning interval of 0.01 FRET units, yielding m individual FRET bins. Subsequently, a normalized cumulated FRET histogram was created for all smFRET data recorded under identical imaging conditions: , i~1,2,:::,n and j~1,2,:::,m: While individual time traces may be inconclusive in some cases depending on the observation time, the conformational interconversion kinetics, the SNR and the complexity of the system, distinct FRET distributions will develop in the cumulated FRET histogram if discrete conformational species are present and resolvable [3]. The relative occurrence of these states was then quantified by thresholding or multiple Gaussian fitting (Eqs. (13) and (S5)). In threshold-based analysis, the occurrence is quantified by the integral over the area of the cumulated FRET histogram that is assigned to one conformation. For this purpose, the integration limits are defined as 2', th 1 , …, th n , +', where th refers to a threshold. Without a loss of generality, we defined the threshold value to distinguish two FRET distributions A and B as (FRET A +FRET B )/2, which corresponds to the midpoint between their centers FRET A and FRET B .
Characterization of the thermodynamic equilibrium was also performed using dwell times. The underlying principle is that the time the molecules spend in different discrete states can be directly used to infer the position of the conformational equilibrium. For d3'EBS1*/IBS1*, the docked fraction was used to calculate the association constants K a as described in the Supplementary Information S1 (Eqs. (S1) and (S2)). The approaches used to determine dwell times and subsequent processing steps are outlined in the next section.

Characterization of Kinetics
Dwell times were determined from individual time traces FRET(t) i via thresholding at (FRET A +FRET B )/2 or using the freely available software vbFRET [25]. In short, vbFRET employs a maximum evidence (ME) approach for model selection (the number of FRET states L), followed by inferring the model parameters (FRET values and transitions) by a combination of variational Bayesian expectation maximization and hidden Markov modeling (HMM) [60]. As their duration was unknown, the first and the last dwell time of each time trace were consistently discarded. Additionally, a weighted k-means algorithm was applied to transition density plots (TDP) created from the vbFRET data to cluster the coordinates (FRET before transition ; FRET after transition ) into k subgroups and assigned each transition to one of the k centers (, FRET before transition . k ,,FRET after transition . k ). The principle of k means clustering is illustrated in Figure S1 and is a well-precedented approach to cluster data that has been applied to heterogeneous HMM data previously [61,62].
For single-exponential state-to-state transitions occurring in a stochastic manner with rate constants that do not vary over time, k subgroups in the TDP correspond to L FRET states with k = L 2 2 L. The corresponding dwell times are in this case exponentially distributed [26]. Consequently, dwell times were binned to histograms that then were used to calculate the normalized cumulative probability distributions 1{normalized cumP, which were in turn fitted to exponential decay functions to extract the corresponding rate constants [11][12][13]. Here, single-and stretched exponential decays were used to approximate simulated and experimental data [14,34,63]: where O denotes the number of exponential decays (singleexponential: O = 1), a p is the amplitude, and t p the average dwell time in the conformational state (decay constant). The decay time t 1/e refers to the time required for 1{normalized cumP to drop to 1/e of its initial value and the stretching exponent b 0vbƒ1 ð Þ is a means to quantify the width of the rates distribution [64]. Both t p and t 1/e were used to determine the rate constants associated with conformational changes as described in the Supplementary Information S1 (Eqs. (S3) and (S4)).

Bootstrapping in Thermodynamic and Kinetic Analysis of smFRET Data
Following the conventions in the field, the variability of the data vector is assumed to be due to limited observation time, experimental noise, instrumental aberrations (heterogeneous broadening, vide supra), and irresolvable molecular motion [8,25]. Bootstrapping allows to characterize the data space of an ensemble of smFRET time traces, and thus, to quantify cross-sample variability and allowing for its application in statistical hypothesis testing.
Bootstrap samples were built for a multi-sample problem given by a random sample of n smFRET time traces, each of which is composed of a discrete number of time bins B {FRET(t) 1 , FRET(t) 2 , …, FRET(t) n }, observed from a completely unspecified probability distribution F according to Efron [44]. The ensemble of time trajectories were used to create the corresponding single molecule FRET histograms {h 1 (FRET), h 2 (FRET), …, h n (FRET)}.
Resampling was then performed with replacement, where each single-molecule time trace FRET(t) i has a probability of.
where N was set to n to prevent pseudoreplication [45]. It should be emphasized, that in the case of an equal length of the time traces (constant observation time, no photobleaching etc.) the probability simplifies to 1/n, i.e. each time trace and its corresponding histogram has the same probability of being selected (molecular weighting). Normalized cumulated FRET histograms of the bootstrap-based ensemble were calculated as: using a Monte Carlo method to approximate the bootstrap distribution with a random sample of the size N, the creation of bootstrap samples was repeated M times, yielding an independent random ensemble of bootstrap time traces FRET(t) 1 boba ,FRET(t) 2 boba , . . . ,FRET(t) M boba , as well as the corresponding histograms The bootstrap mean X X boba and the corresponding standard deviation s boba were estimated according to [50]: Here, X denotes the random parameter whose variability is to be estimated, for example the relative occurrence of a certain FRET population A l given by a certain state l in the thermodynamic analysis.
The bootstrap distribution of X boba~X FRET(t) boba ,F F À Á , depends on both the random sample FRET(t) boba and the sample probability distributionF F . X boba is expected to approximate the real underlying distribution X FRET(t),F ð Þwell, including its mean and standard deviation. In this study, we chose M = 100, following the conventions from other fields [45], because a timeconsuming increase of M would yield only moderate improvements ( Figure S2) [44]. It is important to emphasize that the noiseinduced fluctuation around discrete values in smFRET time traces is entirely time-independent (stochastic). This is not always the case for time series, which would then require more sophisticated mathematical treatments ( Figure S3) [44,65].

Bootstrapping and Regression (Method 1)
To estimate the bootstrap mean X X boba and the standard deviation s boba of the parameter X, we defined a reasonably general non-linear regression model: where g denotes a model function of the unknown parameter vector a approximating the data vector y (outcome variable) depending on x (input variable), both of which display the length m. The corresponding residuals e j follow the unspecific probability distribution e j , F. We fitted y based on a non-linear least square regression to estimate a [66]: which yields the sampling distribution ofâ a. Subsequently, bootstrap samples were generated according to Eqs. (5)- (7) and are henceforth referred to as y boba using the terminology of Eq. (9): Regression based on a non-linear least square criterion was performed in an analogous manner as in Eq. (10): Applying this procedure on M independent bootstrap samples yielded a random sampleâ a 1 boba ,â a 2 boba , . . . ,â a M boba that was used to estimate X X boba and s boba . These values were later used for analysis of variance (ANOVA) [66].
The non-linear regression model was then applied to the normalized cumulated 1D FRET histograms h h(FRET) to quantify the variability associated with the determination of thermodynamic parameters. According to the conventions of the field, different conformational states were quantified by multiple Gaussian fitting: where L denotes the number of states that was in our case determined beforehand using a maximum evidence approach (vide supra), even though other model selection approaches are conceivable [8,25]. A l refers to the respective amplitudes, b l to the center values, and s l to the width of the distribution. The ensemble of model parameters constitute the parameter vector a(A l ,b l ,s l ). The resulting regression model Eq. (11) for each bootstrap sample is defined as and according to the non-linear least square fitting procedure described in Eq. (12) we obtained the representationâ a k boba (A l ,b l ,s l ) of the sampling distributionâ a boba .
Second, we applied the bootstrap-based regression on 1normalized cumP distributions to quantify the variability associated with the analysis of kinetics (vide supra, ''characterization of kinetics''). The appropriate model function based on Eq. (3) was obtained through the maximum evidence algorithm, which samples the model space as well as the parameter space to find the most evident model and yields the number of components O [25]: .
Thus, the regression model Eq. (11) for each bootstrap sample was defined as: and a a boba : min Thus, we obtained the representationâ a k boba (a p ,t p ) of the sampling distributionâ a boba . Considerations regarding method 1 are summarized in Figure 3.

Bootstrapping and Averaging (Method 2)
The bootstrapping formalism described above was also applied in the analysis of the thermodynamic equilibrium using dwell times obtained by threshold-or HMM-based analysis of smFRET time traces. Here, each time trace FRET(t) i is composed of a number of m dwell times t i,j,l in a discrete state l. As a consequence, each bootstrap sample FRET(t) boba yields an average dwell time in a certain state.
where i = 1, 2, …, N accounts for the time traces of N different molecules and j = 1, 2, …, m for the dwell times in the state l.
Again, applying this procedure on M independent bootstrap samples yielded a random sampleâ a 1 boba ,â a 2 boba , . . . ,â a M boba that was used to estimate X X boba and s boba of the thermodynamic parameters. Here, we determined the relative occurrence of each state, as well as the equilibrium constant K eq or, in the special case of an intermolecular association of the type A+B « AB, the binding constant K a (Eqs. (S1) and (S2)). Considerations regarding method 2 are summarized in Figure 3.

Bootstrapping and Integration (Method 3)
Finally, we applied bootstrapping on normalized cumulated FRET histograms h h(FRET) in conjunction with thresholding.
Here, each bootstrap sample h h(FRET) boba yielded a threshold value (FRET boba, A +FRET boba, B )/2 which was used to quantify the relative occurrence of each FRET state as explained before. In an analogous manner, applying this procedure on M independent bootstrap samples allowed us to estimate X X boba and s boba of the relative occurrence of the FRET states. These values were later used for analysis of variance (ANOVA) [66]. Considerations regarding method 3 are summarized in Figure 3.
Resampling and fitting was done with the software package BOBA FRET that is freely available via http://www.aci.uzh.ch/ rna/. Please refer to the Supplementary Information (Figures S5  and S6) for an outline of the BOBA FRET user interface and the built-in routines for the analysis of thermodynamic and kinetic data.

Robustness of the Software and Simulated Data
The robustness of the algorithm and its compatibility with common approaches used for thermodynamic and kinetic analysis was assessed using a simple intramolecular two-state system. Normally distributed noise was added to simulated time traces that were varied in length, separation of the FRET populations, ratio of the rate constants associated with conformational interconversion, and SNR ( Figure S4).
Thermodynamic characterization of simulated smFRET data. The relative population of FRET states was quantified using four commonly used approaches: Gaussian fitting of normalized cumulated FRET histograms (method 1), the ratio of dwell times obtained by either thresholding or HMM (both method 2) [25], and fractional integration after thresholding of normalized cumulated FRET histograms (method 3), respectively. Figure 4A demonstrates that the estimation of the docked fraction becomes more accurate at longer observation times. At the same time, the bootstrap-estimated error scales inversely to the length of time traces. This is expected, as longer time traces yield more data points. Dwell-time-based methods perform poorly at short observation times, because the data before the first transition and preceding the last one are discarded. Importantly, the bootstrapped variability faithfully covers the theoretical values. Figure 4B shows the influence of FRET spacing (DFRET) on X X boba and s boba . In general, threshold-based approaches lead to a systematic downward shift of the estimated mean and estimations of cross-sample variability that do not cover the predicted values at low DFRET values. Similarly, HMM does not reliably distinguish the docked from the undocked state at DFRET ,0.1. In turn, Gaussian fitting provides good estimations of the docked fraction, albeit s boba is considerably more pronounced than for other methods at low DFRET values. The same trend is observed with decreasing SNR ( Figure 4C). As DFRET and SNR diminish, the two FRET distributions get closer, becoming indistinguishable in extreme cases ( Figure S4), explaining the bad performance of thresholding and why this approach should not be employed under these circumstances (Figure 1). HMM sets somewhat lower standards to the separation of the FRET distributions, though, it erroneously suggests equal population of both FRET states once it breaks down. Finally, even though the results of the Gaussian fits are biased by large error bars when the Rayleigh criterion is not fulfilled, the means are in excellent agreement with the theoretical values. Figure 4D illustrates how the mean docked fraction and the cross-sample variability depend on the ratio of rate constants. Here, only the docking constant k docking is increased, while k undocking is kept constant at 0.005 s -1 , leading to a decreased average number of FRET transitions per time trace. As Gaussian fitting does not rely on faithful determination of dwell times, it provides an excellent estimation of the mean docked fraction and low cross-sample variability. In turn, threshold-based histogram analysis, HMM, and in particular threshold-based dwell time analysis underestimate the docked fraction when the thermodynamic equilibrium favors one conformation. Careful analysis of the FRET distributions from simulated smFRET time traces revealed that at SNR 3.5, the noise exceeds the threshold at times, explaining issues associated with thresholding. This is particularly problematic in the case of threshold-based dwell time analysis, as the ratio of false and true transitions then becomes highly unfavorable. In turn, when a conformational state is very scarcely populated, the mean dwell time becomes shorter than the time resolution and HMM fails to identify two FRET populations. Figure 4E depicts the variation of X X boba and s boba depending on the width of a SNR distribution, i.e. assuming intermolecular heterogeneity with regard to SNR within one dataset. For this purpose, SNR was assumed to be normally distributed around 3.5 and the width of the Gaussian distribution was varied between 0 (no heterogeneity) and 4 (strong heterogeneity). Analysis of FRET histograms and threshold-based dwell time analysis systematically under-estimate the mean bound fraction by 3-5%, which is due to the overlap between the two FRET states (vide supra). In turn, HMM-based dwell time analysis yields mean values and crosssample variabilities that closely approach/cover the theoretical value in the case of narrow SNR distributions. However, as more low SNR time traces are included in the analysis, HMM perform increasingly poorly (vide supra). Interestingly, regardless of the method chosen for analysis, the estimation of the cross-sample variability remains mostly unaffected by a change in the width of the SNR distribution.
Kinetic characterization of simulated smFRET data. When smFRET time traces display ''discrete hops'', i.e. consist of piecewise constant signal, rate constants can be extracted from dwell time histograms (Figure 1) [21,26]. Here, bootstrapping is applied to dwell times obtained by thresholding and HMM, followed by fitting the experimental data to a single-exponential decay model (both method 1, Eq. (3), O = 1) [25]. Figure 5A demonstrates that cross-sample variability strongly decreases when the observation time is increased from 50 s to 5000 s. Again, this is not surprising, as the average number of dwell times per time trace is expected to be proportional to the observation time, which leads to a more homogeneous behavior between individual time traces. However, thresholding systematically underestimates the mean decay constant associated with docking and undocking, an issue that noise is frequently mistaken as a FRET transition at SNR = 3.5 (vide supra). This problem persists in HMM-based analysis, though, the algorithm proves more robust than thresholding. The dependence of X X boba and s boba on the number of dwell times is also depicted in Figure 5B, which shows the influence of the ratio of rate constants on the outcome of the dwell time analysis and the value of the bootstrapped error. Here, two effects lead to an underestimation of the decay constants: (i) Due to the (slight) overlap between the two FRET distributions at SNR = 3.5 there are short false dwell times stemming from noise (vide supra). As the ratio of k docking and k undocking increases the average number of true dwell times per time trace decreases, while the average number of noise-induced transitions is constant. Consequently, dwell times determined from individual time traces become more homogeneous as well, since false transitions are more and more emphasized. (ii) When the thermodynamic equilibrium strongly favors the docked state, brief excursions to the undocked state become irresolvable (vide supra). Figure 5C and Figure 5D show the influence of overlapping FRET distributions on the outcome of the kinetic analysis. In general, threshold-based analysis is strongly biased as the two FRET distributions display increasing overlap. As DFRET and SNR diminish, the two FRET distributions display increasing overlap and the thresholding algorithm erroneously responds to noise, explaining its bad performance. Furthermore, cross-sample variability decreases, as each time trace (erroneously) yields a very high number of dwell times. The behavior of the HMM algorithm is not as easily explained: When the two FRET distributions display very strong overlap, HMM-based analysis yields approximately equal estimations for both for docking and undocking decay constants. Under such sub-Rayleigh conditions, the FRET distributions are essentially indistinguishable and the HMM algorithm (which assumes Gaussian noise) will approximate each time trace with two equally populated Gaussian distributions. As the two FRET distributions become more distinct, HMM tends to considerably overestimate the decay constants, which can be explained by (i) less artefactual transitions, and (ii) more real transitions. At the same time, however, not all transitions are identified, generally yielding an overestimation of the time a molecule dwells in the docked or undocked state. Further improvement of the data quality finally leads to a correct estimation at DFRET .0.3 and SNR .3.5 and at DFRET .0.4 and SNR .2.5. Importantly, throughout these simulations, the bootstrapped standard deviation is not significantly affected. In conclusion, HMM turns out to be more robust than thresholding in response to increasing overlap, an observation that is in excellent agreement with earlier reports [8]. Figure 5E shows how a variation of the SNR within the same dataset affects the estimation of X X boba and s boba . In general the influence of a change in SNR distribution width on both estimators is negligible. Threshold-based analysis consistently under-estimates the values of the decay constants, which stems from the fact that the default signal-to-noise ratio of 3.5 leads to a considerable number of erroneously identified dwell times as described above. In turn, the results of the HMM-based analysis are in good agreements with the theoretical prediction.
Taken together, these simulations illustrate the importance of selecting the correct method to analyze FRET time traces, as the bootstrapping algorithm cannot make up for ill-defined input values. However, when an appropriate approach is chosen, the bootstrapped cross-sample variability generally covers the theoretically predicted mean. Future work is anticipated to develop objective criteria to accept/reject a given model for thermodynamic and kinetic analysis of time-binned smFRET data presented herein.

Application of the Algorithm to Experimental Data
Time-binned smFRET data have been recorded and analyzed from numerous biological systems varying in size and complexity. Here, we studied an important element derived from the 5' splice site recognition complex of the yeast group II intron Sc.ai5c, the sequence pair d3'EBS1*/IBS1* [51,53]. As depicted in Figure 2, Cy3-d3'EBS1* strands were tethered to the surface of a quartz slide passivated with biotinylated BSA, while Cy5-IBS1* molecules were free in solution. Hence, docking/undocking dynamics could be followed via FRET over several minutes and in the presence of different divalent metal ions, as splice site formation has previously been proposed to depend on the action of divalent metal ions [67]. FRET-typical anticorrelated changes in Cy3 and Cy5 emission intensity were observed in all cases, followed by calculating the FRET over time (Eq. (1)), which varied between zero (undocked) and a high FRET value (docked) for all dynamic molecules observed ( Figure 6A). The fraction of statically undocked molecules, i.e. molecules that displays only donor emission during the time of observation, was 60% in the absence of M 2+ and 20% in the presence of Ni 2+ or Co 2+ . This fraction of molecules either displays a low association constant K A that cannot be correctly resolved during the observation time and/or they correspond to a photophysical artifact, for example a docked IBS1* molecule with a non-emissive acceptor [42]. In fact, 15-55% of the total population is usually ''donor only'' in smFRET studies using the FRET pair Cy3 and Cy5, which has been attributed to Cy5 prebleaching [68]. As a consequence, these molecules were excluded from further analysis.
Divalent metal ions have a significant effect on the thermodynamic equilibrium. Bootstrapping was performed in conjunction with Gaussian fitting (method 1) and thresholding (method 3) of normalized cumulated FRET histograms ( Figure 6B, C). The thermodynamic equilibrium was also characterized using dwell times obtained by HMM (method 2) [25]. Threshold-based analysis reveals weak inter-oligonucleotide interaction in the absence of divalent metal ions (docked fraction: 7.862.6%, errors correspond to 3s boba unless specified differently, Figure 6C and Table 1). Addition of 1 mM Ni 2+ shifts the equilibrium slightly (docked fraction: 16.363.3%), while an average of 25.566.0% of all d3'EBS1* molecules are docked to IBS1* at 1 mM Co 2+ . One-way analysis of variance (ANOVA) using bootstrapped values was performed to test the hypothesis that divalent metal ions affect or do not affect (null hypothesis) the thermodynamic equilibrium [66]. As illustrated in Figure 7A, an ANOVA makes the assumption that experimental values are normally distributed around the sample mean and its outcome (Pvalue) depends on the overlap integral between different distributions, which in turn depends on the separation of group means and the widths of the sample distributions. P-values constitute a strength of evidence against the null hypothesis and are typically compared to arbitrary values (0.05, 0.01 and 0.001) according to the conventions of the field [66]. The presence of divalent metal ions not only significantly promotes the interaction of the two oligonucleotides (P,0.001), the effect also differs significantly between Ni 2+ and Co 2+ (P,0.001), the latter being much more effective in increasing the docked fraction ( Figure 7A). Similar results were obtained by fitting the averaged 1D histograms to two Gaussian distributions (Table 1), though the bootstrap-estimated errors are generally higher ( Figure 6C). However, this did not strongly influence the significance of the effect (P,0.001, data not shown). Thermodynamic analysis using dwell times (method 2) leads to a systematic shift of the mean docked fraction towards higher values and an increase of s boba (Table 1). Nevertheless, the results of all methods are generally in good agreement (Table 1).
Taken together, the results of histogram and dwell time analysis are in good agreement and demonstrate the significant role of low concentrations of divalent metal ions in shifting the thermodynamic equilibrium of d3'EBS1* and IBS1*. However, a systematic upward shift of the estimation of the docked fraction is observed that is most pronounced in the absence of divalent metal ions. These findings demonstrate that the dwell time approach has to be employed with care, especially when the biomolecule is poorly dynamic (60% of statically undocked molecules in the absence of M 2+ , vide supra) and/or the number of dwell times is rather low, two problems that are often linked. Indeed, the average number of dwell times per time trace was less than 4, which contrast the average value of the simulations carried out using standard parameters (114, vide supra). As the first and the last dwell time were not considered (vide supra), (i) much information was lost leading to an increase in the bootstrapped error and (ii) the occurrence of the more populated undocked state is underestimated translating into higher values of the docked fraction. Bias of dwell-time based approaches in the case of low numbers of dwell times can also be seen in the simulations ( Figure 5A).
Divalent metal ions significantly alter d3'EBS1*/IBS1* interaction kinetics. d3'EBS1*/IBS1* dissociation has previously been shown to display considerable kinetic heterogeneity in the presence of divalent metal ions [33,69,70]. As a consequence, a stretched exponential decay (Eq. (4)) was fitted to dwell times in the high FRET state, while a single-exponential decay (Eq. (3), O = 1) was used to approximate the association kinetics. Dwell times were determined from individual time traces using thresholding and HMM, followed by clustering of transition density plots using a weighted k-means algorithm (Figures 8A and S1) [5,25]. Then, cumulative probability plots cumP were created from dwell times, followed by fitting 1{normalized cumP plots to exponential HMM-based data on interstrand association is well described by single-exponential fit in the absence of divalent metal ions (Eq. (4), b = 0.99 data not shown) and the process was found to occur very slowly (t docking = 76.7622.2 s). Both the presence of Ni 2+ and Co 2+ accelerates this reaction, albeit to different extents (t docking = 28.466.1 s and 31.667.3 s). These metal-ion-specific effects are highly significant as shown by one-way ANOVA (P,0.001, Figure 7B). Importantly, the presence of divalent metal ions also induces slight broadening of the distribution of observed association rates (b(Ni 2+ ) = 0.95, b(Co 2+ ) = 0.95, data not shown), though the experimental data could nonetheless be satisfactorily approximated with the single-exponential fit (adjusted R 2 .0.98 in all cases). d3'EBS1*/IBS1* dissociation is fast in the absence of divalent cations (t undocking, 1/e = 7.061.9 s). Co 2+ significantly slows down the dissociation rate (t undocking, 1/e = 10.062.7 s, P,0.001), while the presence of Ni 2+ does not induce any variation in the decay constant (t undocking, 1/e = 7.061.4 s, Figure 7C). In agreement with previous observations, the distributions of decay constants are severely broadened (b ,0.9 in all cases), underscoring the kinetic heterogeneity of the undocking process. The results of the threshold-based analysis are generally in excellent agreement with the values obtained from fitting HMM-derived dwell times. However, the decay constant associated with docking in the absence of divalent metal ions display a difference of 70%. All results are summarized in Table 2.
These findings suggest that the presence of divalent metal ions broadens the distribution of rate constants associated with d3'EBS1*/IBS1* interaction. Based on the NMR structure and metal ion titration studies of the d3'EBS1* hairpin in the absence and presence of IBS1*, this effect has been assigned to heterogeneous occupation of metal ion binding sites along the  RNA [70]. Such kinetic heterogeneity is beyond the scope of conventional kinetics and has frequently been observed in singlemolecule experiments [33,43]. In the context of this paper, kinetic heterogeneity contrasts the basic assumption made in first-order HMM, i.e. that state-to-state transitions are governed by singleexponential kinetics. The ability to assign one FRET level to multiple Markov transition rates is therefore important, an important feature that is implemented in some HMM software packages (vbFRET, CSSR) but not others (HaMMy) [8,25,27]. Fitting exponential decay models to bootstrapped dwell time histograms also permitted to show that changes in both association and dissociation kinetics are highly significant. Taken together, Figure 8. Kinetic analysis of smFRET data (method 1). Docking is defined as the state transition from the undocked to the docked state, the undocking process is defined as the inverse reaction. (A) Transition density plots of HMM data show two clusters corresponding to the docking and the undocking reaction, respectively. According to the maximum evidence approach employed in vbFRET [25], a two-state system is therefore most likely to produce the experimental data, which is in agreement with the experimental design. Raw data were grouped via the weighted k-means clustering algorithm. Color code: occurrence in counts. (B-D) Dwell time histograms created from the normalized cumulative occurrence of dwell times in the docked and the undocked state as determined by HMM. The green lines correspond to a single-exponential fit to the experimental data, while the red lines represent a stretched exponential decay. Errors are indicated as a swath and correspond to 3s boba associated with the decay constants. doi:10.1371/journal.pone.0084157.g008 Table 2. Kinetic analysis of d3'EBS1*/IBS1* association and dissociation using different methods to extract dwell times.  Ni 2+ shifts the thermodynamic equilibrium chiefly by promoting the association rate, while Co 2+ plays a two-fold role as an accelerator of docking and as an inhibitor of dissociation, probably by mediating specific contacts between the two RNA fragments. This difference is surprising, as both metal ions share very similar ionic radii (Ni 2+ : 0.83 Å , Co 2+ : 0.79 Å ) and have the same preferred coordination geometry (octahedral, 6 ligands) [71]. Fits of threshold-and HMM-based dwell time data were generally in good agreement, except for docking in the absence of divalent cations. Careful analysis of HMM data revealed that brief excursions to the docked state were not always identified as such, especially when very few and short binding event occurred in the time trace (data not shown). Instead, the zero FRET distribution was erroneously identified as two distinct states. This observation contradicts the simulations and is most likely due to the fact that noise in experimental time traces does not always follow a stochastic Gaussian model ( Figure S3). These findings suggests that HMM approaches are not always the best choice for analyzing smFRET data, in particular when one conformation largely dominates the structural equilibrium and the occurrence of other structures may be erroneously deemed statistically insignificant by the HMM algorithm and non-Gaussian noise is fitted instead. As binding events became more frequent and/or long-lasting, HMM and thresholding were found to be in very good agreement.

Summary
Single-molecule FRET has led to valuable work on mechanistic and structural aspects of numerous biological processes and has blossomed in recent years. However, the observation time of single fluorophore emission is rather limited, as dyes typically photobleach upon emission of 10 6-10 7 photons (unpublished data involving Cy3 and Cy5 emission in the presence of an enzymatic oxygen scavenging system and 1 mM Trolox) [72]. Furthermore, the detected signal, intrinsically weak in intensity, is further broadened by various sources of additive noise and technical issues. As a consequence, single molecules typically display considerable cross-sample variability and can then not be treated as biological replicates in thermodynamic and kinetic analyses, i.e. rate and association constants cannot be inferred from individual smFRET time traces. In such cases, smFRET relies on the principle of ergodicity, according to which the properties of ensembles involving billions of molecules be described by combining a number of single molecules that is lower by several orders of magnitude [73]. Analogously, bootstrapping computes the distribution of the whole population, including measures of variance, from a sample distribution of the size n [51].
Herschlag and co-workers have recently recognized the need for statistical rigor in smFRET experiments and implemented an HMM algorithm that assigned confidence intervals to rate constants inferred from individual time traces [26,74]. Thus, one can investigate whether kinetically distinct subspecies exist within the sample, a long-standing topic of debate in the field of singlemolecule spectroscopy [33,43]. However, this approach sets very high standards to the data, as the confidence interval scales inversely to the number of transitions in the FRET time trace, and simulated time traces in the original article were composed of up to 5'000 dwell times [26]. Given the technical constraints outlined above, these values may be difficult to reach experimentally. Here, we have combined bootstrapping with different approaches commonly used in thermodynamic and kinetic analysis of smFRET data in order to estimate the variability associated with the mean values. By performing statistical hypothesis testing using generalized analysis of variance (ANOVA), we could show that divalent metal ions have a statistically significant effect on both thermodynamics and kinetics of d3'EBS1*/IBS1* interaction, a pair of RNA sequences involved in group II intron splice site recognition. Importantly, the fact that time traces were on average composed of only 4-6 dwell times was not problematic, since the overall data was treated as an ensemble according to the principle of ergodicity. We therefore believe that this approach is widely applicable and it is expected to make biological interpretations in smFRET experiments more robust when it is combined with statistical testing. Finally, it should be mentioned that the method described herein is not limited to time-binned smFRET data. We anticipate its implementation to analyze time traces stemming from single photon detection. A further potential application is the characterization of conformation and orientation dependent fluorophore photophysics (blinking, spectral and spatial diffusion) [75][76][77].
BOBA FRET was developed under Matlab version 8.20.701, license 49040 (Mathworks, Nattick, MA) and is available at http://www.aci.uzh.ch/rna/. Some of the data presented herein are provided for download as well. Figure S1 k-means clustering to assign dwell times to consistent FRET values for further processing steps. (A) Transition density plot (TDP) built from a set of HMM-discretized FRET time traces. The data points are iteratively assigned to one of the two centers according to their distance. The center coordinates are then recalculated according to the distances and occurrences (weights) of the clustered data point. The weighted kmean centers are assumed to be definitive when the set of clustered transition does not change after an additional round of iteration.  7) and (8) in the main text). (C) Setting the starting values and boundaries of the exponential decay function to be used for fitting. Mono-, bi-, tri-, and tetraexponential decays functions are implemented, as well as stretched exponential decays (Eqs. (3) and (4)