Guidelines for the Fitting of Anomalous Diffusion Mean Square Displacement Graphs from Single Particle Tracking Experiments

Single particle tracking is an essential tool in the study of complex systems and biophysics and it is commonly analyzed by the time-averaged mean square displacement (MSD) of the diffusive trajectories. However, past work has shown that MSDs are susceptible to significant errors and biases, preventing the comparison and assessment of experimental studies. Here, we attempt to extract practical guidelines for the estimation of anomalous time averaged MSDs through the simulation of multiple scenarios with fractional Brownian motion as a representative of a large class of fractional ergodic processes. We extract the precision and accuracy of the fitted MSD for various anomalous exponents and measurement errors with respect to measurement length and maximum time lags. Based on the calculated precision maps, we present guidelines to improve accuracy in single particle studies. Importantly, we find that in some experimental conditions, the time averaged MSD should not be used as an estimator.

Since cellular environments are complex microscopic systems with a strong thermal component [17], the motion of single particles, even if directed, incorporates a random diffusive component, which must be characterized in order to build a physical picture of the system [18][19][20][21][22][23][24]. A common tool by which the diffusion of a single particle is classified is the time averaged mean square displacement (TAMSD) [14][15][16][17][25][26][27][28][29][30][31]: defined here for a trajectory x(t) of length L, taken at sampling time-intervals δ and the averaging window is τ = nδ. For normal diffusion (not necessarily Brownian or Gaussian [32]) the MSD is linear in time d 2 ðtÞ ¼ D 1 t, where D 1 is the generalized diffusion coefficient which includes all constant prefactors, depending on the diffusion mechanism. The TAMSD may be of any functional form, but in many cases it is a power law function over long times, d 2 ðtÞ ¼ D a t a [33,34]. The anomalous exponent α is related to fundamental characteristics of the stochastic process, such as temporal correlations and the distribution of particle steps and it is necessary for predicting the future particle motion, first passage times and more [35].
There are various classes of anomalous diffusion and they all result from the breaking of the assumptions behind normal Brownian diffusion, see [36] for a recent review. Continuous time random walks (CTRW) which have long tailed jump distributions or waiting times between jumps exhibit weak ergodicity breaking of a normal TAMSD. Variation in the surrounding space may lead, among other models, to heterogeneous diffusion processes (HDP) and obstructed diffusion, both with unique characteristics. If the stochastic process is not Markovian and there is a temporal correlation between steps, another class of anomalous diffusion is exhibited. Fractional Brownian motion (FBM) for example has self-similar Gaussian steps with a correlation that decays as a power law. A general description of processes with temporal step correlations can be obtained through the ARFIMA framework that generalizes fractional dynamics through a discrete generating process [37].
The TAMSD is normally fitted through the logarithm of eqn. 1 as a function of τ up to a maximal τ M , Fig. 1: Several studies have shown that the TAMSD is a problematic estimator [25,38,39]. The internal correlations between the averaged quantities merit the central limit theorem inapplicable and large variations are introduced with increasing τ. In addition, measurement errors lead to short time artifacts in the estimated TAMSD. For example, when a normally distributed measurement error (with variance σ 2 and zero mean) is introduced to ergodic anomalous diffusion measurements, the theoretical TAMSD is [40,41] For a discussion of the influence of various error mechanisms on the TAMSD of CTRW diffusion, see [42].
For normal diffusion various alternative and complementary techniques have been developed [39,43] that overcome these problems. In addition, anomalous diffusion can be efficiently estimated when an ensemble of trajectories is available [41]. However, when analyzing single trajectories of particles that exhibit anomalous diffusion, these techniques are inadequate and one is left with the direct estimation of the functional form of the TAMSD.
When analyzing experimental data, one has a limited trajectory length and for single particle trajectories, it is often shorter than 10 3 time points. This raises another fundamental problem in implementation of eqn. 2. Since the variance of the TAMSD increases with τ, taking large τ M reduces the accuracy of the estimation. However, since the data is limited, the MSD also fluctuates at small τ values. In addition, as seen above, measurement errors introduce an offset at small τ's. Thus one must find an optimal τ M that balances between the need to fit several τ's in eqn. 2 while avoiding the fluctuating nature of the TAMSD at large times. We stress that simply taking very small or large τ M values does not improve the estimation of α, as can be seen in Fig. 1.
To the best of our knowledge, there is no systematic study of the optimal τ M value for the estimation of the anomalous exponent in the presence of measurement errors. As a result, there are no standards or guidelines for fitting the TAMSD, which introduces difficulty in assessing the accuracy and precision of extracted values and comparison between studies. Furthermore, we show that the data analysis can be optimized by realizing the specific experimental conditions.
In what follows we study the performance of the TAMSD as an estimator for the anomalous exponent, depending on trajectory length, measurement error and the true anomalous exponent. This is done through the simulation of thousands of trajectories and fitting their individual TAMSDs. We study FBM diffusion, which we chose as an experimentally observed motion and a representative of the common class of ergodic anomalous diffusion [24]. We calculate the accuracy and precision of the TAMSD estimator as a function of the maximal fitted time lag, τ M , for different combinations of the diffusion parameters. The results allows us to identify an optimal M in each case and by taking all the extracted information together, we identify several guidelines, or best practices, for fitting of anomalous TAMSDs. We find that even a rough estimation of the measurement error and the expected regime of the anomalous exponent can greatly improve the accuracy of the extracted parameters.
Our approach can be applied to any process with a defined α that one can simulate in order to find the best estimation conditions, even if D α varies between trajectories such as in CTRW or HDP. Although we focus on the more difficult experimental case of short trajectories, our general guidelines apply also for longer trajectories.

Methods
Trajectories {x i (t)} were simulated using the MATLAB wfbm function [44] which is a common method for the simulation of fractional Brownian motion through a wavelet implementation, as proposed in [45]. In addition, we normalized the standard deviation of the increments for each trajectory to one, so that D α = 1. Notice that FBM has stationary Gaussian increments, so normalizing the standard deviation uniquely defines the stochastic process for a given α.
A series of independent normally distributed measurement errors {ε i (t)} with zero mean and standard deviation σ was added to each trajectory. Since all trajectories were normalized, the relative magnitude of the measurement error compared to {x i (t)} is set only by σ. Also, note that for any uncorrelated measurement noise distribution that has a defined second moment, the magnitude of σ is enough to characterize its effect on the TAMSD.
For each pair of α and σ we study trajectories of length L = 10 to 2000. For each trajectory we fit the TAMSD according to eqn. 2 for all possible τ M up to L/2. We then repeat the calculation of the TAMSD and its fitting for 1000 trajectories for each L and τ M . Thus for each (α, σ) pair we have a set of α i (L, τ M ) with i = 1, . . . , 1000 for every (L, τ M ) combination.
We are now faced with the problem of identifying what is a 'good' fitting regime. One approach is to characterize the distribution of P(α i ) for each (L, τ M ) pair in each (α, σ) mapping. Then, one can estimate the probability of the fitted value to fall in a certain range around the true anomalous exponent. However, this approach is problematic as P(α i ) is not necessarily normal. In fact, past studies have shown that the distribution of hδ 2 (τ)i is highly non Gaussian [46,47], leading to similar expectation for P(α i ). As a result, analytically estimating probabilities will demand the characterization of general distributions.
Thus we take a different, more applicable approach where for each (L, τ M ), we extract the fraction F (α, σ) (L, τ M ) of α i that are in the range [α − 0.1, α + 0.1]. We chose these limits since they provide reasonable accuracy in biophysical studies while maintaining reasonable F values for different (α, σ) maps. F is an intuitive parameter for the precision of the fitting, as higher values mean more precise fitting.
In some cases, one can extract multiple trajectories of the same stochastic process. This is for example the case in various simulation studies. Thus by averaging over fitted single particle α i 's, one may hope to converge with hα i i to α. We define the bias as B (α, σ) (L, τ M ) = hα i i (L, τ M ) − α. This bias is a measure of the accuracy of the MSD estimator. Fig. 2 shows a heat map of F with contour lines of the bias B for each measurement length L and τ M . As observed, it is easy to find the optimal τ M for fitting, i.e. optimal F and B conditions. For example, for a thousand time point trajectory in the weakly subdiffusive regime (α = 0.7) with σ = 0.5, we find a maximal F % 0.63 for τ M = 50. In addition, 0 < B < −0.1 gives reasonable results for averaged TAMSDs. However, if the trajectory is only 100 time points long, it is best to use τ M = 10, giving F % 0.36 and −0.1 < B < −0.2.

Results
We recommend the extraction of the optimal τ M for each experiment depending on the exact conditions. Table 1, however, gives a quick look-up table for optimal τ M for L = 100 and 1000 depending on α and σ and can be used to quickly analyze experimental data.
We now describe several trends in the maps of F and B. The two fundamental observations are that lower α or σ usually give better estimation results with the TAMSD. This is expected according to equation 3 in [41], which shows that the estimation error is less significant at smaller α and σ values. Beyond this first order behavior, however, F and B show a rich picture depending on α, σ and L. Small measurement error-when the experimental error is much lower than the average diffusion step, i.e. σ = 0.1, the small τ error of the TAMSD disappears, eqn. 3. In small τ's there is less overlap between squared displacements leading to lower variation of the TAMSD. Indeed, for almost all α maps with σ = 0.1, we found that the best τ M = 10, regardless of L (Fig. 2 left column). The one exception is for strongly subdiffusive motion, where τ M = 20 is needed for large L's.
In addition, a monotonous increase in optimal F is seen from a typical 0.4 when L % 50 to F ! 1 for L ! 10 3 . The typical bias, B, is also usually better than −0.05, except for highly superdiffusive motion where 0 > B > −0.05 only for L > 10 3 . Thus in the regime of weak experimental error, TAMSD fitting of the first few τ can give good estimation of anomalous exponents. It is important to notice that for α = 0.3, B is positive, unlike other α values.
Medium measurement error-In the case of σ = 0.5, i.e. when the typical step size is twice the measurement error, the optimal τ M changes with L, Fig. 2 middle column. With the exception of α = 0.3 we find that the best F is obtained when taking τ M at 10-20% of short trajectories and 4-7% of long trajectories (higher values are for higher expected α). The values of F are lower than in the low localization error regime by a typical 0.2. In addition, caution should be used when averaging short trajectories, L < 10 2 as bias can reach values worse than −0.2 for superdiffusive motion.
Interestingly, strong subdiffusive motion can be analyzed with τ M = 10 to give better results than when σ = 0.1. This is possibly due to the measurement error lowering the extracted  exponent and preventing high α values. Notice that if one uses the optimal τ M that was found for α > 0.3, excellent results are still received for the strong subdiffusion case. Large measurement error-When σ is the same size of the average particle step, accurate estimation of the anomalous exponent is hindered, Fig. 2 right column. For short trajectories, F values are approximately 0.2 and B −0.3. Thus if the measurement error is large, one should not estimate TAMSDs of short lengths L 300. Even for L % 10 3 we find values of F % 0.4 with biases that can approach −0.15.
In fact, for α = 0.7 it is better to sub sample an L = 2Á10 3 trajectory every 7 time points giving an effective trajectory of L = 285 and σ = 0.5. Analysing this shortened trajectory with τ M = 34 gives F = 0.49, compared to an optimal F = 0.42 obtainable from the original trajectory.
For strong subdiffusion, we find that best results are received when τ M is 20% of L. However, the bias is still significant with B % −0.1 for most conditions.

Discussion
After identifying the trends and pitfalls in F and B, we now discuss the best practices for anomalous exponent estimation with the TAMSD. It is clear that with more knowledge regarding the regime of the anomalous exponent and the measurement error, a better decision of τ M can be taken. We divide the recommendations into the following cases: a) perfect knowledge of σ and no necessary knowledge of α; b) approximate knowledge of both σ and α; and c) unknown σ. Finally we discuss the implications of having repeated realizations of the same process.
Case a: Perfect knowledge-If there is perfect knowledge of σ than a simple correction can be performed to bring the trajectory into the σ ! 0 regime. Simply, for the analyzed TAMSD one should fit b d 2 ðtÞ ¼ d 2 ðtÞ À s 2 to a power law. Even if there is no knowledge of the expected α, a limit of τ M = 10 when fitting b d 2 will give the best results. Notice that if α is known to be strongly subdiffusive, it may be beneficial to take a slightly larger τ M .
Case b: Approximate knowledge-When the magnitude of the measurement error is only approximately known, the correction performed in case (a) will leave some residual σ > 0. If b d 2 ðt ¼ 1Þ > s 2 > 0 we are in the regime of medium error. In such a case, knowledge regarding the expected α regime will help select the optimal τ M .
It is important to notice, that even if some measurement error is suspected but actually σ < < 0.5, the recommended τ M values will not lower the expected F. Rather, F will usually increase with reduction in σ even for sub optimal τ M . The benefit of knowing that σ < < 0.5 is that one can take even more efficient τ M values.
Case c: Unkown σ-If there is no estimation of the measurement error, extraction of the anomalous exponent can lead to significant errors. Specifically, for short trajectories (L 300), the possibility that σ ! 1 leads to an inability to estimate α, unless the process is strongly subdiffusive (i.e. α 0.3). Since it is possible that σ > 1, F may be even lower than in the cases studied in this work. Thus, if the magnitude of σ is unknown, it is advised not to perform estimation of trajectories unless L ! 10 3 , and only if it can be assumed that σ is not significantly larger than unity.
For this reason, we advise that in all TAMSD studies an estimation of the measurement inaccuracy be given. Without this estimation, or the clear statement of its lacking, it is impossible to assess the anomalous exponent results.
Multiple identical realizations-In some studies, it is possible to extract many trajectories of the same stochastic process, where the underlying α is identical or comes from a narrow distribution around an average value. In such cases, one may average over many instances of the process, and F becomes irrelevant.
However, as we have seen, in cases of high measurement inaccuracy, B is still significant for many L's. It is thus necessary to correct for the bias by adding an expected error factor to the extracted average exponent, hαi. Another option is to study the large τ behavior of the particle averaged TAMSD hd 2 ðtÞi, in the domain that is not affected by the measurement error or fit the particle averaged TAMSD directly to eqn. 3.
In biological and complex systems, this is usually not the case, and P(α) is widely distributed (a standard deviation of σ α = 0.2 is considered wide). If the distribution can be approximated by a normal distribution, than hd 2 ðtÞi can be analyzed with previous techniques [41]. Otherwise single particle analysis is needed and the typical bias, B(L, σ, α) should be added to all trajectories based on L and some estimation of the anomalous diffusion regime.
Special care should be taken when estimating weakly non ergodic processes as the diffusion coefficient varies between trajectories, thus changing the relative size of σ [36,48]. Since varying relative σ values leads to a varying bias in α, one will suffer a varying bias for each trajectory. Hence it may appear that α is distributed-in contradiction to the expectation for HDP and CTRW. Thus when characterizing weakly non ergodic processes with the TAMSD, one must strive to know the magnitude of the measurement error precisely.

Conclusions
We have studied the efficiency of the MSD technique in the estimation of the anomalous exponent depending on the various underlying parameters. The main picture that arises is that the TAMSD is not an efficient technique when looking at short trajectories, or superdiffusive processes with non ideal measurements. When analyzing measured trajectories it is important to estimate beforehand the measurement error and the expected regime of the anomalous exponent. Then one must choose the maximal time lag, τ M , based on the efficiency of the MSD estimator and not according to a visual fit to the MSD. Importantly, for some experimental scenarios the TAMSD is highly inaccurate and it should not be used.
According to our findings, when specifying extracted parameters of anomalous diffusion processes, it is important to describe the means by which the specific fitting regime was selected including the expected accuracy and precision. This will enable to compare different experiments and more objectively validate proposed theories.
Finally we encourage the development of new estimation techniques for anomalous diffusion single particle trajectories. Without the advancement of theses techniques, the study of accurate anomalous exponents in complex experimental systems will not be possible.