Quantitative Localization Microscopy: Effects of Photophysics and Labeling Stoichiometry

Quantification in localization microscopy with reversibly switchable fluorophores is severely hampered by the unknown number of switching cycles a fluorophore undergoes and the unknown stoichiometry of fluorophores on a marker such as an antibody. We overcome this problem by measuring the average number of localizations per fluorophore, or more generally per fluorescently labeled site, from the build-up of spatial image correlation during acquisition. To this end, we employ a model for the interplay between the statistics of activation, bleaching, and labeling stoichiometry. We validated our method using DNA oligomers labeled with a single fluorophore and neutravidin tetramers carrying multiple labels, for which we find a counting error of less than 17% without any calibration of transition rates. Furthermore, we demonstrated our quantification method on nanobody- and antibody-labeled biological specimens.


Three-state activation-bleaching model for single fluorophores
We consider switching between three states: the on-state, the off-state, and the bleached state. The switching between the on- and off-states is modelled with a Poisson distribution; i.e. in the absence of photobleaching the probability $P^{\mathrm{sw}}_m$ of $m$ transitions from the off-state to the on-state satisfies:
$$P^{\mathrm{sw}}_m = \frac{r^m}{m!}\, e^{-r},$$
where $r = k_{\mathrm{sw}} t$ with $t$ the time and $k_{\mathrm{sw}}$ the switching rate, which is related to the lifetimes of the on- and off-states by $1/k_{\mathrm{sw}} = \tau_{\mathrm{on}} + \tau_{\mathrm{off}}$. The bleaching is governed by a geometric distribution, namely the probability for bleaching at the $m$-th switching cycle is:
$$P^{\mathrm{bl}}_m = b\,(1-b)^{m-1},$$
where $b = k_{\mathrm{bl}}/k_{\mathrm{sw}}$ is the probability for bleaching during one cycle, with $k_{\mathrm{bl}}$ the effective bleaching rate. Intuitively, for short times the statistics will be close to the activation-dominated Poisson model, whereas for long times it will be close to the bleaching-dominated geometric distribution. For intermediate times $t$ the probability for $m$ activation cycles is the sum of two terms. The first is the product of the probability $P^{\mathrm{sw}}_m$ of having $m$ switching cycles and the probability $(1-b)^m$ that the emitter has not bleached in the $m$ switching cycles. The second term is the product of the probability $P^{\mathrm{bl}}_m$ of bleaching during the $m$-th switching cycle and the probability of having at least $m$ switching cycles. In mathematical terms (for $m \geq 1$):
$$P_m = P^{\mathrm{sw}}_m (1-b)^m + P^{\mathrm{bl}}_m \sum_{j=m}^{\infty} P^{\mathrm{sw}}_j .$$
For $m = 0$ bleaching does not play a role, so the probability is then given by the Poisson term only:
$$P_0 = P^{\mathrm{sw}}_0 = e^{-r}.$$
It may be verified that
$$\sum_{m=0}^{\infty} P_m = 1,$$
so that conservation of probability is satisfied. A rigorous derivation of these expressions for $P_m$ is presented in the next section. Interestingly, the probability distribution of the number of activation cycles $m$ is equivalent to the distribution of the minimum of two random variables $m_{\mathrm{Poisson}}$ and $m_{\mathrm{geometric}}$, where $m_{\mathrm{Poisson}}$ is Poisson distributed with expectation value $r$ and $m_{\mathrm{geometric}}$ follows a geometric distribution with expectation value $1/b$.
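The stated equivalence with the minimum of a Poisson and a geometric variable can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: the helper names `poisson_pmf`, `mixed_pmf`, and `min_pmf`, the truncation `m_max`, and the values of `r` and `b` are illustrative assumptions, and `b` is taken as $k_{\mathrm{bl}}/k_{\mathrm{sw}}$.

```python
import math

def poisson_pmf(m, r):
    """Poisson probability of m switching cycles for mean r = k_sw * t (r > 0)."""
    return math.exp(m * math.log(r) - r - math.lgamma(m + 1))

def mixed_pmf(m, r, b, m_max=100):
    """P_m of the activation-bleaching model; b is the bleaching probability per cycle."""
    if m == 0:
        return poisson_pmf(0, r)  # bleaching plays no role for m = 0
    # m cycles occurred and the emitter survived all of them
    not_bleached = poisson_pmf(m, r) * (1.0 - b) ** m
    # the emitter bleached in cycle m, and at least m cycles were available
    at_least_m = sum(poisson_pmf(j, r) for j in range(m, m_max))
    bleached_at_m = b * (1.0 - b) ** (m - 1) * at_least_m
    return not_bleached + bleached_at_m

def min_pmf(m, r, b, m_max=100):
    """P(min(X, Y) = m) with X ~ Poisson(r) and Y geometric on 1, 2, ..."""
    def survival(k):  # P(X >= k) * P(Y >= k)
        pois = sum(poisson_pmf(j, r) for j in range(k, m_max))
        geom = (1.0 - b) ** max(k - 1, 0)
        return pois * geom
    return survival(m) - survival(m + 1)

r, b = 3.0, 0.2  # illustrative values only
pmf = [mixed_pmf(m, r, b) for m in range(60)]
total = sum(pmf)                               # should be close to 1
mean = sum(m * p for m, p in enumerate(pmf))   # compare with (1 - exp(-r*b)) / b
```

The Poisson term is evaluated through `math.lgamma` so that large `m` does not overflow the factorial.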
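The moments of this probability distribution can be calculated from the moment generating function:
$$G(a) = \sum_{m=0}^{\infty} P_m\, e^{am} = 1 + \left(e^{a}-1\right)\frac{1-\exp\left[-r\left(1-(1-b)\,e^{a}\right)\right]}{1-(1-b)\,e^{a}} .$$
The moments follow from the derivatives of this function at $a = 0$:
$$M_n = \left.\frac{d^n G}{da^n}\right|_{a=0},$$
giving a correlation parameter $Q$ as:
$$Q = \frac{M_2 - M_1}{M_1}.$$
If we define the asymptotic value $M_\infty = \lim_{t\to\infty} M_1(t) = 1/b$, then the results for the average number of activations and for the correlation parameter $Q$ may be written as:
$$M_1(t) = M_\infty\left(1 - e^{-k_{\mathrm{bl}} t}\right), \qquad Q(t) = 2\left(M_\infty - 1\right)\left[1 - \frac{k_{\mathrm{bl}} t\, e^{-k_{\mathrm{bl}} t}}{1 - e^{-k_{\mathrm{bl}} t}}\right].$$
For $k_{\mathrm{bl}} t \ll 1$ we find:
$$M_1(t) \approx k_{\mathrm{sw}} t, \qquad Q(t) \approx \left(k_{\mathrm{sw}} - k_{\mathrm{bl}}\right) t,$$
which is consistent with Poisson statistics provided that $k_{\mathrm{bl}} \ll k_{\mathrm{sw}}$. For $k_{\mathrm{bl}} t \gg 1$ we find constant values:
$$M_1 = M_\infty, \qquad Q = 2\left(M_\infty - 1\right),$$
consistent with a geometric distribution with bleaching probability $1/M_\infty = k_{\mathrm{bl}}/k_{\mathrm{sw}}$ per activation cycle. These limiting cases fit with the a priori expectations.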
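The two limiting regimes can be illustrated with a short Python sketch. It assumes the closed forms $M_1(t) = M_\infty(1 - e^{-k_{\mathrm{bl}}t})$ and $Q(t) = 2(M_\infty - 1)\left[1 - k_{\mathrm{bl}}t\,e^{-k_{\mathrm{bl}}t}/(1 - e^{-k_{\mathrm{bl}}t})\right]$, which follow from taking $Q = (M_2 - M_1)/M_1$ for the minimum of a Poisson and a geometric variable. The function names and parameter values are hypothetical.

```python
import math

def mean_activations(t, k_bl, m_inf):
    """Average number of activations M_1(t) for t > 0 (assumed closed form)."""
    decay = -math.expm1(-k_bl * t)  # 1 - exp(-k_bl * t), accurate for small arguments
    return m_inf * decay

def q_parameter(t, k_bl, m_inf):
    """Correlation parameter Q(t) for a single emitter, t > 0 (assumed closed form)."""
    x = k_bl * t
    decay = -math.expm1(-x)
    return 2.0 * (m_inf - 1.0) * (1.0 - x * math.exp(-x) / decay)

k_bl, m_inf = 0.01, 10.0      # illustrative values; k_sw = m_inf * k_bl
k_sw = m_inf * k_bl

t_short, t_long = 1e-3, 1e5
q_short = q_parameter(t_short, k_bl, m_inf)   # close to (k_sw - k_bl) * t_short
q_long = q_parameter(t_long, k_bl, m_inf)     # close to 2 * (m_inf - 1)
m_short = mean_activations(t_short, k_bl, m_inf)  # close to k_sw * t_short
m_long = mean_activations(t_long, k_bl, m_inf)    # close to m_inf
```

Using `math.expm1` avoids loss of precision in $1 - e^{-k_{\mathrm{bl}}t}$ when $k_{\mathrm{bl}}t \ll 1$.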

Derivation of mixed Poisson-geometric probability distribution
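This section presents a derivation of the mixed Poisson-geometric probability distribution in Eq. 3. The treatment is based on a generalization of the asymmetric Random Telegraph Signal (RTS) model [1], which describes switching between two states. Here a third state is introduced, representing the bleached state, which can in principle be reached from both the on-state and the off-state of the emitter. The starting point is thus the three-state model with state 0 (off-state), state 1 (on-state), and state 2 (bleached state), with four transition rates $k_{01}$ (0 → 1), $k_{10}$ (1 → 0), $k_{02}$ (0 → 2), and $k_{12}$ (1 → 2). The total decay rate of state 0 is $k_0 = k_{01} + k_{02}$, and the total decay rate of state 1 is $k_1 = k_{10} + k_{12}$. The lifetimes of the on- and off-states are then $\tau_{\mathrm{on}} = 1/k_1$ and $\tau_{\mathrm{off}} = 1/k_0$, where generally $\tau_{\mathrm{off}} \gg \tau_{\mathrm{on}}$. Suppose the system starts out in state 0 at time $t = 0$. The probability that the emitter remains in state 0 and is never activated is:
$$g_0(t) = \exp(-k_0 t),$$
for $t \geq 0$. The probability that the system makes a single jump to state 1 in this time interval is:
$$g_1(t) = \int_0^t dt'\, k_{01}\, g_0(t')\, \exp\left(-k_1 (t - t')\right),$$
the probability that the system makes two jumps and returns to state 0 is:
$$g_2(t) = \int_0^t dt'\, k_{10}\, g_1(t')\, \exp\left(-k_0 (t - t')\right).$$
The probability that the emitter bleaches directly to state 2 is:
$$h_0(t) = \int_0^t dt'\, k_{02}\, g_0(t'),$$
and the probability that it bleaches to state 2 after one transition to state 1 is:
$$h_1(t) = \int_0^t dt'\, k_{12}\, g_1(t').$$
Clearly, these probabilities can be calculated by iteration. This is accomplished most easily by application of a Laplace transform:
$$\hat{g}_n(s) = \int_0^\infty dt\, g_n(t)\, \exp(-st).$$
If so desired, a transition to the Fourier domain can be made by the substitution $s \to \epsilon + i\omega$ and taking the limit $\epsilon \to 0$ after the inverse (Fourier) transform. We find that different relations hold for $n = 2m$ even and $n = 2m + 1$ odd:
$$\hat{g}_{2m}(s) = \frac{k_{10}}{k_0 + s}\, \hat{g}_{2m-1}(s), \qquad \hat{g}_{2m+1}(s) = \frac{k_{01}}{k_1 + s}\, \hat{g}_{2m}(s),$$
$$\hat{h}_{2m}(s) = \frac{k_{02}}{s}\, \hat{g}_{2m}(s), \qquad \hat{h}_{2m+1}(s) = \frac{k_{12}}{s}\, \hat{g}_{2m+1}(s).$$
Starting from $\hat{g}_0(s) = 1/(k_0 + s)$ this leads to the solutions:
$$\hat{g}_{2m}(s) = \frac{F(s)^m}{k_0 + s}, \qquad \hat{g}_{2m+1}(s) = \frac{k_{01}\, F(s)^m}{(k_0 + s)(k_1 + s)},$$
with:
$$F(s) = \frac{k_{01} k_{10}}{(k_0 + s)(k_1 + s)}.$$
The Laplace transform of the probability $q_m(t)$ that the molecule is activated $m$ times during the time interval $t$ now follows as:
$$\hat{q}_m(s) = \left(1 + \frac{k_{12}}{s}\right) \hat{g}_{2m-1}(s) + \left(1 + \frac{k_{02}}{s}\right) \hat{g}_{2m}(s) \quad (m \geq 1),$$
and:
$$\hat{q}_0(s) = \left(1 + \frac{k_{02}}{s}\right) \hat{g}_0(s).$$
It may be checked that the sum satisfies:
$$\sum_{m=0}^{\infty} \hat{q}_m(s) = \frac{1}{s},$$
giving that:
$$\sum_{m=0}^{\infty} q_m(t) = 1,$$
implying that conservation of probability applies.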
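Conservation of probability in the Laplace domain can be verified numerically. The sketch below is a hedged illustration rather than the authors' implementation. It assumes that the probability of exactly $m$ activations collects four contributions: the emitter sits in the on-state after its $m$-th activation, has returned to the off-state, or has bleached from either of these. It also assumes that each jump multiplies the transform by (rate)/(total decay rate $+\,s$). The function name `q_hat` and all rate values are hypothetical.

```python
def q_hat(m, s, k01, k10, k02, k12):
    """Laplace transform of the probability of exactly m activations (assumed form)."""
    k0 = k01 + k02                      # total decay rate of the off-state
    k1 = k10 + k12                      # total decay rate of the on-state
    cycle = k01 * k10 / ((k0 + s) * (k1 + s))   # one full off -> on -> off cycle
    if m == 0:
        g_off = 1.0 / (k0 + s)          # still in the initial off-state
        return (1.0 + k02 / s) * g_off  # plus bleached directly from it
    g_on = k01 * cycle ** (m - 1) / ((k0 + s) * (k1 + s))  # on-state after m-th activation
    g_off = cycle ** m / (k0 + s)                          # back in the off-state
    return (1.0 + k12 / s) * g_on + (1.0 + k02 / s) * g_off

# Illustrative rates only (arbitrary units), with tau_off >> tau_on.
rates = dict(k01=1.0, k10=100.0, k02=0.01, k12=5.0)
s = 0.5
total = sum(q_hat(m, s, **rates) for m in range(2000))  # expected to approach 1 / s
```

The geometric factor `cycle` is smaller than one for any positive `s`, so the truncated sum converges quickly.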
Only the subset of molecules that is activated at least once is accessible to analysis. It follows that we need the renormalized probability distribution:
$$P_m(t) = \frac{q_m(t)}{1 - q_0(\infty)},$$
for $m \geq 1$ and $P_0(t) = 0$. Here $q_0(t)$ can be found via an inverse Laplace transform:
$$q_0(t) = \frac{k_{02}}{k_0} + \frac{k_{01}}{k_0} \exp(-k_0 t),$$
giving $q_0(\infty) = k_{02}/k_0$ and a normalization factor $1/(1 - q_0(\infty)) = k_0/k_{01}$, leading to a probability distribution (in the Laplace domain):
$$\hat{P}_m(s) = \frac{k_0}{k_{01}}\, \hat{q}_m(s),$$
for $m \geq 1$.
An important simplification can be made for times $t$ much larger than $1/k_1 = \tau_{\mathrm{on}}$. In that case we may use the approximation: with: At this point it is convenient to introduce the two physically relevant rates/time scales, namely the activation and bleaching rates, defined by:
$$k_{\mathrm{sw}} = \frac{k_0 k_1}{k_0 + k_1}, \qquad k_{\mathrm{bl}} = \frac{k_0 k_1 - k_{01} k_{10}}{k_0 + k_1}.$$
The activation time constant is simply $\tau_{\mathrm{sw}} = 1/k_{\mathrm{sw}} = \tau_{\mathrm{on}} + \tau_{\mathrm{off}}$. We also find that $k_{01} k_{10}/(k_0 k_1) = 1 - k_{\mathrm{bl}}/k_{\mathrm{sw}}$. Now the probability distribution can be written as: This expression may be rewritten in a form that is more amenable to inverse Laplace transform: The inverse Laplace transform now gives: in agreement with the results of the previous section.
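The relation between the four microscopic rates and the two effective rates can be made concrete with a small Python sketch. It assumes $k_{\mathrm{sw}} = k_0 k_1/(k_0 + k_1)$, which follows from $1/k_{\mathrm{sw}} = \tau_{\mathrm{on}} + \tau_{\mathrm{off}}$. It obtains $k_{\mathrm{bl}}$ by solving the identity $k_{01}k_{10}/(k_0 k_1) = 1 - k_{\mathrm{bl}}/k_{\mathrm{sw}}$ stated above. The function name and the rate values are hypothetical.

```python
def effective_rates(k01, k10, k02, k12):
    """Return (k_sw, k_bl) for the three-state model (assumed definitions)."""
    k0 = k01 + k02                  # total decay rate of the off-state, 1 / tau_off
    k1 = k10 + k12                  # total decay rate of the on-state, 1 / tau_on
    k_sw = k0 * k1 / (k0 + k1)      # 1 / k_sw = tau_on + tau_off
    k_bl = (k0 * k1 - k01 * k10) / (k0 + k1)
    return k_sw, k_bl

# Illustrative rates only (arbitrary units).
k01, k10, k02, k12 = 1.0, 100.0, 0.01, 5.0
k_sw, k_bl = effective_rates(k01, k10, k02, k12)
m_inf = k_sw / k_bl                 # asymptotic number of activations per emitter
```

With these definitions the per-cycle survival probability $k_{01}k_{10}/(k_0 k_1)$ equals $1 - 1/M_\infty$, which ties the rate picture to the geometric bleaching statistics.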

Effect of labelling stoichiometry
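Suppose there are $K$ labelling sites with $S_i$ ($i = 1, 2, \ldots, K$) fluorescent emitters per site, which have $M_{ij}$ activations ($j = 1, 2, \ldots, S_i$). The number of activations per site is then:
$$N_i = \sum_{j=1}^{S_i} M_{ij}.$$
Suppose the statistics of the number of emitters per site is independent of the site and has moments $\langle S \rangle$ and $\langle S^2 \rangle$. Suppose furthermore that the statistics of the number of activations of each emitter is independent of emitter and site and gives rise to moments according to the three-state model:
$$\langle M_{ij} \rangle = M_\infty \left(1 - e^{-k_{\mathrm{bl}} t}\right), \qquad \frac{\langle M_{ij}^2 \rangle - \langle M_{ij} \rangle}{\langle M_{ij} \rangle} = 2\left(M_\infty - 1\right)\left[1 - \frac{k_{\mathrm{bl}} t\, e^{-k_{\mathrm{bl}} t}}{1 - e^{-k_{\mathrm{bl}} t}}\right],$$
for all $i$ and $j$ and with $M_\infty = k_{\mathrm{sw}}/k_{\mathrm{bl}}$. The $Q$-parameter determined from the spatial correlation analysis is given by:
$$Q = \frac{\langle N_i^2 \rangle - \langle N_i \rangle}{\langle N_i \rangle},$$
with:
$$\langle N_i \rangle = \langle S \rangle \langle M_{ij} \rangle,$$
and:
$$\langle N_i^2 \rangle = \langle S \rangle \langle M_{ij}^2 \rangle + \left(\langle S^2 \rangle - \langle S \rangle\right) \langle M_{ij} \rangle^2 .$$
Combining all results gives:
$$Q(t) = 2\left(M_\infty - 1\right)\left[1 - \frac{k_{\mathrm{bl}} t\, e^{-k_{\mathrm{bl}} t}}{1 - e^{-k_{\mathrm{bl}} t}}\right] + \mu\, M_\infty \left(1 - e^{-k_{\mathrm{bl}} t}\right),$$
with:
$$\mu = \frac{\langle S^2 \rangle - \langle S \rangle}{\langle S \rangle},$$
a number characterizing the statistics of the number of emitters per site. The second term on the r.h.s. is new compared to the previous analysis of the statistics per emitter. Clearly, there are now three parameters that determine $Q$ as a function of $t$: the effective bleaching rate $k_{\mathrm{bl}}$, the asymptotic value of the number of activations per emitter $M_\infty$, and the labelling stoichiometry parameter $\mu$. The expected total number of activations is:
$$N_{\mathrm{tot}} = K \langle S \rangle M_\infty \left(1 - e^{-k_{\mathrm{bl}} t}\right).$$
All that is lacking then to determine the number of labelling sites $K$ is a connection between $\langle S \rangle$ and $\mu$. We now consider three examples in which there is a connection between the mean and the variance of the statistical distribution of the number of emitters per site.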
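How the stoichiometry term enters the counting can be sketched in Python. The functions below are hypothetical helpers. They assume the per-emitter closed forms of the three-state model, $Q(t) = Q_{\mathrm{emitter}}(t) + \mu \langle M \rangle(t)$ with $\mu = (\langle S^2 \rangle - \langle S \rangle)/\langle S \rangle$, and an expected total of $K \langle S \rangle \langle M \rangle(t)$ activations. All numerical values are illustrative.

```python
import math

def mean_activations(t, k_bl, m_inf):
    """Average activations per emitter, M_inf * (1 - exp(-k_bl * t)), for t > 0."""
    return m_inf * -math.expm1(-k_bl * t)

def q_emitter(t, k_bl, m_inf):
    """Single-emitter correlation parameter (assumed closed form), t > 0."""
    x = k_bl * t
    return 2.0 * (m_inf - 1.0) * (1.0 - x * math.exp(-x) / -math.expm1(-x))

def q_site(t, k_bl, m_inf, mu):
    """Correlation parameter per labelling site, including the stoichiometry term."""
    return q_emitter(t, k_bl, m_inf) + mu * mean_activations(t, k_bl, m_inf)

def estimate_sites(n_total, t, k_bl, m_inf, s_mean):
    """Number of labelling sites K from the total number of activations."""
    return n_total / (s_mean * mean_activations(t, k_bl, m_inf))

# Illustrative values: 500 sites, on average 1.3 emitters per site.
t, k_bl, m_inf, mu, s_mean, k_true = 200.0, 0.01, 8.0, 0.46, 1.3, 500
n_total = k_true * s_mean * mean_activations(t, k_bl, m_inf)
k_est = estimate_sites(n_total, t, k_bl, m_inf, s_mean)
```

In practice `k_bl`, `m_inf`, and `mu` would come from a fit of the measured $Q(t)$, and `s_mean` from a model linking $\langle S \rangle$ to $\mu$.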
The first example refers to having a monomer or a dimer on each site with probabilities $P_1 = 1 - \beta$ and $P_2 = \beta$. It follows that $\langle S \rangle = 1 + \beta$ and $\langle S^2 \rangle = 1 + 3\beta$, so that $\mu = 2\beta/(1 + \beta)$. Measurement of $\mu$ from $Q$ thus gives a value for $\beta = \mu/(2 - \mu)$ and hence for $\langle S \rangle = 2/(2 - \mu)$. So, the average degree of monomerization/dimerization can potentially be measured in this way, in addition to the total number of labelling sites.
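The inversion from $\mu$ to the dimer fraction can be illustrated with a few lines of Python. The function names are hypothetical and the value of `beta` is only an example.

```python
def dimer_stoichiometry(beta):
    """Return (<S>, mu) for sites that hold a dimer with probability beta."""
    s_mean = (1.0 - beta) * 1 + beta * 2        # <S>   = 1 + beta
    s_sq_mean = (1.0 - beta) * 1 + beta * 4     # <S^2> = 1 + 3 * beta
    mu = (s_sq_mean - s_mean) / s_mean          # 2 * beta / (1 + beta)
    return s_mean, mu

def dimer_fraction_from_mu(mu):
    """Invert mu = 2 * beta / (1 + beta) to recover beta and <S>."""
    beta = mu / (2.0 - mu)
    s_mean = 2.0 / (2.0 - mu)
    return beta, s_mean

beta_true = 0.3                                  # illustrative value
s_mean, mu = dimer_stoichiometry(beta_true)
beta_est, s_est = dimer_fraction_from_mu(mu)     # recovers beta_true and s_mean
```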
The second example is for a Poisson distributed number of emitters per site. This is a model for primary antibody labelling where multiple fluorophores are attached to the antibody, under the condition that there is no significant fluorescence quenching. The averages are over the subset of sites with at least one emitter. This gives $\langle S \rangle = \eta/(1 - \exp(-\eta))$ and $\langle S^2 \rangle - \langle S \rangle = \eta^2/(1 - \exp(-\eta))$ with $\eta$ the Poisson rate, so that $\mu = \eta$. The Poisson rate can thus potentially be measured directly from the fit of the measured $Q$ as a function of $t$ to the model. In case $\eta$ is large compared to unity we simply have $\langle S \rangle = \mu = \eta$.
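That $\mu$ equals the Poisson rate after restricting to sites with at least one emitter can be checked by direct summation. The following Python sketch uses a hypothetical helper name, a truncation `s_max`, and an illustrative value of `eta`.

```python
import math

def truncated_poisson_moments(eta, s_max=200):
    """<S> and <S^2> of a Poisson(eta) variable conditioned on S >= 1."""
    norm = 1.0 - math.exp(-eta)      # probability of at least one emitter
    p = math.exp(-eta)               # Poisson probability of S = 0
    s_mean = s_sq_mean = 0.0
    for s in range(1, s_max):
        p *= eta / s                 # Poisson probability of S = s
        s_mean += s * p / norm
        s_sq_mean += s * s * p / norm
    return s_mean, s_sq_mean

eta = 1.5                            # illustrative Poisson rate
s_mean, s_sq_mean = truncated_poisson_moments(eta)
mu = (s_sq_mean - s_mean) / s_mean   # expected to equal eta
```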
The third example is a model for secondary antibody labelling, where multiple secondaries can bind to a single primary, and where multiple emitters are attached to each secondary, i.e. now $S = \sum_{i=1}^{n} T_i$ with $n$ the number of secondaries and $T_i$ the number of emitters on secondary $i$. We will analyse the case where $n$ is Poisson distributed with rate $\mu_1$ and the $T_i$ are Poisson distributed with rate $\mu_2$. The probability distribution of $S$ is given by:
$$P(S) = \sum_{n=0}^{\infty} \frac{\mu_1^n\, e^{-\mu_1}}{n!}\, \frac{(n\mu_2)^S\, e^{-n\mu_2}}{S!}.$$
We find that the probability of observing zero fluorophores is:
$$P(0) = \exp\left[-\mu_1\left(1 - e^{-\mu_2}\right)\right].$$
Restricting to the observed cases $S > 0$ implies we have to normalize the probability distribution by a factor $1/(1 - P(0))$ and sum only over values $S > 0$. This leads to:
$$\langle S \rangle = \frac{\langle n \rangle \langle T \rangle}{1 - P(0)}, \qquad \langle S^2 \rangle = \frac{\langle n \rangle \langle T^2 \rangle + \left(\langle n^2 \rangle - \langle n \rangle\right) \langle T \rangle^2}{1 - P(0)},$$
giving:
$$\mu = \frac{\langle T^2 \rangle - \langle T \rangle}{\langle T \rangle} + \frac{\langle n^2 \rangle - \langle n \rangle}{\langle n \rangle}\, \langle T \rangle .$$
Here, the angular brackets indicate averaging over the individual probability distributions for $n$ and for the $T_i$. For the Poisson distributions at hand this gives:
$$\langle S \rangle = \frac{\mu_1 \mu_2}{1 - \exp\left[-\mu_1\left(1 - e^{-\mu_2}\right)\right]}, \qquad \mu = \mu_2 \left(1 + \mu_1\right).$$
Generally, prior knowledge on the distribution of secondaries per primary and the distribution of fluorophores per secondary is needed to proceed. It appears that the final counting result is not very sensitive to details of the secondary-to-primary labelling stoichiometry, i.e. errors in the value of $\mu_1$ are largely compensated by opposite errors in the estimated $M_\infty$, giving a relatively robust estimate for the number of localizations per primary antibody. This can be understood semi-quantitatively as follows. In case there is little bleaching, the fitting of $M_\infty$ is dominated to a large extent by the switching regime $k_{\mathrm{bl}} t \ll 1$. Then it holds that:
$$Q(t) \approx \left(\mu + 1\right) M_\infty\, k_{\mathrm{bl}} t .$$
So, given the measured correlation parameter $Q$ as a function of time and the bleach rate $k_{\mathrm{bl}}$, the product $A = (\mu + 1) M_\infty$ is fixed for all values of $\mu$. It then follows that the estimate for the number of localizations per primary antibody is:
$$\langle S \rangle M_\infty = A\, \frac{\langle S \rangle}{\mu + 1},$$
so that the stoichiometry only affects the final counting estimate via the ratio:
$$\zeta = \frac{\langle S \rangle}{\mu + 1}.$$
It turns out that the functional dependence of $\zeta$ on $\mu_2$ hardly changes with $\mu_1$ for the range of values $1 < \mu_1 < 5$, with relative variations on the order of 10%. In fact, the dependence of $\zeta$ on $\mu_2$ in the range of values $1 < \mu_2 < 5$ is also rather weak. It should be noted that the current analysis neglects quenching, but in case that can be safely neglected, it does show that the counting analysis is robust against errors in the stoichiometry calibration.
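The weak dependence of the ratio $\zeta$ on the labelling rates can be explored with a short Python sketch. It assumes Poisson statistics for both the number of secondaries (rate $\mu_1$) and the number of emitters per secondary (rate $\mu_2$). For this case $P(0) = \exp[-\mu_1(1 - e^{-\mu_2})]$, $\langle S \rangle = \mu_1\mu_2/(1 - P(0))$, $\mu = \mu_2(1 + \mu_1)$, and $\zeta = \langle S \rangle/(\mu + 1)$. The function names are hypothetical.

```python
import math

def p_zero(mu1, mu2):
    """Probability that a primary antibody carries no fluorophore at all."""
    return math.exp(-mu1 * (1.0 - math.exp(-mu2)))

def p_zero_by_summation(mu1, mu2, n_max=200):
    """Same probability, summed over the number of secondaries n."""
    total, p_n = 0.0, math.exp(-mu1)      # p_n: Poisson probability of n secondaries
    for n in range(n_max):
        total += p_n * math.exp(-n * mu2)  # all n secondaries carry zero emitters
        p_n *= mu1 / (n + 1)
    return total

def zeta(mu1, mu2):
    """Ratio <S> / (mu + 1) that carries the stoichiometry dependence."""
    s_mean = mu1 * mu2 / (1.0 - p_zero(mu1, mu2))
    mu = mu2 * (1.0 + mu1)
    return s_mean / (mu + 1.0)

# Tabulate zeta over the range discussed in the text.
table = {(mu1, mu2): zeta(mu1, mu2) for mu1 in (1, 2, 3, 4, 5) for mu2 in (1, 2, 3, 4, 5)}
spread_mu1 = zeta(5, 2) / zeta(1, 2)   # relative change with mu1 at fixed mu2 = 2
```

Inspecting `table` shows how little $\zeta$ moves across this parameter range, which is the basis for the robustness argument above.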

Estimation of correlation parameter at high labelling density
In samples with high labelling densities or with tightly clustered labelled molecules, the $Q$-estimation may be prone to overestimation because it mistakes correlations due to the sample's spatial structure for correlations from repeated localization of the same labelling site. Here we analyse under which conditions this problem is expected to occur. The $Q$-estimation algorithm attempts to fit a model function $H(q)$ to the FRC numerator, which describes the decay in spatial correlations due to localizations of the same labelling site. $H(q)$ depends on the unknown spread of localizations of a single labelling site (i.e. the effective localization error) due to localization error, errors in the correction for stage drift, and the finite size of the labels, and is parameterized in Eq. 6 as: where $\sigma_m$ is the mean of the effective localization error and $\Delta\sigma$ its standard deviation. The FRC's numerator $\nu(q)$ can be expressed as: where the term $S(q)$ relating to the sample's spatial structure is equal to: with $K$ the number of labelling sites and $\hat{\psi}(\vec{q})$ the Fourier transform of the normalized density of labelling. The algorithm will have difficulties estimating $\sigma_m$ and $\Delta\sigma$ when the decay of $S(q)H(q)$ and $QH(q)$ cannot be distinguished very well. This would occur if $S(q)$ is still larger than $Q$ at the spatial frequency $q = 1/(2\pi\sigma_m)$ where $QH(q)$ starts to decay appreciably. Thus, we have the criterion: If we take as an example structure a line of length $L$ and Gaussian cross-section with full width at half maximum $w$, then we have [2]: leading to the criterion for the linear density of labelled sites: If the width of the filaments is on the same order as the width of the localization error distribution and $Q \approx M$, it follows that there should be fewer than one site per $2\sigma_m/e \approx 0.74\sigma_m$. In a more typical scenario $Q \approx 1.5M$ and thus the criterion becomes less than one site per $0.5\sigma_m$.
Similarly, for a line with a rectangular cross-section and width $w$ we would have [2]: and thus we get the criterion: or, if $w$ equals the full width at half maximum of the localization error distribution, that $\rho_{\mathrm{lin}}$ should be less than one site per $1.23\sigma_m$ (pessimistic case) or one per $0.82\sigma_m$ (typical case). Thirdly, if we have a bell-shaped structure that can be described by a Gaussian with standard deviation $a$, then the number of sites $K$ in the structure should satisfy: Finally, if we have a circular structure with a radius $a$ and $K$ sites, then and therefore we obtain the criterion For $a \approx \sigma_m$ this gives $K < 1.3$ (pessimistic case) or $K < 1.9$ (typical case). For $a \approx 2\sigma_m$ this becomes $K < 3$ (pessimistic case) or $K < 4.5$ (typical case).

Effect of false negative localizations
False negative localizations refer to events where a fluorophore is activated during an acquisition but this fluorescence does not lead to a successful localization by the reconstruction algorithm. This may happen, for example, if nearby fluorophores are simultaneously active, if the fluorophore is very dim, or if the event is very short. The consequence of false negative localizations is that the number of localizations per fluorophore no longer corresponds to the number of activation events. Below we analyse the consequences, assuming that the activation events result in a successful localization independently of each other and with the same probability $P_{\mathrm{loc}}$ for all events.
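Under this assumption the average number of localizations per fluorophore is:
$$\langle M_{\mathrm{loc}} \rangle = P_{\mathrm{loc}} \langle M \rangle .$$
Here, $M$ denotes the number of activations per fluorophore. Similarly, we find that:
$$\langle M_{\mathrm{loc}}^2 \rangle - \langle M_{\mathrm{loc}} \rangle = P_{\mathrm{loc}}^2 \left(\langle M^2 \rangle - \langle M \rangle\right),$$
from which it follows that
$$Q_{\mathrm{loc}} = P_{\mathrm{loc}}\, Q .$$
Another important consequence of false negative localizations is that a fraction $P_0$ of all fluorophores is never localized. This fraction is given by:
$$P_0 = \sum_{m=0}^{\infty} P_m \left(1 - P_{\mathrm{loc}}\right)^m ,$$
with $P_m$ the probability of $m$ activations. Finally, it can be shown that the probability distribution for the number of localizations per fluorophore is given by the same expression as in Eq. 3, if the following substitutions are made: If $b = 1/M_\infty$ is not too large, then effectively only $M_\infty$ appears to be reduced by a factor $P_{\mathrm{loc}}$. However, because $M$ becomes smaller by the same amount, the accuracy of the estimate for $M$ does not deteriorate much. For example, if $P_{\mathrm{loc}} = 80\%$ and $M_\infty = 5$ then the estimate for $M$ would be off by 5%.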
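The effect of independent false negatives can be sketched numerically by thinning the activation distribution binomially. The Python snippet below is an illustration under stated assumptions, not the authors' implementation. It models the number of activations as the minimum of a Poisson variable (mean `r`) and a geometric variable (bleaching probability `b` per cycle), and each activation is detected independently with probability `p_loc`. The correlation parameter is taken as $(\langle n^2 \rangle - \langle n \rangle)/\langle n \rangle$. Function names, the truncation `m_max`, and all numerical values are hypothetical.

```python
import math

def poisson_pmf(m, r):
    """Poisson probability of m events for mean r > 0."""
    return math.exp(m * math.log(r) - r - math.lgamma(m + 1))

def activation_pmf(r, b, m_max=80):
    """P_m for m = 0..m_max-1 as the minimum of a Poisson and a geometric variable."""
    pois_ge = [1.0]                                       # P(X >= 0)
    for m in range(m_max):
        pois_ge.append(pois_ge[-1] - poisson_pmf(m, r))   # P(X >= m + 1)
    def survival(k):                                      # P(X >= k) * P(Y >= k)
        return pois_ge[k] * (1.0 - b) ** max(k - 1, 0)
    return [survival(m) - survival(m + 1) for m in range(m_max)]

def thin(pmf, p_loc):
    """Distribution of localizations when each activation is kept with probability p_loc."""
    out = [0.0] * len(pmf)
    for m, p_m in enumerate(pmf):
        for n in range(m + 1):
            out[n] += p_m * math.comb(m, n) * p_loc**n * (1.0 - p_loc) ** (m - n)
    return out

def mean_and_q(pmf):
    """Mean and correlation parameter (<n^2> - <n>) / <n> of a count distribution."""
    m1 = sum(n * p for n, p in enumerate(pmf))
    m2 = sum(n * n * p for n, p in enumerate(pmf))
    return m1, (m2 - m1) / m1

r, b, p_loc = 2.0, 0.2, 0.8                # illustrative values only
acts = activation_pmf(r, b)
locs = thin(acts, p_loc)
mean_act, q_act = mean_and_q(acts)
mean_loc, q_loc = mean_and_q(locs)          # both scale with p_loc
never_localized = locs[0]                   # fraction of fluorophores never localized
```

Both the mean and the correlation parameter scale by the same factor `p_loc`, which is why a fit to $Q(t)$ still tracks the number of localizations per fluorophore.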