FOCAL3D: A 3-dimensional clustering package for single-molecule localization microscopy

Single-molecule localization microscopy (SMLM) is a powerful tool for studying intracellular structure and macromolecular organization at the nanoscale. The increasingly massive pointillistic data sets generated by SMLM require the development of new and highly efficient quantification tools. Here we present FOCAL3D, an accurate, flexible and exceedingly fast (scaling linearly with the number of localizations) density-based algorithm for quantifying spatial clustering in large 3D SMLM data sets. Unlike DBSCAN, which is perhaps the most commonly employed density-based clustering algorithm, an optimum set of parameters for FOCAL3D may be objectively determined. We initially validate the performance of FOCAL3D on simulated datasets at varying noise levels and for a range of cluster sizes. These simulated datasets are used to illustrate the parametric insensitivity of the algorithm, in contrast to DBSCAN, and clustering metrics such as the F1 and Silhouette score indicate that FOCAL3D is highly accurate, even in the presence of significant background noise and mixed populations of variable sized clusters, once optimized. We then apply FOCAL3D to 3D astigmatic dSTORM images of the nuclear pore complex (NPC) in human osteosaracoma cells, illustrating both the validity of the parameter optimization and the ability of the algorithm to accurately cluster complex, heterogeneous 3D clusters in a biological dataset. FOCAL3D is provided as an open source software package written in Python.


Lower bound of minL in FOCAL3D
To save computation time in optimizing minL, we can narrow the search range by estimating a lower bound. We find an optimal minL by randomly scattering all N total localizations within the image volume V , so that the density of localizations is N total /V . For a given grid size , the volume occupied by a voxel is V = 3 and the average number of localizations per voxel is N total V V . Since the image is initially enhanced by locally summing all neighbouring voxels, we should multiply by the voxel connectivity (3 3 1 = 26 in 3D or 3 2 1 = 8 in 2D). So, for the 3D case, we find a lower bound of: (1)

Lower bound of minP ts in DBSCAN
We can also set a lower bound for minP ts in DBSCAN. Similar to the case for minL, we can estimate the average number of randomly scattered localizations we expect in a sphere of radius ✏: (2) The total number of localizations we obtain from a single, blinking fluorophore may be written as where n 1 , n 2 , . . . n B+1 are the number of localizations observed each time the fluorophore blinks up to a total of B blinks. Here we define a blink as a reactivation of the emissive state of the fluorophore. Many fluorophores used in single-molecule localization microscopy are observed to have an ON time that follows an exponential distribution where ⌧ ON is the characteristic ON time for that fluorophore. If we work in units of the camera exposure time ⌧ exp , and identify n ⇡ t/⌧ exp as the number of localizations that will be detected during a single emission period due to temporal binning by the camera, we can rewrite Eq. 4 as: Since the sum of B + 1 independent, identical exponential variables is an Erlang or Gamma Distribution, from Eq. 5, we can now derive the probability distribution for the total number of localizations n expected from a fluorophore that blinks B times: where (x) is the Gamma function (not to be confused with the Gamma distribution). However, the number of blinks B is also a statistical variable.
Here we assume that B is properly modeled by a geometric distribution where is the characteristic number of blinks. This dependence has, likewise, been experimentally verified for a number of photoactivatable/photoswitchable fluorophores. With this in hand, we weigh the probability distribution f (n|B), Eq. 6, by the probability of observing B blinks, Eq. 7, to obtain the distribution of the total number of localizations per fluorophore p(n): The sum is just the Taylor series expansion of e x where x = ne 1/ ⌧ ON in this case. Thus, the distribution of the total number of localizations per fluorophore is which is an exponential distribution with a characteristic number of localizations per fluorophore ofn =⌧ ON 1 e 1/ . Since localizations are a discrete quantity, we can discretize the exponential distribution into a geometric distribution as p(n) ⇠ 1 e (1 e 1/ )/⌧ ON e n(1 e 1/ )/⌧ ON . (10)

Distribution of Localization Precision
We can model the distribution of localization precisions by taking into account the distribution of the total number of photons per blink: p(n p ) = 1 n p e np/np .
An approximate lower bound for the localization precision is given by n ⇡ P SF / p n p . From this we can find a model distribution of the localization precision: where (x) is the Dirac delta function. With the substitution u = P SF / p n 0 p , this gives: .
The mode of this distribution can be calculated from the maximum likelihood estimate as: The mean of this distribution is: These relations are useful in scaling the axial localization precision with respect to the lateral precision. For example, by taking double the PSF width in the axial direction as in the lateral dimensions, we e↵ectively shift the peak and mean of the distribution by a factor of 2.
To use this result in our simulations, we employ inverse transform sampling. The cumulative distribution function (CDF) of the localization precision is: and the inverse CDF is: so if Y ⇠ U (0, 1), then n ⇠ p( n ) .