Maximizing the Biochemical Resolving Power of Fluorescence Microscopy

Most recent advances in fluorescence microscopy have focused on achieving spatial resolutions below the diffraction limit. However, the inherent capability of fluorescence microscopy to non-invasively resolve different biochemical or physical environments in biological samples has not yet been formally described, because an adequate and general theoretical framework is lacking. Here, we develop a mathematical characterization of the biochemical resolution in fluorescence detection with Fisher information analysis. To improve the precision and the resolution of quantitative imaging methods, we demonstrate strategies for the optimization of fluorescence lifetime, fluorescence anisotropy and hyperspectral detection, as well as different multi-dimensional techniques. We describe optimized imaging protocols, provide optimization algorithms and describe precision and resolving power in biochemical imaging thanks to the analysis of the general properties of Fisher information in fluorescence detection. These strategies enable the optimal use of the information content available within the limited photon-budget typically available in fluorescence microscopy. This theoretical foundation leads to a generalized strategy for the optimization of multi-dimensional optical detection, and demonstrates how the parallel detection of all properties of fluorescence can maximize the biochemical resolving power of fluorescence microscopy, an approach we term Hyper Dimensional Imaging Microscopy (HDIM). Our work provides a theoretical framework for the description of the biochemical resolution in fluorescence microscopy, irrespective of spatial resolution, and for the development of a new class of microscopes that exploit multi-parametric detection systems.


Introduction
Fluorescence microscopy provides an invaluable tool to probe cell and tissue biochemistry. Fluorophores sensitive to the physicochemical properties of the environment or fluorescent sensors engineered to probe biochemical reactions encode biologically relevant information into changes of their photophysical properties. The read-out of these probes is typically performed with the quantitative detection of specific photophysical properties, e.g. excited state lifetime, fluorescence anisotropy or emission/excitation spectra. Photon-toxicity, photo-bleaching and the need for acquisition times compatible with biological processes limits the maximum number of photons that can be collected during an experiment. This limited photon budget hinders the capability of biophysical imaging techniques such as fluorescence lifetime, anisotropy and spectral imaging to unmix complex biochemical signatures and to resolve small changes in biochemical systems. Theoretical frameworks describing the role of photon-statistics in various techniques have been developed in order to define these limits and to provide tools that may serve for the optimization of detection schemes [1][2][3][4][5].
Over the past decade, most academic and industrial developments in microscopy have focused on spatial super-resolution techniques [6,7]; however, the capability of fluorescence microscopy to discriminate different biochemical and physic-chemical environments does not depend only on spatial resolution: detection schemes aiming to enhance biochemical/physico-chemical resolution of fluorescence microscopes are equally fundamental, in particular for full exploitation in cell biology. Intuitively, multiparametric detection [8,9] is an obvious strategy to achieve this goal. Indeed, various techniques developed in the past decades, e.g., spectrally resolved lifetime imaging [8], time-resolved anisotropy imaging [10] and spectrally-resolved anisotropy imaging [9], already provide detailed information concerning certain parameters. Detection technologies are mature to integrate all these modalities into a single instrument. However, these multiparametric imaging platforms are sometime met with skepticism because of the high cost and complexity of multi-parametric or multi-modal systems and the lack of a rigorous definition of ''biochemical resolving power'' that would permit to demonstrate advantages of these detection schemes on theoretical grounds. Here, we present a theoretical framework that generalizes Fisher information theory for a system with multiple detection channels with the aim to provide a mathematical background with which it is possible to define and to study the biochemical resolving power of an optical system as a function of photon statistic, independently from its spatial resolution. Furthermore, we describe strategies to optimize multi-dimensional detection systems, provide software for their analysis and review sources of photon-losses that may limit their efficiencies.
Fisher information theory has been conveniently used to study the upper boundary of signal-to-noise ratios (SNR) achievable in fluorescence detection [1,5]. These works have been primarily applied to the study of single-photon counting systems because single photon counting is mostly affected by Poissonian noise. All other techniques may introduce additional types of noise at the detriment of SNR. However, at high signal-to-noise ratios and with optimized detection schemes [3], analog techniques may converge to the upper boundaries described by Fisher information potentially providing other benefits, e.g. higher acquisition throughputs [3]. Therefore, all results obtained with the application of the following theoretical framework should be regarded as the physical limits in the precision of a detection system and any system that would approach this limit will be defined as ''efficient'' [3].
Furthermore, we aim to lay down the theoretical foundations (and justifications) for multi-dimensional techniques and, more specifically, for spectrally-and polarization-resolved time-correlated single-photon counting that would enable the parallel detection of all properties of light. For conciseness, this technique will be referred to as Hyper Dimensional Imaging Microscopy or HDIM [9]. Furthermore, as phasor transformation [11] has become widely used in the analysis of biophysical imaging data, we propose a generalization of this technique to multi-dimensional datasets. We show that parallel multi-parametric imaging modalities can maximize signal-to-noise ratios and boost the resolving power of biochemical/biophysical imaging techniques. Thus, our work lays a theoretical foundation for the development of HDIM platforms, and demonstrates some of the potential advantages of such systems in resolving cell biological processes.

Theory
In this work, Fisher information is used in order to define and describe the physical limit in biochemical resolving power of an optical system and to characterize how a multi-channel detection system can attain the highest possible biochemical resolution from a theoretical standpoint. In this section, we provide definitions and theorems that enable the description of biochemical precision and resolution in fluorescence microscopy; the demonstrations can be found in the Methods section; the validation of the theory and the description of practical tools for the analysis of optical systems can be found in the Results section.
A multi-channel optical instrument partitions detected photons into histograms of photon counts collected over ranges of arrival times, wavelengths and polarization states. It is possible to characterize the general properties of the Fisher information content provided by any given partition and to demonstrate that detection systems of higher channel number and dimensionality increase the information content of an experiment as stated in the ''photon partitioning theorem'' below.
Photon Partitioning Theorem (See also Methods: Proof of Photon Partitioning Theorem) Let p be a partition of N photons obtained with m independent noiseless channels resulting in the observable random variablẽ n n~G 1 ,G 2 ,:::,G m ð Þ , where G i is the photon-count value measured by the i-th channel. Let I x ð Þ be the Fisher information on the unknown parameter x. If p' is a finer partition of the N photons obtained with m'.m independent noiseless channels then I ' x ð Þ §I x ð Þ. In analogy to literature related to information theory [12], a noiseless channel is defined as a channel that does not introduce errors/noise. In the context of fluorescence, a noiseless channel reveals shot noise that is caused by the inherently Poissonian process of photon detection. This is not only a useful and common abstraction, but it is representative of single photon counting detectors. The random variable x may represent physical quantities such as the fluorescence lifetime of a fluorophore, FRET efficiency or fluorescence anisotropy, or a biochemical quantity such as pH, a fraction of bound molecules or a relative enzymatic activity.
Although for a finer partition of photons the Fisher information may increase, the trivial duplication of a detection channel that collects light on an identical spectral, polarization and time range, cannot provide any advantage, but may rather increase only photon-losses or noise (see practical non-trivial partitioning note and Text S1 on costs and photo-losses). The following corollary and note clarifies that maximal Fisher information can be attained I 'wI ð Þ solely if the partitioning of photons is carried out on partitions that maximize the gradient of the partitioning function relative to the parameter of interest.
Non-trivial Partitioning Corollary (See also Methods: Proof of Non-trivial Partitioning Corollary) For any given detection channel with photon counts G i there is a partitioning function 0va x,s s i ð Þv1 such that the two newly defined detection channels will detect aG i and 1{a ð ÞG i photons, respectively. a is a function of the unknown parameter x and of the (hyper-dimensional) spectral propertiess s i ð Þ of each detection channel. If the necessary and sufficient condition La=Lx=0 is satisfied, the new partition provides a net increase Here, the vectors s i defines the boundaries of each detection channel including the band of observed wavelengths, the orientation of an analyzer and the time windows of photon arrival times that is considered in the photon counting process (see also Proof of Photon Partitioning Theorem). An example of partitioning function for the case of fluorescence lifetime imaging is described in the Methods -Partitioning function: an example.
A fundamental and direct outcome of the photon partitioning theorem is that multi-channel multi-parametric detection systems are more photon-efficient (the capability to approach relative errors limited just by Poissonian noise) [2,3].
Photon-efficiency Corollary (See Methods: Proof of Photonefficiency Corollary) For a non-trivial partitioning of N photons onto m detection channels, the variance s x 2 of an unbiased estimator of the parameter x will asymptotically decrease with m increasing, i.e. the photon-efficiency of a measurement increases with non-trivial increments of detection channel number.
Photon-efficiency can be described by the figure-of-merit ''F '' (or F-value) introduced by Gerritsen et al. [2] for FLIM that can be easily generalized to F~s x x {1 ffiffiffiffi ffi N p , i.e. the ratio between the relative error s x x {1 À Á that affects the estimate of x compared to Poissonian noise 1= ffiffiffiffi ffi N p À Á . An F-value of 1 signifies that the estimate of x is affected only by Poissonian noise (it is efficient) and for an F-value of 2, the estimate of x exhibits twice the noise compared to a shot-limited measurement as if a fraction equal to 1-2 21/2 of the detected photons were lost. Therefore, a measure of photon-efficiency is p~F {1=2 [3] (for an analysis of actual optical losses see Text S1 -Costs and photon-losses: case studies and the Practical non-trivial partitioning note). The increased precision in the detection of photophysical/biochemical quantities results in enhanced biochemical resolving power as stated below by the separability and resolving power corollary.
Separability and Resolving Power Corollary (See Methods: Proof of Separability and Resolving Power Corollary) Let consider a photo-physical system that is sensitive to physicochemical (biochemical) properties of its environment. If E 1 and E 2 are two different physico-chemical environments characterized by the parameter x and distance described by m = |x 1 2x 2 | then, for a non-trivial partitioning of N photons onto m detection channels, the separability S~m ffiffiffiffiffiffiffiffiffiffiffiffiffiffi s 2 1 zs 2 2 q of the detection system will asymptotically increase with m increasing. Separability [1] is a less ambiguous (statistical) definition of resolving power; however, it is possible to define biochemical resolving power also applying the Rayleigh criterion (i.e., S,2) in analogy to the description of spatial resolution. The smallest resolvable difference between E 1 and E 2 at x = x 0 (Dx, i.e., the biochemical resolution) can be described thus by Dx~ffi ffiffiffiffiffi ffi 8 pN r x 0 and, therefore, the biochemical resolving power is . Therefore, at increasing number of channels, the photon-efficiency p of a fluorescence microscope increases together with its biochemical resolving power.
This work is concerned with the biochemical resolution of a microscope; however, when x is a photophysical property (e.g., the fluorescence lifetime of a fluorophore) a consequence of the photon partitioning theorem is that the photophysical resolving power of a fluorescence microscope increases for m increasing. Therefore, this definition is a generalized description of resolution useful to define how small photophysical or biochemical quantities can be resolved either within a pixel of an image or between two different areas of a sample. The definitions of separability, resolution and resolving power as affected by the instrument response function can be found in Methods -Proof of Separability and Resolving Power Corollary.

Practical Non-trivial Partitioning Note
The photon partitioning theorem and its corollaries were described in the case of an ideal system in which the addition of a detection channel can be done without any penalty. This is useful to define the physical limits in Fisher information, precision and biochemical resolution of an optical system and not just the limit imposed by current technologies. In reality, the addition of detection channels can result in photon-losses (i.e., partial loss of information) or more general constrains or costs (e.g., the actual cost of an optical system). If these generalized cost C can be quantified, a non-trivial partitioning will be practically achievable if and only if I '{C'wI {C. The analysis of photon-losses caused by optical relay systems (see Text S1-Costs and photonlosses: case studies) or additional sources of noise (see Methods -Numerical optimization of Fisher information) is therefore fundamental for the realization of the physical limits of the biochemical resolving power of a fluorescence microscope.

Numerical Optimization of Fisher Information
The non-trivial partitioning of photons with higher channel density increases the physico-chemical (or biochemical) resolution of a detection system. Practical implications of the photon partitioning theorem can be demonstrated by implementing numerical optimization of a detection system. Fast iterative algorithms can identify the best number and configuration of detection channels by the numerical maximization of G~a 1{a ð Þ ½ {1 Da=Dx ð Þ 2 , a factor that is directly proportional to the absolute gain in Fisher information (see Non-trivial Partitioning Corollary; D indicates numerical estimations of derivatives). A detection channel is split into two new detection channels only when GwC, where C represent the cost of adding a new detection channel. The physical meaning of C can be, for instance, a photoloss (see also Supporting Material -Costs and photon-losses: case studies) or additional noise (see Methods -numerical optimization of Fisher information); however, this cost term can be used as a mere numerical tolerance for the iterative algorithm. Similar algorithms can be implemented by evaluating the loss L ð Þ of Fisher information when two channels are merged into one. The optimal partitions p ð Þ obtained with these iterative strategies max p G p ð ÞD GwC À Á or min p L p ð ÞD LvC can be further optimized by the numerical minimization min p F p ð Þ ð Þ of the F-value F ð Þ [2,3,13], i.e. the ratio between the relative error s x x {1 À Á that affects the estimate x compared to Poissonian noise 1= ffiffiffiffi ffi N p À Á :

Optimization of Time Gating
Fluorescence lifetime imaging microscopy can be accomplished with a number of detection schemes, in the time-or frequencydomain and with single photon counting or analog detection [14][15][16][17]. For simplicity, in this work we consider only time-correlated single photon counting (TCSPC) [18,19] and time-gating [20] where photon arrival times relative to repetitive pulsed excitation are measured and histrogrammed in a number of time bins. The precision of these time-domain techniques as a function of instrument parameters (e.g., laser repetition period T, number and shape of time bins), fluorescence lifetime (t) and photon count per pixel (N) has been thoroughly described [1,2,[21][22][23]. Therefore, it is instructive to compare results inferred from the general properties of Fisher information relative to state-of-the-art optimization techniques. First, we describe how to optimize the position of time gates by analyzing the simple case of detection based on just two time gates; second, we generalize this optimization strategy to detection systems with multiple gates and TCSPC.
Gerritsen et al. have shown that time gates (channels) of uneven width can significantly improve the precision of fluorescence lifetime imaging microscopy compared to a system that provides the same number of time gates but with uniform width. The demonstration that time gates of uneven width increase the Fvalue in FLIM was obtained with Monte Carlo simulations over a number of test partitions because analytical solutions are often difficult or impossible to obtain [2]. These results have been confirmed also by the analytical description of the F-value that has been recently evaluated for two gates of uneven width [13]. Figure 1a shows the results of Monte Carlo simulations ran to identify the time (rt) when the second time gate should start provided that this time gate will close (at time T) before the next laser pulse. The optimal value of r that minimizes the F-value (,1.24) is ,1.58 (e.g., for a lifetime of 2 ns, the second time gate should start at 3.16 ns). These values are practically identical to those found analytically (F best = 1.25, r = 1.59; see Methods -Optimal gates for two time gates system for details and Fig. 1a, black circle) [13].
The photon partitioning theorem, however, permits to solve optimization problems just by the analysis of the general properties of Fisher information without requiring laborious Monte Carlo simulations or analytical solutions. Figure 1b and c show the F-value curves obtained by the simple numerical maximization of the gain function G and the linear regression between the estimated best value of rt and the simulated value t, respectively. These data permit to establish that the best F-value achievable for any value of fluorescence lifetime is 1.24 and that the best position of the start of the second time gate is at 1.59 times t. Notably, the numerical maximization of the Fisher information gain provides, in less than 2 ms per tested lifetime, the very same results obtained with lengthy Monte Carlo simulations (,3hours per simulated SNR) or with laborious evaluations of an analytical solution. Moreover, analytical solutions to find optimal gating with multiple time gates are unlikely to exist; however, a simple optimization algorithm based on the nontrivial Partitioning Corollary can find solutions very efficiently (,20 ms). Figure 1d-e shows optimization of gating strategies performed with either 4 or 8 gates, the number of gates often used in time-gating. The blue curves show the performance of a standard even partition of the period T in 4 ( Fig. 1d) and 8 (Fig. 1e) gates and the black curves demark the best performance that can be achieved by TCPSC (e.g., with a full bin resolution on the arrival times histogram equal to 256, method equivalent to detect photon counts onto 256 adjacent time gates of equal width). Red and green curves show the F-value obtained by time-gates that are iteratively optimized with max p G p ð ÞD GwC and min F p ð Þ (optimization software is provided in Supporting Material). Importantly, the gain of Fisher information is optimized over a number of fluorescence lifetime values at the same time (from 0.5 to 3.0 ns) permitting to obtain an optimal response over a very broad range of possible experimental conditions. Figure 1f shows the efficiency of each gating scheme relative to TCSPC as expressed by the average value of the ratio TCSPC evaluated over the 0.5-3.0 ns range of fluorescence lifetime values. A relative efficiency of 75% means that the precision of the system is as if 25% of photons are lost and, therefore, the system will have to increase the acquisition time of 0.75 21 times in order to recover the same SNR compared to TCSPC. Therefore, the numerical analysis of Fisher information theory permit to evaluate which is the best performing and simplest detection system that should be used for FLIM.

Optimization of Spectral Channels
Spectral detection is the detection of fluorescence over a number of spectral channels. Detection can be achieved by i) sequential imaging of fluorescence emission over a number of spectral bands selected with filters or other devices (spectral imaging), ii) simultaneous detection of fluorescence emitted over a few spectral bands with the use of a number of photodetectors (multi-colour or multi-spectral imaging) or iii) simultaneous is also plotted versus the number of gates (f) for gates of equal width (blue, ''even gates''), for gates optimized by max p G p ð ÞD GwC (red, ''optimization'') and then followed by min p F p ð Þ (green, ''full optimization''). doi:10.1371/journal.pone.0077392.g001 acquisition of spectral information using a spectrograph and an array of detectors (hyper-spectral imaging). Nowadays, spectral, multi-spectral and hyper-spectral imaging techniques are commercially available with high-end laser scanning microscopes and commonly used for spectral unmixing of many fluorophores. Neher and Neher characterized the role of photon-statistics in spectral imaging and discussed the Rao-Cramer limit that can be achieved with two spectral channels [24]. At the best of our knowledge, there are no other works that describe the role of photon-statistics in spectral imaging for applications to fluorescence detection. Figure 2a shows typical spectra representative of fluorophores like EGFP and mCherry; these spectra were used to test algorithms developed to optimize the performance of spectral imaging. Optimization of spectral gating schemes (the assignment of a specific spectral band to a detection channel) was performed with 2 or 4 spectral gates and the resulting F-values (see Eq. (21) in Methods) are shown in Fig. 2b and c, respectively. With well separated spectra, spectral unmixing can achieve very high precision (F ?1) for a broad range of relative abundances. The best partitions found with min p L p ð ÞD LvC (red curve) and followed by min p F p ð Þ (green curve) are practically matching a spectral system providing 256 spectral bins. This is also confirmed by the relative efficiency of partitions of increasing channel number shown in Fig. 2d. With fluorophores exhibiting much more overlapping spectra, e.g. EGFP and EYFP, spectral unmixing deteriorates its performance, but optimization of spectral channels permit to recover very high efficiency with few (2-4) spectral channels ( Fig. 2e-h). Figure 3a-c shows true color images, i.e. red-green-blue representations of how the sample would look like if seen at the naked eye [9,25], of HeLa cells expressing EGFP-Actin (a), EYFP-Tubulin (b) or both tagged cytoskeletal proteins (c). The total photon count for the latter typical measurement is also shown in Fig. 3d amounting to a few hundreds photons per pixel. The results of spectral unmixing for a number of spectral gating strategies with gates of equal width and the reference spectra of the samples are shown in Fig. 3e-k. Figure 3h-j show images obtained on the same data, but with spectral data binned together accordingly to gating strategies with gates of uneven width optimized by Fisher information analysis. The relative abundance of actin and tubulin are shown in the green and red channel, respectively, in a RGB overlay demonstrating good separation of these two fluorophores. The images demonstrate improvement in signal-to-noise ratios at increasing channel number (Fig. 3e-g) and higher signal-to-noise ratio for all optimized detection schemes ( Fig. 3h-j). F-values were evaluated experimentally by repeating the acquisition of the spectral data five times and analyzing it independently in order to obtain standard deviations and F-values on a pixel by pixel basis. The distribution of the F-values are shown in Fig. S1 and the best F-values are shown in Fig. 3l confirming better results for optimized gating strategies and trends towards improved precision for higher number of detection channels.

The Photon Partitioning Theorem and Multi-parametric Detection
The photon partitioning theorem and its corollaries permit us to predict that multi-parametric detection schemes that exploit the various properties of light can attain higher signal to noise ratios compared to detection schemes based on fewer detection channels. This prediction was tested with Monte Carlo simulations for hyper-spectral imaging (SPEC), for the combined detection of photon arrival times and wavelength that can be achieved by spectrally resolved FLIM [26] (SLIM) and for the simultaneous resolution of photon arrival times, wavelength and polarization state that can be achieved by Hyper Dimensional Imaging Microscopy (HDIM). Monte Carlo simulations (see Methods for details) were carried out with fluorophores exhibiting typical fluorescence lifetimes (3.0 ns and 2.0 ns), intrinsic anisotropy (0.4 and 0.2), rotational correlation times (12 ns and 1.0 ns) and spectral width (50 nm); simulations were performed with varying difference between the spectral peaks of the two fluorophores (from 0 nm to 20 nm) and central wavelength set to 550 nm. The resulting Hyper Dimensional Spectral Signatures (HDSS) of the two fluorophores, i.e. the spectrally and polarization resolved decays, are shown in Fig. S2 (see also inset in Fig. 4b). Synthetic images of 256 by 256 pixels were generated with photons histogrammed on 2,048 channels: 2 polarization states (parallel/ orthogonal to excitation light), 16 spectral bins of equal width covering the 440-630 nm range and 64 time gates of equal width spanning 12.5 ns. These acquisition parameters were chosen to match the properties of existing detection systems (see Methods for details). Figure 4a illustrates how one axis of the synthetic images was used to generate a linear gradient of the fractional abundances of the two fluorophores, whilst the second axis was used to generate 256 replicates that were used to estimate the precision of the unmixing algorithms. Synthetic HDIM images were then generated by using this abundance matrix and then adding Poissonian noise in each simulated detection channel (250, 1000 and 10000 photons per pixel) in order to test the robustness of the analysis at low, typical and very high photon count levels. Figure 4b show the resulting intensity image in the presence of Poisonian noise for the case where an average of 250 photons per pixel was simulated. In order to compare analysis of HDIM data with analyses of datasets representative of spectrally resolved lifetime imaging and spectral imaging, photon counts from the HDIM datasets were summed along the polarization and time dimensions to generate appropriate datasets of lower dimensionalities.
HDIM data analysis was carried out both over all 2,048 detection channels per pixel and with a data reduction algorithm based on a generalization of phasor-based data analysis. For this type of analysis, discrete cosine (DCT) and sine (DST) transformations were computed over only 2 spectral bands (440-540 nm and 540-630 nm) similarly to Digman et al. [11], but for both polarization states. DCT and DST were also computed along the spectral dimension similarly to Fereidouni et al. [27]. Furthermore, we generalized the phasor approach to multi-dimensional data by applying two-dimensional DCT/DST along the combined spectral/time dimension. This strategy resulted in Hyper Dimensional Phasors (HDPH) that reduced HDIM dimensionality from 2,048 to 16 (see Text S3-Hyper Dimensional Phasors (HDPH)).
Data analysis was then carried out by spectral un-mixing. The spectral un-mixing problem is described by Eq. (1) for the case of an HDIM image (iHDIM) with abundances matrix A and the matrix containing the endmembers signatures emHDSS: Similar equations can be written for SLIM, SPEC, and HDPH. By the inversion of Eq. (1) in the presence of Poissonian noise (250, 1000 and 10000 photons per pixel), we estimated the accuracy and precision of these various techniques (see Methods). Figure 4c and d demonstrates the growing accuracy (lower bias) and precision (lower noise) in the estimations of the abundances of the two fluorophores obtained with methods of increasing dimensionality (SPEC, SLIM and HDIM/HDPH) at decreasing spectral shift between two fluorophores. Bias was estimated by the average difference between the simulated abundances and their estimated values and noise was estimated by the spatially averaged standard deviations computed over the 256 simulation repeats and normalized by Poissonian noise ffiffiffiffi ffi N p À Á . The error bars represent standard deviations across repeats at different Poissonian noise levels (details on the definition of these figures of merits can be found in Methods -figures of merit for accuracy and precision).
The validity of the photon partition theorem is illustrated by the increasing precision at higher channel number. The addition of detection channels along gradients established by the random variables the experiment attempt to estimate increases the information content of the measurements; therefore, the acquisition with a higher number of detection channels results in better signal to noise ratios.

Biochemical Resolving Power and Fisher Information
We have shown that increasing the number of detection channels also across different photophysical parameters results in increased precision. Unmixing was used to demonstrate the possibility to separate the contribution of two different characteristic HDSS representative of two fluorophores or two biochemical/photophysical environments. In analogy to the spatial resolution in fluorescence microscopy, the resolution of biochemical techniques is also limited by photon-statistics. We developed a simple algorithm (see Supporting Material) that can determine the F -value of unmixing for all possible detection modalities. Figure 5a-e shows the results of this analysis applied to the same HDSS used to generate Fig. 4. Figure 5c shows also the best Fvalue for anisotropy imaging (4.37, green curve), FLIM (4.20, dashed blue curve), time resolved anisotropy (2.99, dashed cyan curve), spectral imaging (2.87, red curve), spectrally resolved anisotropy (2.48, dashed yellow curve), SLIM (2.46, fuchsia curve) and HDIM (2.18, black curve). How many photons are required to resolve two different biochemical/photophysical signatures? If Figure 2. Optimization of spectral channels. Spectra representative of fluorophores like EGFP and mCherry (a) or EGFP and EYFP (e) were used to carry a numerical optimization of the spectral gating scheme as a function of the relative abundance of EGFP. Results are shown for a a system with 2 (b and f) and 4 (c and g) adjacent spectral channels which relative positions were set equal (blue) or optimized numerically by min p L p ð ÞD LwC (red) and followed by min p F p ð Þ (green) with optimization carried over the relative abundance range indicated by the vertical lines. The black curves represent performance that could be achieved with hyperspectral detection over 256 spectral gates. The spectral gating scheme is represented by the boxes at the bottom of panels b, c, f and g. The efficiency of the optimization strategies relative to hyperspectral detection achieved with 256 is also plotted versus the number of gates (d and h). unmixing could achieve the ideal singal-to-noise ratio limited by Poissonian noise, 400 photons would be sufficient to estimate fluorophore abundances with 5% relative error ffiffiffiffi ffi N p À Á . For the case shown in Fig. 5c, the F -value of FLIM at a relative abundance ,50% is ,4.2. Therefore, the number of photons needed to resolve the mixture of the two fluorophores is ,4.2 2 ?400,7,100. In the same conditions, spectral imaging resolves the mixture with ,3,300 photons and HDIM, making use of all available information, with just 1,900 photons. Although we have demonstrated that increasing the number of channels always increases the biochemical resolving power in fluorescence microscopy, there are instances where simpler techniques can achieve optimal results already. For instance, if two biochemical environments exhibit a spectral shift of 20 nm, spectral imaging (Fig. 5e, F -value , 1.71) resolves these two environments just with ,1,200 photons. Therefore, the numerical analysis of Fisher information and the study of its general properties provide a useful tool to maximize the biochemical resolving power in fluorescence microscopy and to select the best (and simplest) detection system to achieve this limit efficiently.

Discussion
In this work, we demonstrate and exemplify the ''photon partitioning theorem'' as a tool necessary to define and describe the biochemical resolving power in fluorescence microscopy/spectroscopy. The photon partitioning theorem and its corollaries permit us to generalize the analysis of photon-statistics in fluorescence detection and to demonstrate the gain in Fisher information that can be attained theoretically. This provides theoretical tools of significant practical importance (e.g., the definition and characterization of biochemical resolution and methods to optimize detection systems) and also resolves a common misconception: as several hundreds or thousands of photons are necessary to fit even a simple fluorescence lifetime decay [1], the acquisition of data on separate channels has often been considered unnecessary. Moreover, the detection of many features of light is sometimes even considered impossible because of the long acquisition and analysis times that may be required. These statements hold true only if additional detection channels are acquired sequentially (as in a multi-modal microscope) or if each detection channel is analyzed independently from one another. However, whenever acquisition channels are operated in parallel [8,9] data can be acquired at the same speed (or faster) because photons are not lost through the use of filters or analyzers (see Text S1-Costs and photon-losses: case studies). Furthermore, a net gain in Fisher information results in higher signal-to-noise ratios whenever data are not analyzed channel-by-channel but in global approaches [28,29], as implicitly done in the applications shown in this work.
We have dedicated sections of this work to the description of practical constrains on achieving the theoretical maxima in Fisher information. In practice, the addition of a detection channel is carried out at a certain ''cost''. This cost can be represented by photon-losses in the coupling optics, design complexity, additional read-out noise, or even budget. However, it is important to distinguish between the theoretical limits imposed by physics (described here) and the practical limitations of state-of-the-art and total photon count per pixel (d) for the latter. Spectral unmixing was used for different gating schemes (e-j) for gates of even width with 3 (e), 9 (f) and 34 (g) gates and for optimized gates of 2 (h), 4 (i) and 8 (j) gates. The spectra gating scheme is displayed at the bottom of the panels with the reference spectra (k) overlaid for comparison. In green the signal recovered for GFP-Actin and in red the signal estimated for YFP-Tubulin. The comparison of the best achievable F-values measured experimentally demonstrate the positive impact of gating optimization and the trend to provide more precise results at increasing channel number (l). doi:10.1371/journal.pone.0077392.g003 technologies. Current HDIM detection systems do suffer losses, for instance, from the coupling and from the masks between adjacent anodes. However, single photon detection [30], new designs of dispersive optics [9] and novel solid state detectors [31,32] -once integrated -will be capable of achieving or approaching the theoretical limits described by the photon partitioning theorem.
In this work, we have been concerned with the description of theoretical aspects of multi-channel and multi-parametric detection and with the validation of the theory by Monte Carlo simulations. Furthermore, we have demonstrated the practical application of the theory for the particular case of spectral imaging, a technique nowadays available and used in many laboratories. In order to make the advantages of these theoretical frameworks accessible, software can be found in the supporting information. These algorithms can be used to find the minimal number of detection channels that optimize fluorescence detection for FLIM or spectral imaging. Furthermore, provided that HDSSs are known experimentally or can be modeled theoretically, the best detection scheme including spectral, lifetime and anisotropy resolution can be identified. In general, Hyper-dimensional imaging microscopy will achieve the best results; however, whenever the gradient La=Lx is significantly steeper along one photophysical property (e.g., fluorescence lifetime, emission spectrum or polarization), simpler detection techniques may achieve or approximate the theoretical best performances. We note that fluorescence detection can be achieved with sequential excitation at different wavelengths [9,33]; although this case was not treated explicitly in this work, sequential measurements with excitation at a different wavelength can be regarded as independent detection channels and, therefore, the theory and software presented can be applied also on this type of data.

Conclusions
Over the last decade, much emphasis has been placed on methodologies and technologies that increase the spatial resolution of fluorescence microscopy. With this work, we aim to highlight that affordable implementations of novel detection schemes can boost the resolving power of modern microscopes in terms of physico-chemical or biochemical detection as well. The increased resolving power and capability to discern biochemical environments with spatio-temporal resolution together promise fundamental tools for applications in biophotonics, biophysics, systems biology and biomedical research.

Sample Preparation and Imaging
HeLa cells were transfected with EGFP-actin and EYFP-tubulin with Effectene (Qiagen) according to the protocol provided by the suppliers. Cells were fixed after 24-48 hrs with 5% PFA and kept in PBS. Spectral imaging was performed with a Leica SP5 equipped with a hybrid PMT operated in single photon counting mode.

Simulations
Simulations were performed with Matlab (MathWorks, Cambridge, UK) on a workstation equipped with an Intel Xeon X5647 CPU and 24 GB of RAM. Images of 256 by 256 pixels were generated with photons partitioned (i.e., histogrammed) on 2,048 channels. Synthetic images included 64 time gates spanning 12.5 ns, 16 spectral windows covering 440-630 nm and 2 polarization states (parallel/orthogonal to excitation light). These values were selected to match the properties of HDIM datasets that can be acquired with commercially available instrumentation (e.g., two spectral FLIM systems by Becker and Hickl GmbH (Berlin, DL) coupled with a polarizer beam splitter, unpublished data). HDSSs were modeled with gamma functions and polarization-dependent exponential decays [17] as shown in Fig. S2. Synthetic HDIM images were then generated by using the abundance matrix shown in Fig. 4 and then adding Poissonian noise in each simulated detection channel (250, 1000 and 10000 photons per pixel). One simulation was carried out with both fluorophores exhibiting spectral full width at half maximum of The typical solution of the unmixing problem (see Eq. (1)) was achieved by unconstrained least squares and non-negative least squares, although at a higher dimensionality compared to hyperspectral imaging. Results from unconstrained least squares are shown because the latter did not show better performances on synthetic data. In Eq. (1), iHDIM is a 2,048665,563 (channel number 6 pixel number) matrix, emHDSS is a 2,04862 (channel number 6 number of fluorophores) matrix and A is a vector of 2 numbers representing the relative concentrations of the two fluorophores. The two endmembers of the unmixing problem are the HDSS of the two fluorophores concatenated in the endmember matrix emHDSS. These two endmembers were estimated from the synthetic data by the first and last row of the dataset (see Fig. 4a) where fractional abundances were known to be equal to 1 and 0.

Figures of Merit for Accuracy and Precision
A figure of merit for accuracy (Fig. 4c) was defined as the average of the differences between the values of the fractional contributions of a reference fluorphore (f f ) and its respective estimated value ( f ): FoM a~Sf f x,y ð Þ{f x,y ð ÞT x,y . Each x-th column of the simulated gradients (Fig. 4a) contains 256 repeats of the same simulation; it was thus used to estimate the variance s 2 x on the estimation of the fractional contribution. A figure of merit of precision (Fig. 4d) was therefore defined as the average standard deviation across the gradient normalized by Poissonian noise Where N is the total photon count per pixel and s 2 x is defined by the following equation:

Proof of Photon Partitioning Theorem
Because fluorescence emission is a Poissonian process, if G i is the expected count, the probability p i to count n i photons in the i-th channel is: G i represents the expected photon counts in the i-th channel; G i is the integral of g x,s s ð Þ, the photon-detection probability; g depends on various spectral coordinates ( s s) and the parameter x: Here, x denotes the generic quantity we are interested to estimate. Therefore, the likelihood and the log-likelihood of the detection process over m independent channels are: log Lñ n, Under basic assumptions of regularity and differentiability, the analytical representation of the Fisher information can be obtained computing the negative of the expectation of the second derivative of the log-likelihood that describes the experiment: Functions that describe the fluorescence emission of fluorophores at room temperature are regular and smooth, thus this identity can be considered always true. Therefore, it is simple to show that: LG From this follows that: Because the expectation of E n i ð Þ~G i x ð Þ, the Fisher information in fluorescence detection is described by Eq. (8): LG i x ð Þ Lx In order to show that the Fisher information increases with the addition of photon counting channels, let p' be a finer partition of N photons relative to p. If we demonstrated that the partition of one channel into two independent noiseless channels generally result into an increase in Fisher information, then this will be true for more complex finer partitions. Let partition only one channel of the partition p into two independent noiseless channels (i.e., m' = m+1). It is also convenient to define the function a x,s s ð Þ, the photon partitioning function, that will describe the fraction of G m photons that will be counted in the new channel G' m : Therefore, the Fisher information carried by the two new detection channels will be equal to: Therefore, inasmuch as the second addendum in Eq. (10) is always positive or null, we have proven that partitioning the m-th channel of p into two channels, the Fisher information theory increases or remain equal. This is applicable for any other i-th channel. Therefore, for any given finer partition, the Fisher information will be larger or equal to the Fisher information related to p: Proof of the Non-trivial Partitioning Corollary Theoretically, detection channels with different (hyper) spectral characteristics can be combined to generate a large number of possible partitions of photons collected in any given experiment. The trivial duplication of a channel will not increase the Fisher information. A finer partition p' of N m'wm ð Þfor which I ' x ð Þ is strictly greater than I x ð Þ will be called non-trivial as it will increase the information content of the experiment on the unknown x. Provided a partitioning function a x,s s ð Þ with values bound in (0,1) such that the two new detection channels will count aG i and 1{a ð ÞG i , respectively, it is possible to infer the following necessary and sufficient condition from Eqs. (10): We note that the mere duplication of a channel is trivial by definition and exhibit null derivative of the partitioning function therefore resulting in no gain of information.

Partitioning Function: An Example
Although the partitioning function has such a fundamental role in the optimization of detection schemes, we have demonstrated that optimization can be achieved without analytical solutions, but simply applying the general principles described by the photon partitioning theorem and its corollaries. However, it is instructive to analyze the partitioning functions for the simple case of fluorescence lifetime detection in the case that the random variable x is the fluorescence lifetime (t) of the fluorophore. Let's consider two time gates delimited by the relative times t 1 .t 2 .t 3 . In this case, the only spectral property considered for the partitioning is time and the vectors s i is simply a number equal to t 2 , the common boundary of the two gates. The photon counts in each time gates are simply the integrals of exponential functions and the partitioning function (i.e., the fraction of photons detected within t 1 and t 2 over the total photons detected within t 1 and t 3 ) is: The derivative of a t,t 2 ð Þ is clearly defined and continuous for tw0 and t 3 .t 1 , conditions always true. In fluorescence detection at room temperature, spectra, anisotropy and lifetime decays are always represented by smooth and continue functions and, therefore, partitioning functions are always defined.

Proof of Photon-efficiency Corollary
The Fisher information is the inverse of the Rao-Cramer limit on the variance of an unbiased estimator. It is therefore rather trivial to demonstrate that for any non-trivial partitioning the following inequality holds: This implies that with an increasing number of independent non-trivial noiseless channels the estimate of the unknown will be measured with decreasing uncertainty. We can therefore generalize the concept of photon-economy (or photon-efficiency) that has been described for fluorescence lifetime imaging microscopy [2]. The signal-to-noise ratio with which is possible to estimate x increases with the non-trivial addition of detection channels. This can be described by the F-value, F~s x x {1 ffiffiffiffi ffi N p , i.e. the ratio between the relative error s x x {1 À Á that affects the estimate of x compared to Poissonian noise 1= ffiffiffiffi ffi N p À Á . It is rather immediate to show that when s' 2 x vs 2 x , F 'vF . Furthermore, a definition of photon-efficiency is p~F {1=2 [3]. For instance, if F = 1, p = 1 and the SNR s x x {1 À Á in x is the same defined by Poissonian noise ffiffiffiffi ffi N p À Á ; if F = 2, p = 0.25 and the SNR in x twice as large than ffiffiffiffi ffi N p . In order to recover the same SNR, four times the number of photon should be detectec and, therefore it is as if 75% of photons were lost (the photon-efficiency is equal to 0.25). Therefore, when s' 2 x vs 2 x , p' .p.

Proof of Separability and Resolving Power Corollary
It is difficult to establish a general criterion for the resolution of biochemical techniques. In practical cases, however, biochemical systems are characterized by a parameter x. This parameter may be FRET efficiency, the fraction of interacting molecules, an enzymatic activity, etc. The distance between two biochemical systems can be therefore quantified by the Euclidian distance between the two values of x: m = |x 1 2x 2 |. We can thus define the separability of two biochemical signatures by: For non-trivial partitions of higher channel densities, both the uncertainties s' 1 and s' 2 will be smaller or equal than s 1 and s 2 , therefore, the separability of the detection system tend to increase, i.e., S '$S. S can be used as a robust metric for the biochemical resolving power in fluorescence detection. However, it is possible to apply the typical criteria used to define the spatial resolution in microscopy. In the approximation of normal distributions, the Rayleigh criterion correspond to a separability S = 2 and with N photons measured for both x 1 and x 2 (assumption inherent in the Rayleigh criterion). Therefore, the smallest difference Dx that can be resolved around the value x = x 0 can be evaluated by substituting in Eq. (15) x 1 , x 2, s 1 = s 2 = s 0 (assumption inherent in the Rayleigh criterion) with x 0 +Dx/2, x 0 -Dx/2 and F 0 x 0 = ffiffiffiffi ffi N p , respectively, and solving for Dx: Dx~2 ffiffi ffi This definition can be written in a more elegant form, using the photon-efficiency p: At the net of its spatial resolution, the biochemical/photophysical resolving power (x 0 /Dx) of a fluorescence microscope is thus: As we have demonstrated that for a non-trivial partitioning of channels increases the photon-efficiency of a detection system (p' .p), it implies that the non-trivial partitioning of channels improves the biochemical/photophysical resolution Dx'vDx ð Þ and resolving power (R'.R) of fluorescence microscopy. For the effect of the instrument response function of an instrument, see Text S2-Biochemical Resolving Power.

Optimal Gates for Two Time Gates System: Analytical Solution
Li et al. have evaluated the analytical description of the F-value for a time gated system with two time gates of uneven width [13]. For simplicity, we will consider the analytical solution for two adjacent (non-overlapping) time gates only and redefine the parameters x, S and R in Eq. (2) in Li et al. [13] substituting x~e {r , S = 0 (i.e., non-overlapping gates), R~T=rt (i.e., the second gate ends at the laser period T and starts at time rt). Eq. (2) in Li et al. [13] is already an approximated analytical solution (see appendix in [13]), but for further simplicity we can compute solutions for values of T large enough T&t ð Þto avoid significant number of photons to be collected in subsequent laser pulses: Therefore, by a simple derivation of Eq. (18), it is possible to prove that the best (i.e., its minimum) value for F RLD{uneven is 1.24 for r~1:59.

Optimal Gates for Two Time Gates System: Monte Carlo Simulations
Mono-exponential fluorescence decays with t = 2 ns and 250, 1000 and 10000 photon/pixel were simulated, in the presence of Poissonian noise, to find the optimal configuration of a two time gates based system with uneven gate width. The time range between 0-50 ns was used in order to include all photons emitted by the simulated fluorophores. Simulated photon counts were binned into two gates and the F-value was estimated using 15,000 replicates of the simulation in order to estimate the precision of the technique. The beginning of the second time gate was scanned from 0 to 10 ns in 0.05 ns steps. The best F-value for all three photon-count levels was 1.24 at 1.59t.

Numerical Optimization of Fisher Information
In order to optimize Fisher information, it is possible to implement numerical iterative strategies that maximize the gain of Fisher information when creating a denser partition of detection channels. The Rao-Cramer lower bound for the precision of a measurement for a given partition is s 2 0~I {1 0 . As the number of channels is increased, the new lower limit is where the approximation is provided by the zeroth and first elements of the Taylor series; the gain of information DI ð Þ can be evaluated numerically by The addition of a channel may introduce noise s 2 C À Á : s 2 C can be either considered a generic cost, a numerical tolerance or it can represent actual noise (e.g., a photonloss, read-out noise, dark counts) introduced when adding a detection channel. Therefore, iterative algorithms for the numerical optimization of partitions can impose the requirement to allow a channel being partitioned into two new channels. s 2 0 cannot be always evaluated, but its lower limit is necessarily Poissonian noise, i.e. G i . Therefore, it is convenient to apply a stronger constrain to the numerical optimization by using the condition GwC with C~s 2 C G {2 i . The interpretation of the cost is straightforward, but it will be different for different applications (see also Material S1-Costs and photon-losses: case studies). For instance, if a single photon counting system is used, with maximum photon count rate (MCR) and a dark count rate (DCR), C is equal to (DCR/MCR)N 21 , where N is the total count measurable in a given amount of time. For a typical system with DCR,1 kHz, MCR,1 MHz and considering 100 photons counted, C*10 {5 . The numerical factor G was defined for the case in which photons initially detected by one channel are then partitioned into two channels one counting a fraction of photons and the second counting the fraction 1{a. In this case, an iterative algorithm using the criterion max p G p ð ÞD GwC can be used. The same algorithms can be used to evaluate the loss of information L when merging two channels L~{G ð Þ , thus using the criterion min p L p ð ÞD LvC to search for the best detection system. The software provided in Supplementary Material relies on the numerical maximization Fisher information by the analysis of G for time gating (Fig. 1) and by the analysis of L for spectral gating in order to offer examples and useful tools to optimize FLIM and spectral imaging with these bottom-up (increasing channel number) and top-down (decreasing channel number) strategies. Fisher information was evaluated numerically using directly Eq. (8). Substituting G i with f i N, where f i is the fraction of total photon number (N) detected by the i-th channel, it is simple to show that the F-value can be evaluated numerically and minimized using the . Optimization strategies with the direct minimization of F would be computationally expensive. Therefore, a second iterative algorithm that minimizes F was used only when the optimal number of detection channels was found maximizing G or L by the use of the criterion min P F p ð Þ. Contrary to the direct application of the photon partitioning theorem and its corollaries, the optimization of F is not always straightforward.
For instance, in the case of spectral imaging, the relative fraction of one fluorophore depends on the abundances of both fluorophores. In this case, the Fisher information is a matrix and its numerical evaluation can be carried out as: I x 1 ,x 2 ð Þ~I 1 I 12 Inverting the Fisher information matrix permits to find the covariance matrix: Many figure of merits can be defined to evaluate the combined error on the estimation of the relative abundances; for the sake of example, the software we provide uses the average of relative errors: Material S1 Matlab code for Fisher information analysis. The zipped folders contain three subfolders with software designed to optimize Fisher information numerically, making use of the photon partitioning theorem and its corollaries: N fpt_flim_fig1 provides examples of optimization of time gates for FLIM. This software was used to generate some of the panels in Fig. 1. Run fpt_optimize_time_gates.m to find the best partition. N fpt_spec_fig2 provides examples of optimization of spectral gates for spectral imaging. This software was used to generate some of the panels in Fig. 2. Run fpt_optimize_spectral_gates.m to find the best partition. N fpt_check_dimensionality_fig5 provides example of optimization for a generic system. It accepts two HDSSs as an input and it will provide Fisher information analysis for systems capable of fluorescence lifetime, spectral and anisotropy detection and all combinations of these techniques. Run fpt_check_dimensionality.m to find the best detection system to separate two optical signatures. This code was used to generate Fig. 5. (ZIP)