Reproducibility in cytometry: Signals analysis and its connection to uncertainty quantification

Signals analysis for cytometry remains a challenging task that has a significant impact on uncertainty. Conventional cytometers assume that individual measurements are well characterized by simple properties such as the signal area, width, and height. However, these approaches have difficulty distinguishing inherent biological variability from instrument artifacts and operating conditions. As a result, it is challenging to quantify uncertainty in the properties of individual cells and perform tasks such as doublet deconvolution. We address these problems via signals analysis techniques that use scale transformations to: (I) separate variation in biomarker expression from effects due to flow conditions and particle size; (II) quantify reproducibility associated with a given laser interrogation region; (III) estimate uncertainty in measurement values on a per-event basis; and (IV) extract the singlets that make up a multiplet. The key idea behind this approach is to model how variable operating conditions deform the signal shape and then use constrained optimization to "undo" these deformations for measured signals; residuals to this process characterize reproducibility. Using a recently developed microfluidic cytometer, we demonstrate that these techniques can account for instrument- and measurand-induced variability with a residual uncertainty of less than 2.5% in the signal shape and less than 1% in integrated area.


I. INTRODUCTION
In the last 30 years, cytometry has evolved into a powerful technique for clinical diagnostics, drug development, and biotechnology [1,2]. This success has recently motivated fundamental questions and studies designed to better understand the ultimate capabilities of cytometers, as well as their potential for quantitative, reproducible, and even traceable measurements [3-6]. However, in-depth uncertainty quantification (UQ) has yet to be fully realized, limiting efforts to refine the metrology aspects of cytometry.
In this context, signals analysis is a challenging and often overlooked task that has a significant impact on uncertainty. For example, conventional flow cytometers assume that individual events are well characterized by simple properties such as the signal area, height, and width [7-10]. However, such approaches discard the vast majority of information in the measurement. Moreover, cytometers only interrogate a particle once per laser region, so that population variability is inherently convolved with other sources of uncertainty. As a result, it is difficult to determine whether an event corresponds to a valid measurand, characterize reproducibility on a per-event basis, and ultimately, quantify confidence in the measurement process.
The goal of this manuscript is to address such problems by incorporating UQ directly into the signals analysis. We propose a collection of techniques that use scale transformations to identify sources of variation in events arising from changes in particle speed, size, and brightness. The main idea behind these analyses is to represent signals in terms of low-order mathematical interpolations and define the relevant transformations in terms of generic physical models. We then use constrained optimization to map all events onto one another, which also yields the physical parameters (e.g. size) associated with the particles. Using a recently developed microfluidic cytometer (see Ref. [11]), we illustrate how this analysis can: (i) quantify phenomena such as noise in flow conditions and sample variability; and (ii) estimate the per-event reproducibility associated with all remaining sources of uncertainty. We also demonstrate that this analysis provides a foundation for more advanced signal processing techniques by using it to extract the individual singlets comprising a multiplet signal, i.e. an event composed of multiple overlapping singlets.

* paul.patrone@nist.gov
The need to incorporate UQ directly into signals analysis arises from several issues unique to cytometers. For example, cells are only measured once per laser region; moreover, it is difficult to ensure identical flow and optical conditions for all measurands. Thus, it is impossible to directly characterize repeatability and reproducibility of any measurement. Signals analysis and mathematical modeling therefore take on new roles, since they allow us to answer the questions, "how does variation in measurement conditions deform the signals, and how can these deformations be undone?" In this way, the theory approximately reconstructs an imaginary scenario in which one could repeat measurements on identical particles, thereby overcoming experimental limitations.
Multiplets provide a similar motivation for our analysis techniques. Because engineering solutions alone cannot prevent multiplets [12-15], common practice discards their information. However, this introduces uncertainty into population variability estimates [12,14]. Perhaps worse, common data analysis strategies that rely on integrated areas cannot distinguish dim doublets from bright singlets, for example. This leads to an uncomfortable situation in which even the identity and number of measurands have potentially unquantifiable uncertainties. Again, we demonstrate that signals analysis is an appropriate avenue to address such problems, since it allows us to extract individual singlets by understanding how they combine to make the multiplet. More generally, these examples illustrate that signals analysis is a useful tool for untangling sources of uncertainty that become intertwined by the measurement process.
This example of multiplet deconvolution highlights another theme of our work: there is a natural feedback between UQ and signals analysis that must be exploited to realize the full potential of cytometry. While it is obvious that uncertainty estimates arise from data analysis, we also illustrate the reverse: new data analyses are enabled by UQ. For example, we recast multiplet deconvolution as the task of finding the most probable singlets that reconstruct the original multiplet. Thus, estimating the uncertainty in the shape of singlets is a prerequisite step. Similar arguments apply to the related but distinct problem of multiplet detection. More generally, UQ can inform data analysis by ensuring that results are physically meaningful.

Because the present work develops new analysis tools, it is useful to perform validation measurements on particles with well characterized properties. For this reason, the examples below are restricted to fluorescent microspheres. However, extensions to cell-based measurements are straightforward, if not trivial, and we point to relevant issues throughout. We note also that the current manuscript does not consider issues associated with gating strategies and quantification of population variability per se. Such topics are left for future work.
The reader should also note that while our analysis strives to be as general as possible (for example, we need not specify a form of the laser profile), we make certain assumptions that may not apply to all cytometers. These relate to uniformity of the laser profile perpendicular to the flow direction and to light-collection geometric factors. Importantly, these assumptions allow us to formulate the scale transformations as relatively simple, closed-form expressions, thereby facilitating downstream numerical optimization. An alternative would be to tailor our approach to a specific cytometer through more detailed modeling, but this may introduce significant computational overhead. Such issues are discussed in Sec. V.
The rest of the manuscript is organized as follows. Section II presents the main assumptions of our analysis in the context of a generic model of a cytometry measurement (II A); derives the associated scale transformations (II B); and formulates the optimization problem needed to undo deformations associated with variable measurement conditions (II C). Section III validates these methods using experimental data. Section IV highlights the usefulness of this analysis for UQ (IV A) and multiplet deconvolution (IV B). Section V considers this work in the greater context of cytometry and signals analysis thereof. An appendix presents technical aspects of the multiplet deconvolution.

II. MOTIVATION AND MAIN IDEAS
A key observation motivates our work: under generic conditions and for a given cytometer, the shapes of all signals are identical up to a set of linear transformations that depend only on the particle properties. As will become clear, these transformations can be interpreted as changes of units that bring the numerical values of different measurements into agreement. This provides a quantitative framework for comparing events, e.g. to determine if they are valid measurands, characterize their properties, and estimate variation about mean behavior. The goal of this section is to specify the conditions under which this observation holds and develop the tools needed to compare signals.
A key challenge in formulating this analysis is that the measurements are given in a form that does not permit direct comparison. Part of our task is therefore to convert the signals we can acquire into ones we can use. The multitude and complexity of steps involved motivates a separation of the analysis into three subsections. In Sec. II A, we propose a model that characterizes deformation in signals arising from differences in particle speed and size. Critically, the particles are assumed to have the same trajectory. In Sec. II B, we develop the machinery to convert signals acquired over the same time interval (which is experimentally feasible) to ones having the same trajectory length. Section II C combines these results to solve the backward problem: determine the particle properties by undoing the signal deformations.

A. Physical Effects Causing Signal Variation
Consider the interplay between physical, chemical, and biological processes during a cytometry measurement. We take an event to be the fluorescence signal f(t) collected as a function of time t while a bead or cell passes a laser excitation region; see Fig. 1. For particles moving with a constant velocity and parallel to the fluid flow, a model of this process is

f(t) = ∫_{DL} dx dy dz C(x, y, z − vt) Ψ(x, y, z) Φ(x, y, z),    (1)

where C(x, y, z − vt) is the concentration of fluorophores on the particle, Ψ(x, y, z) is the laser light intensity, Φ(x, y, z) is the amount of emitted fluorescent light (per unit laser light and fluorophore number) coming from (x, y, z) and collected by the photodetector, v is the constant advection velocity of the particle, DL is the spatial domain over which the signal is generated, and z is parallel to the direction of flow, i.e. the axial direction. The goal of our analysis is to quantify how f(t) changes as a function of the bead radius R, velocity v, and concentration C, so that we can then solve the reverse problem of determining these parameters from f(t).

FIG. 1. Schematic of the signal generation process associated with a cytometry event. As a bead or cell traverses the laser profile, the concentration of fluorophores and object shape convolve with the laser profile to create fluorescent light. This light is converted to a voltage by a photodetector. See Eq. (1) for a mathematical description of this process. Conceptually we can imagine the signal as beginning and ending when the particle reaches an initial and final position as it moves with the fluid. We refer to this as a constant-trajectory signal. In practice, it is easier to acquire a signal over a constant time interval, i.e. by approximately centering the peak in a time window of fixed duration, without regard to the distance traveled.
We first simplify Eq. (1). Considering, for example, the microfluidic cytometer in Ref. [11], we assume that Eq. (1) can be reduced to

f(t) = ∫_{DL} dx dy dz C(x, y, z − vt) ψ(z),    (2)

where ψ(z) depends only on z and accounts for both the laser and collection optics. In an abuse of terminology, we henceforth refer to ψ(z) as the laser profile. Equation (2) is valid for a wide range of operating conditions. For example, letting ρ = (x, y) denote the radial direction, Eq. (2) applies when: (i) differently sized, radially symmetric (in ρ) particles on a single streamline encounter a laser profile and geometric factors whose product is linear in ρ; and (ii) arbitrarily shaped particles on any streamline encounter a laser profile and geometric factors that are independent of ρ. Either case may happen when the particles are small relative to the laser interrogation region. See Ref. [11] for relevant details associated with the device studied in this manuscript, including in particular the focusing strategy used to ensure that particles do not cross streamlines.
We model C(x, y, z) as a step function of the form

C(x, y, z) = c Θ(R − (x^2 + y^2 + z^2)^{1/2}),    (3)

where c is a constant fluorophore concentration, R is the particle radius, and the Heaviside function is defined such that Θ(r) = 1 if r > 0 and Θ(r) = 0 otherwise. Equation (3) corresponds to a bead or cell with a uniform concentration of fluorophores throughout its volume. Through appropriate modification of the Heaviside function this expression can describe, for example, surface concentrations. Moreover, Eq. (3) can be parameterized to model non-radial deformations associated with deformable cells, although such tasks are beyond the scope of this manuscript.
In light of these simplifications, Eq. (1) becomes

f(t) = c ∫_{−L0}^{L0} dz ψ(z) g(z − vt),    (4)

where g(z) = π(R^2 − z^2) Θ(R − |z|) is the cross-sectionally integrated form of the step function in Eq. (3), and where we assume that the laser is fully contained inside the domain [−L0, L0] for some length L0. For later convenience, we assume (without loss of generality) that L0 satisfies the inequality

L0 > a + R,    (5)

where the domain [−a, a] is the smallest set for which ψ(z) > 0. Physically, inequality (5) implies an event in which we track a particle starting before it enters the laser and ending after it fully exits; the corresponding function f(t) is padded on the left and right by zeros. Because L0 is constant, the locations at which we start and stop tracking the particle are fixed. Taking t = 0 as the time when the particle is at the (arbitrary) "center" z = 0 of the laser, f(t) is defined on the interval [−τ, τ], where τ = L0/v. Equation (4) implies that increasing or decreasing v "compresses" or "stretches" the signal in the time domain. Thus, we should be able to solve the reverse problem: estimate the relative velocities by undoing the deformations in such a way that the signals coincide. To achieve this, it is convenient to consider Fourier representations of the signals. We denote the corresponding mode weights by

f̂(k; τ) = ∫_{−τ}^{τ} dt f(t) e^{−ikt/τ},

where k = πn for any integer n. In practice, we assume that −M ≤ n ≤ M, where M is a mode cutoff associated with the noise floor of the signal; see also Sec. III. Unless otherwise stated, all sums over k range from −Mπ to Mπ. Note that ψ̂(k) is unknown.
In light of these assumptions, Eq. (4) becomes

f(t) = c Σ_k ĝ(k; R) ∫_{−L0}^{L0} dz ψ(z) e^{ik(z − vt)/L0},  ĝ(k; R) = (1/2L0) ∫_{−L0}^{L0} dz g(z) e^{−ikz/L0},    (6)

where we have used the fact that

g(z − vt) = Σ_k ĝ(k; R) e^{ik(z − vt)/L0}    (7)

for the periodically extended g. To further simplify this, use inequality (5) to impose the periodicity relationship C(x, y, z − vt) = C(x, y, z − vt + 2nL0), where n ∈ Z is an integer. This assumption does not change the signal on the domain

−τ ≤ t ≤ τ.    (8)

However, it does allow us to invoke the convolution theorem [16], which yields

f(t) = 2L0 c Σ_k ψ̂(k) ĝ(k; R) e^{ikvt/L0}.    (9)

Equations (6) and (9) illustrate how the measurand and measurement conditions affect the signal. Varying the concentration c and velocity v (which controls τ) alters the height and width of the signal. Changes in particle size R have more complicated effects as characterized by Eq. (6), since a particle is convolved with a larger fraction of the laser profile as R increases. Taking the Fourier transform of Eq. (9) with respect to time (over the interval −τ to τ) yields

f̂(k; τ) = 4L0 τ c ψ̂(k) ĝ(k; R).    (10)

The inclusion of τ in the argument of f̂(k; τ) is to emphasize the time domain over which the transform is taken, which is important in the following sections. Letting f(t) and f0(t) denote test and reference signals, the laser profile ψ̂(k) is eliminated by taking the ratio

f̂(k; τ) / f̂0(k; τ0) = [v0 c ĝ(k; R)] / [v c0 ĝ(k; R0)],    (11)

where quantities with 0 subscripts are associated with f0(t) and the trajectory length L0 is assumed to be the same for both particles. In the remainder of the manuscript, we assume that τ0 = L0/v0, i.e. the reference particle defines the length and time scales against which we compare all events. Equation (11) is the key result of this section: it defines the shape of one signal in terms of another and as a function of the particle properties, provided the trajectories are identical.
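As a numerical sanity check on the velocity scaling implied by Eq. (11), the sketch below compares the mode weights of a reference pulse and a time-compressed copy of it. The Gaussian shape, grid sizes, and (unnormalized) transform convention are our own illustrative choices, not taken from the manuscript.

```python
import numpy as np

def mode_weights(f, tau, n_max=9, n_pts=20000):
    """f_hat(k; tau) = integral over [-tau, tau] of f(t) exp(-i*k*t/tau) dt,
    with k = pi * n (an illustrative normalization convention)."""
    t = np.linspace(-tau, tau, n_pts, endpoint=False)
    dt = t[1] - t[0]
    ns = np.arange(-n_max, n_max + 1)
    return (f(t) * np.exp(-1j * np.pi * np.outer(ns, t) / tau)).sum(axis=1) * dt

tau0, v = 1.0, 1.3                    # reference time scale and relative speed (v0 = 1)
f0 = lambda t: np.exp(-50 * t**2)     # stand-in reference event, padded by zeros
f = lambda t: f0(v * t)               # same particle moving v times faster

w0 = mode_weights(f0, tau0)
w = mode_weights(f, tau0 / v)         # constant-trajectory window tau = L0 / v
print(np.max(np.abs(w - w0 / v)))     # weights scale by v0/v, consistent with Eq. (11)
```

Because the dimensionless frequency k = πn is common to both windows, the entire velocity dependence reduces to the scalar factor v0/v when R = R0 and c = c0.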

B. Converting between fixed-time and fixed-trajectory representations
In practice, it is difficult to measure an event over a fixed trajectory length and variable time domain. It is more common for acquisition to occur in fixed time windows, which we assume to be [−τ0, τ0]. However, if τ0 remains fixed, the corresponding trajectory length of each particle must vary, which violates the key assumption of the previous section, so that we cannot use Eq. (11). The resolution to this problem is to identify a mapping that converts a fixed-time Fourier representation to one on a fixed spatial domain. Observe first that in Eq. (10), the dimensionless frequency k is common to all signals; moreover, f̂(k; τ) is linear in the particle-dependent time scale τ = L0/v. Instead of f̂(k; τ) we are given a measured signal f̂m(k; τ0), where

f̂(k; τ) = Σ_{k'} X_{k,k'} f̂m(k'; τ0),    (12)

and f̂(k; τ) and f̂m(k'; τ0) are the constant-trajectory and constant-time representations of the event. The subscript m emphasizes that the latter is the measured signal. The time domains associated with the left and right sides of Eq. (12) are Dv = [−τ, τ] and D0 = [−τ0, τ0], respectively. We temporarily assume that τ, τ0, and L0 are known; in the next section we show how to find these quantities. Equation (12) does not itself specify the operator X. Its definition depends on how we wish to extend or truncate a signal when mapping it to a constant-trajectory domain. There are two equivalent perspectives one may take: active or passive. In the former, we deform (i.e. expand or contract) the time series so that its constant-trajectory domain Dv becomes D0. This collapses all time series onto one another. In the latter perspective, we deform D0 to become Dv, keeping the time series unchanged. While both approaches are equivalent, the corresponding derivations have subtle differences.
To better understand these issues, consider the case in which v > v0. Clearly Dv is a subset of D0. In the passive approach, we are given f̂m(k; τ0), i.e. the Fourier modes on D0, and wish to find the corresponding modes on the smaller domain Dv. In this sense, X is a "domain-restriction" operator. Assume that inequality (5) holds, so that fm(t) → 0 as t → ±L0/v. Defining the restriction fr(t) = fm(t) for t ∈ [−τ, τ], we take the Fourier transform of fr(t) on Dv to find

X_{k,k'} := (1/2τ0) ∫_{−τ}^{τ} dt e^{i(k'/τ0 − k/τ)t} = (v0/v) sin(k'v0/v − k) / (k'v0/v − k),    (13)

where k = πn, k' = πn', and the := symbol means that we define the left-hand side to be equal to the right-hand side. We adopt the convention that

sin(0)/0 := 1.    (14)

To arrive at this result via the active perspective, we scale the argument of fm(t) by a factor v0/v, so that fm(t) → fm(v0t/v). Then, restricting fm(v0t/v) to the domain D0 and taking the Fourier transform yields an estimate that we denote f̂r(k; τ0) in a slight abuse of notation. To convert this to f̂(k; τ) (i.e. the mode weight on the domain Dv), we invoke Eq. (11) with R = R0 and c = c0, which yields f̂r(k; τ) = (v0/v) f̂r(k; τ0). This derivation suggests the interpretation that X is also a "signal-expansion" operator, since v > v0. We leave it as an exercise to the reader to show that the corresponding matrix X is identical to that given by Eq. (13). When v < v0, the domain Dv = [−τ, τ] contains the interval [−τ0, τ0], so that in the passive interpretation, X is a "domain-extension" operator that extends fm(t) from D0 onto Dv. While there is no unique extension fe(t), we assume without loss of generality that fm(t) → 0 as t → ±τ0. This together with inequality (5) suggests the definition

fe(t) = fm(t) for t ∈ [−τ0, τ0],  fe(t) = 0 otherwise.    (15)

Again, taking the Fourier transform on Dv, one finds

X_{k,k'} = (1/2τ0) ∫_{−τ0}^{τ0} dt e^{i(k'/τ0 − k/τ)t} = sin(k' − kv/v0) / (k' − kv/v0).    (16)

In the active interpretation, X is a "signal-contraction" operator; we leave derivation of the equivalence with Eq. (16) as an exercise for the reader.
Combining these results, we define f(t) on Dv to be

f(t) = fr(t) if v ≥ v0,  f(t) = fe(t) if v < v0,    (17)

with f̂(k; τ) being the corresponding Fourier transform. In both Eqs. (13) and (16), setting v → v0 yields the identity transformation, as expected.
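The active perspective can be mimicked numerically by rescaling the time axis of a measured trace and interpolating. The sketch below uses a hypothetical asymmetric pulse and linear interpolation (our own choices for illustration) to verify that a faster particle's fixed-time trace collapses onto the reference.

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 4001)                         # fixed acquisition window D0
pulse = lambda u: np.exp(-40 * u**2) * (1 + 0.3 * u)     # hypothetical asymmetric shape
f0 = pulse(t)                                            # reference event (v0 = 1)

v = 1.4
fm = pulse(v * t)                                        # faster particle: compressed trace

def collapse(fm, t, v, v0=1.0):
    """Active view of X: scale the argument of fm by v0/v (signal expansion)."""
    return np.interp(v0 * t / v, t, fm, left=0.0, right=0.0)

f_collapsed = collapse(fm, t, v)
print(np.max(np.abs(f_collapsed - f0)))                  # small interpolation residual
```

The residual here is purely interpolation error; in the manuscript's formulation the same operation is carried out exactly on the truncated Fourier modes.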
In practical settings, distinct signals fm(t) may not be identically centered on the domain D0 (i.e. before applying X). However, Eq. (4) assumes that the center of any given particle is at z = 0 when t = 0, and moreover X deforms the signal about t = 0. Thus, it is necessary to consider transformations of the form fm(t) → fm(t + ∆t). As before, assume that fm(t) is padded by zeros as t → ±τ0, so that cyclic permutations by sufficiently small ∆t only wrap zeros around the left and right sides of the peak. Assuming that fm(t) is periodic with a period of 2τ0, one finds that

∫_{−τ0}^{τ0} dt fm(t + ∆t) e^{−ikt/τ0} = e^{ik∆t/τ0} f̂m(k; τ0),    (18)

so that f̂m(k; τ0) → f̂m(k; τ0) exp(ik∆t/τ0) under time translation. While the padding assumption is technically not necessary for the validity of Eq. (18), it ensures that subsequent transformations by X do not truncate nonzero parts of the signal.
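The translation rule of Eq. (18) is easy to verify numerically. In the sketch below (our own discretization and normalization), multiplying the mode weights of an off-center pulse by exp(ik∆t/τ0) reproduces the modes of the centered pulse.

```python
import numpy as np

def modes(ft, t, tau, n_max=9):
    """Mode weights on [-tau, tau] with k = pi*n (illustrative normalization)."""
    ns = np.arange(-n_max, n_max + 1)
    dt = t[1] - t[0]
    return (np.exp(-1j * np.pi * np.outer(ns, t) / tau) * ft).sum(axis=1) * dt

tau0 = 1.0
t = np.linspace(-tau0, tau0, 4000, endpoint=False)
dt_shift = 0.1
f_offcenter = np.exp(-60 * (t - dt_shift)**2)    # peak displaced from the window center
f_centered = np.exp(-60 * t**2)                  # the same event after translation

ns = np.arange(-9, 10)
w_shifted = modes(f_offcenter, t, tau0) * np.exp(1j * np.pi * ns * dt_shift / tau0)
print(np.max(np.abs(w_shifted - modes(f_centered, t, tau0))))
```

The agreement relies on the zero padding assumed in the text: the cyclic wraparound only moves zeros, so the periodic-extension step is harmless.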

C. Signal Matching
Assume that we have reference and test signals f0(t) and fm(t), both mapped to the interval −τ0 ≤ t ≤ τ0 and sufficiently padded by zeros on the left and right. Both Fourier expansions use the same set of frequencies. Let k denote the vector of frequencies, and f̂0 and f̂m denote the corresponding vectors of Fourier coefficients. Also define the matrix operators X, T, and S having elements

X_{k,k'}(v) given by Eq. (13) for v ≥ v0 and Eq. (16) for v < v0,    (19)

T_{k,k'}(∆t) = δ_{k,k'} e^{ik∆t/τ0},    (20)

S_{k,k'}(R, R0) = δ_{k,k'} ĝ(k; R0)/ĝ(k; R),    (21)

where Eqs. (19) and (20) derive from matrix analogues of Eqs. (13), (16), and (18). The quantity ĝ(k; R) is given by Eq. (6) and defines the relative transformation associated with changing particle radius. The vector analogue of f̂(k; τ) is denoted f̂(τ).
For noiseless signals and a known set of transformation parameters ∆t, R, R0, v, and v0, Eq. (11) implies that

f̂0 = Λ f̂m,    (22)

where

Λ = (c0 v)/(c v0) S X T.    (23)

The interpretation of Eq. (22) is straightforward. The product Λ f̂m(k; τ0): (i) centers f̂m; (ii) scales it to the domain Dv; (iii) matches the radius scaling to R0; and (iv) normalizes the amplitude to the reference signal. The order of operations of the matrices mirrors the assumptions of the analysis above and cannot be changed. In practice, the transformation parameters must be determined from noisy signals. This motivates the objective

L(∆t, R, R0, v, c) = ||Λ f̂m − f̂0||^2,    (24)

where the square is interpreted as the sum over the magnitude-squared of the elements of the complex vector argument. Ostensibly, minimizing L as a function of ∆t, R, R0, v, v0, c, and c0 should yield their numerical values. However, many of these quantities only appear in Λ via the ratios v/v0, R0/L0, and c/c0. As a result, we can only determine their relative values. This reflects a deeper freedom that we have not yet exploited: the ability to arbitrarily pick the numerical scale of the coordinate system. For convenience, we pick c0 = 1, L0 = 1 (i.e. the reference particle traverses a distance of 2), and v0 = 1, which implies that τ0 = 1, with all units now being dimensionless. The latter two choices fix the time and length scales of the system, so that R, R0, and v are defined relative to the reference trajectory and velocity. We use this convention throughout the remainder of the manuscript.
With these choices, we define the true c, v, R, R0, and ∆t to be solutions to the optimization problem

min over (c, v, R, R0, ∆t) of L.

The objective L is highly nonlinear and may have local minima. Thus, it is important to identify reasonable initial points as inputs to the optimization. While these issues are discussed at greater length in Sec. III, we note that simple properties of the signal, such as its height, width at half max, etc., yield initial conditions that should be sufficiently close to optimal solutions to ensure convergence of the optimization.
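As a toy illustration of this kind of optimization (not the authors' implementation), the sketch below recovers amplitude, speed, and centering parameters for a synthetic Gaussian event using SciPy's Nelder-Mead minimizer; radius effects are ignored by assuming R = R0, and the time-domain interpolation stands in for the mode-space operators.

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(-1.0, 1.0, 2001)
f0 = np.exp(-40 * t**2)                           # stand-in reference signal (R = R0)

def undo(params, fm):
    """Apply the inverse deformations (amplitude, speed, centering) to fm."""
    c, v, dt = params
    return np.interp(t / v + dt, t, fm, left=0.0, right=0.0) / c

def loss(params, fm):
    return np.sum((undo(params, fm) - f0) ** 2)

# synthetic "measured" event: brighter, faster, and off-center
c_true, v_true, dt_true = 1.5, 1.2, 0.05
fm = c_true * np.exp(-40 * (v_true * (t - dt_true)) ** 2)

# crude initial guess; in practice, height and width at half max give better starts
res = minimize(loss, x0=[1.0, 1.0, 0.0], args=(fm,), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000})
print(np.round(res.x, 3))
```

For this noiseless toy problem the minimizer recovers the generating parameters; the manuscript's objective additionally carries the radius ratio through ĝ(k; R0)/ĝ(k; R).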

III. VALIDATION WITH EXPERIMENTAL DATA
To validate the analysis presented in Sec. II, we consider a collection of 1302 events measured in the optofluidic device described in Ref. [11]. The measurands are 15.3 µm diameter polystyrene microspheres with dispersed Dragon Green fluorophore. We use a high-speed camera system to manually verify that all events correspond to singlets. The fluorescence signals, denoted fm, are discretized on a 16-bit data acquisition card that records samples at 2 MHz. Thus, the fm are vectors whose entries correspond to the discrete time interval over which the data is collected. The digitizer outputs fluorescence values in units of volts; we divide these measurements by 1 V to non-dimensionalize.
As a preprocessing step, we use a moving-average filter to smooth peaks, followed by a polynomial-fit peak finder to estimate the height hm, width wm, and location of the peak maximum for each event; see Refs. [11,17,18]. The first two quantities are used later in the optimization. Using the third, we perform a cyclic permutation of the data (corresponding to temporal shifts) to approximately align the peak maximum with the midpoint of the time interval, which we rescale to be [−1, 1].
Having estimated these quantities, we return to the original (non-smoothed) signal for subsequent analysis. Because the signals contain noise (likely from the photodetector and/or counting statistics of photons), we perform a discrete Fourier transform (DFT) to identify a noise floor, which begins around the tenth mode, corresponding to M = 9 according to our indexing convention; see also Ref. [19]. We take this value to be a cutoff for a low-pass filter and also keep the complex-conjugate weights, which correspond to the last 9 modes (the case k = 0 is its own conjugate mode). Given that the DFT yields a spectral representation that is a continuous interpolation of the data, we can directly use the corresponding mode weights and frequencies in calculations involving Eq. (24).
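A minimal sketch of this style of low-pass filtering, using a synthetic noisy pulse rather than the measured data: modes 0 through M = 9 and their complex conjugates are retained from a DFT and all others discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1200                                    # samples per event (cf. the 2 MHz digitizer)
t = np.linspace(-1.0, 1.0, N, endpoint=False)
clean = np.exp(-40 * t**2)                  # stand-in noiseless event
noisy = clean + 0.02 * rng.standard_normal(N)

M = 9                                       # mode cutoff at the noise floor
F = np.fft.fft(noisy)
F_lp = np.zeros_like(F)
keep = np.r_[0:M + 1, N - M:N]              # modes 0..M plus their complex conjugates
F_lp[keep] = F[keep]
filtered = np.fft.ifft(F_lp).real

print(np.abs(filtered - clean).max(), np.abs(noisy - clean).max())
```

Keeping the conjugate modes guarantees a real-valued reconstruction, mirroring the convention described in the text.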
To test Eq. (24), we pick a reference event f0 at random from among those having a height and width roughly equal to the median values for this population. The specific choice of this event is unimportant, since the goal is to demonstrate that all signals can be collapsed to any chosen reference. Next, we choose Ns additional samples fm and define the modified objective

L_{Ns} = Σ_{m=1}^{Ns} ||Λ(cm, vm, Rm, R0, ∆tm) f̂m − f̂0||^2,    (25)

where R0 is the unknown radius of the reference particle and (cm, vm, Rm, ∆tm) are the unknown transformation parameters associated with fm. In Eq. (25), m has been upgraded from a subscript to an index. Because the beads have on the order of 10^9 fluorophores per particle, we anticipate that variations in cm (e.g. due to shot noise) are negligible. Thus, we set cm = 1 for all values of m, so that it does not play a further role in the analysis.
In principle, minimizing L_{Ns} with respect to the unknown parameters yields estimates of their values, as well as a "consensus" estimate of R0. However, several computational considerations limit our ability to use Eq. (25) directly. First, the objective function is nonlinear and contains on the order of 20Ns terms and 3Ns parameters. Second, the objective has rapid oscillations due to the forms of X(v) and ĝ(k; R). Third, in the limit that R → 0 for R/R0 constant, the ratio ĝ(k; R0)/ĝ(k; R) → (R0/R)^3, so that L_{Ns} develops a connected set of solutions. That is, only the ratio of R0 to R can be estimated from L_{Ns}.
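The (R0/R)^3 degeneracy can be checked explicitly if one adopts a concrete form for ĝ. For the uniform sphere of Eq. (3), the cross-sectionally integrated concentration is proportional to (R^2 − z^2) for |z| < R, whose Fourier transform has the closed form used below; this expression is our inference from the stated model, not quoted from the manuscript.

```python
import numpy as np

def g_hat(k, R):
    """Assumed closed form: Fourier transform of pi*(R**2 - z**2) on |z| < R,
    g_hat(k; R) = 4*pi*(sin(kR) - kR*cos(kR)) / k**3  (up to normalization)."""
    kR = k * R
    return 4 * np.pi * (np.sin(kR) - kR * np.cos(kR)) / k**3

k = np.pi * np.arange(1, 10)                # dimensionless frequencies k = pi*n
for R in (0.1, 0.01, 0.001):
    ratio = g_hat(k, 2 * R) / g_hat(k, R)   # R0/R = 2 held fixed as R -> 0
    print(R, ratio[0])                      # approaches (R0/R)**3 = 8
```

The small-R limit follows from sin(x) − x cos(x) ≈ x^3/3, so ĝ scales with the particle volume and only the ratio R0/R survives in the objective.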
A resolution to these problems is to consider a regularized version of the objective

L_ε = L_{Ns} + ε (R0 − R̃0)^2,    (26)

where ε ≪ 1 is a regularization parameter and R̃0 is an estimate of the particle radius given by outside sources of information. For the beads under consideration, manufacturer specifications indicate that the average radius is approximately 7.625 µm, which we use as the value for R̃0 in dimensional units. To convert to dimensionless units, we note that: (i) the digitizer samples at 2 MHz; (ii) each peak is 1200 samples long; and (iii) the average velocity as estimated from time-of-flight between two interrogation regions is 337 mm/s. Thus, the characteristic distance dc traversed by a particle during an event is

dc = (337 mm/s) × (1200 samples / 2 MHz) ≈ 202 µm.

The characteristic radius 7.625 µm normalized by dc and doubled (to account for mapping to −1 ≤ z ≤ 1) yields R̃0 = 0.0755. We also set ε = 10^{−2} and fix Ns = 5.
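The unit conversion above can be reproduced with a few lines of arithmetic, using the values quoted in the text:

```python
# Worked unit conversion from Sec. III (all inputs are quoted in the text).
sample_rate = 2.0e6          # Hz, digitizer sampling rate
n_samples = 1200             # samples spanned by one event
v_avg = 337e-3               # m/s, time-of-flight velocity estimate
r_bead = 7.625e-6            # m, manufacturer-specified mean radius

event_duration = n_samples / sample_rate     # 6.0e-4 s
d_c = v_avg * event_duration                 # characteristic distance, about 2.0e-4 m
R0_tilde = 2 * r_bead / d_c                  # doubled for the mapping onto [-1, 1]
print(round(R0_tilde, 4))                    # about 0.0754, matching 0.0755 to rounding
```

The small difference from the quoted 0.0755 is consistent with rounding of the input values.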
To estimate transformation parameters associated with the remaining curves, we minimize L for each of the remaining 1296 curves separately, using the consensus value of R0. The results of this exercise are shown in Fig. 2. The top subplot shows the original time traces, while the bottom subplot shows the data collapse; the inset shows the relative errors. See also Fig. 3. We also compute the sample mean ⟨R⟩ and sample variance σR^2 of the particle radius to estimate the coefficient of variation (CV), given by CV_R = σR/⟨R⟩ [20]. We find CV_R = 3.18%, which is consistent with the manufacturer-specified range of particle sizes (4.4%) and the number of measurements considered.
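The CV computation itself is a one-liner; the sketch below applies it to stand-in radii (synthetic values with roughly 3% relative spread, since the fitted radii are not tabulated here):

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in dimensionless radii: scattered about R0_tilde = 0.0755 with ~3% spread
R = 0.0755 * (1 + 0.0318 * rng.standard_normal(1296))

cv = R.std(ddof=1) / R.mean()                # CV_R = sigma_R / <R>
print(f"CV_R = {100 * cv:.2f} %")
```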

FIG. 2. Example of data collapse for 1302 time-series. Top:
The original curves to which we apply the scale transformations described in the main text. The bold yellow time trace is used as the reference. The time and fluorescence values are given in their original units, although subsequent analysis is done after non-dimensionalizing. Bottom: All 1302 curves after data collapse and conversion back to the original units. To achieve collapse, we use Eq. (22) to turn each measured curve into a realization of f0(t). For the purposes of defining units in this and other figures, we adopt this active interpretation of the transformations. The inset shows the point-wise residuals between the transformed curves and the reference. The difference is normalized by the maximum value of the reference. Note that the residuals have the characteristic frequency of the mode cutoff.

IV. APPLICATIONS
We now consider two applications of the analysis discussed in Sec. III: estimating per-event measurement reproducibility and doublet deconvolution. A key theme of these examples is that UQ enables one to extract otherwise inaccessible information from cytometry measurements. In particular, UQ allows us to resolve a uniqueness problem of identifying the most probable singlets that generate a multiplet. Our main task is to recast the previous results in a probabilistic framework and demonstrate how this leads to new kinds of signals analyses.

FIG. 3. Absolute values of the mode weights associated with the 1302 curves in Fig. 2 after transformation and collapse via Eq. (22). The inset shows the signal-to-noise ratio, which was computed by: (i) using the first 10 modes to determine the transformation parameters; (ii) applying the transformation matrices to the first 100 modes of each signal; and (iii) computing the ratio of the absolute value of the mean to the standard deviation of the resulting mode weights. The dotted red vertical line shows the mode cutoff, which occurs when the mode weights approach a signal-to-noise ratio of unity.
A. Uncertainty Quantification

Equation (22) implies that the optimal transformation parameters undo signal deformation due to effects such as particle speed, size, brightness, etc. Thus, each transformed signal Λ f̂m can be viewed as a realization of a measurement of identical particles, i.e. a version of f0. Treating these as independent and identically distributed random variables, we can estimate the per-event reproducibility in terms of corresponding statistical estimators. For example, if Aj denotes the integrated area of the jth transformed signal, then reproducibility in area measurements is given in terms of a standard deviation σA computed via

σA^2 = (Ne − 1)^{−1} Σ_{j=1}^{Ne} (Aj − ⟨A⟩)^2,

where ⟨A⟩ is the sample mean of the areas and Ne is the number of events. Figure 5 shows the results of this analysis for the 1302 curves considered in Fig. 2. The inset to the bottom plot of Fig. 2 indicates that the point-wise (in time and relative to the reference amplitude) reproducibility in shape of the filtered signals is on the order of 2% or less. We make this last observation more precise by considering the Fourier modes of each realization of f0, which we denote by f0,j. As a practical matter, it is easier to consider the real and imaginary parts of f̂0,j(k; τ0) separately, since their fluctuations should be independent. Dropping explicit reference to τ0, we decompose f̂0,j(k) = f̂j,Re(k) + i f̂j,Im(k), noting that the k = 0 mode is purely real; see Appendix A for more detailed motivation of this decomposition. There are Ne such realizations of f̂j, where 1 ≤ j ≤ Ne. The corresponding average (sample mean) weights and perturbations are f̄ and ∆f̂j.
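A sketch of the area-reproducibility estimator on stand-in transformed signals (synthetic realizations of f0 plus small residual noise; the noise level is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(-1.0, 1.0, 1200)
f0 = np.exp(-40 * t**2)
# stand-in transformed events: realizations of f0 plus small residual noise
events = f0 + 0.01 * rng.standard_normal((500, t.size))

dt = t[1] - t[0]
A = events.sum(axis=1) * dt                  # integrated area A_j of each event
sigma_A = A.std(ddof=1)                      # sample standard deviation of the areas
print(sigma_A / A.mean())                    # relative reproducibility in area
```

Because integration averages over many samples, the relative spread in area is far smaller than the point-wise noise, consistent with the sub-1% area reproducibility reported in the abstract.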
We also use the f̃_j to construct a covariance matrix Ξ, as well as the probability of a deviation ∆f̃,

P(∆f̃) ∝ exp[−(1/2) ∆f̃ᵀ Ξ⁻¹ ∆f̃], (31)

where the proportionality factor depends on the dimensionality and determinant of Ξ. Equation (31) characterizes uncertainty in the shape of a signal.¹³ Likewise, we construct a probability density

P(∆χ) ∝ exp[−(1/2) ∆χᵀ Υ⁻¹ ∆χ], (32)

where ∆χᵀ = (R − R̄, v − v̄) characterizes deviation from the average radius and velocity, and Υ is the corresponding covariance matrix. In the next section, we show how these probability densities play a fundamental role in the task of doublet deconvolution.
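A minimal numerical sketch of Eq. (31) and its analog for the transformation parameters, assuming the covariance is estimated from per-event perturbations stacked row-wise; the function names are ours, and the normalizing constant is omitted, as in the text:

```python
import numpy as np

def build_covariance(perturbations):
    """Covariance matrix (e.g. Xi or Upsilon) from an (N_e x dim) array of
    per-event perturbations, such as stacked Re/Im mode-weight deviations."""
    return np.cov(perturbations, rowvar=False)

def log_prob(delta, cov):
    """Log-probability up to an additive constant: -0.5 * delta^T cov^{-1} delta."""
    return -0.5 * delta @ np.linalg.solve(cov, delta)
```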

B. Doublet Deconvolution
As an illustration of how UQ can inform downstream analysis, consider the task of doublet deconvolution. In typical cytometry protocols, such events are identified indirectly via gating procedures and subsequently rejected.
We propose an alternate strategy based on constrained optimization.
First assume that a doublet d(t) is a linear combination of singlet signals. In Fourier space this implies

d̂ = Σ_j Θ⁻¹(c_j, v_j, R_j, ∆t_j) f̂, (33)

where the parameters c_j, v_j, R_j, and ∆t_j transform the reference signal into the corresponding singlets that comprise the doublet, and d̂ and Θ are the representations of d and Λ in the same basis as ∆f̃ (cf. the Appendix).
We use the inverse Θ⁻¹, since we are transforming from the reference signal to the measured signal. Generalizing Eq. (33) to an arbitrary multiplet m(t) yields

m̂ = Σ_j Θ⁻¹(c_j, v_j, R_j, ∆t_j)[f̂ + ∆f̃_j]. (34)

Assuming that a sample is known to be a multiplet comprised of M singlets, the probability that the singlets have a given set of deformations ∆f̃_1, ∆f̃_2, ..., ∆f̃_M before transformation to m̂ can be expressed as

P_M ∝ exp[−(1/2) Σ_j ∆f̃_jᵀ Ξ⁻¹ ∆f̃_j]. (35)

Likewise, the probability of a set of transformation parameters is given by

Q_M ∝ exp[−(1/2) Σ_j ∆χ_jᵀ Υ⁻¹ ∆χ_j]. (36)

We assume that the true values of ∆f̃_j and ∆χ_j are those that maximize the joint probability P_M Q_M. However, the singlets must also reconstruct the multiplet exactly. Together this suggests the objective

Σ_j [∆f̃_jᵀ Ξ⁻¹ ∆f̃_j + ∆χ_jᵀ Υ⁻¹ ∆χ_j], (37)

which we minimize with respect to the ∆f̃_j and ∆χ_j, subject to the constraint given by Eq. (34). This optimization is performed over M(2𝓜 + 1) + 4M parameters, where M is the number of singlets comprising the signal and 𝓜 is the mode cutoff. In practice, however, the problem can be reduced to a 4M-dimensional one, since the constraint is linear in the ∆f̃_j. We leave details of the optimization to the Appendix. We also set c_j = 1, consistent with Sec. III.
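The reduction to 4M dimensions can be illustrated by eliminating the ∆f̃_j in closed form: because the constraint is linear, minimizing Σ_j ∆f_jᵀ Ξ⁻¹ ∆f_j subject to m = Σ_j Λ_j⁻¹(f + ∆f_j) yields a reduced objective rᵀ S⁻¹ r with r = m − Σ_j Λ_j⁻¹ f and S = Σ_j Λ_j⁻¹ Ξ Λ_j⁻ᵀ. The sketch below is a simplified stand-in, not the paper's implementation: its operator includes only an amplitude scale and a time shift (not velocity or radius), works directly in the time domain, uses an identity covariance, and omits the ∆χ penalty.

```python
import numpy as np

def transform_op(c, dt, n):
    """Hypothetical stand-in for Lambda^{-1}: amplitude scale c and time
    shift dt, built as a dense matrix via the DFT shift theorem."""
    F = np.fft.fft(np.eye(n), axis=0)   # forward DFT matrix
    Finv = np.conj(F) / n               # inverse DFT matrix
    phase = np.exp(-2j * np.pi * np.fft.fftfreq(n) * dt)
    return c * np.real(Finv @ (phase[:, None] * F))

def reduced_objective(theta, m, f0, Xi):
    """Objective after eliminating the delta-f variables in closed form;
    theta packs (c_j, dt_j) for each of the M singlets."""
    n = len(f0)
    ops = [transform_op(theta[2 * j], theta[2 * j + 1], n)
           for j in range(len(theta) // 2)]
    r = m - sum(A @ f0 for A in ops)   # constraint residual at delta_f = 0
    S = sum(A @ Xi @ A.T for A in ops)
    return r @ np.linalg.solve(S, r)   # r^T S^{-1} r
```

With the true transformation parameters the residual vanishes and the reduced objective is zero, so an outer optimizer (e.g. scipy.optimize.minimize) need only search over the transformation parameters themselves.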
Figure 6 illustrates the results of this analysis applied to a synthetic doublet constructed from two singlets taken from Fig. 2. The analysis recovers the original signals to within roughly 5% accuracy or better point-wise in time. The actual transformation parameters associated with each particle are recovered to within approximately 2%. Figure 7 shows the relative errors (with doublets inset) associated with different peak spacings between the singlets. All reconstructions have point-wise errors of less than 10%, even for closely spaced singlets.
Figure 8 shows the results of this analysis applied to two measured signals that were visually verified to be doublets. In the second event (bottom plot), one of the beads was likely out of focus of the camera, leading to a reduced visual signal. See Ref. [11] for more details. In both cases the analysis recovers singlets that are consistent with the synthetic examples shown in Fig. 6. While more work is needed to make this analysis fully robust for commercial settings, the examples provided herein indicate that the constrained-optimization formulation is a useful tool for doublet deconvolution.

FIG. 8. Deconvolution of two visually verified doublets. The inset shows false-color images of the particles traversing the laser interrogation region. In the inset, the horizontal axis is parallel to the flow direction, and the vertical direction is parallel to the laser. The horizontal elongation is due to blurring. In both plots, two lobes are visible. The main figure shows the original signal (blue), as well as the reconstructed singlets (orange and yellow). The sum of the singlets is superimposed in purple over the blue curve but is visually indistinguishable. In the bottom plot, one particle is likely rotated out of the focal plane, which would account for the differences in visual intensities relative to signal peaks. This assumes the geometric factors associated with illumination and light collection are constant for both particles; see Ref. [11] for the validity of this assumption.

A. The Role of Modeling in Cytometry
From a metrology perspective, the operation of a cytometer is at odds with the fundamental assumptions used to characterize reproducibility and uncertainty. The inherent separation of scales (thousands of distinct, micron-sized cells traveling meters per second in a complicated fluid-dynamic system), coupled with the native variability of biological systems, means that it is challenging to repeat an independent measurement on the same particle in the same optical region. Moreover, as Fig. 2 demonstrates, even small variations in reference materials can lead to dramatic changes in raw measurement signals. Thus, a key challenge in characterizing the accuracy of a cytometer arises from an inability to design operating conditions that isolate individual sources of uncertainty. In other words, experimental analysis of uncertainty is difficult.
The modeling herein provides a distinct approach to this problem through synthesis, i.e. building up a prediction of the experimental result that accounts for the cumulative effects of multiple sources of variation. Under ideal circumstances, comparing this synthetic result with reality leads to a unique quantification of the physical phenomena (e.g. particle size, speed, etc.) that generate each signal. Any remaining variation that we cannot account for is then treated as a reasonable proxy for reproducibility.
This distinction between analysis and synthesis highlights both the potential roles and the challenges of using mathematical modeling for device characterization. In particular, the model permits us to infer the relative magnitudes of coupled physical effects, thereby compensating for experimental limitations. But critically, the accuracy of these estimates relies on the validity of the underlying theoretical assumptions. For example, the assumption that the particles are spherical may not be appropriate for deformable cells, requiring revision of Eq. (7). Likewise, the properties of the laser profiles considered herein may not apply to all cytometers.
These observations suggest a need for deeper coordination between UQ, signals analysis, and the design of cytometers. While models can often be revised to account for increasingly complex phenomena (e.g. deformable cells, non-uniform lasers), the associated computations can become too expensive to be useful. In such cases, theory can instead inform design changes that are experimentally achievable and lead to improved accuracy through consistency with modeling assumptions. See also Ref. [11] for related ideas.

B. Further Applications and Open Directions
The analysis presented herein offers routes to solving several outstanding problems in signals analysis for cytometry. In particular, the ability to quantify the reproducibility of a measurement suggests an optimization task wherein one seeks to minimize this uncertainty as a function of operating conditions, e.g. flow velocity, degree of focusing, particle density, etc. Moreover, reproducibility estimates suggest the possibility of propagating uncertainty into population studies so as to inform better gating and classification strategies. In this spirit, the work presented in Ref. [21] may be relevant.
Recent publications have also suggested the importance of cell deformability during the measurement process. Equation (4) makes a key simplifying assumption that requires modification in this case. To the extent that it is possible to parameterize deformation modes (e.g. in terms of spherical harmonics or empirically defined shapes), the transformations associated with Eq. (6) can be generalized to account for non-spherical particles. This ability to extract shape information from a relatively simple time-series could yield significant improvements in throughput relative to imaging cytometers while still characterizing more nuanced information about cell status.
The problem of doublet identification is also largely unresolved. Traditional strategies address this task through subjective gating of populations. However, the distribution of Fourier spectra may provide more objective, probability-based methods. The left plot of Fig. 9 shows the envelope of spectra (red region) associated with the distribution of transformed singlets in Fig. 2. The right plot shows a collection of synthetic doublets; their corresponding spectra (normalized to the same area as the mean singlet) are shown relative to the singlet envelope on the left. Note that for almost any peak separation, at least one mode associated with the doublet falls outside the admissible window for singlets. This suggests that signal shape may be an exquisitely sensitive tool for identifying events that are candidates for our doublet-deconvolution algorithm. Such questions, however, are left for future work.
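The envelope test suggested by Fig. 9 could be prototyped as follows; the array conventions and the use of a simple per-mode min/max window (rather than a probabilistic bound) are our simplifications.

```python
import numpy as np

def flag_doublet_candidates(events, singlets):
    """Flag events whose per-mode |mode-weight| falls outside the min/max
    envelope computed from a reference population of transformed singlets.
    Rows are events, columns are Fourier modes."""
    lo = np.abs(singlets).min(axis=0)
    hi = np.abs(singlets).max(axis=0)
    w = np.abs(events)
    return np.any((w < lo) | (w > hi), axis=1)
```

Flagged events would then become candidates for the deconvolution procedure described above.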

C. Metrics in the Context of Past Work
Signals analysis has been integral to cytometry since its inception [7-10,19]. However, virtually all techniques characterize cells in terms of properties such as the signal area, height, width, etc., and the justifications for such analyses are not universally valid. For example, common practice dictates that forward-scatter (FSC) area vs. height measurements are appropriate for detecting doublets, as cells should nominally follow one another single file [12]. But as Fig. 8 illustrates, cells may pass the laser interrogation region side-by-side or even at a diagonal to the flow direction. A recent study of inertial effects also suggests that typical flow-focusing strategies are ineffective [11]. Furthermore, biological processes of interest may interfere with measurements typically used to distinguish doublets [22].
Many of these problems arise from the fact that quantities such as signal area, width, and height are functionals that map high-dimensional data (i.e. the full event time-trace) to a scalar. Thus, much of the underlying information that could be used to characterize measurands is lost before signals are actually compared. This suggests a need to revisit the order of operations, e.g. by directly comparing full signals before quantifying their differences in terms of simple descriptors.
To better understand this point, it is useful to consider the concept of a metric, which addresses the question: how "far apart" are the generic objects h and g? In other words, the metric, often denoted d(h, g), defines a notion of distance appropriate to the structure of h and g. While a complete treatment is beyond the scope of this work (see Ref. [23]), we note two fundamental properties: (i) d(h, g) ≥ 0 for any objects h and g, i.e. all distances are non-negative; and (ii) d(h, g) = 0 implies h = g.

FIG. 9. Left: Fourier spectra of synthetic doublets (dotted) compared with the uncertainty window for singlets (solid red). For a fixed mode number, the uncertainty window for singlets is the set of all mode-weights between the minimum and maximum values computed in Fig. 2. Synthetic doublets are composed of two copies of a singlet (taken at random from Fig. 2) whose peaks are separated by 0 (green), 10 (blue), 20 (cyan), 30 (yellow), 40 (magenta), and 50 (black) microseconds. The doublet mode-weights are divided by 2 to normalize them to the scale of the singlets. Note that after a separation of only 10 microseconds, the doublet spectra fall outside the uncertainty window of the singlets. Right: The doublets whose spectra are shown in the left plot. Solid lines are the doublets, whereas dotted lines of the same color are the corresponding singlets. The colors have the same interpretation as in the left plot. Note that all doublets appear as a single peak and are visually difficult to distinguish from singlets.
While seemingly abstract, these observations have important ramifications for UQ of cytometry. Conventional analyses implicitly use the absolute value metric and related notions of Euclidean distance as the basis for comparing measurements,

d(f_j, f_k) = |P[f_j(t)] − P[f_k(t)]|, (38)

where P[f_j(t)] is a functional that returns a scalar value (e.g. area, height) associated with the time-series f_j(t). Since the dimensionality of P[f_j(t)] is low, the statement d(h, g) = 0 applied to Eq. (38) implies that two distinct time-series may still be treated as equivalent. For example, a large, dim cell may yield the same integrated area as a small, bright one, despite the objects being very different; yet, according to Eq. (38), both objects would be the same based on intensity alone. Moreover, such approaches make it impossible to separate effects (e.g. flow rate) that may change the signal shape while keeping scalar properties such as the height constant. Such shortcomings have limited UQ studies to those effects due to photodetectors and total uncertainties as characterized by population histograms [24-27].
The objectives considered herein address such problems by defining the metric in terms of the time-series explicitly; that is, Eq. (24) considers the situation d(f̃_j, f̃_k) = ||f̃_j − f̃_k||, in which the notion of distance || · || is applied directly to the full time-series (expressed in terms of its Fourier transform). Equation (24) in particular is a variation on the commonly used ℓ2 or "sum-of-squares" metric [23]. A benefit of this approach is that it leverages the full information content of each signal to characterize multiple sources of variation. However, the resulting analysis is more computationally expensive and does not always have a simple interpretation. Nonetheless, the example of doublet deconvolution highlights the usefulness of such techniques, and we speculate that more advanced signals analyses in cytometry will require further development of appropriate metrics.
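The distinction between scalar-functional and full-signal metrics can be made concrete with a toy example: two Gaussian pulses chosen so that a "small, bright" and a "large, dim" signal have (numerically) identical integrated areas. The Eq. (38)-style metric then vanishes while the ℓ2 distance on the full time-series does not. The pulse parameters below are arbitrary.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
small_bright = 2.0 * np.exp(-((t - 0.5) / 0.05) ** 2)  # tall, narrow pulse
large_dim    = 1.0 * np.exp(-((t - 0.5) / 0.10) ** 2)  # short, wide pulse

area = lambda f: np.sum(f) * dt
d_scalar = abs(area(small_bright) - area(large_dim))          # metric on a scalar functional
d_l2 = np.sqrt(np.sum((small_bright - large_dim) ** 2) * dt)  # metric on the full signal
# d_scalar is essentially zero (the signals are "identical" by area), but d_l2 is not
```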

D. Limitations
The modeling framework described in Sec. II A sets forth the minimum assumptions required for the validity of our analysis. A key goal of the theory is to avoid the need for detailed device characterization, which can be costly. However, our main assumptions on the laser profile and geometric factors may not be applicable to all cytometers. In such cases, using our analysis would decrease confidence in results by introducing additional model-form uncertainty.
In principle, such limitations can be overcome by more detailed modeling of the experimental system. The general structure of Eq. (24) can be maintained, although the transformation matrix Λ may not exist in closed form. Rather, it may be necessary to evaluate it on the fly in terms of a simulation or other computational model of the measurement process. In this case, a key challenge will be to formulate an optimization routine that can minimize Eq. (24) in a reasonable amount of time. Reduced-order modeling or computationally inexpensive approximations are possible routes for addressing such problems.

Appendix A: Optimization for Multiplet Deconvolution

In this appendix we describe the mathematical formulation and solution of the optimization problem for multiplet deconvolution. For convenience, we restate the key equations underlying this problem. Specifically, the objective is given by

Σ_j [∆f̃_jᵀ Ξ⁻¹ ∆f̃_j + ∆χ_jᵀ Υ⁻¹ ∆χ_j], (A1)

whereas the constraint in the complex basis is

m̃ = Σ_j Λ⁻¹(c_j, v_j, R_j, ∆t_j)[f̃ + ∆f̃_j], (A2)

where f̃ is the complex representation of f. Our goal is to convert Eq. (A2) to the same basis as f̂ and then simplify the optimization problem implied by Eq. (A1).
Define Λ⁻¹_j = Λ⁻¹(c_j, v_j, R_j, ∆t_j). Next, note that a vector f̃ has 2M + 1 elements. Because we compute them via a DFT, we take the convention that the first M + 1 modes correspond to k = 0, π, 2π, ..., Mπ, whereas the last M modes correspond to k = −Mπ, −(M − 1)π, ..., −π. Moreover, because the signals are real in the time-domain, we know that f̃(k) = f̃*(−k), where * denotes the complex conjugate. This implies that all of the relevant information about f̃ is contained in f̂, since only M complex Fourier modes are needed to describe the signal. To see this explicitly, express an arbitrary f̃ as . . .
where a_j is the jth element of f̃. To derive the transformed version of Λ in the basis of f̂, first decompose the matrix into blocks via

Λ = [ A  B
      C  D ],

where A is (M + 1) × (M + 1), B is (M + 1) × M, C is M × (M + 1), and D is M × M. Clearly A couples the first M + 1 modes of f̃ into one another, B couples the remaining M modes into the first M + 1, and so forth. However, because the signal remains real in the time-domain after transformation by Λ, knowledge of A and B is sufficient to determine the transformation matrix in the basis of f̂. Let F denote the operator that reverses the order of columns in a matrix, let T_c (T_r) denote the operator that removes the first column (row) of a matrix, and let 0_M denote a column vector with M zeros. It is then straightforward to show how the operator Λ transforms in this basis; the result is Eq. (A7).

Minimizing Eq. (A1) subject to Eq. (A7) may entail optimizing over O(100) or more variables corresponding to: (i) the scale parameters c, v, R, and ∆t; and (ii) the real and imaginary parts of the mode-weights. The latter comprise the majority of variables, although they appear only up to second order. In contrast, the transformation variables, while few in number, appear in highly nonlinear combinations.
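The conjugate-symmetry argument, that only the non-negative-frequency modes of a real signal are independent, can be checked numerically. The snippet below uses NumPy's FFT conventions rather than the paper's mode ordering; `full_spectrum_from_half` is our illustrative helper, not part of the paper's formulation.

```python
import numpy as np

def full_spectrum_from_half(half, n):
    """Rebuild all n DFT modes of a real, length-n signal from its
    non-negative-frequency half-spectrum (as returned by np.fft.rfft),
    using the symmetry f(k) = conj(f(-k))."""
    if n % 2 == 0:
        tail = np.conj(half[1:-1][::-1])  # exclude DC and Nyquist modes
    else:
        tail = np.conj(half[1:][::-1])    # exclude only the DC mode
    return np.concatenate([half, tail])
```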
FIG. 5. Histogram of integrated areas of the transformed signals in Fig. 2.

FIG. 6. Example of doublet deconvolution. Signals associated with two known singlets were added together to make a synthetic doublet. Original signals are shown as dotted lines. The solid lines show the recovered signals after deconvolution (purple and yellow) as well as the filtered doublet (red). The inset shows the error in the recovered singlets relative to the original filtered singlets, normalized by the maximum values of the latter.