Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples
For manual gating, frequency estimates were collected from 10 flow cytometry operators. For both DPGMM and HDPGMM, 10 MCMC runs with distinct random number seeds were performed to evaluate the reproducibility of the antigen-specific cell frequency estimates. Estimates of the antigen-specific frequencies from the manual, DPGMM and HDPGMM approaches are shown as open blue circles, with the blue bar marking the mean of the 10 estimates at each spike frequency. The red crosses mark the “true” frequency of antigen-specific cells, obtained by combining the known spiked-in frequency with the average background from the 10 manual evaluations. HDPGMM estimates (right panel) have equal or lower variability than DPGMM estimates (middle panel) at every spike dilution. A linear regression fit (red line) shows that the standard errors and correlation coefficients of all 3 approaches are comparable. The number in red text above each set of estimates is the absolute value of (median of estimates − “true” value), a measure of accuracy; by this measure, HDPGMM is more accurate than manual gating at every spiked-in concentration. The number in blue text below each set of estimates is the coefficient of variation (CV), which is lower for HDPGMM than for manual gating at all concentrations except for the autologous sample alone. For both DPGMM and HDPGMM, model fitting was done with an MCMC sampler running 20,000 burn-in iterations followed by 2,000 iterations that were averaged.
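The two summary statistics annotated in the figure can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not confirm: "combining" the spiked-in frequency with the background is taken to mean adding the mean background, the CV is taken as the sample standard deviation divided by the mean (expressed as a ratio, not a percentage), and all function names and numbers are hypothetical placeholders, not data from the study.

```python
from statistics import mean, median, stdev


def true_frequency(spiked_frequency, background_estimates):
    """Assumed definition: spiked-in frequency plus the mean background."""
    return spiked_frequency + mean(background_estimates)


def accuracy_error(estimates, true_value):
    """Absolute difference between the median estimate and the "true" value."""
    return abs(median(estimates) - true_value)


def coefficient_of_variation(estimates):
    """Assumed definition: sample standard deviation divided by the mean."""
    return stdev(estimates) / mean(estimates)


if __name__ == "__main__":
    # Hypothetical values for illustration only (fractions of total cells).
    background = [0.0010, 0.0012, 0.0009, 0.0011, 0.0010,
                  0.0011, 0.0009, 0.0010, 0.0012, 0.0010]
    estimates = [0.0110, 0.0112, 0.0108, 0.0111, 0.0109,
                 0.0110, 0.0113, 0.0107, 0.0110, 0.0111]

    truth = true_frequency(0.0100, background)
    print("true frequency:", truth)
    print("accuracy error:", accuracy_error(estimates, truth))
    print("CV:", coefficient_of_variation(estimates))
```

Under these assumptions, each set of 10 estimates at a given spike dilution would yield one accuracy value (red text) and one CV (blue text).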