Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples
The panels show the estimated frequencies of antigen-specific cells (large red dots), expressed as a percentage of all events (yellow boxes). These percentages were estimated by manual gating by a representative user (left), by DPGMM (middle) and by HDPGMM (right). Red text in the first column gives the spiked-in frequency of retrovirally transduced T cells for the sample in that row. The red polygons in the left panel are the gates used to identify antigen-specific cells by manual gating; the exact shape, sequence and location of these gates are determined by the operator and may vary between operators depending on their training, experience and expertise. With the DPGMM approach, cell subsets are not directly comparable across the samples from top to bottom, as indicated by the event colors, which poses a problem for quantifying the same cell subset in different samples. In contrast, with the HDPGMM approach, cell subsets are aligned and directly comparable across all samples. HDPGMM is more sensitive at detecting antigen-specific cells when the frequency is extremely low (first 3 rows). HDPGMM is also more consistent in labeling events across different samples, whereas DPGMM is prone to detecting likely false-positive antigen-specific cells that are CD3-low or CD3-negative (arrows in rows 1 and 4 of the middle panel). HDPGMM improves accuracy and consistency because the model incorporates both sample-specific and group-specific information, whereas DPGMM has access only to sample-specific information. For both DPGMM and HDPGMM, the models were fitted with an MCMC sampler run for 20,000 burn-in iterations followed by 2,000 iterations that were averaged.
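The frequency summary described in the caption can be illustrated with a minimal sketch. It assumes that each retained MCMC iteration yields one component label per event and that the reported percentage is the average, over retained iterations, of the fraction of events assigned to the component of interest. The function names, the array layout and the toy data are hypothetical, and the original analysis may instead average model parameters before classifying events.

```python
import numpy as np

def subset_frequency_percent(labels, target):
    """Percentage of all events assigned to component `target`."""
    labels = np.asarray(labels)
    return 100.0 * np.mean(labels == target)

def averaged_frequency_percent(label_draws, target, n_burn_in):
    """Mean per-iteration frequency over the draws kept after burn-in.

    label_draws: array of shape (n_iterations, n_events) holding one row
    of component labels per MCMC iteration (hypothetical layout).
    """
    kept = np.asarray(label_draws)[n_burn_in:]
    if kept.shape[0] == 0:
        raise ValueError("no iterations left after burn-in")
    # Fraction of events in the target component for each kept iteration.
    per_iteration = np.mean(kept == target, axis=1)
    return 100.0 * per_iteration.mean()

# Toy data: 3 iterations, 8 events; the first iteration is burn-in.
draws = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 2, 0, 0],
])
freq = averaged_frequency_percent(draws, target=2, n_burn_in=1)
print(freq)
```

Comparing such a percentage between samples is only meaningful when the same component label refers to the same cell subset in every sample, which is the alignment property the caption attributes to HDPGMM.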