Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples
(Left) Row 1 shows DPGMMs fitted independently to each data set; Row 2 shows the reference posterior distribution from data set 3 used to classify events in the other data sets; Row 3 shows a DPGMM fitted to pooled data from all data sets; and Row 4 shows an HDPGMM fitted to all 4 data sets. Results are described in the text. Within each row, events assigned to the same cluster are given the same color; clusters are aligned across data sets in Rows 2–4 but not in Row 1. All models used a truncated DPGMM base with 16 components, a burn-in of 10,000 iterations, and 100 post-burn-in iterations to calculate the posterior distribution. (Right) Contour plots of the log posterior distribution. The HDPGMM distributions (Row 4) are most similar to the independently fitted distributions (Row 1), with the advantage that the small cluster in data set 3, which is masked by the larger neighboring cluster above it, has a distinct mode. In contrast, the reference and pooled strategies yield exactly the same distribution for all data sets and lack the flexibility to model sample-specific features.
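The reference strategy in Row 2 can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture whose weights, means, and standard deviations are made-up values standing in for a fitted reference posterior; the function names and parameters are hypothetical and this is not the authors' implementation. Each event is assigned to the reference component with the highest posterior probability, which is why cluster labels line up across data sets.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def classify(events, weights, means, sds):
    """Assign each event to the reference component with the highest
    posterior probability (argmax of log weight + log density)."""
    labels = []
    for x in events:
        scores = [
            math.log(w) + normal_logpdf(x, m, s)
            for w, m, s in zip(weights, means, sds)
        ]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Hypothetical reference mixture (illustrative values, not from the paper).
REF_WEIGHTS = [0.7, 0.25, 0.05]
REF_MEANS = [0.0, 5.0, 9.0]
REF_SDS = [1.0, 1.0, 0.5]

# Two made-up samples classified against the same reference parameters.
sample_a = [-0.3, 0.4, 5.2, 9.1]
sample_b = [4.7, 0.1, 8.8]

labels_a = classify(sample_a, REF_WEIGHTS, REF_MEANS, REF_SDS)
labels_b = classify(sample_b, REF_WEIGHTS, REF_MEANS, REF_SDS)
print(labels_a)
print(labels_b)
```

Because both samples are scored against the same fixed parameters, label 2 refers to the same small component in each sample. The same sharing is what prevents this strategy from adapting to sample-specific features.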