Figure 1.
CCAST flowchart and analysis on a simulated dataset.
A Cytometry data represented by a 3D scatterplot of simulated FCM data showing the expression of 3 markers across all cells. B Clustering analysis produces five cell types, color coded and denoted as Cell-types 1, 2, 3, 4 and 5. C Initial CCAST decision tree showing subpopulations at the leaf nodes. Nodes 9, 10 and 11 each contain a single cell type and are considered pure subpopulations. Nodes 5, 6 and 8 contain a mixture of cell types. D Final CCAST decision tree obtained after filtering the data by removing contaminating cells in nodes with mixed cell types. This tree can be used for cell sorting or data analysis. E 2D scatter plot of the original (unfiltered) data showing the 5 color-coded clusters and the estimated cut-offs, with corresponding color-coded thresholds, for sorting the 5 cell state populations. Note that the subpopulations can be sorted using only Marker 1 and Marker 2 even though three markers were initially used to identify the cell types. F Bar plot of the three markers in each subpopulation derived using the final CCAST tree on the filtered data. G 2D scatter plots of the filtered data derived from CCAST showing that the analyses based on hierarchical (right) and npEM (left) clustering are similar.
Figure 2.
Visualization of 13 markers across a heterogeneous population of T-cells.
These 13×13 scatter plots show the pair-wise distributions of 13 markers (unlabeled) per cell from pooled single cell data of 4 T-cell subtypes. The primary data were made publicly available by Bendall et al. [9].
Figure 3.
CCAST applied to single cell analysis of T-cells.
A The CCAST gating strategy based on the unlabeled T-cell data in Figure 2, post filtering, showing that 4 cell types can be derived using only Marker 5 and Marker 2, with Marker 5 as the root node. Split points, along with the minimum-maximum range for each split point, are provided at each node. B Histograms of the sampled split points for each node, obtained via bootstrapping. The multimodal nature of the distributions makes it difficult to calculate true confidence intervals for the split point estimates. C CCAST result without filtering represented as a 2D scatter plot of the 4 cell types, with each cell type color coded; note that gating the yellow-colored cells will likely result in contamination by green-colored cells. D CCAST result with filtering represented as a 2D scatter plot of the 4 pure cell types, with each cell type color coded. Note that all contaminating cells mixed with the various clusters have been removed. For manual gating purposes, comparing the two schemes in C and D provides a visual evaluation of the expected contamination levels from sorting subpopulations. E CCAST gating strategy for all T-cell types with labels reveals that the key gating markers are CD4 and CD45RA. F 2D scatter plot of the four labeled T-cell types based on CD4 and CD45RA.
Figure 4.
CCAST applied to single cell analysis of B-cells.
A Silhouette plot showing evidence of 5 B-cell types. B CCAST gating strategy for B-cell types based on the CD45, CD34, CD38, and CD123 markers using 3 levels of gating. The estimated ranges for the split point variables are provided at each node. Note that Celltype 3 is distributed across three gated populations. C Cross-classification heatmap of manually gated and CCAST-predicted B-cell types indicates strong evidence that the most abundant population, Mature CD38low B-cells, comprises a mixture of other subtypes (Celltypes 2 and 4). D Heatmaps show evidence of the two distinct derived mature B-cell states, corresponding to Celltypes 2 and 4, distinguished mainly by CD123 (label highlighted in red).
Figure 5.
Signaling behavior in B-cell subtypes for CCAST vs manual gating strategy.
A Heatmap of BCR, IFNa, FLT3, IL3, IL7, and SCF induced intracellular signaling responses in the 5 CCAST-derived B-cell subtypes, compared with those of an unstimulated control. B Heatmap of BCR, IFNa, FLT3, IL3, IL7, and SCF induced intracellular signaling responses in the five B-cell subtypes obtained from the manual gates in Bendall et al. [9], compared with those of an unstimulated control. A larger difference implies a stronger signal in the CCAST-derived cell type than in the manually gated cell type.
Figure 6.
CCAST gating strategy on the SUM159 breast cancer cell line.
The CCAST gating strategy for the SUM159 breast cancer cell line isolates 5 pure cell states (across 9 bins) based on CD24 and EPCAM. These 5 subpopulations are not apparent from the biaxial side scatter (SSC) vs. biomarker plots. Split point estimates (dotted red lines) pass through the density contour plots (orange) of the data, providing visual evidence of suitable cut-offs through bimodal contours. Note that the split point lines for nodes 3 and 4 concentrate on the zero point mass; this indicates that several cells have zero expression values for EPCAM or CD24 staining but higher expression values for CD44.
Figure 7.
CCAST analysis results for the SUM159 breast cancer cell line.
A Results of the estimation process for the split point statistics in all the inner nodes in Figure 6. The root node, corresponding to EPCAM, shows one local maximum and one global maximum. Gating the data from this global maximum results in 9 distinct subpopulations. Nodes 3, 4, 8, 9, 13 and 14 have clear natural maxima indicating optimal splits of the data into these 9 homogeneous subpopulations (see Figure 6), corresponding to the 9 bar plots in B. B Bar plots of the 9 homogeneous subpopulations from Figure 6 across all 3 markers, with standard deviation intervals for each marker. The values on the bars on the left side of each plot correspond to the minimum value across all 3 bar heights. Each side bar gives a sense of the relative difference between bar heights. The main title of each plot shows the corresponding leaf node bin on the tree in Figure 6. Predicted Celltypes 3 and 1 correspond to P3, P4, P7 and P5, P6, P8 respectively, indicating more homogeneous subpopulations than expected. The bar plots show evidence of at least 5 distinct subpopulations, i.e. P1, P2, P5, P7 and P9. C The Gupta et al. [3] gating strategy isolated 3 cell states (basal, stem, and luminal) using EPCAM as the major marker. They further used CD24 to sort out these 3 states. We also automatically identify EPCAM as the major marker but use a combination of multiple splits on CD24 and EPCAM to produce 9 homogeneous bins. D Comparison of the breast cancer subpopulations predicted by the CCAST versus the Gupta et al. [3] gating strategy shows potential evidence of contamination after sorting. This analysis indicates that the CCAST subpopulation P9 is a mixture of the basal, stem, and luminal subpopulations from Gupta et al. [3]. The unique CCAST subpopulations P1 and P2 were not identified by Gupta et al. [3].