Fig 1.
Overview of the CytoPy framework. Single cell data and experiment/clinical metadata (1) are used to populate a project within the CytoPy database (2). The CytoPy database models analytical data as MongoDB documents (cylinder), and an interface of CytoPy classes retrieves data from, and commits data to, this database (dotted rounded rectangle). Utility modules perform routine tasks such as data transformation and sampling throughout the framework. The components of this interface can be used independently, but the recommended workflow is as follows: (3) autonomous gates identify a ‘clean’ population of interest from which to start the analysis, (4) batch effects are visualised, quantified and corrected using the Harmony algorithm, (5) supervised and unsupervised algorithms classify cells into groups of similar phenotype, and finally (6) a feature space of descriptive statistics for cell populations is generated, and feature extraction/selection methods are deployed to identify a predictive signature that characterises an endpoint of interest.
Fig 2.
UMAP plots revealing batch effect in T cell staining of whole blood.
A reference sample (blue) is chosen as the ‘average’ sample in Euclidean space. A low-dimensional embedding of this sample is generated using UMAP (other algorithms are available in CytoPy, e.g. PCA, PHATE and t-SNE), and the samples for comparison (red) are projected into the same space, demonstrating ‘drift’ in cell populations between patient samples. Each plot depicts results obtained with cells from an individual patient; the numbers shown are unique patient sample identifiers.
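One way to read "the ‘average’ sample in Euclidean space" is the sample whose mean marker profile lies closest to the average of all sample profiles. The sketch below assumes exactly that interpretation; `choose_reference_sample` is a hypothetical helper, and CytoPy's own selection criterion may differ.

```python
import numpy as np


def choose_reference_sample(samples: dict) -> str:
    """Return the ID of the sample whose mean marker profile is closest
    (Euclidean distance) to the average of all sample mean profiles.

    Hypothetical helper: `samples` maps a sample ID to a (cells x markers)
    array of transformed expression values.
    """
    ids = list(samples)
    # One row per sample: mean expression of each marker across its cells
    means = np.vstack([np.asarray(samples[i], dtype=float).mean(axis=0) for i in ids])
    grand_mean = means.mean(axis=0)
    # Distance of every sample profile from the average profile
    distances = np.linalg.norm(means - grand_mean, axis=1)
    return ids[int(np.argmin(distances))]
```

The remaining samples would then be projected into an embedding fitted on the chosen reference, so that drift is judged relative to a single, central sample.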
Fig 3.
Number of events captured by autonomous gates for blood T cell subsets compared to the same subsets as defined by manual expert gates.
Each symbol depicts results obtained with cells from an individual patient.
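A common way to automate a one-dimensional gate is to estimate the density of a marker with a kernel density estimate (KDE) and place the threshold at the density minimum between the two main peaks. The sketch below illustrates that idea only; `density_threshold` is a hypothetical helper and is not presented as the exact algorithm behind CytoPy's autonomous gates.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde


def density_threshold(x, grid_size=1000):
    """Place a 1-D gate at the lowest KDE density between the two highest peaks.

    Hypothetical sketch: assumes the marker has a bimodal (negative/positive)
    distribution; raises if fewer than two peaks are found.
    """
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min(), x.max(), grid_size)
    density = gaussian_kde(x)(grid)  # Gaussian KDE, Scott's rule bandwidth
    peaks, _ = find_peaks(density)
    if len(peaks) < 2:
        raise ValueError("fewer than two density peaks; cannot place a threshold")
    # The two highest peaks, ordered from left to right on the grid
    left, right = np.sort(peaks[np.argsort(density[peaks])[-2:]])
    # Lowest density point between them becomes the gate threshold
    trough = left + int(np.argmin(density[left:right + 1]))
    return float(grid[trough])
```

Events above the returned value would be classed as marker-positive, and the resulting counts could then be set against those from manual expert gates.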
Fig 4.
Batch correction using the Harmony algorithm.
(A) Single cell UMAP plots are coloured by cell origin, where each colour represents a unique patient. The change in batch composition within the local neighbourhood of each cell is shown by the change in the UMAP plot after Harmony is applied and by the shift in the distribution of the Local Inverse Simpson’s Index (LISI). (B) Cell population structure is conserved after correction, as shown by the shape of the latent variables UMAP1 and UMAP2 and by the distribution of the cell surface markers CD4 and CD8, the linear combination of Pan-γδ and Vδ2 (to identify Vδ2+ γδ T cells), and the linear combination of CD161 and Vα7.2 (to identify MAIT cells).
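LISI scores each cell by the effective number of batches among its neighbours: a value near 1 means a cell's neighbourhood comes from a single batch, and values approaching the number of batches indicate good mixing. The published LISI weights neighbours with a perplexity-calibrated Gaussian kernel; the sketch below is a simplified variant with uniform weights over the k nearest neighbours, and `simple_lisi` is a hypothetical name.

```python
import numpy as np
from scipy.spatial import cKDTree


def simple_lisi(embedding, batch_labels, k=30):
    """Simplified LISI: inverse Simpson index of batch labels among each
    cell's k nearest neighbours, with uniform (not Gaussian) weights."""
    embedding = np.asarray(embedding, dtype=float)
    _, codes = np.unique(np.asarray(batch_labels), return_inverse=True)
    n_batches = int(codes.max()) + 1
    tree = cKDTree(embedding)
    # Query k + 1 neighbours because each cell is its own nearest neighbour
    _, idx = tree.query(embedding, k=k + 1)
    neighbour_codes = codes[idx[:, 1:]]
    scores = np.empty(len(embedding))
    for i, row in enumerate(neighbour_codes):
        p = np.bincount(row, minlength=n_batches) / k  # batch proportions
        scores[i] = 1.0 / np.sum(p ** 2)               # inverse Simpson index
    return scores
```

Comparing the score distributions computed on the embedding before and after Harmony gives a numerical counterpart to the visual shift shown in panel A.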
Fig 5.
Percentage of blood T cell subsets as identified by XGBoost compared to the same subsets as identified by expert manual gates.
Each symbol depicts results obtained with cells from an individual patient.
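The comparison against manual gates starts from per-patient subset percentages. The sketch below computes those percentages from per-cell labels and summarises agreement across patients with Pearson's r and the mean absolute difference. These two metrics are an illustrative assumption, not statistics reported in the caption, and both helper names are hypothetical.

```python
import numpy as np


def subset_percentages(labels, subsets):
    """Percentage of cells assigned to each subset in one patient sample."""
    labels = np.asarray(labels)
    return {s: 100.0 * float(np.mean(labels == s)) for s in subsets}


def agreement(auto_pct, manual_pct):
    """Pearson correlation and mean absolute difference (percentage points)
    between automated and manual percentages for one subset across patients."""
    a = np.asarray(auto_pct, dtype=float)
    m = np.asarray(manual_pct, dtype=float)
    r = float(np.corrcoef(a, m)[0, 1])
    mad = float(np.mean(np.abs(a - m)))
    return r, mad
```

A high correlation with a small mean absolute difference would indicate that the automated method reproduces the manual gates closely; a high correlation with a large difference would point to a systematic offset.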
Fig 6.
Meta-clustering results for FlowSOM (top) and Phenograph (bottom) when applied to blood T cells after batch effect correction with Harmony. Heatmaps show the normalised expression of cell surface markers for meta-clusters (clustered centroids of individually clustered patient samples). In the neighbouring UMAP plots, clusters from all patients are shown in the same embedded space and coloured by their meta-cluster membership. The size of each data point corresponds to the percentage of T cells this cluster represents in the patient it was derived from.
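Meta-clustering, as described above, clusters each patient separately, represents every cluster by its centroid, and then clusters those centroids. The sketch below shows that two-stage idea. It assumes the per-patient clustering has already been done and uses Ward hierarchical clustering for the meta step. The helper `meta_cluster` and the choice of Ward linkage are assumptions, not CytoPy's confirmed implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def meta_cluster(per_patient, n_meta):
    """Cluster per-patient cluster centroids into `n_meta` meta-clusters.

    `per_patient` maps patient ID -> (X, labels), where X is a (cells x markers)
    array and labels holds that patient's cluster assignment for each cell.
    Returns (patient, cluster, meta_cluster, percent_of_patient_cells) tuples.
    """
    centroids, records = [], []
    for patient, (X, labels) in per_patient.items():
        X, labels = np.asarray(X, dtype=float), np.asarray(labels)
        for c in np.unique(labels):
            mask = labels == c
            centroids.append(X[mask].mean(axis=0))        # cluster centroid
            records.append((patient, c, 100.0 * float(mask.mean())))
    # Hierarchically cluster the centroids and cut the tree into n_meta groups
    Z = linkage(np.vstack(centroids), method="ward")
    meta = fcluster(Z, t=n_meta, criterion="maxclust")
    return [(p, c, int(m), pct) for (p, c, pct), m in zip(records, meta)]
```

The percentage returned for each cluster corresponds to the point size in the UMAP plots, and averaging marker expression per meta-cluster would give the rows of the heatmaps.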
Fig 7.
Percentage of T cell subsets as identified by FlowSOM (top) and Phenograph clustering (bottom), compared to the same subsets as identified by expert manual gates. Each symbol depicts results obtained with cells from an individual patient.
Fig 8.
Leukocyte subsets as a fraction of CD45+ cells, as identified by an XGBoost classifier (top), Phenograph clustering (centre) and FlowSOM clustering (bottom). Mann-Whitney U tests were applied for comparisons between patients with acute peritonitis and stable controls, and p-values are reported after correction for multiple comparisons using Holm’s method (significance level set at 0.05).
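The testing procedure in this caption can be sketched with `scipy.stats.mannwhitneyu` followed by Holm's step-down adjustment, written out by hand below. The helper names and the input layout (a dictionary of per-patient fractions per subset) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, i in enumerate(np.argsort(p)):
        # Smallest p is multiplied by m, the next by m - 1, and so on
        value = min(1.0, (m - rank) * p[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def compare_groups(case, control, alpha=0.05):
    """Two-sided Mann-Whitney U test per subset with Holm correction.

    `case` and `control` map a subset name to per-patient fractions.
    Returns subset -> (adjusted p-value, significant at alpha).
    """
    subsets = list(case)
    raw = [mannwhitneyu(case[s], control[s], alternative="two-sided").pvalue
           for s in subsets]
    adjusted = holm_adjust(raw)
    return {s: (float(a), bool(a < alpha)) for s, a in zip(subsets, adjusted)}
```

Holm's method controls the family-wise error rate like Bonferroni but is uniformly more powerful, because only the smallest p-value is multiplied by the full number of tests.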
Fig 9.
Feature selection process to reduce variables for predicting acute peritonitis.
(A) Multicollinearity was addressed by removing redundant features before linear models were generated and before further analysis. (B) Principal component analysis shows that patients with acute peritonitis are distinguishable from stable controls. (C) L1-regularised modelling with a linear support vector machine identifies neutrophils as the most predictive feature. (D) A simple cutoff applied to neutrophils is predictive of acute peritonitis in this cohort, as demonstrated by a shallow decision tree that uses the Gini index as the criterion for measuring the quality of a split.
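Panel D rests on how a tree chooses a single cutoff: it tries candidate thresholds and keeps the one that minimises the weighted Gini impurity of the two resulting groups. The sketch below implements that search for one feature, in the spirit of a depth-1 tree such as scikit-learn's `DecisionTreeClassifier(criterion="gini", max_depth=1)`. The helper names are hypothetical, and this is not the authors' code.

```python
import numpy as np


def gini(y):
    """Gini impurity of a label array: 1 - sum of squared class proportions."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def best_gini_threshold(x, y):
    """Return (threshold, weighted impurity) of the best single split on x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    values = np.unique(x)
    # Candidate cutoffs sit midway between consecutive distinct values
    candidates = (values[:-1] + values[1:]) / 2
    best_t, best_impurity = None, np.inf
    for t in candidates:
        left, right = y[x <= t], y[x > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if impurity < best_impurity:
            best_t, best_impurity = float(t), impurity
    return best_t, best_impurity
```

A weighted impurity of 0 means the cutoff separates the two groups perfectly in the training data. In a small cohort, such a result should be confirmed on independent samples before it is treated as a robust signature.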