OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis

doi:10.1371/journal.pcbi.1003806

Figure 1.

An overview of the OpenCyto infrastructure.

When reproducing manual gating, raw FCS files and FlowJo workspace XML files are read into the R environment using parseWorkspace, creating a GatingSet object that represents the compensated, transformed, and gated data stored in an ncdfFlowSet on disk. Cell populations annotated with gates can be visualized using plotGate from the flowViz package. Gating schemes can be visualized using plot. To perform automated gating, the user defines a CSV representation of a gating tree, which is read by the OpenCyto package to generate a gatingTemplate object. This template can be applied to a GatingSet containing data, but no gates, provided the data uses the markers defined in the template. OpenCyto utilizes built-in automated gating methods, or external methods registered via a plug-in framework, to gate different cell subsets and populate the GatingSet with data-driven gate definitions for each sample. Manual and automated gating may be readily compared within a single framework. Cell populations and features can be extracted for further statistical analysis with other R and Bioconductor software packages. Data (red boxes), software packages (blue boxes), framework functionality (gray boxes), and data flow/data structures (arrows/labeled arrows) are represented. flowCore, flowStats, and flowViz are the core Bioconductor flow packages that benefit from the substantial infrastructure changes we have made to improve scalability and data visualization.
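The idea of a CSV-encoded gating tree can be illustrated with a minimal Python sketch. This is not the OpenCyto parser, which is an R package; the reduced column set (alias, pop, parent, dims, gating_method), the rows, and the helper names `read_template`, `build_tree`, and `required_markers` are all assumptions made for illustration. The sketch only shows how each row names a population, its parent, and the channels it is gated on, and how the set of required markers follows from the template.

```python
import csv
import io

# Hypothetical gating-tree CSV: rows and the reduced column set are illustrative only.
TEMPLATE_CSV = """alias,pop,parent,dims,gating_method
singlets,+,root,"FSC-A,FSC-H",singletGate
lymph,+,singlets,"FSC-A,SSC-A",flowClust
cd3,+,lymph,CD3,mindensity
cd4,+,cd3,CD4,mindensity
cd8,+,cd3,CD8,mindensity
"""


def read_template(text):
    """Parse the CSV into row dicts, splitting `dims` into a list of channels."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["dims"] = row["dims"].split(",")
    return rows


def build_tree(rows):
    """Map each parent alias to its child aliases, checking that parents are defined first."""
    known = {"root"}
    tree = {}
    for row in rows:
        if row["parent"] not in known:
            raise ValueError(
                f"parent {row['parent']!r} is not defined before {row['alias']!r}"
            )
        tree.setdefault(row["parent"], []).append(row["alias"])
        known.add(row["alias"])
    return tree


def required_markers(rows):
    """Collect every channel the template gates on; the data must provide all of them."""
    return {dim for row in rows for dim in row["dims"]}


rows = read_template(TEMPLATE_CSV)
tree = build_tree(rows)
markers = required_markers(rows)
```

Keeping the tree as plain rows mirrors the caption's point that a template is independent of any particular sample: it can be applied to any data set that provides the channels in `markers`.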

Figure 2.

Comparison of a subset of manual gates and OpenCyto automated gates for a representative sample from the HVTN080 ICS data set.

The automated gates are data-driven. Each panel shows a corresponding manual and automated gate side-by-side. The left panel is the manual gate; the right panel is the OpenCyto data-driven gate. Parent population names differ between manual and automated gates for singlets and lymphocytes because the automated gating hierarchy differs from the manual gating by including boundary and boundary debris gates, respectively, before these populations. Starting at the top left and proceeding along the rows, the gates shown are singlets, live cells, lymphocytes, CD3⁺ T-cells, CD4⁺ and CD8⁺ T-cells, IFN-γ⁺ and IL2⁺ expressing CD4⁺ and CD8⁺ T-cells, and Granzyme B⁺ and CD57⁺ expressing CD8⁺ T-cells. The manual and automated gates are very comparable.

Figure 3.

Comparison of OpenCyto automated gating and manual gating (performed with FlowJo and imported and reproduced in R using OpenCyto) for HVTN 080.

A) Box-plots of the paired differences (post-vaccination – baseline) in proportions of cytokine-producing cells from significant cell subsets identified by the linear model (see Supplementary Methods) for each stimulation condition, gating method, and vaccine regimen. Differences between baseline and post-vaccination are background-corrected (stimulated – non-stimulated). There were no significant differences between the observed distributions for manual or OpenCyto gating (paired Wilcoxon test). B) Scatter plots comparing manual gating vs. OpenCyto gating. The per-subject, background-corrected difference between vaccine and baseline is plotted for OpenCyto and manual gating, with concordance correlation coefficients shown for all stimulations.
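The two quantities in this caption can be sketched in Python. The block assumes that the concordance correlation coefficient is Lin's coefficient computed with population (1/n) moments, which the caption does not specify; the function names and the example proportions are hypothetical and are not data from the study.

```python
from statistics import fmean


def vaccine_effect(post_stim, post_unstim, base_stim, base_unstim):
    """Paired difference (post-vaccination - baseline) of background-corrected proportions."""
    post = post_stim - post_unstim   # background correction at the post-vaccination visit
    base = base_stim - base_unstim   # background correction at baseline
    return post - base


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n variances and covariance.

    Assumption: the source does not state which estimator was used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("concordance is undefined for two identical constant series")
    return 2 * cov / denom


# Hypothetical per-subject effects from the two gating methods.
manual = [0.010, 0.020, 0.030]
opencyto = [0.020, 0.030, 0.040]
ccc = lin_ccc(manual, opencyto)
```

Unlike the Pearson correlation, this coefficient penalizes a systematic offset between the two methods: the hypothetical series above are perfectly correlated, yet their concordance is below one because one is shifted relative to the other.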

Figure 4.

Example of OpenCyto automated gates on the perforin channel for CD8⁺ T-cells for six randomly selected samples from the HVTN 080 ICS data set.

The perforin marker exhibits staining variability as evidenced by the varying width and position of the negative peak and was not gated by the manual template-gating approach. Despite this variability, OpenCyto data-driven automated gating is able to identify a reasonable threshold for perforin positive cells.

Table 1.

Performance metrics of OpenCyto on the flow cytometry and CyTOF data sets, on a single-processor machine with 8 GB of RAM.

Figure 5.

The average frequency of expression across two CyTOF samples for cytokine-producing cell subsets from four T-cell maturational states.

Samples were stimulated with PMA-Ionomycin for 3 hours. Rows represent different maturational cell subsets (TN: naïve, TCM: central memory, TEF: effector, TEM: effector memory) and are clustered by Euclidean distance similarity. Columns represent different cytokine-producing cell subsets. The bottom legend defines the cell subset in a column. The legend is colored by degree of functionality of the cell subsets (light blue: degree 1, dark blue: degree 2, light green: degree 3, dark green: degree 4, salmon: degree 5, red: degree 6, orange: degree 7). The shading of individual blocks of the heatmap represents the average proportion of cells in the subset across the two samples, normalized to the total number of CD8 T-cells. Naïve cells have low polyfunctionality compared to effector, effector memory, and central memory cells.
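The value behind each heatmap block can be sketched as follows, under stated assumptions: each cell is represented by the set of cytokines it is positive for, the cytokine names (`IFNg`, `IL2`, `TNFa`) and all counts are hypothetical, and the helper names are invented for this example. The degree of functionality of a subset is taken to be the number of cytokines in its combination.

```python
from collections import Counter


def subset_proportions(cells, total_cd8):
    """Proportion of each cytokine combination, relative to the total CD8 T-cell count.

    Cells that express no cytokine are skipped, since they define no producing subset.
    """
    counts = Counter(frozenset(cell) for cell in cells if cell)
    return {combo: n / total_cd8 for combo, n in counts.items()}


def average_across_samples(per_sample):
    """Average each subset's proportion over samples, treating an absent subset as zero."""
    combos = set().union(*per_sample)
    return {
        combo: sum(props.get(combo, 0.0) for props in per_sample) / len(per_sample)
        for combo in combos
    }


# Hypothetical cells from one maturational state in two samples.
sample_a = [{"IFNg"}, {"IFNg", "IL2"}, {"IFNg", "IL2"}, set()]
sample_b = [{"IFNg", "IL2"}, {"TNFa"}]

props = [subset_proportions(sample_a, total_cd8=10),
         subset_proportions(sample_b, total_cd8=20)]
averaged = average_across_samples(props)

# Degree of functionality of a subset = number of cytokines it co-expresses.
degree = {combo: len(combo) for combo in averaged}
```

Normalizing by the total CD8 count rather than by the size of the maturational subset keeps the rows of the heatmap on a common scale, as the caption describes.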

Figure 6.

The distribution of cells of each maturational state and their degree of functionality.

The majority of naïve CD8 T-cells (TN) do not express any cytokines (degree of functionality 0) or are mono-functional, while effector memory cells (TEM) are the most polyfunctional of the subsets (peaking at degree 5). Short-lived effector (TEF) cells have lower polyfunctionality (peaking at degree 4), and central memory (TCM) populations tend to have a roughly constant level of polyfunctionality from degree 1 through degree 7. The area under the curve for each cell subset integrates to one. The y-axis is transformed by a hyperbolic arcsine to facilitate visualization of differences between subsets at higher degrees of polyfunctionality.
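A minimal Python sketch of the quantities plotted here, under assumptions: the distribution is treated as discrete over degrees 0 to 7 so that it sums to one (the figure may instead use a smoothed curve), the `scale` argument of the transform is a hypothetical parameter the caption does not mention, and the per-cell degrees are invented.

```python
import math
from collections import Counter


def degree_distribution(degrees, max_degree=7):
    """Fraction of cells at each degree of functionality, from 0 to max_degree."""
    if not degrees:
        raise ValueError("at least one cell is required")
    counts = Counter(degrees)
    total = len(degrees)
    return [counts.get(d, 0) / total for d in range(max_degree + 1)]


def asinh_axis(values, scale=1.0):
    """Hyperbolic-arcsine transform for the y-axis; `scale` is a hypothetical knob."""
    return [math.asinh(scale * v) for v in values]


# Hypothetical per-cell degrees for one maturational subset.
cell_degrees = [0, 0, 0, 1, 1, 2, 5, 5]
distribution = degree_distribution(cell_degrees)
transformed = asinh_axis(distribution, scale=100.0)
```

The hyperbolic arcsine is close to linear near zero and close to logarithmic for large arguments, so with a suitable scale the small fractions at high degrees are stretched relative to the dominant low-degree fractions while zero still maps to zero.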
