Fig 1.
Framework of data analysis and model construction.
The first classifier was trained and optimized on the Astro-A dataset (red box) and tested on the Astro-B dataset, whereas the second classifier used the combined Astro A+B datasets (blue box) for training and validation and the Neuron dataset for testing. A balanced random forest model was trained for each of the binary (synapse- vs. kinapse-like behaviors) and 3-class (poking, scanning, dancing) classification tasks.
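As a rough illustration of what "balanced" means in a balanced random forest, the sketch below draws one class-balanced bootstrap sample, the kind of sample each tree would be trained on. The caption does not state which implementation was used, so the function name `balanced_bootstrap`, the toy labels, and the seed are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, rng):
    """Draw one bootstrap sample with the same number of rows per class.

    Every class is sampled with replacement down to the size of the
    rarest class. Returns a list of row indices into `labels`.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_per_class = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choices(rows, k=n_per_class))
    return sample


# Made-up, imbalanced labels for illustration only (not the paper's data).
labels = ["poking"] * 50 + ["scanning"] * 30 + ["dancing"] * 5
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in sample)
```

Each tree in such a forest would see a different balanced draw, so the rare behavior is not swamped by the frequent ones.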
Table 1.
List of features extracted and engineered from Imaris.
Fig 2.
Data distribution among datasets.
The number of classification samples for each dataset is shown according to the visually identified behaviors: poking (blue), dancing (yellow), and scanning (red). The total numbers of classification samples were 73,746, 67,127, and 49,004 for the Astro-A, Astro-B, and Neuron datasets, respectively.
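The per-dataset totals can be cross-checked with a few lines of arithmetic. The sketch below only uses the three counts quoted in this caption; the variable names are illustrative.

```python
# Sample counts as quoted in the Fig 2 caption.
totals = {"Astro-A": 73_746, "Astro-B": 67_127, "Neuron": 49_004}

# Size of the combined Astro A+B set used to train the second classifier.
astro_combined = totals["Astro-A"] + totals["Astro-B"]

# Size of all three datasets pooled together.
combined = sum(totals.values())

# Share of each dataset in the pooled set, in percent.
shares = {name: 100 * n / combined for name, n in totals.items()}
```

The pooled count works out to 189,877, which agrees with the sample total quoted in the Fig 7 legend.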
Fig 3.
Distribution of synapse- and kinapse-like behaviors in datasets.
The number of classification samples for each dataset is shown according to synapse-like (aqua) and kinapse-like (lavender) behaviors. The total number of classification samples is indicated under each pie chart.
Fig 4.
Features selected by the balanced random forest for each model, in order of decreasing importance.
Features selected in the binary model (synapse/kinapse) or the 3-class model (poking, dancing, and scanning) are listed for the classifier trained on the Astro A dataset and tested on the Astro B dataset, and for the classifier trained on the combined Astro dataset and tested on the Neuron dataset. Features are ordered by importance from top (yellow boxes) to bottom (darker blue boxes). Grey boxes indicate features that were not selected in the model.
Fig 5.
Performance evaluation of the classifier trained on the Astro A dataset and tested on the Astro B dataset.
Performance evaluation of the model for classifying synapse- vs. kinapse-like behaviors (A, B) and the visually identified dancing, scanning, and poking behaviors (C, D), using the Astro A dataset for training and validation and the Astro B dataset for testing. A, C) Average cross-validation (CV) accuracy according to the number of features included in the analysis. The dotted lines represent the minimal and maximal values. The test set accuracy on the Astro B dataset is indicated as a red square at the number of features selected for the final classifier. B, D) Confusion matrices showing the values predicted for the Astro B dataset by the classifier trained on the Astro A dataset, for synapse vs. kinapse (B) or dancing, scanning, and poking (D).
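For readers unfamiliar with confusion matrices, a minimal sketch of how one is tallied and how accuracy is read off its diagonal is given here. The row and column orientation (rows as true behaviors, columns as predicted behaviors) is an assumption, since the caption does not specify it, and the labels are made up rather than taken from the Astro B dataset.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count predictions; rows are true classes, columns are predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only.
classes = ["dancing", "scanning", "poking"]
y_true = ["dancing", "dancing", "scanning", "poking", "poking", "scanning"]
y_pred = ["dancing", "scanning", "scanning", "poking", "dancing", "scanning"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = accuracy(cm)
```

Off-diagonal cells show which behaviors are confused with each other, which a single accuracy value cannot reveal.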
Fig 6.
Performance evaluation of the classifier trained on the Astro dataset and tested on the Neuron dataset.
Performance evaluation of the model for classifying synapse- vs. kinapse-like behaviors (A, B) and the visually identified dancing, scanning, and poking behaviors (C, D), using the Astro dataset (Astro A + Astro B) for training and the Neuron dataset for testing. A, C) Average cross-validation (CV) accuracy according to the number of features included in the analysis. The dotted lines represent the minimal and maximal values. The test set accuracy on the Neuron dataset is indicated as a red square at the number of features selected for the final classifier. B, D) Confusion matrices showing the values predicted for the Neuron dataset by the classifier trained on the Astro dataset, for synapse vs. kinapse (B) or dancing, scanning, and poking (D).
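The curves in panels A and C summarize per-fold CV accuracies as a mean with a minimum and maximum envelope for each feature count. A minimal sketch of that summary follows. The fold scores are invented, and picking the feature count with the highest mean CV accuracy is only one plausible selection rule, assumed here because the caption does not state the rule that was used.

```python
from statistics import mean


def summarize_cv(scores_by_n_features):
    """Return mean, min, and max CV accuracy for each feature count."""
    summary = {}
    for n, folds in sorted(scores_by_n_features.items()):
        summary[n] = {"mean": mean(folds), "min": min(folds), "max": max(folds)}
    return summary


# Hypothetical per-fold accuracies keyed by number of features.
cv_scores = {
    1: [0.60, 0.64, 0.62],
    2: [0.70, 0.74, 0.72],
    3: [0.69, 0.73, 0.71],
}

summary = summarize_cv(cv_scores)

# Assumed rule: keep the feature count with the highest mean CV accuracy.
best_n = max(summary, key=lambda n: summary[n]["mean"])
```

The held-out test accuracy would then be computed once, at `best_n`, which corresponds to the single red square in each plot.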
Fig 7.
UMAP visualization of CD8+ T lymphocyte behaviors using machine learning-selected features.
UMAP projections were computed from the combined CD8+ T lymphocyte samples in the Astro A, Astro B, and Neuron datasets (189,877 samples in total), using the features selected by machine learning for the binary (kinapse- vs. synapse-like behaviors) classifier (A) or for the 3-class (dancing, poking, and scanning) classifier (B), as applied in the models in Fig 6. Selected features are listed in Fig 4.