Fig 1.
t-SNE visualization of a scRNA-seq dataset of human dendritic cells.
The dataset [10] comprises 4 blood cell types and 2 batches; cells are colored by either batch label (left) or cell type label (right).
Fig 2.
Architecture of Adversarial Information Factorization (AIF).
The model comprises three blocks: the CVAE in blue, the GAN network in pink, and the auxiliary network in green. x is the original cell’s gene expression and y is the true batch label. The remaining symbols in the figure denote the latent vector, the reconstructed cell’s gene expression, and the three predicted batch labels, the first of which is based on x.
Fig 3.
t-SNE visualizations of the original and batch-effect corrected data.
The t-SNE is computed for the original data (1st column) and for each method’s corrected data (columns 2 to 9), on the log-normalized data of the three datasets (rows 1 to 6) and on the raw counts (rows 7 to 10). The cells are colored by batch label (odd rows) and by cell type label (even rows).
Table 1.
Comparison of the methods based on the clustering metrics computed on the full datasets.
Fig 4.
F1 score of the up- and down-regulated DEGs as a function of the log-fold-change threshold.
The results are shown for the highly variable genes (’HVG’) and all genes (’All’). The reported log-fold-change values are in base 2.
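The kind of curve described in this caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the pseudocount, and the choice to apply the same threshold to a reference set of log2 fold changes (for example, a simulation's ground truth) are all assumptions, and any significance filtering is omitted.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """Base-2 log fold change between two group means.

    The pseudocount is an assumption made here to avoid division by zero.
    """
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def call_degs(lfc_by_gene, threshold):
    """Split genes into up- and down-regulated sets at a log2FC threshold."""
    up = {g for g, v in lfc_by_gene.items() if v >= threshold}
    down = {g for g, v in lfc_by_gene.items() if v <= -threshold}
    return up, down


def f1_score(predicted, truth):
    """F1 score between a predicted gene set and a reference gene set."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def f1_vs_threshold(lfc_corrected, lfc_reference, thresholds):
    """Return (threshold, F1 for up-regulated, F1 for down-regulated) tuples."""
    rows = []
    for t in thresholds:
        up_pred, down_pred = call_degs(lfc_corrected, t)
        up_true, down_true = call_degs(lfc_reference, t)
        rows.append((t, f1_score(up_pred, up_true), f1_score(down_pred, down_true)))
    return rows


# Hypothetical toy values, for illustration only.
lfc_reference = {"g1": 2.0, "g2": 1.2, "g3": -1.5, "g4": 0.1}
lfc_corrected = {"g1": 1.8, "g2": 0.4, "g3": -1.6, "g4": 1.1}
curve = f1_vs_threshold(lfc_corrected, lfc_reference, [1.0, 1.5])
```

Restricting the input dictionaries to the highly variable genes or passing all genes would give the two variants (’HVG’ and ’All’) mentioned in the caption.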
Table 2.
Comparison of the AUC scores of the methods’ differentially expressed genes on the simulated datasets (all versions of Dataset 3 and Dataset 4).
Fig 5.
t-SNE visualizations of the original and batch-effect corrected data for the AML dataset.
The t-SNE is computed, after an initial PCA, for the original data (top row) or AIF’s batch-effect-corrected data (bottom row). The cells are colored by batch label (left), cell type label (middle), or major cell type label (right).
Table 3.
Comparison of the clustering results on the AML dataset.
Fig 6.
Relative clustering performance per patient.
The relative clustering performance is the ratio of the performance on the corrected data to that on the original data, computed for each clustering metric after scaling. The metrics are based on Louvain clustering of each patient’s t-SNE embeddings, computed after an initial PCA, using either the cell types (blue) or the major cell types (yellow). Each metric is computed for cell type purity (CT), batch mixing (B), and the combination of both (F1).
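The ratio described in this caption can be illustrated with a short sketch. Several points are assumptions made here for illustration and are not stated in the caption: the scaling is taken to be min-max scaling to [0, 1], the combined F1 is taken to be the harmonic mean of the cell type purity and batch mixing scores (both oriented so that higher is better), and all function names and numeric values are hypothetical.

```python
def min_max_scale(value, low, high):
    """Scale a metric from its range [low, high] to [0, 1] (assumed scaling)."""
    if high == low:
        raise ValueError("metric range must have non-zero width")
    return (value - low) / (high - low)


def f1_combine(cell_type_purity, batch_mixing):
    """Harmonic mean of two scaled scores, both oriented so higher is better."""
    total = cell_type_purity + batch_mixing
    if total == 0:
        return 0.0
    return 2 * cell_type_purity * batch_mixing / total


def relative_performance(corrected, original):
    """Ratio of corrected to original performance; above 1 means improvement."""
    if original == 0:
        raise ValueError("original performance must be non-zero")
    return corrected / original


# Hypothetical example: a metric defined on [-1, 1], scaled before the ratio.
original_ct = min_max_scale(0.2, -1.0, 1.0)    # 0.6 after scaling
corrected_ct = min_max_scale(0.5, -1.0, 1.0)   # 0.75 after scaling
ct_ratio = relative_performance(corrected_ct, original_ct)

# Hypothetical batch mixing scores, already scaled to [0, 1].
f1_ratio = relative_performance(
    f1_combine(corrected_ct, 0.75),
    f1_combine(original_ct, 0.6),
)
```

Under these assumptions, a ratio above 1 indicates that the correction improved the metric for that patient, and a ratio below 1 indicates that it degraded it.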
Table 4.
Average per-patient relative clustering performance.