Deconvolving cell-type-specific gene expression profiles from bulk RNA-seq samples
Fig 2
BLUE achieves high correlations with ground truth on simulated and real-world PBMC datasets.
a, The workflow for simulating bulk samples from scRNA-seq reference datasets. Training and validation pseudobulk samples were simulated from three publicly available PBMC scRNA-seq datasets. b, Scatter plots of predicted vs. ground-truth cell-type proportions for 100 randomly generated PBMC ctrl pseudobulk samples (blue dots) and 100 PBMC stim pseudobulk samples (orange dots). The L1 error and CCC score are shown in each subplot. c, Scatter plots of predicted vs. ground-truth cell-type proportions for real PBMC bulk samples, with experimentally measured cell-type proportions serving as the ground truth. d, Heatmap (in scale) of predicted cell-type-specific gene expression profiles (GEPs) for three real bulk samples: 925L, 9JD4, G4YW. The left heatmap shows the cell-type-specific GEPs measured by bulk RNA-seq of FACS-sorted cell types. The right heatmap shows the cell-type-specific GEPs predicted by BLUE. Each column is one GEP vector for one cell type in one sample. Rows are cell-type-specific differentially expressed genes (DEGs), ordered by their expression level within the cell type. The top 300 DEGs for each cell type are visualized. e, Heatmap (in scale) of predicted cell-type-specific GEPs for the average of simulated bulk samples. Each column is the average gene expression of one cell type across all 100 simulated samples under the same condition. Rows are genes differentially expressed among cell types or between the two conditions (ctrl vs. stim).
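The two metrics reported in panel b can be illustrated with a minimal sketch. The caption does not define either metric, so two assumptions are made here: "L1 error" is taken to be the mean absolute difference between predicted and ground-truth proportions, and "CCC" is taken to be Lin's concordance correlation coefficient. The function names `l1_error` and `ccc` are hypothetical and are not taken from the BLUE implementation.

```python
from typing import Sequence


def l1_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean absolute difference (assumed definition of the L1 error)."""
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


def ccc(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (assumed meaning of CCC).

    Uses population (1/n) variances and covariance:
        CCC = 2 * cov / (var_pred + var_truth + (mean_pred - mean_truth)^2)
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov / denom


# Illustrative values only, not data from the figure.
truth = [0.1, 0.2, 0.3]
pred = [0.2, 0.3, 0.4]  # perfectly correlated with truth, but shifted by 0.1
print(l1_error(pred, truth), ccc(pred, truth))
```

Under these definitions, CCC penalizes systematic offsets that a Pearson correlation would ignore: the shifted prediction above has a Pearson correlation of 1 but a CCC below 1, so CCC measures agreement with the identity line in a predicted-vs-truth scatter plot.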