scaDA: A novel statistical method for differential analysis of single-cell chromatin accessibility sequencing data

doi:10.1371/journal.pcbi.1011854

Fig 1.

scaDA method overview.

scaDA contains three components: data model, Bayesian shrinkage of ϕ, and iterative optimization of μ and p.

Fig 2.

Human Brain 3K analysis.

A. UMAP from the clustering analysis of “Human Brain 3K”, showing 14 annotated cell types. B. Expression of the gene markers used for cell type annotation. C. Distribution of the initial EM estimates of the three ZINB model parameters, mean (μ), prevalence (p), and dispersion (ϕ), across all peaks for the 8 cell types with more than 100 cells.

Fig 3.

Power analysis for scaDA and other ZINB-based likelihood ratio tests.

Baseline values of the three parameters are estimated from the cell type “Granule neuron” in “Human Brain 3K”. For differential peaks, we fix the parameter values in Group 1 and vary the values of the selected parameters in Group 2 across effect sizes to create power curves. To create the Group 2 values, we multiply the baseline value by the fold change for half of the differential peaks and divide by it for the other half, with the effect size expressed as a log2 fold change. The log2 fold change of the selected parameters ranges from 0.5 to 3.0 in steps of 0.5. Under each of the four scenarios, we assume 20% differential peaks and simulate ZINB read counts for 4000 peaks across 100 cells in each group. A. Scenario 1: only the mean differs between the two groups. B. Scenario 2: only the prevalence differs between the two groups. C. Scenario 3: only the dispersion differs between the two groups. D. Scenario 4: all three parameters differ between the two groups.
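
The simulation design described in this caption can be sketched as follows for Scenario 1 (mean difference only). This is a minimal illustration under stated assumptions, not the authors' code: the baseline values (mean 2.0, dispersion 1.0, prevalence 0.3) are hypothetical placeholders rather than the “Granule neuron” estimates, the negative binomial variance is assumed to be μ + ϕμ², and prevalence is assumed to be the probability that a count comes from the negative binomial component rather than being a structural zero. The function names `simulate_zinb` and `simulate_scenario1` are illustrative.

```python
import numpy as np


def simulate_zinb(mu, phi, prev, n_cells, rng):
    """Draw an (n_peaks, n_cells) count matrix from a ZINB model.

    Assumptions (not confirmed by the source): NB variance = mu + phi * mu**2,
    and prev is the probability that a count is NOT a structural zero.
    """
    mu, phi, prev = (np.asarray(x, dtype=float) for x in (mu, phi, prev))
    size = 1.0 / phi                  # NB "number of successes" parameter
    prob = size / (size + mu)         # NB success probability, so the mean is mu
    shape = (mu.size, n_cells)
    nb = rng.negative_binomial(size[:, None], prob[:, None], size=shape)
    keep = rng.random(shape) < prev[:, None]   # False -> structural zero
    return nb * keep


def simulate_scenario1(n_peaks=4000, n_cells=100, frac_diff=0.2,
                       log2fc=1.0, seed=0):
    """Simulate two groups that differ only in the mean of differential peaks."""
    rng = np.random.default_rng(seed)

    # Hypothetical baseline values, identical for every peak.
    mu1 = np.full(n_peaks, 2.0)
    phi = np.full(n_peaks, 1.0)
    prev = np.full(n_peaks, 0.3)

    # Pick the differential peaks and split them into equal halves.
    n_diff = int(round(frac_diff * n_peaks))
    diff_idx = rng.choice(n_peaks, size=n_diff, replace=False)
    up, down = diff_idx[: n_diff // 2], diff_idx[n_diff // 2:]

    # Multiply the baseline mean for one half, divide it for the other half.
    mu2 = mu1.copy()
    mu2[up] *= 2.0 ** log2fc
    mu2[down] /= 2.0 ** log2fc

    is_diff = np.zeros(n_peaks, dtype=bool)
    is_diff[diff_idx] = True

    counts1 = simulate_zinb(mu1, phi, prev, n_cells, rng)
    counts2 = simulate_zinb(mu2, phi, prev, n_cells, rng)
    return counts1, counts2, is_diff, mu2
```

Calling `simulate_scenario1(log2fc=lfc)` for each `lfc` from 0.5 to 3.0 in steps of 0.5, and recording the fraction of differential peaks a test recovers, would trace out one power curve. The other scenarios would vary the prevalence or dispersion in the same way, with prevalence clipped to at most 1.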

Fig 4.

Power analysis for scaDA and published methods.

Baseline values of the three parameters are estimated from the cell type “Granule neuron” in “Human Brain 3K”. For differential peaks, we fix the parameter values in Group 1 and vary the values of the selected parameters in Group 2 across effect sizes to create power curves. To create the Group 2 values, we multiply the baseline value by the fold change for half of the differential peaks and divide by it for the other half, with the effect size expressed as a log2 fold change. The log2 fold change of the selected parameters ranges from 0.5 to 3.0 in steps of 0.5. Under each of the four scenarios, we assume 20% differential peaks and simulate ZINB read counts for 4000 peaks across 100 cells in each group. A. Scenario 1: only the mean differs between the two groups. B. Scenario 2: only the prevalence differs between the two groups. C. Scenario 3: only the dispersion differs between the two groups. D. Scenario 4: all three parameters differ between the two groups.

Fig 5.

FDR control analysis.

scaDA is compared to both ZINB-based likelihood ratio tests (LRTs) and published methods. Using the same simulation strategy as in Scenario 4 (log2FC = 2.5), we assume 20% differential peaks and simulate ZINB read counts for 4000 peaks across 100 cells in each group. The observed FDR is plotted against the nominal FDR level. A. scaDA compared to ZINB-based LRTs. B. scaDA compared to published methods.
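
The observed-versus-nominal FDR comparison in this caption can be illustrated with a short sketch. It assumes that p-values are adjusted with the Benjamini-Hochberg procedure and that a peak is called differential when its adjusted p-value is at or below the nominal level; the caption does not state the adjustment method, so this is an assumption. The helper names `bh_adjust` and `observed_fdr` are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def observed_fdr(pvals, is_diff, alpha):
    """Fraction of called peaks that are truly non-differential."""
    called = bh_adjust(pvals) <= alpha
    if not called.any():
        return 0.0            # no discoveries, so no false discoveries
    false_calls = called & ~np.asarray(is_diff, dtype=bool)
    return false_calls.sum() / called.sum()
```

Evaluating `observed_fdr` over a grid of nominal levels `alpha`, with `is_diff` taken from the simulation truth, gives the points of one curve. A well-calibrated method stays at or below the diagonal.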

Fig 6.

Parameter estimation.

Using the same simulation strategy as in Scenario 4 (log2FC = 2.5), we assume 20% differential peaks and simulate ZINB read counts for 4000 peaks across 100 cells in each group. A. True dispersion plotted against the dispersion estimated by ZINB(μ, ϕ, p) and by scaDA, on a log scale. B. Mean squared error (MSE) between the true and estimated mean, dispersion, and prevalence for ZINB(μ, ϕ, p) and scaDA.

Fig 7.

Human Brain 3K.

A. True Discovery Rate (TDR) across 8 cell types for all methods at different top percentages (i.e., 20%, 40%, 60%, 80%, and 100%). B. Power (TDR at 100%) across 8 cell types for all methods.

Fig 8.

Human PBMC 10K.

A. True Discovery Rate (TDR) across 14 cell types for all methods at different top percentages (i.e., 20%, 40%, 60%, 80%, and 100%). B. Power (TDR at 100%) across 14 cell types for all methods.

Fig 9.

Human AD.

A. Mean True Discovery Rate (TDR) across all 6 cell types and pairwise comparisons for all methods at different top percentages (i.e., 20%, 40%, 60%, 80%, and 100%). B. Mean Power (TDR at 100%) across all 6 cell types and pairwise comparisons for all methods.

Fig 10.

GO analysis for AD microglia.

Gene ontology (GO) analysis is performed on the differential peaks identified by scaDA and by published methods. Microglia from “Human AD” are selected for the GO analysis.

Fig 11.

GWAS enrichment analysis.

Microglia from “Human AD” are selected for DA analysis, and differential peaks are identified with each method. GWAS enrichment analysis is performed by evaluating the enrichment of AD-associated GWAS summary statistics in microglia-specific differential peaks. Results are reported as the log odds ratio (logOR), confidence interval (CI), and p-value.
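
As a rough illustration of how a logOR, CI, and p-value can be derived from an enrichment comparison, the sketch below uses a 2×2 table with a Wald interval and a two-sided normal test. The caption does not specify the enrichment procedure, so the table layout, the Wald approximation, and the function name `log_odds_ratio` are all assumptions, and the counts in the usage note are invented for illustration only.

```python
import math


def log_odds_ratio(a, b, c, d, z_crit=1.959964):
    """Wald log odds ratio, 95% CI, and two-sided p-value for a 2x2 table.

    Hypothetical layout (an assumption, not taken from the source):
        a = GWAS-associated variants inside differential peaks
        b = other variants inside differential peaks
        c = GWAS-associated variants outside differential peaks
        d = other variants outside differential peaks
    All four counts must be positive.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower, upper = log_or - z_crit * se, log_or + z_crit * se
    z = log_or / se
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return log_or, (lower, upper), p_value
```

For example, `log_odds_ratio(20, 80, 10, 90)` corresponds to an odds ratio of 2.25. A positive logOR whose interval excludes zero would indicate enrichment under these assumptions.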
