
Fig 1.

Schematic and graphical representations of SCRaPL.

Here, we assume the observed data consist of RNA expression and DNA methylation. (1A) Schematic representation of the SCRaPL model. (1B) SCRaPL’s graphical model, depicting the statistical dependencies between the observed genomic data (Yij1 is RNA expression; Yij2 is DNA methylation), their associated latent variables (Xij1, Xij2) and the feature-specific model parameters (μj, Σj). The additional parameter πj is specific to the noise model assigned to the RNA expression data and captures zero inflation. Full details are given in the model description section in Methods.
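To make the two noise models concrete, the sketch below evaluates the log-likelihood of one cell's observations for feature j given its latent values. It is an illustration under stated assumptions, not the authors' code: we assume a zero-inflated Poisson with log-rate Xij1 and inflation probability πj for expression, and a binomial over nij CpG sites with logit success probability Xij2 for methylation, with the two observations conditionally independent given the latent pair, as the graphical model suggests. The function names are hypothetical.

```python
import math


def zip_logpmf(y: int, log_rate: float, pi: float) -> float:
    """Zero-inflated Poisson log-pmf with rate exp(log_rate) and inflation pi (assumed form)."""
    rate = math.exp(log_rate)
    poisson_logpmf = y * log_rate - rate - math.lgamma(y + 1)
    if y == 0:
        # A zero arises either from the inflation component or from the Poisson itself.
        return math.log(pi + (1.0 - pi) * math.exp(poisson_logpmf))
    return math.log(1.0 - pi) + poisson_logpmf


def binomial_logit_logpmf(y: int, n: int, logit_p: float) -> float:
    """Binomial log-pmf for y methylated CpGs out of n, success prob sigmoid(logit_p) (assumed form)."""
    log_p = -math.log1p(math.exp(-logit_p))    # log sigmoid(logit_p)
    log_1m_p = -math.log1p(math.exp(logit_p))  # log(1 - sigmoid(logit_p))
    log_comb = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_comb + y * log_p + (n - y) * log_1m_p


def cell_loglik(y_expr, y_meth, n_cpg, x_expr, x_meth, pi):
    """Joint log-likelihood of one cell's (expression, methylation) pair given latent X_ij."""
    return zip_logpmf(y_expr, x_expr, pi) + binomial_logit_logpmf(y_meth, n_cpg, x_meth)


print(cell_loglik(y_expr=0, y_meth=30, n_cpg=50, x_expr=1.0, x_meth=0.5, pi=0.2))
```

Because an observed zero can come from either mixture component, πj and the latent expression level are only separable through the non-zero counts, which is one reason the zero-inflation term is kept explicit.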


Table 1.

Summary of synthetic data experiments.

In all cases, latent means and standard deviations were set to μj1 = 4, μj2 = 1, σj1 = 3 and σj2 = 2. Unless otherwise stated, our simulations used I = 60 cells, J = 300 features, an average zero-inflation (ZI) rate of 20% for the expression data (πj = 0.20) and an average methylation coverage (nij) of 275 across cells and genes (sampled from a Uniform distribution on [50, 500]). When varying the number of cells, we use I ∈ {5, 10, 25, 50, 100, 200, 400, 800, 1600}. When varying expression ZI, we use πj ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.8}. When varying methylation coverage, we sample nij from Uniform distributions with ranges [5, 10], [10, 20], [20, 50], [50, 250] and [500, 1000]. Full details are provided in S3 Text.
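A minimal simulation sketch of the default setting in Table 1 follows. Only the latent means, standard deviations, I, J, πj and the coverage range come from the table; the rest is our assumption: true correlations drawn uniformly from [-0.9, 0.9], a zero-inflated Poisson on exp(Xij1) for expression, and a binomial with success probability sigmoid(Xij2) for methylation. S3 Text remains the authoritative description, and `simulate_synthetic` is a hypothetical name.

```python
import numpy as np


def simulate_synthetic(n_cells=60, n_features=300, pi=0.20, coverage=(50, 500), seed=0):
    """Simulate expression and methylation counts under Table 1's default settings.

    Assumptions (not stated in Table 1): true correlations rho_j ~ Uniform(-0.9, 0.9),
    a zero-inflated Poisson for expression and a logit-binomial for methylation.
    """
    rng = np.random.default_rng(seed)
    mu = np.array([4.0, 1.0])  # latent means (mu_j1, mu_j2)
    sd = np.array([3.0, 2.0])  # latent standard deviations (sigma_j1, sigma_j2)
    rho = rng.uniform(-0.9, 0.9, size=n_features)

    # n_ij: CpG coverage from a discrete uniform on [50, 500] (mean 275)
    n_cpg = rng.integers(coverage[0], coverage[1], size=(n_cells, n_features), endpoint=True)
    y_expr = np.empty((n_cells, n_features), dtype=np.int64)
    y_meth = np.empty((n_cells, n_features), dtype=np.int64)

    for j in range(n_features):
        off = rho[j] * sd[0] * sd[1]
        cov = np.array([[sd[0] ** 2, off], [off, sd[1] ** 2]])
        x = rng.multivariate_normal(mu, cov, size=n_cells)  # latent X_ij, shape (I, 2)
        counts = rng.poisson(np.exp(x[:, 0]))
        dropped = rng.random(n_cells) < pi  # zero-inflation events
        y_expr[:, j] = np.where(dropped, 0, counts)
        p_meth = 1.0 / (1.0 + np.exp(-x[:, 1]))
        y_meth[:, j] = rng.binomial(n_cpg[:, j], p_meth)

    return y_expr, y_meth, n_cpg, rho


y_expr, y_meth, n_cpg, rho = simulate_synthetic()
print(y_expr.shape, round(float((y_expr == 0).mean()), 3), round(float(n_cpg.mean()), 1))
```

The other rows of Table 1 map onto the arguments: vary `n_cells` for the cell-number experiment, `pi` for expression ZI, and `coverage` for methylation coverage. Note that the observed zero fraction exceeds πj, because the Poisson component also produces zeros when Xij1 is small.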


Fig 2.

Plots summarizing differences in correlation estimation between SCRaPL, Spearman and Pearson in Experiment 1 with synthetic data.

(2A) Difference between estimated and true correlation as a function of the number of cells for SCRaPL, Spearman and Pearson. (2B) Estimated correlation as a function of true correlation for SCRaPL, Spearman and Pearson in synthetic datasets with 300 genes and 1600 cells. Each dot represents a gene and is color-coded by inference approach.
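The frequentist baselines in panel 2A can be sketched as follows. This is our illustration, not the paper's pipeline: we assume the Pearson and Spearman estimates are computed between log(1 + x) expression and the methylated fraction Yij2/nij, reuse the latent parameters from Table 1, and assume a zero-inflated Poisson/binomial generative model. `simulate_feature` and `baseline_correlations` are hypothetical names; `pearsonr` and `spearmanr` are SciPy functions.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def simulate_feature(rho, n_cells, rng, pi=0.2, coverage=(50, 500)):
    """Draw one feature: ZI-Poisson expression and binomial methylation (assumed model)."""
    mu = np.array([4.0, 1.0])
    sd = np.array([3.0, 2.0])
    off = rho * sd[0] * sd[1]
    x = rng.multivariate_normal(mu, [[sd[0] ** 2, off], [off, sd[1] ** 2]], size=n_cells)
    y_expr = np.where(rng.random(n_cells) < pi, 0, rng.poisson(np.exp(x[:, 0])))
    n_cpg = rng.integers(coverage[0], coverage[1], size=n_cells, endpoint=True)
    y_meth = rng.binomial(n_cpg, 1.0 / (1.0 + np.exp(-x[:, 1])))
    return y_expr, y_meth, n_cpg


def baseline_correlations(y_expr, y_meth, n_cpg):
    """Pearson and Spearman correlation between log(1 + expression) and methylation rate."""
    expr = np.log1p(y_expr)
    meth_rate = y_meth / n_cpg
    r_pearson, _ = pearsonr(expr, meth_rate)
    r_spearman, _ = spearmanr(expr, meth_rate)
    return float(r_pearson), float(r_spearman)


rng = np.random.default_rng(1)
true_rho = 0.8
for n_cells in (25, 200, 1600):
    r_p, r_s = baseline_correlations(*simulate_feature(true_rho, n_cells, rng))
    # Panel 2A-style quantity: estimated minus true correlation
    print(n_cells, round(r_p - true_rho, 3), round(r_s - true_rho, 3))
```

Because these estimates treat zero-inflated counts and finite CpG coverage as exact measurements, their error does not shrink to zero as the number of cells grows; more cells only reduce their variance around a biased value.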


Fig 3.

Summary of experiments on real data.

Figures summarizing the most important points from synthetic and real data experiments. (3A, 3B) Bayesian volcano plots for mESC and mEBC data, respectively: scatter plots of the posterior probability under the null hypothesis (log scale) as a function of the posterior median correlation. Each dot represents a feature and is colored according to the method that labels it as a significant association. (3C, 3D) Venn diagrams summarizing detection rates for SCRaPL, Pearson and Spearman in mESC and mEBC data. By accounting for different sources of noise, SCRaPL detects a large set of the features identified by the frequentist alternatives. It also uncovers an additional large set of associations that frequentist methods cannot identify robustly.
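The coordinates of a Bayesian volcano plot can be computed directly from posterior draws of the feature-level correlations. The sketch below assumes, for illustration only, a null hypothesis of the form |ρj| ≤ ρ0 with a hypothetical threshold ρ0 = 0.2 and a hypothetical significance cutoff of 0.05; the exact null and calibration used by SCRaPL are those described in Methods. The function name `bayesian_volcano` and the toy draws are ours.

```python
import numpy as np


def bayesian_volcano(rho_draws, rho0=0.2, p_cutoff=0.05):
    """Volcano-plot coordinates from posterior draws of feature-level correlations.

    rho_draws: array of shape (n_draws, n_features).
    Assumed null hypothesis (hypothetical threshold rho0): |rho_j| <= rho0.
    """
    n_draws = rho_draws.shape[0]
    median_rho = np.median(rho_draws, axis=0)            # x-axis: posterior median
    p_null = np.mean(np.abs(rho_draws) <= rho0, axis=0)  # Monte Carlo estimate of P(H0 | data)
    # Floor at 1/n_draws so features with no null draws stay finite on the log scale
    log10_p_null = np.log10(np.maximum(p_null, 1.0 / n_draws))  # y-axis
    significant = p_null < p_cutoff
    return median_rho, log10_p_null, significant


# Toy posterior draws for three features (illustrative, not data from the paper)
rng = np.random.default_rng(0)
draws = np.column_stack([
    rng.normal(0.7, 0.05, 4000),    # strongly associated feature
    rng.normal(0.0, 0.05, 4000),    # no association
    rng.normal(-0.35, 0.15, 4000),  # moderate, uncertain negative association
])
median_rho, log10_p_null, significant = bayesian_volcano(draws)
print(np.round(median_rho, 2), np.round(log10_p_null, 2), significant)
```

The third toy feature shows why the posterior probability, rather than the point estimate alone, drives the call: its median lies outside ±ρ0, but its posterior is wide enough that a sizeable share of draws still falls under the null.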


Fig 4.

SCRaPL’s behavior compared to Pearson/Spearman correlation at micro and macro scales.

In all panels apart from 4D, the scatter plots depict raw data for the chosen features, color-coded by CpG coverage, with normalized expression plotted on the log(1 + x) scale. The violin plots depict the posterior correlation densities estimated by SCRaPL for the raw data shown on their left-hand side. (4A) Example where both SCRaPL and Pearson/Spearman identify the feature’s association as significant. (4B) Example where only Pearson/Spearman identifies the feature’s association as significant. (4C) Example where only SCRaPL identifies the feature’s association as significant. (4D) Scatter plots demonstrating the negative/positive relationship between alternative correlation estimates and CpG coverage/% zeros in expression, respectively. ( and ρprs in Fig 4D are the posterior mean and the Pearson correlation for feature j.)


Fig 5.

Cell label transfer from expression to accessibility data for raw (5A) and SCRaPL-preprocessed (5B) data.

Visualization of scRNA and scATAC data on the same plot for raw (5C) and SCRaPL-preprocessed (5D) data.


Fig 6.

DIC difference between the models with and without zero inflation for mESC and mEBC data.

The more negative the difference, the stronger the evidence in favor of the model with zero inflation on the gene expression component, and vice versa. As a visual reference, zero is marked with a dashed red line.
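The plotted quantity can be computed per feature from posterior output. The sketch below uses the standard definition of Spiegelhalter et al. (2002), DIC = D̄ + pD with deviance D(θ) = −2 log p(y | θ) and pD = D̄ − D(θ̄); whether SCRaPL uses this effective-parameter estimate or an alternative is an assumption here. The function names and toy log-likelihood values are hypothetical.

```python
import numpy as np


def dic(loglik_draws, loglik_at_posterior_mean):
    """Deviance information criterion (Spiegelhalter et al., 2002).

    loglik_draws: log p(y | theta_s) for each posterior draw s.
    loglik_at_posterior_mean: log p(y | theta_bar), with theta_bar the posterior mean.
    """
    deviance_draws = -2.0 * np.asarray(loglik_draws, dtype=float)
    mean_deviance = deviance_draws.mean()                    # D-bar
    p_d = mean_deviance - (-2.0 * loglik_at_posterior_mean)  # effective number of parameters
    return mean_deviance + p_d


def dic_difference(zi_fit, no_zi_fit):
    """DIC(with zero inflation) - DIC(without); negative values favour zero inflation."""
    return dic(*zi_fit) - dic(*no_zi_fit)


# Toy inputs: (log-likelihood per posterior draw, log-likelihood at posterior mean)
zi_fit = ([-1050.2, -1048.9, -1051.7], -1048.1)
no_zi_fit = ([-1080.5, -1079.3, -1082.0], -1078.9)
print(round(dic_difference(zi_fit, no_zi_fit), 2))
```

Computing the difference feature by feature and plotting its distribution, with a reference line at zero, gives a Fig 6-style summary.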
