From noise to models to numbers: Evaluating negative binomial models and parameter estimations in single-cell RNA-seq
Fig 7
Technical-noise correction for heterogeneous capture efficiency reshapes the aeBIC model-selection landscape and reveals its impact on the accuracy of inferred bursting kinetics.
(a) Illustration of the differences between the standard and technical-noise-corrected models; (b) Phase diagrams produced by aeBIC model selection based on standard or corrected models. For both, the ground-truth model for observed data is the telegraph model with distributed according to the
distribution which has mean
and CV=
. The transcription rate ρ is fixed to 15; the maximum mean number of transcripts is
. The labels “Tele”, “NB” and “Pois” denote the regions selected using the aeBIC procedure with corrected models. The dashed lines demarcate the same regions but using the aeBIC procedure with standard models. The “Pois” area is divided into a white part (where both aeBIC procedures select the Poisson distribution) and a grey part (where the aeBIC with standard models selects the NB distribution while the aeBIC with corrected models selects the Poisson distribution). The heatmap shows the magnitude of the relative errors in the estimated burst frequency (
) and burst size (
) in the NB-optimal region (using the aeBIC with corrected models). The errors are computed using Eq (17); note that this approach assumes full knowledge of the capture-probability distribution, which is an idealised case. In the plots,
denotes sample size,
is the sum of gene-state switching rates normalised by the degradation rate of mRNA, and
is the fraction of time spent in the active state.
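As a rough illustration of the two axis quantities described in the caption, the Python sketch below computes them from hypothetical telegraph-model rates. The names `sigma_on`, `sigma_off`, `rho` and `d`, the helper functions, and the example values other than the transcription rate of 15 are assumptions introduced here, not notation or parameters taken from the figure. The burst frequency and burst size use the standard bursty-limit correspondence between the telegraph model and the NB distribution (switch-off rate much larger than switch-on rate). The relative-error helper uses the generic definition |estimate - truth| / |truth|, which may differ from Eq (17).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelegraphRates:
    """Hypothetical container for telegraph-model rates (names are assumptions)."""
    sigma_on: float   # rate of switching from the inactive to the active state
    sigma_off: float  # rate of switching from the active to the inactive state
    rho: float        # transcription rate in the active state
    d: float = 1.0    # mRNA degradation rate


def normalised_switching_sum(r: TelegraphRates) -> float:
    # Sum of gene-state switching rates, normalised by the degradation rate.
    return (r.sigma_on + r.sigma_off) / r.d


def fraction_on(r: TelegraphRates) -> float:
    # Steady-state fraction of time spent in the active state.
    return r.sigma_on / (r.sigma_on + r.sigma_off)


def burst_frequency(r: TelegraphRates) -> float:
    # Bursty-limit burst frequency, in units of the degradation rate.
    return r.sigma_on / r.d


def burst_size(r: TelegraphRates) -> float:
    # Bursty-limit mean burst size (transcripts produced per active period).
    return r.rho / r.sigma_off


def relative_error(estimate: float, truth: float) -> float:
    # Generic relative error; assumed here, not necessarily the form of Eq (17).
    return abs(estimate - truth) / abs(truth)


# Example values are illustrative only; rho = 15 matches the caption.
rates = TelegraphRates(sigma_on=0.5, sigma_off=4.5, rho=15.0)
print(normalised_switching_sum(rates))
print(fraction_on(rates))
print(burst_frequency(rates), burst_size(rates))
```

Keeping the rates in one immutable record makes it easy to sweep a grid over the two axis quantities and evaluate `relative_error` for each estimated burst parameter at every grid point.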