A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory

doi:10.1371/journal.pcbi.1006794


Fig 2

k-fold cross-validation: Inferred vs. actual molecules per cell of spike-ins.

(A) GR experiment data. Inferred (mean) values vs. actual values are plotted (symbols) for each spike-in molecule in each of 3 leave-out conditions: carbon-limited growth at rates of 0.12 (red), 0.20 (green) and 0.30 h^-1 (blue). (B) Ciona lineage specification data. Each symbol corresponds to the inferred value in each of 3 leave-out conditions: LacZ (red), Fgfr^DN (green), and M-Ras^CA (blue). Although each leave-out condition in (A) and (B) is plotted with a distinct symbol, a symbol can appear multiple times at some values along the x-axis, because several different spike-ins share the same abundance; i.e., among the 92 spike-in molecules there are only 22 unique abundance values. (C) Measure of performance in the three-fold cross-validation in (A). Mean Fold Error (MFE) is computed between inferred and actual molecules per cell. Symbols plot the average value, over 10,000 Monte Carlo (MC) trials, of the ratio MFE/MFE_syn versus the mean spike-in library size in the leave-out condition. Vertical bars span the central 0.95 quantile range of the MFE/MFE_syn values obtained in the 10,000 MC trials for each leave-out condition. (D) Measure of performance in the three-fold cross-validation in (B) for the Ciona data.
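The summary plotted in (C) and (D) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code. The caption does not define the fold error, so `mean_fold_error` assumes a per-spike-in fold error of max(inferred/actual, actual/inferred). The helper names `quantile` and `summarize_ratios` are hypothetical, and the MFE/MFE_syn ratios are taken as given inputs because MFE_syn is not defined in the caption.

```python
import math


def mean_fold_error(inferred, actual):
    """Mean fold error under an ASSUMED definition (not given in the caption):
    each spike-in contributes max(inferred/actual, actual/inferred), which is >= 1."""
    folds = [max(i / a, a / i) for i, a in zip(inferred, actual)]
    return sum(folds) / len(folds)


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize_ratios(ratios, coverage=0.95):
    """Return (mean, lower, upper) for a list of per-trial MFE/MFE_syn ratios.
    lower and upper bound the central `coverage` fraction of the values,
    i.e. the span of a vertical bar in panels (C) and (D)."""
    vals = sorted(ratios)
    tail = (1.0 - coverage) / 2.0
    mean = sum(vals) / len(vals)
    return mean, quantile(vals, tail), quantile(vals, 1.0 - tail)
```

In use, `summarize_ratios` would be called once per leave-out condition on the 10,000 per-trial ratios. The returned mean gives the symbol position and the two quantiles give the ends of the vertical bar.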

doi: https://doi.org/10.1371/journal.pcbi.1006794.g002