Bayesian modelling of high-throughput sequencing assays with malacoda

doi:10.1371/journal.pcbi.1007504

Fig 1.

Diagram of MPRA.

MPRAs simultaneously assess the transcription shift of thousands of variants. The diagram shows eight oligonucleotides for two variants (red and blue X’s) falling within different regions of genomic context (light red and light blue bars), with two barcodes for each allele of each variant. In practice, the complexity and size of the oligonucleotide library are limited only by cost. A typical MPRA uses tens to hundreds of thousands of oligonucleotides to assay thousands of variants. The oligonucleotides are cloned into a plasmid library in front of an inert ORF (brown). DNA sequencing of the plasmid library is used to count the input representation of each barcode, then RNA sequencing of the mRNA (green) is used to count the RNA output of each barcode.

Fig 2.

MPRA data and malacoda priors.

A) The table shows a subset of our primary MPRA data. The highlighted cell containing 759 barcode counts is influenced both by the sequencing depth of its sample (blue column) and by the unknown input DNA concentration of its barcode (red row). B) A simplified Kruschke diagram of the generative model underlying malacoda. After evaluating the joint posterior on all model parameters, a 95% posterior interval on a variant’s transcription shift (TS; shaded area) may be used for a binary decision between “functional” and “non-functional”. This example TS posterior shows a negative shift whose interval excludes zero, meaning the variant in question would be called “functional”. C) A conceptual diagram demonstrating three prior types available in the malacoda framework. The marginal prior (left) weights all variants in the assay equally, while the grouped and conditional priors use informative annotations as weights in the prior estimation process.
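The decision rule described in panel B can be illustrated with a minimal sketch. It assumes that posterior draws of a variant's transcription shift are already available as a list of numbers, and it uses an equal-tailed interval, which is an assumption made here for simplicity and may differ from the interval type malacoda computes. The function names `posterior_interval` and `is_functional` are hypothetical and are not part of the malacoda package.

```python
import random


def posterior_interval(samples, level=0.95):
    """Equal-tailed interval from posterior draws (assumed interval type)."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    # Pick the order statistics closest to the lower and upper tail quantiles.
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int(round((1.0 - tail) * (n - 1)))]
    return lower, upper


def is_functional(samples, level=0.95):
    """Call a variant "functional" when the interval excludes zero."""
    lower, upper = posterior_interval(samples, level)
    return bool(lower > 0.0 or upper < 0.0)


# Simulated stand-ins for posterior draws; these are not real MPRA results.
rng = random.Random(0)
negative_shift = [rng.gauss(-1.0, 0.3) for _ in range(4000)]
no_shift = [rng.gauss(0.0, 0.3) for _ in range(4000)]

print(is_functional(negative_shift))
print(is_functional(no_shift))
```

In this sketch the first set of draws is centred well below zero, so its interval lies entirely on the negative side, as in the example posterior of panel B; the second set is centred on zero, so its interval straddles zero.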

Fig 3.

Simulation results.

A) A comparison of the TS values used to generate simulated data with the resulting TS estimates. Simulated MPRAs use a varying fraction of variants that are truly non-functional (center). B) The average ROC curves used to assess the classification performance of each method across simulations with 3000 variants, 5% truly functional variants, and 10 barcodes per allele. The methods shown are malacoda (red), MPRAnalyze (orange), mpralm (green), QuASAR-MPRA (pink), MPRAscore (blue), and the t-test (purple). C) The average precision-recall curve for the same set of simulations. D) Median performance metrics across multiple simulations under the same conditions as B.

Fig 4.

Inter-method consensus.

A) A pairwise plot of TS estimate comparisons between methods in our primary MPRA dataset, showing that alternative methods generally agree with malacoda more than with each other. Shaded values above the diagonal show the correlation for the corresponding plot below the diagonal. Color below the diagonal indicates the local density of points in over-plotted regions. B) A pairwise plot of TS estimates using both the marginal and DeepSEA-based malacoda priors in the Ulirsch dataset, showing a similar outcome.

Fig 5.

Luciferase validation results.

A bar plot showing the difference in normalized luciferase intensity for both alleles of rs11865131 (p = 0.032). Black error bars indicate ± one standard deviation.
