Fig 1.
Compounding processes of metabarcoding that affect observed sequence patterns.
Observed sequences from metabarcoding are impacted by a suite of deterministic (navy) and stochastic (teal) processes. Here we focus on modeling the processes between extracted DNA and observed DNA sequences (highlighted within the orange box).
Fig 2.
Non-detections driven by both DNA concentration and amplification efficiency.
The probability of non-detection (p(Y = 0)) is shown for a community of 50 equally abundant taxa under four different distributions of amplification efficiency across taxa. The amplification efficiency (Amp. Eff. (a)) distribution for each example is inset in the upper left of each panel. The among-taxa variation in amplification efficiency ranges from high (A: γ = 5) to moderate (B: γ = 10) to low (C: γ = 100) to effectively none (D: γ = 1,000,000). Both subsampling and amplification efficiency influence the rate of non-detection. The probability of observing no DNA in a given technical replicate is highest at low DNA concentrations (<10 copies/μL). However, non-detections are also possible for species with below-average amplification efficiencies (in this case approximately ai = 0.7) and are very likely (p(Y = 0) > 0.5) for species with amplification efficiencies well below average (ai < 0.4).
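The interaction between subsampling and amplification described in this caption can be illustrated with a small simulation. This is a minimal sketch, not the model used to produce the figure. It assumes that amplification efficiencies follow a Beta distribution with mean 0.7 and concentration γ, that template copies entering each reaction are Poisson distributed, that amplification is deterministic exponential growth over a fixed number of cycles, and that sequencing is a multinomial draw. The function name `simulate_nondetection` and all default parameter values (cycle number, read depth, number of replicates) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_nondetection(conc, gamma, n_taxa=50, n_cycles=35,
                          n_reads=100_000, mean_eff=0.7, n_reps=200, seed=0):
    """Estimate p(Y = 0) per taxon for equally abundant taxa.

    conc  : mean template copies per taxon entering each reaction (assumed unit)
    gamma : concentration of the assumed Beta distribution of efficiencies
    """
    rng = np.random.default_rng(seed)
    # Assumption: efficiencies a_i ~ Beta(mean_eff * gamma, (1 - mean_eff) * gamma)
    a = rng.beta(mean_eff * gamma, (1.0 - mean_eff) * gamma, size=n_taxa)
    zero_counts = np.zeros(n_taxa)
    for _ in range(n_reps):
        # Stochastic subsampling of template molecules into the reaction
        copies = rng.poisson(conc, size=n_taxa)
        # Deterministic exponential amplification over n_cycles
        amplicons = copies * (1.0 + a) ** n_cycles
        total = amplicons.sum()
        if total == 0:
            # No template of any taxon was subsampled: every taxon is a zero
            zero_counts += 1
            continue
        # Sequencing: reads are allocated in proportion to amplicon abundance
        reads = rng.multinomial(n_reads, amplicons / total)
        zero_counts += (reads == 0)
    return a, zero_counts / n_reps


if __name__ == "__main__":
    for gamma in (5, 10, 100, 1_000_000):
        a, p_zero = simulate_nondetection(conc=5.0, gamma=gamma)
        print(f"gamma={gamma}: mean p(Y=0) = {p_zero.mean():.3f}")
```

Under these assumptions, zeros arise in two ways: a taxon may contribute no template copies to the reaction (dominant at low concentration), or it may be outcompeted during amplification by taxa with higher efficiency (dominant when γ is small).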
Fig 3.
Observed reads and non-detections are a function of amplification efficiency and input DNA concentration in the mock community example.
For species observed within a replicate, we find that species with higher amplification efficiencies (αi > 0.7) have a greater number of observed reads for an equivalent template DNA concentration (Panel A). We also find no relationship between the total number of observed reads and DNA concentration, as expected for a compositional data set. Furthermore, we find a greater proportion of non-detections when both DNA concentration and amplification efficiency are lower (Panel B).
Fig 4.
Observed reads and non-detections are a function of amplification efficiency and larval abundances in the CalCOFI example.
For species observed within a replicate, we find that species with higher amplification efficiencies (αi > -0.07) have consistently greater numbers of observed reads for an equivalent template DNA concentration (Panel A). We assume that the number of larvae in the jar is proportional to the number of DNA molecules present. We also find a greater proportion of non-detections when larvae are rare in the jars (Panel B).