Fig 1.
Principle of split sample approach with AmpliconDuo filter.
DNA extracted from a sample is split into branches A and B. In each branch, an independent PCR and sequencing run is performed. Sequences occurring in both branches pass the AmpliconDuo filter (upper green sequence ACC… with 4 reads in A and 7 reads in B), while sequences occurring in only one branch are discarded (lower red sequence CCG…). Read numbers of both branches are retained for statistical analyses.
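The filtering principle described in this caption can be sketched in a few lines of Python. This is a minimal illustration only, not the AmpliconDuo implementation: the function name `ampliconduo_filter` is hypothetical, the keys "ACC..." and "CCG..." are the truncated labels from the figure rather than full sequences, and the read count of 2 for the discarded sequence is an assumed value, since the caption does not state it.

```python
from collections import Counter


def ampliconduo_filter(reads_a, reads_b):
    """Keep sequences seen in both branches, with read numbers from each.

    reads_a, reads_b: iterables of sequence strings, one entry per read.
    Returns a dict mapping sequence -> (reads in A, reads in B).
    """
    counts_a = Counter(reads_a)
    counts_b = Counter(reads_b)
    # A sequence passes only if it occurs at least once in each branch.
    shared = counts_a.keys() & counts_b.keys()
    return {seq: (counts_a[seq], counts_b[seq]) for seq in sorted(shared)}


# Labels follow the figure; the count for "CCG..." is an assumed value.
branch_a = ["ACC..."] * 4 + ["CCG..."] * 2
branch_b = ["ACC..."] * 7

kept = ampliconduo_filter(branch_a, branch_b)
print(kept)  # {'ACC...': (4, 7)}
```

Both read numbers are returned for each retained sequence, mirroring the caption's note that they are kept for later statistical analyses.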
Fig 2.
Primer construct and amplification products.
The primers are composed of sequences specific to the sequencing platform (green), i.e. the P5 adaptor and the Illumina primer 1 for the forward primer, and the P7 adaptor and the Illumina primer 2 for the reverse primer. Downstream of these follow a sample identifier, which starts with a poly-N region (red), and the custom-defined primer (blue). In the reverse primer construct, the sample identifier was replaced by a poly-N region.
Fig 3.
Discordance plot showing significant deviations of eukaryote read numbers between split samples.
For each sample S, an individual panel shows the logarithmically scaled pairs of read numbers ($r_{iAS}$, $r_{iBS}$) of unique sequences i in PCR branches X ∈ {A, B}. Red and black points correspond to sequences with and without significantly deviating $r_{iAS}$, $r_{iBS}$ (false discovery rate q ≤ 0.05 and q > 0.05, respectively).
Table 1.
Discordance measures for eukaryotic samples.
Table 2.
Discordance measures for prokaryotic samples.
Fig 4.
Effect of AmpliconDuo filter on spectrum of read numbers for eukaryotic data.
Columns A and B are the experimental branches of the split sample; rows correspond to sampling sites. Numbers of sequences before and after AmpliconDuo filtering are plotted as black and orange dots, respectively. Both axes have logarithmic scales.
Fig 5.
Distribution of the probability $p_{art}$ of artificial random mutations.
Each dot corresponds to one $p_{art}$ value computed for one experimental branch, A or B, according to Eq (6). In the plot, $p_{art}$ values are binned in intervals of 1/30 of their total range. Eukaryotes and metazoans (first two columns) were both analyzed with the same single-read protocol, and the mean $p_{art}$ values of these two groups are not significantly different. The prokaryotic samples, which were analyzed with a paired-end protocol, show a higher $p_{art}$.
Fig 6.
Effect of AmpliconDuo filtering on apparent eukaryote community similarities.
Comparison of samples with respect to Jaccard distances $d_{kl}$, Eq (5), between sequence abundance vectors. Left panel: Sequences clustered at 100% identity. Right panel: Sequences clustered at 100% identity and excluding sequences observed in only one branch of a split sample (AmpliconDuo filter).
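As an illustration of the distance compared in the two panels, the following sketch computes a Jaccard distance between two samples k and l. It assumes the common presence/absence form, 1 − |K ∩ L| / |K ∪ L|; Eq (5) is not reproduced in this caption and may define an abundance-weighted variant instead. The function name and the sample data are hypothetical.

```python
def jaccard_distance(abund_k, abund_l):
    """Presence/absence Jaccard distance between two samples.

    abund_k, abund_l: dicts mapping sequence id -> read number.
    Assumption: a sequence counts as present if its read number is > 0.
    """
    present_k = {seq for seq, n in abund_k.items() if n > 0}
    present_l = {seq for seq, n in abund_l.items() if n > 0}
    union = present_k | present_l
    if not union:
        # Two empty samples are treated as identical (a convention).
        return 0.0
    return 1.0 - len(present_k & present_l) / len(union)


# Hypothetical abundance vectors for two samples.
sample_k = {"seq1": 5, "seq2": 1, "seq3": 0}
sample_l = {"seq2": 3, "seq4": 2}

d_kl = jaccard_distance(sample_k, sample_l)
print(round(d_kl, 4))  # 0.6667 (one shared sequence out of three present)
```

Under this form, removing sequences that occur in only one branch shrinks the union without touching truly shared sequences, which is one way filtering can change the apparent similarity between samples.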
Fig 7.
Taxonomic composition of eukaryotic communities before (top) and after (bottom) AmpliconDuo filtering.
In the bog soil sample, many archaean taxa were captured by the broad eukaryotic primers used in this study. Archaea were therefore not discarded from the bog soil sample for this community comparison.
Fig 8.
Effect of AmpliconDuo filtering on chimeras for prokaryotic sample Pro2.
Chimeras are defined as sequences recognized by UCHIME in de novo mode with score ≥ 1. Top: Frequency of chimeras in branches A and B of the split sample as a function of their read numbers, before (dashed lines) and after (solid lines) application of the AmpliconDuo filter. Bottom: Fraction of chimeras passing the AmpliconDuo filter ($f_{filtered}/f_{unfiltered}$) for read numbers 1 to 20 in both branches A and B, and the corresponding prediction $P(r_{iA}, r_{iB} \geq 1)$ using the Poisson model in Eq (8) with $\lambda_i$ = 1, 2, …, 20.
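The shape of such a prediction can be sketched as follows. Eq (8) is not reproduced in this caption, so the code rests on a stated assumption: the read numbers of sequence i in branches A and B are independent Poisson variables with the same mean λ_i, which gives P(r_iA ≥ 1, r_iB ≥ 1) = (1 − e^(−λ_i))². The function name `prob_pass_filter` is hypothetical.

```python
import math


def prob_pass_filter(lam):
    """Probability that a sequence is seen at least once in both branches.

    Assumption: read numbers in A and B are independent Poisson variables
    with a common mean `lam`; this may differ from the paper's Eq (8).
    """
    # P(r >= 1) for one Poisson-distributed branch is 1 - P(r = 0).
    p_seen_once = 1.0 - math.exp(-lam)
    # Independence between branches lets the two probabilities multiply.
    return p_seen_once ** 2


# Evaluate the prediction for lambda_i = 1, 2, ..., 20 as in the caption.
for lam in range(1, 21):
    print(lam, prob_pass_filter(lam))
```

Under this assumption the probability rises quickly with λ_i, so sequences with low expected read numbers are the ones most likely to be removed by the filter.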