
Fig 1.

Stochastic sampling leads to variation in observed overlap.

The members of two hypothetical populations are represented by blue and green circles, respectively. Each population has 16 members, and s = 5 are shared members of both populations. In two independent sampling experiments, shown in top and bottom rows, na = nb = 8 members are sampled at random from each population (dark circles) while the other 8 members are not sampled (transparent circles). Observation of the first experiment finds an overlap of nab = 4, while observation of the second finds nab = 0.
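The sampling experiment in this caption can be reproduced with a short simulation. The sketch below is illustrative only: the function name `sample_overlap` and the integer labeling of population members are assumptions made for the example, not part of the article.

```python
import random

def sample_overlap(pop_size=16, shared=5, n_a=8, n_b=8, rng=None):
    """Draw one sampling experiment and return the observed overlap n_ab.

    Hypothetical labeling: ids 0..shared-1 belong to both populations;
    every other id belongs to exactly one population.
    """
    rng = rng or random.Random()
    pop_a = list(range(pop_size))
    pop_b = list(range(shared)) + list(range(pop_size, 2 * pop_size - shared))
    # Sample without replacement from each population independently.
    sample_a = set(rng.sample(pop_a, n_a))
    sample_b = set(rng.sample(pop_b, n_b))
    return len(sample_a & sample_b)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
overlaps = [sample_overlap(rng=rng) for _ in range(10_000)]
mean_overlap = sum(overlaps) / len(overlaps)
print("smallest observed overlap:", min(overlaps))
print("largest observed overlap:", max(overlaps))
print("mean observed overlap:", round(mean_overlap, 3))
```

Each shared member is sampled from a given population with probability 8/16, so the expected observed overlap is 5 × (1/2) × (1/2) = 1.25, well below the true overlap of s = 5. Individual experiments scatter around that value, which is the variation the figure illustrates.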


Fig 2.

Inference and uncertainty using the posterior.

The posterior distribution over s is plotted for the realistic scenario of na = 47, nb = 32, and nab = 20 [line; Eq (6)]. The posterior mean provides our estimate of the true overlap [open circle; Eq (7)], and the interval accounting for at least 90% of the area under the posterior curve provides an equal-tailed 90% credible interval [shading; Eq (8)]. The estimate from Eq (1) is shown for comparison [black cross], and is typically less than or equal to the posterior mean.
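To make the quantities in this caption concrete, here is a minimal sketch of a posterior over s, its mean, and an equal-tailed 90% credible interval. Eqs (6) to (8) are not reproduced on this page, so everything below is an assumption made for illustration: each population is taken to have N = 60 members, the prior over s is uniform on 0..N, the likelihood is a two-stage hypergeometric sampling model, and the interval endpoints follow one possible quantile convention. All function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k marked items) when drawing n without replacement from N items, K of them marked."""
    if k < 0 or k > K or k > n or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def likelihood(n_ab, s, n_a, n_b, N):
    """Assumed model: s_a of the s shared members land in sample A,
    then n_ab of those s_a members also land in sample B."""
    return sum(
        hypergeom_pmf(s_a, N, s, n_a) * hypergeom_pmf(n_ab, N, s_a, n_b)
        for s_a in range(n_ab, min(s, n_a) + 1)
    )

def posterior(n_ab, n_a, n_b, N=60):
    """Posterior over s = 0..N under an assumed uniform prior."""
    weights = [likelihood(n_ab, s, n_a, n_b, N) for s in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def equal_tailed_interval(post, mass=0.90):
    """Smallest s whose cumulative probability reaches each tail cutoff."""
    tail = (1.0 - mass) / 2.0
    cdf, lo, hi = 0.0, None, None
    for s, p in enumerate(post):
        cdf += p
        if lo is None and cdf >= tail:
            lo = s
        if hi is None and cdf >= 1.0 - tail:
            hi = s
    if hi is None:  # guard against floating-point shortfall in the total
        hi = len(post) - 1
    return lo, hi

post = posterior(n_ab=20, n_a=47, n_b=32)
post_mean = sum(s * p for s, p in enumerate(post))
ci_lo, ci_hi = equal_tailed_interval(post)
print("posterior mean:", round(post_mean, 2))
print("90% credible interval:", (ci_lo, ci_hi))
```

Under this assumed model, values of s below the observed overlap nab receive zero posterior probability, since the true overlap cannot be smaller than the overlap already observed.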


Fig 3.

Bayesian repertoire overlap consistently estimates true overlap.

Repertoires with true overlaps ranging from 0 to 60 were subsampled in simulations. As sampling rates increase from na = nb = 30 (left) to 40 (middle) and to 50 (right), the estimates of BRO (colored circles) approach the true values (dotted lines) symmetrically. The Eq (1) estimates (crosses) approach the true values from below, systematically underestimating the true overlap. This bias is worse with lower sampling rates [7]. Similar results are found when na ≠ nb, and when the total repertoire sizes are different from each other (S1 Fig).


Fig 4.

Credible intervals quantify uncertainty in overlap estimates.

Using Eq (8), 90% credible intervals are shown as error bars around the point estimates for varying true overlap s. As the sampling rate increases from na = nb = 30 (left) to 40 (middle) and to 50 (right), credible intervals shrink, indicating a reduction in uncertainty. In expectation, 90% of intervals cover the true overlap (dotted line).


Fig 5.

Reevaluation of published results.

In 2010, Albrecht et al. compared var repertoires from 5 populations using pairwise type sharing (see Refs. [18, 19, 27] for original data details). (left) Reproduction of analysis of [19], rescaled from [0, 1]→[0, 60]. (middle) Reanalysis using Bayesian repertoire overlap [Eq (7)]. For all boxplots, boxes span inner quartiles; center lines show medians; whiskers extend to 2.5 and 97.5 percentiles. (right) Histograms of Bayesian repertoire overlap distributions from Amele and Ariquemes clones (data identical to those in middle boxplots) colored by width of credible interval [Eq (8)], a measure of uncertainty. Differences in uncertainties are driven primarily by sampling rates: Amele samples average sequences per parasite while Ariquemes clones average .


Fig 6.

Quantifying the decrease in uncertainty from increased sequencing.

Histograms show distributions of overlap estimates, computed using Eq (11), for various values of s, which are indicated by color-matched dotted lines. While all estimates are distributed around the true values of s, increasing the number of colonies c from 48 (top) to 96 (middle) and to 144 (bottom) substantially decreases the error of estimates. For example, the bottom plot shows that successfully sequencing c = 144 colonies from each parasite is guaranteed to produce estimates that are off by at most 5 (8.3%) in either direction of the true s.
