Putting BASIL in a BLT: A Bayesian filtering method for estimating the fitness effects of nascent adaptive mutations

doi:10.1371/journal.pcbi.1013946

Fig 1.

Performance of the neutral decline and FitMut2 methods on simulated BLT data depends on the strength of selection.

Panels A–H show the results for the neutral decline method, and panels I–L show the FitMut2 results. Left panels A, B, E, F, I, J show simulations in the weak selection regime; right panels C, D, G, H, K, L show simulations in the strong selection regime. A, C. Mean fitness trajectories, true and inferred by the neutral decline method with low-abundance reference lineages. B, D. Selection coefficients of adapted lineages, true versus inferred by the neutral decline method with low-abundance reference lineages. Each symbol corresponds to a lineage; orange circles, black upward triangles and blue downward triangles represent true positives, false positives and false negatives, respectively; true negatives are not displayed for clarity. E–H. Same as A–D but for high-abundance lineages. I–L. Same as A–D but for FitMut2.

Fig 2.

Observed lineage read counts deviate from those predicted by equation (1) both in simulated and real data.

A. Rescaled average read count at the next time point (ordinate) plotted against the read count observed at the previous time point r_{k−1} (abscissa) for the weak-selection simulation. Light and dark gray points represent early (t_k = 16 generations) and late (t_k = 160 generations) time intervals. Lines represent least-squares best fits within the linear regime (see Materials and Methods). B. Error in the inferred mean fitness (inferred minus true) as a function of the read count of lineages that are used as the neutral reference. Shades are the same as in panel A. C. Same as panel A but for the strong selection regime. Early time interval is at t_k = 16 generations and late time interval is at t_k = 64 generations. D. Same as panel B but for the strong selection regime. E–G. Same as panel A but for three real BLT datasets: Levy 2015 R1 (panel E), Li 2019 Evo1D R2 (panel F) and Venkataram 2023 Co-evolution R5 (panel G). Early time interval is at t_k = 16, 7, 20 generations and late time interval is at t_k = 112, 133, 86 generations in panels E, F, G, respectively.

Fig 3.

BASIL schematic.

A. Key quantities and processes. Each barcoded lineage i is characterized at the current time t_k by its size n_ik and fitness (selection coefficient) s_ik, which are unknown. Lineage size changes over time depending on the lineage fitness and the mean fitness of the population. At each sampling time point, we measure relative lineage sizes by counting sequencing reads with the corresponding barcode. B. At the first step of Bayesian filtering, we use the model of evolution to project the past belief distribution for the hidden variables n and s and obtain their current prior distribution. At the second step, we update this distribution based on the observed read count r_k. C. BASIL workflow.
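
The two-step filtering cycle in panel B can be illustrated with a minimal grid-based sketch. It is not BASIL's implementation, and all function names are hypothetical. It assumes (i) a fixed grid of candidate selection coefficients s, with s constant over time; (ii) deterministic growth, log n_k = log n_{k−1} + (s − s̄)Δt, with a known mean fitness s̄; and (iii) Poisson read counts with mean n·R/N for R total reads and population size N, used here in place of the measurement model of equation (2).

```python
import math

def poisson_logpmf(r, lam):
    # log P(r | lam); ASSUMED Poisson read noise, not the paper's equation (2)
    return r * math.log(lam) - lam - math.lgamma(r + 1)

def normalize(belief):
    total = sum(sum(row) for row in belief)
    return [[w / total for w in row] for row in belief]

def predict(belief, s_grid, dlog_n, mean_fitness, dt):
    """Step 1: project the past belief forward with ASSUMED deterministic growth."""
    n_bins = len(belief[0])
    prior = []
    for i, s in enumerate(s_grid):
        # log n grows by (s - mean_fitness) * dt, rounded to whole grid bins
        shift = round((s - mean_fitness) * dt / dlog_n)
        row = [0.0] * n_bins
        for j, w in enumerate(belief[i]):
            row[min(max(j + shift, 0), n_bins - 1)] += w  # clip at the grid edges
        prior.append(row)
    return prior

def update(prior, log_n_grid, reads, total_reads, pop_size):
    """Step 2: reweight each (s, n) cell by the likelihood of the observed reads."""
    log_w = []
    for row in prior:
        log_row = []
        for j, w in enumerate(row):
            if w > 0:
                lam = math.exp(log_n_grid[j]) * total_reads / pop_size
                log_row.append(math.log(w) + poisson_logpmf(reads, lam))
            else:
                log_row.append(-math.inf)
        log_w.append(log_row)
    peak = max(max(row) for row in log_w)  # subtract the max to avoid underflow
    return normalize([[math.exp(v - peak) for v in row] for row in log_w])

def marginal_s(belief):
    """Posterior probability of each candidate s, summed over lineage sizes."""
    return [sum(row) for row in belief]

# Toy run: known initial size 1000; reads are rounded expected counts for s = 0.1
s_grid = [0.0, 0.05, 0.1]
dlog_n = 0.1
log_n_grid = [math.log(1000) + dlog_n * j for j in range(60)]
belief = [[1.0 / len(s_grid)] + [0.0] * 59 for _ in s_grid]
for reads in [223, 495, 1102]:
    prior = predict(belief, s_grid, dlog_n, mean_fitness=0.0, dt=8)
    belief = update(prior, log_n_grid, reads, total_reads=1e5, pop_size=1e6)

p_s = marginal_s(belief)
print(p_s)
```

In this toy run the posterior over s concentrates almost entirely on s = 0.1. A grid keeps both steps explicit. A real lineage-tracking filter would also need demographic noise in the projection step and an unknown initial size.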

Fig 4.

Properties of the measurement process.

A. Barcode frequency estimated from barcode sequencing data (abscissa) versus the lineage frequency set experimentally (ordinate). Each point is a unique barcode. Error bars show one standard deviation of the mean. The 1-to-1 line is shown in gray. B. Variance of the read count versus the mean read count. Each point represents data from multiple barcodes at the same expected lineage frequency. Shade represents coverage. Solid blue line is the parabola given by equation (2). Other curves are described in the Materials and Methods and their fitted parameter values are provided in S2 Table.

Table 1.

Comparison of BLT analysis methods on simulated data.

Fig 5.

Performance of BASIL on simulated data.

Left panels A, B, E, F show simulations in the weak selection regime; right panels C, D, G, H show simulations in the strong selection regime. A, C. Observed barcode frequency trajectories colored by fitness. B, D. Mean fitness trajectories. Gray lines show the ground truth, circles show values inferred by BASIL. E, G. F1 score as a function of the confidence factor β. White circle denotes the maximal value. F, H. Inferred versus true selection coefficients of individual lineages. Lineages are called adapted with β = 3.3 (see main text). Each symbol corresponds to a lineage; orange circles, black upward triangles and blue downward triangles represent true positives, false positives and false negatives, respectively; true negatives are not displayed for clarity.
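
The F1 score in panels E and G combines false positives and false negatives into a single number, F1 = 2TP/(2TP + FP + FN). The sketch below scans a few values of β and keeps the one with the highest F1. The calling rule it uses (call a lineage adapted when its inferred s exceeds β times its posterior standard deviation) is a hypothetical stand-in, not necessarily the criterion defined in the main text. The function names, lineage IDs and numbers are invented toy values.

```python
def f1_score(called, truly_adapted):
    """F1 = 2TP / (2TP + FP + FN) for sets of lineage IDs."""
    tp = len(called & truly_adapted)
    fp = len(called - truly_adapted)
    fn = len(truly_adapted - called)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def call_adapted(estimates, beta):
    # HYPOTHETICAL calling rule: inferred s more than beta posterior SDs above zero
    return {lid for lid, (mean_s, sd_s) in estimates.items() if mean_s > beta * sd_s}

def best_beta(estimates, truly_adapted, betas):
    """Return (max F1, beta) over the candidate betas."""
    return max((f1_score(call_adapted(estimates, b), truly_adapted), b) for b in betas)

# Toy values: lineage ID -> (inferred s, posterior SD of s)
estimates = {"a": (0.08, 0.01), "b": (0.05, 0.02), "c": (0.01, 0.01), "d": (0.03, 0.015)}
truly_adapted = {"a", "b"}
best = best_beta(estimates, truly_adapted, [1.0, 2.2, 3.0])
print(best)  # (1.0, 2.2)
```

A low β admits false positives such as "d", and a high β drops true positives such as "b". The F1 maximum therefore sits at an intermediate β, which is the trade-off traced by the curves in panels E and G.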

Fig 6.

The effects of down-sampling on BASIL and FitMut2 performance.

We reanalyzed the Levy 2015 dataset (Replicate 1) using BASIL and FitMut2 after reducing sequencing coverage or sampling frequency. See Materials and Methods for details. A. Mean fitness trajectories inferred by BASIL on full and reduced data. B. Inferred selection coefficients for lineages identified as adapted by BASIL in both full and reduced datasets. The 1-to-1 line is shown in light gray. C. Distribution of fitness effects of lineages identified as adapted by BASIL in full and reduced data. Inset shows the tail of the distribution. D–F. Same as A–C but for FitMut2. G–L. Same as A–F but for reduced sampling.

Fig 7.

Histograms of the measured distributions of fitness effects (mDFEs) of adapted lineages inferred by BASIL in nine BLT experiments.

Each panel shows a BLT experiment, as indicated. Gray lines show individual replicates. In the top right corner, we provide a shorthand description of the strain (haploids are denoted by HYi, diploids by DY) and the environment used in each experiment. “C” and “N” refer to the likely limiting nutrient in the medium, -xd indicates the number of days in the growth and dilution cycle, and “+Cr” indicates the presence of the alga Chlamydomonas reinhardtii in the evolution environment (see Tab C in S1 Data for additional details). The thick yellow line shows the mDFE of HY0 in C-2d (Levy 2015, Replicate 1) as a reference.