Estimating virus effective population size and selection without neutral markers

doi:10.1371/journal.ppat.1006702

Fig 1.

Virus variants inoculated to pepper plants and sampling protocol.

(A) The five virus variants (in the gray box) were derived from the SON41p PVY clone and differed only at codon positions 101, 115 and 119 of the VPg cistron. These positions are shown in green if they correspond to the SON41p clone and in red if a non-synonymous substitution was introduced by site-directed mutagenesis. Single-letter amino acid abbreviations are presented below each position and PVY variant. Variant names and the corresponding binary code for the three point mutations of interest are given on the right of the sequences, with the binary code of the SON41p variant set to 000. The two additional possible variants, based on the three-digit binary code, are also shown at the bottom. (B) Sampling protocol for one pepper genotype. We inoculated 48 plants with the virus. Eight plants were sampled at each sampling time, from 6 to 34 days post-inoculation. The leaf circled in blue is the leaf inoculated with the virus. The leaves sampled are shown in red.


Table 1.

Main notations for the observations and the model.


Fig 2.

Five contrasting datasets obtained in the biological experiment.

Each row of bar plots represents the dynamics of virus variants in a single DH line over time: (A) 240, (B) 2430, (C) 2344, (D) 2321 and (E) 219. We inoculated 48 plants per DH line and sampled eight plants, which were subsequently removed from the experiment, at each of the six sampling dates (6, 10, 14, 20, 27 and 34 days post-inoculation). Within each bar plot, the frequencies of the five variants (see top of the figure for the color code) in each infected plant sample are represented by a single bar (labeled from 1 to 48). Missing bars correspond to plant samples in which no virus was detected. The last bar indicates the mean viral composition in the infected plants. Each individual bar plot corresponds to a single sampling date, indicated at the top of each column of bar plots. The five DH lines displayed contrasting virus variant dynamics, consistent with contrasting patterns of selection and genetic drift. We could not sample plants of DH line 240 (A) at 34 days post-inoculation, because severe necrosis symptoms invading the stem had killed all plants by this sampling date.
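The "mean viral composition" bar described above can be read as a per-variant average taken over infected plants only. A minimal sketch of that computation, assuming each plant sample is stored as a list of five variant frequencies and that plants with no detected virus are stored as None (the data layout, the function name and the example values are illustrative assumptions, not the authors' code or data):

```python
def mean_composition(samples):
    """Average variant frequencies over infected plants only.

    `samples` has one entry per plant: a list of variant frequencies,
    or None when no virus was detected (hypothetical layout).
    """
    infected = [s for s in samples if s is not None]
    if not infected:
        return None  # no infected plant at this sampling date
    n_variants = len(infected[0])
    return [sum(s[i] for s in infected) / len(infected) for i in range(n_variants)]


# Made-up frequencies for three plants, one of them uninfected.
plants = [
    [0.5, 0.25, 0.25, 0.0, 0.0],
    None,
    [0.0, 0.25, 0.25, 0.5, 0.0],
]
mean_bar = mean_composition(plants)
```

Excluding uninfected plants, rather than counting them as all-zero compositions, keeps the mean bar summing to 1.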


Fig 3.

Contrasting datasets obtained in numerical experiment 1.

For each dataset (series A to D), the composition of eight populations was observed at six sampling dates, from 6 to 34 days post-inoculation, in independently sampled hosts. Within each plot, each bar represents the composition of the population in one plant at one date, and the last bar shows the mean frequencies over these populations. The color code at the top is used to distinguish the five variants. The harmonic mean of effective population size is indicated in the main title of each plot. The parameter values used for the simulations are: series (A) r = (0.971, 0.92, 1.09, 0.992, 1.027), […], […]; series (B) r = (1.05, 1.005, 1.077, 0.963, 0.904), […], […]; series (C) r = (1.045, 1.031, 1.12, 0.879, 0.924), […], […]; series (D) r = (1.105, 0.943, 0.999, 1.041, 0.912), […], […]. Note that […] is used for the iterative computation of a sequence of effective population sizes varying every five generations during the systemic infection stage.
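To make the interplay of selection (the fitness vector r) and drift (the effective population size) concrete, here is a minimal sketch of a haploid Wright-Fisher simulation with five variants. Only the r values are taken from series (A) in the caption; the starting frequencies, the effective population sizes, the number of generations and all function names are assumptions chosen for illustration, and this is not the authors' simulation code.

```python
import random
from statistics import harmonic_mean


def wright_fisher_step(freqs, r, n_e, rng):
    """One generation: selection reweights frequencies, drift resamples n_e genomes."""
    weights = [f * ri for f, ri in zip(freqs, r)]
    draws = rng.choices(range(len(freqs)), weights=weights, k=n_e)
    return [draws.count(i) / n_e for i in range(len(freqs))]


def simulate(freqs, r, n_e_per_generation, seed=0):
    """Return the frequency trajectory for a sequence of effective sizes."""
    rng = random.Random(seed)
    trajectory = [freqs]
    for n_e in n_e_per_generation:
        freqs = wright_fisher_step(freqs, r, n_e, rng)
        trajectory.append(freqs)
    return trajectory


# r is taken from series (A) in the caption; everything else is illustrative.
r_series_a = [0.971, 0.92, 1.09, 0.992, 1.027]
start = [0.2] * 5                   # assumed equal starting frequencies
n_e_values = [20] * 5 + [200] * 5   # assumed sizes, changing every five generations

trajectory = simulate(start, r_series_a, n_e_values)
final = trajectory[-1]
n_e_harmonic = harmonic_mean(n_e_values)
```

The harmonic mean is dominated by the smallest sizes in the sequence, which is why a short early bottleneck can leave a lasting imprint on the summary value shown in each plot title.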


Table 2.

Performance of the estimators of the harmonic mean of effective population sizes and variant fitness r obtained with the two numerical experiments.


Fig 4.

Inferences for variant fitness r and for the harmonic mean of effective population size, for the 750 datasets simulated with five virus variants.

(A) Correlation between true r_i (x-axis) and estimated r_i (y-axis) (all variants considered together). (B) Correlation between the true (x-axis) and estimated (y-axis) harmonic mean of effective population size (all sampling dates considered together, logarithmic scale). In both panels, the black line is the first bisector and the red dashed line is the best-fitting linear model. In panel A, the 9 points with […] correspond to datasets in which a highly counterselected variant was observed in only a few plants (5, on average, of the 48 plants) because of a low initial effective population size.
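The comparison between the first bisector (y = x) and the best-fitting linear model can be sketched as an ordinary least-squares fit of estimated values on true values. The function name and the numbers below are made up for illustration and are not results from the paper.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares fit y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up true and estimated fitness values, for illustration only.
true_r = [0.90, 0.95, 1.00, 1.05, 1.10]
estimated_r = [0.91, 0.94, 1.01, 1.04, 1.11]

slope, intercept = least_squares_line(true_r, estimated_r)
# An unbiased estimator would give a slope close to 1 and an intercept
# close to 0, so the fitted line would overlap the first bisector y = x.
```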


Fig 5.

Goodness-of-fit of the Wright-Fisher model with the data of the biological experiments.

(A) Correlation between the observed mean frequencies of the five virus variants (averaged over all virus populations and sampling times) and their fitted values (n = 75). (B) Correlation between the logarithm of the observed mean (averaged over the variants) standard deviation of variant frequencies (between virus populations) at each sampling date t_d and their fitted values (n = 87). In both panels, the black line is the first bisector and the red dashed line is the best-fitting linear model.
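The summary statistic in panel B can be sketched as follows: for one sampling date, take the standard deviation of each variant's frequency across virus populations, average over variants, then take the logarithm. The data layout and function name are hypothetical, and the choice of the population standard deviation (pstdev) over the sample standard deviation is an assumption; the caption does not say which one the authors used.

```python
import math
from statistics import mean, pstdev


def log_mean_sd(populations):
    """Log of the variant-averaged between-population SD of frequencies.

    `populations` holds one frequency vector per virus population sampled
    at a single date (hypothetical layout).
    """
    n_variants = len(populations[0])
    # SD of each variant's frequency across populations (assumed: population SD).
    sds = [pstdev([p[i] for p in populations]) for i in range(n_variants)]
    return math.log(mean(sds))


# Made-up frequencies for two populations and five variants.
example = [
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0, 0.0],
]
stat = log_mean_sd(example)
```

The statistic is undefined when all populations have identical composition, since the mean standard deviation is then zero.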


Fig 6.

Fitness of virus variants and effective population size estimates for the 15 plant genotypes.

(A) Estimates of intrinsic rates of increase for each variant i for the DH lines 2321, 240, 2430, 2367, 2328, 2426, 2344 and 221. (B) Estimates of effective population size during the time course of the experiment for the DH lines listed in (A) and the model best supported by the data. (C) As for (A) for DH lines 2400, 2123, 2264, 2349, 219, 2256 and 2173. (D) As for (B) for the DH lines listed in (C).
