A novel framework for inferring parameters of transmission from viral sequence data

doi:10.1371/journal.pgen.1007718

Fig 1.

Challenges arising in the inference of transmission bottlenecks from viral sequence data.

Circles represent idealised viral particles characterised by four distinct alleles. A. Reductions in population diversity cannot necessarily be attributed unambiguously to either a population bottleneck or the action of selection. In the illustrated case, either a tight bottleneck without selection or a large bottleneck with strong selection could explain the change in the population during transmission. B. Simple statistics describing a population may generate misleading inferences of population bottleneck size. In the illustrated case, the genetic structure of the population is changed by a bottleneck during transmission, but the frequency of each allele within the population does not change; an inference of bottleneck size derived from single-locus statistics would therefore be misleadingly large. C. Noise arising from the collection and sequencing of samples is likely to produce differences between the observed populations, even if the composition of the viral population was entirely unchanged during transmission.
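
Panel B can be made concrete with a small worked example. The sketch below is illustrative only: it assumes two biallelic loci written as upper- or lower-case letters (a simplification of the four-allele particles in the figure), and the helper `allele_frequencies` is hypothetical. A bottleneck that passes only the `Ab` and `aB` haplotypes removes half of the haplotype diversity, yet leaves both single-locus allele frequencies unchanged.

```python
# Hypothetical illustration: haplotypes over two biallelic loci,
# upper case = one allele, lower case = the other.
def allele_frequencies(haplotype_freqs):
    """Return, for each locus, the summed frequency of the upper-case allele."""
    n_loci = len(next(iter(haplotype_freqs)))
    freqs = [0.0] * n_loci
    for hap, f in haplotype_freqs.items():
        for i, allele in enumerate(hap):
            if allele.isupper():
                freqs[i] += f
    return freqs

before = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
after = {"Ab": 0.5, "aB": 0.5}  # only two haplotypes pass the bottleneck

print(allele_frequencies(before))  # [0.5, 0.5]
print(allele_frequencies(after))   # [0.5, 0.5]
```

Any statistic computed locus by locus sees no change here, which is why the structure of linked haplotypes matters for bottleneck inference.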

Fig 2.

A. Basic model of transmission. A set of haplotypes exists at frequencies q^B, from which a noisy observation x^B is made. During a transmission event, a total of N^T viruses are transferred under the influence of selection S^T, establishing an infection in the next host described by q^F. Growth of the viral population within the host then produces the population q^A, influenced by genetic drift (characterised by the effective population size N^G) and selection S^G. Sampling of the final population gives the second observation x^A. B. Regions of the genome that are separated by recombination or reassortment are used to distinguish the effects of selection and a population bottleneck. Prior to transmission, the first region contains seven different genotypes spanning four variant loci, whilst the second region harbours four genotypes covering three loci. As recombination leaves these two regions unlinked, selection acting on genotypes in one region has no impact on the fate of genotypes in the other. Thus, where genetic diversity is reduced in the first region, the preservation of diversity in the second region indicates that the change arises from selection acting on the first region, rather than from a shared, narrow population bottleneck. C. Models of neutrality and selection are compared, as illustrated in this simplified diagram. Black dots represent observations x^B and x^A, while the red dot indicates the inferred expected position of q^A. The solid line joining these (b, c) indicates the inferred action of selection, with dotted lines showing components of this vector (c). The blue circle represents the optimised variance in the position of q^A; the length of its radius, shown as a dashed line, is inversely related to the bottleneck size. In the neutral case, the difference between observations is explained by the bottleneck alone. More complex models of selection fit q^A more closely to x^A and with reduced variance, giving higher inferred values of N^T.
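
The generative structure of panel A can be sketched as a forward simulation. This is a minimal sketch under stated assumptions, not the likelihood used for inference: each step is modelled as one round of multinomial sampling, selection is assumed to act by reweighting haplotype frequencies by exp(S), and within-host growth is collapsed into a single drift step of size N^G. The function names and the sequencing `depth` parameter are hypothetical.

```python
import math
import random

def sample_frequencies(freqs, n, rng, fitness=None):
    """Draw n individuals from a haplotype distribution, optionally
    reweighted by fitness (assumed form), and return the new frequencies."""
    weights = [f * (fitness[i] if fitness else 1.0) for i, f in enumerate(freqs)]
    counts = [0] * len(freqs)
    for i in rng.choices(range(len(freqs)), weights=weights, k=n):
        counts[i] += 1
    return [c / n for c in counts]

def simulate_transmission(q_b, n_t, s_t, n_g, s_g, depth, rng):
    """Hypothetical forward model: q^B -> x^B, q^F, q^A -> x^A."""
    fit_t = [math.exp(s) for s in s_t]   # transmission fitness, assumed exp(S^T)
    fit_g = [math.exp(s) for s in s_g]   # within-host fitness, assumed exp(S^G)
    x_b = sample_frequencies(q_b, depth, rng)         # noisy observation before
    q_f = sample_frequencies(q_b, n_t, rng, fit_t)    # bottleneck of N^T viruses
    q_a = sample_frequencies(q_f, n_g, rng, fit_g)    # growth with drift, size N^G
    x_a = sample_frequencies(q_a, depth, rng)         # noisy observation after
    return x_b, q_f, q_a, x_a

rng = random.Random(1)
x_b, q_f, q_a, x_a = simulate_transmission(
    [0.6, 0.3, 0.1], n_t=1, s_t=[0.0, 0.5, 0.0],
    n_g=1000, s_g=[0.0, 0.0, 0.0], depth=200, rng=rng)
```

With N^T = 1 a single haplotype founds the infection, so q^F, q^A and x^A all collapse onto it; larger N^T lets more of the donor diversity survive.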

Fig 3.

Influence of sequencing noise upon the ability to infer a population bottleneck size from genome sequence data.

Median inferred bottlenecks are shown, calculated on the basis of 200 replicate simulations for each point. In the left-hand plot, a value of 1 indicates a correct bottleneck inference; in the right-hand plot, the absolute inferred bottleneck size is shown. Simulations were conducted under the assumption of selective neutrality, with no attempt to infer selection from the data.
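
The source of the problem explored here can be seen without any inference at all. The sketch below assumes, as a simplification, that sequencing behaves like multinomial sampling of reads at a hypothetical depth; the noise model used for the inferences may differ. Two observations of the same, unchanged population still differ, and the median discrepancy over 200 replicate pairs shrinks as depth grows. Such differences would be read as evidence of a bottleneck by a model that ignores noise.

```python
import random
import statistics

def observe(freqs, depth, rng):
    """Multinomial read sampling: a simplified stand-in for sequencing noise."""
    counts = [0] * len(freqs)
    for i in rng.choices(range(len(freqs)), weights=freqs, k=depth):
        counts[i] += 1
    return [c / depth for c in counts]

def max_abs_difference(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

rng = random.Random(0)
q = [0.5, 0.3, 0.2]  # population identical before and after transmission
medians = {}
for depth in (50, 500, 5000):
    diffs = [max_abs_difference(observe(q, depth, rng), observe(q, depth, rng))
             for _ in range(200)]
    medians[depth] = statistics.median(diffs)
    print(depth, round(medians[depth], 4))
```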

Fig 4.

Inferred bottleneck sizes (N^T) for true bottlenecks N^T = {5, 10, 25, 50, 100}.

Results were generated by applying a neutral inference model to neutral simulated data. Inferences are shown for 200 simulations at each bottleneck size.

Fig 5.

Median inferred bottleneck size from data simulating transmission with a single locus under selection of magnitude σ ∈ {0, 0.5, 0.75, 1.0, 2.0}.

Inferences were made using either a neutral inference model, in which the effect of selection was assumed to be zero, or a model incorporating selection, which allowed the presence of selection to be inferred. Median inferences are shown from 200 simulations for each data point.

Fig 6.

True and false positive rates of selection inference from 200 simulations of transmission events in single- and three-replicate systems in which a single variant was under selection for increased transmissibility, with σ ∈ {0, 0.5, 0.75, 1.0}.

True positives were defined as inferences for which selection was inferred for the selected locus in a system; false positives were defined as inferences for which selection was inferred at any neutral locus or for multiple neutral loci in the system.
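
The rate definitions above amount to simple counting over replicates. In this sketch each replicate's result is assumed to be reduced to the set of loci at which selection was inferred (a hypothetical data layout); a replicate can count as both a true and a false positive if selection is inferred at the selected locus and at a neutral one.

```python
def selection_rates(inferred, selected_locus):
    """inferred: one set of loci per replicate at which selection was inferred.
    Returns (true positive rate, false positive rate)."""
    n = len(inferred)
    tp = sum(1 for loci in inferred if selected_locus in loci)
    fp = sum(1 for loci in inferred
             if any(locus != selected_locus for locus in loci))
    return tp / n, fp / n

reps = [{2}, set(), {2, 5}, {7}]  # hypothetical results from four replicates
print(selection_rates(reps, selected_locus=2))  # (0.5, 0.5)
```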

Fig 7.

Probability distributions of inferred selection coefficients from 200 simulations of transmission events with selective pressures σ ∈ {0.5, 0.75, 1.0, 2.0}.

Distributions were constructed for bottleneck values at which the true positive rate for identifying selected variants exceeded 5%. Smooth kernel distributions were computed using a Gaussian kernel function defined on (0, 10), with the bandwidth chosen by Silverman’s rule of thumb [59, p. 48]. Distributions were scaled such that their integral across the kernel range equalled the true positive rate.
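
The smoothing and scaling steps can be sketched as follows. The bandwidth uses Silverman's rule of thumb in its common form, h = 0.9 min(sd, IQR/1.34) n^(-1/5). Restricting the density to a grid on (0, 10) and rescaling it numerically by the trapezoidal rule is an assumption about how "defined on (0, 10)" was implemented. The sample of selection coefficients is hypothetical.

```python
import math
import statistics

def silverman_bandwidth(data):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:          # fall back to sd if the IQR is degenerate
        spread = sd
    return 0.9 * spread * len(data) ** (-0.2)

def scaled_kde(data, tpr, lo=0.0, hi=10.0, points=1001):
    """Gaussian KDE evaluated on [lo, hi], rescaled so its area equals tpr."""
    h = silverman_bandwidth(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    dens = [norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
            for x in grid]
    step = (hi - lo) / (points - 1)
    area = step * (sum(dens) - 0.5 * (dens[0] + dens[-1]))  # trapezoidal rule
    return grid, [y * tpr / area for y in dens]

data = [0.4, 0.6, 0.7, 0.9, 1.1, 1.3, 0.8]  # hypothetical inferred sigma values
grid, dens = scaled_kde(data, tpr=0.35)
```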

Fig 8.

Median inferred bottleneck size from data simulating neutral transmission with the viral population undergoing either a single- or 22-fold increase in population size during within-host replication.

Inferences were made using our approach (termed ML, for multi-locus method), which allows different growth factors to be specified, and the method of Leonard et al. [12] (termed SL, for single-locus method). Each data point represents the median inferred bottleneck, calculated over 200 replicate simulations.

Fig 9.

Inferred fitness landscape for within-host growth using data from the HA190D225D dataset.

Viral haplotypes whose inferred frequency rose above 1% in at least one animal are shown. Lines connect haplotypes separated by a single mutation.
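
The construction of such a landscape plot reduces to a filter and a Hamming-distance test. This sketch assumes haplotypes are stored as equal-length strings of variant alleles, one frequency dictionary per animal; the data and function names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def landscape_edges(freqs_by_animal, threshold=0.01):
    """Keep haplotypes above threshold in at least one animal, then link
    every pair that differs by exactly one mutation."""
    shown = sorted({h for animal in freqs_by_animal
                    for h, f in animal.items() if f > threshold})
    edges = [(a, b) for a, b in combinations(shown, 2) if hamming(a, b) == 1]
    return shown, edges

animals = [{"AAG": 0.90, "ATG": 0.095, "TTC": 0.005},
           {"AAG": 0.60, "AAC": 0.40}]
nodes, edges = landscape_edges(animals)
print(nodes)  # ['AAC', 'AAG', 'ATG']
print(edges)  # [('AAC', 'AAG'), ('AAG', 'ATG')]
```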

Fig 10.

Histograms of bottleneck inferences for HA190D225D and Mut transmission pairs from 200 analysis seeds.

A replicate inference method, in which a common fitness landscape was imposed, was employed for the Mut transmission pairs. Because the Mut transmission pairs may take different bottleneck values, they are plotted as overlapping histograms. Bottleneck inferences larger than N^T = 35 have been omitted for clarity.