The authors have declared that no competing interests exist.
Conceived and designed the experiments: OR. Performed the experiments: OR. Analyzed the data: OR CF KK. Contributed reagents/materials/analysis tools: GD AM. Wrote the paper: OR CF KK.
A key priority in infectious disease research is to understand the ecological and evolutionary drivers of viral diseases from data on disease incidence as well as viral genetic and antigenic variation. We propose using a simulation-based, Bayesian method known as Approximate Bayesian Computation (ABC) to fit and assess phylodynamic models that simulate pathogen evolution and ecology against summaries of these data. We illustrate the versatility of the method by analyzing two spatial models describing the phylodynamics of interpandemic human influenza virus subtype A(H3N2). The first model captures antigenic drift phenomenologically with continuously waning immunity, and the second, an epochal evolution model, describes the replacement of major, relatively long-lived antigenic clusters. Key phylodynamic parameters can be estimated with ABC by combining features of long-term surveillance data from the Netherlands with features of influenza A(H3N2) hemagglutinin gene sequences sampled in northern Europe. Goodness-of-fit analyses reveal that the irregularity in interannual incidence and H3N2's ladder-like hemagglutinin phylogeny are quantitatively reproduced only under the epochal evolution model within a spatial context. However, the concomitant incidence dynamics result in a very large reproductive number and are not consistent with empirical estimates of H3N2's population-level attack rate. These results demonstrate that the interactions between the evolutionary and ecological processes impose multiple quantitative constraints on the phylodynamic trajectories of influenza A(H3N2), so that sequence and surveillance data can be used synergistically. ABC, one of several data synthesis approaches, can easily interface a broad class of phylodynamic models with various types of data but requires careful calibration of the summaries and tolerance parameters.
The infectious disease dynamics of many viral pathogens like influenza, norovirus and coronavirus are inextricably tied to their evolution. This interaction between evolutionary and ecological processes complicates our ability to understand the infectious disease behavior of rapidly evolving pathogens. Most statistical methods for the analysis of these "phylodynamics" require that the likelihood of the data can be explicitly calculated. Currently, this is not possible for many phylodynamic models, so questions about the interaction between viral variants cannot be well addressed within this framework. Simulation-based statistical methods circumvent likelihood calculations. Considering interpandemic human influenza A virus subtype H3N2, we here illustrate the effectiveness of these methods in fitting and assessing complex phylodynamic models against both sequence and surveillance data. We find that combining molecular genetic and epidemiological data is key to estimating phylodynamic parameters reliably. Moreover, the information in the available data taken together is enough to expose quantitative model inconsistencies.
Many infectious pathogens, most notably RNA viruses, evolve on the same time scale as their ecological dynamics
Historically, epidemiological time series data have been widely used to analyze hypotheses of host-pathogen interactions at the population level
More recently, coalescent-based statistical methods have been used to elucidate the disease dynamics of RNA viruses from molecular genetic data alone
Because of these limitations, we adopt a different statistical approach known as Approximate Bayesian Computation (ABC) to infer the phylodynamics of RNA viruses. ABC allows mechanistic phylodynamic models to be simultaneously fitted against both sequence and surveillance data. This method circumvents explicit likelihood calculations by simulating instead from the stochastic model that defines the likelihood
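To make the idea concrete, the simplest (rejection) form of ABC fits in a few lines of Python. Everything model-specific here is a hypothetical stand-in:

- a toy Normal model replaces the phylodynamic simulator;
- the sample mean replaces the phylodynamic summaries;
- the uniform prior and the tolerance are arbitrary.

The sketch only shows that parameter draws are retained by comparing simulated with observed summaries, never by evaluating a likelihood.

```python
import random


def simulate(theta, n, rng):
    """Hypothetical toy model: n independent Normal(theta, 1) observations."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def mean(xs):
    return sum(xs) / len(xs)


def abc_rejection(observed, prior_sampler, n_draws, tolerance, seed=1):
    """Rejection ABC: keep prior draws whose simulated summary (here the
    sample mean) lies within `tolerance` of the observed summary.
    The likelihood of `observed` is never evaluated."""
    rng = random.Random(seed)
    s_obs = mean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                          # 1. draw from the prior
        s_sim = mean(simulate(theta, len(observed), rng))   # 2. simulate and summarize
        if abs(s_sim - s_obs) <= tolerance:                 # 3. compare summaries only
            accepted.append(theta)
    return accepted


data = simulate(2.0, 50, random.Random(0))
posterior = abc_rejection(data, lambda r: r.uniform(-5.0, 5.0), 10000, 0.1)
```

Shrinking `tolerance` makes the retained draws a closer approximation to the true posterior, at the cost of a lower acceptance rate. This trade-off is why the summaries and tolerances need careful calibration.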
To demonstrate the utility of our approach, we consider the phylodynamics of interpandemic H3N2. We obtained weekly reports of H3N2 incidence in the Netherlands from 1994 to 2009 by combining influenza-like illness (ILI) surveillance data with detailed records of associated, laboratory-confirmed cases of influenza by type and subtype
(A) Weekly ILI time series from the Netherlands, and estimated time series of influenza A(H3N2) from weekly virological data. Type- and subtype-specific time series were estimated under an additive Negative Binomial regression model; see
shorthand  summary  data  distance  summary values and distances  weighting scheme
Netherlands  France  USA  under the SEIRS model  under the epochal evolution model
average  log ratio  0.56%  1.9% (−1.26)  1.4% (−0.97)  Indicator (3)  Indicator (3)
standard deviation in  log ratio  1.68  2.78 (−0.5)  2.24 (−0.28)  Indicator (3)  Indicator (3)
average duration of reported seasonal epidemics at half their peak size  time series 1994–2009  log ratio  3.2  4.54 (−0.32)  5.81 (−0.57)  Indicator (3)  Indicator (3)
Pearson autocorrelation of case report peaks at a lag of 2 & 4 years  largest difference  0.07 & 0  0.06 & −0.27 (−0.27)  −0.06 & 0.23 (0.23)  Exponential (4)  Indicator (3)
largest seasonal population-level attack rate  Ref.  difference  20%  Indicator (3)  Exponential (4)
Exp clock  Lognormal clock
average substitution rate per genome per year  phylogeny 1968–2009  log ratio  Indicator (3)  Indicator (3)
average pairwise diversity between any two sequences sampled in the same season  log ratio  15.4  15.13 (0.01)  Indicator (3)  Indicator (3)
average  phylogeny 1991–2009  log ratio  23.6  26.7 (−0.125)  Indicator (3)  Indicator (3)
largest  log ratio  5.34  5.32  Exponential (4)  Indicator (3)
Distances between summaries derived from the first listed and subsequent data sets are given in brackets.
Weighting schemes differ across models to accommodate weak or strong inconsistencies; see also
The number of dated HA sequences available before
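One of the case report summaries in the table above, the Pearson autocorrelation of case report peaks, is straightforward to compute from a weekly series.

- The sketch assumes that seasonal peaks are the maxima of consecutive 52-week blocks. This is a hypothetical simplification; real seasons would be aligned to the winter.
- It then computes the Pearson autocorrelation of those peaks at a given lag.
- The hypothetical `largest_difference` helper returns the signed difference with the largest magnitude. Taking France minus the Netherlands at lags 2 and 4 years, that is (0.06 − 0.07, −0.27 − 0), appears to reproduce the bracketed France value of −0.27.

```python
import math


def seasonal_peaks(weekly, weeks_per_season=52):
    """Hypothetical helper: largest weekly count in each consecutive block of
    `weeks_per_season` weeks (a trailing partial season is dropped)."""
    return [max(weekly[i:i + weeks_per_season])
            for i in range(0, len(weekly) - weeks_per_season + 1, weeks_per_season)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def peak_autocorrelation(peaks, lag):
    """Correlation between seasonal peaks that are `lag` seasons apart."""
    return pearson(peaks[:-lag], peaks[lag:])


def largest_difference(values, reference):
    """Signed difference (values minus reference) with the largest magnitude."""
    return max((v - r for v, r in zip(values, reference)), key=abs)
```

A strictly biennial series of peaks, such as the oscillations produced by the fitted SEIRS model, has a lag-2 autocorrelation of +1 and a lag-1 autocorrelation of −1.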
To perform phylodynamic inference and goodness-of-fit analyses for complex phylodynamic models, we adopt a simulation-based approach that has become known as Approximate Bayesian Computation (ABC)
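The model fits in the figures are reported as MCMC trajectories. As a generic sketch under stated assumptions, ABC can be embedded in a Metropolis–Hastings sampler by replacing the likelihood ratio with an accept-if-close test on simulated summaries. The toy simulator, prior, step size and tolerance are all hypothetical.

```python
import math
import random


def log_prior(theta):
    """Hypothetical uniform prior on [-5, 5] (log density up to a constant)."""
    return 0.0 if -5.0 <= theta <= 5.0 else -math.inf


def simulate_summary(theta, n, rng):
    """Hypothetical toy simulator: mean of n Normal(theta, 1) draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abc_mcmc(s_obs, n, theta0, n_iter, tolerance, step=0.2, seed=1):
    """Likelihood-free Metropolis-Hastings. With a symmetric random-walk
    proposal, a move is accepted only if (i) it passes a Metropolis test
    on the prior ratio and (ii) its simulated summary lies within
    `tolerance` of s_obs. theta0 must lie inside the prior support."""
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        proposal = rng.gauss(theta, step)
        log_ratio = log_prior(proposal) - log_prior(theta)
        # Cheap prior test first; only then run the (expensive) simulation.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            if abs(simulate_summary(proposal, n, rng) - s_obs) <= tolerance:
                theta = proposal
        chain.append(theta)
    return chain


s_obs = simulate_summary(2.0, 50, random.Random(0))
chain = abc_mcmc(s_obs, 50, theta0=s_obs, n_iter=5000, tolerance=0.1)
```

Starting the chain near the data (here at `s_obs`) matters. With a small tolerance, a chain started far from the posterior may reject almost every proposal.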
ABC methods circumvent computations of the likelihood
Phylodynamic hypotheses are formulated as evolving, dynamical systems models. We used a two-tier model formulation whose genetic component is coupled to its ecological component via the flows through the prevalence class. Existing knowledge on model parameters is incorporated through the prior
It is typically difficult to establish the sufficiency of phylodynamic summaries analytically, and instead a small set of summaries is chosen such that model parameters of interest can be estimated
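The summary table lists log-ratio and largest-difference distances, combined with indicator or exponential weighting schemes (its Eqs. (3) and (4), which are not reproduced in this section). The minimal sketch below assumes generic forms for both kernels with a tolerance `tau`. It shows how per-summary distances could be turned into a single weight for one simulation.

```python
import math


def log_ratio(a, b):
    """Log-ratio distance between two positive summary values."""
    return math.log(a / b)


def indicator_weight(distance, tau):
    """Hypothetical indicator kernel: weight 1 inside the tolerance, else 0."""
    return 1.0 if abs(distance) <= tau else 0.0


def exponential_weight(distance, tau):
    """Hypothetical exponential kernel: weight decays smoothly with |distance|."""
    return math.exp(-abs(distance) / tau)


def combined_weight(distances, kernels, taus):
    """Product of per-summary weights: a simulation scores well only if it
    matches every summary at once."""
    weight = 1.0
    for d, kernel, tau in zip(distances, kernels, taus):
        weight *= kernel(d, tau)
    return weight
```

For example, `log_ratio(23.6, 26.7)` is about −0.123, close to the bracketed −0.125 reported for the 1991–2009 phylogeny summary. An indicator kernel imposes a hard cut-off, whereas the exponential kernel penalizes larger mismatches gradually.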
ABC methods require each phylodynamic simulation to run on the order of tens of seconds. To meet this computational requirement while still allowing for flexible modeling
symbol  description  prior density  mean
posterior density under the SEIRS model  posterior density under the epochal evolution model
Basic reproductive number  uninformative  3.03  18.7
effective reproductive number    1.26  1.42
Average incubation period in days  0.9
Average infectiousness period in days  1.8
Average duration of immunity in years  uninformative  9.8  206
Reporting rate  uninformative  0.15  0.56
Residual selection  Exponential slab with mean 0.007 & Gaussian pseudoprior centered at 0.09  0.1  0.04
Inclusion probability of  uninformative  1  1
Mutation rate,  uninformative  1.32  3.38
Size of sink population  fixed to Dutch demographic data,
Size of source population  uninformative  1.28  2.9
Birth/death rate in the sink population  fixed to Dutch demographic data
Birth/death rate in the source population,  1/50; average lifespan of 60 years adjusted by net fertility rate in South East Asia
Seasonal forcing in the sink population  0.42  0.35
Seasonal forcing in the source population  0.01  0.013
Number of travelers visiting the sink population  8.5  9.9
Fraction of  0.06  0.06
Partial cross-immunity of mother-daughter variants  uninformative    0.76
Scale parameter of the antigenic emergence rate  uninformative    386
Shape parameter of the antigenic emergence rate  2; Ref.
In the first tier (5a–5b), competition between two antigenic variants
In the second tier (5c–5d), the instantaneous loss in
To account for demographic stochasticity, Markov transition probabilities are derived from (5), assuming that the per capita rates are constant over a small time interval
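The sketch below illustrates this kind of discretization with the Euler-multinomial scheme. This scheme is one common way to turn constant per capita rates into transition probabilities over a step `dt`. It is applied to a hypothetical closed, non-seasonal SEIRS model that omits the two-tier structure, demography, migration and antigenic variants. The step size and rates are illustrative.

```python
import math
import random


def binomial(n, p, rng):
    """Binomial(n, p) as a sum of Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def euler_multinomial(n, rates, dt, rng):
    """Numbers of the n members of one compartment that take each competing
    exit during dt, assuming per capita rates are constant over dt: an
    individual leaves with probability 1 - exp(-R*dt), R = sum(rates), and
    takes exit j with probability rates[j] / R."""
    total = sum(rates)
    if n == 0 or total <= 0:
        return [0] * len(rates)
    remaining = binomial(n, 1.0 - math.exp(-total * dt), rng)
    counts, rate_left = [], total
    for r in rates[:-1]:                      # sequential conditional binomials
        p = min(1.0, r / rate_left) if rate_left > 0 else 0.0
        k = binomial(remaining, p, rng)
        counts.append(k)
        remaining -= k
        rate_left -= r
    counts.append(remaining)                  # the last exit takes the rest
    return counts


def seirs_step(state, beta, sigma, gamma, omega, dt, rng):
    """One stochastic step of a hypothetical closed, non-seasonal SEIRS model.
    Returns the new state and the number of new infectious cases."""
    s, e, i, r = state
    n = s + e + i + r
    (infected,) = euler_multinomial(s, [beta * i / n], dt, rng)   # S -> E
    (onset,) = euler_multinomial(e, [sigma], dt, rng)             # E -> I
    (recovered,) = euler_multinomial(i, [gamma], dt, rng)         # I -> R
    (waned,) = euler_multinomial(r, [omega], dt, rng)             # R -> S
    new_state = (s - infected + waned, e + infected - onset,
                 i + onset - recovered, r + recovered - waned)
    return new_state, onset


rng = random.Random(7)
state = (990, 0, 10, 0)
for _ in range(10):  # ten daily steps
    state, cases = seirs_step(state, beta=1.5, sigma=1 / 0.9, gamma=1 / 1.8,
                              omega=1 / 3650, dt=1.0, rng=rng)
```

The E→I and I→R rates loosely follow the 0.9-day incubation and 1.8-day infectious periods listed in the parameter table. `beta` and `omega` are arbitrary. Because every outflow is drawn from the individuals present in its compartment, counts stay non-negative and the total population is conserved.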
For the application to H3N2 phylodynamics, simulations were started in
To interface the two-tier model with observed case report data and phylogenies, we simulated reported incidence under a Poisson model with mean
A frequent problem in phylodynamic modeling is to determine whether a specific model parameter should be included. For example, it can be unclear which types of ecological interactions between antigenic variants underlie pathogen phylodynamics, or whether the residual selection parameter
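The Poisson mean is not spelled out in this section. Purely for illustration, the sketch assumes that the mean is the reporting rate ρ (the "Reporting rate" parameter in the parameter table) times the simulated weekly incidence. The Python standard library has no Poisson sampler, so the sketch uses Knuth's multiplication method and splits large means into chunks.

```python
import math
import random


def poisson(lam, rng, chunk=500.0):
    """Poisson(lam) draw by Knuth's multiplication method. Large means are
    split into chunks (a sum of independent Poissons is Poisson), which
    keeps exp(-chunk) well above floating-point underflow."""
    total = 0
    while lam > 0:
        part = min(lam, chunk)
        lam -= part
        limit, k, p = math.exp(-part), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def simulate_reports(incidence, rho, rng):
    """Hypothetical observation layer: weekly case reports drawn as
    Poisson(rho * incidence) for each simulated weekly incidence."""
    return [poisson(rho * x, rng) for x in incidence]


rng = random.Random(3)
reports = simulate_reports([0, 120, 4000, 35], rho=0.15, rng=rng)
```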
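The parameter table describes the prior for the residual selection parameter as an Exponential slab (mean 0.007) paired with a Gaussian pseudoprior centered at 0.09, together with an inclusion probability. The minimal sketch below implements such an indicator-based prior under two assumptions. The prior inclusion probability is taken as 0.5. The pseudoprior standard deviation is a hypothetical 0.02, since it is not given in this section.

```python
import random

SLAB_MEAN = 0.007      # mean of the Exponential slab (parameter table)
PSEUDO_CENTER = 0.09   # centre of the Gaussian pseudoprior (parameter table)
PSEUDO_SD = 0.02       # hypothetical: the pseudoprior spread is not given here


def draw_selection(rng, p_include=0.5):
    """Joint prior draw of (indicator, theta, effective value).
    indicator = 1: theta comes from the slab and is used by the model.
    indicator = 0: theta comes from the pseudoprior (it only helps the
    sampler move between the two states) and the model sees 0."""
    include = rng.random() < p_include
    if include:
        theta = rng.expovariate(1.0 / SLAB_MEAN)   # expovariate takes a rate
    else:
        theta = rng.gauss(PSEUDO_CENTER, PSEUDO_SD)
    return int(include), theta, (theta if include else 0.0)


def inclusion_probability(indicators):
    """Share of retained draws in which the parameter is switched on."""
    return sum(indicators) / len(indicators)


rng = random.Random(11)
draws = [draw_selection(rng) for _ in range(20000)]
```

In an ABC run, the posterior inclusion probability would then be estimated as the share of accepted draws whose indicator is 1.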
To illustrate ABC methodology with the summaries in
We first tested ABC on simulated data generated under the SEIRS model and found that the subset
The behavior of the spatial SEIRS model, when fitted to the case report and phylogenetic summaries in
(A–C) MCMC trajectories of the estimated
(A–B) Population-level weekly incidence in the sink and source populations, respectively. (C) Corresponding weekly time series of the percentage of susceptible individuals in the sink population. (D) Simulated H3N2 weekly surveillance time series in the sink population (blue) and reconstructed H3N2 time series in the Netherlands (black). (E) Simulated and observed case report seasonal attack rates, and (F) autocorrelation function of case report peaks. Typically, simulations under the fitted model show sustained oscillations that follow a clear biennial pattern. (G) Simulated HA phylogeny under a large estimated residual selection parameter. (H) Simulated and observed lineage profile, and (I) simulated and observed time series of the time to the most recent common ancestor of extant phylogenetic lineages. Despite a relatively high selection parameter, the number of lineages and the time to the most recent common ancestor are overall too high compared with the data. Model parameters are
The extent to which phylodynamic parameters can be estimated depends mainly on the type of information that underlies the ABC summaries. As described more fully in
Moreover, while the sequence divergence and diversity are standard descriptors of viral phylogenies
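As an example of such a descriptor, the table's "average pairwise diversity between any two sequences sampled in the same season" can be sketched as a mean Hamming distance. The sketch rests on three assumptions:

- the nucleotide sequences are already aligned;
- every mismatching site, including gaps or ambiguity codes, counts as a difference;
- per-season means are averaged over seasons. Pooling all same-season pairs instead would be an equally plausible reading.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def within_season_diversity(samples):
    """samples: iterable of (season, aligned sequence) pairs. Computes the
    mean pairwise Hamming distance within each season and averages these
    over seasons with at least two sequences."""
    by_season = {}
    for season, seq in samples:
        by_season.setdefault(season, []).append(seq)
    season_means = []
    for seqs in by_season.values():
        pairs = list(combinations(seqs, 2))
        if pairs:
            season_means.append(sum(hamming(a, b) for a, b in pairs) / len(pairs))
    if not season_means:
        raise ValueError("need a season with at least two sequences")
    return sum(season_means) / len(season_means)
```

Unlike divergence from a root, this quantity reflects how many distinct lineages co-circulate within a season, which is why it is informative about the narrowness of the HA phylogeny.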
The summary errors reveal that the SEIRS model fails to reproduce the irregular interannual variability in winter season epidemics, and the narrowness and limited diversity of the HA phylogeny despite large
While several models have been able to simulate phylodynamics that are consistent with some aspects of the observed data, most notably the ladder-like phylogeny of H3N2's hemagglutinin gene
We generated data under the epochal evolution model and fitted both models with the summaries in
We could fit and assess the epochal evolution model against summaries of H3N2 surveillance and sequence data with ABC (
(A–D) MCMC trajectories as in
(A–I) Subplots are as in
summary  mean
posterior density under the SEIRS model  comments  posterior density under the epochal evolution model  comments
−0.67  encompassing values for all countries  −0.54  encompassing values for all countries
−0.27  encompassing values for all countries  −0.29  encompassing values for all countries
−0.12  explosiveness of Dutch data not matched well  −0.06  encompassing values for all countries
−0.84  inconsistent  −0.15  encompassing values for all countries
0.03  consistent  0.19  inconsistent in conflict with
0.06  consistent  0  consistent
−0.43  inconsistent by a factor  −0.08  consistent
−1.2  inconsistent by a factor  −0.48  inconsistent by a factor
−0.73  inconsistent by a factor  −0.06  consistent
    −0.03  consistent
In the absence of strong seasonal forcing in the source population, infrequent cluster invasions often excite large invasion waves and refractory oscillations
Finally, we identify significant levels of unexplained selection pressures in the HA phylogeny under the epochal evolution model. While the mean posterior residual selection parameter
Fitting mechanistic models to infectious disease dynamics of RNA viruses that may escape immunity is notoriously difficult, and key epidemiological parameters such as
Phylodynamic parameter inference and goodness-of-fit analyses rely critically on the ability to combine epidemiological and molecular genetic data. In particular, H3N2 case report data were not sufficient to disentangle the reporting rate from epidemiological parameters, and measures of sequence divergence and diversity were not sufficient to separate the population size from evolutionary parameters. To the extent that other RNA viruses are characterized by different phylodynamic behavior, different sets of summaries must be identified in each case to replace likelihood calculations.
ABC relates evolutionary and epidemiological data mechanistically through an evolving dynamic system and thereby allows us to investigate empirical phylodynamic hypotheses more directly than is possible with other statistical data synthesis approaches
The reported parameter estimates and summary errors are derived by conditioning only on the phylodynamic summaries and weighting schemes described in
We used ABC to fit mechanistic phylodynamic models of interpandemic influenza A(H3N2) to summaries of surveillance data from the Netherlands and sequence data from Northern Europe. Influenza is a globally circulating virus, and the mechanistic models considered must account for the replenishment of genetic variants from outside Northern Europe in order to reproduce features of influenza's phylogeny. In contrast, semi- or nonparametric models of population dynamics that are used in coalescent methods do not necessarily require this layer of spatial complexity
The two models we analyzed show clear limitations in their ability to replicate features of H3N2 sequence and surveillance data simultaneously, and the ABC error diagnostics give some indication of how these models could be refined (
The SEIRS model could not generate the irregularity in observed incidence data. In comparison, our analysis of the epochal evolution model demonstrates that epochal evolutionary processes can easily excite irregular between-season dynamics that match observed data (see
Several alternative models have been proposed to reproduce H3N2's narrow HA phylogeny. Here, we identified an additional, testable constraint on these models from surveillance data that arises through the phylodynamic interactions in
For the epochal evolution model with source-sink migration dynamics, the average simulated waiting time is
More broadly, both types of data are now increasingly becoming available for RNA viruses
Supplementary online text describing the influenza A (H3N2) sequence and surveillance data used, ABC algorithms and summary statistics, ABC analyses on simulated data and sensitivity analyses.
(PDF)
We thank Steven Riley, Simon Cauchemez, Anton Camacho, David Rasmussen and Neil Ferguson as well as three reviewers for their time and thoughtful comments. Computations were performed at the Imperial College High Performance Computing Service (