PLoS Computational Biology (PLoS Comput Biol). Public Library of Science, San Francisco, USA. ISSN 1553-734X (print), 1553-7358 (electronic). Manuscript PCOMPBIOL-D-12-00911. doi:10.1371/journal.pcbi.1002876
Research Article
Subject areas: Biology; Evolutionary biology; Population genetics; Effective population size; Population biology; Population dynamics; Disease dynamics; Medicine; Epidemiology; Epidemiological methods; Infectious disease epidemiology; Infectious diseases; Viral diseases; Hepatitis; Hepatitis C; Public Health and Epidemiology; Infectious Diseases; Evolutionary Biology
Integrating Phylodynamics and Epidemiology to Estimate Transmission Diversity in Viral Epidemics
Running title: Variation of Transmissibility in Viral Pathogens
Gkikas Magiorkinis^{1,2,*}, Vana Sypsa^{1}, Emmanouil Magiorkinis^{1}, Dimitrios Paraskevis^{1}, Antigoni Katsoulidou^{1}, Robert Belshaw^{2}, Christophe Fraser^{3}, Oliver George Pybus^{2}, Angelos Hatzakis^{1,*}
1 Department of Hygiene, Epidemiology and Medical Statistics, Medical School, University of Athens, Athens, Greece
2 Department of Zoology, University of Oxford, Oxford, United Kingdom
3 School of Public Health, Imperial College, London, United Kingdom
Editor: Sergei L. Kosakovsky Pond, University of California San Diego, United States of America
* Email: gkikas.magiorkinis@zoo.ox.ac.uk (GM); ahatzak@med.uoa.gr (AH)
The authors have declared that no competing interests exist.
Conceived and designed the experiments: GM AH. Performed the experiments: GM EM DP. Analyzed the data: GM VS CF OGP AH. Contributed reagents/materials/analysis tools: GM VS DP AK RB OGP AH. Wrote the paper: GM VS EM DP AK RB CF OGP AH.
PLoS Comput Biol 9(1): e1002876. Received 6 June 2012; accepted 15 November 2012; published 31 January 2013.
© 2013 Magiorkinis et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The epidemiology of chronic viral infections, such as those caused by Hepatitis C Virus (HCV) and Human Immunodeficiency Virus (HIV), is affected by the risk group structure of the infected population. Risk groups are defined by each of their members having acquired infection through a specific behavior. However, risk group definitions say little about the transmission potential of each infected individual. Variation in the number of secondary infections is extremely difficult to estimate for HCV and HIV but crucial in the design of efficient control interventions. Here we describe a novel method that combines epidemiological and population genetic approaches to estimate the variation in transmissibility of rapidly evolving viral epidemics. We evaluate this method using a nationwide HCV epidemic and for the first time co-estimate viral generation times and superspreading events from a combination of molecular and epidemiological data. We anticipate that this integrated approach will form the basis of powerful tools for describing the transmission dynamics of chronic viral diseases, and for evaluating control strategies directed against them.
Author Summary
Designing strategies that efficiently mitigate an epidemic requires estimates of how many people each carrier is likely to infect, how much this number varies among infections, and how long these transmissions take. The disciplines of epidemiology and population genetics independently provide partial answers to these questions by analysing surveillance data and molecular sequences, respectively. Here we propose a novel integration of the two fields that can reveal the underlying transmission dynamics of rapidly evolving viruses such as HIV or HCV. We explore a well-described nationwide HCV epidemic and show that our method provides new insights into the nature and variation of HCV transmission among infected individuals. We suggest that this approach could form the basis of new tools that can help in the design of effective public health interventions targeting the spread of viral pathogens.
GM is supported by the Wellcome Trust and the European Commission, FP7. RB is supported by the Wellcome Trust. DP has been funded by EPEAEK II and PYTHAGORAS II. CF and OGP are supported by the Royal Society. This research has been co-funded by the European Social Fund and National Resources – EPEAEK II and PYTHAGORAS II and the Hellenic Scientific Society for the Study of AIDS and Sexually Transmitted Diseases. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Introduction
Mathematical epidemiology describes the spread of infectious diseases and aims to aid in the design of effective public health interventions [1]–[3]. Central to this endeavour is the basic reproductive number (R_{0}) of an infectious disease, the mean number of secondary infections per primary infection in a completely susceptible population [4] (for notations see Table 1). Under simple epidemiological scenarios, in which all infected individuals behave identically, R_{0} depends on the transmission probability per contact with a susceptible individual, the duration of infectiousness and the rate at which new contacts are made [2], [4], [5]. However, studies on sexually transmitted and vector-borne infections indicate that infected individuals behave far from identically and that variation in the number of secondary infections per infected individual can play a major role in epidemic dynamics. For example, some researchers have invoked the so-called 20–80 rule to describe the finding that approximately 20% of infected individuals are responsible for 80% of onward transmission [3], [6], [7]. The term ‘superspreaders’ has been coined to describe hosts that contribute disproportionately to onward infection.
Table 1. Abbreviations and terms used throughout the manuscript. doi:10.1371/journal.pcbi.1002876.t001
Symbol | Name | Statistical definition | Units
R_{0} | Basic reproductive number or ratio | Mean number of secondary infections | Number of infections
R_{0,a} | Basic reproductive number or ratio of the transmitter group, assuming a transmitter, non-transmitter model of secondary infections | Mean number of secondary infections | Number of infections
Z | Number of secondary infections per infected individual | Random variable | Number of infections
Z_{a} | Number of secondary infections of the transmitter group, assuming a transmitter, non-transmitter model of secondary infections | Random variable | Number of infections
N | Number of prevalent cases | - | Number of infected people
N_{e} | Effective number of infections | - | Number of infected people
PTP | Phylodynamic transmission parameter | - | Number of infections per year
T | Generation time | Average length of time between primary and secondary infections | Years
γ | Recovery rate from the disease | - | Number of persons per year
μ | Death rate of the population | - | Number of persons per year
SSE | Superspreading events | Minimum expected number of secondary infections from a superspreader | Number of secondary infections
k | Dispersion parameter of the negative binomial distribution | - | -
superspreader | Top 1% of infected individuals when ranked by their attributed secondary infections | - | -
In previous work, variation in the number of secondary infections per infected individual, Z, has been represented by a negative binomial distribution that is described by two parameters, (i) mean R_{0} among infections and (ii) the dispersion parameter k [8], [9]. A small k (<0.1) indicates that a small proportion of infected individuals actively transmit the pathogen, whilst a large k (>4) means that all infected individuals contribute approximately equally to onward transmission [8], [10]. Lloyd-Smith et al. introduced a definition of superspreaders as the top 1% of hosts when ranked by the number of secondary infections they create [8]. Although superspreading events (SSE) (i.e. the minimum number of secondary infections generated by a superspreader) have been estimated for directly transmitted acute infections [8], they have never been described for chronic viral infections. The indolent and subclinical nature of chronic infections makes it difficult to track primary and secondary infections of the multiple strains that concurrently transmit in a given population. The problem is further compounded for HIV and the hepatitis C virus (HCV), which circulate in socially marginalised groups such as injecting drug users (IDUs) and commercial sex workers.
In addition to R_{0} and the variation in onward transmission, another epidemiologically important parameter is the average time between the primary and secondary infections, typically termed the infection generation time (T; several other definitions are used in the literature). A short T indicates rapid transmission, whilst a longer T suggests slower spread but also longer carriage. The duration of carriage of pathogens, which is usually known, represents an upper limit on T and thus it is reasonable to conclude that directly transmitted acute infections have T<1 month whilst chronic infections have T values on the order of months or years.
Here we show how transmission variability and infection generation time can be estimated by combining viral genomic data with surveillance data and mathematical epidemiology.
Results/Discussion
Conceptual modelling framework
The concept of effective population size (Ne) has been used in population genetics for at least 50 years (for a brief review see Text S1) [11], [12]. Ne(t) is generally defined as the size of an idealised population (one without selection or population structure) that experiences the same level of genetic drift as the studied population at time t. Ne(t) is typically lower than N(t), the population's actual size at time t. The ratio N(t)/Ne(t) thus indicates how closely the real population's reproduction matches the assumptions of the idealised model [13], [14]. Under a wide range of scenarios this ratio represents the variation in offspring numbers among individuals [15], [16].
If the population in question is a viral epidemic, then N(t) is the number of infections at time t (or number of prevalent cases) and Ne(t) represents the effective number of infections (i.e. the number of infections of an idealised epidemic that experiences the same level of genetic drift as the studied population). Crucially, if genetic variation among strains has little or no effect on their ability to infect hosts, as appears to be the case for HIV and HCV [11], then the ratio N(t)/Ne(t) is formally equal to var(Z), the variance in the number of secondary infections [17], [18]:
$$\mathrm{var}(Z) = \frac{N(t)}{N_e(t)} \qquad (1)$$
N(t) can be directly observed or estimated from surveillance data using classical epidemiological methods [19]. Ne(t) can be estimated by analysing the pattern of genetic diversity in a sample of the viral population. Specifically, methods based on coalescent theory, such as the skyline plot [11], [20], estimate the product of the coalescent Ne(t) and the generation time T. The value var(Z)/T = N(t)/(Ne(t)T) is therefore inferable from empirical data and we here call it the phylodynamic transmission parameter, PTP. With all these estimates in hand it is possible to estimate var(Z) from equation 1 as follows:
$$\mathrm{var}(Z) = T \cdot \frac{N(t)}{N_e(t)\,T} = T \cdot \mathrm{PTP} \qquad (2)$$
PTP reflects two important features of the intensity of transmission within a population: (i) the variance of secondary infections among infections, and (ii) the time between infections. Equation 2 suggests that an epidemic with a specific PTP is equally well described either by slow and highly variable onward transmission or by fast and more homogeneous onward transmission. This means that by comparing prevalent cases and genetic diversity (as measured by the skyline plot) alone, we cannot directly infer var(Z) and T; more information is required to separate these parameters. In the next two sections we consider practical aspects of inferring these two variables.
Infection generation time
Volz and Frost [21], [22] incorporated mathematical epidemiology in coalescent models, assuming that pathogens spread in the population according to compartmental models of epidemic spread. As theory predicts, they showed that there is no constant transformation from NeT to N because, as susceptible hosts decline in the population, T expands; a constant transformation from NeT to N is observed when the epidemic is in the exponential phase (i.e. T remains constant). Koelle and Rasmussen [23] showed similarly that a constant linear transformation of NeT to N is also observed when the epidemic is in a steady endemic state. Thus, if we compare NeT with N during the exponential phase or the endemic state we can assume that T remains constant.
Distributions of numbers of secondary infections for epidemics with active and inactive transmitters
To describe the variability in onward transmission we require a probability distribution for the random variable Z, the number of secondary infections per infected individual. Previous work has modeled variation in this number with a negative binomial distribution described by two parameters, mean R_{0} and a dispersion parameter k [8], [9]. Chronic viral infections, such as those caused by HIV and HCV, are unlikely to be well described by a single distribution. For these epidemics a significant proportion of transmissions result in inactive infections that transmit the virus no further and thus a mixed distribution is a more realistic representation.
In our study we define a subpopulation of “inactive” infections whose expected number of secondary infections is equal to 0. The rest of the population is defined as “active”. Active infections comprise a proportion u of all infections and their number of secondary infections is assumed to be Poisson distributed with mean R_{0,a}. The distribution of the number of secondary infections Z in the whole population (active and inactive combined) is therefore a zero-inflated Poisson distribution, such that:
$$E(Z) = u\,R_{0,a} \qquad (3)$$
$$\mathrm{var}(Z) = u\,R_{0,a}\left[1 + (1-u)\,R_{0,a}\right] \qquad (4)$$
Equations 3 and 4 can be used to estimate the number of secondary infections of active infections (R_{0,a}) provided that estimates of E(Z), u and var(Z) are available.
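As a check on the moments of this mixture, the following minimal Python sketch builds the zero-inflated Poisson probability mass function numerically and compares its mean and variance with the closed forms E(Z) = u R_{0,a} and var(Z) = u R_{0,a} [1 + (1 − u) R_{0,a}], which are the standard moments of a zero-inflated Poisson distribution. The function names and the truncation point `z_max` are illustrative assumptions, not part of the original analysis; the example values of u and R_{0,a} are those reported for subtype 1a in Table 2.

```python
import math

def zip_pmf(z, u, r0a):
    """P(Z = z) when a proportion u is active (Poisson, mean r0a) and 1 - u is inactive."""
    # Poisson term computed on the log scale to stay stable for large z.
    poisson = math.exp(-r0a + z * math.log(r0a) - math.lgamma(z + 1))
    return (1.0 - u) * (z == 0) + u * poisson

def numeric_moments(u, r0a, z_max=500):
    """Mean and variance obtained by summing the pmf up to z_max (assumed large enough)."""
    mean = sum(z * zip_pmf(z, u, r0a) for z in range(z_max))
    second = sum(z * z * zip_pmf(z, u, r0a) for z in range(z_max))
    return mean, second - mean ** 2

def closed_form_moments(u, r0a):
    """Closed-form mean and variance of the zero-inflated Poisson."""
    return u * r0a, u * r0a * (1.0 + (1.0 - u) * r0a)

u, r0a = 0.26, 13.1  # subtype 1a values from Table 2
print("numeric    :", numeric_moments(u, r0a))
print("closed form:", closed_form_moments(u, r0a))
```

The two printed pairs should agree to within floating-point error, which is a quick way to confirm that the variance contains both a Poisson term and a between-group term.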
Proof of concept: Concurrent nationwide epidemics of HCV
Well-characterised cohorts of HCV infections (of subtypes 1a, 1b, 3a and 4a) have been described in Greek populations [24], [25]. Crucially, for these epidemics we have both surveillance information and concurrent samples of viral genome sequences from the same population. First, we used HCV incidence and prevalence by subtype inferred in previous studies [25]. Next, we used the skyline plot method to estimate the value Ne(t)T for each subtype from the viral genome sequences sampled concurrently from the same populations (see Table S1) [26]–[28].
For both methods we assume that the population corresponds to the set of individuals chronically infected with HCV. The majority of patients with HCV infection develop persistent or chronic infection (60–92%) whilst a minority clears HCV RNA (8–40%); viral clearance is much faster within the first 2 years of infection and slower thereafter (≪1% per year), while increased rates of viral clearance are associated with younger age, female gender, lack of HIV co-infection, chronic HBV infection and genetic variation in IL28B [29]–[42].
HCV phylodynamic analysis
In total, 24, 27, 24 and 22 samples from Greek patients were amplified and sequenced for subtypes 1a, 1b, 3a and 4a, respectively (Table S1). The majority of subtype 1a and 3a infections were associated with injecting drug use, while for subtype 1b and 4a infections the source of infection was usually unknown. These distributions are consistent with previous epidemiological findings [24].
Phylogenetic trees (Figure S1) were estimated using a part of the NS5B region (nt 8297–8597) for which more reference sequences from other locations are available. These revealed that the epidemics of the different subtypes in Greece are not monophyletic and thus arose through multiple introductions.
Since the outbreaks were not monophyletic we can only provide upper limits of the date of introduction of each subtype (i.e. the date of the oldest possible introduction). Analysis using molecular clock coalescent methods (Figure 1, Figure S2) indicates that the 1a, 1b, 3a and 4a epidemics first entered the Greek population around 1965, 1958, 1975 and 1967, respectively (Table S2). It is important to note that the methods developed here depend on the exponential growth phase of each subtype, and not on the date of its most recent common ancestor, as the latter is more sensitive to sampling biases. The most striking difference in epidemic history among the subtypes is the rapid exponential growth of subtype 3a during 1978–1990, whereas the other subtypes appeared to expand more slowly during 1960–1990 (Figure 1).
Figure 1. Plots through time of NeT (estimated from genetic data using the Bayesian skyline plot) versus N (estimated from surveillance data using back calculation). doi:10.1371/journal.pcbi.1002876.g001
The plot of N is drawn by means of locally weighted smoothing (lowess) on the scatter plot of the estimated N. We have truncated the plots after 1990 as we wish to characterise HCV transmission prior to the discovery of the virus in 1989. The vertical axes of the plots through time of NeT and N for each HCV subtype (B) have been scaled between maximum and minimum values.
Epidemic and phylodynamic estimates are correlated
For each HCV subtype, the estimated plots of N_{e}(t)T and N(t) correspond with each other in relative size (Figure 1a), indicating that larger N corresponds to larger N_{e}T. The plots of N_{e}(t)T and N(t) for each subtype are also remarkably similar in shape (Figure 1b), indicating that PTP = N(t)/(Ne(t)T) is relatively constant through time. Subsequently, to estimate the ratio N/(N_{e}T) for each subtype, we assessed the correlation of N_{e}T and N during the period of exponential growth using linear regression (suppressing the constant term, since theory proposes that N is directly proportional to N_{e}). The relationship between N(t) and Ne(t)T is thus modelled as N(t) = a Ne(t)T, such that a is an estimate of the phylodynamic transmission parameter PTP = N/(NeT). Since all these metrics are time-series data we corrected the cross-correlations between NeT and N for autocorrelation by means of the Newey-West method [43]. Specifically, we assessed the autocorrelation structure for each parameter and each subtype and then used the maximum lag between the cross-correlated data to correct statistical significance. Linear regressions of N(t) against Ne(t)T for each HCV subtype are strong and significant (p<0.01; R^{2} = 0.70–0.95). The regression gradients (a) provide estimates of PTP = N/(NeT), which vary from 15.6 to 43.4 for the different HCV subtypes (Table 2, Table S3).
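The regression step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it fits N = a · NeT without a constant term and computes a Newey-West (Bartlett-kernel) standard error for the slope, omitting any small-sample correction. The function name, the choice `max_lag=3` and the input series are all assumptions; the series is synthetic and is not the study data.

```python
import math

def fit_through_origin_hac(x, y, max_lag):
    """Slope of y = a * x (no constant term) with a Newey-West standard error."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    a = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Score contributions: regressor times residual at each time point.
    g = [xi * (yi - a * xi) for xi, yi in zip(x, y)]
    s = sum(gi * gi for gi in g)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)  # Bartlett kernel weight
        s += 2.0 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    return a, math.sqrt(s) / sxx

# Synthetic, illustrative series (NOT the study data): exponential growth
# with a true ratio N / (Ne*T) of 25 and a small deterministic wobble.
ne_t = [10.0 * 1.2 ** t for t in range(20)]
n_prev = [25.0 * v * (1.0 + 0.05 * math.sin(t)) for t, v in enumerate(ne_t)]

ptp_hat, se = fit_through_origin_hac(ne_t, n_prev, max_lag=3)
print(f"PTP estimate = {ptp_hat:.2f}, Newey-West s.e. = {se:.3f}")
```

In practice a statistics package would normally be used for this; for example, statsmodels exposes heteroskedasticity- and autocorrelation-consistent covariance through `OLS(...).fit(cov_type="HAC", cov_kwds={"maxlags": ...})`.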
Table 2. Estimates of transmission parameters for each HCV subtype. doi:10.1371/journal.pcbi.1002876.t002
Column groups in the original table: All; Transmitters; 99^{th} percentile SSE.
Subtype | PTP = N/(NeT) [1] (95% C.I.) | E(Z) = R_{0} (95% C.I.) | T [2] | u [3] | E(Z_{a}) = Var(Z_{a}) = R_{0,a} | Top 1% (overall) [4]
1a | 25.8 (21.2–30.2) | 3.4 (3.3–3.5) | 1.4 | 0.26 | 13.1 | 20
1b | 15.6 (14.6–16.4) | 4.5 (4.2–4.8) | 20.6 | 0.06 | 75 | 83
3a | 43.4 (38.6–48.2) | 11.5 (10.7–12.4) | 3.7 | 0.47 | 24.5 | 35
4a | 27.8 (23.2–31.4) | 2.4 (2.3–2.5) | 0.9 | 0.2 | 12 | 18
[1] The phylodynamic transmission parameter PTP = N/(NeT) has been estimated as the coefficient of the linear regression of N versus NeT without constant term. For the confidence intervals the autocorrelation structure of each variable has been taken into account according to the Newey-West correction.
[2] Generation time estimated as Var(Z)/PTP (maximum estimate assuming that the minimum proportion of transmitters equals the proportion of IDUs in each subtype).
[3] Proportion of transmitters, practically equal to the proportion of IDUs within each subtype.
[4] Upper 1% of the distribution of secondary infections including transmitters and non-transmitters.
The subtype-specific estimates of mean R_{0} during the exponential growth phase of Ne or N were 2.4–11.5 (Table 2, Table S3), assuming an infectivity period of 40 years and a life expectancy of 70 years. These estimates are similar to those reported previously for subtypes 1a and 1b (both global samples) and 4a (sampled from Egypt) [44]. The expansion of subtype 3a is characterised by faster epidemic growth over a shorter timeframe compared to the other subtypes (Figure 1) and this is reflected in the large R_{0} value for that subtype, which suggests an average of >10 secondary infections per primary infection.
Model of secondary infections in the Greek HCV epidemics
Historically, HCV epidemics have taken two distinct forms: older transfusion- and iatrogenic-related transmission, and more recent intravenous drug use-related (IDU-related) outbreaks. The earlier transmission was characterised by slower spread; individuals infected by transfusion or nosocomial transmission are less likely to practice high-risk behaviors and thus often represent transmission chain dead-ends. The more recent IDU-related epidemics are characterised by rapid spread. HCV is hyperendemic in IDUs worldwide with anti-HCV prevalence of 15–90% [45]; IDUs may share syringes, needles and other contaminated equipment and are likely to cause long transmission chains [46], [47]. As explained above, the Z values of HCV epidemics are thus unlikely to be described well by a single distribution; instead we suggest a bimodal distribution model for the number of secondary infections (see Eq. 3–5) that can represent both types of transmission behavior.
We can use Equation 4 to test whether our model is congruent with epidemiological data. Equation 4 predicts that PTP increases with the proportion of “transmitters” in the population of infected individuals (provided that the proportion of transmitters is <50%, which is the case for all the HCV epidemics in this study). Regression of PTP against the percentage of IDU infections for each HCV subtype is strongly significant (Figure 2) whereas the regressions for other risk groups are not (Table S4). This suggests that the estimates of PTP are compatible with the known epidemiology of HCV. However, we note that this regression contains only 4 points and therefore data from more sub-epidemics are required to strengthen this finding.
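The four-point regression mentioned above can be illustrated with a short Python sketch. It uses the PTP point estimates from Table 2 and takes the proportion of IDUs per subtype to be the u column of the same table (an assumption, following the table footnote that u is practically equal to the IDU proportion). The helper name is hypothetical, and this ordinary least-squares fit is only an illustration of the idea, not a reproduction of the analysis behind Figure 2 or Table S4.

```python
import math

# Values taken from Table 2 (u is used here as a stand-in for the IDU proportion).
idu_proportion = {"1a": 0.26, "1b": 0.06, "3a": 0.47, "4a": 0.20}
ptp = {"1a": 25.8, "1b": 15.6, "3a": 43.4, "4a": 27.8}

def simple_regression(x, y):
    """Ordinary least squares with intercept; returns slope, intercept, Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

subtypes = sorted(ptp)
slope, intercept, r = simple_regression(
    [idu_proportion[s] for s in subtypes], [ptp[s] for s in subtypes]
)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, r = {r:.3f}")
```

With only four points, the correlation coefficient is informative about direction and rough strength but carries wide uncertainty, which is the caveat the text already raises.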
Figure 2. Scatter plot of the proportion of IDUs against the phylodynamic transmission potential (= N/NeT) for each subtype. doi:10.1371/journal.pcbi.1002876.g002
Estimation of the generation time (T)
There is no previously available estimate for the generation time (T) of HCV since tracking of secondary infections is very difficult and the date of infection is in most cases unknown. Some workers have suggested approximating T using the duration of infectiousness (1/(γ+μ)) [48], which for HCV is around 25 years (i.e. 1/γ = 40 years and 1/μ = 70 years) (Table S3). If we assume that secondary infections follow a Poisson process within the duration of infectiousness (1/(γ+μ)) (i.e. if we perform a simulation of random secondary infections within 25 years of infectiousness), then the mean time between the primary infection and its secondary infections is similarly high (∼12.5 years) regardless of the average number of secondary infections. Such values are epidemiologically and empirically unrealistic for many HCV epidemics: we know that IDUs usually get infected within 2 years after initiating injection [49].
By combining Equations 2, 3 and 4, and taking into account that E(Z) = R_{0}, we can investigate how T depends on the proportion of transmitters (u) and vice versa (Table 3, Figure 3):
$$T = \frac{R_0}{\mathrm{PTP}}\left[1 + \frac{(1-u)\,R_0}{u}\right] \qquad (5)$$
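The simulation described in this paragraph can be sketched in a few lines of Python. The sketch assumes a homogeneous Poisson process of secondary infections over a fixed 25-year infectious period; the function name, the number of simulated primary infections and the random seed are illustrative assumptions.

```python
import random

def mean_generation_time(mean_secondary, duration=25.0, n_primary=5000, seed=1):
    """Average time from primary infection to each of its secondary infections."""
    rng = random.Random(seed)
    rate = mean_secondary / duration  # secondary infections per year
    times = []
    for _ in range(n_primary):
        # Homogeneous Poisson process: exponential waiting times between events.
        t = rng.expovariate(rate)
        while t < duration:
            times.append(t)
            t += rng.expovariate(rate)
    return sum(times) / len(times)

# R0 values spanning the range reported in Table 2.
for r0 in (2.4, 4.5, 11.5):
    print(f"R0 = {r0:>4}: mean time to secondary infection = "
          f"{mean_generation_time(r0):.2f} years")
```

Because event times in a homogeneous Poisson process are uniformly distributed over the interval, the average settles near half the infectious period whatever the mean number of secondary infections, which is the point made in the text.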
Figure 3. Contour plots showing how generation time (T), basic reproductive number (R_{0}) and the proportion of transmitters in the population (u) covary. doi:10.1371/journal.pcbi.1002876.g003
Gray bands highlight different values of u. The area between the white dashed lines represents R_{0} values estimated by sensitivity analysis of mortality and recovery rate (Table S3). The area between the yellow dashed lines represents the 95% confidence limits of R_{0} values estimated assuming 40 years of infectivity and 70 years of life expectancy. The black dots show the maximum T value for each subtype, which is defined by empirical values for u and the median values of R_{0} (see text).
Table 3. Sensitivity analysis of the transmission parameters (var(Z), u, R_{0,a}) accounting for different generation times (T) using the two-group (transmitter, non-transmitter) model of secondary infections (Eq. 1). doi:10.1371/journal.pcbi.1002876.t003
Subtype | R_{0} | T | var(Z) | u | R_{0,a}
1a | 3.4 | 1 | 25.8 | 0.34 | 9.99
1a | 3.4 | 2 | 51.6 | 0.19 | 17.58
1a | 3.4 | 10 | 258 | 0.04 | 78.28
1a | 3.4 | 25 | 645 | 0.02 | 192.11
1b | 4.5 | 1 | 15.6 | 0.65 | 6.97
1b | 4.5 | 2 | 31.2 | 0.43 | 10.43
1b | 4.5 | 10 | 156 | 0.12 | 38.17
1b | 4.5 | 25 | 390 | 0.05 | 90.17
3a | 11.5 | 1 | 43.4 | 0.81 | 14.27
3a | 11.5 | 2 | 86.8 | 0.64 | 18.05
3a | 11.5 | 10 | 434 | 0.24 | 48.24
3a | 11.5 | 25 | 1085 | 0.11 | 104.85
4a | 2.4 | 1 | 27.8 | 0.18 | 12.98
4a | 2.4 | 2 | 55.6 | 0.1 | 24.57
4a | 2.4 | 10 | 278 | 0.02 | 117.23
4a | 2.4 | 25 | 695 | 0.01 | 290.98
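The entries of Table 3 can be recomputed from R_{0}, PTP and an assumed T with a short Python sketch. It relies on two relationships: var(Z) = PTP · T, and, for a zero-inflated Poisson with E(Z) = u R_{0,a} = R_{0}, the variance-to-mean ratio var(Z)/R_{0} = 1 + R_{0,a} − R_{0}. The function name is hypothetical and the inputs are the point estimates from Table 2; small differences from the printed table would reflect rounding of those inputs.

```python
def transmitter_parameters(r0, ptp, t):
    """Return (var(Z), u, R0a) implied by R0, PTP and an assumed generation time T."""
    var_z = ptp * t                 # var(Z) = PTP * T
    r0a = var_z / r0 - 1.0 + r0     # from var(Z) / R0 = 1 + R0a - R0
    u = r0 / r0a                    # from E(Z) = u * R0a = R0
    return var_z, u, r0a

# Point estimates of (R0, PTP) per subtype, taken from Table 2.
estimates = {"1a": (3.4, 25.8), "1b": (4.5, 15.6), "3a": (11.5, 43.4), "4a": (2.4, 27.8)}

for subtype, (r0, ptp) in estimates.items():
    for t in (1, 2, 10, 25):
        var_z, u, r0a = transmitter_parameters(r0, ptp, t)
        print(f"{subtype}  T={t:>2}  var(Z)={var_z:7.1f}  u={u:.2f}  R0a={r0a:7.2f}")
```

Reading down the output for any one subtype shows the trade-off directly: a longer assumed generation time forces a smaller proportion of transmitters, each with more secondary infections.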
We assume that T is constant, which is reasonable for the exponential phase of the epidemic that we focus on [50]–[53]. Equation (5) shows that T is maximized at the smallest plausible value of u. The known epidemiology of HCV in IDUs suggests that the proportion of the transmitters (u) will not be smaller than the proportion of the IDUs (i.e. every IDU is likely to have transmitted), at least in our subtype 1a, 3a and 4a outbreaks, which are driven by intravenous drug use. Thus an epidemiologically meaningful maximum T value can be obtained by setting u equal to the proportion of IDUs in the population (Figure 3).
Using Greek surveillance data on the proportion of HCV infections of each subtype associated with IDU [24] we estimate that the maximum T (Figure 3, Table 3) for subtype 1a (IDU: 26%) is 1.4 years, for subtype 3a (IDU: 47%) is 3.7 years and for subtype 4a (IDU: 20%) is 0.9 years. For the iatrogenic (non-IDU-driven) epidemic of 1b (IDU: <10%) we estimate a maximum T close to the approximate duration of infectiousness (∼20 years). Note that we treat IDUs as the transmitters even though this epidemic is not IDU-driven, because of their engagement in repeated paid blood donation up to the end of the 1970s [54].
These estimates of T for subtypes 1a, 3a and 4a are more compatible with the natural history of the disease than those based on the duration of infectiousness (∼12.5 years). The probability of secondary infection per contact is expected to be higher during the first year of infection, when viral load is 10 times greater than later in infection [55], [56]. Also, in the first year patients are less likely to have ceased or reduced the high-risk behavior (e.g. IDU) that led them to be infected. Taken together, this suggests that secondary infections are more likely during the first year of infection. For subtype 1b the estimated T is artificially inflated due to its transmission route (see below).
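The maximum generation times quoted above can be recomputed as a rough check. The sketch below assumes the zero-inflated Poisson model with E(Z) = R_{0} = u R_{0,a}, so that var(Z) = R_{0} [1 + (1 − u) R_{0}/u], and then applies T = var(Z)/PTP as in the footnote to Table 2. The function name is hypothetical and the inputs are the rounded point estimates from Table 2, so the results are expected to match the published values only to rounding.

```python
def max_generation_time(r0, ptp, u):
    """Generation time implied by R0, PTP and a proportion of transmitters u."""
    r0a = r0 / u                              # mean among transmitters
    var_z = r0 * (1.0 + (1.0 - u) * r0a)      # zero-inflated Poisson variance
    return var_z / ptp                        # T = var(Z) / PTP

# (R0, PTP, u) per subtype from Table 2; u is set to the IDU proportion.
inputs = {
    "1a": (3.4, 25.8, 0.26),
    "1b": (4.5, 15.6, 0.06),
    "3a": (11.5, 43.4, 0.47),
    "4a": (2.4, 27.8, 0.20),
}

for subtype, (r0, ptp, u) in inputs.items():
    print(f"{subtype}: maximum T = {max_generation_time(r0, ptp, u):.1f} years")
```

Because T falls as u rises, using the IDU proportion as the smallest plausible u makes each printed value an upper bound under the model.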
Analysing the transmission diversity of HCV epidemics
We used equations (3) and (4) to estimate the basic reproductive number of the transmitters (R_{0,a}) and the variability in onward transmission, given the values for u, PTP, R_{0} and T obtained above (Table 2). We estimate that for HCV subtypes 1a, 1b, 3a and 4a the R_{0,a} values ranged from 12 to 75 and the 99^{th} percentile SSE from 18 to 83 secondary infections (Table 2, Figure 4, Figure S4). Compared to directly transmitted pathogens, HCV epidemics generally have large 99^{th} percentile SSE values, at least at the levels of SARS and smallpox. For the outbreaks of subtypes 1a, 1b, 3a and 4a investigated here, we estimate that 80% of the infections are caused by approximately 20%, 5%, 35% and 15% of the most infectious individuals, respectively (Figure 5).
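One way to obtain a 99^{th} percentile SSE from the two-group model is sketched below. It assumes that the SSE threshold is the smallest z for which the cumulative probability of the zero-inflated Poisson distribution (transmitters and non-transmitters combined) reaches 0.99; this reading of the definition, and the function name, are assumptions rather than the authors' documented procedure. Inputs are the u and R_{0,a} values from Table 2.

```python
import math

def zip_quantile(u, r0a, q=0.99):
    """Smallest z with P(Z <= z) >= q under a zero-inflated Poisson."""
    term = math.exp(-r0a)              # Poisson P(X = 0)
    cumulative = (1.0 - u) + u * term  # inactive infections only add mass at Z = 0
    z = 0
    while cumulative < q:
        z += 1
        term *= r0a / z                # Poisson recurrence P(X = z) from P(X = z - 1)
        cumulative += u * term
    return z

# (u, R0a) per subtype, taken from Table 2.
table2 = {"1a": (0.26, 13.1), "1b": (0.06, 75.0), "3a": (0.47, 24.5), "4a": (0.20, 12.0)}

for subtype, (u, r0a) in table2.items():
    print(f"{subtype}: 99th percentile of Z = {zip_quantile(u, r0a)}")
```

The printed thresholds can be compared with the "Top 1% (overall)" column of Table 2; any small discrepancy would reflect rounding of u and R_{0,a}.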
Figure 4. Estimated distributions of the number of secondary infections per primary infection for each HCV subtype. doi:10.1371/journal.pcbi.1002876.g004
Figure 5. Cumulative proportion of onward infection versus the infected population ranked by the number of secondary infections they create. doi:10.1371/journal.pcbi.1002876.g005
20% of onward infections is indicated with a grey horizontal line. The proportion of the population that generates 80% of onward infections is shown by a vertical dashed line. HCV subtype 1a is close to the 80–20 rule (i.e. 80% of the infections are caused by the most infectious 18%).
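The proportion of individuals responsible for 80% of onward infections can be approximated from the same two-group model. The sketch below ranks individuals from the largest number of secondary infections downwards under a zero-inflated Poisson distribution and accumulates their share of all infections until the target is reached, interpolating within the last class. The function name, the truncation point `z_max` and the interpolation rule are assumptions; inputs are the u and R_{0,a} values from Table 2.

```python
import math

def share_of_top_spreaders(u, r0a, target=0.80, z_max=400):
    """Fraction of infected individuals (most infectious first) causing `target` of infections."""
    mean_z = u * r0a
    pmf = {}
    term = math.exp(-r0a)            # Poisson P(X = 0); Z = 0 causes no infections
    for z in range(1, z_max + 1):
        term *= r0a / z
        pmf[z] = u * term            # P(Z = z) for z >= 1
    people, infections = 0.0, 0.0
    for z in sorted(pmf, reverse=True):
        share = z * pmf[z] / mean_z  # share of all infections caused by class z
        if infections + share >= target:
            # Take only the part of this class needed to reach the target.
            return people + pmf[z] * (target - infections) / share
        people += pmf[z]
        infections += share
    return people

table2 = {"1a": (0.26, 13.1), "1b": (0.06, 75.0), "3a": (0.47, 24.5), "4a": (0.20, 12.0)}
for subtype, (u, r0a) in table2.items():
    print(f"{subtype}: {100 * share_of_top_spreaders(u, r0a):.0f}% of individuals "
          f"cause 80% of infections")
```

The result is always smaller than u, since under this model non-transmitters cause no infections at all.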
The subtype 1b epidemic is the oldest and most prevalent in Greece; it is characterised by a small proportion of IDUs (6%) and was spread through the use of contaminated blood and blood products. The very large number of secondary infections for each member of the transmitter population (R_{0,a} = 75), the high degree of superspreading (SSE 99^{th} percentile = 83) and the long generation time (T∼20 years) are compatible with the expected transmission dynamics of blood transfusions in the 1960s and 1970s. Historically, subtype 1b infections in Greece are attributed to the use of imported pooled plasma products, a practice that increased the probability of contaminating dozens of individuals from a single contaminated batch; the plasma products could be stored and distributed over many years, leading to an artificially large “generation time”. Moreover, within Greece, infected IDUs during the 1960s and 1970s practiced repeated paid blood donations as a source of income. The reported dynamics of HCV 1b are typical of older (pre-1990s) HCV epidemics and do not apply to contemporary transmission (except in rare instances when transfusion safety breaks down). Similar trends in blood transfusion as a risk factor for HCV have been documented in many developed countries [46], [57]–[60].
On the other hand, the epidemics of subtypes 1a, 3a and 4a have higher proportions of IDUs (26%, 47% and 20% respectively) [24] and are typical of the modern HCV epidemics in Western societies. For these epidemics the higher proportion of IDUs resulted in almost proportionally higher mean and variance in the number of secondary infections. The dynamics of these epidemics are still operating in the developed world and the estimated transmission parameters can be used to design mitigating strategies.
Limitations of the study
Phylogenetic analysis suggests that the sub-epidemics of HCV in Greece are the result of multiple introductions (i.e. they are non-monophyletic; Figure S1), which implies that estimates of Ne(t)T near the root of each subtype phylogeny may be biased upwards (because lineages fail to coalesce due to population structure). Two arguments suggest this is not a significant issue in our analysis. First, the trajectories of N(t) and Ne(t)T, which were estimated from separate data sources, closely correspond in four independent epidemics (in scale and shape) and N was obtained from epidemiological surveillance data of wholly Greek origin. Second, it is reasonable to assume that coalescent events within the exponential phase (the period during which we compared N(t) and Ne(t)T) did occur within Greece. That is, coalescences close to the root of each phylogeny (which may represent transmission outside Greece) were not used in our analysis. In the worst-case scenario, in which Ne(t)T has been overestimated, our estimate of PTP can be considered a lower bound and the variation in onward transmission might be even greater than reported here.
A second limitation of our study is that our estimate of PTP does not incorporate statistical uncertainty in the estimation of N(t) and Ne(t)T. In the future, we aim to develop a Bayesian approach to incorporate both sources of uncertainty and provide a proper posterior distribution for PTP.
Our approach provides information about superspreading from analytical relationships between the rate of coalescence (Ne), viral generation time (T), and prevalence (N) and thus is independent of phylogenetic topology. It is therefore complementary to alternative approaches that investigate how non-random contact structures affect the topology of a transmission tree [61]. At this point we should emphasize that further exploration and extension of the approach is required. For example, a zero-inflated Poisson distribution of secondary infections does not fit most HIV-1 epidemics. A power-law distribution resulting from sexual-contact analysis would provide a more realistic approximation, for which a detailed analysis of the effect of network structure on PTP needs to be performed. Finally, simulation studies could explore the robustness of the approach under a wider range of epidemiologic scenarios, whilst larger datasets could empirically replicate our findings to support wider applicability of this approach, e.g. to inform public health policies.
Conclusion
We have shown that phylodynamic methods can be combined with epidemiological surveillance data to estimate the variability in ongoing transmission of a chronic viral epidemic, and to investigate its generation time. Both parameters are critical to the design of effective control measures but are very difficult to estimate from surveillance data alone. We tested the framework on a well-characterised set of HCV epidemics in Greece, showing that the results are epidemiologically coherent and suggesting that this approach could be a new tool for public health. We expect our approach to be most readily adapted to other chronic viral diseases such as HIV, but it could also be applied to directly transmitted (e.g. Influenza) or vector-borne (e.g. Dengue) viral epidemics, for which superspreading events and generation times are largely unknown.
Methods
Ethics statement
Study approval was granted by the IRB of Athens University Medical School.
Estimation of chronic HCV incidence and prevalence through time
The overall and genotype-specific incidence of chronic HCV infection has been estimated in previous studies using back-calculation [24], [25]. Briefly, the distribution of transmission risk groups among HCV-infected individuals was obtained from 943 Greek patients enrolled in treatment studies [24], [25]. Enrolment took place between 1995 and 2000; patients were adults (18–70 years old) with a histological diagnosis of chronic hepatitis. Injecting drug use, transfusion, other and sporadic transmissions were reported by 24%, 32%, 6% and 38% of the patients, respectively. The distribution of the dates of infection within each transmission group was determined using data from 456 Greek patients enrolled in treatment studies with known dates of infection. We extended the back-calculation approach to estimate subtype-specific incidence of chronic HCV [25] in Greece as follows: a) we estimated the number of individuals infected with HCV in Greece, b) we obtained the distribution of HCV subtypes by year of onset for each transmission group within the infected population and c) we calculated subtype-specific incidence according to transmission group using the number of new infections in the past for each transmission group and the corresponding distribution of HCV subtypes by year of infection. The estimates for each transmission group were then combined to obtain an estimate of the overall genotype-specific incidence and prevalence during 1940–1990.
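Step c) and the final combination across transmission groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the back-calculation code used in the study: the function name `subtype_incidence`, the dictionary layout and all numbers are hypothetical.

```python
from collections import defaultdict

def subtype_incidence(new_infections, subtype_shares):
    """Combine group-level incidence with subtype shares.

    new_infections: {(group, year): number of new infections}
    subtype_shares: {(group, year): {subtype: proportion}}
    Returns {(subtype, year): incidence summed over transmission groups}.
    """
    totals = defaultdict(float)
    for (group, year), count in new_infections.items():
        for subtype, share in subtype_shares[(group, year)].items():
            totals[(subtype, year)] += count * share
    return dict(totals)

# Made-up illustrative numbers, not data from the study.
new_infections = {("IDU", 1980): 100.0, ("transfusion", 1980): 200.0}
subtype_shares = {
    ("IDU", 1980): {"3a": 0.5, "1a": 0.5},
    ("transfusion", 1980): {"1b": 0.75, "1a": 0.25},
}
incidence = subtype_incidence(new_infections, subtype_shares)
```

Here subtype 1a receives contributions from both transmission groups, which is why the per-group estimates have to be summed before a subtype-level curve is obtained.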
HCV sequence data
Correct sampling is crucial to the inference of epidemic history from genetic data [62]. All available 1a, 1b, 3a and 4a subtype samples from distinct HCV-infected patients, tested within a 12-year period (1994–2006), were sorted according to their sampling dates, and at least one sample was randomly selected and sequenced for every 6-month interval. For cases in which no sample was available in a specific 6-month interval, the closest sample to that period was selected. Besides the sampling date, additional information was recorded for each sample: patient's age, sex, transmission group and treatment history (Table S1). Samples were excluded where the patient had a prior history of antiviral therapy and/or HIV co-infection, since these factors are believed to affect the intra-host evolution of the virus, thus (theoretically) introducing a bias into the estimation of substitution rate [63]. Sequencing of the HCV E2-P7-NS2 and NS5B regions was performed as previously described [26].
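The temporal sampling rule described above can be sketched as follows. This is an illustration under assumptions rather than the procedure actually run: the helper names are hypothetical, intervals are taken to be calendar half-years, exactly one sample is drawn per interval, and "closest" is measured from the interval midpoint.

```python
import random
from datetime import date

def half_year_bin(d):
    """Map a date to (year, half): half is 0 for Jan-Jun and 1 for Jul-Dec."""
    return (d.year, 0 if d.month <= 6 else 1)

def bin_midpoint(year, half):
    """Approximate midpoint of a calendar half-year (an assumption)."""
    return date(year, 4, 1) if half == 0 else date(year, 10, 1)

def select_samples(samples, start_year, end_year, rng):
    """samples: list of (sample_id, sampling_date). Returns {bin: sample_id}."""
    by_bin = {}
    for sample_id, d in sorted(samples, key=lambda s: s[1]):
        by_bin.setdefault(half_year_bin(d), []).append(sample_id)
    chosen = {}
    for year in range(start_year, end_year + 1):
        for half in (0, 1):
            candidates = by_bin.get((year, half))
            if candidates:
                # Random draw among samples dated within the interval.
                chosen[(year, half)] = rng.choice(candidates)
            else:
                # Empty interval: fall back to the sample nearest its midpoint.
                mid = bin_midpoint(year, half)
                nearest = min(samples, key=lambda s: abs((s[1] - mid).days))
                chosen[(year, half)] = nearest[0]
    return chosen

# Made-up sample identifiers and dates for illustration only.
samples = [("A", date(1994, 2, 10)), ("B", date(1994, 3, 5)), ("C", date(1995, 8, 20))]
picks = select_samples(samples, 1994, 1995, random.Random(0))
```

Note that in this sketch the same sample can be returned for several empty intervals; a real protocol would need to decide whether such repeats are collapsed.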
Estimation of basic reproductive number (R_{0})
We estimated R_{0} assuming that the population is large enough to follow a deterministic Susceptible-Infected-Removed (SIR) model [3]:

$$N(t) = N(0)\,e^{(R_{0}-1)(\gamma+\mu)t}$$

where N(t) is the number of infected people at time t (prevalent cases), N(0) is the number of infected people at the baseline of the exponential growth phase, γ is the recovery rate of the disease and μ is the death rate in the general population. This equation is valid for the exponential phase of the epidemic growth. To estimate subtype-specific R_{0} we used the nl routine in STATA to fit the above equation to the estimated N(t) curve during the exponential growth phase, assuming an average life expectancy (1/μ) of 70 years and an average infectivity period (1/γ) of 40 years (i.e. excluding host mortality), which are plausible estimates for the study population (Table S3). Note that if N(t) and Ne(t) are highly correlated (such that N(t)/N(0) is equal to Ne(t)/Ne(0)), then equation 6 shows that we can obtain equivalent estimates of R_{0} from the skyline plot.
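As a rough illustration of this fitting step, the sketch below assumes exponential-phase growth of the form N(t) = N(0)·exp[(R_{0} − 1)(γ + μ)t], the standard early-growth solution of an SIR model with host mortality. It estimates the growth rate by ordinary least squares on ln N(t), which is a simpler substitute for the nonlinear least-squares fit performed with STATA's nl routine, and then converts the rate to R_{0}. Function names and the synthetic series are hypothetical.

```python
import math

def fit_growth_rate(times, prevalence):
    """Least-squares slope of ln N(t) against t (exponential growth rate)."""
    logs = [math.log(n) for n in prevalence]
    count = len(times)
    t_mean = sum(times) / count
    y_mean = sum(logs) / count
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx

def r0_from_growth(rate, life_expectancy=70.0, infectious_period=40.0):
    """Invert rate = (R0 - 1) * (gamma + mu) with mu = 1/70 and gamma = 1/40 per year."""
    mu = 1.0 / life_expectancy
    gamma = 1.0 / infectious_period
    return 1.0 + rate / (gamma + mu)

# Synthetic noise-free series generated with a known R0, for checking only.
true_r0 = 3.0
true_rate = (true_r0 - 1.0) * (1.0 / 40.0 + 1.0 / 70.0)
years = list(range(30))
prevalence = [100.0 * math.exp(true_rate * t) for t in years]
r0_hat = r0_from_growth(fit_growth_rate(years, prevalence))
```

On real back-calculated prevalence the two fitting approaches weight the observations differently, so the log-linear estimate should be read as an approximation to the nonlinear fit rather than a replacement for it.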
Identification of the exponential growth phase
To identify the exponential growth phase of each Greek HCV epidemic, we first defined the end of the exponential phase as 1990, to reflect the introduction of anti-HCV screening after the virus' discovery in 1989. The start of the exponential phase was detected using two methods. First, we visually inspected the epidemic time series and selected the first time point after 6 years of consecutive increases in N or NeT. Second, we employed a previously published algorithm developed for quantitative PCR experiments, in which identification of the exponential phase of a growth curve is crucial [64]. Both methods provided closely similar results (±3 years).
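The first, inspection-based rule can be written down as a simple scan of an annual series. The sketch below is one possible reading of that rule (it returns the year reached after six consecutive year-on-year increases) and is not the authors' procedure, which was carried out visually; the function name and the example series are hypothetical.

```python
def exponential_phase_start(years, values, run_length=6):
    """Return the first year reached after `run_length` consecutive
    year-on-year increases, or None if no such run exists."""
    run = 0
    for i in range(1, len(values)):
        if values[i] > values[i - 1]:
            run += 1
        else:
            run = 0  # a flat or decreasing step resets the run
        if run >= run_length:
            return years[i]
    return None

# Made-up annual series: flat, a dip, then sustained growth.
years = list(range(1940, 1952))
values = [5, 5, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
start = exponential_phase_start(years, values)
```

Applying the same function to both N and NeT and comparing the returned years gives a quick check of whether the two series agree on the onset of exponential growth.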
Supporting Information
Figure S1. Phylogenetic trees (midpoint rooted) of the Greek isolates (blue circles) along with a global sample (all published sequences available as of April 1, 2010) for the NS5B region (nt 8297–8597).
(TIF)
Figure S2. Upper and lower limits of the 95% Highest Posterior Density (HPD) of the skyline plots (NeT) and of the 95% Confidence Intervals (C.I.) of the back-calculated number of prevalent cases (N).
(TIF)
Figure S3. Scatter plots of N against NeT for the exponential growth phase along with the fitted regression line that passes through the origin (i.e. suppressing the constant term). Note that the regression was performed correcting for autocorrelation according to the Newey-West method. We note an apparent deviation from linearity due to stochastic noise independently present in the autocorrelated series. This deviation disappears when only independent data points are included in the plot.
(TIF)
Figure S4. Cumulative distribution of the secondary infections for the Greek HCV epidemics (solid lines) and directly transmitted pathogens (dashed lines) based on estimates provided by Lloyd-Smith et al. [30]. (SSE = Superspreading events)
(TIF)
Table S1. A. Demographic features and experimental efficiency in the sample used for the phylodynamic analysis. B. Demographic features of the patients used for the epidemiological analysis.
(PDF)
Table S2. Estimated parameters of the phylodynamic analysis.
(PDF)
Table S3. Sensitivity analysis for the estimated medians of the Basic Reproductive Numbers (R_{0}).
(PDF)
Table S4. Regression analysis of the percentage of the risk group per genotype with the spread metrics PTP and R_{0} per genotype in the study population: coefficients of determination (Pearson's R^{2}) are shown with the associated level of significance (P value).
(PDF)
Supplementary information.
(DOC)
We would like to thank Aris Katzourakis for reviewing the manuscript and providing useful comments.
References
1. Grassly NC, Fraser C (2008) Mathematical models of infectious disease transmission.
2. Anderson RM, May RM (1992) Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press.
3. Keeling MJ, Rohani P (2008) Modeling Infectious Diseases in Humans and Animals. Princeton, New Jersey: Princeton University Press.
4. Kermack WO, McKendrick AG (1927) A contribution to the mathematical theory of epidemics.
5. Anderson RM, May RM (1979) Population biology of infectious diseases: Part I.
6. May RM, Anderson RM (1987) Transmission dynamics of HIV infection.
7. Woolhouse ME, Dye C, Etard JF, Smith T, Charlwood JD, et al. (1997) Heterogeneities in the transmission of infectious agents: implications for the design of control programs.
8. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM (2005) Superspreading and the effect of individual variation on disease emergence.
9. Lloyd-Smith JO (2007) Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases.
10. Garske T, Rhodes CJ (2008) The effect of superspreading on epidemic outbreak size distributions.
11. Grenfell BT, Pybus OG, Gog JR, Wood JL, Daly JM, et al. (2004) Unifying the epidemiological and evolutionary dynamics of pathogens.
12. Wright S (1938) Size of population and breeding structure in relation to evolution.
13. Hedrick P (2005) Large variance in reproductive success and the Ne/N ratio.
14. O'Dea EB, Wilke CO (2011) Contact heterogeneity and phylodynamics: how contact networks shape parasite evolutionary trees.
15. Kimura M, Crow JF (1963) The measurement of effective population number.
16. Felsenstein J (1971) Inbreeding and variance effective number in populations with overlapping generations.
17. Tavare S, Balding DJ, Griffiths RC, Donnelly P (1997) Inferring coalescence times from DNA sequence data.
18. Kingman JFC (1982) On the genealogy of large populations.
19. Deuffic S, Buffat L, Poynard T, Valleron AJ (1999) Modeling the hepatitis C virus epidemic in France.
20. Strimmer K, Pybus OG (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot.
21. Frost SD, Volz EM (2010) Viral phylodynamics and the search for an ‘effective number of infections’.
22. Volz EM, Kosakovsky Pond SL, Ward MJ, Leigh Brown AJ, Frost SD (2009) Phylodynamics of infectious disease epidemics.
23. Koelle K, Rasmussen DA (2012) Rates of coalescence for common epidemiological models at equilibrium.
24. Katsoulidou A, Sypsa V, Tassopoulos NC, Boletis J, Karafoulidou A, et al. (2006) Molecular epidemiology of hepatitis C virus (HCV) in Greece: temporal trends in HCV genotype-specific incidence and molecular characterization of genotype 4 isolates.
25. Sypsa V, Touloumi G, Tassopoulos NC, Ketikoglou I, Vafiadis I, et al. (2004) Reconstructing and predicting the hepatitis C virus epidemic in Greece: increasing trends of cirrhosis and hepatocellular carcinoma despite the decline in incidence of HCV infection.
26. Magiorkinis G, Magiorkinis E, Paraskevis D, Ho SY, Shapiro B, et al. (2009) The global spread of hepatitis C virus 1a and 1b: a phylodynamic and phylogeographic analysis.
27. Drummond AJ, Ho SY, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating with confidence.
28. Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees.
29. Zhang M, Rosenberg PS, Brown DL, Preiss L, Konkle BA, et al. (2006) Correlates of spontaneous clearance of hepatitis C virus among people with hemophilia.
30. Vogt M, Lang T, Frosner G, Klingler C, Sendl AF, et al. (1999) Prevalence and clinical outcome of hepatitis C infection in children who underwent cardiac surgery before the implementation of blood-donor screening.
31. Tillmann HL, Thompson AJ, Patel K, Wiese M, Tenckhoff H, et al. (2010) A polymorphism near IL28B is associated with spontaneous clearance of acute hepatitis C virus and jaundice.
32. Thomas DL, Thio CL, Martin MP, Qi Y, Ge D, et al. (2009) Genetic variation in IL28B and spontaneous clearance of hepatitis C virus.
33. Thomas DL, Astemborski J, Rai RM, Anania FA, Schaeffer M, et al. (2000) The natural history of hepatitis C virus infection: host, viral, and environmental factors.
34. Seeff LB, Miller RN, Rabkin CS, Buskell-Bales Z, Straley-Eason KD, et al. (2000) 45-year follow-up of hepatitis C virus infection in healthy young adults.
35. Santantonio T, Medda E, Ferrari C, Fabris P, Cariti G, et al. (2006) Risk factors and outcome among a large patient cohort with community-acquired acute hepatitis C in Italy.
36. Kenny-Walsh E (1999) Clinical outcomes after hepatitis C infection from contaminated anti-D immune globulin. Irish Hepatology Research Group.
37. Farci P, Alter HJ, Wong D, Miller RH, Shih JW, et al. (1991) A long-term study of hepatitis C virus replication in non-A, non-B hepatitis.
38. Conry-Cantilena C, VanRaden M, Gibble J, Melpolder J, Shakil AO, et al. (1996) Routes of infection, viremia, and liver disease in blood donors found to have hepatitis C virus infection.
39. Bortolotti F, Verucchi G, Camma C, Cabibbo G, Zancan L, et al. (2008) Long-term course of chronic hepatitis C in children: from viral clearance to end-stage liver disease.
40. Alter MJ, Margolis HS, Krawczynski K, Judson FN, Mares A, et al. (1992) The natural history of community-acquired hepatitis C in the United States. The Sentinel Counties Chronic non-A, non-B Hepatitis Study Team.
41. Alter MJ, Kruszon-Moran D, Nainan OV, McQuillan GM, Gao F, et al. (1999) The prevalence of hepatitis C virus infection in the United States, 1988 through 1994.
42. Alter HJ, Seeff LB (2000) Recovery, persistence, and sequelae in hepatitis C virus infection: a perspective on long-term outcome.
43. Newey WK, West KD (1987) A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.
44. Pybus OG, Charleston MA, Gupta S, Rambaut A, Holmes EC, et al. (2001) The epidemic behavior of the hepatitis C virus.
45. Nelson PK, Mathers BM, Cowie B, Hagan H, Des Jarlais D, et al. (2011) Global epidemiology of hepatitis B and hepatitis C in people who inject drugs: results of systematic reviews.
46. Alter MJ (2011) HCV routes of transmission: what goes around comes around.
47. Alter MJ, Hadler SC, Judson FN, Mares A, Alexander WJ, et al. (1990) Risk factors for acute non-A, non-B hepatitis in the United States and association with hepatitis C virus infection.
48. Wallinga J, Lipsitch M (2007) How generation intervals shape the relationship between growth rates and reproductive numbers.
49. Hagan H, Pouget ER, Des Jarlais DC, Lelutiu-Weinberger C (2008) Meta-regression of hepatitis C virus infection in relation to time since onset of illicit drug injection: the influence of time and place.
50. Raptopoulou-Gigi M, Orphanou E, Lalla TH, Lita A, Garifallos A (2001) Prevalence of hepatitis C virus infection in a cohort of pregnant women in northern Greece and transmission of HCV from mother to child.
51. Sypsa V, Hadjipaschali E, Hatzakis A (2001) Prevalence, risk factors and evaluation of a screening strategy for chronic hepatitis C and B virus infections in healthy company employees.
52. Goritsas C, Plerou I, Agaliotis S, Spinthaki R, Mimidis K, et al. (2000) HCV infection in the general population of a Greek island: prevalence and risk factors.
53. Cornberg M, Razavi HA, Alberti A, Bernasconi E, Buti M, et al. (2011) A systematic review of hepatitis C virus epidemiology in Europe, Canada and Israel.
54. Nelson KE, Vlahov D, Margolick J, Bernal M, Taylor E (1990) Blood and plasma donations among a cohort of intravenous drug users.
55. Page K, Hahn JA, Evans J, Shiboski S, Lum P, et al. (2009) Acute hepatitis C virus infection in young adult injection drug users: a prospective study of incident infection, resolution, and reinfection.
56. Cox AL, Netski DM, Mosbruger T, Sherman SG, Strathdee S, et al. (2005) Prospective evaluation of community-acquired acute-phase hepatitis C virus infection.
57. Armstrong GL, Alter MJ, McQuillan GM, Margolis HS (2000) The past incidence of hepatitis C virus infection: implications for the future burden of chronic liver disease in the United States.
58. Williams IT, Bell BP, Kuhnert W, Alter MJ (2011) Incidence and transmission patterns of acute hepatitis C in the United States, 1982–2006.
59. Alter HJ, Klein HG (2008) The hazards of blood transfusion in historical perspective.
60. Chung H, Ueda T, Kudo M (2010) Changing trends in hepatitis C infection over the past 50 years in Japan.
61. Leventhal GE, Kouyos R, Stadler T, Wyl V, Yerly S, et al. (2012) Inferring epidemic contact structure from phylogenetic trees.
62. Stack JC, Welch JD, Ferrari MJ, Shapiro BU, Grenfell BT (2010) Protocols for sampling viral sequences to study epidemic dynamics.
63. Danta M, Semmo N, Fabris P, Brown D, Pybus OG, et al. (2008) Impact of HIV on Host-Virus Interactions during Early Hepatitis C Virus Infection.
64. Tichopad A, Dilger M, Schwarz G, Pfaffl MW (2003) Standardized determination of real-time PCR efficiency from a single reaction set-up.