Reconstructing long-term dengue virus immunity in French Polynesia

Background
Understanding the underlying risk of infection by dengue virus from surveillance systems is complicated due to the complex nature of the disease. In particular, the probability of becoming severely sick is driven by serotype-specific infection histories as well as age; however, this has rarely been quantified. Island communities that have periodic outbreaks dominated by single serotypes provide an opportunity to disentangle the competing roles of serotype, age and changes in surveillance systems in characterising disease risk.

Methodology
We develop mathematical models to analyse 35 years of dengue surveillance (1979–2014) and seroprevalence studies from French Polynesia. We estimate the annual force of infection, serotype-specific reporting probabilities and changes in surveillance capabilities using the annual age and serotype-specific distribution of dengue.

Principal findings
Eight dengue epidemics occurred between 1979 and 2014, with reporting probabilities for DENV-1 primary infections increasing from 3% to 5%. The reporting probability for DENV-1 secondary infections was 3.6 times that for primary infections. We also observed heterogeneity in reporting probabilities by serotype, with DENV-3 having the highest probability of being detected. Reporting probabilities declined with age after 14 years of age. Between 1979 and 2014, the proportion never infected declined from 70% to 23% while the proportion infected at least twice increased from 4.5% to 45%. By 2014, almost half of the population had acquired heterotypic immunity. The probability of an epidemic increased sharply with the estimated fraction of susceptibles among children.

Conclusion/Significance
By analysing 35 years of dengue data in French Polynesia, we characterised key factors affecting the dissemination profile and reporting of dengue cases in an epidemiological context simplified by mono-serotypic circulation. Our analysis provides key estimates that can inform the study of dengue in more complex settings where the co-circulation of multiple serotypes can greatly complicate inference.


A. Derivations of Eqs. (1,2)
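In this Supporting Information, we derive Eqs. (1,2) of the main text. We first focus on $S(a, t \mid \lambda)$, the never-infected fraction of the population of age $a$ at year $t$. We consider the difference between $S(a+\Delta, t+\Delta \mid \lambda)$ and $S(a, t \mid \lambda)$ for a small increment $\Delta$. Since this difference is, up to its sign, the newly infected fraction of the population of age $a$ at year $t$ during the small increment $\Delta$, it is written as

$$S(a+\Delta, t+\Delta \mid \lambda) - S(a, t \mid \lambda) = -\Delta \sum_{i} \lambda_i(t)\, S(a, t \mid \lambda). \tag{s1}$$

By dividing both sides by $\Delta$ and taking the small-$\Delta$ limit, we obtain

$$\left( \frac{\partial}{\partial a} + \frac{\partial}{\partial t} \right) S(a, t \mid \lambda) = -\sum_{i} \lambda_i(t)\, S(a, t \mid \lambda). \tag{s2}$$

It is straightforward to check that Eq. (1) in the main text is the solution of (s2).

Similarly, the difference between $I_i(a+\Delta, t+\Delta \mid \lambda)$ and $I_i(a, t \mid \lambda)$ is written as

$$I_i(a+\Delta, t+\Delta \mid \lambda) - I_i(a, t \mid \lambda) = \Delta\, \lambda_i(t)\, S(a, t \mid \lambda) - \Delta \sum_{j \neq i} \lambda_j(t)\, I_i(a, t \mid \lambda), \tag{s3}$$

where the first term on the right-hand side is the addition to $I_i(a, t \mid \lambda)$ due to primary infections, while the second term is the removal from $I_i(a, t \mid \lambda)$ due to secondary infections. By dividing both sides by $\Delta$ and taking the small-$\Delta$ limit, we obtain

$$\left( \frac{\partial}{\partial a} + \frac{\partial}{\partial t} \right) I_i(a, t \mid \lambda) = \lambda_i(t)\, S(a, t \mid \lambda) - \sum_{j \neq i} \lambda_j(t)\, I_i(a, t \mid \lambda). \tag{s4}$$

Eq. (2) in the main text is the solution of this equation.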
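The small-increment argument above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that a cohort loses never-infected individuals at the total FOI, that individuals infected once by serotype i are removed by infection with any other serotype, and that the corresponding closed forms are exp(-sum of cumulative FOIs) and (1 - exp(-Λ_i)) times exp(-sum of the other Λ_j). The function `foi` and all of its numerical values are hypothetical and chosen only for illustration.

```python
import math

N_SERO = 4  # DENV-1 to DENV-4


def foi(i, t):
    """Hypothetical force of infection of serotype i at time t (per year)."""
    return 0.02 * (i + 1) * (1.0 + math.sin(0.5 * t + i))


def cumulative_foi(i, t0, t1, n=2000):
    """Trapezoidal approximation of the integral of foi(i, .) over [t0, t1]."""
    h = (t1 - t0) / n
    total = 0.5 * (foi(i, t0) + foi(i, t1))
    for k in range(1, n):
        total += foi(i, t0 + k * h)
    return total * h


def closed_form(a, t):
    """Assumed closed forms: (never infected, [infected once by serotype i])."""
    cum = [cumulative_foi(i, t - a, t) for i in range(N_SERO)]
    total = sum(cum)
    never = math.exp(-total)
    once = [(1.0 - math.exp(-c)) * math.exp(-(total - c)) for c in cum]
    return never, once


def follow_cohort(a, t, n=50000):
    """Apply the small-increment update n times, from birth to age a."""
    dt = a / n
    tau = t - a  # calendar time at birth
    never, once = 1.0, [0.0] * N_SERO
    for _ in range(n):
        lam = [foi(i, tau) for i in range(N_SERO)]
        total = sum(lam)
        new_never = never - dt * total * never
        # Gain from primary infections, loss to secondary infections.
        once = [o + dt * (lam[i] * never - (total - lam[i]) * o)
                for i, o in enumerate(once)]
        never = new_never
        tau += dt
    return never, once


if __name__ == "__main__":
    print(closed_form(10.0, 2014.0))
    print(follow_cohort(10.0, 2014.0))
```

With a small enough increment, the two functions should agree closely, which is the numerical counterpart of checking that the closed forms solve the differential equations.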

B. Visualisation of the inferred parameters
First of all, the 95% CIs used in this article are generated under the Bayesian framework using a negative binomial likelihood, whose fitting parameter is inferred as 0.68 (95% CI 0.61-0.75).
In Fig 3B, the FOI $\lambda_i(t)$ is plotted ($i = 1,2,3,4$ corresponds to DENV-1, DENV-2, DENV-3, DENV-4). In Fig 3C, the average immunity profile of the population is plotted using the fitted parameters. The fraction of the never infected population averaged over age at the time $t$ is estimated as $\int_0^{80} da\, S(a, t)\, \rho(a, t)$, where $\rho(a, t)$ is the normalised age distribution of the population size. The averaged fraction of the population who have been infected once by a serotype $i$ before the time $t$ is estimated as $\int_0^{80} da\, I_i(a, t)\, \rho(a, t)$. Finally, the averaged fraction of the population who have been infected more than once before the time $t$ is estimated as $\int_0^{80} da\, M(a, t)\, \rho(a, t)$.
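The age-averaging described above can be illustrated with a discrete sum over one-year age classes. This is a hedged sketch rather than the authors' implementation: it assumes annual FOI values per serotype, the same closed-form fractions as in a model with independent serotype-specific infection and lifelong homotypic immunity, and zero FOI before the start of the series. The function name `immunity_profile` and the inputs in the usage example are hypothetical.

```python
import math


def immunity_profile(foi_by_year, age_weights):
    """Age-averaged immunity fractions at the last year of `foi_by_year`.

    foi_by_year: list of per-year lists [lambda_1, ..., lambda_K] (annual FOI).
    age_weights: weights for ages 0, 1, 2, ... (normalised inside).
    Years before the start of the series are treated as having zero FOI,
    which is a simplification of this sketch.
    Returns (never infected, [infected once by serotype i], infected more than once).
    """
    n_sero = len(foi_by_year[0])
    w_sum = sum(age_weights)
    never = 0.0
    once = [0.0] * n_sero
    cum = [0.0] * n_sero  # FOI accumulated over the last `age` years
    for age, w in enumerate(age_weights):
        if 1 <= age <= len(foi_by_year):
            cum = [c + lam for c, lam in zip(cum, foi_by_year[-age])]
        total = sum(cum)
        never += w / w_sum * math.exp(-total)
        for i in range(n_sero):
            once[i] += (w / w_sum * (1.0 - math.exp(-cum[i]))
                        * math.exp(-(total - cum[i])))
    # The three groups partition the population.
    multi = 1.0 - never - sum(once)
    return never, once, multi


if __name__ == "__main__":
    # Hypothetical inputs: constant FOI over 36 years, uniform ages 0-80.
    flat_foi = [[0.02] * 4 for _ in range(36)]
    print(immunity_profile(flat_foi, [1.0] * 81))
```

Computing the "more than once" fraction as the complement of the other two keeps the three averaged fractions summing to one by construction.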
In Fig 5A, the FOI is plotted as a function of the fraction of susceptibles. Using the fitted model, this fraction (for circulating serotype $i$) is estimated as an integral over age, ∫ […], where the upper age limit is […] for children (the left panel of Fig 5A) and 80 for the general population (the right panel of Fig 5A).

In Fig 5B, the probability of occurrence of an epidemic (black solid lines) is plotted as a function of the fraction of susceptibles. This probability is estimated using logistic regression as follows. We first define the epidemic period as the period during which the reported number of cases exceeds 300 (see Fig A). We then plot the value 1 against the fraction of susceptibles for the epidemic periods (red filled circles in Fig 5B) and 0 for the non-epidemic periods (green filled circles in Fig 5B). Assuming a linear relationship between the fraction of susceptibles and the log-odds of the occurrence of an epidemic, we determine the linear coefficient by fitting the corresponding Bernoulli sampling model to the 0-or-1 signal.

For the ROC curve in Fig 5C, we first introduce a threshold value for the fraction of susceptibles, above which we expect an epidemic to occur and below which we do not. We then calculate the FPR (the fraction of non-epidemic periods in which an epidemic is expected) and the TPR (the fraction of epidemic periods in which an epidemic is expected) for various values of the threshold. In Fig 5C, we plot filled circles at the position $(x, y)$ = (FPR, TPR), where the colours of the filled circles correspond to the threshold value.

Fig […]. In this figure, plotted quantities are the same as those in Fig 5 but with different age stratifications (1-4, 5-9, 10-14, 15-80).
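The logistic-regression and ROC steps described for Fig 5B and Fig 5C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Bernoulli model is fitted by Newton's method on an intercept and a slope, the function names are hypothetical, and the susceptible fractions and case counts in the usage example are invented purely to exercise the code.

```python
import math


def label_epidemic(reported_cases, cutoff=300):
    """1 for periods whose reported number exceeds the cutoff, else 0."""
    return [1 if c > cutoff else 0 for c in reported_cases]


def fit_logistic(x, y, n_iter=50):
    """Maximum-likelihood fit of log-odds = b0 + b1 * x by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # negative Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # ill-conditioned, e.g. perfectly separated data
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def roc_points(x, y, thresholds):
    """(FPR, TPR) when an epidemic is expected whenever x exceeds the threshold."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    points = []
    for th in thresholds:
        tp = sum(1 for xi, yi in zip(x, y) if xi > th and yi == 1)
        fp = sum(1 for xi, yi in zip(x, y) if xi > th and yi == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Invented example data, not taken from the study.
    susceptible = [0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7, 0.8]
    cases = [10, 50, 120, 900, 80, 450, 200, 700, 1500, 2500]
    epidemic = label_epidemic(cases)
    print(fit_logistic(susceptible, epidemic))
    print(roc_points(susceptible, epidemic, [0.0, 0.475, 1.0]))
```

A positive fitted slope corresponds to an epidemic probability that increases with the fraction of susceptibles. Sweeping the threshold from high to low moves the ROC point from (0, 0) towards (1, 1).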

Fig E. Estimated reporting probabilities for the model in which reporting probabilities change less over time (ver1).
We consider the model in which the reporting probability is constant within each of the periods 1979-1990, 1990-2004 and 2005-2014, and infer the reporting probabilities.