
Estimating dengue transmission intensity from serological data: A comparative analysis using mixture and catalytic models

  • Victoria Cox,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    v.cox@imperial.ac.uk

    Affiliation MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, United Kingdom

  • Megan O’Driscoll,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Genetics, University of Cambridge, Cambridge, United Kingdom

  • Natsuko Imai,

    Roles Conceptualization, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliation MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, United Kingdom

  • Ari Prayitno,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliation Department of Child Health, Faculty of Medicine Universitas Indonesia, Jakarta, Indonesia

  • Sri Rezeki Hadinegoro,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliation Department of Child Health, Faculty of Medicine Universitas Indonesia, Jakarta, Indonesia

  • Anne-Frieda Taurel,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Sanofi Pasteur, Singapore

  • Laurent Coudeville,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Sanofi Pasteur, Lyon, France

  • Ilaria Dorigatti

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    Affiliation MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, United Kingdom

Abstract

Background

Dengue virus (DENV) infection is a global health concern of increasing magnitude. To target intervention strategies, accurate estimates of the force of infection (FOI) are necessary. Catalytic models have been widely used to estimate DENV FOI and rely on a binary classification of serostatus as seropositive or seronegative, according to pre-defined antibody thresholds. Previous work has demonstrated the use of thresholds can cause serostatus misclassification and biased estimates. In contrast, mixture models do not rely on thresholds and use the full distribution of antibody titres. To date, there has been limited application of mixture models to estimate DENV FOI.

Methods

We compare the application of mixture models and time-constant and time-varying catalytic models to simulated data and to serological data collected in Vietnam from 2004 to 2009 (N ≥ 2178) and Indonesia in 2014 (N = 3194).

Results

The simulation study showed larger mean FOI estimate bias from the time-constant and time-varying catalytic models (-0.007 (95% Confidence Interval (CI): -0.069, 0.029) and -0.006 (95% CI -0.095, 0.043)) than from the mixture model (0.001 (95% CI -0.036, 0.065)). Coverage of the true FOI was > 95% for estimates from both the time-varying catalytic and mixture models; however, the latter had reduced uncertainty. When applied to real data from Vietnam, the mixture model frequently produced higher FOI and seroprevalence estimates than the catalytic models.

Conclusions

Our results suggest mixture models represent valid, potentially less biased, alternatives to catalytic models, which could be particularly useful when estimating FOI from data with largely overlapping antibody titre distributions.

Author summary

Characterising the transmission intensity of dengue virus is essential to inform the implementation of interventions, such as vector control and vaccination, and to better understand the environmental drivers of transmission locally and globally. It is therefore important to understand how methodological differences and model choice may influence the accuracy of estimates of transmission intensity. Using a simulation study, we assessed the performance of catalytic and mixture models to reconstruct the force of infection (FOI) from simulated antibody titre data. Furthermore, we estimated the FOI of dengue virus from antibody titre data collected in Vietnam and Indonesia. The models produced consistent estimates of FOI when they were applied to data with clear separation between the distributions of seronegative and seropositive antibody titres. We observed greater bias in FOI estimates obtained from catalytic models than from mixture models when they were applied to data with high overlap in the bimodal distribution of antibody titres. Our results indicate that mixture models could be preferable for estimating dengue virus FOI when the antibody titre distributions of the seronegative and seropositive components largely overlap.

Introduction

Dengue fever is caused by infection with one or more of four antigenically distinct serotypes of dengue virus (DENV1-4), a Flavivirus carried by Aedes mosquitoes [1,2]. DENV infects approximately 105 million people each year [3], primarily in tropical and sub-tropical regions. The geographical range of DENV is increasing [1,4,5] and it is expected that the spread of dengue will be influenced by rising global temperatures and increasing urbanisation [1,6]. Intervention measures to date rely essentially on vector control due to the absence of antiviral treatment, challenges in the use of the first licensed dengue vaccine for widespread dengue prevention and control [7], as well as in the use of rapid diagnostic tests for screening [8]. The current and expected future burden of dengue on health-systems is therefore high, demonstrating a continuing need for increased understanding of DENV transmission.

Estimating epidemiological parameters such as the force of infection (FOI, the per capita rate at which a susceptible person is infected) and population seroprevalence (the proportion of people in a population exposed to a virus, as determined by the detection of antibodies in the blood) allows us to gain insights into the subsets of populations most at risk of infection and disease [9], to assess the predicted impact of an intervention strategy [10], and to inform public health policy [11,12].

Both the FOI and seroprevalence can be estimated using mathematical models calibrated to age-stratified serological data measuring IgG antibody levels (also called titres) from blood samples. IgG titres are obtained using Enzyme-Linked Immunosorbent Assays (ELISAs) and are often classified into qualitative, binary test results (seropositive or seronegative) based on the manufacturer’s threshold.

Catalytic models, first proposed in 1934, estimate disease FOI from age-stratified serological or case notification data [13]. In these models, large rates of increase in seroprevalence between individuals who are age a versus age a+1 are explained by high age-specific FOI (assuming the FOI is constant in time) or high time-specific FOI experienced by individuals of all ages during the period a to a+1 years ago [14]. Catalytic models have been used extensively for measles [15], rubella [16], Hepatitis A [17], Chagas disease [18], and DENV [12,14,19–21]. Whilst commonly used, previous work suggests that catalytic models risk generating biased estimates due to data-loss and/or misclassification [22–24]. For example, samples with titres greater than the seronegative threshold but lower than the seropositive threshold are classified as ‘equivocal’ and discarded from the analysis. Furthermore, titre levels of seropositive individuals in a given population may be affected by factors including host response, the degree of exposure to the pathogen and infection timing, which could lead to misclassification.

Mixture models are flexible statistical models that can be applied to continuous data from different clusters or populations, called components. Mixture models can therefore be applied to the absolute antibody titre values in serology datasets, rather than to the counts of titres in each of two classes (seropositive/seronegative) as is necessary for catalytic models [22]. The components’ distributions and their defining parameters (e.g., the mean titre of each component distribution) are inferred from a fitted mixture model which is used to estimate the FOI and population seroprevalence [22,25]. To date, mixture models have been applied to serological data to estimate the seroprevalence of infectious diseases such as parvovirus B19 and rubella in England [26,27], human papillomavirus in the Netherlands [23], measles in Italy [28], and a selection of arboviruses including DENV in Zambia [29]. In addition, mixture models have been used to develop frameworks capable of distinguishing between primary and post-primary DENV infections [30,31], and recent and historical influenza A infections [32]. Recently, DENV FOI was estimated using catalytic and mixture models applied to serological data collected in three locations in Vietnam (N ≥ 266) and in Chennai, India (N = 799) [31]. In this study, the estimates from mixture models were deemed more robust than those from binary catalytic models [31].

Here, we implement a simulation study to assess the ability of mixture and catalytic models to reconstruct the FOI value used for simulating the data. Furthermore, we add to the growing body of evidence exploring the use of mixture models by presenting a comparative analysis of the DENV transmission intensity estimates obtained from mixture and catalytic models applied to age-stratified serological datasets from Vietnam (N ≥ 2178, for years 2004–2009) and Indonesia (N = 3194, for 2014).

Methods

Ethics statement

Ethical approval for the secondary analysis of the age-stratified seroprevalence datasets was granted by the Imperial College Research Ethics Committee (Approval Reference 21IC7066).

Data

Age-stratified seroprevalence data.

DENV IgG data were collected in Long Xuyen, Vietnam, during a prospective epidemiological study that was conducted to assess the suitability of the area for future CYD-TDV vaccine efficacy trials, as described previously [33]. Samples were collected from children under 11 years old in 2004 and then from children under 15 years old during September to February in 2004–2005, 2005–2006, 2006–2007, 2007–2008 and 2008–2009 (Datasets A-1 to A-6, Table 1). The titres were measured using in-house ELISA assays (Arbovirus Laboratory of Pasteur Institute, Ho Chi Minh City).

Table 1. Description of the datasets used in the analyses.

Summary statistics including notation, region, the assay used, the year of testing, the age range of the children participating in the study and the sample sizes.

https://doi.org/10.1371/journal.pntd.0010592.t001

DENV IgG data from 30 urban subdistricts in Jakarta, Indonesia were collected from 3,194 children under 18 years old as part of a cross-sectional seroprevalence survey in 2014 [34] (Dataset B). Given the small spatial scale of the range of data collection, we did not account for spatial differences when modelling. IgG titres were measured using the commercial Panbio Dengue IgG Indirect ELISA kit.

Simulated datasets.

We simulated 540 antibody titre datasets (Dataset C), with the same age-distribution and sample size as the Indonesian seroprevalence survey data (Dataset B). For each simulation the distributions used for sampling seronegative and seropositive log(titres + 1) were selected from a normal, gamma or Weibull distribution. This gave 9 possible distribution pairs for seronegative and seropositive log(titre + 1) values, and we generated an equal number of simulations (N = 60) for each combination. Normal, gamma and Weibull distributions were chosen based on preliminary work on our antibody titre datasets showing that these mixtures were most frequently selected among a wider set of distributions. Parameter values were randomly drawn from uniform distributions with limits as shown in S1 Table. The serostatus of each individual was drawn from a Bernoulli distribution with probability 1 − exp(−λa), where a is the age of the individual and λ is the FOI (which is assumed to be constant with age and time), and therefore λa represents the cumulative FOI experienced by individuals over their lifetime. Log(titre + 1) values for each individual were subsequently randomly drawn from the respective component distributions. The analysis was conducted in the statistical programming language R [35].
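The sampling scheme described above can be sketched in a few lines of R. This is a minimal illustration only: the FOI, the component means and standard deviations, and the uniform age distribution are assumed values chosen for readability, whereas the study drew its parameters from the ranges in S1 Table and matched the age distribution of Dataset B. Only the normal-normal pairing is shown.

```r
set.seed(1)

n      <- 3194                               # sample size of Dataset B
age    <- sample(1:17, n, replace = TRUE)    # assumed uniform ages (illustrative only)
lambda <- 0.1                                # assumed constant FOI (illustrative only)

# Serostatus: Bernoulli draw with probability 1 - exp(-lambda * age)
seropositive <- rbinom(n, size = 1, prob = 1 - exp(-lambda * age))

# log(titre + 1): one normal component per serostatus (assumed means and SDs);
# rgamma() or rweibull() would replace rnorm() for gamma or Weibull components
log_titre <- ifelse(seropositive == 1,
                    rnorm(n, mean = 2.5, sd = 0.4),
                    rnorm(n, mean = 1.0, sd = 0.3))

sim <- data.frame(age, seropositive, log_titre)
```

Because the probability of seropositivity rises with age, the simulated seroprevalence is higher in older age groups, which is the signal that both model families use to recover λ.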

Catalytic model

Data preparation.

Catalytic models rely on data that are binarily classified as seropositive or seronegative. For Datasets A-1 to A-6, a background/control titre (t) was measured for each assay. An individual titre was classified as seronegative if ≤t and seropositive otherwise. For Dataset B, samples with titres ≤ 9 PanBio units were classified as seronegative and ≥11 as seropositive. Titres >9 and <11 were discarded (28 out of 3,194 samples). For simulated Dataset C, titres were classified as seronegative if they were ≤X and seropositive if they were ≥Y. X and Y are thresholds that were optimised using the ‘true’ simulated serostatuses: the optim function in R, using the Nelder-Mead algorithm, was used to calculate the X and Y values per simulated dataset resulting in the fewest titre misclassifications. The optimisation process occasionally failed to estimate realistic classification thresholds (X < 25% quantile of seronegative titres or Y > 75% quantile of seropositive titres) and these simulations were excluded from analysis (N = 31). For the remaining 509 simulations, titres >X and <Y were classified as equivocal and discarded (mean = 2 out of 3,194 samples, median = 1, interquartile range (IQR) = 2, range = 0 to 30). The mean percentage misclassification error rate of titres across the simulations was 6.6% (median = 2.8%, IQR = 9.8%, range = 0% to 41.9%).
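The two-threshold rule used for Dataset B can be written as a short helper. The function name and argument names below are hypothetical; the default thresholds are the 9 and 11 PanBio units quoted above, and equivocal titres are returned as NA so that they can be dropped before fitting.

```r
# Classify titres as seronegative (0), seropositive (1) or equivocal (NA).
# classify_titre(), neg_max and pos_min are illustrative names, not taken
# from the study code.
classify_titre <- function(titre, neg_max = 9, pos_min = 11) {
  status <- rep(NA_integer_, length(titre))
  status[titre <= neg_max] <- 0L
  status[titre >= pos_min] <- 1L
  status
}

titres <- c(2.5, 9, 9.7, 11, 30)
status <- classify_titre(titres)   # 0 0 NA 1 1
n_equivocal <- sum(is.na(status))  # 1 titre would be discarded
```

The discarded equivocal values are the data loss that the mixture model avoids by working with the titres directly.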

Parameter estimation.

We used a catalytic model as previously described [14,36]. The yearly FOI, i.e., the per capita rate of infection experienced by individuals in a given year X−i, where X is the year the serosurvey was conducted, is assumed to be either constant in time (constant across the years) or time-varying (piecewise constant across the years). When we assumed a time-varying FOI, the number of FOI estimates was equal to the number of single year age groups available in the datasets (maximum age group A − minimum age group M). The proportion of seropositive individuals in age group a during year X (πa,X) was estimated as in Eq 1. Here, the yearly FOI (λX−i) is summed over the lifetime of the individuals in age group a to give the cumulative FOI experienced by the individuals in this age group. If the minimum age group M does not equal 1, then we estimated an average FOI for the M years without age-specific data (X−M to X), denoted λX−M.

(1)

When we assume a time-constant FOI over the whole study period, Eq 1 can be expressed as shown in Eq 2:

(2)

A binomial log-likelihood was assumed for the FOI (Eq 3), where Na is the total number of individuals and Pa is the number of seropositive individuals in age group a during year X [19]. The optim function in R was used to find the maximum likelihood estimate of the FOI using Eq 3.

(3)

When we assumed a time-varying FOI, the λX−i values were averaged to produce a mean FOI experienced over the years in the study period (A−M) (Eq 4), which was compared to the FOI estimated by the time-constant catalytic model and the mixture model.

(4)

We estimated 95% Confidence Intervals (CI) using a bootstrap method, where the titre data were sampled with replacement and the age-stratified proportion of seropositive individuals was calculated, 500 times. The 95% CI was given by the 2.5% to 97.5% quantiles of the estimates from the catalytic models applied to the bootstrap samples.
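As a minimal sketch of the time-constant case, the code below maximises a binomial log-likelihood with optim, assuming the expected seroprevalence at age a is 1 − exp(−λa), the same form used to simulate serostatus. The counts are hypothetical and generated with λ = 0.1. The Brent method is used here because the problem is one-dimensional; this is a choice made for the sketch, not a statement about the settings used in the study.

```r
# Negative binomial log-likelihood for a time-constant FOI (lambda).
# dbinom() includes the binomial coefficient, which is constant in lambda
# and therefore does not change the location of the maximum.
neg_loglik <- function(lambda, age, n_pos, n_tot) {
  p <- 1 - exp(-lambda * age)      # expected seroprevalence in each age group
  -sum(dbinom(n_pos, size = n_tot, prob = p, log = TRUE))
}

# Hypothetical age-stratified counts (illustrative only)
set.seed(42)
age   <- 1:17
n_tot <- rep(200, length(age))
n_pos <- rbinom(length(age), size = n_tot, prob = 1 - exp(-0.1 * age))

fit <- optim(par = 0.05, fn = neg_loglik,
             age = age, n_pos = n_pos, n_tot = n_tot,
             method = "Brent", lower = 1e-6, upper = 1)
lambda_hat <- fit$par              # maximum likelihood estimate of the FOI
```

A time-varying version would replace the single λ with a vector of yearly values and the product λa with a cumulative sum over each age group's lifetime.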

Mixture model

Applying the mixture models to the titre distributions.

Mixture models were applied to the bimodal distribution of individual antibody titres as described in Bollaerts et al., 2012 and Hens et al., 2012 [22,25]. All individual antibody titre measurements were used in each dataset, which differs from the data used for the catalytic model where equivocal titres were discarded and titre measurements were classified as either seropositive or seronegative. The mixture model defines the distribution (z) of the log(titres + 1) as a mixture of two distinct distributions: one for susceptible individuals (seronegative, zs) and one for individuals who have been previously infected (seropositive, zI). The two-component mixture model is represented by:

(5)

where fs and fI represent the probability density function of the seronegative and seropositive components, respectively, and where μ and σ represent the mean and standard deviation of each component, and πa,X represents the age-specific seroprevalence during year X, when the serosurvey was conducted.
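Written out from the definitions above, a two-component mixture density of this kind takes the following general form. This is a sketch of the standard formulation, with z denoting log(titre + 1) and the seroprevalence acting as the mixing weight; the exact notation of Eq 5 may differ.

```latex
f(z \mid a, X) \;=\; \bigl(1 - \pi_{a,X}\bigr)\, f_S\!\left(z \mid \mu_S, \sigma_S\right)
\;+\; \pi_{a,X}\, f_I\!\left(z \mid \mu_I, \sigma_I\right)
```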

The mixdist R package was used to fit the mixture models to the titre data by maximum likelihood using an Expectation Maximisation (EM) algorithm [37]. The package was adapted to allow fitting of different distributions for the seronegative and seropositive titre components: normal, gamma and Weibull distributions, giving 9 possible combinations. The best fitting mixture was chosen using the Akaike Information Criterion (AIC). For Dataset C, the estimated means (μs and μI) and standard deviations (σS and σI) for the seronegative and seropositive components of the best mixture were compared against the true parameter values used for simulating the data. We explored multiple parameterisations, including fixing the standard deviation of the two mixture components. For the Vietnamese datasets, we optimised the model having constrained the standard deviation of the seropositive component to multiple different values (for Dataset A-4 we set σI equal to all values from 0.02 to 0.08 in steps of 0.01, for the other five datasets we set σI equal to all values from 0.05 to 0.15 in steps of 0.01). For the Indonesian dataset (Dataset B) the standard deviations of both components were constrained (σS was set equal to 0.10 to 0.15 in steps of 0.001, and σI was set equal to 0.15 to 0.3 in steps of 0.05).
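To illustrate the EM idea, the sketch below fits a two-component normal mixture in base R and reports its AIC. It is not the mixdist implementation and does not reproduce the adapted package: only the normal-normal pairing is handled, the starting values are a crude median split, and the function name is hypothetical.

```r
# EM algorithm for a two-component normal mixture fitted to x = log(titre + 1).
fit_two_normal_mixture <- function(x, max_iter = 1000, tol = 1e-8) {
  # Crude starting values: split the sample at its median
  low   <- x[x <= median(x)]
  high  <- x[x >  median(x)]
  p_pos <- 0.5
  mu    <- c(mean(low), mean(high))
  sigma <- c(sd(low), sd(high))

  # Mixture log-likelihood at the current parameter values
  mix_loglik <- function() {
    sum(log((1 - p_pos) * dnorm(x, mu[1], sigma[1]) +
              p_pos * dnorm(x, mu[2], sigma[2])))
  }

  loglik_old <- mix_loglik()
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each titre is seropositive
    d_neg <- (1 - p_pos) * dnorm(x, mu[1], sigma[1])
    d_pos <- p_pos * dnorm(x, mu[2], sigma[2])
    w     <- d_pos / (d_neg + d_pos)

    # M-step: weighted updates of the mixing proportion, means and SDs
    p_pos <- mean(w)
    mu    <- c(weighted.mean(x, 1 - w), weighted.mean(x, w))
    sigma <- c(sqrt(weighted.mean((x - mu[1])^2, 1 - w)),
               sqrt(weighted.mean((x - mu[2])^2, w)))

    loglik <- mix_loglik()
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }

  # Five free parameters: mixing proportion, two means, two SDs
  list(p_pos = p_pos, mu = mu, sigma = sigma,
       loglik = loglik, aic = 2 * 5 - 2 * loglik)
}

# Hypothetical, well-separated titres (illustrative only)
set.seed(123)
x   <- c(rnorm(700, mean = 1.0, sd = 0.3), rnorm(300, mean = 2.5, sd = 0.4))
fit <- fit_two_normal_mixture(x)
```

Repeating the fit for each candidate pair of component families and keeping the lowest AIC would mirror the model selection step described above.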

Parameter estimation.

The relationship between the age-dependent mean log(titre + 1) (μa,X), the age-specific seroprevalence (πa,X) and the means of the mixture components (μs and μI) is described in Eq 6. We estimated μa,X by least-squares regression with a monotonically increasing P-spline [22,25,38], fitted using the mpspline.fit function from the serostat R package [39]. Equally spaced cubic polynomial segments (degree = 3) made up the spline. The optimal smoothing parameter (α) and number of segments (knots) were determined using the Bayesian Information Criterion, having explored combinations of α values (set equal to 0.001, 0.01, 0.1, 0.5, 1, 5, 10, 50, 100, 500) and knots (set equal to values in the sequence: 5 to the maximum number of x-axis age categories, step size = 1). The seroprevalence was calculated using Eq 7.

(6)(7)
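For a two-component mixture, the mean of the pooled titres at a given age is the seroprevalence-weighted average of the two component means, and rearranging that identity gives the seroprevalence. The block below writes out this standard relationship as a sketch of what Eqs 6 and 7 are likely to express; the notation is inferred from the surrounding text.

```latex
\begin{align}
\mu_{a,X} &= \bigl(1 - \pi_{a,X}\bigr)\,\mu_S + \pi_{a,X}\,\mu_I \\
\pi_{a,X} &= \frac{\mu_{a,X} - \mu_S}{\mu_I - \mu_S}
\end{align}
```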

The time-varying FOI was derived from the age-specific seroprevalence as described in Eq 8 [22], where the rate of change in the seroprevalence between two sequential age groups (a−1 and a) is divided by the proportion of seronegative individuals in age group a, to give the FOI experienced in the year X−a (λX−a). Eq 8 can in turn be expressed as a function of the underlying antibody titre distribution as shown in Eq 9, where μ′a,X represents the derivative of the age-specific mean log(titre + 1) with respect to age. The μ′a,X terms were calculated by taking the gradient of the fitted μa,X spline at each age group a. The time-varying FOI can be averaged across the years in the study period to give the total FOI λ (Eq 4).

(8)(9)
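The calculation described above can be sketched numerically. The code assumes that the FOI is the rate of change of seroprevalence divided by the seronegative proportion, π′ / (1 − π), and that seroprevalence is (μa − μS) / (μI − μS); substituting one into the other gives μ′a / (μI − μa). The component means and the age profile are hypothetical, and an unconstrained interpolating spline from splinefun stands in for the monotone P-spline used in the study.

```r
# Hypothetical component means (illustrative only)
mu_s <- 1.0
mu_i <- 2.5

# Hypothetical age-specific mean log(titre + 1), generated under FOI = 0.1
age  <- 1:17
mu_a <- mu_s + (mu_i - mu_s) * (1 - exp(-0.1 * age))

# Seroprevalence recovered from the mean titre (assumed form of Eq 7)
pi_a <- (mu_a - mu_s) / (mu_i - mu_s)

# Gradient of the fitted curve at each age; splinefun() is a stand-in
# for the monotone P-spline fit
mu_curve <- splinefun(age, mu_a)
dmu_a    <- mu_curve(age, deriv = 1)

# FOI from the titre distribution (assumed form of Eq 9)
foi <- dmu_a / (mu_i - mu_a)

# Averaging the yearly values gives a single summary FOI
foi_mean <- mean(foi)
```

Because the denominator shrinks as μa approaches μI, small errors in the fitted curve at older ages translate into larger swings in the yearly FOI than in the averaged value.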

The 95% CI around the FOI and seroprevalence estimates were calculated using a bootstrap method, where the titre data were sampled with replacement 5000 times. The 95% CI were given by the 2.5% to 97.5% quantiles of the estimates from the bootstrap samples.
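A percentile bootstrap of this kind can be sketched generically. The helper below resamples individuals with replacement, re-applies an estimator and takes the 2.5% and 97.5% quantiles. The function names and data are hypothetical, a simple time-constant FOI estimator stands in for the full model fit, and 500 replicates are used to keep the example fast.

```r
# Percentile bootstrap CI for any estimator that maps a data frame to a number.
bootstrap_ci <- function(data, estimator, n_boot = 500) {
  est <- replicate(n_boot, {
    idx <- sample(nrow(data), replace = TRUE)  # resample individuals
    estimator(data[idx, , drop = FALSE])
  })
  quantile(est, probs = c(0.025, 0.975))
}

# Stand-in estimator: maximum likelihood time-constant FOI from serostatus
foi_constant <- function(d) {
  nll <- function(lambda) {
    -sum(dbinom(d$seropositive, size = 1,
                prob = 1 - exp(-lambda * d$age), log = TRUE))
  }
  optimize(nll, interval = c(1e-6, 1))$minimum
}

# Hypothetical individual-level data generated with FOI = 0.1
set.seed(7)
dat <- data.frame(age = sample(1:17, 1000, replace = TRUE))
dat$seropositive <- rbinom(nrow(dat), size = 1, prob = 1 - exp(-0.1 * dat$age))

ci <- bootstrap_ci(dat, foi_constant)
```

Swapping in an estimator that refits the mixture model and spline to each resample would give the interval for the mixture-based FOI.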

Bias in the mixture and catalytic model estimates of FOI and seroprevalence for Dataset C was calculated as the estimated value minus the true simulated value of the parameter. Uncertainty was calculated as the width of the 95% CIs around the parameter estimates. Coverage was calculated as the percentage of simulations where the estimated 95% CIs contained the true parameter value. Code for the simulation study analysis is available at: https://github.com/Tori-Cox/Mixture-catalytic-models.
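These three performance measures are simple to compute once each simulation has a point estimate and a 95% CI. The values below are hypothetical and cover only five simulations, purely to show the arithmetic.

```r
# Hypothetical true values, estimates and 95% CI limits for five simulations
truth <- c(0.10, 0.05, 0.20, 0.15, 0.08)
est   <- c(0.11, 0.05, 0.17, 0.15, 0.09)
lower <- c(0.09, 0.04, 0.15, 0.12, 0.085)
upper <- c(0.13, 0.07, 0.19, 0.18, 0.10)

bias        <- est - truth       # estimated value minus true value
uncertainty <- upper - lower     # width of the 95% CI
covered     <- lower <= truth & truth <= upper
coverage    <- 100 * mean(covered)  # 3 of 5 intervals contain the truth: 60%

mean_bias <- mean(bias)
```

In this toy example the third interval lies entirely below the true value and the fifth lies entirely above it, so both count against coverage.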

Results

Simulated data

The mixture model identified the correct distributions used to simulate both seropositive and seronegative titres in 76.1% (411/540) of simulations, one of the two distributions in 22.2% (120/540) of simulations and did not correctly identify either distribution in 1.7% (9/540) of the simulations. Whether the distributions were gamma, normal or Weibull did not influence the ability of the mixture model to correctly identify them (S2 Table). The estimated 95% CIs contained the true parameter values used to simulate the data in 88.1% (476/540), 86.9% (469/540), 86.5% (467/540) and 89.4% (483/540) of simulations for μs, μI, σS and σI, respectively (S1A Fig). Simulations where the seronegative titre distribution was Weibull distributed were over-represented in the simulations which produced outlying estimates of μs, μI, σS and σI (S1B Fig).

The mixture model coverage for the FOI was 95% of the total simulations (513/540) and 95% (485/509) of the simulations included in the catalytic model analysis, and for the seroprevalence was 88% (475/540) and 89% (451/509) respectively. The time-varying catalytic model coverage for the FOI was 96.7% (492/509) and for the seroprevalence was 78.8% (401/509). The time-constant catalytic model coverage for the FOI and seroprevalence was 38.9% (198/509) and 55.0% (280/509). It should be noted that the time-varying catalytic model produced wider CIs compared to when assuming a time-constant FOI or when using a mixture model (Fig 1). Average bias in the FOI estimates (0.001 (95% CI -0.036, 0.065), -0.007 (95% CI -0.069, 0.029) and -0.007 (95% CI -0.095, 0.043) for the mixture, time-constant and time-varying catalytic models, respectively) and the seroprevalence estimates (-0.003 (95% CI -0.144, 0.108), -0.007 (95% CI -0.244, 0.087) and -0.005 (95% CI -0.241, 0.100)) was smaller for the mixture model estimates (Fig 1). The increased negative bias in the catalytic model estimates compared to the mixture model estimates demonstrates that the catalytic models are more prone to underestimation of FOI and seroprevalence (Figs 1 and 2). High antibody titre misclassification error rates were positively associated with increased bias in the parameter estimates from the catalytic models (S3 Fig). As expected, model performance was improved when we fitted the catalytic models to the simulated ‘true serostatus’ (i.e., without classifying the titres using optimised thresholds): the coverage of the FOI was 99% (536/540) (95% CI: 98%, 100%) and 42% (228/540) (95% CI: 38%, 47%) for the time-varying and time-constant FOI catalytic models respectively, and the coverage of the seroprevalence was 100% for both models. The average bias in the FOI estimates was 0.007 (95% CI -0.020, 0.056) and -0.005 (95% CI -0.016, -0.477), and in the seroprevalence estimates was 0.000 (95% CI -0.001, 0.001) and 0.002 (95% CI -0.001, 0.001) for the time-varying and time-constant FOI catalytic models respectively.

Fig 1. Bias, coverage, and degree of uncertainty for seroprevalence and force of infection (FOI) estimates using catalytic and mixture models fitted to simulated datasets (Dataset C).

Bias is calculated as the estimated parameter value minus the true parameter value. Uncertainty is the width of the 95% Confidence intervals (CIs) around the central estimates, calculated using the bootstrap method. The coverage is the percentage of simulations where the estimated CIs contained the true values. The dashed line at 95% shows the threshold for the ideal coverage. For the bias and the uncertainty, the mean and 95% CI across the 509 simulations are given. For the coverage, the 95% exact binomial CI are given.

https://doi.org/10.1371/journal.pntd.0010592.g001

Fig 2. True versus estimated seroprevalence and force of infection (FOI) values from the mixture and catalytic models fitted to the simulated datasets (Dataset C).

The catalytic model was run under the assumption that the FOI was time-constant or time-varying. The 95% Confidence Intervals for the estimated values were calculated using a bootstrap method and are shown here as error bars; the point denotes the central estimate. The Pearson’s correlation coefficients (R) are shown. The dashed line represents the line y = x and shows where points would be located in a scenario with zero bias in the estimated values.

https://doi.org/10.1371/journal.pntd.0010592.g002

Long Xuyen, Vietnam data

When we applied the mixture model (Fig 3) to the data from Long Xuyen, Vietnam, the total population-level seroprevalence estimates ranged from 0.163 (95% CI 0.138–0.188) in 2004 to 0.376 (95% CI 0.249–0.403) in 2005–2006. The seroprevalence estimates from the time-constant and time-varying catalytic models were consistent with each other, with the latter ranging from 0.189 (95% CI 0.163–0.217) in 2006–2007 to 0.299 (95% CI 0.262–0.337) in 2008–2009. The seroprevalence estimates from all three models were consistent (as determined by the 95% CIs) for 4 out of 6 datasets (Fig 4, S3 Table). The general trend in the age-specific seroprevalence estimates, specifically for Datasets A-2:A-5, differed significantly between the mixture model and the catalytic models, with the mixture model estimating higher seroprevalence at the older ages (Fig 5).

Fig 3. Mixture model fitted to the Vietnamese (A1:A6) and Indonesian (B) datasets.

The distribution of log(titre+1) is shown in dark grey, the fitted mixture model is shown in blue, and the red dashed lines represent the mean antibody titre of each component of the fitted mixture model (μs and μI for the seronegative and seropositive components respectively). Note that the y-axis limits differ for each panel.

https://doi.org/10.1371/journal.pntd.0010592.g003

Fig 4. Force of infection (FOI) and total population level seroprevalence (SP) estimates from the mixture model and the catalytic models fitted to the observed data.

The catalytic model was run under the time-constant and time-varying FOI assumption. The 95% Confidence Intervals (CI) which were calculated by bootstrapping for all models are given as error bars. Note that the y-axis limits differ for each panel.

https://doi.org/10.1371/journal.pntd.0010592.g004

Fig 5. Age-specific seroprevalence estimates for the IgG data from Vietnam (Dataset A1:A6) and from Indonesia (Dataset B).

Mixture model estimates are in orange, catalytic model estimates are in green and blue when applied under the assumption that the FOI is time-constant or time-varying respectively. Shading represents the 95% Confidence Intervals (CI). The grey points show the observed seroprevalence per age group calculated from the binarily classified IgG data (seropositive individuals / tested individuals), with error bars indicating the 95% exact binomial CIs. The seroprevalence data and model estimates are overlaid for the purpose of comparison. However, it is important to note that the mixture model was not fitted to the data shown as grey points, as it does not depend on the titre classification. The size of the grey data points represents the number of individuals tested in each age group.

https://doi.org/10.1371/journal.pntd.0010592.g005

The average FOI estimated by the mixture model ranged from 0.026 (95% CI 0.019–0.033) for the period 1993 to 2004, to 0.099 (95% CI 0.077–0.124) for 1990 to 2005. The average FOI estimated by the time-varying catalytic model ranged from 0.024 (95% CI 0.007–0.058) for the period 1991 to 2007, to 0.050 (95% CI 0.001–0.118) for 1990 to 2005. The FOI estimates from the mixture model versus the time-varying catalytic model were consistent for 6 out of 6 datasets, and versus the time-constant catalytic model they were consistent for 3 out of 6 datasets (Fig 4, S3 Table). There is a higher degree of uncertainty around the estimates from the catalytic model when assuming a time-varying FOI compared to the time-constant FOI assumption (Fig 4). We observe greater differences in the estimates from each model when comparing the year-specific FOI as opposed to the averaged total FOI (S4 Fig).

Indonesian data

The mixture and catalytic models fitted to the Indonesian data produced consistent FOI, total population seroprevalence and age-specific seroprevalence estimates. The FOI for the period 1996 to 2014 was estimated at 0.154 (95% CI: 0.106–0.213), 0.143 (95% CI 0.136–0.150) and 0.164 (95% CI 0.022–0.814), and the seroprevalence in 2014 was estimated at 0.718 (95% CI 0.694–0.741), 0.700 (95% CI 0.686–0.714) and 0.700 (95% CI 0.655–0.743) by the mixture model and the time-constant and time-varying catalytic models, respectively (Fig 4, S3 Table).

Discussion

In this analysis, we explored the accuracy and bias of FOI and seroprevalence estimates obtained from mixture and catalytic models applied to serological data. The catalytic models were applied assuming a time-constant or time-varying FOI. We performed a simulation study to compare the performance of each model with known parameter values used to generate the simulated data, and we observed significantly greater accuracy in FOI and seroprevalence estimates from the mixture and time-varying catalytic models than time-constant catalytic models. We observed reduced bias and uncertainty in estimates from the mixture compared to the time-varying catalytic model.

In our simulation study, larger bias in the catalytic model estimates of FOI and seroprevalence (Figs 1 and 2) was associated with increased serostatus misclassification (S3 Fig). Serostatus misclassification occurred more often in simulations where the difference between the mean log(titre + 1) for the susceptible/seronegative component and the mean log(titre + 1) for the infected/seropositive component was lower (S2 Fig), indicating greater overlap between the distributions of the two components. Our results are consistent with previous work which showed greater bias in seroprevalence estimates using methods which employ cut-off thresholds to classify simulated antibody data as opposed to mixture models, when there was high overlap in the underlying components [24].

Differences in the degree of overlap between components in real serological datasets are likely impacted by many factors, including differences in the ELISA tests used to measure antibody titres, the age groups sampled and the underlying age structure of the population as well as the transmission setting and spatiotemporal heterogeneities in the risk of infection at the local scale. In datasets where there is clear separation in the bimodal distribution of antibody titres, catalytic and mixture models are expected to produce more similar estimates of FOI and seroprevalence as fewer samples are misclassified during the binary classification of the data needed to calibrate catalytic models [22,24]. This is consistent with the results from our simulation study and with the reduced variability we observe in our FOI and seroprevalence estimates from each model when they were applied to serological data from Indonesia compared to Vietnam, where the former had higher separation of titre distributions (Figs 3 and 4, S3 Table).

The estimates for Indonesia from each model were consistent with each other and with previously published FOI estimates from catalytic models fitted to case-notification data from 2008–2017 in Jakarta, Indonesia (0.130, 95% CI: 0.129–0.131) [12], and seroprevalence estimates from time-constant catalytic models applied to the same serology dataset (Dataset B) [34,40]. Our results show that the mixture and catalytic models do not significantly differ in their FOI and seroprevalence estimates in this setting. In contrast, the mixture model applied to the six datasets from Vietnam produced more variable estimates (FOI range = 0.026–0.099, seroprevalence range = 0.163–0.376) than the catalytic models (FOI range = 0.023–0.037 and 0.024–0.050, seroprevalence range = 0.190–0.300 and 0.189–0.299 under the assumption of a time-constant or time-varying FOI respectively). The variance was even greater in the age-specific seroprevalence and yearly FOI estimates (Figs 5 and S4). As expected, the time-varying catalytic model and the mixture model (which implicitly models FOI as time-varying) were better able to capture the age-specific seroprevalence than the time-constant catalytic model. The estimates from the mixture model tended to exceed those obtained from the catalytic models (Fig 4, S3 Table). Given the greater negative bias observed for the catalytic models in our simulation study, we expect the higher mixture model estimates to be more accurate for the Vietnamese setting. Lam et al. similarly observed higher FOI estimates when applying mixture models compared to catalytic models to serological data from Vietnam, for example 0.12 (95% CI: 0.11–0.14) compared to 0.07 (95% CI: 0.06–0.09) in Ho Chi Minh City [31].
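For readers less familiar with catalytic models, the following Python sketch shows how a time-constant FOI can be estimated from age-stratified binary serostatus data. It is an illustration under stated assumptions and not the implementation used in this study: it assumes lifelong seropositivity after infection (no seroreversion), a single FOI shared by all ages and years, and a binomial likelihood, so that the expected seroprevalence at age a is 1 − exp(−FOI × a). The function names, the search bounds and the example counts are hypothetical.

```python
import math


def expected_seroprevalence(foi, age):
    """Expected seroprevalence at a given age under a time-constant FOI."""
    return -math.expm1(-foi * age)  # equals 1 - exp(-foi * age)


def log_likelihood(foi, ages, n_tested, n_positive):
    """Binomial log-likelihood (up to an additive constant)."""
    total = 0.0
    for a, n, k in zip(ages, n_tested, n_positive):
        log_p = math.log(expected_seroprevalence(foi, a))
        log_q = -foi * a  # log(1 - p), written directly to avoid log(0)
        total += k * log_p + (n - k) * log_q
    return total


def fit_constant_foi(ages, n_tested, n_positive, lo=1e-6, hi=2.0, n_iter=200):
    """Maximum-likelihood FOI found by ternary search.

    The log-likelihood is concave in the FOI, so a ternary search on a
    bounded interval converges to the maximum. The bounds are assumptions.
    """
    for _ in range(n_iter):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, ages, n_tested, n_positive) < log_likelihood(
            m2, ages, n_tested, n_positive
        ):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Hypothetical survey: ages in years, number tested, number seropositive.
example_ages = [2, 5, 8, 11, 14, 17]
example_tested = [50, 50, 50, 50, 50, 50]
example_positive = [4, 11, 17, 22, 27, 30]
print("Estimated FOI:", fit_constant_foi(example_ages, example_tested, example_positive))
```

Because the serostatus counts are the only input, any titres misclassified by the cut-off threshold propagate directly into the FOI estimate.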

A major advantage of the mixture model is the comparative ease with which it can be applied to serological data to estimate transmission intensity without the need to use thresholds to process the data. Furthermore, to generate robust estimates, there are fewer data requirements for mixture models than for catalytic models: in the former, the data are pooled, and age is used only to calculate the age-specific mean log(titre + 1) using a spline, meaning that there are no constraints on the number of participants per age category. However, it is important to consider the bias that will be introduced if the mixture distributions fit the titre data poorly [24]. In this study we accounted for this by using an information criterion to select the best fitting models from a range of options. In the future we will explore implementing the models in a Bayesian framework [22], which would allow us to perform posterior predictive checks to assess model fit more robustly. It would also be interesting to explore the FOI estimates obtained when applying a mixture model with more than two mixture distributions, which may better account for the complex immunity profiles observed in areas where multiple DENV serotypes circulate. For example, Biggs et al. and Lam et al. fitted three-component mixture distributions to DENV antibody titre data from the Philippines and Vietnam respectively [30,31], specifying mixture components for seronegative individuals, seropositive individuals with a primary infection and seropositive individuals with post-primary infections, to develop frameworks capable of distinguishing between primary and post-primary DENV infection.
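The idea of fitting a mixture to pooled titres and comparing candidate models with an information criterion can be sketched as follows. This Python example is a simplified illustration and not the code used in this study: it fits only normal components by expectation-maximisation, ignores age, uses the Akaike information criterion (AIC) purely as an example of an information criterion, and compares a one-component with a two-component model rather than comparing distribution families. The simulated log(titre + 1) values, the starting values and all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_one_normal(x):
    """Maximum-likelihood fit of a single normal; returns (mu, sd, loglik)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    loglik = sum(math.log(normal_pdf(v, mu, sd)) for v in x)
    return mu, sd, loglik


def fit_two_normals(x, n_iter=200, min_sd=1e-3):
    """EM fit of a two-component normal mixture to pooled titres."""
    xs = sorted(x)
    half = len(xs) // 2
    # Crude starting values: means of the lower and upper halves of the data.
    mu_s = sum(xs[:half]) / half
    mu_i = sum(xs[half:]) / (len(xs) - half)
    sd_s = sd_i = max(min_sd, (xs[-1] - xs[0]) / 4.0)
    pi = 0.5  # mixing proportion of the upper (seropositive) component
    for _ in range(n_iter):
        # E-step: probability that each titre belongs to the upper component.
        resp = []
        for v in x:
            dens_s = (1.0 - pi) * normal_pdf(v, mu_s, sd_s)
            dens_i = pi * normal_pdf(v, mu_i, sd_i)
            resp.append(dens_i / (dens_s + dens_i))
        # M-step: weighted updates of the means, SDs and mixing proportion.
        w_i = sum(resp)
        w_s = len(x) - w_i
        mu_s = sum((1.0 - r) * v for r, v in zip(resp, x)) / w_s
        mu_i = sum(r * v for r, v in zip(resp, x)) / w_i
        var_s = sum((1.0 - r) * (v - mu_s) ** 2 for r, v in zip(resp, x)) / w_s
        var_i = sum(r * (v - mu_i) ** 2 for r, v in zip(resp, x)) / w_i
        sd_s = max(min_sd, math.sqrt(var_s))
        sd_i = max(min_sd, math.sqrt(var_i))
        pi = w_i / len(x)
    loglik = sum(
        math.log((1.0 - pi) * normal_pdf(v, mu_s, sd_s) + pi * normal_pdf(v, mu_i, sd_i))
        for v in x
    )
    return {"pi": pi, "mu_s": mu_s, "sd_s": sd_s, "mu_i": mu_i, "sd_i": sd_i,
            "loglik": loglik}


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2.0 * n_params - 2.0 * loglik


# Simulated pooled log(titre + 1) values: 70% seronegative, 30% seropositive.
rng = random.Random(1)
titres = [rng.gauss(1.5, 0.4) for _ in range(700)]
titres += [rng.gauss(4.0, 0.5) for _ in range(300)]

one = fit_one_normal(titres)
two = fit_two_normals(titres)
print("AIC, one component:", aic(one[2], 2))
print("AIC, two components:", aic(two["loglik"], 5))
print("Estimated seropositive proportion:", two["pi"])
```

No cut-off threshold is applied at any point: each titre contributes to both components in proportion to its estimated membership probability, which is why the approach avoids the hard misclassification inherent in binary serostatus data.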

In summary, our results suggest that mixture models represent a good alternative to catalytic models to quantify DENV time-varying FOI and seroprevalence from age-stratified serological data, with potentially less bias and less uncertainty. They may be particularly useful when estimating FOI from data where there is high overlap between the component distributions, where the risk of serostatus misclassification and bias introduction when using cut-off threshold methods is greater (S2 and S3 Figs).

We have provided code to run the simulation study to encourage further exploration and comparison of the different methodologies. Critically, further investigation of the use of mixture models depends on the availability of raw antibody titre data. For these reasons, we would encourage current and future seroprevalence studies on DENV, as well as other infectious diseases, to publish anonymised individual-level antibody titre data where it is possible to do so.

Supporting information

S1 Table. Parameter values used for generating 540 simulated datasets.

https://doi.org/10.1371/journal.pntd.0010592.s001

(DOCX)

S2 Table. Number of simulations where the mixture model correctly specified the distributions of the seronegative and/or seropositive component of the simulated antibody titre datasets (Dataset C).

Here, n represents the number of simulated datasets out of 540.

https://doi.org/10.1371/journal.pntd.0010592.s002

(DOCX)

S3 Table. Force of infection (FOI) and total population level seroprevalence (SP) estimates from the mixture model and the catalytic models fitted to the observed data.

The observed data are serological data collected in Vietnam (Datasets A-1:A-6) and Indonesia (Dataset B). 95% Confidence Intervals (CI) were calculated by the bootstrap method.

https://doi.org/10.1371/journal.pntd.0010592.s003

(DOCX)

S1 Fig.

(A) True versus estimated parameter values from the mixture model fitted to the simulated datasets (Dataset C). The estimated parameters are the mean log(titre + 1) value of the seronegative/susceptible (S) and seropositive/infected (I) components (μs and μI respectively) and the corresponding standard deviations (σs and σI). Red indicates the estimates where the true parameter value was not captured by the estimates (i.e., the 95% Confidence Interval of the estimate did not contain the true value). Note that the axes limits differ for each panel. (B) The percentage of parameter outliers after fitting the mixture model to Dataset C, per seronegative and seropositive titre family distributions. The percentage of the total number of outliers of μs, μI, σS and σI (red in panel A) per distribution combination on the x-axis, where the two letters represent the seronegative (first letter) and the seropositive (second letter) distribution pair (N = normal, G = gamma and W = Weibull).

https://doi.org/10.1371/journal.pntd.0010592.s004

(TIF)

S2 Fig. Association between the true component mean titre values in Dataset C versus the serostatus misclassification error.

The x-axis shows the difference between the true mean log(titre + 1) value of the seronegative (μs) and the seropositive component (μI) for each realisation over 509 simulated datasets. The titres are classified as seropositive or seronegative using realisation-specific optimised titre thresholds. The loess regression line and corresponding 95% Confidence Intervals are shown.

https://doi.org/10.1371/journal.pntd.0010592.s005

(TIF)

S3 Fig. Serostatus misclassification versus catalytic model estimate bias.

The bias in the estimates from the time-constant and time-varying catalytic models for the realisations over 509 simulated datasets is plotted against the serostatus misclassification error rate. The titres in each of the 509 simulated datasets are classified as seropositive or seronegative using realisation-specific optimised titre thresholds. The serostatus misclassification error rate is calculated as the percentage of titres in each dataset that are misclassified. Absolute bias is calculated as the absolute value of the difference between the estimated and true values of the force of infection (FOI) and seroprevalence. The linear regression lines and corresponding 95% Confidence Intervals are shown, as well as the Pearson’s correlation coefficients (R). Three outliers with FOI estimates > 0.4 were removed from the time-varying catalytic model panel and corresponding regression line estimation.

https://doi.org/10.1371/journal.pntd.0010592.s006

(TIF)

S4 Fig. Force of infection (FOI) estimates across time.

Yearly FOI estimates from the catalytic and mixture models. 95% Confidence Intervals were calculated by bootstrapping.

https://doi.org/10.1371/journal.pntd.0010592.s007

(TIF)

Acknowledgments

We would like to acknowledge Sanofi Pasteur for provision of the data used in this research. We acknowledge the participants, their families and the principal investigators involved in the studies.

References

  1. Brady OJ, Hay SI. The Global Expansion of Dengue: How Aedes aegypti Mosquitoes Enabled the First Pandemic Arbovirus. Annu Rev Entomol. 2020;65(1):1–18. pmid:31594415
  2. Simmons CP, Farrar JJ, van Vinh Chau N, Wills B. Dengue. N Engl J Med. 2012 Apr 12;366(15):1423–32. pmid:22494122
  3. Cattarino L, Rodriguez-Barraquer I, Imai N, Cummings DAT, Ferguson NM. Mapping global variation in dengue transmission intensity. Sci Transl Med. 2020;12(528):1–11. pmid:31996463
  4. Gibbons R V. Dengue conundrums. Int J Antimicrob Agents. 2010;36(SUPPL. 1):S36–9. pmid:20696556
  5. Fritzell C, Rousset D, Adde A, Kazanji M, Van Kerkhove MD, Flamand C. Current challenges and implications for dengue, chikungunya and Zika seroprevalence studies worldwide: A scoping review. PLoS Negl Trop Dis. 2018;12(7):1–29. pmid:30011271
  6. Gubler DJ. Dengue, Urbanization and globalization: The unholy trinity of the 21st century. Trop Med Health. 2011;39(4 SUPPL.):3–11. pmid:22500131
  7. Vannice KS, Durbin A, Hombach J. Status of vaccine research and development of vaccines for dengue. Vaccine. 2016;34(26):2934–8. pmid:26973072
  8. Luo R, Fongwen N, Kelly-Cirino C, Harris E, Wilder-Smith A, Peeling RW. Rapid diagnostic tests for determining dengue serostatus: a systematic review and key informant interviews. Clin Microbiol Infect. 2019;25(6):659–66. pmid:30664935
  9. Kucharski AJ, Kama M, Watson CH, Aubry M, Funk S, Henderson AD, et al. Using paired serology and surveillance data to quantify dengue transmission and control during a large outbreak in Fiji. Elife. 2018;7:1–26. pmid:30103854
  10. O’Reilly KM, Hendrickx E, Kharisma DD, Wilastonegoro NN, Carrington LB, Elyazar IRF, et al. Estimating the burden of dengue and the impact of release of wMel Wolbachia-infected mosquitoes in Indonesia: A modelling study. BMC Med. 2019;17(1):1–14.
  11. Lauer SA, Sakrejda K, Ray EL, Keegan LT, Bi Q, Suangtho P, et al. Prospective forecasts of annual dengue hemorrhagic fever incidence in Thailand, 2010–2014. Proc Natl Acad Sci USA. 2018;115(10):E2175–82. pmid:29463757
  12. O’Driscoll M, Imai N, Ferguson N, Hadinegoro SR, Satari HI, Tam C, et al. Spatiotemporal Variability in Dengue Transmission Intensity in Jakarta, Indonesia. 2018;1–62.
  13. Hens N, Aerts M, Faes C, Shkedy Z, Lejeune O, Van Damme P, et al. Seventy-five years of estimating the force of infection from current status data. Epidemiol Infect. 2010;138(6):802–12. pmid:19765352
  14. Ferguson NM, Donnelly CA, Anderson RM. Transmission dynamics and epidemiology of dengue: Insights from age-stratified sero-prevalence surveys. Philos Trans R Soc B Biol Sci. 1999;354(1384):757–68. pmid:10365401
  15. Grenfell BT, Anderson RM. The estimation of age-related rates of infection from case notifications and serological data. 1985;(1985):419–30.
  16. Nokes DJ, Anderson RM. Rubella epidemiology in South East England. 1986;(1986):291–304.
  17. Schenze D, Dietz K, Frösner G. Antibody against Hepatitis A in seven European Countries: II. Statistical analysis of cross-sectional surveys. Am J Epidemiol. 1979;10(1):70–76.
  18. Delgado S, Neyra RC, Machaca VRQ, Juárez JA, Chu LC, Verastegui MR, et al. A history of Chagas disease transmission, control, and re-emergence in peri-rural La Joya, Peru. PLoS Negl Trop Dis. 2011;5(2).
  19. Imai N, Dorigatti I, Cauchemez S, Ferguson NM. Estimating Dengue Transmission Intensity from Sero-Prevalence Surveys in Multiple Countries. PLoS Negl Trop Dis. 2015;9(4):1–19. pmid:25881272
  20. Salje H, Paul KK, Paul R, Rodriguez-Barraquer I, Rahman Z, Alam MS, et al. Nationally-representative serostudy of dengue in Bangladesh allows generalizable disease burden estimates. Elife. 2019;8:1–17. pmid:30958263
  21. Rodriguez-Barraquer I, Salje H, Cummings DA. Opportunities for improved surveillance and control of dengue from age-specific case data. Elife. 2019;8. pmid:31120419
  22. Bollaerts K, Aerts M, Shkedy Z, Faes C, Beutels P, Hens N. Estimating the population prevalence and force of infection directly from antibody titres. Stat Modelling. 2012;12(5):441–62.
  23. Vink MA, Kassteele J Van De, Wallinga J, Teunis PFM, Bogaards JA. Estimating Seroprevalence of Human Papillomavirus Type 16 Using a Mixture Model with Smoothed Age-dependent Mixing Proportions. 2015;26(1).
  24. Kafatos G, Andrews NJ, Conway KJMC, Maple PAC. Is it appropriate to use fixed assay cut-offs for estimating seroprevalence? 2016;(2016):887–95. pmid:26311119
  25. Hens N, Shkedy Z, Aerts M, Faes C, Damme P Van, Beutels P. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. 2012;314.
  26. Gay NJ. Analysis of serological surveys using mixture models: Application to a survey of parvovirus B19. Stat Med. 1996;15(14):1567–73. pmid:8855482
  27. Hardelid P, Williams D, Dezateux C, Tookey PA, Peckham CS, Cubitt WD, et al. Analysis of rubella antibody distribution from newborn dried blood spots using finite mixture models. Epidemiol Infect. 2008;136(12):1698–706. pmid:18294427
  28. Rota MC, Massari M, Gabutti G, Guido M, De Donno A, Atti MLC degli. Measles serological survey in the Italian population: Interpretation of results using mixture model. Vaccine. 2008;26(34):4403–9. pmid:18585420
  29. Chisenga CC, Bosomprah S, Musukuma K, Mubanga C, Chilyabanyama ON, Velu RM, et al. Sero-prevalence of arthropod-borne viral infections among Lukanga swamp residents in Zambia. PLoS One. 2020;15(7):1–13. pmid:32609784
  30. Biggs JR, Sy AK, Brady OJ, Kucharski AJ, Funk S, Reyes MAJ, et al. A serological framework to investigate acute primary and post-primary dengue cases reporting across the Philippines. BMC Med. 2020;18(1):1–14.
  31. Lam HM, Phuong HT, Vy NHT, Thanh NT Le, Dung PN, Muon TTN, et al. Serological inference of past primary and secondary dengue infection: Implications for vaccination. J R Soc Interface. 2019;16(156). pmid:31362614
  32. Nhat NTD, Todd S, De Bruin E, Thao TTN, Vy NHT, Quan TM, et al. Structure of general-population antibody titer distributions to influenza A virus. Sci Rep. 2017;7(1):1–9.
  33. Tien NTK, Luxemburger C, Toan NT, Pollissard-Gadroy L, Huong VTQ, Van Be P, et al. A prospective cohort study of dengue infection in schoolchildren in Long Xuyen, Viet Nam. Trans R Soc Trop Med Hyg. 2010;104(9):592–600. pmid:20630553
  34. Prayitno A, Taurel AF, Nealon J, Satari HI, Karyanti MR, Sekartini R, et al. Dengue seroprevalence and force of primary infection in a representative population of urban dwelling Indonesian children. PLoS Negl Trop Dis. 2017;11(6):1–16.
  35. R Core Team. R: A language and environment for statistical computing. 2019.
  36. Rodriguez-Barraquer I, Cordeiro MT, Braga C, de Souza W V., Marques ET, Cummings DAT. From re-emergence to hyperendemicity: The natural history of the dengue epidemic in Brazil. PLoS Negl Trop Dis. 2011;5(1):1–7. pmid:21245922
  37. Macdonald P, Du J. mixdist: Finite Mixture Distribution. 2018.
  38. Eilers PHC, Marx BD. Flexible Smoothing with B-splines and Penalties. Stat Sci. 1996;11(2):89–102.
  39. Kovac T. serostat: Modeling Infectious Disease Parameters Based on Serological and Social Contact. 2018.
  40. Tam CC, O’Driscoll M, Taurel AF, Nealon J, Hadinegoro SR. Geographic variation in dengue seroprevalence and force of infection in the urban paediatric population of Indonesia. PLoS Negl Trop Dis. 2018;12(11):1–12. pmid:30388105