Serological Measures of Malaria Transmission in Haiti: Comparison of Longitudinal and Cross-Sectional Methods

Background
Efforts to monitor malaria transmission increasingly use cross-sectional surveys to estimate transmission intensity from seroprevalence data using malarial antibodies. To date, seroconversion rates estimated from cross-sectional surveys have not been compared to rates estimated in prospective cohorts. Our objective was to compare seroconversion rates estimated in a prospective cohort with those from a cross-sectional survey in a low-transmission population.

Methods and Findings
The analysis included two studies from Haiti: a prospective cohort of 142 children ages ≤11 years followed for up to 9 years, and a concurrent cross-sectional survey of 383 individuals ages 0–90 years. From all individuals, we analyzed 1,154 blood spot specimens for antibodies to the malaria antigen MSP-1₁₉ using a multiplex bead assay. We classified individuals as positive for malaria using a cutoff derived from the mean plus 3 standard deviations in antibody responses from a negative control set of unexposed individuals. We estimated prospective seroconversion rates from the longitudinal cohort based on 13 incident seroconversions among 646 person-years at risk. We also estimated seroconversion rates from the cross-sectional survey using a reversible catalytic model fit with maximum likelihood. We found the two approaches provided consistent results: the seroconversion rate for ages ≤11 years was 0.020 (0.010, 0.032) estimated prospectively versus 0.023 (0.001, 0.052) in the cross-sectional survey.

Conclusions
The estimation of seroconversion rates using cross-sectional data is a widespread and generalizable problem for many infectious diseases that can be measured using antibody titers. The consistency between these two estimates lends credibility to model-based estimates of malaria seroconversion rates using cross-sectional surveys. This study also demonstrates the utility of including malaria antibody measures in multiplex assays alongside targets for vaccine coverage and other neglected tropical diseases, which together could comprise an integrated, large-scale serological surveillance platform.

and the probability of recovery in period $t$ is:

$$P_R(t) = \frac{\rho}{\lambda+\rho}\left[1 - e^{-(\lambda+\rho)t}\right] \tag{2}$$

In both probabilities, $\lambda$ is the incidence rate and $\rho$ is the recovery rate. With serological antibody data, the incidence rate is estimated by seroconversion, and recovery by seroreversion. The parameters can be estimated empirically using maximum likelihood. For a binomial random variable $Y_{i,t}$ that indicates infection status for individual $i$ at time $t$, the part of the likelihood function that depends on the probabilities of infection and recovery under the model is:

$$L(\lambda,\rho) = \prod_{i}\prod_{t} P_I(t)^{Y_{i,t}(0,1)}\,\left[1-P_I(t)\right]^{Y_{i,t}(0,0)}\,P_R(t)^{Y_{i,t}(1,0)}\,\left[1-P_R(t)\right]^{Y_{i,t}(1,1)} \tag{3}$$

where $P_I(t)$ and $P_R(t)$ are the transition probabilities in equations 1 and 2, and $Y_{i,t}(s_t, s_{t+1})$ are indicator variables for the status of individual $i$ (0 = negative, 1 = positive) at the beginning and end of the period. The log likelihood is:

$$\ell(\lambda,\rho) = \sum_{i}\sum_{t}\Big\{ Y_{i,t}(0,1)\log P_I(t) + Y_{i,t}(0,0)\log\left[1-P_I(t)\right] + Y_{i,t}(1,0)\log P_R(t) + Y_{i,t}(1,1)\log\left[1-P_R(t)\right] \Big\} \tag{4}$$

We estimated the seroconversion and seroreversion rates by maximizing the likelihood in equation 4 given the observed data. The R function used for the likelihood is at the end of this appendix.

To estimate the variance of prevalence estimates and of model-based seroconversion and seroreversion rates in the longitudinal cohort, we used a clustered bootstrap approach: we resampled individuals with replacement 10,000 times, and in each iteration estimated the parameters using maximum likelihood. Resampling whole individuals preserved the within-child correlation in outcome measurements. The SD of the bootstrap distribution was used to estimate the SE of the parameters, and the 2.5th and 97.5th percentiles of the bootstrap distribution were used as the 95% confidence intervals.
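The longitudinal likelihood and the clustered bootstrap can be illustrated with a minimal Python sketch. It is not the R function referenced in this appendix. It assumes the transition probabilities take the standard two-state form shown in the code, and the function names, the `(s_start, s_end, t)` data layout, and the `estimator` argument are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def transition_probs(lam, rho, t):
    """Return (P_I, P_R) over a period of length t.

    Assumes lam > 0, rho > 0 and t > 0 (two-state reversible model).
    """
    total = lam + rho
    scale = 1.0 - math.exp(-total * t)
    return lam / total * scale, rho / total * scale


def log_likelihood(lam, rho, transitions):
    """Log likelihood for observed (s_start, s_end, t) transitions.

    Statuses are coded 0 = seronegative, 1 = seropositive.
    """
    ll = 0.0
    for s_start, s_end, t in transitions:
        p_i, p_r = transition_probs(lam, rho, t)
        if s_start == 0:
            ll += math.log(p_i) if s_end == 1 else math.log(1.0 - p_i)
        else:
            ll += math.log(p_r) if s_end == 0 else math.log(1.0 - p_r)
    return ll


def cluster_bootstrap(children, estimator, n_boot=1000, seed=0):
    """Resample whole children with replacement.

    children: dict mapping a child id to that child's list of transitions.
    estimator: hypothetical callable taking a list of per-child transition
        lists and returning one number (for example, a fitted rate).
    Returns (SE, (lower, upper)) using the bootstrap SD and a simple
    nearest-rank approximation to the 2.5th and 97.5th percentiles.
    """
    rng = random.Random(seed)
    ids = list(children)
    stats = []
    for _ in range(n_boot):
        sample = [children[rng.choice(ids)] for _ in ids]
        stats.append(estimator(sample))
    stats.sort()
    lower = stats[int(0.025 * (n_boot - 1))]
    upper = stats[math.ceil(0.975 * (n_boot - 1))]
    return statistics.stdev(stats), (lower, upper)
```

Because `cluster_bootstrap` draws child identifiers rather than individual measurements, every repeated measurement from a sampled child enters the replicate together, which is what keeps the within-child correlation intact.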

Estimating seroconversion rates using cross-sectional data
With cross-sectional data, direct information about seroconversion and seroreversion is unavailable, since we do not observe the same individual at two points in time and cannot identify incident cases. Instead, we only have current status information for an individual at one point in time, when they are a particular age. Investigators have used a model similar to the one described in the last section to model the age-specific probability of infection [2,3]. The probability of infection at age $a$ is modeled as:

$$P(a) = \frac{\lambda}{\lambda+\rho}\left[1 - e^{-(\lambda+\rho)a}\right] \tag{5}$$

Note that this is similar to equation 1, but rather than modeling incident seroconversions and using person-time in a period of observation ($t$), here the equation summarizes the prevalence at age ($a$). Since there is only one probability in the model, the log likelihood function is identical in form to equation 4 but without the last two terms:

$$\ell(\lambda,\rho) = \sum_{i}\Big\{ Y_i \log P(a_i) + (1-Y_i)\log\left[1-P(a_i)\right] \Big\} \tag{6}$$

where $Y_i$ is an indicator variable equal to 1 if individual $i$ is positive, $a_i$ is that individual's age, and $P(a)$ is the probability of infection described in equation 5. Estimates of $\lambda$ and $\rho$ in a population can be obtained by maximizing the likelihood function given the observed status $Y_i$. We fit the age-specific prevalence model to the cross-sectional dataset using all individuals ages 0–90 years, and separately for individuals ages 0–11 years for a direct comparison with the longitudinal data.
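As an illustration of fitting the age-specific prevalence model, the following Python sketch evaluates the negative log likelihood for current-status data and minimizes it over a grid. The grid search is a deliberately simple stand-in for a numerical optimizer and is not the authors' R implementation; the function names and the `(age, y)` data layout are assumptions made for this example.

```python
import math


def prevalence(age, lam, rho):
    """Seroprevalence at a given age under the reversible catalytic model."""
    total = lam + rho
    return lam / total * (1.0 - math.exp(-total * age))


def neg_log_likelihood(lam, rho, data):
    """Negative log likelihood for (age, y) records, with y in {0, 1}."""
    nll = 0.0
    for age, y in data:
        # Clamp away from 0 and 1 so that log() stays finite (e.g. at age 0).
        p = min(max(prevalence(age, lam, rho), 1e-12), 1.0 - 1e-12)
        nll -= math.log(p) if y == 1 else math.log(1.0 - p)
    return nll


def fit_grid(data, lam_grid, rho_grid):
    """Return (lam, rho, nll) at the grid point with the smallest nll."""
    best = min(
        (neg_log_likelihood(lam, rho, data), lam, rho)
        for lam in lam_grid
        for rho in rho_grid
    )
    return best[1], best[2], best[0]
```

A call such as `fit_grid(data, [0.01 * k for k in range(1, 9)], [0.005, 0.01, 0.02, 0.03, 0.04])` returns the best grid point; in practice a finer grid or a gradient-based optimizer started from that point would give more precise estimates.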

Estimating two seroconversion rates using cross-sectional data
One extension to the basic model for cross-sectional data is to allow the seroconversion rate to differ between age groups. Investigators have chosen the breakpoint at which the rate changes as the age that maximizes the overall likelihood, identified with a profile likelihood plot [4–7]. We used a similar approach, but used cross-validation to select the breakpoint in age, to avoid choosing a breakpoint that overfits the data [8,9]. If the entire dataset is used both to fit the model and to choose the breakpoint that maximizes the likelihood, the model can overfit the data. Choosing the breakpoint by V-fold cross-validation should be far less sensitive to overfitting.

Hastie et al. [8] provide details of V-fold cross-validation. In brief, the approach randomly splits the data into V roughly equal-sized parts. For the vth part, the model is fit on the other V−1 parts of the data, and the loss (prediction error) is calculated on the vth part, which is left out as the test set. The process is repeated for each of the V splits, and the loss function is averaged over the V splits. In this application, we used V=5 data splits in the cross-validation and the negative log likelihood as the loss function. Finally, we fit a single seroreversion rate across all ages in the survey (not separately for each age group).

The standard maximum likelihood approach and the cross-validation approach produced highly comparable loss functions in this dataset (Figure S1). Both approaches selected a split at age 8 years. Figure S2 plots the model predictions allowing for two seroconversion rates along with observed seroprevalence.

Figure S2. Observed seroprevalence from Table 2 (points) and predicted prevalence from the reversible catalytic model (line). The model fitted to the data allowed for different seroconversion rates below and above age 8 (chosen through cross-validation). For ages ≤8 years, the seroconversion rate was 0.020 (0.006, 0.033), and for ages >8 years it was 0.043 (0.034, 0.051). The model estimated a single seroreversion rate over the entire population: 0.024 (0.005, 0.043).
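The breakpoint selection can be sketched in Python as follows. The piecewise prevalence function is an assumption: it is the solution of the catalytic model when the seroconversion rate is constant within each age group and prevalence is continuous at the breakpoint, which may differ in detail from the parameterization used for Figure S2. The names `prevalence_two_rate`, `cv_loss`, `choose_breakpoint`, and the user-supplied `fit` callable are hypothetical.

```python
import math
import random


def prevalence_two_rate(age, lam1, lam2, rho, brk):
    """Prevalence with rate lam1 up to age brk and lam2 afterwards."""

    def advance(lam, duration, p_start):
        # Move prevalence forward by `duration` years at a constant rate.
        total = lam + rho
        decay = math.exp(-total * duration)
        return p_start * decay + lam / total * (1.0 - decay)

    if age <= brk:
        return advance(lam1, age, 0.0)
    return advance(lam2, age - brk, advance(lam1, brk, 0.0))


def nll(params, brk, data):
    """Negative log likelihood of (age, y) records for one breakpoint."""
    lam1, lam2, rho = params
    total = 0.0
    for age, y in data:
        p = prevalence_two_rate(age, lam1, lam2, rho, brk)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def cv_loss(data, brk, fit, V=5, seed=0):
    """Average held-out negative log likelihood over V folds.

    fit: hypothetical callable (train_data, brk) -> (lam1, lam2, rho).
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[v::V] for v in range(V)]  # V roughly equal parts
    losses = []
    for v in range(V):
        train = [r for k, fold in enumerate(folds) if k != v for r in fold]
        params = fit(train, brk)
        losses.append(nll(params, brk, folds[v]))
    return sum(losses) / V


def choose_breakpoint(data, candidates, fit, V=5, seed=0):
    """Return the candidate breakpoint with the smallest cross-validated loss."""
    return min(candidates, key=lambda b: cv_loss(data, b, fit, V, seed))
```

Using the same `seed` for every candidate keeps the fold assignment fixed, so differences in loss between breakpoints reflect the breakpoint rather than the random split.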

R functions used for maximum likelihood estimation
R function for maximum likelihood estimation of a reversible catalytic model fit to cross-sectional data, using an example from Williams [10].