A Bivariate Mixture Model for Natural Antibody Levels to Human Papillomavirus Types 16 and 18: Baseline Estimates for Monitoring the Herd Effects of Immunization

Post-vaccine monitoring programs for human papillomavirus (HPV) have been introduced in many countries, but HPV serology is still an underutilized tool, partly owing to the weak antibody response to HPV infection. Changes in antibody levels among non-vaccinated individuals could be employed to monitor herd effects of immunization against HPV vaccine types 16 and 18, but inference requires an appropriate statistical model. The authors developed a four-component bivariate mixture model for jointly estimating vaccine-type seroprevalence from correlated antibody responses against HPV16 and -18 infections. This model takes account of the correlation between HPV16 and -18 antibody concentrations within subjects, caused e.g. by heterogeneity in exposure level and immune response. The model was fitted to HPV16 and -18 antibody concentrations as measured by a multiplex immunoassay in a large serological survey (3,875 females) carried out in the Netherlands in 2006/2007, before the introduction of mass immunization. Parameters were estimated by Bayesian analysis. We used the deviance information criterion for model selection; performance of the preferred model was assessed through simulation. Our analysis uncovered elevated antibody concentrations in doubly as compared to singly seropositive individuals, and a strong clustering of HPV16 and -18 seropositivity, particularly around the age of sexual debut. The bivariate model resulted in a more reliable classification of singly and doubly seropositive individuals than achieved by a combination of two univariate models, and suggested a higher pre-vaccine HPV16 seroprevalence than previously estimated. The bivariate mixture model provides valuable baseline estimates of vaccine-type seroprevalence and may prove useful in seroepidemiologic assessment of the herd effects of HPV vaccination.


Introduction
Human papillomavirus (HPV) is among the most prevalent sexually transmitted infections. Persistent infection with high-risk HPV is the necessary cause for the development of cervical cancer, and may also lead to anogenital and oropharyngeal carcinomas [1]. HPV types 16 and 18 are the main focus of current vaccination programs, as these high-risk types are responsible for the majority of cancer cases [2][3][4][5].Vaccination against HPV16 and -18 has been introduced in many countries, including the Netherlands, but eligibility is typically restricted to preadolescent girls and uptake is relatively low; approximately 60% of (pre)adolescent girls in the Netherlands have been vaccinated [6]. Vaccination of preadolescents is not expected to have a noticeable impact on cancer incidence within the coming decades. HPV16 and -18 infections can be acquired soon after sexual debut, but the development of cancer after infection may take several decades [7]. To anticipate the population impact of HPV vaccination at an earlier instance, post-vaccine monitoring programs targeting HPV-related surrogate endpoints have been introduced in many countries [8,9]. Many of these focus on time-trend analyses in the incidence or prevalence of type-specific HPV infections, anogenital warts, and cervical lesions. Serological surveys might also be useful for observing changes in infection dynamics, but serology is still an underutilized tool in HPV monitoring programs.
Serological surveys are relatively inexpensive and only a small amount of serum is necessary to test for antibodies against a variety of pathogens. These surveys can be used for monitoring the antibody levels in vaccinated individuals, and to inform on post-vaccine changes in infection risk in the non-vaccinated population, the so-called herd effect of mass immunization [10,11]. These aspects are especially relevant for monitoring HPV vaccination, because both the duration of protection against high-risk HPV types and the herd effects of vaccinating against HPV16 and -18 are still unknown. Herd effects may constitute an important aspect of the overall impact of HPV vaccination programs, as demonstrated by the rapid fall in anogenital warts diagnoses in vaccinated as well as non-vaccinated cohorts in countries with as satisfactory uptake of quadrivalent HPV vaccine [12], which includes low-risk HPV types 6 and 11 associated with warts. The extent of indirect protection against high-risk HPV types will likely be smaller than against low-risk types [13], but substantial herd effects are nonetheless predicted for non-vaccinated women as well as men [14,15].
In principle, monitoring for herd effects against HPV16 and -18 could be integrated with HPV DNA screening for precancerous cervical lesions. The Netherlands will be the first country to adapt their organized screening program on the basis of primary HPV DNA testing, using the cobas 1 HPV test which detects HPV16 and -18 individually and a pool of 12 other high-risk HPV types [16]. However, organized screening only starts at 30 years of age in the Netherlands, and monitoring in younger cohorts is hampered by the lack of pre-vaccine data that could serve as a benchmark. Moreover, herd effects in men would go unnoticed by reliance on HPV DNA testing in cervical screening. As an alternative, herd effects can be monitored by means of serological surveys, two of which have already been carried out prior to the introduction of HPV vaccination in the Netherlands, with a third one scheduled for 2016/2017. These surveys have been informative in studying infection dynamics of other vaccine-preventable diseases, such as mumps and rubella [17,18]. An additional advantage regarding HPV is the straightforward attribution of vaccination status, as vaccine-derived antibody levels are approximately 10-100 times higher as compared to naturally derived HPV-specific antibodies [19]. Conversely, changes in antibody levels among non-vaccinated individuals could be employed to infer herd effects of HPV vaccination.
Use of serology for post-vaccine monitoring is complicated, because the antibody response to HPV infection is generally weak. Currently used serological assays are able to accurately quantify vaccine-induced antibody levels, but detection of natural antibodies is difficult due to a poor signal to noise ratio [20]. Consequently, classifying individuals as naturally seropositive for multiple HPV types may suffer from a high misclassification rate. Changes in the population antibody levels may still be detectable, but require an appropriate statistical model to inform about herd effects of HPV vaccination.
Previously, we have shown that a probabilistic assignment based on a mixture model can be used for the estimation of seroprevalence as a function of age [21]. With multi-strain pathogens, such as HPV, this approach neglects possible correlation between type-specific serologic test results within subjects, caused e.g. by inter-individual differences in sexual behavior and immune response. Improvement could therefore be obtained through joint estimation of seroprevalence against multiple HPV types, as type-specific antibody concentrations often display correlation, especially when measured by multiplex immunoassays [22]. Here we propose a bivariate mixture model, in which the age-dependent HPV16 and -18 seroprevalence is jointly estimated from correlated antibody responses against HPV16 and -18. The bivariate model provides information on the joint occurrence of HPV types and is shown to enable a more reliable classification of singly and doubly seropositive individuals than a combination of separate univariate analyses.

Ethics statement
A signed informed consent was obtained from all participants of the serological survey carried out in the Netherlands in 2006/2007. For those below 18 years of age, signed informed consent was also obtained from the parents, guardians or care takers. The study proposal was approved by the medical ethics testing committee of the foundation of therapeutic evaluation of medicines (METC-STEG) in Almere, The Netherlands (ISRCTN 20164309).

Serological data
We analyzed the serum concentrations of HPV16 and -18 immunoglobulin G (IgG) antibodies in women who participated in a large cross-sectional survey, representative of the Dutch general population. The samples were collected in 2006/2007, before the introduction of HPV16 and -18 vaccination in the Dutch national immunization program in 2009. A total of 3,875 randomly sampled women between 0 and 79 years of age provided a serum sample. HPV type-specific IgG antibodies against L1 virus-like particles (VLP) were tested with a VLP-based multiplex immunoassay [23]. The assay measures the antibody concentrations to 7 high-risk HPV types simultaneously, and it has a lower limit of detection at 0.08 luminex units per milliliter (LU/mL) for HPV16 and at 0.03 LU/mL for HPV18.

Bivariate mixture model
The log-transformed HPV16 and -18 antibody concentrations x i ¼ x 16 i ; x 18 i À Á of observations i = 1, . . ., 3875 are described by a Gaussian mixture model: with four bivariate normal component densities with unknown means and covariance matrices. The four component densities represent individuals that are: 1. Seronegative for both HPV types, with mean and covariance matrix: Seropositive for HPV16 and seronegative for HPV18, with mean and covariance matrix: The subscripts of the means, standard deviations, and correlations (parameters μ, σ and ρ respectively) indicate to which mixture component the parameter belongs (e.g., the HPV16 seropositive and HPV18 seronegative component is represented by + −), and the superscript represents the HPV type.

Parameter estimation
Parameters were estimated by Markov chain Monte Carlo (MCMC) simulation with JAGS, a program for Bayesian analysis using Gibbs sampling. Each observation contributed to the likelihood as follows: We imputed antibody concentrations below the detection limit of the VLP-based multiplex immunoassay (S1 Text). To avoid label switching in posterior simulations, the positive mixture means were re-parameterized as: with all Δ' s greater than or equal to zero. By implication, we ignore the possibility that infection with HPV16 or -18 could lead to a signal below the noise of the assay in seronegative individuals. We took normal prior distributions for the seronegative mixture means and half-normal priors for the Δ's. Uniform prior distributions were taken for the standard deviations and the correlation parameters, and a Dirichlet prior was assumed for the mixing proportions. We ran four parallel MCMC chains. Convergence of the MCMC chains was inspected visually. Computations were performed with the statistical software R (S2 Text).

Model scenarios and model selection
Univariate estimates of HPV16 and -18 seroprevalence (S1 Fig) are presented both discretized by age group and as a smooth function of age. We evaluate a suite of mixture models to describe the joint distribution of HPV16 and -18 antibody concentrations (Table 1). Scenario 1 assumes that HPV16 and -18 antibody concentrations are independent and follow two-component mixture distributions. The model of Scenario 1 is basically the product of two univariate mixture models, one for HPV16 and one for HPV18. Likewise, the model of Scenario 1 has twice the number of parameters of a univariate model, i.e. two μ's and two Δ's, four σ's and eight ϕ's (two per age group above 10 years).
The assumption of independent occurrence in seropositivity for both types is relaxed in Scenario 2 (adding fourϕ's to the model), whereas Scenario 3 allows for correlated antibody concentrations per mixture component (adding four ρ's to the model). Scenario 4 allows for associations in HPV16 and -18 seropositivity as well as correlated antibody concentrations, but retains the constraint of a marginal description by two mixture components, i.e. D 16 þÀ ¼ D 16 þþ and D 18 Àþ ¼ D 18 þþ . We relax this constraint in Scenario 5: the seropositive means and covariance matrices may be different for the doubly positive and singly positive mixture components, e.g. due to boosting of antibody concentrations upon multiple infections. Thus, the model of Scenario 5 has four additional parameters (two Δ's and two σ's) relative to Scenario 4, yielding a total of 28 parameters. To avoid the overfitting of assay noise in the seronegative component, which contains most of the data, we retained the assumption of constant seronegative means (i.e., m 16  Finite mixture models present some problems to the use of model selection criteria, in particular for selecting the number of mixture components [25]. We used the deviance information criterion (DIC) for model selection. The DIC balances the expected deviance-a measure of model fit-and the effective number of parameters-a measure of model complexity [26]. The expected deviance was estimated by taking the average value of the deviance (defined as -2 times the log-likelihood) across posterior simulations. The effective number of parameters was computed by taking the difference between the expected deviance and the deviance at the posterior means of the parameters for the model [26]. Owing to our parameterization, we found that the posterior means adequately summarized the central tendency of the posterior densities. As an alternative, we also calculated the effective number of parameters as one half the variance across posterior simulations. This approach is invariant to re-parameterization, but assumes negligible prior information [27].
Although there is not a formal threshold to assign a relevant difference between two models, a difference of more than 7 to 10 points is generally taken to favor the model with the smaller DIC [26,27]. To assess whether the preferred bivariate model (i.e., the model with the lowest DIC) yields a better mixture classifier than a combination of two univariate models, we simulated 50 bivariate data sets of 3800 individuals (roughly corresponding to the size of the serological survey) using the observed age distribution and parameter estimates of the selected bivariate model. Each simulated data set was fitted by both the preferred bivariate model and by two univariate models for each HPV type separately. Per data set, we calculated the probability for each individual of belonging to one of the four mixture components by Monte Carlo integration [28]. We assigned each individual to the component with the largest support, and compared how many individuals were correctly classified by the bivariate model and by the combination of two univariate models.

Parameter estimates
We observe a general increase of antibody concentrations with age, and strong correlations between the HPV16 and -18 responses across the range of antibody concentrations (Fig 1). As a result (Table 2), model scenarios that allow for correlation between the HPV16 and -18 antibody responses (Scenarios 3-5) invariably outperform the models without correlation (Scenarios 1-2). Of the models that allow for correlated antibody responses, those that also take age-dependent associations between HPV16 and -18 seropositivity into account (Scenarios 4 and 5) provide a substantially better fit than the model that ignores such associations (Scenario  3). Finally, the difference in DIC between the two remaining scenarios is large enough to favor the model with separate mixture distributions for the singly and doubly seropositive components (Scenario 5). Alternative choices for computation of the DIC had no influence on the ranking of models, as the effective number of parameters reflected expectation in most model Scenarios (Tables 1 and 2). Parameter estimates are presented in Table 3. The means of the seronegative component are substantially smaller than the means of the doubly seropositive component (-0.74 versus 2.15 for HPV16, and -0.68 versus 2.49 for HPV18). For each type, the singly seropositive means lie in between the seronegative and doubly seropositive means, as might be expected. Posterior distributions of the preferred bivariate model (Scenario 5) are summarized by median and 95% credible interval (S1 Table), and the model fit is illustrated by contour lines of the mixture density to the heat plot of the data (Fig 2). HPV16 and -18 antibody concentrations are strongly correlated in the seronegative and doubly seropositive components, with almost identical correlation on a logarithmic scale (Pearson correlation coefficients ρ −− = 0.75 and ρ ++ = 0.73, respectively). Correlation coefficients in the singly seropositive components are also comparable, but substantially smaller (ρ +− = 0.39 and ρ −+ = 0.29). Fig 3 shows the estimated age-dependent seroprevalences. Seronegativity for both types decreases from 100% in young children to approximately 60% in persons aged 40 years and older. Singly seropositive prevalence increases quickly from 10 years onwards to more than 20% for HPV16 and to 3-4% for HPV18. Seropositivity for both types also increases quickly with age, to 8-10% in 10-60 years old persons. There is a strong association in seroprevalence for HPV types 16 and 18 as demonstrated by the odds ratio (OR) for double seropositivity; we consider these as age-dependent 2-by-2 contingency tables obtained from the age-dependent mixing proportions. The association of HPV16 and HPV18 is especially pronounced for the age group 10-20 years (OR = 64), and decreases in older age groups: OR = 6.4 for 20-40 yearolds, OR = 5.4 for 40-60 year-olds, and OR = 4.2 for the oldest age group ( Table 4). Note that the association in HPV16 and -18 seroprevalence is significant in all age groups.

Seroprevalence by age
Both the component densities and the marginal seroprevalences from Scenario 1 are similar to the estimates from the type-specific univariate mixture models (S1 Fig). The estimated marginal seroprevalence of HPV16 is substantially higher in the preferred bivariate model than in the univariate HPV16 model (Fig 4). The difference in seroprevalence is largest for the age group 40-60 years, with an absolute difference of 13.5% in point estimates. Despite this

Classification accuracy
The bivariate model provided better classification of individuals than a combination of univariate models for HPV16 and HPV18 in 49 out of 50 simulated data sets. The two approaches perform equally well in classifying the doubly seropositive persons: respectively 85.7% and 85.3% of these are correctly classified in a bivariate or a combined univariate analysis. However, the bivariate model provides a better classification in all other situations, in particular when classification of the singly seropositive persons is concerned: 64.3% versus 52.1% of singly HPV16 seropositive individuals and 58.9% versus 50.4% of singly HPV18 seropositive individuals are correctly classified in a bivariate versus a combined univariate analysis (Table 5).

Discussion
We have presented a bivariate mixture model for joint estimation of seroprevalence against multiple genotypes in seroepidemiologic studies. The method is particularly useful when typespecific antibody measurements are correlated, a typical observation when using multiplex   immunoassay technology. We applied our model to estimate HPV16 and -18 seroprevalence in women prior to introducing HPV vaccination of preadolescent girls in the Netherlands. Our results provide valuable baseline estimates of vaccine-type seroprevalence that can be used for future vaccine impact assessment.
Our method provides a full description of the HPV16 and -18 antibody concentrations, and allows for post-hoc assessment of age-dependent associations in HVP16 and -18 seropositivity. Our analysis has uncovered that (i) antibody concentrations in seronegative and doubly seropositive individuals are strongly correlated; (ii) doubly seropositive individuals have higher antibody concentrations than singly seropositive individuals; and (iii) there is a positive association between HPV16 and -18 seropositivity in all age groups, with the strongest association around the age of sexual debut.  Age-dependent co-occurrence in cross-sectional serological data has been dealt with earlier using methods that rely on marginal models for multivariate binary data, i.e. denoting seropositivity for each infection separately. These methods model the odds ratio of co-occurrence either by means of a parametric link function as in the bivariate Dale model [29,30] or via correlated frailty models [31,32], and are based on a rigid classification of sera using a predefined cut-off value. However, in the seroepidemiology of some infections, such as HPV, considerable uncertainty exists about the true serological status of individual samples, and population prevalence needs to be inferred from an analysis that takes account of this uncertainty. In addition, the choice of a link function or frailty distribution is not straightforward if the odds ratio is strongly peaked as a function of age, which we show to be the case for HPV16 and -18 seropositivity. Our bivariate mixture model circumvents this problem and gives an unconstrained estimate of co-occurrence of antibodies against multiple infections.
Our analysis was stratified into age groups of 10 to 20 years to account for changes in seroprevalence and co-occurrence by age. We showed for the univariate analyses that the stratified seroprevalence resembled the smoothed seroprevalence figures. A straightforward multivariate extension of the smoothing spline approach would involve modeling the mixing proportions by a flexible but age-dependent function, for example by means of vector generalized additive models [33]. This would be especially worthwhile for applications to smaller data sets or with an uneven distribution of serum samples across age groups. Such an extension would also yield a better resolution with respect to the age-specific pattern of co-occurrence between HPV16 and HPV18 seropositivity.
Positive associations between HPV type-specific seropositivity are to be expected, as persons who have been infected with one type are at increased risk for acquisition of a second type [34]. This has already been confirmed by previous analyses based on serological data [35][36][37][38], but to our knowledge we are the first to uncover and estimate age dependency in the association of HPV16 and -18 seropositivity. Our analyses show that this association is particularly strong around the age of sexual debut. This suggests that the population of adolescents and young adults is strongly heterogeneous with respect to sexual activity, which is in line with survey data on sexual behavior. The substantially smaller odds ratios among 20-40 year-olds can possibly be explained by the fact that almost 90% of women have had sexual intercourse by the age of 21 [24]. Heterogeneity with respect to lifetime number of sex partners may explain the clustering of HPV16 and -18 seropositivity at older age, as the association remains significant in all age groups. Nevertheless, the decreasing odds ratios with age suggests that older age groups are more homogeneous with regard to HPV16 and -18 infection history, presumably because both types are relatively common in sexually active age groups and are easily transmitted within sexual partnerships [39].
We have estimated the correlation between HPV16 and -18 antibody concentrations for each of the mixture components separately. The correlation among seronegative individuals may represent assay noise, given that seronegativity is related to absence of prior exposure to HPV16 and -18, certainly at young age. As we expect that doubly seropositive persons have experienced both HPV16 and -18 infections, the correlation in the doubly seropositive component could also reflect the immunologic profile of a host's response to infection; a strong antibody response to HPV16 is likely related to a strong antibody response to HPV18. Additionally, we found that the antibody concentrations of the doubly seropositive individuals were larger than those measured for the same HPV type in singly seropositive individuals. Possibly, the assay targets not only antibodies to the main L1 epitope but also recognizes epitopes that are shared between HPV16 and HPV18 [40], which would result in a boosted immune response to one type after infection with the other type.
The estimated HPV16 seroprevalence is substantially larger in bivariate than in univariate analysis, even though both analyses achieved comparable marginal fits to the data. Using a simulation study, we concluded that the bivariate mixture model is better able to correctly classify seropositive individuals, as it enables more flexibility in describing antibody concentrations among HPV16 seropositives by stratification for HPV18 serostatus. Indeed, the increased HPV16 seroprevalence in the bivariate model was predominantly due to reclassification of samples with HPV16-specific antibody concentrations that had similar support for seronegativity or seropositivity in the univariate analysis. This is relevant for post-vaccine serological surveys, because a reduced circulation of vaccine-type HPV throughout the population is not only expected to lower the proportion seropositives among the non-vaccinated individuals, but also the relative rate of double seropositivity against HPV16 and -18. Vaccination will probably change the association between HPV16 and -18 seropositivity, in turn leading to varying degrees of seroprevalence underestimation by univariate analyses in pre-and post-vaccine serological surveys. A bivariate mixture model would likely provide more sensitive and accurate estimates of changing seroprevalence by age in partly vaccinated cohorts. Our results could therefore be particularly useful for the assessment of population effects from HPV vaccination in forthcoming serological surveys.