The Impact of HPV Female Immunization in Italy: Model Based Predictions

The Human Papillomavirus (HPV) is a sexually transmitted virus that causes cervical cancer. Since 2008 a vaccination program targeting 12-year-old girls has been initiated in Italy, backing up the cervical screening program already active since 1996. We propose a mathematical model of HPV transmission dynamics with the aim of evaluating the impact of these prevention strategies. The model considers heterosexual transmission of HPV types 16 and 18, structured by sex, age and sexual activity level, where transition to sexual activity is explicitly modeled from recent survey data. The epidemiological structure is a hybrid SIS/SIR, where a fraction of individuals recovering from infection develops permanent immunity against reinfection. Infections may progress to cervical lesions and cancer and heal spontaneously or upon treatment. Women undergoing hysterectomy (either after treatment of HPV lesions or by other causes) also transmit HPV infection. The model fits well both the age-specific prevalence of HPV infections and the incidence of cervical cancers in Italy, and accurately reproduces the decreasing trend in cancer incidence due to the introduction of the screening program. The model predicts that if the screening coverage is maintained at current levels, even in the absence of vaccination, such trend will continue in the next few decades, eventually plateauing at 25% below the current level. The additional initiation of routine vaccination targeting 12-year-old girls will further reduce cervical cancer incidence by two thirds at equilibrium, under realistic assumptions of 70% coverage and a duration of protective immunity of 50 years. If catch-up immunization of 25-year-old women at first cervical screening is also introduced, about 3,000 cervical cancer cases overall can be averted, corresponding to 9.6% of all cases expected in the scenario without catch-up. We conclude that HPV vaccination in addition to cervical screening will significantly reduce the burden of cervical cancer in Italy.

Parameters appearing in the above equations are briefly described in Table S1. Functions bM[a] and bF [a], indicating the gender-specific number of newborns per year, are defined in such a way that newborns are actually introduced only in the age class of individuals aged 0: [ ] = { = 0 0 ℎ with bM = bF = b/2 and b the total number of newborns per year. The age and sexual level-specific forces of infection for each gender are defined as in (S1): where N represents the total (gender-specific) population of that age and SAL, and m are elements of the (genderspecific) sexual contact matrix, specifying how many sexual partnership men (women) of age aM (aF) and SAL lM (lF) have with women (men) of age aF (aM) and SAL lF (lM). For further details on the estimation of mixing matrices, see Section S2 of this text. Gender and age-specific mortality rate (S2) (a) Gender and age-specific rate of sexual debut Estimated from sexual behavior data f(l) Gender and SAL-specific proportion Estimated from sexual behavior data c(a,a', l, l') Gender, age and SAL-specific contact rates Estimated from sexual behavior data p(a) Gender and age-specific vaccine coverage Fraction of hysterectomies for CIN-1 treatments (S11) hC2 Fraction of hysterectomies for CIN-2 treatments (S11) hC3 Fraction of hysterectomies for CIN-3 treatments (S11) hCIS Fraction of hysterectomies for CIS treatments (S11) hCC Fraction of hysterectomies for CC treatments (S11)

CC(a)
Age-specific cancer-induced mortality rate (S12)  Gender-specific transmission probability per-partnership Free parameter a Age-specific coefficient of assortativity Free parameter l SAL-specific coefficient of assortativity Free parameter The treatment submodel Treatment rates Tk, specified in the model for each lesion type, were calculated by the sum of two contributions: spontaneous care-seeking after the insurgence of symptoms, represented by Tk sympt , and active case-finding by screening, Tk screen . For spontaneous care-seeking, treatment rates were assumed to be 0 for k = CIN-1, CIN-2, CIN-3, since these lesions were assumed to be asymptomatic; for k=CIS, CC, the rates were estimated as the reciprocal of the sum of the average time from lesion initiation to CIS or CC symptom development (S13) and of the average time from careseeking to treatment (S14). The current screening protocol in Italy refers to the Bethesda guidelines and can be briefly summarized (with some minor simplification) as follows: screened women undergo a preliminary cytological Papanicolau (Pap) test; in case of Pap positivity, a colposcopic exam with optional biopsy is performed to ascertain the presence and degree of the lesion and, where appropriate, decide for treatment. Therefore, we calculated the screening-related lesion-specific treatment rates as the product of several terms:  the time-varying effective screening coverage (Table S2), distributed over a period equal to the screening interval (3 years);  the proportion of cases discovered per screening round, defined as the product of: o the lesion-specific sensitivity of cytologic screening (S15-S17); o the compliance to follow-up upon discovery of cytological abnormalities (S11); o the lesion-specific sensitivities at colposcopy or biopsy (S18; S19)  the lesion-specific fraction of treated vs. untreated lesions (S11);  the reciprocal of the waiting time from care-seeking to treatment (S14).

S2. Modeling and parametrization of sexual activity
Modeling and parametrization of the age at sexual debut The high speed at which young people acquire HPV infection in Italy and elsewhere suggests that it might be important to explicitly incorporate the process of transition into the sexually active phase (known as sexual debut, SD) as a separate sub-process of the transmission model. In this feature our model differs from most available models of HPV which usually assume a fixed age at SD in correspondence of which everyone becomes sexually active.
In order to parametrize the process of transition to the sexually active phase, a variety of alternative parametric hazard models were fitted to data on age at sexual debut. The models considered included classical hazard models such as Weibull, Gomperz and Gamma models, and other duration models used in socio-demography, such as Hernestype (S20). Models were fitted by maximum likelihood for censored data to account for the presence in the available data of both right censoring (individuals who could not specify an age at SD because they were not yet sexually active at the survey time) and interval censoring (due to individuals who were unable to recall the exact date, i.e. the day, of their SD, but could bound it with some prescribed degree of accuracy i.e. either the month, or the season, or eventually the year when sexual debut occurred). A first selection of the candidate models for the hazard of becoming sexually active was carried out on an empirical basis by computing the maximum likelihood estimate, still including censored data, of a piece-wise constant hazard model, and retaining only those models whose hazard at least qualitatively fit the empirical piece-wise constant hazard. This step allowed to exclude Weibull and Gomperz risks as non-relevant due to their ever-increasing shape. Selection among remaining models was based on likelihood-ratio tests. The model eventually selected was Hernes-type briefly described hereafter.

The Hernes' model
Let T be the random variable describing the age at which transition to the sexually active phase occurs. Let moreover a denote age (or time), and consider an age-continuous (or time-continuous) hazard process whose survival function, which represents the probability that the event of interest occurs after any given age duration a, is denoted as: The previous model therefore represents an SI-type model for infection diffusion in a homogeneously mixing, fixed and demographically stationary population, characterised by an exponentially declining age-dependent transmission rate ( ) = − . The model can be solved analytically under the initial condition (0) = 0 < 1 (necessary to initiate transmission as in all "epidemiologically-based" models) yielding the following closed form for the hazard rate and survival functions: Let us denote by   H K,   the vector of unknown parameters of Hernes' model. Let n, m, p respectively denote the overall sample size, the number of observations for which the age at SD was recorded "exactly" (where exactly means that the date of the day of SD was available), and the number of interval-censored observations, for which the age of SD was recorded as the interval   The likelihood function for sexual-debut data has the following form: The likelihood function was maximised numerically by the function nonlin of software R. The results of the fit are reported in Figure S1 and S2.

Contact patterns
Considering that our model includes transition to the sexually active population as a separate process, contact patterns were estimated from the subset of sexually active individuals (identified as those who experienced the sexual debut) only, rather than including also non-sexually active individuals as in most papers on the subject. Parametrization of sexual contact patterns of sexually active individuals was undertaken by following the standard methodology for models for sexually transmitted infections, based on the "preferred mixing" approach for populations with heterogeneous sexual activity (S1, S3). In the standard 1-dimensional preferred mixing approach individuals of one sex (e.g. males) in a given risk group i (e.g. by age or level or sexual activity) mix with individuals of the other sex (females) in risk group j by reserving a fraction  (the "preference" parameter) of their sexual partnerships to individuals in the same risk group while remaining partnerships are allocated at random among females from all risk groups, i.e. according to the proportionate mixing rule. When risk groups are characterised by several dimensions, e.g. in our model age and sexual activity level, multi-dimensional preferred mixing needs to be considered, where the preferred mixing matrices from each single dimension considered are compounded independently, each one with its specific preference parameter . Let the preference parameters for the two dimensions considered be denoted as a (age) and l (SAL) respectively. To parametrize sexual contacts, we carried out the following steps. First, female age-and SAL-specific marginal mean numbers of sexual partners per year were computed from the numbers of different partners had during the last year reported in survey data ( Figure S3, left panel). Second, male marginal mean contacts rates are computed residually ( Figure S3, right panel) using the female dominance criterion, based on the knowledge of female mean contact rates, the frequency of males and females in each risk group and the so called "sexual-mixing restraints" (S1). Individuals who had reported 0 or 1 sexual partners in the previous year were assigned to the low SAL group (84.7% of all females and 63.9% of all males); those who had reported 2 to 5 sexual partners in the previous year were assigned to the intermediate SAL group (14.7% of all females and 26.4 of all males); and those reporting 6 or more sexual partners in the previous year were assigned to the high SAL group (0.6% of all females and 9.7% of all males). Finally, sex-specific preferred mixing matrices were parametrised using the "preferred mixing" approach (S1, S3). Mixing matrices are the Markov matrices relating age-SAL-specific marginal mean numbers of sexual partnerships with the corresponding joint mean number, as follows: [ , , , ] = [ , ] [ , , , ] In substantive terms mixing matrices assign the proportions of partnerships that individuals having age-SAL (a,l) have with partners having age-SAL (a',l'). Using the "preferred mixing" approach the mixing matrices are defined as follows:

S3. Parametrization
The region of exploration of the parameter space was defined by identifying plausible minimum and maximum values of each of the model's free parameters, through an extensive literature search (about 180 different studies were reviewed, of which 40 contained relevant data). These ranges are reported in Table S3 (only studies suggesting minimal or maximal values are cited).

Poisson likelihood of the age-specific HPV prevalence
The estimation of natural history parameters was conducted by maximization of the Poisson likelihood of the age specific HPV prevalence during the first step of the parametrization procedure (see main text of this manuscript). The Poisson likelihood was calculated as: Pi are model predictions for the HPV prevalence in age group i, and Di is the corresponding observation for that age class.

Distribution of best-fitting parameters
Following (S33), we report in Figure S4 the distributions of best-fitting values of free model parameters, normalized with respect to the range of variability assumed from the literature search and summarized by boxplots representing the 95% CI, and the first, second and third quartiles. While parameters related to the natural history of infection (the six leftmost boxplots in Figure S4) have a relatively narrow range, parameters accounting for progression and regression to different degrees of lesions and cancer have a much wider variability, probably due to the strong correlation between pairs of parameters and the lack of data to fit intermediate targets such as the prevalence of HPV16/18 lesions by age.

S4. Comparison between predictions from different best-fitting parameters
Given the relatively large variability in the estimates of progression/regression parameters, we show in Figure S5 the simulated predictions of cervical cancer incident cases produced by the 50 best-fitting parameter sets. In this analysis, we considered the two realistic vaccination schedules reported in the main text, i.e. the baseline vaccination of girls at 12 years and the baseline vaccination with additional catch-up at 25 years. In both scenarios predictions are qualitatively robust, with quantitative variations becoming less and less important with time. Predictions from the optimal parameter set (dashed line in Figure S5), presented in the main text, are overall conservative (i.e. suggest a higher number of cancer cases) with respect to the average of predictions obtained through different parameter configurations (solid line in Figure S5) in both vaccination scenarios. Figure S4. Best-fitting parameter values, normalized to original range of variation. Figure S5. Variability of cervical cancer cases predicted over time by the 50 best-fitting parameter sets. Top: baseline scenario; bottom: catch-up scenario.