HPV Clearance and the Neglected Role of Stochasticity

Clearance of anogenital and oropharyngeal HPV infections is attributed primarily to a successful adaptive immune response. To date, little attention has been paid to the potential role of stochastic cell dynamics in the time it takes to clear an HPV infection. In this study, we combine mechanistic mathematical models at the cellular level with epidemiological data at the population level to disentangle the respective roles of immune capacity and cell dynamics in the clearing mechanism. Our results suggest that chance—in form of the stochastic dynamics of basal stem cells—plays a critical role in the elimination of HPV-infected cell clones. In particular, we find that in immunocompetent adolescents with cervical HPV infections, the immune response may contribute less than 20% to virus clearance—the rest is taken care of by the stochastic proliferation dynamics in the basal layer. In HIV-negative individuals, the contribution of the immune response may be negligible.


Nonparametric persistence estimators
First, we performed a nonparametric analysis of HPV persistence in the REACH data set. Following Moscicki and colleagues [1], we defined time to loss of infection as the time elapsed between the first positive HPV test and the first of two consecutive negative HPV tests. Thereby, a negative test was defined as loss of all HPV subtypes that had been identified during the first positive test. The corresponding Kaplan-Meier survival estimators (ignoring interval-censoring and left-truncation) are shown in Figure 1A. Comparison of these persistence curves shows that time to clearance is significantly higher in HIV-positive individuals as established by both log-rank and Wilcoxon tests (p < 0.0005). This is in agreement with the conclusions in [1].
However, by defining time to loss of infection as described above we lose a considerable amount of information due to interval-censoring, right-censoring and left-truncation. In fact, initiation and clearance times are only known up to intervals of length ≥ 6 months (interval-censoring), and many individuals leave the cohort study before clearing the infection (right-censoring). Furthermore, 220, or 70%, among the 313 participants have a prevalent infection (i.e. the first valid HPV test is positive), which means their time of infection initiation is unknown (left-truncation). When restricting the analysis to the 93 subjects (52 HIV-positive, 41 HIV-negative) with incident infections (i.e. the first valid HPV test is negative), we found that the two cohorts were no longer different (p > 0.24 by generalized log-rank test for interval-censored data). Even with the less stringent definition of clearance time as time of the first negative test result, the persistence curves of incident  infections were still not significantly different (p > 0.061 for log-rank and Wilcoxon tests).
In summary, when restricting ourselves to the subset of incident infections, there is no significant difference between the HIV-positive and HIV-negative cohorts. However, in this analysis we neglected a lot of available information by discarding the 220 prevalent infections (70% of all cases). To incorporate prevalent infections, we need to account for the unknown time between infection start and date of first HPV test. This requires a parametric approach, as developed in the main text.

Model details 2.1 Replacement dynamics
If a D * cell is eliminated, the proliferation event of the triggered stem cell is asymmetric or symmetric, as for spontaneous division. Keeping track of all possible outcomes, we have the following changes in cell balances due to D * cell elimination: While S and S * cell proliferation caused by D * cell elimination is identical to spontaneous division, S * cell elimination is assumed to trigger a symmetric division resulting in two identical S or S * copies, respectively. In fact, this assumption is necessary to ensure conservation of the total number of stem cells in the homeostatic regime: if an S * cell division were to trigger all three possible division patterns, the total number of stem cells in the tissue would not be conserved over time. Keeping track of all possible outcomes after S * cell elimination, we find the following rates

Spatial Model
The stochastic process corresponding to the spatial model dynamics in the S * -D * plane is summarized in Figure 2. In Figure 3, three different realizations of the process are shown for the representative parameter set α = 0.12 (chosen) and µ = 1.3 · 10 −3 , and n 0 = 37 (both maximum likelihood estimates from the branching process model). See caption for remaining parameters.

MLE for Branching Process
In this section, we provide the details for the parameter inference through MLE. To derive the likelihood function for the extinction time of the subcritical branching process we need to derive the exact distributions. We denote by τ n 0 the time to extinction of the infection process with initial condition n s * (t = 0) = n 0 . We introduce now the survival function Since the n 0 families are independent, we have where the one-particle survival function F 1 is Introducing the notation A ≡ (λr + µ)/µ for µ > 0, we can write the survival function as Differentiating yields the probability density functions We can now combine (6) and (7) with the data from the REACH study to derive the likelihood function. But first, we need to understand the data format. The data is given . . , n, where • p j = 1 if the infection was present at the first HPV test during the study; p j = 0 if the infection was not yet present at the first HPV test.
• d j = 1 if the infection is cleared during the study; d j = 0 if the subject is lost to follow-up before clearing the infection • t i j is time of the first positive HPV test.
• t f j : if d j = 1, then it is the time of the first of two negative results (if d j = 1); if d j = 0, then it is the censoring time.
• We denote byt i/f j the time of the visit preceding the visit at t i/f j . To make the notation lighter, we omit the subscript j in the following calculations. We introduce the notation ∆t i ≡ t i −t i and ∆t f ≡ t f −t f . See also Figure 4 for a visualization of the time sequence.
We introduce the random variable τ which is the time from infection start to clearing of the virus.
Denote by f and F the pdf and cdf of the time to clearance τ , and define the survivor function F ≡ 1 − F . Furthermore, define g and G to be the pdf and cdf of the censoring times (counted since start of the study), and define the corresponding survivor funcion G ≡ 1 − G. Finally, denote by t 0 the study begin -in fact, the censoring events are counted from study begin onward.

Figure 4: Events
We discuss now the different contributions to the likelihood function.
Prevalent infections (p = 1). If the infection was present at the first HPV test (prevalent infection), then the infection started σ days before study begin, i.e. only τ − σ days can be observed until clearance (recall that τ is the time to clearance counted from onset of the infection, not the start of the study). Therefore, the contributions to the likelihood function are and It remains to get a handle on the distribution of τ − σ. In fact, once we know the length of the infection period, i.e. we condition on {τ = t}, then its hidden length is uniformly distributed over t because we assume constant arrival rate of new infections, With this, we derive (note that ∈ ds is short for ∈ [s, s + ds)) and, similarly, To make sure our calculations are correct, we check, using a change of variables, that the pdf integrates to 1, Incident infections (p = 0). In this case, the first HPV test of the study is negative, i.e. the infection starts after the first visit (an incident infection). More precisely, it started σ days before the first positive test at time t i , but unlike σ which is supported on [0, ∞), the timeσ cannot exceed ∆t i (because if it did exceed ∆t i , then the visit att i would have been positive). The contributions to the likelihood function are now and It remains to compute the distribution ofσ: First, we note that if t > ∆t i , then Combining the two, we find now (note that a ∧ b is short for min {a, b}) and, similarly, Using these formulae, we can now compute and Here too, the sanity check pans out, and the pdf integrates to 1. The calculation is messy, and we omit it here.
Finally, we can assemble the complete log-likelihood function. To this end, we reintroduce the subscripts in the time variables, and combine (8), (9), (13) and (11) to obtain In the above expression, θ = (µ, λ, r, n 0 ) are the model parameters, P θ is the probability assuming parameter set θ, and we have added a subscript toσ j to emphasize its dependence on ∆t i j , see (22).
Finally, to infer the most likely parameter set, we maximized the likelihood function (14) using the statistical software R [2]. We did this by combining a coarse grid search across prior parameter ranges (see Section 5) with the built-in local optimization routine optim().

Identifiability Issues
Even though there are no apparent identifiability issues in the probability density function (7), when performing the MLE for the branching process model, we found that equally good fits were obtained for a fixed ratio α/n, where α = λr and n is the size of the initial infection. In other words, the distribution depends primarily on α/n, and is insensitive to the absolute values of α and n. To provide a theoretical explanation for this finding, we start by performing a series expansion in µ of the density f in (7) f (t, α, n, µ) = αn αt 1 + αt Next, we zoom out by rescaling time ast = tα/n, and obtain the rescaled densityf f (t, α, n, µ) = n 2 t n 1 +tn Importantly, the lowest order term converges to and numerical experimentation shows that convergence is rapid, with good results achieved for n = O(10). Next, we replace α → β and n → n β α to conserve the ratio α/n and calculatê Again, the lowest order term converges to and we conclude that to first order in µ (and for t and n large enough) Since the longitudinal study used for parameter inference contains mostly large time points (order of months), and the inferred µ is small, this provides insight as to why the inference is insensitive to the absolute values of α and n, and primarily depends on their ratio.

A Priori Parameter Estimates for cervical squamous epithelium
For the parameter estimation by MLE as well as the sensitivity analysis in Section 7 we use a priori estimates for the following model parameters: • ρ: the fraction of proliferative cells in the basal layer, • Γ: the exit rate of D and D * cells from the basal layer, • λ: the proliferation rate of stem cells, • r: the probability of symmetric division.
These parameters are not completely independent because there are two constraints imposed by the homeostatic equilibrium of the epithelium. First, the basal layer is in a stationary state: arrival of new D cells produced by S cells is balanced by the exit of D cells into the parabasal layers, i.e.
Furthermore, we know that there is periodic renewal of the entire epithelium, with a renewal time of ∆T , and hence the number of upwards migrating cells over ∆T has to satisfy where H is the height (in numbers of cells) of the stratified squamous epithelium. Combining (21) and (22) we find To derive the various a priori estimates, we use (21) and (23) together with literature estimates as follows.
• The thickness H of the squamous epithelium is estimated using data from [3]. In this study, the authors use histological sections of excised specimen to find an average thickness of 360 ± 100µm. They then split the height into 4 equal parts, and assess the cell heights in µm (from bottom to top quarter of thickness): 15 ± 5, 17 ± 5.1, 14 ± 4.7 and 14 ± 4.8. Using these averages, we find that the epithelium stacks approximately H ∼ 24 cells high from basement membrane to lumen of the cervix.
• In [4] the authors use BrdU staining to find that the fraction of cells in S-phase is 10.7% in the lower third (∼ 8 cells) of the epithelium. Since all proliferative cells in the healthy tissue are located in the basal layer, this means that about 75% of basal cells are mitotically active, i.e. ρ = 75%. Of note, estimates for the mouse tail epidermis [5] and mouse esophageal epithelium [6] are ρ = 22% and ρ = 65%, respectively. Due to this wide spread in murine data, and the lack of reliable experimental estimates in humans, we assume the following a priori bounds: ρ ∈ [0.2, 0.8].
• Literature estimates for the time of complete renewal of the squamous epithelium of the cervix vary between 3 and 6 weeks [7,8], ∆T ∈ [21, 42] days.
• Using the above estimates, we find now an a priori estimate for λ ∈ [0.29, 9.5]/day. The upper bound is clearly too big and we assume that the minimal time for a complete cell cycle is 1 day, leading to the following prior range, λ ∈ [0.29, 1] day 1 .
• Estimates for r, the probability of a stem cell dividing into two stem cells (or into two differentiated cells), are not available for human tissues. In the mouse tail epidermis [5] and the mouse esophagus this probability was estimated to be r = 0.1 and r = 0.09, respectively. We assume a priori bounds of r ∈ [0.05, 0.25].
• Finally, combining the a priori estimates for λ and r, we find the prior range for α used in this study, α ∈ [0.015, 0.25] day −1 .
6 Sensitivity of Clerance Time to Choice of α In Figure 3 of the main text, we showed the clearance time distributions for an intermediate value of α = 0.14. In Figure 5,

Sensitivity Analysis for Spatial Model
In Figure 5 of the main text we assessed the sensitivity of the spatial clustering model to the immune capacity µ. Here, we do a more extensive sensitivity analysis by performing 5000 simulations, and for each run the parameters are drawn at random from the following a priori intervals: r ∈ [0.05, 0.15], α = rλ ∈ [0.015, 0.25], ρ ∈ [0.2, 0.8], µ ∈ [0.001, 0.0018], the latter being an uncertainty interval aroundμ − , the estimated immune capacity of the HIV-negative cohort. Furthermore, we only allowed for parameter sets that are compatible with the a priori intervals for H and ∆T , see Section 5. The initial infection size, n 0 was obtained by calculating n 0 as a function of α as determined in the branching process model, see Figure 3A in the main text. In Figure 6, the clearance time distributions are compared for the non-spatial branching process model (A), the spatially clustered model with fixed parameters (B) and the spatially clustered model with random parameters (C). In particular, by comparing (B) and (C) we conclude that drawing the parameters at random within their prior ranges does not affect the clearance time distribution substantially.