New estimates of the Zika virus epidemic attack rate in Northeastern Brazil from 2015 to 2016: A modelling analysis based on Guillain-Barré Syndrome (GBS) surveillance data.

BACKGROUND
Between January 2015 and August 2016, two epidemic waves of Zika virus (ZIKV) disease swept the Northeastern (NE) region of Brazil. As a result, two waves of Guillain-Barré Syndrome (GBS) were observed concurrently. The mandatory reporting of ZIKV disease began region-wide in February 2016, and it is believed that ZIKV cases were significantly under-reported before that. The changing reporting rate has made it difficult to estimate the ZIKV infection attack rate, and studies in the literature vary widely from 17% to > 50%. The same applies to other key epidemiological parameters. In contrast, the diagnosis and reporting of GBS cases were reasonably reliable given the severity and easy recognition of the disease symptoms. In this paper, we aim to estimate the real number of ZIKV cases (i.e., the infection attack rate) and their dynamics in time, by scaling up from GBS surveillance data in NE Brazil.


METHODOLOGY
A mathematical compartmental model is constructed that makes it possible to infer the true epidemic dynamics of ZIKV cases based on surveillance data of excess GBS cases. The model includes the possibility that asymptomatic ZIKV cases are infectious. The model is fitted to the GBS surveillance data and the key epidemiological parameters are inferred by using a plug-and-play likelihood-based estimation. We make use of regional weather data to determine possible climate-driven impacts on the reproductive number [Formula: see text], and to infer the true ZIKV epidemic dynamics.


FINDINGS AND CONCLUSIONS
The GBS surveillance data can be used to study ZIKV epidemics and may be appropriate when ZIKV reporting rates are not well understood. The overall infection attack rate (IAR) of ZIKV is estimated to be 24.1% (95% confidence interval: 17.1%-29.3%) of the population. By examining various asymptomatic scenarios, the IAR is likely to be lower than 33% over the two ZIKV waves. The risk rate from symptomatic ZIKV infection to develop GBS was estimated as ρ = 0.0061% (95% CI: 0.0050%-0.0086%) which is significantly less than current estimates. We found a positive association between local temperature and the basic reproduction number, [Formula: see text]. Our analysis revealed that asymptomatic infections affect the estimation of ZIKV epidemics and need to also be carefully considered in related modelling studies. According to the estimated effective reproduction number and population wide susceptibility, we comment that a ZIKV outbreak would be unlikely in NE Brazil in the near future.


Introduction
step. The unique feature of our work is that we draw on this property and fit our model to GBS data 75 collected during and following the period of a ZIKV outbreak. We use this to infer the true numbers, 76 and dynamics in time, of ZIKV cases. 77 For the modelling work that follows, it is useful to consider some of the above events in more detail 78 on a country-specific basis, as they give further important background information that justifies our 79 approach in using GBS as a proxy for zika-cases, on data sources and on choices of parameter values. 80 French Polynesia From October 2013 to April 2014, a severe ZIKV outbreak hit French Polynesia, and the attack rate (IAR) was first estimated as 66% [18], but updated soon after to 49% [19]. An 82 outbreak of 42 GBS cases was simultaneously reported, but with a three-week delay in the peak timing, 83 and was linked to the ZIKV outbreak [20]. Based on the IAR of [19], the risk of ZIKV induced GBS can 84 thus be calculated as 0.32 GBS cases per 1,000 ZIKV infections, or just ρ = 0.00032.
[20] estimated the 85 proportion to be ρ = 0.00024. Aubry et al. also found that, the ratio of asymptomatic to symptomatic 86 infections (asymptomatic ratio) was about 1:1 in the general population and 1:2 among school children 87 [19]. These findings are notably different from estimates for a previous ZIKV outbreak in Yap island 88 in 2007, where the asymptomatic ratio was 4.4:1 and the estimated overall ZIKV IAR was about 75%  series to infer the pattern of ZIKV outbreak and the overall IAR of ZIKV in Northeastern Brazil. 145 Mathematical modelling provides a possible way to infer the epidemic waves of ZIKV (or together 146 with a minor proportion of CHIKV). First, we assume a constant risk ratio between symptomatic ZIKV 147 cases and reported GBS cases (ZIKV-GBS ratio), denoted by ρ. Second, we simulate our ZIKV model, 148 and fit the model to observed GBS cases with a time-dependent ZIKV transmission rate. Finally, by 149 using iterated filtering techniques, we find the maximum likelihood estimates of ρ and the overall IAR. 1000 < weekly CHIKV < 5000 5000 < weekly CHIKV < 7500 7500 < weekly CHIKV Suspected Zika Cases Surplus GBS Cases GBS/Zika Ratio (%) Figure 2: The (reported) suspected ZIKV cases, excess (or surplus) GBS cases and GBS-to-ZIKV ratio in the NE region of Brazil from January 2015 to November 2016. The red dotted line represents weekly ZIKV disease cases, the dark blue dotted line represents weekly surplus GBS cases and the light blue bars are GBS-to-ZIKV ratios. The "major" (with weekly cases over 1000) CHIKV outbreak of 2016 is shaded in green according to CHIKV disease level. The light green area denotes time periods when the weekly reported CHIKV cases were between 1000 -5000, green denotes weekly reported CHIKV cases between 5000 -7500 and dark green denotes weekly reported CHIKV cases over 7500. The GBS-to-ZIKV ratios are not plotted for the initial few weeks as the scale of the ZIKV data is not large enough to compute a meaningful ratio. and do not develop to the convalescent stage, which is biologically and clinically reasonable [8,9]. We 178 therefore arrive at the following ordinary differential equation (ODE) system (1).
Here, S h is the susceptible host class, E h is the exposed host class (i.e., within ZIKV infection latent period. S v is the susceptible vector class, E v is the exposed vector (i.e., within ZIKV infection latent 184 period) and I v denotes the infectious vector class. The parameter ρ denotes the ratio of reported (excess)

185
GBS cases per symptomatic case of ZIKV. The model (1) parameters are summarised in Table 1.

186
In addition, where N h and N v represent the total number of hosts and vectors respectively, of which N v is time-  As in our previous work, it is assumed that the total mosquito population is given by: where m(t) is the (time-dependent) ratio of mosquitoes population (N v (t)) to humans population (N h ). .
From Eqn (3), it can be seen that R 0 depends on the mosquito-borne transmission path (in term of exposed and asymptomatic compartments, Markov process (POMP, also know as hidden Markov model) with a "spillover" rate (ρ) from local 210 symptomatic ZIKV cases. Here ρ is the combined effect of the GBS reporting ratio and the risk rate of 211 "symptomatic ZIKV inducing GBS i.e., the ratio ρ = reported GBS symptomatic ZIKV (see Table 1).

212
The simulated (weekly) number of excess GBS cases (Z GBS ) from model (1) Here, τ denotes an over-dispersion parameter that needs to be estimated. Finally, the overall log-216 likelihood function, , is given by The vector Θ denotes the parameter vector under estimation. The L i (·) is the likelihood function 218 associated with the i-th NB prior defined in Eqn (4). The term T denotes the total number of weeks 219 during the study period.

220
Our methodology reconstructs the mosquito abundance m = m(t) which is otherwise unknown 221 but variable and time-dependent over the study period. Following Eqn (3), the basic reproduction 222 number is a function of m(t), and thus we also allow R 0 to be time-dependent (i.e., R 0 = R 0 (t)). The 223 time-dependent m(t) is climate-driven and modelled as an exponential function of the daily average 224 temperature and rainfall time series, together with a two-piece step function for the baseline component.

225
It is modelled as follows The term τ 0 is the time delay between the occurrence of weather factors and their effects on the from the weather factors are taken to be one month in total i.e., τ 0 = 3 × 7 + (8 + 10)/2 = 30 days.

233
In Eqn (6), p 1 and p 2 are the scale parameters controlling the effects of local temperature and rainfall respectively. The two terms p 3 and p 4 , are time-driven baseline effects characterizing trends in 235 m that switch on depending on the time period τ 1 . We could view τ 1 as the timing of baseline change in 236 the mosquito population, which could be due to the interference between ZIKV and CHIKV for instance 237 and/or local mosquito control measures. The function 1(·) is an indicator function, which equals 1, if 238 the condition in the brackets is true; but is 0 otherwise.

239
Based on fitting and comparisons, the scale of p 2 was found to be negligible in magnitude, indicating 240 that the effects of the local rainfall is (relatively) negligible, compared to temperature. Thus, in most 241 parts of the analysis that follows, we neglect the rainfall term in Eqn (6)  In this respect, their parametrisation seems problematic, and they probably considered the average 245 lifespan of the mosquito, rather than the female mosquito.

246
According to Eqn (3), R 0 is a function of m(t), and thus R 0 is also time-dependent. Hence, R 0 247 can also be determined by the parameters in Eqn (6) Besides the climate-driven model, we also test a non-mechanistic model where the mosquito population  (Fig 2). But this is a naive calculation and we will seek ways to improve this.  The minimum occurs at ρ = 0.000061, which is our best estimate for ρ.
the flatness of the curve in this regime.

315
In addition to the mechanistic reconstruction of R 0 (t) in the main results here, we also present 316 a non-mechanistic reconstruction in Supplementary Information S3. The non-mechanistic approach 317 is implemented by using a cubic spline function to reconstruct the R 0 (t). The model also fits the 318 disease surveillance data well. The BIC of the non-mechanistic model is 7 units larger than the above 319 climate-driven model in Fig 4. We find that the non-mechanistic reconstruction of R 0 matches the daily 320 temperature reasonably well. This suggests the weather-driven R 0 (t) in our main results here is neither 321 coincidental nor artificial. The estimates of the GBS/ZIKV ratio ρ and the IAR are summarised in Table 2. For the parameter 324 ZIKV symptomatic ratio, θ, we follow the previous serological study conducted in French Polynesia that 325 found asymptomatic : symptomatic case ratios is 1 : 1 in the general population [19]. Thus, we treat 326 the scenarios with θ = 0.5 in our main results. Setting a constant θ = 0.5, the estimation of ρ is roughly 327 0.000063 (= 0.0063%). This appears to hold even if η, the relative infectivity of the asymptomatics, is 328 changed over the interval (0, 1). Estimates of ρ thus appear to be reasonably insensitive to the change of 329 relative infectivity of the asymptomatics (η). However, ρ is sensitive to the change of the symptomatic 330 proportion of ZIKV infections (θ). Setting θ = 0.2 gives ρ = 0.00013, but as Table 2 reveals, this result   331 is also relatively insensitive to changes in η.

332
To calculate the number of ZIKV cases and IAR, we use our estimated ρ = 0.00061 (ratio of 333 reported GBS to symptomatic ZIKV), and we denote our ZIKV symptomatic ratio by θ. The ρ can be 334 estimated from the model. Then, the number of ZIKV cases equals (the number of reported GBS) ÷

335
[reported GBS/symptomatic ZIKV] ÷ (ZIKV symptomatic ratio), which is the number of the reported 336 GBS/ρ/θ. Therefore, the IAR equals the number ZIKV cases ÷ the total population in the NE Brazil.

337
For all pairs of θ and η in Table 2, the estimated IARs are similar with IAR ≈ from 22% to 28% 338 and the 95% CIs largely overlap. Thus, for θ = 0.5, we can be at least 95% sure the IAR of the ZIKV 339 epidemic is below 33%, and is likely to be well below. The estimates of the initial susceptible levels (S h (0)) and the parameters (p 1 , p 3 and p 4 ) that 341 control the temporal pattern of R 0 (t) are summarised in be very close to zero, and its contribution to the whole R 0 is probably far less than the mosquito-borne are not statistically different. According to the 95% CIs of S h (0), it is likely that over a quarter (i.e.,

357
> 25%) of the whole population were not involved in the 2015-16 ZIKV epidemic. 358 We estimate that the time points (τ 1 ) when the baseline of m(t) (or R 0 (t)) changes from p 3 to 359 p 4 in Eqn (6). It was found that τ 1 is most likely to be March 7 of 2016. For the parameters p 3 360    Table 1.
As is conventional, the Partial Rank Correlation Coefficients (PRCC) are adopted to perform a proportion (θ) and asymptomatic infectivity (η).

384
From a recent metadata study [20] and a serological study [19], the ratio ρ of GBS to symptomatic 385 ZIKV cases was found to be 0.00024 and 0.00032 respectively (see Introduction of this study). Similarly, 386 based on the data from eleven countries, Mier-y-Teran-Romero et al.
[15] found the overall estimate for 387 both conditions is similar, it is clear that GBS is a much rarer disease than microcephaly. Nevertheless, we still chose to predict ZIKV cases based on GBS rather than microcephaly cases, because of problems 414 in the correct reporting of microcephaly over the study period. For example, the criteria for identifying 415 microcephaly changed dramatically at different times over the two year period and in different areas, 416 making the reporting coverage highly unstable. Moreover, previous to this period, reporting was not 417 compulsory nor was there consistently defined criteria for identifying the condition.

418
Return now to the dynamics of the reconstructed ZIKV cases generated by Eqns (1) as calibrated 419 on the GBS data (Fig 4). The reproductive number, R 0 (t), which quantifies the transmission rate, was 420 reconstructed by modelling the local meteorological data with Eqn (6). The estimated R 0 (t) was found 421 to oscillate due to seasonality between the values 1.1 < R 0 < 3.3, and on average was found R 0 = 2.2. Previous estimates of IAR relied on poor ZIKV data in Brazilian regions: some estimates appear to 456 be less than 20%, and others yield more than 50% (see Supplementary Information S4). As mentioned, 457 all these estimates must be treated with caution. This study is the first to use the more reliable GBS 458 data as a proxy to estimate the IAR of ZIKV epidemics. We found that the IAR is likely to be below 459 33%.

460
In conclusion, we comment on the likelihood of a future major ZIKV outbreak in NE Brazil. Let 461 us start from a "naive assumption" that the whole population (100%) in NE Brazil was susceptible to 462 ZIKV at the beginning of 2015, even though it was probably less than 100%. Our results tell us that 463 the estimated IAR is most likely below 33%. This indicates that after the 2015-2016 ZIKV outbreaks, 464 probably more than (100 − 33% =) 67% of the population were susceptible and immune-naive. That 465 is, S h > 67%, after the last ZIKV outbreak that ended in 2016.

466
Recall that the effective reproduction number, R eff = S h R 0 < 1, must be less than unity to 467 ensure the epidemic will not emerge. Given the susceptibility at the end of the outbreak was more 468 than 67%, then we need R 0 < 1/S h = 1/67% ≈ 1.5 to ensure R eff < 1 under the naive assumption, 469 and no outbreak will emerge. An R 0 larger than approximately 1.5 will lead to a ZIKV outbreak.

470
On the other hand, according to our estimation (Table 3), the initial susceptibility, S h (0), at the start 471 of 2015 was likely to be below 75%. As Table 3 shows that, typically, S h (0) = 0.57% with 95% CI: