Estimating incidence of infection from diverse data sources: Zika virus in Puerto Rico, 2016

  • Talia M. Quandelacy, 
  • Jessica M. Healy, 
  • Bradford Greening, 
  • Dania M. Rodriguez, 
  • Koo-Whang Chung, 
  • Matthew J. Kuehnert, 
  • Brad J. Biggerstaff, 
  • Emilio Dirlikov, 
  • Luis Mier-y-Teran-Romero, 
  • Tyler M. Sharp


Emerging epidemics are challenging to track. Only a subset of cases is recognized and reported, as seen with the Zika virus (ZIKV) epidemic, in which large proportions of infections were asymptomatic. However, multiple imperfect indicators of infection provide an opportunity to estimate the underlying incidence of infection. We developed a modeling approach that integrates a generic Time-series Susceptible-Infected-Recovered epidemic model with assumptions about reporting biases in a Bayesian framework and applied it to the 2016 Zika epidemic in Puerto Rico using three indicators: suspected arboviral cases, suspected Zika-associated Guillain-Barré Syndrome cases, and blood bank data. Using this combination of surveillance data, we estimated that the peak of the epidemic occurred during the week of August 15, 2016 (the 33rd week of the year), and that 120 to 140 (50% credible interval [CrI]; 95% CrI: 97 to 170) weekly infections per 10,000 population occurred at the peak. By the end of 2016, we estimated that approximately 890,000 (95% CrI: 660,000 to 1,100,000) individuals had been infected (26%, 95% CrI: 19% to 33%, of the population). Utilizing multiple indicators offers the opportunity for real-time and retrospective situational awareness to support epidemic preparedness and response.

Author summary

Zika virus (ZIKV) infections, like many infections, are generally underreported because of asymptomatic, mild, or unrecognized cases. Using available surveillance indicators that are imperfect proxies of infection, we developed a modeling approach to estimate the weekly incidence of infection by combining independent surveillance indicators and assumptions about system-specific reporting biases in a Bayesian framework. Using our approach, we estimated that approximately 890,000 people were infected with Zika in Puerto Rico in 2016, far more than the 36,316 reported confirmed infections. Our framework has broad application to other diseases whose cases may be underreported through traditional disease surveillance and can track changes in incidence in near real time.


The emergence and rapid spread of Zika virus (ZIKV), an arbovirus transmitted by Aedes species mosquitoes, in the Americas [1] resulted in large-scale epidemics throughout the tropical areas of the region. The first confirmed locally acquired ZIKV case in Puerto Rico was reported on December 31, 2015 [2], followed by more than 36,000 confirmed cases in 2016 [3]. While confirmed cases provided an indicator of transmission intensity, reported cases represented a small proportion of actual infections [4], in part because many ZIKV infections are asymptomatic or mild and are not captured by surveillance systems [5–7]. Furthermore, distinguishing symptomatic (i.e., disease) cases of ZIKV infection from other arboviral infections (e.g., dengue, chikungunya) was difficult because of their similar symptoms (e.g., fever, rash) and serological cross-reactivity with dengue viruses (DENV). Despite these challenges, estimating the underlying ZIKV infection incidence was critical for assessing useful metrics (e.g., transmission intensity, the number of people previously infected, and the number still at risk) that informed prevention and response measures and preparation for severe outcomes such as Guillain-Barré Syndrome (GBS) [8] and congenital Zika syndrome [9].

ZIKV serosurveys, like those conducted in Yap and French Polynesia [5,10], provided estimates of cumulative incidence, but are logistically difficult, require substantial time and resources, and present diagnostic challenges due to varying duration of infection markers (RNA and different types of antibodies) and cross-reactivity [11]. Therefore, it is important to find alternative methods to estimate incidence of infection, including statistical techniques that can be easily applied to surveillance data.

Although many infections are undetected during outbreaks, data exist for the set of infections captured through surveillance systems. Bayesian statistical methods explicitly consider both variability in observations (data) and uncertainty in model parameters (e.g., probability of observation), and are well-suited to address challenges like estimating quantities that are not directly observed. During the emergence of ZIKV in Martinique, Andronico et al. [12] developed a Bayesian model to explicitly incorporate a classic epidemiological compartmental model with surveillance data from Martinique using prior information on ZIKV transmission, reporting rates, and GBS risk from French Polynesia. We employed a similar approach in Puerto Rico incorporating multiple surveillance indicators and prior information on the probability of observing infections.

We considered surveillance data on suspected arbovirus cases, suspected Zika-associated GBS cases, and infections identified through a subset of blood banks as indicators of infection. Suspected arbovirus cases identified through passive surveillance reflect symptomatic care-seeking individuals with symptoms indicative of ZIKV, dengue virus, or chikungunya virus infection. Suspected Zika-associated GBS cases represented a more severe and easily recognized manifestation, though GBS can also result from other causes. Blood donor data provided information on asymptomatic and pre-symptomatic infections identified through blood screening. To capture underlying infection dynamics, we used a generic Time-series Susceptible-Infected-Recovered epidemic model within a Bayesian framework to relate infections to data by utilizing evidence-based assumptions on detection probabilities for each indicator. In this framework, we estimated the weekly incidence of ZIKV infection and the cumulative number of infections in Puerto Rico in 2016.


Estimated infections based on individual surveillance indicators

In 2016, 65,820 suspected arboviral disease cases, 175 suspected GBS cases, and 360 ZIKV-positive blood donors (out of 54,588 tested), were reported in Puerto Rico (Fig 1A–1C). We used these data to estimate the weekly and cumulative ZIKV infection incidence in 2016 for each surveillance indicator independently (Fig 1D). Estimated suspected arbovirus cases, suspected GBS cases and ZIKV-positive blood donors from the individual indicator models reflected a reasonable fit to these reported indicator data (Fig A in S1 Text). Weekly ZIKV infections estimated from suspected arbovirus cases and suspected GBS cases had similar trends over time, peaking in August 2016 and declining thereafter. Using the blood bank data, estimated ZIKV incidence peaked in June followed by high incidence through August and declined afterwards. For cumulative incidence estimates based on each of the three indicators, the median estimate was lowest using suspected GBS cases (880,000 infections) and highest using blood bank testing data (960,000 infections), with substantial overlap of credible intervals (Table 1). Estimates based on suspected arbovirus cases had the lowest uncertainty (95% Credible Interval (95% CrI): 630,000 to 1,200,000) and estimates based on suspected GBS cases had the highest (95% CrI: 420,000 to 1,300,000). The estimated proportion of the population infected during the outbreak was similar across the three indicators, with 27% (95% CrI: 19% to 35%) infected using suspected arbovirus case indicator, 26% (95% CrI: 12% to 38%) using the suspected GBS indicator, and 28% (95% CrI: 19% to 37%) using the blood bank indicator.

Fig 1. Suspected arbovirus cases, suspected ZIKV-associated Guillain-Barré Syndrome (GBS) cases, ZIKV-positive blood donors, and estimated weekly Zika infections during the 2016 outbreak in Puerto Rico.

A) Number of suspected arbovirus cases reported (green). B) Number of suspected ZIKV-associated GBS cases reported. C) Number of ZIKV-positive blood donors identified from blood donor screening. D) Estimated weekly infections using each indicator model separately. Colors refer to each specific indicator used. E) Estimated weekly infections from a model using three combined surveillance indicators. Dark bounds refer to the 50% range (interquartile range) and lighter bounds refer to the 95% credible interval (CrI).

Table 1. Estimated Zika virus infections and 95% credible intervals (CrI) using three surveillance indicators.

Estimated infections with combined surveillance indicators

Using a combination of the three surveillance indicators, infections peaked between August and September 2016 (Fig 1E), reflecting the combined peaks in incidence from the three indicators. We estimated that the total number of incident ZIKV infections was most likely between 810,000 and 970,000 (50% CrI; 95% CrI: 660,000 to 1,100,000) (Table 1), corresponding to 24% to 29% (50% CrI; 95% CrI: 19% to 33%) of the total population. These estimates agree well with an a priori triangular probability distribution used to anticipate resource needs during the epidemic [13,14] (Fig 2). The combined estimates had less uncertainty than both that triangular distribution and each independent estimate based on an individual surveillance indicator.

Fig 2. Estimated probability distribution for the proportion of incident ZIKV infections in Puerto Rico in 2016, and probabilities obtained from published literature.

The triangle represents the estimated distribution of possible incident infections from a priori estimates [13,14] based on previous Zika serosurveys (vertical lines), Zika outbreaks, and other arboviral outbreaks in Puerto Rico reported in the published literature [5,15–25]. Thick lines represent the distribution of the proportion infected estimated from the combined surveillance indicators (dark red) and from each separate surveillance indicator.

Prior and posterior parameter distributions

For each of the estimates reported above, we used a model with an informed set of prior parameter distributions. However, we also compared these estimates to those with less informed priors (increased variance) and naïve priors (Fig 3A). Increased uncertainty in the priors resulted in similar median estimates for infections throughout 2016, but increased uncertainty especially for the lower bound of the credible intervals (Table 2).

Fig 3. Prior and posterior parameter distributions from individual indicator models, the combined indicator model, and the combined model over time.

A) Prior distributions of six model parameters. Colored lines indicate the prior variance assumptions assessed in sensitivity analyses. Final individual and combined indicator models used informative priors. B) Posterior distributions of model parameters from individual indicator models. The dashed lines for the Beta parameters show the posterior parameter distributions from the three individual indicator models (suspected arbovirus cases, suspected GBS, and blood bank). Separate plots of the beta parameters for each indicator model are available in Fig B in S1 Text. C) Posterior distributions of model parameters from the combined model. D) Posterior distributions over time (i.e., in four-week increments from the end of January 2016 to the end of December 2016) from the combined model using informative priors. Dashed lines show the informative priors for each model parameter. Line shading indicates the 4-week increment from which each posterior distribution was estimated, with the darkest lines corresponding to the increments furthest into the time series.

Table 2. Estimated cumulative Zika virus (ZIKV) infections from models using informative, naïve, and increased variances for prior distributions, Puerto Rico, 2016.

Under all three prior distributions, the baseline transmission parameter (β0) converged on a similar value regardless of the surveillance data used. As expected, the less informative priors led to higher uncertainty in the posterior distributions for the outcome parameters, particularly for the individual models, in which additional data are not available to inform the estimates (Fig 3B). The most notable effect was seen for the suspected GBS surveillance model. In this case, the posterior baseline GBS risk (pG0) was slightly higher and the ZIKV-specific GBS risk (pG|Z) was highly uncertain, indicating a lack of sufficient information to distinguish between the two components of GBS risk. In contrast, the combined model was able to resolve all parameters regardless of the assumed prior variance, although parameters estimated with naïve priors still had more uncertainty in their posterior distributions (Fig 3C).

Posterior estimates over time

Over the progression of the outbreak, the parameter posteriors evolved over time as more data became available for each surveillance indicator (Fig 3D). For the first 4 weeks of 2016, the posterior estimates for individual parameters largely reflected the priors. However, by 8 weeks the posteriors started to shift, narrow, and stabilize.

We observed similar trends when assessing how the incidence estimates of the individual and combined models changed over 4-week increments (Fig C in S1 Text). When incorporating new data for each surveillance indicator and the combined model over the course of the 2016 epidemic, the incidence estimates had the largest uncertainty in the earliest weeks of the outbreak (Fig C in S1 Text), though the uncertainty was larger for individual models. For each 4-week estimation, the end-of-year estimate based on the full dataset fell within the uncertainty bounds.

Evaluating the time-varying transmission parameter (βt) for possible seasonality showed that βt followed a roughly constant pattern with some week-to-week variation (Fig D in S1 Text). This pattern is consistent with that expected of an emerging pathogen rather than with a seasonal trend, for which stronger seasonal oscillations would be expected.


Estimating the true burden of ZIKV infection, as for other pathogens, is challenging because many infections are inapparent; apparent infections are not always recognized, confirmed, or reported; and disease surveillance systems for collecting case data vary widely. Here, we developed an epidemic model, applied within a Bayesian modeling framework, to estimate ZIKV incidence in Puerto Rico using three separate infection indicators available from multiple surveillance systems and assumptions about detection probabilities for each system. Our approach uses different data sources to increase the precision of infection estimates over time and may further reduce bias by accounting for inherent surveillance biases through the probability of detection. Using this framework, we estimated that ZIKV infections occurred in roughly a quarter of the population, resulting in 890,000 total incident ZIKV infections in Puerto Rico in 2016, translating to an average of 36 to 44 new infections per 10,000 people per week. The peak of the epidemic occurred during the week of August 15, 2016 (i.e., week 33), when an estimated 120–140 weekly incident infections occurred per 10,000 people, corresponding to the observed mid-August peak of the outbreak, when 2,542 ZIKV cases were confirmed [3].

We estimated that 19–33% of the population had symptomatic or asymptomatic ZIKV infections in Puerto Rico in 2016, a much higher proportion than indicated by the 36,316 reported confirmed cases (approximately 1% of the population) [3]. This estimate was similar to estimates for other arboviral outbreaks in Puerto Rico and to ZIKV estimates generated using other approaches. Household-based cluster investigations from September to October of 2016 in Puerto Rico found 114 of 367 (31%) participants with ZIKV infection [4], and a 2016–2017 cohort study estimated a 34% (30–39%) prevalence among 366 household contacts [26], while another Bayesian approach estimated an infection attack rate of 0.31 (95% CrI: 0.28–0.35) for all of Puerto Rico [27]. Community-level dengue seroprevalence studies conducted after emerging outbreaks indicated infection rates of approximately 47% (range: 8–79%, 1969) [15,28] and 30% (range: 22–45%, 1982) [29,16]. The 2014–2015 chikungunya epidemic resulted in 23.5% seropositivity among 1,031 blood donors in 2015 [17], and in communities participating in a chikungunya vector control study, seroprevalence was 23% (intervention) and 45% (non-intervention) [28]. Among blood bank models, our estimate of 28% corresponds to another estimate of 21% (95% confidence interval 18–24%) using data from April to December 2016 [29], while another model, by Chevalier et al., using data from April to August 2016 found a lower estimate of infection, 12.9% (95% CI 11.0%–15.4%), during that 5-month period [30].

In contrast to our analysis, studies from other islands found substantially higher seroprevalence estimates, including 73% in Yap in 2007 [5], 49% in French Polynesia in 2013–2014 [10], and 42%–50% in Martinique in 2015 [7,31]. The differences in the estimated underlying infection burdens may be due in part to heterogeneity in exposure to infection even within an island population, as seen in early dengue serosurveys in Puerto Rico and in municipality-level estimates of ZIKV infection rates [27]. In this analysis, we did not attempt to estimate municipality-level infection rates because of sparse suspected GBS case data and the limited spatial representativeness of blood bank data. Invasion patterns by nature drive some spatial heterogeneity, which can influence longer-term transmission dynamics, since areas with high immunity may limit transmission to areas with low immunity. Other factors that may contribute to differing levels of immunity include human mobility, underlying socioeconomics, and cross-immunity from other arboviruses, and all warrant further examination. The estimated 2016 Puerto Rico infection rate is well below most model estimates of how large a Zika epidemic would be if the population were perfectly mixed [32].

The framework developed here offers advantages beyond the estimation of epidemic size. In practice, we used the model to actively inform situational awareness beginning in August 2016. From the analysis of performance over time, the model could have provided useful information even earlier in the epidemic, despite limited availability of information about the outcome probabilities. Informative priors were available early in the year for baseline GBS risk and for the blood bank reporting factor. While specification of informative prior distributions for the reporting of ZIKV cases and suspected GBS cases would have been more challenging, they would not have been completely naïve. Critically, combining indicators lessened the need for strong prior information for any single indicator. Assuming less precise priors had some effect on outcome precision for all models but had the least effect when indicators were combined, except in the case of completely naïve priors. Similar observations were reported by Andronico et al. [12], where examination of different prior assumptions did not strongly affect parameter estimates.

Our approach had some limitations. One of our key indicators, suspected arboviral cases, uses a broad case definition to capture all potential symptomatic cases and likely includes cases caused by other circulating arboviruses, such as dengue and chikungunya viruses. Confirmed rather than suspected cases could have been used as an indicator; however, confirmation also depends on testing and test timing and introduces an additional delay. The results also suggest this would have made little difference: the model based only on suspected arboviral cases gave estimates in the same range as those for the other indicators, and 92% of suspected arboviral cases were confirmed as ZIKV infections, as dengue and chikungunya were rare in Puerto Rico in 2016 [3]. We expect that, had there been more dengue and chikungunya cases, our framework would improve differentiation of ZIKV cases from other arboviral cases. In general, our prior assumptions were based on data available at the time. If these early data contained hidden biases, the resulting estimates could also be biased. Our results suggest that being conservative with respect to prior precision does not sacrifice substantial precision in the results, in part because multiple indicators were used. Our framework did not allow for substantial variation in transmission beyond some week-to-week variability and the reduction in incidence due to acquired immunity in those already infected. In practice, transmission likely changes seasonally and can be influenced by vector control measures. In this case, with the introduction of a completely novel virus and the lack of large-scale effective vector control measures, we expect there would have been little change in transmissibility throughout the year, and examination of βt over time did not show evidence of seasonal variation. Nonetheless, some seasonal patterns in transmission are likely. Those would likely be stronger in other epidemiological settings and could be incorporated into future versions of the framework. In addition, including biologically relevant weighting of the transmission parameter relative to each surveillance indicator could provide additional insight into transmission dynamics over time, including direct estimates of the reproductive number, as well as differences between the surveillance indicators [33,34].

Reported cases generally underestimate the true number of incident infections occurring in an epidemic, since they capture only recognized and reported clinical infections. However, multiple imperfect indicators provide the opportunity to estimate the underlying incidence of infection. Here we used three complementary indicators: (1) a broad, non-specific indicator with relatively high counts (suspected arboviral disease cases); (2) an indicator of a rare severe outcome that is likely to have high reporting fidelity (GBS); and (3) an age- and geography-biased sample of infection prevalence (blood donors). Each indicator offers a unique but biased insight into the progression of the epidemic, capturing different case subgroups, including different age groups, and, when combined, the indicators can provide critical situational awareness about the progression of the epidemic.

The approach described here can estimate how many people have been infected in near real time or identify changes in the trajectory of incidence across various indicators. It is also useful for post-hoc analysis to understand what the impact may have been at the population level and whether more transmission may be expected. With 19–33% of the population infected in 2016, ZIKV transmission should be much more limited but still possible in Puerto Rico, particularly in areas that may have experienced lower infection rates. These insights are critical both for preparedness for and response to future epidemics, and this modeling approach is applicable to future Zika epidemics as well as to epidemics of other pathogens. This approach could be applied to future outbreaks of dengue and other arboviruses in Puerto Rico, using suspected arboviral cases, GBS, blood bank, and potentially other data as indicators. Likewise, other types of surveillance data, such as reported influenza-like-illness (ILI)-associated hospitalizations, outpatient ILI visits, and reported laboratory-confirmed specimens, could be used to enhance influenza surveillance. Each has limitations as an individual indicator of the incidence of influenza infection, but when combined they may best approximate the incidence of influenza infection. Approaches like the one we present here provide a tool to incorporate these diverse data and their uncertainties to generate timely estimates of incident infection and inform response and control efforts.


Ethics statement

Exemption was obtained from the CDC Human Subjects Research Office as the data were collected as part of regular surveillance activities.


We collected data on suspected arboviral disease cases, suspected GBS cases, and infections detected among blood donors reported in Puerto Rico during January 1, 2016–December 31, 2016. A suspected arboviral disease case was defined as any patient with clinically suspected illness resulting from an arbovirus infection and reported through the Passive Arboviral Diseases Surveillance System (PADSS) in Puerto Rico [3,35]. Suspected GBS cases were patients experiencing onset of neurological symptoms characteristic of GBS (e.g., bilateral flaccid limb weakness [13]) reported through the GBS Passive Surveillance System, a surveillance system that captures GBS cases along with patients experiencing other neurological symptoms (e.g., encephalitis [13]) and is operated by the Puerto Rico Department of Public Health (PRDH) with support from the Centers for Disease Control and Prevention (CDC) [36]. Not all suspected GBS cases were confirmed to be ZIKV infections or GBS cases; rather, they represented real-time reports of possible GBS cases. We used blood bank data as an indicator of asymptomatic and pre-symptomatic infections beginning in April 2016, when all blood donations were tested for ZIKV RNA [30]. Two blood collection agencies provided the total number of donations and the number testing positive for ZIKV [30]; the donor population was not representative, being predominantly male and excluding anyone under age 16. We assumed the population of Puerto Rico was approximately 3.4 million people, based on 2016 census estimates [37]. See S1 Data for the weekly suspected arboviral disease cases, suspected GBS cases, and blood donor data.

Epidemic model

We developed a generalized Bayesian discrete Time-series Susceptible-Infected-Removed model [33,34] and fit it to surveillance data over the 52 weeks of 2016 to estimate weekly ZIKV infection incidence in Puerto Rico. In the underlying SIR epidemic model, the proportion of the population infected each week, zt, was defined as the product of the proportion infected in the previous week (zt−1), the proportion of the population that was susceptible in the previous week (st−1), and a time-varying transmission rate, βt:

$$z_t = \beta_t \, z_{t-1} \, s_{t-1}, \qquad s_t = s_{t-1} - z_t, \qquad \beta_t \sim \mathrm{Normal}_{(0,\infty)}(\beta_0, \sigma)$$

where we use the notation Distribution(a,b)(θ) to indicate that we use the stated Distribution with relevant parameter(s) θ but restricted to support (a,b), and so scaled to yield a valid distribution. The time-varying transmission rate (βt) was assumed to have a constant mean, reflecting no substantial control, with random variability between weeks, and was constrained to be greater than or equal to zero by truncating its normal distribution at zero. We assumed a prior for the baseline transmission rate, β0 ~ Normal(0,∞)(2, 1), reflecting an expected weekly reproductive number on the order of 1 to 5 [32]. The prior for the standard deviation had a similar scale (σ ~ Normal(0,∞)(0, 1)). Although the average time between successive generations of arboviral infections (i.e., the generation time) is typically several weeks [33,38], we implemented this model with a weekly time step, intended as a generic representation of a weekly transmission process in which βt cannot be directly interpreted as R0, the basic reproductive number. We did not explicitly aim to model the initial phases of the epidemic and therefore only estimated an initial proportion of the population infected in the first week of 2016, using a restricted normal prior to indicate a small prevalence of infection that week (z0 ~ Normal(0,1)(0, 0.001)). All other individuals were assumed susceptible to infection, as evidence suggests only very limited transmission before 2016 [2].
Our model assumed a closed population, meaning that susceptible and infection population estimates depended only on population-level risk, and that there were no births, deaths, or migration.
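To make the weekly recursion concrete, the following is a minimal forward-simulation sketch in Python of the TSIR process described above, assuming z_t = β_t z_{t−1} s_{t−1}, s_t = s_{t−1} − z_t, and β_t drawn from a normal distribution with mean β0 and standard deviation σ truncated at zero. The function name `simulate_tsir`, the random seed, and the cap that keeps z_t from exceeding the remaining susceptible fraction are illustrative assumptions; they are not part of the Stan model in S1 Code, which fits these quantities to data rather than simulating them.

```python
import random


def simulate_tsir(beta0, sigma, z0, weeks=52, seed=1):
    """Hypothetical helper: forward-simulate weekly infection proportions.

    Returns two lists of length `weeks`: z (proportion newly infected each
    week) and s (proportion still susceptible at the end of each week).
    """
    rng = random.Random(seed)
    z = [z0]            # initial proportion infected in the first week
    s = [1.0 - z0]      # everyone else is assumed susceptible
    for _ in range(1, weeks):
        # beta_t ~ Normal(beta0, sigma) truncated to [0, inf), by rejection
        beta_t = -1.0
        while beta_t < 0.0:
            beta_t = rng.gauss(beta0, sigma)
        # TSIR step; the min() cap is an added safeguard for simulation only
        z_t = min(beta_t * z[-1] * s[-1], s[-1])
        z.append(z_t)
        s.append(s[-1] - z_t)   # closed population: newly infected leave S
    return z, s


z, s = simulate_tsir(beta0=2.0, sigma=1.0, z0=1e-4)
peak_week = max(range(len(z)), key=z.__getitem__) + 1
print(f"peak week {peak_week}, cumulative proportion infected {sum(z):.2f}")
```

The parameter values here are placeholders taken from the prior means, so the simulated trajectory is only an illustration of the mechanics and not a reproduction of the fitted estimates.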

Reporting models

For each surveillance indicator, we estimated the probability of observing an infection as a function of infection risk (zt) and the observation process for each data type within the epidemic model. The combined model estimated incident infections from the three individual indicators within one epidemic model.

Suspected arboviral cases

Given that laboratory testing identified very few dengue and chikungunya cases [3], we assumed most suspected arboviral cases were suspected Zika cases. We estimated the expected number of reported suspected arboviral cases (St) as the product of population size (N), ZIKV infection prevalence (zt), and the probability of reporting a clinical suspected Zika case per infection (pS|Z), and fit the case data using the negative binomial distribution formulated with a mean and a dispersion parameter:

$$S_t \sim \mathrm{NegBinomial}\left(N \, z_t \, p_{S|Z},\ \phi_S\right)$$

For pS|Z, we used a beta-distributed prior approximating a mean of 0.11 and a 95% credible interval [CrI] of 0.01–0.24 based on Mier-y-Teran et al. [39] (pS|Z ~ Beta(3.3, 27)) (Table 3). The dispersion, φS, was assigned a prior distribution with high expected overdispersion, φS ~ Normal(0,∞)(0, 1000).
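The negative binomial observation model for suspected arboviral cases can be sketched in Python as below. We assume the mean–dispersion form used by Stan's `neg_binomial_2` (mean μ, variance μ + μ²/φ), which we take to be what "formulated as a mean and dispersion" refers to; the helper names are hypothetical. The last lines also check that the Beta(3.3, 27) prior for pS|Z has a mean close to the stated 0.11.

```python
import math


def expected_suspected_cases(n_pop, z_t, p_s_given_z):
    """Expected reported suspected cases: N * z_t * p_{S|Z}."""
    return n_pop * z_t * p_s_given_z


def neg_binomial_2_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu and dispersion phi.

    The variance is mu + mu**2 / phi, so a larger phi means less overdispersion.
    """
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (mu + phi))
            + y * math.log(mu / (mu + phi)))


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Illustrative values only: N = 3.4 million, 1% infected in a given week
mu = expected_suspected_cases(3.4e6, 0.01, beta_mean(3.3, 27))
print(round(mu), neg_binomial_2_logpmf(3000, mu, phi=50.0))
```

Working on the log scale with `math.lgamma` avoids overflow for the large weekly counts seen in this dataset.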

Suspected ZIKV-associated GBS cases

We assumed the number of observed suspected GBS cases (GBSt) came from a binomial distribution:

$$GBS_t \sim \mathrm{Binomial}\left(N,\ p_{G0} + (p_{G|Z} - p_{G0})\, z_t\right)$$

where pG0 is the weekly risk of GBS due to other causes, pG|Z is the probability of suspected GBS given ZIKV infection, and pG0 + (pG|Z − pG0)·zt represents the probability of an individual in population N being reported as a suspected GBS case. We assumed that the global baseline GBS risk of 0.8–1.9 GBS cases per 100,000 per year [40] applied to Puerto Rico and approximated the weekly risk using a beta prior distribution: pG0 ~ Beta(23, 8.9×10⁷). The prior for the probability of suspected GBS given ZIKV infection, pG|Z, was based on an estimated range of 0.5–4.6 GBS cases per 10,000 ZIKV infections [39], which was approximated with a beta distribution (pG|Z ~ Beta(5.9, 2.3×10⁴)).
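As a quick consistency check on the GBS priors, the sketch below (Python, hypothetical helper names) computes the per-person weekly reporting probability, read as pG0 + (pG|Z − pG0)·zt, so that uninfected people carry the baseline risk and infected people carry pG|Z, and converts the beta prior means back to the units quoted in the text. Annualizing the weekly baseline risk with 52 weeks per year is our assumption.

```python
def gbs_report_prob(p_g0, p_g_given_z, z_t):
    """Probability a person is reported as a suspected GBS case in week t.

    Uninfected people (fraction 1 - z_t) carry the baseline risk p_g0;
    infected people (fraction z_t) carry p_g_given_z.
    """
    return p_g0 + (p_g_given_z - p_g0) * z_t


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


p_g0 = beta_mean(23, 8.9e7)             # weekly baseline risk (prior mean)
p_g_given_z = beta_mean(5.9, 2.3e4)     # risk per ZIKV infection (prior mean)

annual_per_100k = p_g0 * 52 * 1e5       # about 1.3, inside 0.8-1.9 per 100,000/yr
per_10k_infections = p_g_given_z * 1e4  # about 2.6, inside 0.5-4.6 per 10,000

# Expected weekly suspected GBS reports, N = 3.4 million, 1% infected that week
expected = 3.4e6 * gbs_report_prob(p_g0, p_g_given_z, 0.01)
print(f"{annual_per_100k:.2f} {per_10k_infections:.2f} {expected:.1f}")
```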

Blood bank indicator

We assumed the number of positive blood donors (BZ,t) was a binomial sample from all tested donors (Bt), with donor positivity equal to the weekly incidence divided by an adjustment factor (fBB):

$$B_{Z,t} \sim \mathrm{Binomial}\left(B_t,\ z_t / f_{BB}\right)$$

The adjustment factor accounts for the duration of test positivity and for the proportion of infected individuals excluded from donating blood because they were symptomatic at the time of donation [30]. Assuming weekly testing, we used the equations and distributions of Chevalier et al. [30] to approximate a prior distribution for this factor: we sampled from distributions of each component, pA, the proportion of asymptomatic infections [5,6,11]; V, the duration of viremia [41]; and D, the incubation period [41,42] (see S1 Text), to estimate a Gamma prior for fBB: fBB ~ Gamma(6.7, 7.6).
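The blood bank observation model can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the source: that donor positivity equals zt / fBB (our reading of fBB as the incidence in the general population relative to positivity in blood donors), and that Gamma(6.7, 7.6) uses the shape–rate parameterization, as Stan's `gamma` does. Python's `random.gammavariate` takes a scale, so the rate is inverted. Function names are hypothetical.

```python
import math
import random


def blood_bank_loglik(n_pos, n_tested, z_t, f_bb):
    """Binomial log-likelihood of n_pos positive donors out of n_tested.

    Assumes donor positivity = z_t / f_bb (an assumption of this sketch).
    """
    p = z_t / f_bb
    if p < 0.0 or p > 1.0:
        return -math.inf
    if p == 0.0:
        return 0.0 if n_pos == 0 else -math.inf
    if p == 1.0:
        return 0.0 if n_pos == n_tested else -math.inf
    log_choose = (math.lgamma(n_tested + 1) - math.lgamma(n_pos + 1)
                  - math.lgamma(n_tested - n_pos + 1))
    return log_choose + n_pos * math.log(p) + (n_tested - n_pos) * math.log1p(-p)


def sample_f_bb_prior(n, shape=6.7, rate=7.6, seed=1):
    """Draw from Gamma(shape, rate); gammavariate expects a scale = 1 / rate."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]


draws = sample_f_bb_prior(20_000)
# The analytic prior mean is shape / rate, about 0.88
print(f"Monte Carlo prior mean of f_BB: {sum(draws) / len(draws):.3f}")
```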

Analysis of priors

We examined the effect of different prior distribution assumptions on model posteriors for the probability of a suspected case being reported (pS|Z), the probability of GBS given ZIKV infection (pG|Z), and the relative incidence of ZIKV in the general population compared with positivity in blood donors (fBB). We assessed the models under three alternative types of prior variance: informative (as described above), informative with a doubled standard deviation, and naïve (uniform or flattened). Prior distributions for each parameter under each assumption are available in Table A in S1 Text.
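As a hedged illustration of what doubling the standard deviation of a beta prior can involve, the sketch below keeps the mean of the pS|Z prior, Beta(3.3, 27), fixed and doubles its standard deviation by moment matching. Moment matching is our assumption here; the exact alternative priors are those listed in Table A in S1 Text, and the helper names are hypothetical.

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    total = a + b
    mean = a / total
    var = a * b / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)


def beta_from_mean_sd(mean, sd):
    """Moment-matched Beta(a, b) with the given mean and standard deviation."""
    var = sd * sd
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("standard deviation too large for a beta distribution")
    k = mean * (1.0 - mean) / var - 1.0   # k equals a + b
    return mean * k, (1.0 - mean) * k


def widen_beta(a, b, factor=2.0):
    """Hypothetical helper: same mean, standard deviation multiplied by factor."""
    mean, sd = beta_mean_sd(a, b)
    return beta_from_mean_sd(mean, factor * sd)


a_wide, b_wide = widen_beta(3.3, 27)   # widened prior for p_{S|Z}
print(round(a_wide, 2), round(b_wide, 2))   # 0.74 6.08
```

Under moment matching, doubling the standard deviation divides a + b + 1 by four, so the approach only works while the original prior is concentrated enough for the result to stay positive, as it is for both beta priors in this section.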

Model fitting

We fitted epidemic models in a Bayesian framework using Markov chain Monte Carlo (MCMC) to estimate incident ZIKV infections and 95% credible intervals (95% CrI) from the three surveillance indicators individually and from all three indicators combined. For each indicator model, we ran three chains of 1,001,000 iterations each and discarded the initial 1,000 iterations as burn-in. We evaluated convergence using the Gelman-Rubin diagnostic [32] and thinned the output by keeping every 1,000th sample, yielding 1,000 effectively uncorrelated draws per chain. For the MCMC simulations, we used the rstan package (version 2.19.3) for Stan (version 2.18.0–1; Stan Development Team) and the coda package (version 0.19–2) [43] in R version 3.3.2. See S1 Code for the combined indicator model Stan code.
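The thinning arithmetic and the convergence check can be sketched in Python as follows. The `gelman_rubin` function is a simplified stand-in that computes the basic potential scale reduction factor from between- and within-chain variances; it omits the sampling-variability correction applied by coda's `gelman.diag` and the split-chain refinement reported by Stan, so its values will differ slightly from those packages. The synthetic chains are illustrative only.

```python
import math
import random


def retained_draws(iterations, burn_in, thin):
    """Number of post-burn-in draws kept per chain after thinning."""
    return (iterations - burn_in) // thin


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


print(retained_draws(1_001_000, 1_000, 1_000))   # 1000 draws per chain

rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
stuck = [[rng.gauss(3.0 * k, 1.0) for _ in range(1000)] for k in range(3)]
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

Values near 1 indicate that the chains agree; chains stuck in different regions, as in the second example, give values well above 1.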

Supporting information

S1 Text. Supporting Information for “Estimating incidence of infection from diverse data sources: Zika virus in Puerto Rico, 2016”.

This supplement contains additional parameter descriptions, tables, and figures.


S1 Code. Combined indicator model.

Stan code for running the model using the combined indicators.



The findings in this article are those of the authors and do not necessarily represent the official position of the U.S. Centers for Disease Control and Prevention or the U.S. Public Health Service.


1. Fauci AS, Morens DM. Zika Virus in the Americas—Yet Another Arbovirus Threat. N Engl J Med. 2016;374(7):601–4. pmid:26761185
2. Thomas DL, Sharp TM, Torres J, Armstrong PA, Munoz-Jordan J, Ryff KR, et al. Local Transmission of Zika Virus—Puerto Rico, November 23, 2015-January 28, 2016. MMWR Morb Mortal Wkly Rep. 2016;65(6):154–8. pmid:26890470
3. Sharp TM, Quandelacy TM, Adams LE, Aponte JT, Lozier MJ, Ryff K, et al. Epidemiologic and spatiotemporal trends of Zika Virus disease during the 2016 epidemic in Puerto Rico. PLoS Negl Trop Dis. 2020;14(9):e0008532. pmid:32956416
4. Lozier MJ, Burke RM, Lopez J, Acevedo V, Amador M, Read JS, et al. Differences in Prevalence of Symptomatic Zika Virus Infection, by Age and Sex-Puerto Rico, 2016. J Infect Dis. 2018;217(11):1678–89. pmid:29216376
5. Duffy MR, Chen TH, Hancock WT, Powers AM, Kool JL, Lanciotti RS, et al. Zika virus outbreak on Yap Island, Federated States of Micronesia. N Engl J Med. 2009;360(24):2536–43. pmid:19516034
6. Musso D, Nhan T, Robin E, Roche C, Bierlaire D, Zisou K, et al. Potential for Zika virus transmission through blood transfusion demonstrated during an outbreak in French Polynesia, November 2013 to February 2014. Euro Surveill. 2014;19(14). pmid:24739982
7. Gallian P, Cabié A, Richard P, Paturel L, Charrel RN, Pastorino B, et al. Zika virus in asymptomatic blood donors in Martinique. Blood. 2017;129(2):263–6. pmid:27827826
8. Krauer F, Riesen M, Reveiz L, Oladapo OT, Martínez-Vega R, Porgo TV, et al. Zika Virus Infection as a Cause of Congenital Brain Abnormalities and Guillain-Barré Syndrome: Systematic Review. PLoS Med. 2017;14(1):e1002203. pmid:28045901
9. Rasmussen SA, Jamieson DJ, Honein MA, Petersen LR. Zika Virus and Birth Defects—Reviewing the Evidence for Causality. N Engl J Med. 2016;374(20):1981–7. pmid:27074377
10. Aubry M, Teissier A, Huart M, Merceron S, Vanhomwegen J, Roche C, et al. Zika Virus Seroprevalence, French Polynesia, 2014–2015. Emerg Infect Dis. 2017;23(4):669–72. pmid:28084987
11. Mitchell PK, Mier YT-RL, Biggerstaff BJ, Delorey MJ, Aubry M, Cao-Lormeau VM, et al. Reassessing Serosurvey-Based Estimates of the Symptomatic Proportion of Zika Virus Infections. Am J Epidemiol. 2019;188(1):206–13. pmid:30165474
12. Andronico A, Dorléans F, Fergé JL, Salje H, Ghawché F, Signate A, et al. Real-Time Assessment of Health-Care Requirements During the Zika Virus Epidemic in Martinique. Am J Epidemiol. 2017;186(10):1194–203. pmid:28200111
13. Dirlikov E, Kniss K, Major C, Thomas D, Virgen CA, Mayshack M, et al. Guillain-Barré Syndrome and Healthcare Needs during Zika Virus Transmission, Puerto Rico, 2016. Emerg Infect Dis. 2017;23(1):134–6. pmid:27779466
14. Ellington SR, Devine O, Bertolli J, Martinez Quiñones A, Shapiro-Mendoza CK, Perez-Padilla J, et al. Estimating the Number of Pregnant Women Infected With Zika Virus and Expected Infants With Microcephaly Following the Zika Virus Outbreak in Puerto Rico, 2016. JAMA Pediatr. 2016;170(10):940–5. pmid:27544075
15. Likosky WH, Calisher CH, Michelson AL, Correa-Coronas R, Henderson BE, Feldman RA. An epidermiologic study of dengue type 2 in Puerto Rico, 1969. Am J Epidemiol. 1973;97(4):264–75. pmid:4697647
16. Waterman SH, Novak RJ, Sather GE, Bailey RE, Rios I, Gubler DJ. Dengue transmission in two Puerto Rican communities in 1982. Am J Trop Med Hyg. 1985;34(3):625–32. pmid:4003671
17. Simmons G, Brès V, Lu K, Liss NM, Brambilla DJ, Ryff KR, et al. High Incidence of Chikungunya Virus and Frequency of Viremic Blood Donations during Epidemic, Puerto Rico, USA, 2014. Emerg Infect Dis. 2016;22(7):1221–8. pmid:27070192
18. Macnamara FN. Zika virus: a report on three cases of human infection during an epidemic of jaundice in Nigeria. Trans R Soc Trop Med Hyg. 1954;48(2):139–45. pmid:13157159
19. Hammon WM, Schrack WD, Jr., Sather GE. Serological survey for a arthropod-borne virus infections in the Philippines. Am J Trop Med Hyg. 1958;7(3):323–8. pmid:13533740
20. Dick GW, Kitchen SF, Haddow AJ. Zika virus. I. Isolations and serological specificity. Trans R Soc Trop Med Hyg. 1952;46(5):509–20. pmid:12995440
21. Robin Y, Mouchet J. [Serological and entomological study on yellow fever in Sierra Leone]. Bull Soc Pathol Exot Filiales. 1975;68(3):249–58. pmid:1243735
22. Smithburn KC. Neutralizing antibodies against certain recently isolated viruses in the sera of human beings residing in East Africa. J Immunol. 1952;69(2):223–34. pmid:14946416
23. Smithburn KC. Neutralizing antibodies against arthropod-borne viruses in the sera of long-time residents of Malaya and Borneo. Am J Hyg. 1954;59(2):157–63. pmid:13138582
24. Pond WL. Arthropod-borne virus antibodies in sera from residents of South-East Asia. Trans R Soc Trop Med Hyg. 1963;57:364–71. pmid:14062273
25. Cauchemez S, Besnard M, Bompard P, Dub T, Guillemette-Artur P, Eyrolle-Guignot D, et al. Association between Zika virus and microcephaly in French Polynesia, 2013–15: a retrospective study. Lancet. 2016;387(10033):2125–32. pmid:26993883
26. Rosenberg ES, Doyle K, Munoz-Jordan JL, Klein L, Adams L, Lozier M, et al. Prevalence and Incidence of Zika Virus Infection Among Household Contacts of Patients With Zika Virus Disease, Puerto Rico, 2016–2017. J Infect Dis. 2019;220(6):932–9. pmid:30544195
27. Moore SM, Oidtman RJ, Soda KJ, Siraj AS, Reiner RC Jr., Barker CM, et al. Leveraging multiple data types to estimate the size of the Zika epidemic in the Americas. PLoS Negl Trop Dis. 2020;14(9):e0008640. pmid:32986701
28. Lorenzi OD, Major C, Acevedo V, Perez-Padilla J, Rivera A, Biggerstaff BJ, et al. Reduced Incidence of Chikungunya Virus Infection in Communities with Ongoing Aedes Aegypti Mosquito Trap Intervention Studies—Salinas and Guayama, Puerto Rico, November 2015-February 2016. MMWR Morb Mortal Wkly Rep. 2016;65(18):479–80. pmid:27171600
29. Williamson PC, Biggerstaff BJ, Simmons G, Stone M, Winkelman V, Latoni G, et al. Evolving viral and serological stages of Zika virus RNA-positive blood donors and estimation of incidence of infection during the 2016 Puerto Rican Zika epidemic: an observational cohort study. Lancet Infect Dis. 2020;20(12):1437–45. pmid:32673594
30. Chevalier MS, Biggerstaff BJ, Basavaraju SV, Ocfemia MCB, Alsina JO, Climent-Peris C, et al. Use of Blood Donor Screening Data to Estimate Zika Virus Incidence, Puerto Rico, April-August 2016. Emerg Infect Dis. 2017;23(5):790–5. pmid:28263141
31. Cousien A, Abel S, Monthieux A, Andronico A, Calmont I, Cervantes M, et al. Assessing Zika Virus Transmission Within Households During an Outbreak in Martinique, 2015–2016. Am J Epidemiol. 2019;188(7):1389–96. pmid:30995296
32. Keegan LT, Lessler J, Johansson MA. Quantifying Zika: Advancing the Epidemiology of Zika With Quantitative Models. J Infect Dis. 2017;216(suppl_10):S884–90. pmid:29267915
33. Perkins TA, Metcalf CJ, Grenfell BT, Tatem AJ. Estimating drivers of autochthonous transmission of chikungunya virus in its invasion of the americas. PLoS Curr. 2015;7. pmid:25737803
34. Siraj AS, Oidtman RJ, Huber JH, Kraemer MUG, Brady OJ, Johansson MA, et al. Temperature modulates dengue virus epidemic growth rates through its effects on reproduction numbers and generation intervals. PLoS Negl Trop Dis. 2017;11(7):e0005797. pmid:28723920
35. Sharp TM, Fischer M, Muñoz-Jordán JL, Paz-Bailey G, Staples JE, Gregory CJ, et al. Dengue and Zika Virus Diagnostic Testing for Patients with a Clinically Compatible Illness and Risk for Infection with Both Viruses. MMWR Recomm Rep. 2019;68(1):1–10. pmid:31194720
36. Dirlikov E, Ryff KR, Torres-Aponte J, Thomas DL, Perez-Padilla J, Munoz-Jordan J, et al. Update: Ongoing Zika Virus Transmission—Puerto Rico, November 1, 2015-April 14, 2016. MMWR Morb Mortal Wkly Rep. 2016;65(17):451–5. pmid:27149205
37. Bureau UC. Quick Facts—Puerto Rico 2016 [Available from:
38. Ferguson NM, Cucunubá ZM, Dorigatti I, Nedjati-Gilani GL, Donnelly CA, Basáñez MG, et al. EPIDEMIOLOGY. Countering the Zika epidemic in Latin America. Science. 2016;353(6297):353–4. pmid:27417493
39. Mier YT-RL, Delorey MJ, Sejvar JJ, Johansson MA. Guillain-Barré syndrome risk among individuals infected with Zika virus: a multi-country assessment. BMC Med. 2018;16(1):67. pmid:29759069
40. Sejvar JJ, Baughman AL, Wise M, Morgan OW. Population incidence of Guillain-Barré syndrome: a systematic review and meta-analysis. Neuroepidemiology. 2011;36(2):123–33. pmid:21422765
41. Lessler J, Chaisson LH, Kucirka LM, Bi Q, Grantz K, Salje H, et al. Assessing the global threat from Zika virus. Science. 2016;353(6300):aaf8160. pmid:27417495
42. Lessler J, Ott CT, Carcelen AC, Konikoff JM, Williamson J, Bi Q, et al. Times to key events in Zika virus infection and implications for blood donation: a systematic review. Bull World Health Organ. 2016;94(11):841–9. pmid:27821887
43. Plummer M, Best N, Cowles K, Vines K. CODA: convergence diagnosis and output analysis for MCMC. R News. 2006;6(1):7–11.