Leveraging multiple data types to estimate the size of the Zika epidemic in the Americas

Sean M. Moore; Rachel J. Oidtman; K. James Soda; Amir S. Siraj; Robert C. Reiner Jr.; Christopher M. Barker; T. Alex Perkins

doi:10.1371/journal.pntd.0008640

Abstract

Several hundred thousand Zika cases have been reported across the Americas since 2015. Incidence of infection was likely much higher, however, due to a high frequency of asymptomatic infection and other challenges that surveillance systems faced. Using a hierarchical Bayesian model with empirically-informed priors, we leveraged multiple types of Zika case data from 15 countries to estimate subnational reporting probabilities and infection attack rates (IARs). Zika IAR estimates ranged from 0.084 (95% CrI: 0.067–0.096) in Peru to 0.361 (95% CrI: 0.214–0.514) in Ecuador, with significant subnational variability in every country. Totaling infection estimates across these and 33 other countries and territories, our results suggest that 132.3 million (95% CrI: 111.3-170.2 million) people in the Americas had been infected by the end of 2018. These estimates represent the most extensive attempt to determine the size of the Zika epidemic in the Americas, offering a baseline for assessing the risk of future Zika epidemics in this region.

Author summary

During the recent Zika epidemic in the Americas, millions of people were likely infected, but the true size of the epidemic is unknown because of gaps in the surveillance system. The infection attack rate (IAR), defined as the proportion of the population that was infected over the course of the epidemic, has important implications for the longer-term epidemiology of Zika in the region, such as the timing, location, and likelihood of future outbreaks. To estimate the IAR and the total number of people infected, we leveraged multiple types of Zika case data from 15 countries and territories where subnational data were publicly available. Datasets included confirmed and suspected Zika cases in pregnant women and in the total population, Zika-associated Guillain-Barré syndrome cases, and cases of congenital Zika syndrome. We used a hierarchical Bayesian model with empirically informed priors that leveraged the different case report types to simultaneously estimate national and subnational reporting probabilities, the fraction of symptomatic infections, and subnational IARs. In these 15 countries and territories, estimates of Zika IAR ranged from 0.084 (95% CrI: 0.067–0.096) in Peru to 0.361 (95% CrI: 0.214–0.514) in Ecuador. Totaling infection estimates across these and 33 other countries and territories in the region, our results suggest that 132.3 million (95% CrI: 111.3-170.2 million) people in the Americas were infected with Zika virus (ZIKV) by the end of 2018.

Citation: Moore SM, Oidtman RJ, Soda KJ, Siraj AS, Reiner RC Jr, Barker CM, et al. (2020) Leveraging multiple data types to estimate the size of the Zika epidemic in the Americas. PLoS Negl Trop Dis 14(9): e0008640. https://doi.org/10.1371/journal.pntd.0008640

Editor: Elvina Viennet, Australian Red Cross Lifeblood, AUSTRALIA

Received: April 14, 2020; Accepted: July 25, 2020; Published: September 28, 2020

Copyright: © 2020 Moore et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All of the cumulative Zika case data used in our analysis are located in a public Github repository: https://github.com/mooresea/Zika_IAR.

Funding: Funding was provided by NIH supplement Grant R01 AI102939-05 (www.nih.gov) to TAP and RAPID grant DEB 1641130 from the National Science Foundation (www.nsf.gov) to TAP and RCR. TAP was also supported by a DARPA (www.darpa.mil) Young Faculty Award (D16AP00114). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Zika virus (ZIKV) is a mosquito-borne pathogen that was first identified in Uganda in 1947 [1]. Smaller Zika outbreaks had occurred in Africa, Asia, and the Pacific islands since its discovery, but there had been no confirmed cases in the Americas (excluding Easter Island, Chile) prior to the first confirmation of a Zika case in Brazil in May 2015 [2]. Following its detection in Brazil, the epidemic spread rapidly, and cases were reported throughout the Americas over the next two years [3]. The Zika epidemic generated considerable concern in the public health community and the general public; because of the discovery of a link between ZIKV infection in pregnant women and congenital Zika syndrome (CZS) in newborns [5–8], the World Health Organization (WHO) declared a Public Health Emergency of International Concern in February 2016 [4]. ZIKV infection is also associated with rare but serious neurological disorders, particularly Guillain-Barré syndrome [9]. Following the large epidemic of 2015-2017, substantially fewer Zika cases were reported in 2018 and 2019 [3].

Now that the initial wave of the ZIKV epidemic in the Americas has passed, there are a number of unanswered questions about what will happen next. If the remaining population at risk is large, then additional outbreaks in the coming years are still possible. On the other hand, modeling has suggested that if a large proportion of the population is now immune, herd immunity will likely prevent another large epidemic for more than a decade assuming life-long immunity following recovery [10]. This scenario could lead to a slow buildup of new susceptible individuals over time, and of particular concern, an eventual buildup of susceptible women of childbearing age if a ZIKV vaccine is not licensed and broadly deployed [10–12]. The number of recent ZIKV infections could also have relevance for the epidemiology of dengue virus (DENV) in the region [10]. There is evidence of an interaction between ZIKV and DENV via the human immune response to infection with either virus [13–17]. If a ZIKV infection provides any temporary cross-protection to DENV, then the reduction in dengue incidence in several Latin American countries over the past few years [18] could be followed by a large dengue epidemic as this temporary cross-protection wanes.

The Pan American Health Organization (PAHO) reported suspected and confirmed Zika cases for every country and territory in the Americas, but these reported cases vastly underestimate the total number of ZIKV infections due to inadequate surveillance, the non-specificity of ZIKV symptoms, and the high proportion of asymptomatic infections [11, 19, 20]. Underreporting is particularly an issue for pathogens such as ZIKV where the majority of infections are asymptomatic or produce only mild symptoms [21, 22]. Estimates of ZIKV infections from blood donors in Puerto Rico through 2016 suggest that almost 470,000 people might have been infected in Puerto Rico alone [23]. High ZIKV seroprevalence estimates from several major cities with populations of more than one million—46% in Managua, Nicaragua [24], 63-68% in Salvador, Brazil [25, 26], and 64% in Recife, Brazil [27]—also suggest that the 806,928 suspected and confirmed cases reported by PAHO represent only a small fraction of the total number of ZIKV infections.

In this study, we take advantage of several features of the recent Zika epidemic that allow us to estimate national and subnational infection attack rates (IARs) throughout the Americas. First, because of the WHO emergency declaration, surveillance for Zika began relatively early in the epidemic in most countries, and all countries and territories in the Americas reported case data to PAHO. Second, in addition to reporting suspected and confirmed Zika cases in the entire population, many countries also reported Zika cases among pregnant women, CZS or microcephaly cases, and Guillain-Barré syndrome (GBS) cases, providing additional information about the underlying IAR. Third, a number of countries have published subnational Zika surveillance data, which increases the sample sizes available for estimation. The subnational data also allow us to capture spatial heterogeneity in ZIKV IAR within these countries. Using a hierarchical Bayesian model fitted to multiple data types, we estimated subnational IARs and reporting probabilities for each data type. The estimated IARs and reporting probabilities were then used to extrapolate from the national-level case data of the remaining countries and territories in the Americas, yielding an estimate of the total number of ZIKV infections across the region.

Methods

Data

The cumulative numbers of suspected and confirmed Zika cases in each country and territory in the Americas, as well as the number of confirmed CZS cases, were reported by PAHO on a weekly basis through the first week of 2018 [3]. In addition, after epidemiological week 35 of 2017, PAHO published country reports containing additional information, including the total number of cases of microcephaly and Guillain-Barré syndrome (GBS) associated with Zika, where available [28]. Because ZIKV infection attack rates could vary significantly within a country, we restricted our primary analysis to countries and territories where we were able to obtain Zika data at a subnational level for at least one data type. In total, we were able to estimate subnational and national ZIKV IARs for 15 countries and territories (additional details in S1 Appendix and S1 Table). The included countries were Mexico, all of the countries of Central America (Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama), Bolivia, Brazil, Colombia, Ecuador, and Peru in South America, and the Dominican Republic and Puerto Rico from the Caribbean. These 15 countries and territories have a combined population of 507.1 million out of a total population of 641.9 million in the Americas (excluding the United States and Canada) and therefore represent a significant fraction of the population at risk.

The data types considered were confirmed Zika cases, suspected Zika cases, microcephaly cases associated with ZIKV infection in the mother, and Zika-associated cases of GBS. In addition, because of the risk of CZS in newborns, many countries also reported the number of pregnant women with either a suspected or confirmed ZIKV infection; we treated these as separate data points from the confirmed and suspected cases in the entire population because the surveillance and reporting systems for these case types differed. A suspected Zika case was defined by WHO/PAHO as a patient with rash and two or more of the following symptoms: fever, conjunctivitis, arthralgia, myalgia, or peri-articular edema [29]. Reporting of Zika-associated microcephaly and GBS cases varied by country, with some reporting only cases associated with a lab-confirmed ZIKV infection and others reporting both confirmed and suspected cases.

Where available, we obtained Zika data at the first administrative level (e.g., province or state) within a country or territory. Data reported at lower administrative levels were aggregated to the first administrative level. National and subnational population estimates were generated from WorldPop 2015 population rasters for each country or territory [30, 31]. The number of pregnant women in each country potentially at risk of ZIKV infection was estimated using WorldPop 2015 pregnancy rasters, and the number of births at risk for microcephaly was estimated using WorldPop 2015 births rasters [32]. All of the data used in our analysis, along with model code, are located in the GitHub repository https://github.com/mooresea/Zika_IAR.
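Turning a gridded population (or pregnancy, or births) raster into per-unit totals is a zonal sum. The sketch below assumes the raster has already been aligned with a grid of integer administrative-unit indices (`unit_ids`, a hypothetical name), with -1 marking cells outside any unit; the WorldPop file handling and the rasterization of boundaries are not shown, and the small arrays are made up.

```python
import numpy as np


def zonal_sums(values: np.ndarray, unit_ids: np.ndarray, n_units: int) -> np.ndarray:
    """Sum raster cell values within each administrative unit.

    values   : 2-D array of per-cell counts (people, pregnancies, or births)
    unit_ids : 2-D integer array, same shape, giving each cell's unit index
               0..n_units-1; cells outside any unit are -1 and are ignored
    """
    if values.shape != unit_ids.shape:
        raise ValueError("raster and unit-id grid must have the same shape")
    # Skip cells outside every unit and no-data (NaN) cells.
    mask = (unit_ids >= 0) & np.isfinite(values)
    return np.bincount(unit_ids[mask], weights=values[mask], minlength=n_units)


# Hypothetical 3x3 raster with two units and one no-data cell.
pop = np.array([[10.0, 20.0, 5.0],
                [0.0, 15.0, np.nan],
                [8.0, 2.0, 40.0]])
ids = np.array([[0, 0, 1],
                [0, 1, -1],
                [1, 1, 1]])
print(zonal_sums(pop, ids, n_units=2))
```

Summing the unit totals then gives the national population N used elsewhere in the model.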

Model

We estimated the national and subnational Zika IAR in each country using a hierarchical Bayesian model with the number of total infections and symptomatic infections treated as latent variables (Fig 1). Short descriptions of each of the model variables and parameters are presented in S2 Table. The model was run separately for each country because a single model would have contained over one thousand parameters, making it computationally prohibitive to estimate parameters for all countries simultaneously. Posterior parameter estimates for each country are therefore independent of the estimates for the other countries. The IAR in a population of size $N_i$ in administrative unit $i$ is the proportion of the population that was infected, $\mathrm{IAR}_i = I_{T,i} / N_i$, where $I_{T,i}$ is the number of infections. The number of symptomatic infections in a population, $I^{s}_{T,i}$, depends on the size of the infected population, $I_{T,i}$, and the symptomatic probability, $\phi$, resulting in $I^{s}_{T,i} \sim \mathrm{Binomial}(I_{T,i}, \phi)$. The number of confirmed Zika cases in the entire population (T) of administrative unit $i$, $C_{T,i}$, depends on the number of symptomatic infections and the local reporting probability, $\rho_{C,T,i}$, resulting in $C_{T,i} \sim \mathrm{Binomial}(I^{s}_{T,i}, \rho_{C,T,i})$. The number of suspected Zika cases was similarly dependent on a reporting probability for suspected cases, $\rho_{S,T,i}$, and the number of symptomatic infections in the total population, $S_{T,i} \sim \mathrm{Binomial}(I^{s}_{T,i}, \rho_{S,T,i})$. Because misdiagnosis could inflate the number of suspected cases during an epidemic, we also considered the possibility that there were more suspected cases than symptomatic infections by using a Poisson distribution rather than a binomial distribution to represent the reporting process for suspected cases. The results of this analysis are reported in the Supporting Information (S3 Appendix). The numbers of confirmed or suspected cases in pregnant women (P), $C_{P,i}$ or $S_{P,i}$, were represented by binomial distributions dependent on the number of symptomatic infections in pregnant women, $I^{s}_{P,i}$.
The infection attack rate and the probability of an infection being symptomatic in pregnant women were assumed to be the same as those in the entire population. The probabilities that a symptomatic infection was reported as a confirmed or suspected Zika case, $\rho_{C,x,i}$ and $\rho_{S,x,i}$, where $x$ represents either the entire population (T) or pregnant women only (P), were assumed to differ between administrative units within a country or territory. To estimate this within-country variation in reporting probabilities, we assumed that the probability of a symptomatic infection being reported in administrative unit $i$ followed a beta distribution with hyperparameters $\alpha_{Y,x}$ and $\beta_{Y,x}$ such that $\rho_{Y,x,i} \sim \mathrm{Beta}(\alpha_{Y,x}, \beta_{Y,x})$, where $Y$ denotes confirmed (C) or suspected (S) cases.

Fig 1. Model schematic.

Model fitting was performed separately for each country or territory. The subscript i indicates administrative unit i within a modeled country. The subscript x represents either the total population (x = T) or pregnant women only (x = P). The top row represents the different data types (C = confirmed cases, S = suspected cases, G = Guillain-Barré syndrome cases, M = microcephaly cases). The second row includes the latent variables ($I^{s}_{x,i}$ = symptomatic infections, $I_{x,i}$ = infections). The third row includes the symptomatic probability ($\phi$), the reporting probabilities for confirmed cases ($\rho_{C,x,i}$) and suspected cases ($\rho_{S,x,i}$), the probability that a symptomatic infection leads to a reported GBS case ($\rho_G$), and the probability that a ZIKV infection in a pregnant woman leads to a reported microcephaly case ($\rho_M$). The parameters in the bottom row are the hyperparameters ($\alpha_{Y,x}$, $\beta_{Y,x}$) for the reporting probabilities $\rho_{C,x,i}$ and $\rho_{S,x,i}$. See text and S2 Table for descriptions of model parameters and variables.

https://doi.org/10.1371/journal.pntd.0008640.g001

The number of reported Zika-associated GBS cases depended on the number of symptomatic infections, $G_i \sim \mathrm{Binomial}(I^{s}_{T,i}, \rho_G)$, where ρ_G is the probability that a symptomatic infection results in a reported GBS case. The number of microcephaly cases associated with Zika depended on the total number of births in the population, B_i, and the IAR, such that $M_i \sim \mathrm{Binomial}(B_i, \rho_M \, \mathrm{IAR}_i)$, where ρ_M is the probability that an infection during pregnancy results in a reported microcephaly case.
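The observation model described above can be forward-simulated for a single administrative unit, which is also the kind of generator a simulation study needs. This is a minimal sketch, not the authors' Stan code. Two details are our assumptions rather than statements from the text: infections are drawn as Binomial(N, IAR) in the total population and Binomial(P, IAR) among pregnancies (`P` is a hypothetical name for the number of pregnant women), and every parameter value below is hypothetical. Unit-level reporting probabilities are drawn from beta distributions, mirroring the hierarchical structure described for confirmed and suspected cases.

```python
import numpy as np


def simulate_unit(rng, N, P, B, iar, phi, rho, rho_G, rho_M):
    """Forward-simulate one administrative unit's reported data.

    N, P, B : population, pregnant women and births in the unit
    iar     : infection attack rate; phi: symptomatic probability
    rho     : reporting probabilities keyed 'C_T', 'S_T', 'C_P', 'S_P'
    """
    I_T = rng.binomial(N, iar)     # infections, total population (assumed binomial)
    Is_T = rng.binomial(I_T, phi)  # symptomatic infections
    I_P = rng.binomial(P, iar)     # infections in pregnant women, same IAR
    Is_P = rng.binomial(I_P, phi)  # same symptomatic probability
    return {
        "I_T": I_T, "Is_T": Is_T,
        "C_T": rng.binomial(Is_T, rho["C_T"]),  # confirmed, total population
        "S_T": rng.binomial(Is_T, rho["S_T"]),  # suspected, total population
        "C_P": rng.binomial(Is_P, rho["C_P"]),  # confirmed, pregnant women
        "S_P": rng.binomial(Is_P, rho["S_P"]),  # suspected, pregnant women
        "G": rng.binomial(Is_T, rho_G),         # reported GBS cases
        "M": rng.binomial(B, rho_M * iar),      # reported microcephaly cases
    }


rng = np.random.default_rng(2020)
# Hypothetical Beta(alpha, beta) hyperparameters for unit-level reporting.
hyper = {"C_T": (1.0, 20.0), "S_T": (2.0, 8.0), "C_P": (2.0, 10.0), "S_P": (3.0, 6.0)}
units = [dict(N=1_000_000, P=20_000, B=18_000, iar=0.3),
         dict(N=250_000, P=5_000, B=4_500, iar=0.1)]
for u in units:
    rho = {k: rng.beta(a, b) for k, (a, b) in hyper.items()}
    print(simulate_unit(rng, u["N"], u["P"], u["B"], u["iar"],
                        phi=0.3, rho=rho, rho_G=2e-4, rho_M=5e-3))
```

Fitting runs this logic in reverse: the reported counts are observed, and the latent infections and probabilities are inferred.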

Subnational IARs were estimated for each country and territory using the available data types. The national-level IAR was calculated as the total number of subnational infections divided by the national population size, N: $\mathrm{IAR} = \frac{1}{N} \sum_i I_{T,i}$. For Puerto Rico, Zika IARs were estimated at the municipality level (n = 78) because data were available at that scale. In addition, several Puerto Rico datasets were available only at the regional level (n = 8); for these data types, regional-level IARs were estimated from the total number of infections across all municipalities in a given region.
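Because the national IAR is a ratio of summed infections, it is most naturally computed draw by draw from the MCMC output and summarized afterwards; this keeps any posterior correlation between units and avoids adding up per-unit medians. The sketch assumes an array of posterior infection draws with one row per iteration and one column per unit; the populations and beta-distributed IAR draws are hypothetical stand-ins for model output.

```python
import numpy as np


def national_iar_draws(infection_draws: np.ndarray, national_pop: float) -> np.ndarray:
    """One national IAR per MCMC draw: sum the units' infections, divide by N."""
    return infection_draws.sum(axis=1) / national_pop


def summarize(draws: np.ndarray, level: float = 0.95):
    """Posterior median and equal-tailed credible interval."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return float(np.median(draws)), float(lo), float(hi)


rng = np.random.default_rng(7)
pops = np.array([1_000_000, 500_000, 200_000])               # hypothetical unit populations
iar_draws = rng.beta([3, 1, 2], [7, 9, 2], size=(4000, 3))   # hypothetical unit IAR draws
infection_draws = iar_draws * pops                           # infections per unit per draw
national = national_iar_draws(infection_draws, pops.sum())
print(summarize(national))
```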

Prior assumptions.

The IAR reported in previous Zika outbreaks has been as high as 73% on the Micronesian island of Yap [21]. The IARs of Aedes-transmitted viruses in larger geographical areas tend to be lower than in smaller island environments such as Yap, because spatial heterogeneity in the presence and abundance of Aedes mosquitoes limits transmission potential in parts of the region [33]. Studies from several different Zika outbreaks have estimated basic reproduction numbers (R₀) of 1.4 to 6.0 [34–38]. Under the theoretical relationship between R₀ and the final epidemic size in a homogeneously mixing population [39], these R₀ values would correspond to IARs of approximately 0.51–0.998, and to herd immunity thresholds (1 − 1/R₀) of 0.286–0.833. However, IARs are typically lower at a given R₀ value in populations with heterogeneous contact patterns [40], as is characteristic of transmission by Ae. aegypti mosquitoes [41]. To lightly constrain our ZIKV IAR estimates without precluding values anywhere between 0 and 1, we used a Beta(1, 2) prior for the probability of an individual being infected (i.e., the IAR). This prior distribution has a median of 0.293 (95% range: 0.013–0.842). We also performed an analysis with a uniform prior for the IARs. A comparison of posterior IAR estimates with and without the Beta(1, 2) prior is included in the Supporting Information (S3 Appendix).
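The numbers in this paragraph can be checked with a short sketch. It assumes the relationship in [39] is the classic homogeneous-mixing SIR final-size equation z = 1 − exp(−R₀z) (an assumption on our part). It solves this equation at both ends of the quoted R₀ range, prints the herd immunity threshold 1 − 1/R₀ for comparison, and recomputes the median and 95% range of the Beta(1, 2) prior with SciPy.

```python
import math

from scipy.optimize import brentq
from scipy.stats import beta


def final_size(r0: float) -> float:
    """Non-trivial root z of z = 1 - exp(-R0 * z) (homogeneous-mixing SIR final size)."""
    if r0 <= 1.0:
        return 0.0  # no major epidemic at or below the threshold
    # f(z) is negative just above 0 and positive at z = 1 when R0 > 1, so the
    # bracket contains exactly one sign change: the epidemic's final size.
    return brentq(lambda z: z - 1.0 + math.exp(-r0 * z), 1e-9, 1.0)


for r0 in (1.4, 6.0):
    print(f"R0 = {r0}: final size = {final_size(r0):.3f}, 1 - 1/R0 = {1 - 1 / r0:.3f}")

# Summary of the Beta(1, 2) prior placed on each IAR.
prior = beta(1, 2)
lo, hi = prior.ppf([0.025, 0.975])
print(f"Beta(1, 2) median = {prior.median():.3f}, 95% range = {lo:.3f}-{hi:.3f}")
```

Under homogeneous mixing, the final size always exceeds 1 − 1/R₀ because transmission overshoots the herd immunity threshold. Heterogeneous contact pulls it back down, which is the argument the paragraph uses for a light prior.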

Estimates of the symptomatic probability for Zika, $\phi$, have varied considerably across studies [21, 22, 42]. One recent study estimated the symptomatic probability for three different locations (Yap Island, French Polynesia, and Puerto Rico), taking into account assay sensitivity and specificity, as well as the possibility of Zika-like symptoms due to other causes [22]. Median estimates from that study ranged from 27% in Yap to 50% in Puerto Rico. To generate a single prior distribution for $\phi$ in our model, we used the model and data provided in [22] to recreate the posterior estimates of $\phi$ from their analysis, and then fitted a beta distribution to the combined posteriors using the ‘fitdistrplus’ package in R [43]. The resulting distribution, Beta(3.88, 5.34), has a median of 0.41 and a 95% range of 0.14–0.73, and was used as the prior for $\phi$ in each country. The hyperparameters $\alpha_{Y,x}$ and $\beta_{Y,x}$ of the reporting probabilities $\rho_{Y,x,i}$, where Y = C or S and x = T or P, were given Cauchy(0, 25) priors. This distribution provides a weakly informative prior, peaked at 0 with a long right tail. We assumed non-informative priors for the probabilities that a symptomatic infection results in a reported Guillain-Barré syndrome case (ρ_G) and that an infected mother gives birth to a child with reported microcephaly (ρ_M). These probabilities represent not just the probability that an infection leads to a syndromic case, but also the probability that such a case is reported through the surveillance system.
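The prior-construction step (fit a beta distribution to pooled posterior samples) can be sketched in Python. This is an analogue, not a reproduction: the original used ‘fitdistrplus’ in R on posteriors recreated from [22], whereas here the "pooled" samples are hypothetical draws from the reported Beta(3.88, 5.34) itself. The fit is a maximum-likelihood fit with the support fixed to [0, 1], and the final lines recompute the median and 95% range quoted in the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical stand-in for pooled posterior samples of phi; the original
# analysis pooled posteriors recreated from the model and data in [22].
pooled = rng.beta(3.88, 5.34, size=20_000)

# Maximum-likelihood beta fit with loc and scale pinned so the support is [0, 1].
a_hat, b_hat, _, _ = stats.beta.fit(pooled, floc=0, fscale=1)
print(f"fitted Beta({a_hat:.2f}, {b_hat:.2f})")

# Summaries of the resulting prior, for comparison with the text.
prior = stats.beta(3.88, 5.34)
q_lo, q_hi = prior.ppf([0.025, 0.975])
print(f"median = {prior.median():.2f}, 95% range = {q_lo:.2f}-{q_hi:.2f}")
```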

Model implementation.

Each country or territory model was fitted with the ‘rstan’ package (version 2.18.2) in R, which uses a Hamiltonian Monte Carlo algorithm (an MCMC variant) [44]. For each country or territory, four chains of 5,000 iterations each were run, with the first 2,500 iterations of each chain discarded as burn-in. Convergence was assessed using the Gelman-Rubin convergence diagnostic [45]. The full model for Peru failed to converge, so the reporting probabilities of confirmed cases ($\rho_{C,T,i}$) and confirmed cases in pregnant women ($\rho_{C,P,i}$) were estimated as single parameters shared by all administrative units, rather than being drawn from beta distributions with hyperparameters $\alpha_{C,x}$ and $\beta_{C,x}$. We considered this simplification justifiable because all ZIKV confirmations were handled by either the national CNS laboratory or one of two regional laboratories, so confirmation rates likely did not vary as widely between administrative units as the reporting probabilities for suspected cases. Model diagnostic results for each country and territory are provided in S2 Appendix.
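For readers unfamiliar with the Gelman-Rubin diagnostic, the sketch below computes the classic (non-split) potential scale reduction factor, R-hat, for one parameter. It compares the between-chain variance with the within-chain variance of the post-burn-in draws. The chain dimensions mirror the run length described above (four chains, 2,500 retained draws each), but the draws themselves are synthetic. rstan's own summaries use a split-chain variant, so this is illustrative rather than a reproduction of its output.

```python
import numpy as np


def gelman_rubin(chains: np.ndarray) -> float:
    """Classic potential scale reduction factor (R-hat) for one parameter.

    chains: array of shape (m_chains, n_draws) holding post-burn-in draws.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of posterior variance
    return float(np.sqrt(var_hat / W))


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 2500))              # 4 well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])    # one chain in another mode
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values close to 1 indicate that the chains agree. A chain stuck elsewhere inflates the between-chain term and pushes R-hat well above 1.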

Model validation

Simulation study.

To assess the performance of our estimation method, we used the median posterior estimates from Guatemala to simulate new case data for each of the different data types at the first administrative level. Three simulated datasets were generated with the symptomatic probability ($\phi$) set at either the median posterior estimate from Guatemala (3.2%), 25%, or 50%. We then used our statistical model to estimate the IAR and other parameters either with or without a Beta(1, 2) prior on the IAR parameters.

Posterior predictive checks.

Posterior predictive checks were performed by comparing the empirical data to simulated data from the posterior parameter distributions. Posterior predictive data were generated at each iteration, $k$, of the MCMC, with $\tilde{Y}_{j,i}^{(k)} \sim p(Y_{j,i} \mid \theta_j^{(k)})$ for data type $j \in \{C_T, S_T, C_P, S_P, M, G\}$ and its associated parameters $\theta_j$. At each iteration, the observed national total, $Y_j$, was compared to the predicted national total, $\tilde{Y}_j^{(k)} = \sum_i \tilde{Y}_{j,i}^{(k)}$. This test statistic was used to calculate a Bayesian p-value, $p_j = \Pr(\tilde{Y}_j \ge Y_j)$, estimated as the proportion of iterations in which $\tilde{Y}_j^{(k)} \ge Y_j$; values close to 0 or 1 indicate that the observed data are extreme relative to the distribution of model-generated data [45]. We also compared the observed national totals for each data type to the predicted model output to determine whether the observed data (across all case types and territories) fell within the 95% credible interval (CrI) of the corresponding posterior distributions from the model.
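The two checks described above, a Bayesian p-value on the national total and a 95% CrI coverage test, can be computed from an array of replicated unit-level counts. A minimal sketch follows, assuming one row of replicated counts per MCMC iteration. The symptomatic-infection and reporting-probability draws are hypothetical placeholders for real posterior output.

```python
import numpy as np


def ppc_national(pred_unit_draws, observed_total, level=0.95):
    """Posterior predictive check on the national total for one data type.

    pred_unit_draws : (n_iter, n_units) replicated counts, one row per iteration
    observed_total  : observed national total
    Returns the Bayesian p-value Pr(predicted total >= observed) and whether the
    observed total falls inside the equal-tailed credible interval.
    """
    pred_total = pred_unit_draws.sum(axis=1)
    p_value = float(np.mean(pred_total >= observed_total))
    lo, hi = np.quantile(pred_total, [(1 - level) / 2, (1 + level) / 2])
    return p_value, bool(lo <= observed_total <= hi)


rng = np.random.default_rng(3)
n_iter, n_units = 4000, 10
sym_inf = rng.poisson(5000, size=(n_iter, n_units))  # hypothetical symptomatic infections
rho = rng.beta(20, 80, size=(n_iter, n_units))       # hypothetical reporting probabilities
pred = rng.binomial(sym_inf, rho)                    # replicated reported cases
print(ppc_national(pred, observed_total=10_000))
```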

Holdout analysis.

To determine the sensitivity of model estimates to the inclusion of different data types, we fitted the model while holding out one data type at a time. This analysis was restricted to countries where all data types were available at a subnational level (Guatemala and Dominican Republic) or where at most one data type was available only at the national level (Panama). For these countries, we also fitted the model to one data type at a time to assess the benefit of using multiple data types in the estimation process.

Seroprevalence estimates.

As an additional check, we compared modeled IAR estimates to published seroprevalence estimates from the Americas, as these quantities should be comparable. Seroprevalence estimates from at least one location were available for five countries or territories from ten published studies (S7 Table). Most of these studies used an NS1-based ELISA test for ZIKV IgG antibodies, and several confirmed all positive tests using plaque-reduction neutralization tests (PRNT) or flow cytometry neutralization tests (FRNT). For studies performing both ELISAs and PRNTs, we compared our estimates to the PRNT results. Two included studies did not test for ZIKV IgG: in Guayaquil, Ecuador, Zambrano et al. [46] tested a control group of pregnant women for recent ZIKV infection via RT-PCR, and de Araújo [27] tested a control group of new mothers in Recife, Brazil, for ZIKV IgM, with confirmation via PRNT. Most of these studies were conducted in a single city rather than across a larger administrative area; in those cases, we compared the seroprevalence data to the IAR estimate for the first-level administrative unit in which the city was located.
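One simple way to put a survey estimate and a modeled IAR side by side is to attach a binomial confidence interval to the seroprevalence and ask whether it overlaps the model's credible interval. The Fig 3 caption states only that seroprevalence intervals assume a binomial distribution. The exact (Clopper-Pearson) interval used here is one such choice and an assumption on our part, and the survey counts and credible interval are hypothetical.

```python
from scipy.stats import beta


def clopper_pearson(k: int, n: int, level: float = 0.95):
    """Exact binomial confidence interval for k positives out of n tested."""
    alpha = 1 - level
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return float(lo), float(hi)


def intervals_overlap(a, b) -> bool:
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


sero_ci = clopper_pearson(k=290, n=500)  # hypothetical survey: 290 of 500 positive
model_cri = (0.45, 0.70)                 # hypothetical modeled IAR 95% CrI
print(sero_ci, intervals_overlap(sero_ci, model_cri))
```

Overlap is a coarse criterion. It says nothing about whether the city-level survey population matches the first-level administrative unit it is compared against.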

Applying the model to settings with no spatially disaggregated data

Estimates from the 15 country-specific models were used to make predictions about IARs in the 33 other countries and territories in Latin America and the Caribbean where subnational data were not available. The January 4, 2018 Zika report from PAHO included cumulative data from 52 countries and territories in the Americas [3]. Canada, Bermuda, and Chile reported no locally acquired cases, and the United States reported only a few hundred locally acquired cases in a limited geographic area, so these countries were excluded from the analysis. For the remaining 48 countries and territories (including the 15 modeled territories), the cumulative numbers of confirmed cases, suspected cases, and cases of microcephaly were used to estimate the national IAR. For each of the 48 countries and territories, a national IAR estimate was obtained by drawing from the posterior distributions of the different reporting parameters from each of the 15 country models. This allowed us to draw from the full range of estimated reporting probabilities from these 15 countries and territories when predicting the IARs in the remaining countries and territories. For a given country or territory model, k, the probability of a given IAR value in country or territory j was derived from the joint probability of the different data types (C_j, S_j, and/or M_j) used to fit that model. The combined probability density function for IAR in country j was then taken as the sum of the probability density functions using the parameter estimates from all K = 15 models. As an alternative to drawing from all 15 modeled countries and territories to estimate infections in the non-modeled territories, we explored a second method in which parameters were drawn only from a subset of the modeled countries that shared a border or similar characteristics (e.g., island nations) with the country or territory being estimated. Additional details on this estimation method are provided in S4 Appendix. 
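The pooling step can be sketched on a grid of IAR values. This is a simplified illustration, not the authors' procedure, and it rests on several assumptions. It uses a single data type (confirmed cases), whereas the text combines C_j, S_j, and/or M_j. It assumes infections are Binomial(N_j, IAR), so reported cases are marginally Binomial(N_j, IAR·φ·ρ). It implies a uniform prior over the grid. Each model's likelihood is averaged over that model's posterior draws of (φ, ρ), normalized to a density, and the K densities are summed and renormalized. The function name and all draws are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom


def pooled_iar_density(cases, pop, model_draws, grid):
    """Equal-weight mixture of per-model IAR densities for one target country.

    cases, pop  : reported confirmed cases and population of the target country
    model_draws : list of (phi_draws, rho_draws) arrays, one pair per fitted model k
    grid        : evenly spaced IAR values in (0, 1]
    """
    dx = grid[1] - grid[0]
    total = np.zeros_like(grid)
    for phi, rho in model_draws:
        # P(case detected | person) = IAR * phi * rho for each grid point x draw.
        p = np.clip(grid[:, None] * phi[None, :] * rho[None, :], 1e-300, 1.0)
        # Average the likelihood over model k's posterior draws (in log space).
        loglik = logsumexp(binom.logpmf(cases, pop, p), axis=1) - np.log(phi.size)
        dens = np.exp(loglik - loglik.max())
        total += dens / (dens.sum() * dx)  # normalize model k's density on the grid
    return total / len(model_draws)        # sum over models, renormalized


rng = np.random.default_rng(11)
grid = np.linspace(0.001, 1.0, 1000)
models = [(rng.beta(4, 6, 1000), rng.beta(2, 30, 1000)),   # hypothetical model k = 1
          (rng.beta(3, 9, 1000), rng.beta(5, 20, 1000))]   # hypothetical model k = 2
dens = pooled_iar_density(cases=300, pop=100_000, model_draws=models, grid=grid)
print("mode of pooled IAR density:", grid[np.argmax(dens)])
```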
These IAR estimates were used to calculate the total number of ZIKV infections that may have occurred during the epidemic. Initially, the infections arising in the 15 modeled territories were derived from country- or territory-specific model estimates and only the infections from the remaining 33 countries and territories were estimated from this pooled analysis. The territory-specific model estimates were compared to the pooled model estimates for each of the 15 modeled territories to assess the plausibility of the pooled estimates in the non-modeled countries and territories (S4 Appendix).

Results

Infection attack rate estimates

Estimated Zika infection attack rates at the national level ranged from 0.084 (95% CrI: 0.067–0.096) in Peru to 0.361 (95% CrI: 0.214–0.514) in Ecuador (Fig 2A, S4 Table). There was considerable subnational heterogeneity in IARs (Fig 3). In the most populous country, Brazil, the IAR at the subnational level varied from 0.016 (95% CrI: 0.01-0.025) in the State of Paraná to 0.766 (95% CrI: 0.569-0.942) in the State of Sergipe (Fig 3). In the second most populous country, Mexico, the IAR ranged from 0 (95% CrI: 0-0) in the Federal District to 0.793 (95% CrI: 0.524-0.963) in the State of Yucatán (S17 Fig). Subnational IAR estimates for all 15 modeled countries and territories are presented in S7–S21 Figs.

Fig 2. Posterior distributions of national ZIKV infection attack rate (IAR) and total ZIKV infections for 15 different countries and territories.

(A) ZIKV IAR for each modeled country or territory ordered by median IAR. (B) Estimated number of ZIKV infections for each country or territory ordered by median number of infections.

https://doi.org/10.1371/journal.pntd.0008640.g002

Fig 3. Posterior distribution of subnational ZIKV infection attack rates (IAR) for five countries and territories (Bolivia, Brazil, Ecuador, Nicaragua, and Puerto Rico).

Colored circles and whiskers are the median and 95% credible intervals for each administrative unit. Black circles with dashed lines are seroprevalence estimates from the literature (see S7 Table). The dashed lines are the 95% confidence intervals for the seroprevalence estimates assuming a binomial distribution with the exception of the 95% CI estimate from [17] for Bahia, Brazil which was taken directly from their analysis.

https://doi.org/10.1371/journal.pntd.0008640.g003

Mean estimated IARs in countries and territories where subnational data were not available ranged from a low of 0.003 (95% CrI: 0.000-0.013) in Uruguay to a high of 0.979 (95% CrI: 0.591-1.000) in Saint Martin and 0.979 (95% CrI: 0.616-1.000) in Saint Barthélemy (S4 Table, S22–S25 Figs). Summing the corresponding infection estimates across all countries and territories in the Americas, we estimate a total of 132.3 million (95% CrI: 111.3-170.2 million) ZIKV infections across Latin America and the Caribbean (Table 1). The majority of these infections, 114.1 million (95% CrI: 99.39-128.4 million), occurred in the 15 modeled countries and territories, while the other 33 countries and territories accounted for an additional 16.16 million (95% CrI: 5.427-51.71 million) infections. There were an estimated 53.4 million (95% CrI: 40.8-64.9 million) ZIKV infections in Brazil and 25.6 million (95% CrI: 19.2-32.7 million) in Mexico (Fig 2B). Venezuela had the largest number of ZIKV infections (6.63 million; 95% CrI: 0.35-31.5 million) among the countries that were not explicitly modeled (S4 Table). The projected number of infections in non-modeled countries was lower under the alternative method of estimation based on only a subset of country-specific parameter estimates (Table 1).
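Readers may notice that 114.1 + 16.16 = 130.26 million rather than 132.3 million. If the reported totals are posterior medians (an assumption, since the text does not say which summary is reported), this is expected. Medians are not additive, and summing draw by draw before summarizing pulls the total's median toward the sum of the means when one component is right-skewed. The sketch uses hypothetical lognormal draws loosely scaled to the two groups.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
# Hypothetical posterior draws (millions of infections): a tight, nearly
# symmetric total for modeled countries and a wide, right-skewed projection.
modeled = rng.lognormal(mean=np.log(114.0), sigma=0.07, size=n)
projected = rng.lognormal(mean=np.log(16.0), sigma=0.6, size=n)
total = modeled + projected  # add draw by draw, then summarize

sum_of_medians = np.median(modeled) + np.median(projected)
median_of_sum = np.median(total)
print(f"sum of medians: {sum_of_medians:.1f}  median of sum: {median_of_sum:.1f}")
```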

Table 1. Total infections in modeled and projected countries and territories under two different projection methods.

Default method used parameter estimates from all 15 modeled countries, while local method used only parameter estimates from neighboring countries and territories or those with similar characteristics.

https://doi.org/10.1371/journal.pntd.0008640.t001

Parameter estimates

The median symptomatic probability of a ZIKV infection across all countries was 0.1 (95% CrI: 0.02-0.53), which was lower than the prior estimate of 0.41 (95% CrI: 0.14–0.73). The estimated symptomatic probability ranged from 0.03 (95% CrI: 0.01-0.17) in Guatemala to 0.33 (95% CrI: 0.15-0.6) in Colombia (Fig 4A). This shift in the posterior estimate relative to the prior estimate of the symptomatic probability may be the result of identifiability issues between the symptomatic probability and the reporting probability parameters for the different data types. Misdiagnosis of symptomatic ZIKV infections as a different disease, such as dengue fever, could also lower the estimate of the symptomatic probability, although dengue virus infections misidentified as ZIKV infections would have the opposite effect.
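The identifiability issue mentioned above follows from the binomial thinning property. If symptomatic infections are Binomial(I, φ) and reported cases given symptomatic infections are Binomial(I_s, ρ), then reported cases are marginally Binomial(I, φρ). For a given number of infections, a single case type therefore constrains only the product φρ. The sketch marginalizes the latent symptomatic infections explicitly for two hypothetical (φ, ρ) pairs with the same product and compares both to the direct Binomial(I, φρ) probability.

```python
import numpy as np
from scipy.stats import binom


def marginal_case_pmf(c: int, infections: int, phi: float, rho: float) -> float:
    """P(C = c) after summing over symptomatic infections I_s ~ Bin(infections, phi),
    with reported cases C | I_s ~ Bin(I_s, rho)."""
    i_s = np.arange(c, infections + 1)  # I_s must be at least c
    return float(np.sum(binom.pmf(i_s, infections, phi) * binom.pmf(c, i_s, rho)))


infections, c = 400, 12
a = marginal_case_pmf(c, infections, phi=0.10, rho=0.30)  # phi * rho = 0.03
b = marginal_case_pmf(c, infections, phi=0.50, rho=0.06)  # phi * rho = 0.03
direct = binom.pmf(c, infections, 0.03)
print(a, b, direct)  # all three agree up to floating-point error
```

Because low φ with high ρ and high φ with low ρ fit a given case count equally well, the posterior for φ can move away from its prior without the data strongly preferring either explanation.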

Fig 4. Posterior parameter estimates.

(A) Posterior and prior symptomatic probability estimates for each country or territory. (B) Posterior estimates from each country and territory of the probability that a ZIKV infection in a pregnant woman results in a reported case of microcephaly. Dashed line represents range for estimated risk of Zika-associated microcephaly from published observational studies (see text for references). (C) Posterior estimates from each country and territory of the probability that a symptomatic infection results in a reported Guillain-Barré syndrome (GBS) case.

https://doi.org/10.1371/journal.pntd.0008640.g004

Reported rates of microcephaly in most countries were lower than recent estimates of the risk of microcephaly based on studies of ZIKV infection during pregnancy (Fig 4B). The probability that an infection in a pregnant woman resulted in a reported case of microcephaly ranged from a low of 0.07 per 1,000 infections (95% CrI: 0.01-0.19) in Nicaragua to 8.7 per 1,000 infections (95% CrI: 7.13-11.39) in Brazil. In comparison, estimates of the risk of microcephaly have ranged from 4.1 (95% CI: 3.4-4.9) per 1,000 [8] to 50 (95% CI: 40-70) per 1,000 births to ZIKV-infected mothers [47]. The probability that a symptomatic infection would result in a reported Guillain-Barré syndrome case varied from a low of 0.16 per 100,000 symptomatic infections (95% CrI: 0.04-0.7) in Ecuador to a high of 92.27 per 100,000 symptomatic infections (95% CrI: 55.72-113.6) in Puerto Rico (Fig 4C). The large variability between countries in reporting probabilities for microcephaly and GBS cases could be due to differences in case definitions among countries, differences in surveillance, or differences in underlying dengue immunity that may have affected the severity of ZIKV infections through some form of cross-reactive response [15–17].

The probability of a symptomatic infection being reported as a suspected or confirmed Zika case was higher for pregnant women than for the general population in all countries where separate data on pregnant women were available, with the exception of suspected cases in El Salvador and Honduras (S26–S29 Figs). The countrywide reporting probability in pregnant women was as low as 0.002 (95% CrI: 0-0.012) for confirmed cases in El Salvador (S29 Fig), and as high as 0.289 (95% CrI: 0.041-0.989) for confirmed cases in Costa Rica and 0.276 (95% CrI: 0.056-0.727) for suspected cases in the Dominican Republic (S28 Fig). As with the variability in reporting probabilities for GBS and microcephaly, this variability in confirming ZIKV infections in pregnant women indicates that there were considerable differences in surveillance and testing efforts among countries during the epidemic.

Countrywide reporting of suspected cases in the entire population ranged from 0.012 (95% CrI: 0.006-0.041) in Peru to 0.933 (95% CrI: 0.565-0.998) in Puerto Rico (S26 Fig). Countrywide reporting of confirmed cases ranged from 0.0004 (95% CrI: 0.0001-0.0023) in El Salvador to 0.504 (95% CrI: 0.305-0.54) in Puerto Rico (S27 Fig). The variation in reporting probabilities among administrative units within a country or territory was largest in Colombia for suspected cases (70.2% of variance was between administrative units vs. 29.8% due to within-unit variance), suspected cases in pregnant women (63.8%), and confirmed cases in pregnant women (67.7%). The largest between-administrative unit variance in reporting probabilities for confirmed cases in the total population occurred in Puerto Rico (56.6% of total variance). Several countries showed little variability in reporting probabilities among administrative units, with < 1% of total variance explained (S30–S33 Figs).
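The text does not state exactly how the between-unit and within-unit shares of variance were computed. One plausible reading, offered here as an assumption, is a law-of-total-variance decomposition of the posterior draws of the unit-level reporting probabilities. Total variance equals the variance of the unit posterior means (between) plus the average posterior variance within each unit (within). The draws below are hypothetical.

```python
import numpy as np


def between_unit_share(rho_draws: np.ndarray) -> float:
    """Share of posterior variance in unit-level reporting probabilities that lies
    between administrative units (law of total variance, ddof = 0 throughout).

    rho_draws : (n_draws, n_units) posterior draws, one column per unit
    """
    total = rho_draws.var()                  # variance over all draws and units
    between = rho_draws.mean(axis=0).var()   # variance of the unit posterior means
    within = rho_draws.var(axis=0).mean()    # average within-unit posterior variance
    assert np.isclose(total, between + within)
    return float(between / total)


rng = np.random.default_rng(9)
unit_means = rng.beta(2, 8, size=30)  # hypothetical spread of reporting across 30 units
draws = rng.beta(unit_means * 200, (1 - unit_means) * 200, size=(4000, 30))
print(f"between-unit share: {between_unit_share(draws):.1%}")
```

When every unit shares the same reporting distribution, the between-unit share collapses toward zero, which corresponds to the countries described above with less than 1% of variance between units.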

The posterior distribution of the national IAR was only weakly negatively correlated with the symptomatic probability posterior distribution in each country (S34 Fig). The IAR posterior distribution was not strongly correlated with the posterior distributions of any of the reporting probabilities, with the exception of the reporting probability for microcephaly cases, which is the only case type that we assumed could result from both asymptomatic and symptomatic infections (S35–S36 Figs). The posterior distributions of the other reporting rates were strongly correlated with the symptomatic probability posterior distribution (S37 Fig).