Co-circulation and misdiagnosis led to underestimation of the 2015–2017 Zika epidemic in the Americas

During the 2015–2017 Zika epidemic, dengue and chikungunya–two other viral diseases with the same vector as Zika–were also in circulation. Clinical presentation of these diseases can vary from person to person in terms of symptoms and severity, making it difficult to differentially diagnose them. Under these circumstances, it is possible that numerous cases of Zika could have been misdiagnosed as dengue or chikungunya, or vice versa. Given the importance of surveillance data for informing epidemiological analyses, our aim was to quantify the potential extent of misdiagnosis during this epidemic. Using basic principles of probability and empirical estimates of diagnostic sensitivity and specificity, we generated revised estimates of reported cases of Zika that accounted for the accuracy of diagnoses made on the basis of clinical presentation with or without laboratory confirmation. Applying this method to weekly reported case data from 43 countries throughout Latin America and the Caribbean, we estimated that 944,700 (95% CrI: 884,900–996,400) Zika cases occurred when assuming all confirmed cases were diagnosed using molecular methods versus 608,400 (95% CrI: 442,000–821,800) Zika cases that occurred when assuming all confirmed cases were diagnosed using serological methods. Our results imply that misdiagnosis was more common in countries with proportionally higher reported cases of dengue and chikungunya, such as Brazil. Given that Zika, dengue, and chikungunya appear likely to co-circulate in the Americas and elsewhere for years to come, our methodology has the potential to enhance the interpretation of passive surveillance data for these diseases going forward. Likewise, our methodology could also be used to help resolve transmission dynamics of other co-circulating diseases with similarities in symptomatology and potential for misdiagnosis.


Introduction
Consistent and correct diagnosis is important for the veracity of clinical data used in epidemiological analyses [1][2][3]. Diagnostic accuracy can depend strongly though on the uniqueness of a disease's symptomatology. On the one hand, diagnosis can be straightforward when there are clearly differentiable symptoms, such as the hallmark rash of varicella [4]. On the other hand, with symptoms that are common to many diseases, such as malaise, fever, and fatigue, it can be more difficult to ascertain a disease's etiology [5][6][7]. Further complicating clinical diagnosis is person-to-person variability in apparent symptoms and their severity [8,9]. In many cases, symptoms are self-assessed by the patient and communicated verbally to the clinician, introducing subjectivity and resulting in inconsistencies across different patients and clinicians [10,11].
When they are used, molecular and serological diagnostics are thought to greatly enhance the accuracy of a diagnosis, as they involve less subjectivity and can confirm that a given pathogen is present [12,13]. Even so, molecular and serological diagnostics (hereafter, laboratory diagnostics) do have limitations, particularly for epidemiological surveillance. As laboratory diagnostics are often not the standard protocol, an infected person first has to present with symptoms in a medical setting and the clinician has to decide to use a laboratory diagnostic. This is particularly unlikely to happen for emerging infectious diseases, as clinicians may not be aware of the pathogen or that it is in circulation [14]. In this context, laboratory diagnostics may also suffer from low sensitivity and specificity, high cost, or unavailability in settings with limited resources [12,15]. Additionally, serological diagnostics often suffer from cross-reactivity across related viruses, which can lead to uncertainty in identifying the disease-causing pathogen [16]. As a consequence of factors such as these, retrospective analyses of the 2003 SARS-CoV outbreak in China [17] and the 2020 SARS-CoV-2 epidemic in the United States [18] estimated that many more cases may have been clinically misdiagnosed than were known to surveillance systems.
Challenges associated with disease diagnosis are magnified in scenarios with co-circulating pathogens, particularly when the diseases that those pathogens cause are associated with similar symptoms [19,20]. Influenza and other respiratory pathogens, such as Streptococcus pneumoniae and respiratory syncytial virus (RSV), co-circulate during winter months in the Northern Hemisphere. The difficulty of correctly ascribing an etiology in this setting is so widely accepted that clinical cases caused by a variety of pathogens are often collated for surveillance purposes as "influenza-like illness" [21]. Similar issues occur in malaria-endemic regions [19,22]. One study in India found that only 5.7% of commonly diagnosed "malaria-infected" individuals actually had this etiology, while 25% had dengue instead [19].
One set of pathogens with potential for misdiagnosis during co-circulation includes three viruses transmitted by Aedes aegypti and Ae. albopictus mosquitoes: dengue virus (DENV), chikungunya virus (CHIKV), and Zika virus (ZIKV). Some symptoms of the diseases they cause can facilitate differential diagnosis, such as joint swelling and muscle pain with CHIKV infection [23,24] and a unique rash with ZIKV infection [25,26]. Other symptoms, such as malaise and fever, could result from infection with any of these viruses [23][24][25][26][27][28]. In one region of Brazil with co-circulating DENV, CHIKV, and ZIKV, Braga et al. [28] empirically estimated the accuracy of several clinical case definitions of Zika by ground truthing clinical diagnoses against molecular diagnoses. They found that misdiagnosis based on clinical symptoms was common, with sensitivities (true-positive rate) and specificities (true-negative rate) as low as 0.286 and 0.014, respectively.
Although the estimates by Braga et al. [28] provide valuable information about misdiagnosis at the level of an individual patient, they do not address how these individual-level errors might have affected higher-level descriptions of Zika's epidemiology during its 2015-2017 epidemic in the Americas. The Pan American Health Organization (PAHO) reported 169,444 confirmed and 509,970 suspected cases of Zika across 43 countries between September, 2015 and July, 2017 [29]. Meanwhile, PAHO reported 675,476 and 2,339,149 confirmed and suspected cases of dengue and 180,825 and 499,479 confirmed and suspected cases of chikungunya, respectively, during the same timeframe in the same region [30,31]. The substantial errors in clinical diagnosis reported by Braga et al. [28], combined with the large number of cases lacking a molecular diagnosis [29][30][31], leave open the possibility that a considerable number of cases could have been misdiagnosed during the 2015-2017 Zika epidemic.
Our goal was to quantify the possible extent of misdiagnosis during the 2015-2017 Zika epidemic by leveraging passive surveillance data for dengue, chikungunya, and Zika from 43 countries in the Americas in conjunction with empirical estimates of sensitivity and specificity. Our methodology was flexible enough to use either or both of suspected and confirmed cases, given that their availability varied and they both offered information about reported cases of these diseases. To account for variability in diagnostic accuracy, we made use of joint probability distributions of sensitivity and specificity, one for clinical diagnostics and two for laboratory diagnostics, informed by empirical estimates. This feature of our analysis allowed for generalization beyond the six specific clinical diagnostic criteria quantified by Braga et al. [28]. Using this approach, we updated estimates of Zika reported cases during its 2015-2017 epidemic across the Americas.

Methods
To quantify the degree of misdiagnosis during the Zika epidemic, we leveraged passive surveillance data on Zika, dengue, and chikungunya for 43 countries in the Americas and formulated a Bayesian model of the passive surveillance observational process. Our observation model was informed by the observed proportion of Zika and empirically estimated misdiagnosis rates (Fig 1). We used the model to generate revised estimates of the number of Zika cases that occurred during the 2015-2017 Zika epidemic across the Americas (Fig 1).

Data
We used suspected and confirmed case data for dengue, chikungunya, and Zika from PAHO for 43 countries in the Americas (full time series data available from github.com/roidtman/zika_misdiagnosis). We differentiated between confirmed and suspected cases on the basis of laboratory diagnosis versus clinical diagnosis as specified by the World Health Organization (WHO) [32]. A suspected case was defined as a person presenting with rash and/or fever and either arthralgia, arthritis, or conjunctivitis [32]. A confirmed case was defined as a person with laboratory confirmation of ZIKV infection due to "presence of ZIKV RNA or antigen in serum or other samples" (i.e., "molecular diagnosis"), or "IgM antibody against ZIKV positive and PRNT90 for ZIKV" (i.e., "serological diagnosis") [32]. Given the lack of information regarding when and where molecular diagnosis versus serological diagnosis occurred, we considered these two diagnostic scenarios as equally likely and assumed they represented either end of the spectrum, with reality falling somewhere in between. Therefore, we assumed that confirmed cases were all identified using either reverse transcription-polymerase chain reaction (RT-PCR) to detect the presence of ZIKV RNA or IgM assays against ZIKV. Given the cost and logistical complexity of implementing PRNT90 on a large scale [33], we assumed that PRNT90 would not have been used to an extent that it would meaningfully influence the accuracy of reported case data on a country level.
For confirmed and suspected cases of chikungunya, we used manual extraction and text parsing algorithms in Perl to automatically extract data from epidemiological week (EW) 42 of 2013 through EW 51 of 2017 [31]. For confirmed and suspected cases of Zika, we used the skimage [34] and numpy [35] packages in Python 3.6 to automatically extract reported case data from epidemic curves for each country from PAHO from EW 39 of 2015 to EW 32 of 2017 [29]. For confirmed and suspected cases of dengue, we downloaded weekly case data available from PAHO from EW 42 of 2013 to EW 51 of 2017 [30]. We restricted analyses to EW 42 of 2015 (the beginning of the fourth quarter of 2015) to EW 32 of 2017 (the last week with Zika data in our dataset) (Fig 2). Although transmission of all three of these pathogens continued after then, we restricted our analysis to this time frame because it spanned the entirety of the World Health Organization's Public Health Emergency of International Concern (PHEIC) in addition to weeks prior to then and a nearly 40-week period after the PHEIC ended. Given the variability in week-to-week reporting of dengue, chikungunya, and Zika, we aggregated weekly data to a monthly time scale.

Probabilistic estimates of sensitivity and specificity
Due to variability in the sensitivity and specificity of different diagnostic criteria, we treated se and sp as jointly distributed random variables informed by empirical estimates (S1 Fig). To describe variability in misdiagnosis for molecular diagnostic criteria (i.e., RT-PCR), we included two empirical estimates of molecular sensitivity and specificity that were established with ZIKV RT-PCR on a panel of samples with known RNA status for ZIKV, DENV, CHIKV, or yellow fever virus [36] (Table 1). To describe variability in misdiagnosis for serological diagnostic criteria (i.e., IgM assays), we included 21 empirical estimates of serological sensitivity and specificity that were established with various ZIKV IgM immunoassays on panels of samples with known status for ZIKV, DENV, or CHIKV [37][38][39][40][41][42][43] (Table 1). To describe variability in misdiagnosis for clinical diagnostic criteria, we included six empirical estimates of sensitivity and specificity that were measured in a region of Brazil with co-circulating ZIKV, DENV, and CHIKV [28] (Table 1). These empirical estimates of sensitivity and specificity were derived by clinically diagnosing a patient with Zika, dengue, or chikungunya based on different clinical case definitions, and then ground truthing against the case's etiology determined by RT-PCR [28]. We used the sample mean, $\mu$, and sample variance-covariance matrix, $S$, for the molecular and clinical misdiagnosis rates as the mean and covariance in two independent, multivariate normal distributions, such that

$$(se, sp) \sim \mathrm{MVN}(\mu, S) \tag{1}$$

for each of the molecular, serological, and clinical diagnostic distributions. Our analysis involved some simplifying assumptions about the representativeness of the classification accuracies of different diagnostics. First, clinical misdiagnosis rates were estimated using a cross-sectional study to quantify the diagnostic performance of clinical case definitions proposed for suspected Zika cases [28].
Although this study was conducted in only one setting (Rio de Janeiro, Brazil), the sensitivities and specificities span six different case definitions from 2015 and 2016. Given the lack of additional studies of this nature in other settings, we extended a distributional description of variation in sensitivity and specificity across these six case definitions to elsewhere in the Americas for the full period of our analysis. Second, estimates of molecular, serological, and clinical sensitivities and specificities were all in reference to ZIKV only. We did not have specific information regarding differences in sensitivity and specificity depending on if the etiology of a case was DENV versus CHIKV. Therefore, to use those empirical, ZIKV-specific misdiagnosis rates, we combined chikungunya and dengue cases together to represent reported cases that were not attributed to Zika. In this way, we classified cases as either Zika or chikungunya/dengue.

Table 1 (partial). Sensitivity and specificity of serological and clinical diagnostics.

| Type | Diagnostic | Sensitivity | Specificity | Reference |
|---|---|---|---|---|
| Serological | Novatec IgM | 0.65 | 0.54 | [39] |
| Serological | Inbios IgM (ZIKV Detect) | 1.0 | 0.74 | [39] |
| Serological | MAC-ELISA with ZIKV PRNT positive | 1.0 | 0.11 | [41] |
| Serological | MAC-ELISA with ZIKV RT-PCR positive | 0.14 | 1.0 | [41] |
| Serological | MAC-ELISA with both sample types | 0.82 | 0.72 | [41] |
| Serological | Diasorin Liaison with ZIKV PRNT positive | 0.85 | 0.56 | [41] |
| Serological | Diasorin Liaison with ZIKV RT-PCR positive | 0.29 | 1.0 | [41] |
| Serological | Diasorin Liaison with both sample types | 0.74 | 0.86 | [41] |
| Serological | Zika Virus IgG/IgM Antibody Rapid Test | 0.714 | 0.233 | [42] |
| Clinical | PAHO-2015 | 0.813 | 0.109 | [28] |
| Clinical | CDC-2016 | 1.0 | 0.014 | [28] |
| Clinical | PAHO-2016 | 0.583 | 0.519 | [28] |
| Clinical | ECDC-2016 | 0.809 | 0.580 | [28] |
| Clinical | WHO-2016 | 0.756 | 0.635 | [28] |
| Clinical | Brasil(MoH)-2016 | 0.286 | 0.973 | [28] |

PLOS NEGLECTED TROPICAL DISEASES. https://doi.org/10.1371/journal.pntd.0009208.t001
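The construction of a joint distribution of sensitivity and specificity can be sketched as follows, using the six clinical estimates from Braga et al. [28] listed in Table 1. This is a minimal illustration rather than the authors' code: the function names and the random seed are assumptions, and the bivariate normal draws are produced with a hand-written Cholesky factor so that only the Python standard library is needed.

```python
import math
import random

# (sensitivity, specificity) pairs for the six clinical case definitions
# from Braga et al. [28], as listed in Table 1.
CLINICAL = [
    (0.813, 0.109),  # PAHO-2015
    (1.000, 0.014),  # CDC-2016
    (0.583, 0.519),  # PAHO-2016
    (0.809, 0.580),  # ECDC-2016
    (0.756, 0.635),  # WHO-2016
    (0.286, 0.973),  # Brasil(MoH)-2016
]


def mean_and_cov(pairs):
    """Return the sample mean vector and 2x2 sample covariance (n - 1 denominator)."""
    n = len(pairs)
    m_se = sum(se for se, _ in pairs) / n
    m_sp = sum(sp for _, sp in pairs) / n
    v_se = sum((se - m_se) ** 2 for se, _ in pairs) / (n - 1)
    v_sp = sum((sp - m_sp) ** 2 for _, sp in pairs) / (n - 1)
    c = sum((se - m_se) * (sp - m_sp) for se, sp in pairs) / (n - 1)
    return (m_se, m_sp), ((v_se, c), (c, v_sp))


def draw_se_sp(mu, cov, n_draws, rng):
    """Draw (se, sp) pairs from a bivariate normal using its Cholesky factor."""
    (v_se, c), (_, v_sp) = cov
    l11 = math.sqrt(v_se)
    l21 = c / l11
    l22 = math.sqrt(v_sp - l21 ** 2)
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        draws.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return draws


mu, cov = mean_and_cov(CLINICAL)
samples = draw_se_sp(mu, cov, 1000, random.Random(1))  # seed is arbitrary
```

The negative off-diagonal covariance reflects the trade-off between sensitivity and specificity across case definitions. Because a normal distribution is unbounded, some draws can fall outside the interval from 0 to 1; this sketch does not filter them.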

Probabilistic estimates of the proportion of Zika
Our analysis made use of the proportion of cases that were diagnosed as confirmed or suspected Zika, $\hat{p}_{Z,c}$ and $\hat{p}_{Z,s}$, where c and s refer to confirmed and suspected cases and the hat notation refers to observed data. Rather than using the point estimate for $\hat{p}_{Z,c}$ or $\hat{p}_{Z,s}$, however, we worked with Bayesian posterior estimates of these variables obtained directly from reported Zika cases, $\hat{C}_{Z,c}$ and $\hat{C}_{Z,s}$, and reported dengue and chikungunya cases, $\hat{C}_{O,c}$ and $\hat{C}_{O,s}$, as defined by the beta-binomial conjugate relationship [44]. This assumed that the number of reported cases of Zika was a binomial draw from the total number of reported cases of these three diseases combined, with a beta-distributed probability of success, $\hat{p}_{Z,c}$ or $\hat{p}_{Z,s}$. We assumed uninformative priors on $\hat{p}_{Z,c}$ and $\hat{p}_{Z,s}$; i.e., $\hat{p}_{Z,c} \sim \mathrm{beta}(1, 1)$ and $\hat{p}_{Z,s} \sim \mathrm{beta}(1, 1)$. Therefore,

$$\hat{p}_{Z,c} \sim \mathrm{beta}(1 + \hat{C}_{Z,c},\ 1 + \hat{C}_{O,c}) \tag{2}$$

and

$$\hat{p}_{Z,s} \sim \mathrm{beta}(1 + \hat{C}_{Z,s},\ 1 + \hat{C}_{O,s}). \tag{3}$$
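As a minimal sketch of the beta-binomial conjugate update described above, the following draws posterior samples of the observed proportion of Zika for a single data point. With a beta(1, 1) prior and a binomial likelihood, the posterior is beta(1 + Zika cases, 1 + other cases). The case counts, function name, and seed are hypothetical and are not taken from the PAHO data.

```python
import random


def posterior_draws(zika_cases, other_cases, n_draws, rng):
    """Draw from beta(1 + zika_cases, 1 + other_cases), the posterior under a beta(1, 1) prior."""
    return [rng.betavariate(1 + zika_cases, 1 + other_cases) for _ in range(n_draws)]


# Hypothetical counts for one country and time point (not PAHO values).
zika_suspected = 120
other_suspected = 480  # dengue and chikungunya combined

draws = posterior_draws(zika_suspected, other_suspected, 1000, random.Random(1))

# Analytical posterior mean, useful as a sanity check on the draws.
posterior_mean = (1 + zika_suspected) / (2 + zika_suspected + other_suspected)
```

One consequence of this formulation is that a data point with zero reported Zika cases still yields a proper posterior, beta(1, 1 + other cases), rather than a point estimate of exactly zero.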

Observation model of misdiagnosis
We considered the variables $p_{Z,c}$ and $p_{Z,s}$ to be intermediate steps towards calculation of the variable that we ultimately sought to estimate, $p_Z$. To calculate this final estimate of the proportion of reported cases resulting from ZIKV infection among reported cases of all three diseases, we mathematically related $\hat{p}_Z$ to $p_Z$ using diagnostic sensitivity and specificity, such that

$$\hat{p}_Z = p_Z \, se + (1 - p_Z)(1 - sp). \tag{4}$$

We then rearranged Eq 4 to solve for

$$p_Z = \frac{\hat{p}_Z - 1 + sp}{sp - 1 + se}. \tag{5}$$

From Eq 5, we determined two constraints for how $se$, $sp$, and $\hat{p}_Z$ can relate to one another. The first was $\hat{p}_Z \le se$, which follows from $0 \le p_Z \le 1$, or $0 \le \frac{\hat{p}_Z - 1 + sp}{sp - 1 + se} \le 1$, and then simplifying the inequality. The second was $se + sp \ne 1$, as $se + sp = 1$ would lead to zero in the denominator of Eq 5. These constraints (Eqs 4 and 5) and subsequent constraints were applied independently to confirmed and suspected cases (see S2 Fig for an example of these constraints applied at different points of the epidemic).
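The rearrangement can be sketched in a few lines. The sketch assumes the observation model takes the standard form in which the observed proportion equals p * se + (1 - p) * (1 - sp), so that the revised proportion is (p_hat - 1 + sp) / (sp - 1 + se). The function name, the tolerance, and the choice to return None for inadmissible inputs are assumptions made for illustration.

```python
def revised_proportion(p_hat, se, sp, tol=1e-12):
    """Solve p_hat = p * se + (1 - p) * (1 - sp) for p.

    Returns None when se + sp equals 1 (zero denominator) or when the
    solution falls outside [0, 1], i.e. when a constraint is violated.
    """
    denom = sp - 1.0 + se
    if abs(denom) < tol:
        return None
    p = (p_hat - 1.0 + sp) / denom
    if p < 0.0 or p > 1.0:
        return None
    return p


# Admissible combination: 30% of cases diagnosed as Zika, se = 0.8, sp = 0.9.
ok = revised_proportion(0.3, 0.8, 0.9)

# Observed proportion exceeds the sensitivity, so the solution exceeds 1.
too_low_se = revised_proportion(0.3, 0.2, 0.9)

# se + sp = 1: the diagnosis carries no information and Eq 5 is undefined.
degenerate = revised_proportion(0.3, 0.6, 0.4)
```

In the first call the revised proportion is 2/7, slightly below the observed 0.3, because at these accuracies false positives outnumber false negatives.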
Next, we used samples of $p_{Z,c}$ and $p_{Z,s}$ estimated from Eq 5 to define a single distribution of $p_Z$. As estimates of $p_{Z,c}$ and $p_{Z,s}$ were between 0 and 1, we approximated beta distributions for each using the fitdistr function in the MASS package in R [45] fitted to posterior samples of $p_{Z,c}$ and $p_{Z,s}$. We then defined the probability of a given value of $p_Z$ as given by Eq 6, where $X$ ranges from 0 to 1. We then multiplied random draws of $p_Z$ from the distribution of $p_Z$ defined by Eq 6 by the total number of reported cases to obtain revised estimates of Zika cases, $C_Z$, and of dengue and chikungunya cases, $C_O$.
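Approximating a set of samples on the unit interval with a beta distribution can be illustrated as below. Note that this sketch uses the method of moments, which is a simpler substitute for the maximum-likelihood fit that fitdistr performs in R, so the resulting parameters would generally differ slightly. The function name and the synthetic samples are assumptions.

```python
import random
import statistics


def fit_beta_moments(samples):
    """Method-of-moments estimates (alpha, beta) for samples in (0, 1)."""
    m = statistics.mean(samples)
    v = statistics.variance(samples)
    if not (0.0 < m < 1.0) or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("samples are not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Synthetic stand-in for posterior samples of a revised proportion.
rng = random.Random(1)
samples = [rng.betavariate(4.0, 12.0) for _ in range(5000)]

alpha_hat, beta_hat = fit_beta_moments(samples)
```

The guard on the variance matters in practice: samples that pile up at 0 or 1 can have a variance too large for any beta distribution with that mean, in which case the moment estimates would be negative.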

Applying the observation model
To apply our observation model of misdiagnosis to empirical data, we first drew 1,000 samples from the beta distributions of $\hat{p}_{Z,c}$ and $\hat{p}_{Z,s}$ and 1,000 samples from the multivariate normal distributions describing sensitivities and specificities of molecular and clinical diagnostics. We applied our observation model to one baseline scenario and three alternative scenarios with different spatial and temporal aggregations to assess the sensitivity of our results to different ways of aggregating reported case data: country-specific temporal data (baseline scenario, 4,214 data points); country-specific cumulative data (alternative scenario, 43 data points); region-wide temporal data (alternative scenario, 98 data points); and region-wide cumulative data (alternative scenario, 1 data point). Given that the PAHO data were made available to the public in a country-specific, temporal manner, we considered this as the baseline scenario. Region-wide aggregation indicates that all countries were aggregated into one spatial unit, while cumulative reported case data indicates that all time points were aggregated into one time unit. Under each of these scenarios, we quantified posterior distributions of $p_Z$, drew 1,000 Monte Carlo samples of $p_Z$, and obtained distributions of $C_Z$ and $C_O$.
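The four aggregation scenarios differ only in which dimensions of the case data are summed before the observation model is applied. The sketch below shows this with toy data; the record layout, function name, and counts are hypothetical. In the source data, 43 countries and 98 time points give 43 × 98 = 4,214 data points in the baseline scenario.

```python
from collections import defaultdict


def aggregate(records, by_country, by_time):
    """Sum (zika, other) counts, keeping only the requested dimensions.

    records maps (country, time) -> (zika_cases, other_cases).
    """
    totals = defaultdict(lambda: [0, 0])
    for (country, time), (zika, other) in records.items():
        key = (country if by_country else "region", time if by_time else "all")
        totals[key][0] += zika
        totals[key][1] += other
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical toy data: 2 countries x 3 time points (not PAHO values).
records = {
    ("A", 1): (0, 50), ("A", 2): (10, 40), ("A", 3): (30, 20),
    ("B", 1): (5, 5), ("B", 2): (20, 10), ("B", 3): (2, 8),
}

baseline = aggregate(records, by_country=True, by_time=True)
country_cumulative = aggregate(records, by_country=True, by_time=False)
region_temporal = aggregate(records, by_country=False, by_time=True)
region_cumulative = aggregate(records, by_country=False, by_time=False)
```

Because the revised proportion is a nonlinear function of the observed proportion once the constraints are applied, running the model on aggregated counts is not equivalent to aggregating the results of the baseline scenario, which is why the scenarios can disagree.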

Illustrative example
We constructed a simple example with two generic diseases, A and B, to illustrate the relationship between reported cases and revised cases under different misdiagnosis scenarios. For these generic diseases, we varied the total cases of A and B such that the proportion of cases diagnosed as A, $\hat{p}_A$, varied from high to low. We used combinations of sensitivity and specificity that spanned all combinations of low, intermediate, and high misdiagnosis scenarios. Using the same methods applied to reallocate Zika, dengue, and chikungunya cases, we revised estimates of reported cases of disease A in light of misdiagnosis with disease B. Reported cases of disease A were not revised when sensitivity and specificity were both low (Fig 3, bottom left), which was due in some cases to the constraint of $\hat{p}_A \le se$ not being met and in other cases to the sum of sensitivity and specificity equaling 1. When $\hat{p}_A$ was high (Fig 3, pink lines), revised cases of A were similar to observed cases of A, as only high sensitivities were possible across a range of specificities (Fig 3, top row). With high sensitivities (Fig 3, top row), misdiagnosis only occurred with B misdiagnosed as A. When $\hat{p}_A$ was low (Fig 3, purple lines), revised cases of A were higher than observed cases, as only high specificities were possible across a range of sensitivities (Fig 3, right column). With high specificities (Fig 3, left column), misdiagnosis only occurred with A misdiagnosed as B. When $\hat{p}_A$ was intermediate (Fig 3, green lines), misdiagnosis occurred both ways, as a range of sensitivity and specificity values were possible. This resulted in scenarios in which revised reported cases of A were higher or lower than the observed cases.
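The direction of misdiagnosis in the two limiting cases can be checked directly. The derivation below assumes the observation model has the form $\hat{p}_A = p_A\,se + (1 - p_A)(1 - sp)$ and takes perfect sensitivity and perfect specificity as idealized stand-ins for the "high" settings in the example.

```latex
% Rearranged observation model (assumed form):
p_A = \frac{\hat{p}_A - 1 + sp}{sp - 1 + se}

% Perfect sensitivity (se = 1, 0 < sp \le 1): no true A is missed, so every
% error is a B diagnosed as A and the revised proportion cannot exceed
% the observed one.
p_A = \frac{\hat{p}_A - 1 + sp}{sp} = 1 - \frac{1 - \hat{p}_A}{sp} \le \hat{p}_A

% Perfect specificity (sp = 1, 0 < se \le 1): no B is diagnosed as A, so every
% error is an A diagnosed as B and the revised proportion cannot fall
% below the observed one.
p_A = \frac{\hat{p}_A}{se} \ge \hat{p}_A
```

The second case also makes the constraint $\hat{p}_A \le se$ visible: if the observed proportion exceeded the sensitivity, the revised proportion $\hat{p}_A / se$ would exceed 1.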

Misdiagnosis through time
We estimated the revised proportion of reported cases of Zika among reported cases of all three diseases at each time point for each country. Under the molecular diagnostic scenario, we estimated that there were 74,200 (95% CrI: 35,400-109,500) disease episodes caused by ZIKV that were misdiagnosed as dengue or chikungunya cases in the fourth quarter of 2015, prior to the start of reporting of Zika in most countries (Fig 4). This resulted from $\hat{p}_Z$ being low early in the epidemic. Similar trends, albeit to a lesser extent, were observed in the serological diagnostic scenario, with approximately 3,100 (-39,100 to 67,100) disease episodes caused by ZIKV that were misdiagnosed as dengue or chikungunya in the fourth quarter of 2015 (Fig 4).

Fig 3. The gray line is the 1:1 line, which separates when revised cases of disease A are higher than reported (above) and when revised cases of disease A are lower than reported (below). Plots with no lines indicate that a constraint was broken ($se + sp \ne 1$ or $\hat{p}_A \le se$). Lines only span portions of the x-axis under which revised cases of disease A are positive. https://doi.org/10.1371/journal.pntd.0009208.g003

As reported Zika cases increased and peaked in 2016, the intensity of misdiagnosis increased (Fig 4), but the direction of misdiagnosis (i.e., whether there were more Zika cases incorrectly diagnosed as dengue or chikungunya, or vice versa) differed by country, depending on how much $\hat{p}_Z$ increased, and by laboratory testing method. The differences between the molecular and serological diagnostic scenarios were most notable at the peak of the epidemic, reflecting the much lower sensitivities and specificities associated with serological diagnostics (i.e., more opportunity for misdiagnosis) as compared to molecular diagnostics.

Revising cumulative estimates of the epidemic
We aggregated revised Zika cases to estimate the cumulative size of the epidemic and to compare our estimate to that based on surveillance reports. Comparing revised Zika cases across countries in the Americas, results were generally similar for the molecular and serological diagnostic scenarios, with higher degrees of uncertainty in the serological diagnostic scenario relative to the molecular diagnostic scenario (S1 Table). Differences across laboratory diagnostic scenarios in a few countries with high reported cases, such as Brazil and Nicaragua, led to differences in conclusions regarding the final size of the epidemic, with the molecular diagnostic scenario suggesting that the Zika epidemic was larger and the serological diagnostic scenario suggesting the Zika epidemic was smaller than presented in passive surveillance data alone.

Fig 4. Estimates of revised Zika cases after accounting for misdiagnosis with dengue and chikungunya. Top: Violin plots of the number of Zika cases that were misdiagnosed as chikungunya or dengue cases on a country-level, assuming confirmed cases arose from RT-PCR or IgM tests only, that were then aggregated across the region for visualization. Estimates above zero indicate that there were more Zika cases than observed and estimates below zero (gray region) indicate there were fewer Zika cases than observed. Bottom: Reported Zika, dengue, and chikungunya cases alongside revised estimates of Zika cases and associated uncertainty. Purple band is 95% CrI and green line is median estimate for the RT-PCR-only scenario and gray band with lavender line are estimates for the IgM-only scenario. https://doi.org/10.1371/journal.pntd.0009208.g004
Generally, in countries and territories with relatively high reported cases of Zika ($\hat{p}_Z$ close to 1), such as Suriname and the U.S. Virgin Islands, our revised estimates of $p_Z$ closely matched $\hat{p}_Z$ (Fig 5, bottom). In countries with relatively low reported cases of Zika ($\hat{p}_Z$ close to 0), such as Mexico and Belize, our revised estimates of $p_Z$ were higher than $\hat{p}_Z$ (Fig 5, bottom). In those countries that reported no Zika cases (i.e., $\hat{p}_Z = 0$), such as Bermuda and Chile, our estimates of $p_Z$ were much more uncertain (Fig 5, bottom).
According to the PAHO reports that we used, the Zika epidemic totaled 679,414 confirmed and suspected cases throughout 43 countries in the Americas. When we accounted for misdiagnosis among Zika, dengue, and chikungunya, we estimated that the Zika epidemic totaled 944,700 (95% CrI: 884,900-996,400) cases across the Americas under the molecular diagnostic scenario. Under the serological diagnostic scenario, we estimated that the Zika epidemic totaled 608,400 (95% CrI: 442,000-821,800) cases.
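For orientation, the median estimates above can be compared with the reported total by simple arithmetic. The figures are taken from the text (169,444 confirmed plus 509,970 suspected cases reported by PAHO); the percentages are rounded, and the comparison ignores the credible intervals.

```latex
% Reported total
169{,}444 + 509{,}970 = 679{,}414

% Molecular diagnostic scenario (median)
944{,}700 - 679{,}414 = 265{,}286
\qquad
\frac{944{,}700}{679{,}414} \approx 1.39 \quad (\text{about } 39\% \text{ larger})

% Serological diagnostic scenario (median)
608{,}400 - 679{,}414 = -71{,}014
\qquad
\frac{608{,}400}{679{,}414} \approx 0.90 \quad (\text{about } 10\% \text{ smaller})
```

Note that the serological credible interval (442,000-821,800) contains the reported total, so under that scenario the direction of the revision is itself uncertain.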

Estimates of epidemic size using different aggregations of data
We applied our observation model of misdiagnosis to a baseline scenario, with country-specific and temporal reported case data, and to three alternative scenarios with different temporal and spatial aggregations of the PAHO data. These alternative scenarios included temporal reported case data for the region as a whole (S7 and S8 Figs), cumulative reported case data for each country (S9 and S10 Figs), and cumulative reported case data for the region as a whole (Table 2). When using temporal case data for the region as a whole, our estimate of the overall size of the Zika epidemic was 1,039,600 (95% CrI: 984,700-1,103,000) for the molecular diagnostic scenario and 880,300 (95% CrI: 603,900-1,177,400) for the serological diagnostic scenario. Under this spatially aggregated scenario, the majority of misdiagnosis occurred during the height of the epidemic (S3 and S4 Figs). In our analysis using cumulative case data for each country, our estimate of the overall size of the epidemic was 844,600 (95% CrI: 724,400-957,300) for the molecular diagnostic scenario and 535,100 (95% CrI: 283,400-1,078,700) for the serological diagnostic scenario, with country-specific estimates of $p_Z$ not well-aligned with $\hat{p}_Z$ (S5 and S6 Figs). When using cumulative reported cases for the region as a whole, our estimate of the overall size of the Zika epidemic was 227,600 (95% CrI: 135,800-319,700) for the molecular diagnostic scenario and 464,900 (95% CrI: 19,100-1,792,800) for the serological diagnostic scenario.

Discussion
We leveraged empirical estimates of sensitivity and specificity for both clinical and laboratory diagnostics to revise estimates of the 2015-2017 Zika epidemic in 43 countries across the Americas. Applying our methods to data from PAHO under the molecular diagnostic scenario, we found that more than 250,000 disease episodes diagnosed as chikungunya or dengue from September 2015 through July 2017 may have been caused by ZIKV instead. Our revised estimates of the Zika epidemic under the molecular diagnostic scenario suggest that the epidemic was nearly 40% larger than case report data alone would suggest. Additionally, under both laboratory diagnostic scenarios our estimates show that many of these instances of misdiagnosis occurred in 2015, prior to many countries reporting Zika cases to PAHO [29]. An illustrative example of our method showed that these results were driven by the relative numbers of reported cases of Zika and the two other diseases. Hence, differences in our results over time, across countries, and with respect to level of data aggregation resulted from differences in relative numbers of reported cases of Zika and these other diseases across the different ways of viewing the data that we considered. Even so, all of our estimates substantially underestimate the true number of ZIKV infections that likely occurred, given that our methods do not account for unreported infections [46].
Although we considered two scenarios for laboratory testing (i.e., molecular vs. serological), we believe the molecular diagnostic scenario to be the more representative scenario. First, molecular diagnostics were available much earlier in the Zika epidemic than were their serological counterparts [33]. Second, in the serological diagnostic scenario we considered, we used sensitivities and specificities of ZIKV IgM assays alone, even though the laboratory diagnosis recommendation with serological testing includes an additional step using PRNT90. When using IgM assays with this additional confirmation of ZIKV using PRNT90, the accuracy of the serological tests would have been higher [47]. Therefore, it is likely that the serological diagnostic scenario, where IgM assays and PRNT90 for ZIKV were used, would have had higher levels of sensitivity and specificity, more similar to the molecular diagnostic scenario using RT-PCR to detect ZIKV RNA. Throughout the remainder of the discussion, we focus more specifically on the molecular diagnostic scenario.

Table 2. Revised estimates of cumulative Zika cases misdiagnosed as dengue or chikungunya across the Americas using different spatial and temporal aggregations of reported case data. Positive numbers indicate some portion of cumulative dengue and chikungunya cases were of Zika etiology, while negative numbers indicate some portion of cumulative Zika cases were of dengue or chikungunya etiology. Columns: RT-PCR, IgM.
Some countries appeared to have more Zika cases than surveillance data alone suggest, such as Brazil and Bolivia, while others appeared to have fewer reported cases of Zika than surveillance data alone suggest, such as Venezuela and Jamaica. In Brazil and Bolivia, our country-specific cumulative estimates of the Zika epidemic were 22% and 58% larger than reported case totals, respectively. In Venezuela and Jamaica, our country-specific estimates of the Zika epidemic were 77% and 40% smaller than case report totals, respectively. These differences across countries can be explained by differences in the proportions of suspected Zika cases, $\hat{p}_{Z,s}$, through time. In Brazil and Bolivia, $\hat{p}_{Z,s}$ was less than 0.2 at nearly every time point, whereas it mostly ranged 0.2-0.8 in Venezuela and Jamaica. When $\hat{p}_{Z,s}$ was low, as in Brazil and Bolivia, the constraint that $se \ge \hat{p}_Z$ allowed sensitivities to span a larger range, including lower sensitivities that would have resulted in the inference that more cases diagnosed as dengue or chikungunya were caused by ZIKV. When $\hat{p}_{Z,s}$ was moderate to high, as in Venezuela and Jamaica, the constraint that $se \ge \hat{p}_Z$ limited sensitivities to higher values, resulting in the inference that fewer cases diagnosed as dengue or chikungunya were caused by ZIKV. Similarly, because of a trade-off between sensitivity and specificity for clinical diagnoses, these constraints on sensitivity also imposed constraints on specificity.
We considered alternative spatial and temporal aggregations of reported case data to assess the sensitivity of our methods and results. We found that using different aggregations of data led to different conclusions in multiple respects. Using cumulative data for the region as a whole led to the inference that the Zika epidemic was smaller than suggested by surveillance data, whereas using cumulative data at a country level led to the inference that the epidemic was larger than suggested by surveillance data, but with variation across countries. Using temporally explicit data led to the inference that the epidemic was even larger, regardless of whether data was aggregated at a country or regional level. Overall, these similarities and differences suggest greater consistency temporally than spatially in the relative numbers of reported cases of Zika, chikungunya, and dengue across countries. At least in the case of an emerging disease such as Zika, this suggests that it may be most important to prioritize temporal data when inferring patterns of misdiagnosis. With respect to the timing of inferred misdiagnoses, there were more visible differences between scenarios in which temporal data were aggregated at a country or regional level. When temporal data were aggregated at a country level, we inferred that the majority of misdiagnosis occurred prior to 2016. When temporal data were aggregated at the regional level, we inferred that the majority of misdiagnosis occurred during the epidemic in 2016.
Our observation model incorporated basic features of how passive surveillance data for diseases caused by multiple, co-circulating pathogens are generated, including the potential for misdiagnosis and differences in misdiagnosis rates by data type. With respect to other features of how data such as these are generated, there were some limitations of our approach. First, we aggregated chikungunya and dengue case data, meaning that we were unable to explore the potential for differences in the extent to which misdiagnosis occurred between each of these diseases and Zika. As there is cross-reactivity between Zika and dengue [48], but not Zika and chikungunya, when using serological diagnostic tests, there may have been more misdiagnosis between Zika and dengue compared to Zika and chikungunya, particularly in the serological diagnostic scenario. If additional studies resolve differences in diagnostic sensitivity and specificity of Zika compared to each of these diseases separately, our observation model could be extended to account for this. Second, our observation model relied on a limited, static set of empirical estimates of diagnostic sensitivity and specificity. Given that laboratory diagnosis was not immediately available to identify ZIKV infection and case definitions for clinical diagnosis evolved through time [28], our results could be an underestimation of the full extent of misdiagnosis that occurred throughout the epidemic, particularly early in the epidemic. Similarly, as the balance of diagnostics in use could vary in space or time, as could their sensitivities and specificities [49][50][51], incorporating more detailed information about diagnostic use and characteristics could improve future estimates using our observation model. 
Third, although confirmed Zika cases could have been diagnosed using either of two laboratory diagnostic tools (i.e., RT-PCR or IgM assays with PRNT90), we did not have specific information across the Americas about where and when molecular versus serological approaches were used. Furthermore, in the IgM-only testing scenario, we could not account for serological cross-reactivity [52] between ZIKV and DENV. We were therefore only able to consider the two extremes of these different laboratory diagnostic scenarios (i.e., only RT-PCR or only IgM assays), while the reality of the situation likely falls somewhere in between.
Although passive surveillance data have been central for understanding many aspects of the 2015-2017 Zika epidemic, our finding that there may have been on the order of 40% more reported cases of Zika than described in PAHO case reports underscores the need to consider the observation process through which passive surveillance data are collated. Here, we accounted for misdiagnosis in the observation process to revise estimates of the passive surveillance data on which numerous analyses depend [53][54][55]. The advancements made here contribute to our understanding of which pathogen may be circulating at a given time and place. By better accounting for the etiology of reported cases, it could become more feasible to implement pathogen-specific response measures, such as proactively testing pregnant women for ZIKV during a Zika epidemic [56,57]. Given the potential for synchronized epidemics of these and other co-circulating pathogens in the future [53,58], continuing to develop methods that disentangle which pathogen is circulating at a given time will be important in future epidemiological analyses based on passive surveillance data. Lastly, adding temporally and spatially detailed information about the deployment of different diagnostic strategies will help to refine analyses like these in the future.
Supporting information
S1 Top: Violin plots of the number of Zika cases that were misdiagnosed as chikungunya or dengue cases. Estimates above zero indicate there were more Zika cases than perceived and estimates below zero (gray region) indicate there were fewer Zika cases than perceived. Bottom: Reported Zika and dengue and chikungunya cases alongside revised estimates of Zika cases with associated uncertainty. Purple band is 95% CrI and green line is median estimate. (TIFF)
S4 Fig. Estimates of Zika cases after accounting for misdiagnosis using spatially aggregated data, assuming confirmed cases arose from IgM tests only. Top: Violin plots of the number of Zika cases that were misdiagnosed as chikungunya or dengue cases. Estimates above zero indicate there were more Zika cases than perceived and estimates below zero (gray region) indicate there were fewer Zika cases than perceived. Bottom: Reported Zika and dengue and chikungunya cases alongside revised estimates of Zika cases with associated uncertainty. Purple band is 95% CrI and green line is median estimate. (PDF)