Spatial and temporal invasion dynamics of the 2014–2017 Zika and chikungunya epidemics in Colombia

Zika virus (ZIKV) and chikungunya virus (CHIKV) were recently introduced into the Americas resulting in significant disease burdens. Understanding their spatial and temporal dynamics at the subnational level is key to informing surveillance and preparedness for future epidemics. We analyzed anonymized line list data on approximately 105,000 Zika virus disease and 412,000 chikungunya fever suspected and laboratory-confirmed cases during the 2014–2017 epidemics. We first determined the week of invasion in each city. Out of 1,122, 288 cities met criteria for epidemic invasion by ZIKV and 338 cities by CHIKV. We analyzed risk factors for invasion using linear and logistic regression models. We also estimated that the geographic origin of both epidemics was located in Barranquilla, north Colombia. We assessed the spatial and temporal invasion dynamics of both viruses to analyze transmission between cities using a suite of (i) gravity models, (ii) Stouffer’s rank models, and (iii) radiation models with two types of distance metrics, geographic distance and travel time between cities. Invasion risk was best captured by a gravity model when accounting for geographic distance and intermediate levels of density dependence; Stouffer’s rank model with geographic distance performed similarly well. Although a few long-distance invasion events occurred at the beginning of the epidemics, an estimated distance power of 1.7 (95% CrI: 1.5–2.0) from the gravity models suggests that spatial spread was primarily driven by short-distance transmission. Similarities between the epidemics were highlighted by jointly fitted models, which were preferred over individual models when the transmission intensity was allowed to vary across arboviruses. However, ZIKV spread considerably faster than CHIKV.


SPECIFIC COMMENTS
[Lines 110-111] The authors estimated that the origin for both epidemics was Barranquilla. How does this compare to what would be obtained on the basis of line-list data alone? Are the first reported cases in Barranquilla? If so, how does the model provide information on the origin of the epidemic, beyond what could be obtained from the data alone?
[Lines 132-134] Which cities are the possible "sources" of the affected cities in each longdistance transmission event? How long are the distances, and how do these compare to the mean transmission distance?
[Lines 142-144] The models are fit to CHIKV and ZIKV separately. The authors note that both arboviruses are transmitted by the same vector and were introduced into an immunologically naïve population. Therefore, one might expect that the spatiotemporal patterns of spread would not be different across arboviruses. The authors should consider an additional analysis in which they fit the model jointly for CHIKV and ZIKV, assuming that the parameters are the same for both arboviruses. They also might consider variants of this, such as allowing the intensity parameter to vary across arboviruses. Comparison of all models could then be made on the basis of DIC.
[Lines 178-181] The authors note that cities invaded later by CHIKV were smaller in population. However, in lines 163-164, the authors contend that population size of the susceptible city appeared uncorrelated with invasion dynamics, and it seems that this claim was supported by the best model having fixed at zero? How do the authors reconcile these two points? The delta DIC of the two best models was 0.9, which is below the threshold of 5-10 commonly used to identify the best-performing model. Might the claim that the model with fixed at zero is the best be overly strong in light of this and the observation that cities invaded late with CHIKV had smaller population sizes?
[Lines 202-203] I could not find the results for the analysis of the choice of cut-off in the Supporting Information. If these results are central to the methodological choices made in the manuscript, they should be included in the Supporting Information at a minimum.
[Lines 204-207] When you fit to all Colombian cities, the model cannot reproduce the distribution of invasion times for CHIKV and ZIKV (Fig. S7). Why is this the case? How are non-invaded cities accounted for in the model likelihood? If not already doing so, the authors could calculate the probability that each non-invaded city escaped invasion from t = 0 to the end of the epidemic time period. I think addressing the cities that were not invaded is important and presenting the analysis in which you only considered cities that were invaded could be misleading. The analysis should be capable of explaining both the invasion and lack of invasion of CHIKV and ZIKV in the country.
[Lines 245-250] Given the delta DIC of the two best performing models, it does not appear that the authors can make a claim one way or the other about the effects of the susceptible population size on invasion.
[Lines 298-307] Another potential challenge is that at the start of the epidemics, many CHIKV cases may have been misdiagnosed as DENV and many ZIKV cases may have been misdiagnosed as DENV and CHIKV. This could affect your estimate of the invasion week, given that weeks in which there were zero reported CHIKV cases, for instance, could have had DENV cases that were misdiagnosed and were in fact CHIKV. It appears that the authors analysis is robust to uncertainty surrounding the invasion week, but this is certainly a potential issue that may be worth exploring.
[Lines 314-320] Could the authors account for background importation rate of ZIKV and CHIKV, rather than assume that there was only a single introduction? A similar approach was taken by Guzzetta et al. [Lines 362-366] I found the notation in these equations to be extremely confusing. Does ! represent the timing of invasion of city k? Similarly, is " the timing of invasion of city j? Please clarify in the text.
[Lines 369-371 & 374-375] The minimum distance is calculated from pairwise distances. That assumes that the distance is a direct path from two cities. However, the authors may consider calculating the shortest path from two cities on a graph where the weight of each edge is the distance metric considered. You could use the Floyd-Warshall Algorithm to get the all-pairs shortest paths. For geographical distance, the shortest path should likely just be the pairwise distance, but that may not be true for the travel times.
[Line 380] Did the authors consider other spatial models, such as the radiation model? If not, why?
[Lines 445-446] Convergence can be assessed using the Gelman-Rubin statistic, which provides a quantitative measure of convergence beyond visual inspection alone.
[Lines 453] The authors validate their estimates of the week of invasion by comparing the probability distribution obtained to the observed invasion weeks. However, no validation of the model parameters is done. The authors have a framework to simulate epidemics. They could therefore simulate synthetic data sets with the median parameter estimates that they obtained from the real data, and then re-run their analysis on this synthetic data set, confirming that they are able to re-obtain the fitted parameter estimates.