Correlation of the basic reproduction number (R0) and eco-environmental variables in Colombian municipalities with chikungunya outbreaks during 2014-2016

Chikungunya virus (CHIKV) emerged in Colombia in 2014 into a population presumed fully susceptible. This resulted in a quick and intense spread across Colombia, resulting in an epidemic that affected an estimated 450,000 people. The reported Colombian cases accounted for over 49% of all the cases reported to the PAHO. Eco-environmental factors are known to be associated with the spread of arboviruses such as CHIKV, and likely contribute to the differences in transmission profiles that were observed across several municipalities. To determine the association of eco-environmental factors and CHIKV, the basic reproduction number (R0) in 85 municipalities, which accounted for 65.6% of reported CHIKV cases in Colombia, was estimated. Estimates of R0 ranged from 1 to 9, where over 76% of municipalities had R0 values between 1 and 2. When we looked at the distribution of R0, the cumulative proportions were 20% with R0>2, 14% with R0>3, and 9% with R0>4. Next, we determined that there were different patterns of correlation between environmental and/or ecological variables and R0 when we considered different R0 lower-thresholds. Broadly, we found that temperature-related variables are significantly and positively correlated to R0 regardless of the lower threshold, while other variables like duration of outbreak and size of the urban area are inversely related to R0. Specifically, we conclude that high values of temperature-related variables where R0 > 1 will result in a fast growth of cases in a shorter time period (with faster cessation of outbreak transmission) but will result overall in a fewer total cases compared to outbreak areas (R0 > 1, but classified as lower). Thus, in the absence of vector control, a less explosive outbreak may be more advantageous for the virus in terms of transmission.

Introduction Chikungunya (CHIKV) is an arbovirus that causes a febrile illness, and belongs to the family Togaviridae and genus Alphavirus. CHIKV is transmitted by vectors belonging to genus Aedes, primarily Ae. aegypti and Ae. albopictus [1]. CHIKV had been limited to Asia and Africa until 2007 when the virus reached Europe, causing sporadic outbreaks, and finally the Americas in 2013, where it quickly caused over 1 million cases [2]. CHIKV emerged in Colombia in 2014, where the population was generally naïve. The subsequent Colombian epidemic of CHIKV accounts for approximately 49.15% of cases in Latin America as reported by the Pan American Health Organization (PAHO) [3].
One of the metrics used to describe the transmission dynamics of infectious diseases is the basic reproduction number (R 0 ), which is defined as the average number of secondary cases generated by a primary case introduced into a fully susceptible population [4,5]. It has been suggested for vector-borne pathogens that R 0 is the expected number of people that will be infected by one person initially infected by a vector [6,7]. It is theorized that an epidemic occurs when R 0 is greater than or equal to 1, although an outbreak is possible even with R 0 lower than 1 [7].
The generation time, which is the time between the infection of a primary case and one of its secondary cases, is crucial to estimate of R 0 but this value is often unknown [8]. Alternatively, a parameter called the serial interval has been described and is defined as the elapsed time between the onset of the symptoms of the primary and the secondary cases. This interval is more consistently observable in settings where clinical manifestation leads to case detection, but the serial interval is not readily extracted from epidemiological data [8]. Alternatively, some studies of arboviruses have assumed that the serial interval is the sum of estimates of the extrinsic incubation period of a virus (within the mosquito) and the intrinsic incubation periods of a virus (within the vertebrate host) [9,10]. This is appropriate, as it takes into account the properties of virus transmission both within the vector as well as the vertebrate and is the definition for serial interval we use herein. However, it is important to acknowledge that while the intrinsic incubation period for CHIKV has been estimated to be approximately 3 days, there are several estimates of the extrinsic incubation period, which depend on other extrinsic factors, most notably temperature, leading to a range of possible serial intervals [11][12][13][14][15].
While R 0 describes the epidemic potential of a pathogen, it can also be used to investigate modulation of this potential by external factors. For example, temperature is a known effector of mosquito-borne diseases, particularly as it relates to the vector-virus interaction [16][17][18][19]. Since arboviral transmission is affected by environmental and ecological variables, we wanted to determine whether any pattern could be discerned from administrative level incidence data in Colombia. Accordingly, we estimated R 0 of CHIKV from the Colombian National Institute of Health (INS) and investigated the correlation with many potentially influential eco-environmental variables to determine if and how they influence the CHIKV spread and incidence in an outbreak scenario. Further, given the differences in transmission dynamics across the municipalities, we investigated the discrepancy between the magnitude of R 0 and the number of cases.

Estimation of R 0 and variables of interest
R 0 was estimated using the White-Pagano methodology, which was shown to be consistent with other methodologies [9,20]. However, there are advantages to the White-Pagano method when estimates must be made on administrative incidence data. An important advantage of White-Pagano method is that serial interval can be directly approximated from epidemiological data. Given that the White-Pagano methodology allows for the estimation of R 0 without a priori observation of the exact serial interval, it is suitable for the estimation of R 0 for different municipalities with different arboviruses transmission conditions, such as temperature profiles, where the serial interval likely varies. However, the WP method does require an estimated range of possible serial intervals. Herein we assumed a possible serial interval of 1 or 2 weeks so that the unit of time matched the incidence data (as we do not know the distribution of cases in days within each epidemiological week), which allows for some range in estimation of CHIKV serial interval that could be related to variability in EIP [21][22][23][24]. The likelihood-based method used for the estimation of the R 0 and the serial interval (time between the onset of symptoms of first case and the onset of symptoms of the secondary case [8]) is as follows: where m t ¼ R 0 P minðk;tÞ j¼1 N t À jP j , N t is the incidence data; P j is the serial interval with j = 1, . . ., k; and t is the time of epidemiological data. Using this methodology, it is possible to estimate both R 0 and P j by using maximum likelihood techniques [20]. Thus, the White-Pagano methodology allows for estimation of the R 0 even when the serial interval is not known a priori, which makes this methodology suitable for the estimation of R 0 for arboviruses in different municipalities, i.e. with different temperatures which can modify the length of the extrinsic incubation period within the mosquito vector. However, this methodology is sensitive to the total number of cases in the system. Thus, we selected the Colombian municipalities with the highest number of cases in order to confidently apply this derivation of R 0 to comparisons across environmental variables of the municipalities [20].
With the range of serial intervals used to initiate the model of 1-2 weeks, we assumed a binomial distribution for the serial interval for all municipalities [20]. The analysis determined the probability of having a 1-or 2-week serial interval for each municipality and assigned the serial interval with the highest probability (S1 Fig). Additionally, given the 2-week maximum serial interval, case counts were included in the determination of the epidemic curve per municipality if new cases occurred within two weeks of the previous case. If a case occurs outside of this two-week window, it was considered sporadic or possibly the results of a new introduction and not included. Having shaped the epidemic curve for each local epidemic, the estimation of R 0 included cases starting from the week of the first case until the local peak of the epidemic. The outbreak decline phase of the epidemic is not taken into account for the R 0 estimation using this methodology [20].

Epidemiological and demographic data
Epidemiological data was obtained from Colombian INS which publishes through its web page (www.ins.gov.co) the reported cases collected by the national system of epidemiological surveillance (SIVIGILA). Data for CHIKV cases was taken for the years 2014 to 2016. Data collected through SIVIGILA is reported on a weekly basis, i.e. weekly number of chikungunya cases per city reporting. Population projections developed by the Colombian National administrative department of Statistics (DANE according to its name in Spanish) were used to estimate population size per city and subsequently incidence. Incidence was estimated per 100,000 individuals per each city. Urban area size (termed "urbanicity") and map resources were obtained from the National Geostatistical Framework (MGN, according to its name in Spanish) of DANE.

Environmental data
Temperature data of each city were extracted from Worldclim climatic layers [25]. We used data of minimum temperature, maximum temperature and average temperature by using the coordinates taken from the center of the urban area of each municipality that were obtained from the software Google Earth Pro version 7.3.2.5491 (Google LLC, Mountain view, California, USA). The difference of temperature (maximum temperature minus minimum temperature) was estimated from that data.
For each municipality, we calculated the number of cumulative cases until epidemic peak ("Peak cCases"), the time to onset of detected outbreak in the municipality (the first week with reported cases in the municipality relative to the first week with reported cases in Colombia, "outbreak onset"), and duration of the outbreak (in weeks).

Statistical analysis
All analyses were performed in R version 3.5.1. After R 0 was estimated as above, we correlated the values of environmental and demographic variables with the R 0 estimate using Spearman's correlation across all municipalities. Correlations with p<0.05 were considered significant. Following this, we wanted to determine if there was a differential pattern of corollaries dependent on the intensity of transmission. As R 0 is a threshold parameter (lower bound of 1 to indicate outbreak likelihood), we did this by altering the (lower) threshold to indicate differential intensity of transmission. Thus, we conducted the same Spearman's correlation analysis for the following subsets of municipalities: R 0 > 2, R 0 > 2.5, and R 0 > 3. These were chosen based on the distribution of R 0 (see Results below).
We then wanted to determine if the variable values (such as temperature) were differentially associated depending on the R 0 value. For this, we divided municipalities into three pairwise comparisons: 1 < R 0 � 2 versus 2 < R 0 ; 1 < R 0 < 2.5 versus 2.5 < R 0 ; and 1 < R 0 < 3 versus 3 < R 0 . Comparisons were done using Wilcoxon rank sum test.
Maps were created in ArcGIS 10.4.1 from publicly available dataset described above and from our calculation of R 0 from those datasets.

Estimation of R 0 from CHIKV incidence data
From 846 municipalities that reported chikungunya cases between 2014 and 2016 we selected 85 that had a recognizable epidemic peak and reported more than 150 cases. The chosen municipalities comprised approximately 65.6% of Colombian reported cases (Fig 1). From those municipalities, the range of cases reported over the two years of peak CHIKV incidence in Colombia was 151 (city of Soledad) to 4179 cases (city of Cali). The city of Turbaco reported the highest number of cases in a single week, which was 589 cases.
The R 0 in the 85 municipalities ranked from 1.11 in Floridablanca (Department of Santander) to 9.53 in the municipality of El Molino (La Guajira department). Most of the municipalities were estimated to have a one-week serial interval rather than a two-week serial interval (S1 Eco-environmental variables and Chikungunya basic reproduction number (R 0 ) to 9, where over 76% of municipalities had R 0 values between 1 and 2. When we looked at the distribution of R 0 , the cumulative proportions were 20% with R 0 >2, 14% with R 0 >3, and 9% with R 0 >4.
When we visualized the estimated R 0 versus CHIKV incidence in each municipality, we found the relationship to be relatively dichotomous with an apparent break at approximately R 0 > 2 (Fig 2). While the relationship between incidence and R 0 is obvious, the stark separation shown by this relationship was surprising and further lead us to investigate alternate lower bounds of 2.5 and 3 (see below). Table 1 outlines the results of the correlation analysis of 13 variables. When all municipalities were considered, the number of cases, mean, max, and min temperature, as well as the duration of the epidemic and size of urban area were all significantly correlated to R 0 . Similar to  Table 1. Spearman correlation coefficients (ρ) between R 0 and different variables when the lower threshold of R 0 is 2, 2.5, or 3. what is seen in Fig 2, there was a surprising negative correlation between R 0 and the duration and number of cases. Temperature was positively associated with R 0 values. We determined that the magnitude and direction of the correlation coefficients of some variables changed according to the lower bound of R 0 ( Table 2). When the lower bound of R 0 was greater than 2, the maximum temperature, total population, urban population, and incidence were significant. Number of cases and duration of outbreak was not significantly correlated with the value of R 0 , indicating that the relationship between the shape and size of the epidemic and R 0 is driven mostly by those outbreaks where 1�R 0 �2. As the lower bound was further raised to 2.5 and 3, correlation with these variables was not statistically significant. The only exception was a statistically significant and negative correlation between R 0 and the minimum temperature when the lower bound was 3, indicating that the biological trend of higher temperatures leading to more transmission is still apparent, but not captured with maximum temperature solely.

Association of R 0 when lower bound is modified with eco-environmental variables
The Wilcoxon rank sum test showed that there were differences in several variables when compared to groups of "low" versus "high" based on the lower bound of the R 0 estimates ( Table 2). We found consistent differences in the distribution of number of cases when all municipalities were considered (lower threshold R 0 = 1) versus the modified lower thresholds of 2 or 2.5. Mean, minimum, and difference in temperature was not consistently different across all comparisons. Mean and minimum temperature were only significant between the comparison of R 0 >1 versus R 0 >3; difference in temperature was only significant in the comparisons of R 0 >1 versus R 0 >2.5. Maximum temperature was different among all comparisons, indicating that there is a consistent relationship between this variable and increasing R 0 estimates.
Population size was not statistically associated with differences in R 0 estimates, and neither was the number of people in the urban center, incidence during the epidemic, population density, or the outbreak onset. Interestingly, both the duration of the epidemic and the size of the urban area were statistically significant in all comparisons (Figs 3 and 4). Eco-environmental variables and Chikungunya basic reproduction number (R 0 ) Maximum temperature was positively correlated and consistently differential among comparisons. Fig 3 shows this relationship where those municipalities in the lower group had a tendency towards lower temperatures than those municipalities in the higher R 0 . Interestingly, the total number of cases and size of the urban area were negatively correlated and consistently statistically different among lower threshold R 0 groupings. Surprisingly, lower R 0 groupings had a tendency to have higher incidence of CHIKV. The size of the urban area was also negatively and consistently different, with higher R 0 groupings having smaller urban areas. Finally, the duration of the epidemic tended to be shorter in those municipalities with higher R 0 estimates (Fig 4).

Discussion
The White-Pagano methodology allows for estimation of both the serial interval and R 0 using administrative level, epidemiological data [20]. This method is ideal for studies that rely on administrative level data, as it does not require a priori estimation of parameters related to mosquito populations, which may be needed on a per-municipality basis [26]. Using this method, we estimated that in municipalities that account for over 64% of the CHIKV cases in Colombia, the R 0 ranged from 1.11 to over 9. This wide range indicates heterogeneity in transmission that, since Colombia was assumed to be a completely susceptible population, could have been related to other factors than the lack of herd immunity.
We investigated several variables that have been associated with arbovirus transmission (and thus R 0 ) including temperature, urbanicity, and epidemiological characteristics [13,[27][28][29][30]. We confirmed what other studies have demonstrated, that higher temperatures (as measured by maximum temperature) is associated with several transmission factors that could lead to higher values of R 0 [12,18,27,[31][32][33][34][35][36][37]. Though we consider a spectrum of Eco-environmental variables and Chikungunya basic reproduction number (R 0 ) environmental, ecological, and social variables, it is important to note that many other variables likely affect transmission and reporting of cases at different spatial resolutions. Variables such as housing type/quality, income levels, water availability, or access to health services likely all play a role in risk of exposure to arboviruses [38,39]. However, mosquito density and other socio-ecological variables vary across and within municipalities and are not readily available at the country-level and thus the inclusion of these specific types of variables is of limited value to our model, but urban area size and density of population, included in our study, have been used as proxies for some of these other factors, such as mosquito population [38].
The finding that higher R 0 is associated with smaller urban areas is interesting, especially given that there was a lack of consistently significant relationship between R 0 and the total population of the urban area, the total municipality population, and the density of population indicated. Previous work had revealed that transmission by Aedes aegypti is positively associated with intermediate levels of human movement [40]. Particularly with CHIKV, there is evidence in Dhaka, Bangladesh that viral spread is mainly driven by transmission happening at distances that do not exceed neighborhood distances [41]. Our results indicate that higher R 0 values are associated with smaller urban areas, which could be indicative of limited movement of humans in that area. The result is highly focal transmission, which has been reported for CHIKV [42].
Of interest was the negative correlation of R 0 grouping with urbanicity, CHIKV case counts, and outbreak time-to-peak. That is, those municipalities with higher values of R 0 showed an explosive outbreak with a fast increase of the number of CHIKV cases but a shorter time-topeak. These higher R 0 municipalities also had overall lower case counts at the time of peak compared to municipalities that displayed lower values of R 0 , where the epidemic continued for many weeks with smaller peak values, but ultimately higher case counts. This phenomenon can be seen in Fig 5, where we plotted the course of epidemic of the three municipalities with highest values of R 0 and the three municipalities with the lowest values of R 0 (but still �1).
In summary, this work suggests that CHIKV outbreaks are affected by high maximum temperatures and smaller urban areas, which leads to higher estimates of R 0 . These higher R 0 values are then associated with explosive outbreaks with shorter times-to-peak and higher case counts at time of peak (Fig 5). This explosive nature of the initial outbreak leads to a lower overall case count compared to those outbreaks that had longer times-to-peak, lower peakcase counts, but cumulatively larger total number of cases. The tortoise-or-the-hare (TotH) model was proposed to also explain differences in infectious dynamics of arboviruses whereby arboviruses that adopt the tortoise strategy (long-lasting, moderate viremia) will persist longer than arboviruses that adopt the hare strategy (short-lived, high viremia) [43]. We show here that this TotH model, previously used to describe micro-dynamics, may also be suited to describe macro-level dynamics of arboviruses. In theory, the tortoise model would seem to be most advantageous for the virus, as it ultimately achieves the longest and most productive transmission. However, in the context of active surveillance and response where vector control may be activated and result in abrupt cessation or decrease in transmission, the hare model may ultimately represent the most fit population-level phenotype [40,41]. That is, in areas where control is implemented-such as in Brazil during the Zika epidemic, or Miami during local transmission there-a faster "out-of-the-gate" phenotype (the hare) may result in more transmission, which could be interpreted as the most fit phenotype in this scenario [44,45]. The TotH model of how to think about transmission represents an interesting relationship between perceived transmission intensity and the ultimate magnitude of transmission (i.e. number of cases). Thus, these findings could impact response prioritization, especially in areas where resources are limited. That is, more intense transmission may appear to be the most pressing concern, when in reality the less intense situations may need more intervention. . . And given the association of R 0 with eco-environmental variables, the TotH relationship may further have use in predicting allocation of resources at a focal level." Our findings suggest that the non-monotonic relationship between arbovirus transmission and temperature [46] can be at least in part detected with administrative level case counts and the relationships to the lower bounds of R 0 . And given the association of R 0 with eco-environmental variables, the TotH relationship may further have use in predicting allocation of resources at a focal level.