A generalized framework for estimating snakebite underreporting using statistical models: A study in Colombia

Background Snakebite envenoming is a neglected tropical disease affecting deprived populations, and its burden is underestimated in some regions where patients prefer using traditional medicine, case reporting systems are deficient, or health systems are inaccessible to at-risk populations. Thus, the development of strategies to optimize disease management is a major challenge. We propose a framework that can be used to estimate total snakebite incidence at a fine political scale. Methodology/Principal findings First, we generated fine-scale snakebite risk maps based on the distribution of venomous snakes in Colombia. We then used a generalized mixed-effect model that estimates total snakebite incidence based on risk maps, poverty, and travel time to the nearest medical center. Finally, we calibrated our model with snakebite data in Colombia from 2010 to 2019 using the Markov-chain-Monte-Carlo algorithm. Our results suggest that 10.19% of total snakebite cases (532.26 yearly envenomings) are not reported and these snakebite victims do not seek medical attention, and that populations in the Orinoco and Amazonian regions are the most at-risk and show the highest percentage of underreporting. We also found that variables such as precipitation of the driest month and mean temperature of the warmest quarter influences the suitability of environments for venomous snakes rather than absolute temperature or rainfall. Conclusions/Significance Our framework permits snakebite underreporting to be estimated using data on snakebite incidence and surveillance, presence locations for the most medically significant venomous snake species, and openly available information on population size, poverty, climate, land cover, roads, and the locations of medical centers. Thus, our algorithm could be used in other countries to estimate total snakebite incidence and improve disease management strategies; however, this framework does not serve as a replacement for a surveillance system, which should be made a priority in countries facing similar public health challenges.


Unfunded studies
Enter: The author(s) received no specific funding for this work. If the data are held or will be held in a public repository, include URLs, accession numbers or DOIs. If this information will only be available after acceptance, indicate this by ticking the box below. For example: All XXX files are available from the XXX database (accession number(s) XXX, XXX.).
• If the data are all contained within the manuscript and/or Supporting Information files, enter the following: All relevant data are within the manuscript and its Supporting Information files.
• If neither of these applies but you are able to provide details of access elsewhere, with or without limitations, please do so. For example: • All databases and code requiered to replicate our results will be held at a repository recommended by the editorial staff.
Data cannot be shared publicly because of [XXX]. Data are available from the XXX Institutional Data Access / Ethics Committee (contact via XXX) for researchers who meet the criteria for access to confidential data.
The data underlying the results presented in the study are available from (include the name of the third party and contact information or URL). This text is appropriate if the data are owned by a third party and authors do not have permission to share the data.
• * typeset Additional data availability information: Tick here if the URLs/accession numbers/DOIs will be available only after acceptance of the manuscript for publication so that we can ensure their inclusion before publication.

26
Methodology/Principal findings 27 First, we produced snakebite fine-scale risk maps based on venomous snakes' 28 distribution. Then, we proposed a generalized mixed effect model that estimates total 29 incidence based on risk maps, poverty, and travel time to the nearest medical center. snakes. Therefore, we developed an envenoming risk score incorporating the 134 ecological niche modeling scores for the two species (SNMS, see below). This risk 7 score is based on the law of mass action, which states that envenoming will be the 136 result of encounters between humans and snakes, and it will be proportional to the 137 multiplication of their abundances (16): In Eq. 3, I is the snakebite incidence, S is the snake's abundance, P is the rural Snake presence data 146 We Environmental layers 160 We used Bio-climate variables from the WorldClim database server  Table   165 S1. 166 Envenoming risk map estimation and validation 167 We predicted species habitat suitability using a maximum entropy algorithm because 168 we used only presence data (50). After performing the algorithm described in S1 169 text, we generated a set of maps representing the ten best distribution models for 170 each species over each species range to produce habitat suitability maps using a 171 cloglog output format (51). Then, we removed areas above known altitude thresholds 172 for both species (25,31). Finally, given that both species are not sympatric (25,31), 173 we combined all the possible permutations of these ten predictions to obtain 100 174 maps. These score maps are the snake niche modeling score per pixel (SNMS).

175
To select the best SNMS* map to be proposed as our final envenoming risk score

182
To compare SNMS* with reported risk, we used a type II linear regression by using

186
Computation of the accessibility score 187 We assumed that the distance to nearest medical center is the only factor affecting Where ̅ is the real incidence of snakebite, and 1 and 2 are the intersect and the 210 slope of the generalized lineal model respectively, θ accounts for a normal random 211 noise, and sub-index i denotes the geographical scale, in our case municipality.

212
Assuming the population as an offset we can re-write the model as (Eq. 6): Then, we defined the reporting of snakebite incidence as a counting Poisson 215 process, where the mean is the real incidence (Eq. 6) times a reporting fraction 216 (61). So, reported snakebite incidence can be modelled as: To estimate the reporting fraction we assumed that it will depend on the proposed 219 accessibility score and a poverty index (2,13,39,54,62,63). We used the unsatisfied 220 basic needs index as the poverty index, and we defined the accessibility score for 221 each municipality as the geographic-averaged travel time (64). By assuming a 222 general linear dependence, the model for reporting fraction is the following:

223
( 1 − ) = 1 + 2 + 3 (8) 224 We applied the logit function to the reporting fraction to guarantee that this fraction 225 will be between 0 and 1. In this part of the model, 1 , 2 , and 3 are parameters, 226 is the accessibility score, and is the poverty index. Thus, the complete model to difficult to sample because of social conflict (Fig S1). b) Snake niche modeling score for B. 263 atrox. Suitable areas are in the lowlands of the Orinoco and Amazonian basin, while non-264 suitable areas are located close to the piedmont. Note the low sampling effort in regions of 265 the distribution of the snake, where several pristine and isolated forests are located, and 266 data is absent (Fig S1). 267 The logarithm of SIVIGILA' reported risk had a significant linear correlation with 269 SNMS (Pearson's correlation coefficient: 0.77, p-value < 0.001, View Fig 2). This 270 association suggest that the main driver of snakebite risk is the abundance or habitat 271 suitability for venomous snakes. In addition, we found a group of high-risk 272 departments where Bothrops atrox is distributed. the two variables, which was quantified with a Pearson's correlation coefficient of 0.768 (p-281 value < 0.001). Note that departments with B. atrox make a group characterized by high 282 incidence and high envenoming risk score. 283

Underreporting estimation 284
The maximum index for the Gelman-Rubin diagnostic was 1.05 for 1 , and the  (Fig 3.a). Thus, each year the number of cases that do not get 293 medical attention in the country is similar to the reported cases of the most affected 294 department. Interestingly, departments where Bothrops asper is present report most 295 snakebite cases, but the departments with B. atrox are riskier (Fig 3).

Estimates were done with the calibrated model after parameters estimation by MCMC. a) 299
Yearly reported SIVIGILA cases (2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019). Note that the spatial distribution of the cases 300 is heterogeneous, where the department that reports most envenoming's is in Antioquia. b) 301 Yearly estimated total cases. Note that most municipalities increased in the scale of cases, 302 indicating that underreporting affects most of the country. c) Yearly reported SIVIGILA 303 snakebite risk. Areas with higher risk are in the Orinoco and Amazonian areas, where 304 Bothrops atrox is distributed. In addition, the riskiest departments are Casanare, Guaviare 305 and Vaupes. d) Yearly estimated total snakebite risk. Most of the country have more risk 306 than the reported ones, where departments as Amazonas and Arauca are now located in 307 the highest risk group of departments. On the other hand, departments with B. asper as 308 Cesar, Atlantico, and Norte de Santander are now the riskiest departments with this species. 309 Our underreporting estimates vary between 5.7% and 38.7% of total cases (Fig S3   310 shows the relationship between underreporting, travel time, and unsatisfied basic 311 needs). Areas located at the eastern and southern part of the Amazonian and 312 Orinoco regions are the most affected by underreporting, sharing the highest 313 snakebite risk (Fig 4, Fig 3.c-d). Moreover, the Pacific and northern Caribbean coast 314 also have a high underreporting, where these areas share high poverty and poor vial 315 infrastructure (Fig 4)(4,39,42,64,68,69).   339 The estimated distribution for both species (Fig 1) seems adequate since our 340 algorithm performance statistics (AUC and BIC) are greater than 0.5, and similar to 341 reported values for successful predictions for different NTDs (19-21,71,72). Given 342 that we selected models based on AICc values, we validated them using snakebite 343 incidence, and same algorithm has been used successfully on other snakebite 344 studies, we can rely on our distribution maps (Fig 2) (19,20). 345 We found that both species habitat suitability is higher on humid lowlands, and it is  352 Most of the country is under high snakebite envenoming risk (Fig 1). Higher risk 353 values tend to cluster in departments where Bothrops atrox is distributed (Fig 2), 366 atrox seems to be more active during day than B. asper (44,86,87). These 367 differences can explain the higher risk in these departments, but sadly that biological 368 data is absent in Colombia.

369
We believe that venomous snakes' natural history can help to understand better 370 snakebite (9). Studies in trophic ecology and habitat usage can help to determine 371 risky micro-habitat conditions which can be targeted to prevent snakebite (44,86,88).

372
In addition, reproduction studies can stablish mechanistically the temporal patterns

385
We estimated that 68% of the country is located at more than two hours from the 386 nearest medical center, while 36% of the country is more than 12 hours faraway from 387 these centers (View Fig S4). Thus, there are areas where time between 388 envenomation and medical attention is high, resulting in an increased burden and 389 underreporting (3,92). Colombian health authorities have made a considerable effort 390 to enhance the reporting system (37), but there is still a considerable 391 underestimation ( View Fig 3 and Fig 4): Snakebite is a life-threatening public health 392 problem that is still neglected, and our approach can help to quantify its threat.

403
Second, the assumption that the exposed population is the total rural population may Universidad de los Andes for funding this study.

429
All data and code used in this study will be deposited in a recommended repository 430 by the journal.

432
The authors declare no conflict of interest.