Using host traits to predict reservoir host species of rabies virus

Wildlife are important reservoirs for many pathogens, yet the role that different species play in pathogen maintenance frequently remains unknown. This is the case for rabies, a viral disease of mammals. While Carnivora (carnivores) and Chiroptera (bats) are the canonical mammalian orders known to be responsible for the maintenance and onward transmission of rabies Lyssavirus (RABV), the role of most species within these orders remains unknown and is continually changing as a result of contemporary host shifting. We combined a trait-based analytical approach with gradient boosting machine learning models to identify physiological and ecological host features associated with being a reservoir for RABV. We then used a cooperative game theory approach to determine species-specific traits associated with known RABV reservoirs. Being a carnivore reservoir for RABV was associated with phylogenetic similarity to known RABV reservoirs, along with other traits such as having larger litters and earlier sexual maturity. For bats, location in the Americas and geographic range were the most important predictors of RABV reservoir status, along with having a large litter. Our models identified 44 carnivore and 34 bat species that are currently not recognized as RABV reservoirs, but that have trait profiles suggesting their capacity to be or become reservoirs. Further, our findings suggest that potential reservoir species among bats and carnivores occur both within and outside of areas with current RABV circulation. These results show the ability of a trait-based approach to detect potential reservoirs of infection and could inform rabies control programs and surveillance efforts by identifying the types of species and traits that facilitate RABV maintenance and transmission.


Introduction
Most wildlife pathogens can infect multiple host species. However, typically only a few host species act as reservoirs, i.e., are responsible for maintaining a pathogen in a region in the long term and for transmitting it to other species of concern [1,2]. This is because most host species lack the intrinsic competency to contribute to transmission [1][2][3]. The likelihood that a host species is a reservoir depends on both its characteristics and the life cycle and infection biology of the pathogen, such that some host traits may favor maintenance of some pathogens, but not others. Determining whether host species have the characteristics to maintain certain pathogens can be extremely difficult in the field and often requires in-depth investigations. Thus, only a limited number of wildlife species have been examined as potential reservoir candidates (e.g., [4]) and the focus has been on those that overlap with people and domestic animals the most [5,6].
Neglecting the role of unrecognized reservoir species present in a community may have negative consequences for disease prevention and control [2]. For example, foot-and-mouth disease (FMD) in South Africa was previously perceived as circulating solely in African buffalo (Syncerus caffer) and livestock, but empirical evidence revealed that impala (Aepyceros melampus) may play a critical role for propagating FMD [7]. Further, given the current and future shifts in climatic and environmental conditions [8], wildlife community assemblages are expected to change [9,10]. Therefore, the role of species in reservoir communities is also likely to shift. This means that while a host species may not currently play a role in the transmission and persistence of a pathogen, its reservoir status may change in the future. Hence, there is a pressing need to develop approaches that can rapidly identify potential reservoir species, without necessarily having to perform in-depth, long-term field investigations.
One promising approach for discovering unknown reservoirs is to identify characteristics that 'known' reservoirs of a particular pathogen or pathogens have in common and use these traits to quantify the likelihood that other understudied species could act as reservoirs. This trait-based approach has only recently been used for understanding the ecology of infectious diseases in wildlife and plants (e.g., [11][12][13][14]) but has already identified some interesting patterns. For example, two traits that appear to emerge as important in different host-pathogen systems are animal birth rate and longevity (e.g., [11,15,16,17], but see [18]). Animals that tend to have a high birth rate and/or a short lifespan are predicted to be reservoirs for many types of pathogens (e.g., Borrelia burgdorferi and flaviviruses [19,20]). Thus, traits identified in typically well-studied, accessible species can be applied to less well-studied species, directing research and surveillance effort.
Here, we used a trait-based approach to identify candidate wildlife species potentially involved in the transmission and maintenance of rabies Lyssavirus (RABV). RABV continues to be a major public health concern as it is responsible for over 59,000 deaths each year [21], with economic costs estimated to be as high as $6 billion annually [22]. People generally become infected with RABV via the bite of an infected animal. While all mammal species can become infected with RABV, relatively few carnivore and bat species appear to act as reservoirs and sustain transmission independently [23,24]. In many developing countries, particularly African and Asian countries, the domestic dog (Canis lupus familiaris) is considered a primary reservoir [25][26][27]. While local wild carnivores can, in some cases, contribute to the maintenance of certain RABV variants [28][29][30], the role of most wildlife species remains relatively uncharacterized because of the overwhelming number of canine cases and lack of routine wildlife surveillance systems or diagnostic tests [31]. In countries with effective dog vaccination, domestic dogs no longer play a role in the maintenance of RABV [21,32]. However, RABV persists in many of these countries due to wildlife species that maintain independent RABV lineages [33,34]. For example, in the United States, over the past four decades, 90% of reported rabies cases have been from wildlife [35,36]. While key carnivore and bat species have been recognised as primary reservoirs [37], novel reservoirs for RABV are predicted to emerge due to recurring cross-species transmission and/or sustained transmission events (e.g., [38,39]). Thus, anticipating future spillover events is vital if we are to ensure current control programs continue to be successful.
We applied machine learning to life history and ecological data we compiled for both known and previously unidentified RABV reservoirs to (i) identify traits associated with being a reservoir for RABV; (ii) predict which species could be unrecognized or future reservoirs; (iii) determine the contribution of each specific trait to predicted reservoir status; and (iv) investigate the geographic distribution of known and predicted RABV reservoirs to identify hotspots of historic and potential RABV spillover and host shifts. While all mammals are generally susceptible to RABV, we focused on those within the orders Carnivora and Chiroptera because of their established role in the maintenance and onward transmission of RABV [40][41][42]. Further, since RABV does not circulate in bats outside of the Americas [43,44], we focused on bat species occurring in the Americas.

Reservoir assignment and data collection
To determine the reservoir status of each carnivore and bat species, we conducted a general review of the literature. The literature review was performed in 2017 and articles were collected from Google Scholar using the keywords: 'rabies' AND 'reservoir', followed by each species' scientific name. If the keyword 'rabies' and the species' scientific name appeared in articles, articles were read in full. Species were classified as reservoirs only if they fell under one of two definitions: a conservative and a liberal definition. The conservative definition labelled species as reservoirs for RABV if they were described as 'reservoirs' in the article and were associated with one or several genetically distinct virus variants [45]. The liberal definition labelled species as reservoirs if individuals of the species had been recorded as infected or had antibodies against RABV, and had been suggested in the article to play a role in RABV transmission (e.g., described as being 'a primary host'). We classified the species into the conservative or liberal group if this was supported by at least one article. Species outside these two groups were classified as not having enough evidence for being a reservoir for RABV. Reservoir assignment data are available at our online data repository 'Predicting-rabies-reservoirs' (https://github.com/worsl001/Predicting-rabies-reservoirs). Reservoir assignment data are also available at 'ReservoirFinder' (https://github.com/whit1951/ResevoirFinder), where wildlife reservoir classification of other multi-host pathogens can be deposited (e.g., Leptospira, Hantavirus, Leishmania).

Species traits
The majority of species traits were obtained from the PanTHERIA database [46]. Of the 45 PanTHERIA traits, 15 were examined for carnivores and 9 for bats. The other traits were excluded because more than 50% of species had missing values, traits had no hypothesized or plausible link to RABV reservoirs (e.g., mean monthly evapotranspiration rate), traits were highly correlated with other traits (i.e., ρ > 0.7; e.g., diet breadth and trophic level), or traits presented little to no variation (e.g., for bats, 97% of species had the same habitat breadth value). For bats, since the litter size trait was relatively uniform across species (median: 0.99, range: 0.98-3.12), it was reclassified into a binary variable (zero for litter size ≤1 and one for litter size >1). For carnivores, we included two additional traits gathered from the Animal Diversity Web (https://animaldiversity.org/): sociality and mono/polygamous. We also included information on species phylogenetic grouping based on well-resolved phylogenies for each group (carnivores [47]; bats [48]), to account for the statistical non-independence of species due to common ancestry [49]. We calculated the patristic distance (i.e., the sum of branch lengths between two tips) for each group and then applied Principal Coordinate Analysis (PCoA) to reduce the dimensions of each respective matrix. The first PCoA axis quantified the broadest variation across the phylogeny (e.g., suborder variation), with subsequent axes capturing progressively smaller amounts of phylogenetic variation (e.g., S1 Fig). We included the top three or four principal coordinate eigenvalues as traits (for carnivores, we excluded the fourth principal coordinate because it was highly correlated with age at sexual maturity). Nine carnivore and eight bat species were excluded because of having no trait data in the PanTHERIA database (
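The patristic distance described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the tree, its branch lengths, and the function names are hypothetical, and the subsequent PCoA step is not shown.

```python
def path_to_root(tip, parent):
    """Map the tip and each of its ancestors to the branch-length distance from the tip."""
    distances = {tip: 0.0}
    node, total = tip, 0.0
    while node in parent:
        ancestor, branch_length = parent[node]
        total += branch_length
        distances[ancestor] = total
        node = ancestor
    return distances


def patristic_distance(tip_a, tip_b, parent):
    """Sum of branch lengths on the path joining two tips."""
    from_a = path_to_root(tip_a, parent)
    from_b = path_to_root(tip_b, parent)
    # The most recent common ancestor gives the shortest combined path.
    return min(from_a[n] + from_b[n] for n in from_a if n in from_b)


# Hypothetical toy tree: child -> (parent node, branch length).
toy_tree = {
    "fox": ("canid_node", 2.0),
    "wolf": ("canid_node", 2.0),
    "canid_node": ("root", 3.0),
    "lion": ("root", 5.0),
}

tips = ["fox", "wolf", "lion"]
# Pairwise matrix of the kind that would then be passed to a PCoA.
distance_matrix = [[patristic_distance(a, b, toy_tree) for b in tips] for a in tips]
```

In this toy tree the two canids are separated by 4.0 units, while either canid is 10.0 units from the lion, so the first ordination axis of such a matrix would mainly separate canids from the felid.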

Identifying traits predictive of reservoir status
To identify traits that best predict the reservoir status of each species, we used gradient boosting machine (GBM) models in the statistical program R (version 4.0.2) [50] using the 'caret' and 'gbm' packages (version 6.0-86 and 2.1.8, respectively) [51,52]. We chose to use GBM models over more traditional regression techniques as GBM models offer a flexible and powerful classification approach that can model nonlinear effects and interactions and provide high predictive performance without overfitting [53,54]. Further, GBM models can efficiently analyze a large number of predictors, including categorical predictors, whilst accounting for missing data [54]. We followed the analytical framework proposed by Fountain-Jones et al. [55].
We ran two models for each mammal order (i.e., Carnivora and Chiroptera): 1) a conservative model using the conservative definition of a reservoir species for RABV; and 2) a liberal model using the liberal definition of a reservoir species for RABV. For the carnivore GBM models, we used five categorical and 15 continuous predictor variables (i.e., traits), and for the bat models we used two categorical and 11 continuous variables. For each model, species were split into two groups: a training set (80%) and a testing set (20%). Models were trained using 10-fold cross-validation of the training set. Since we had substantially fewer reservoirs than non-reservoirs in each dataset, we performed down-sampling, which randomly subsets the majority class in the training data to reduce class imbalance, as described elsewhere [55]. Cross-validation was used to determine model accuracy, sensitivity, and specificity based on a confusion matrix. Accuracy represents the proportion of all species that were correctly classified as reservoirs or non-reservoirs, sensitivity represents the proportion of reservoirs that were correctly classified as reservoirs, and specificity represents the proportion of non-reservoirs that were correctly classified as non-reservoirs. The test set was used to explore model performance on a set of observations not included in model construction. To find the optimal combination of tuning parameters for each GBM model, we supplied a grid of candidate values (built with 'expand.grid') to the 'caret' package, which optimizes the learning rate (shrinkage) and the number of classification trees [56].
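As an illustration of the down-sampling and confusion-matrix metrics defined above, the following Python sketch uses made-up labels and predicted probabilities. The function names are hypothetical, and the 0.5 cut-off is assumed here as the classification threshold; the sketch does not reproduce the 'caret' implementation.

```python
import random


def downsample(labels, seed=0):
    """Return sorted indices in which the majority class is randomly cut to the minority size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def confusion_metrics(truth, predicted_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity from a 2 x 2 confusion matrix."""
    tp = fp = tn = fn = 0
    for y, p in zip(truth, predicted_prob):
        predicted = 1 if p > threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # reservoirs correctly identified
        "specificity": tn / (tn + fp),   # non-reservoirs correctly identified
    }


# Made-up data: 2 reservoirs (1) and 6 non-reservoirs (0).
truth = [1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.4, 0.2, 0.6, 0.1, 0.3, 0.2, 0.1]

balanced_indices = downsample(truth)
metrics = confusion_metrics(truth, probabilities)
```

With these toy values one of two reservoirs and five of six non-reservoirs are classified correctly, which shows how accuracy can remain high even when sensitivity is modest in an imbalanced dataset.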
After model training, we quantified variable importance based on all observations using the 'iml' package (version 0.10.0) [57]. Variables are considered to be 'important' if model error increases after permutation [58]. The effect of each variable on the response was visualized by creating partial dependence plots using the 'pdp' package (version 0.7.0) [59]. To visualize how the predicted probability of being a reservoir for RABV varied by species, we included individual conditional expectation (ICE) curves in each partial dependence plot [60].
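The permutation idea behind this variable importance measure can be sketched in Python as follows. The toy model, trait names, and data are hypothetical, and importance is expressed here as the mean increase in absolute error; the 'iml' package may summarize the change in error differently.

```python
import random


def mean_abs_error(model, rows, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, n_repeats=20, seed=0):
    """Mean increase in error after shuffling one feature across species."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Copy each row, replacing only the permuted feature.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        increases.append(mean_abs_error(model, shuffled, targets) - baseline)
    return sum(increases) / n_repeats


def toy_model(row):
    """Hypothetical stand-in for a fitted model: uses litter size, ignores body mass."""
    return 1.0 if row["litter_size"] > 3 else 0.0


rows = [
    {"litter_size": 1, "body_mass": 150.0},
    {"litter_size": 2, "body_mass": 40.0},
    {"litter_size": 3, "body_mass": 12.0},
    {"litter_size": 4, "body_mass": 6.0},
    {"litter_size": 5, "body_mass": 5.0},
    {"litter_size": 6, "body_mass": 0.1},
]
targets = [toy_model(r) for r in rows]

importance_litter = permutation_importance(toy_model, rows, targets, "litter_size")
importance_mass = permutation_importance(toy_model, rows, targets, "body_mass")
```

Shuffling a trait the model ignores leaves the error unchanged, whereas shuffling a trait the model relies on increases it, which is the sense in which a variable is called important.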

Reservoir prediction and trait importance
To identify candidate reservoirs for RABV and determine how each trait contributed to the predicted reservoir status of each species, we used a cooperative game theory approach, the Shapley value [61], implemented in 'iml' [57]. The Shapley value aims to explain the prediction of the GBM model for each observation (in this case a host species). Hence, for each species, the Shapley value uses information from the GBM model to assess the contribution of each trait to the model's prediction (i.e., being or not being a reservoir for RABV). Positive Shapley values indicate that predictors are increasing the likelihood that the outcome is positive (i.e., a species is a reservoir for RABV), and negative Shapley values indicate that predictors are increasing the likelihood that the outcome is negative (i.e., a species is not a reservoir for RABV). Importantly, the Shapley value uses a different criterion for classifying reservoirs than the GBM model alone. GBM predictions are based on a 0.5 probability (above 0.5 species are considered reservoirs, below 0.5 species are considered non-reservoirs). Shapley values are based on the difference between the GBM predicted value for the species of interest and the average GBM predicted value for all species. Thus, the Shapley value classification criterion is arguably more insightful than the GBM because it evaluates the role of a reservoir species in the context of all other species. Additionally, the Shapley value not only indicates whether species are incorrectly classified (by combining the Shapley values of all predictors) but also provides insight into the importance of each predictor in influencing the reservoir outcome for each species. Thus, a species is classified as a reservoir for RABV if the Shapley scores of predictors sum to a value that is > 0. Otherwise, the species is either classified as not being a reservoir for RABV (if the Shapley scores sum to a value < 0) or as unknown (if the Shapley scores sum to a value that is equal to 0).
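The Shapley-based classification rule can be made concrete with a small Python sketch. It computes exact Shapley values for a hypothetical two-trait model, whereas 'iml' approximates them by sampling; the model, trait names, background species, and the numerical tolerance around zero are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """Mean prediction when traits in `subset` take the focal species' values
    and all other traits are taken from each background species in turn."""
    total = 0.0
    for b in background:
        mixed = {k: (x[k] if k in subset else b[k]) for k in x}
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of each trait for one species (feasible for few traits)."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                with_f = coalition_value(model, x, background, s | {f})
                without_f = coalition_value(model, x, background, s)
                contribution += weight * (with_f - without_f)
        phi[f] = contribution
    return phi


def classify(phi, tol=1e-12):
    """Sum > 0: reservoir; sum < 0: non-reservoir; sum == 0 (within tol): unknown."""
    total = sum(phi.values())
    if total > tol:
        return "reservoir"
    if total < -tol:
        return "non-reservoir"
    return "unknown"


def toy_model(row):
    """Hypothetical predicted probability of being a reservoir."""
    return 0.05 * row["litter_size"] + 0.3 * row["is_canid"]


background = [
    {"litter_size": 2, "is_canid": 0},
    {"litter_size": 4, "is_canid": 1},
    {"litter_size": 3, "is_canid": 0},
]
focal_species = {"litter_size": 5, "is_canid": 1}

phi = shapley_values(toy_model, focal_species, background)
status = classify(phi)
```

The trait contributions sum to the focal species' prediction minus the average prediction across the background species, so a positive sum means the species is predicted to be more reservoir-like than average.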

Mapping the geographic distribution of known and predicted RABV reservoirs
The geographic ranges of known and predicted reservoirs for RABV were obtained from the International Union for Conservation of Nature's (IUCN) Red List database (www.iucnredlist.org). Pixel values of species ranges were reclassified to be binary (i.e., a pixel value of 1 indicates the species is present and a pixel value of 0 indicates the species is absent). The ranges of species belonging to the same reservoir group (e.g., known carnivore reservoirs based on the conservative criteria) were stacked using the 'rgdal' package (version 1.4-8) [62]. Maps were created using the 'rasterVis' package (version 0.47) [63] to identify areas where predicted reservoir species are likely to co-occur.
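The reclassification and stacking steps can be sketched in Python with plain nested lists standing in for rasters. The grids and function names are hypothetical; the sketch assumes that any pixel value above zero denotes presence and that all grids share the same extent and resolution.

```python
def to_binary(grid):
    """Reclassify a range raster: 1 where the species is present, 0 where absent."""
    return [[1 if cell > 0 else 0 for cell in row] for row in grid]


def stack_richness(grids):
    """Sum binary range rasters pixel by pixel to obtain species richness."""
    binary = [to_binary(g) for g in grids]
    n_rows, n_cols = len(binary[0]), len(binary[0][0])
    return [
        [sum(g[r][c] for g in binary) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 2 x 2 range rasters for two species in the same reservoir group.
species_a = [[3, 0], [7, 7]]
species_b = [[0, 0], [5, 0]]

richness = stack_richness([species_a, species_b])
```

Each pixel of the stacked grid counts how many species of the group overlap there, which is the quantity mapped as reservoir richness.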

Carnivores
Traits associated with being a reservoir for RABV. Of the 277 carnivore species for which sufficient data were available, 23 (8.3%) were identified as being reservoirs for RABV based on the conservative criteria, and 27 (9.7%) based on the liberal criteria. The conservative and liberal models had an accuracy of 67.16% (sensitivity = 75.79%) and 65.89% (sensitivity = 70.0%), respectively (S1 Table). For the conservative model, species phylogenetic grouping (inferred from the second principal coordinate (PCoA-2)) was the most important predictor of RABV reservoir status (prediction error increased by 1.81 orders of magnitude after permutation; Fig 1A). Next most important were age at sexual maturity, median litter size, and diet breadth (error increased by 1.23 orders of magnitude after permutation for all three traits; Fig 1A). Carnivore species were more likely to be reservoirs for RABV if they were part of the Canidae family (PCoA-2 values ranging from 83 to 86) (Fig 1B; S2 Fig). More generally, the likelihood for carnivores to be RABV reservoirs decreased with age at sexual maturity (Fig 1C) but increased as the number of young per litter increased (Fig 1D) and as the number of dietary categories increased (Fig 1E). All top traits identified in the conservative model were also identified as the top traits in the liberal model (S3 Fig).
Predicted RABV reservoirs. The model predicted 38 carnivore species that could act as reservoirs for RABV in the conservative model (Table 1) and 39 in the liberal model (S2 Table) (summing to a total of 44 species across the two models). Further, the conservative model predicted three currently recognized carnivore reservoirs for RABV to be non-reservoirs: the Chinese ferret-badger (Melogale moschata), the kinkajou (Potos flavus), and the raccoon (Procyon lotor). The liberal model predicted two currently recognized carnivore reservoirs to be non-reservoirs: the meerkat (Suricata suricatta) and the spotted hyena (Crocuta crocuta).
In the conservative model, several species of the Canidae, Herpestidae, and Mustelidae families were predicted to be reservoirs for RABV (e.g., the culpeo (Lycalopex culpaeus), the common kusimanse (Crossarchus obscurus), and the least weasel (Mustela nivalis); Table 1). For species from the Mustelidae family, this was partly because individuals from these species tend to reproduce at a young age and have large litters (e.g., the least weasel; Fig 2A). Non-canids that do not reproduce at a young age, have small litters (less than ~3.5 young per litter), and have only one dietary category were less likely to be reservoirs for RABV (e.g., the lion (Panthera leo); Fig 2B). In contrast, an empirically recognized reservoir for RABV, the red fox (Vulpes vulpes), was classified as being a reservoir in our game theory model partly because individuals reproduce at a young age (~11 months) and have large litters (4-5 young per litter) (Fig 2C).
Geographic range of known and predicted reservoirs for RABV. The greatest richness of known carnivore RABV reservoirs (~5-7 species) mostly clustered in North America, parts of Mexico and Central America, and East Africa for the conservative model (Fig 3A), along with central and southern Africa in the liberal model (Fig 3B). Predicted carnivore reservoirs based on the conservative model mostly clustered in southern central US, southern, central, and eastern Africa, as well as parts of south-eastern Europe, western and southern Russia, and eastern India (Fig 3C). Predicted reservoirs based on the liberal model clustered for the most part in southern, central, and eastern Africa (Fig 3D).

Chiroptera
Traits associated with being a reservoir for RABV. Of the 326 bat species for which sufficient data were available, 29 (8.9%) were identified as being reservoirs for RABV based on the conservative criteria, and 41 (12.6%) based on the liberal criteria. The conservative and liberal models had an accuracy of 82.59% (sensitivity = 83.75%) and 82.41% (sensitivity = 87.58%), respectively (S1 Table). For the conservative model, median latitudinal extent of range was the most important predictor of RABV reservoir status, followed by geographic range (km²) and litter size (prediction error increased by 3, 2.43, and 2.14 orders of magnitude after permutation, respectively; Fig 4A). Bat species that resided in North America, ranged over 1 × 10⁷ km², and had more than one young per litter were more likely to have been predicted as reservoirs for RABV.

PLOS NEGLECTED TROPICAL DISEASES
Predicted RABV reservoirs. The conservative model predicted 16 bat species that could act as reservoirs for RABV (Table 2) and the liberal model predicted 34 (S3 Table) (summing to a total of 34 species across the two models). All recognized RABV reservoirs were correctly classified as reservoirs in both conservative and liberal models, except for two in the conservative model: the black myotis (Myotis nigricans) and the little brown bat (Myotis lucifugus); and four in the liberal model: the dark fruit-eating bat (Artibeus obscurus), the little brown bat (Myotis lucifugus), the tropical big-eared brown bat (Histiotus velatus), and the western yellow bat (Lasiurus xanthinus). In the conservative model, of the newly identified reservoirs, the long-legged myotis (Myotis volans) was predicted to be a reservoir in part because it occurs in North America (Fig 5A). The hairy fruit-eating bat (Artibeus hirsutus), which ranges over a relatively small area in Mexico, was less likely to be a reservoir for RABV (Fig 5B). A well-recognized reservoir for RABV, the vampire bat (Desmodus rotundus), was predicted to be a reservoir because of having a large geographic range (Fig 5C).
Geographic range of known and predicted RABV reservoirs. The greatest richness of known bat RABV reservoirs (~14-18 species) clustered in Mexico and south-western parts of the US for the conservative model (Fig 6A), along with parts of Central America and northern South America in the liberal model (~15-20 species; Fig 6B). The greatest richness of predicted reservoirs based on the conservative model (~4-5 species) clustered mostly in Mexico, southeastern and western US, southern Brazil, and northern Colombia (Fig 6C), and clustered in western Mexico and northern South America based on the liberal model (~7-10 species) (Fig 6D).

Discussion
Up to 68 carnivore and bat species across the globe are known to be RABV reservoirs according to our definition, and our models predicted there to be an additional 78 potential reservoir species. The traits that emerged as most important for predicting RABV reservoir status for carnivores were phylogenetic grouping, litter size, and age at sexual maturity. For bats, position along the latitudinal gradient of the Americas, geographic range, and litter size were the most important traits. Interestingly, while the top traits identified by the GBM models were important for predicting the reservoir status of carnivore and bat species, the contribution of each trait varied by species within each order. Additionally, mapping the spatial distribution of known and predicted reservoirs for RABV revealed that predicted carnivore and bat reservoirs both occurred within the range of known RABV reservoirs and beyond. This suggests that some reservoir species might be missed in known RABV hotspots, that several species could be facilitating or have the potential to facilitate RABV maintenance outside of these areas, and that predicted reservoir species could become RABV reservoirs if the right strain were introduced.
Age at sexual maturity and having large litters were among the most important traits for being a carnivore RABV reservoir, in both the conservative and liberal models. These two traits are associated with species having short lifespans and reproducing rapidly, and have been identified as important for predicting wildlife reservoir status for other pathogens [16,64]. These types of traits may also be important for determining the maintenance success of pathogens for which density-dependent transmission has been hypothesised, such as RABV ([65], although see [66,67]). Thus, carnivore reservoirs for RABV appear to have similar characteristics to reservoirs of other directly transmitted pathogens in that they tend to have faster life-history characteristics than non-reservoir species. While several other life-history characteristics appeared to play a less important role in influencing the reservoir status of carnivore species, the finding that most predicted carnivore RABV reservoirs tended to be members of the Canidae, Herpestidae, and Mustelidae families suggests that other traits specific to these families are likely to be important.
It is noteworthy that few carnivore species were predicted to be RABV reservoirs from some carnivore families that are known to have RABV reservoirs, and that some known carnivore reservoirs were predicted to be non-reservoirs. For example, the GBM models identified only two new RABV reservoirs for Mephitidae, both of which had low Shapley scores (i.e., Shapley scores of 0.01 and 0.05 for the pygmy spotted skunk (Spilogale pygmaea) and American hog-nosed skunk (Conepatus leuconotus), respectively). This suggests that species from this family are possibly less likely to be reservoirs for RABV. Similarly, our conservative GBM model predicted the raccoon (Procyon lotor) and the kinkajou (Potos flavus) to be non-reservoirs for RABV. One reason for this could be that the number and types of traits included in our GBM models were not sufficient to correctly predict the reservoir status of species that are part of the Procyonidae family. The carnivore models correctly predicted reservoir status only 65-67% of the time (although sensitivity was ~70-76%). Thus, it is possible that our GBM models are missing an important ecological dimension, suggesting that additional information on hosts that more closely relates to the maintenance of RABV is needed to strengthen future models. Additionally, the difference in the number of species in each family that are currently recognized as RABV reservoirs could also be influencing predictions. For example, in exploratory GBM runs, we found that predictions were sensitive to the composition of the training set, particularly for members of the Procyonidae family. This was likely because fewer than a quarter of known carnivore RABV reservoirs are from the Procyonidae family. This highlights the need for more studies on the RABV reservoir status of other members of the Procyonidae family as well as the development of cross-validation approaches that account for phylogenetic structure [68].
Similarly, some of the predicted carnivore RABV reservoirs identified are unlikely to contribute substantially to endemic RABV circulation as they are classified as endangered in the IUCN Red List (e.g., the dhole (Cuon alpinus) and the African wild dog (Lycaon pictus)). Our GBM models likely predicted these species to be reservoirs for RABV because our reservoir classifications were based solely on species life-history characteristics and did not account for some species occurring in small and fragmented populations that might be unable to maintain RABV. Hence, while identified endangered species are likely not current RABV reservoirs, their life-history characteristics suggest that they have the potential to be. From a conservation standpoint, identifying endangered species as potential reservoirs for RABV reinforces the need to establish surveillance programs for these species so that transmission can more readily be controlled should an outbreak occur.
The geographic clustering of known carnivore reservoirs in Eastern and Southern Africa and North America is probably associated in part with sampling bias. However, examining the geographic distribution of predicted carnivore reservoirs revealed that several predicted carnivore species occur in areas where known reservoir species occur. The conservative model predicted some carnivore reservoirs to occur in southern and central parts of the US, and the liberal model predicted carnivore reservoirs around East Africa and parts of Central and Southern Africa; the latter is consistent with previous work on carnivore zoonotic pathogens [14]. As such, while several carnivore reservoirs have been identified in these RABV hotspots, it is possible that several other carnivore species could facilitate RABV maintenance in these regions and, therefore, threaten the effectiveness of ongoing rabies control programs. However, while the predicted reservoirs could contribute to the transmission cycles of existing variants, they could also sustain undiscovered RABV variants.
As in the carnivore analysis, one of the most important traits for being a bat RABV reservoir was litter size, which is consistent with previous work for other types of bat viruses [69]. While litter size can be a proxy for host density for carnivores, this is most often not the case for bats. For example, several bat species that have more than one young per litter tend to be solitary or live in small groups (e.g., the southern yellow bat (Lasiurus ega)), while several bat species that have only one young per litter tend to live in large groups (e.g., the Mexican free-tailed bat (Tadarida brasiliensis)). Thus, we suspect that the litter size finding is not a reflection of host density in bats. Further, since RABV transmission in bats is more likely to be frequency-dependent than density-dependent [70,71], we suspect that a more plausible explanation for the litter size finding is that another trait unique to species with more than one young per litter is driving this association. This highlights a need to explore the importance of other life-history traits in influencing the RABV reservoir status of bats.
We expected phylogenetic grouping to be a primary predictor of RABV reservoir status for bats since RABV transmission and establishment is more likely to occur between closely than distantly related species [69,[72][73][74]. Despite this, phylogenetic grouping appeared as fifth most important in the conservative model and one of the least important in the liberal model. This finding could be due to data deficiency, or because most of the predicted species were from three of the 21 phylogenetically distinct families (Vespertilionidae, Molossidae, and Phyllostomidae). Further, phylogenetic grouping was likely important for predicting bat RABV reservoir status but did not rank highly, possibly because traits associated with the spatial distribution of species (e.g., species geographic range) were more influential. Likewise, a bat trait that has previously been identified as important for RABV occurrence is diet [75]. Yet, in our models diet ranked as one of the least important predictors of RABV reservoir status for bats. This could be associated with the fact that over 41% of bat species had missing information on their diet status. Thus, more research is needed to determine whether diet is an important predictor of reservoir status for bats.
The fact that the top-ranking traits associated with bats being RABV reservoirs were those associated with the species' spatial distribution may reflect geographic biases in the tendency for bats to have been reported as RABV reservoirs. For example, latitudinal gradient was one of the top predictors, with bat reservoirs more prone to occur in North America, which could be a result of there being far greater RABV surveillance in North America than in Central and South America [75][76][77]. Further, the greater importance of species spatial distribution over life-history characteristics highlights that data on bat ecological and life-history characteristics are alarmingly deficient. For instance, eleven traits in the PanTHERIA database were excluded from our analyses because over 50% of bat species had missing values, and a large proportion of species with missing data occurred in Central and South America. Gathering data on traits that are known to influence RABV transmission and maintenance in bats (e.g., overwintering activity, migration, and roosting behavior; [78]) and focusing efforts on species that have little information would help inform predictive models such as the ones developed here.
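The exclusion rule mentioned above (dropping a trait when more than half of the species lack a value for it) can be illustrated with a minimal sketch. This is not the code used in the study: the trait table, its values, and the helper names are hypothetical, and missing entries are assumed to be encoded as None.

```python
def missing_fraction(values):
    """Return the share of entries that are missing (None) for one trait."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_traits(table, max_missing=0.5):
    """Keep only traits whose missing fraction does not exceed max_missing."""
    return {
        trait: values
        for trait, values in table.items()
        if missing_fraction(values) <= max_missing
    }


# Hypothetical trait table: one list per trait, one entry per species.
toy_traits = {
    "litter_size": [1, 2, None, 1],            # 25% missing -> kept
    "diet": [None, None, None, "insects"],     # 75% missing -> dropped
    "range_km2": [1.2e6, None, 3.4e5, 8.0e4],  # 25% missing -> kept
}

kept = drop_sparse_traits(toy_traits)
print(sorted(kept))
```

A threshold of this kind trades coverage for reliability: traits that remain may still carry regional gaps, which is the bias discussed in this paragraph.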
The difference in accuracies between the carnivore and bat models is noteworthy. The carnivore models likely had a lower accuracy than the bat models partly because one or several carnivore traits important for RABV maintenance were missing. That said, while the bat models had greater accuracies than the carnivore models, the carnivore findings were generally more insightful than the bat findings because more life-history traits were examined. The high predictive power of the bat models was partly driven by traits that were associated with sampling bias (e.g., location in the Americas). Thus, while both models are useful for identifying traits and potential reservoirs for RABV, they also identify key gaps in both the carnivore and bat datasets. Several additional factors associated with RABV transmission and maintenance should be explored. For example, in addition to traits associated with host density and activity (e.g., population size and roosting behavior), an important factor relates to RABV circulation within a species' range. The reservoir status of many carnivore and bat species is probably influenced by the number and types of RABV variants circulating in the region, increasing the probability of host shifts. Exploring the importance of such a variable could help tease apart the reservoir status of many species but necessitates that more information on RABV variants be collected and made available.
Our definition of RABV reservoir is a potential limitation of this study. With our definition, species are predicted to be reservoirs across their entire geographic range when in many cases it is populations rather than species that tend to be defined as RABV reservoirs. For example, known reservoirs of the Mephitidae (i.e., the striped skunk (Mephitis mephitis) and the eastern spotted skunk (Spilogale putorius)) and Procyonidae families (e.g., the raccoon (Procyon lotor)) act as reservoirs, but only in certain regions. The striped skunk, for instance, is considered to be a reservoir for RABV in the southern, central US but not on the eastern coast of the US [34]. Determining which species are likely to be RABV reservoirs across their entire range versus only in certain regions would be an important next step to take. Another potential drawback lies in our criteria for defining non-reservoirs. We did not account for differences in sampling effort for each species, meaning that our definition of 'a non-reservoir' does not make a distinction between 'evidence that species is not a reservoir' and 'data insufficient'. This is a weakness of many similar approaches, suggesting that future work to address this gap is needed. Our online ReservoirFinder database (https://github.com/whit1951/ResevoirFinder) will provide a valuable resource for future RABV reservoir models when new information is available.
Despite these weaknesses, the list of predicted RABV reservoirs identified as part of this study can be used to help target surveillance and control programs. Further, identification of species for which RABV reservoir status was predicted to be uncertain (i.e., Shapley values less than 0.1) is valuable as it provides direction on the types of species for which more research is needed (on both species ecological characteristics and association with genetically distinct virus variants). However, the list of predicted RABV reservoirs should also be considered with a degree of caution for several reasons. Firstly, the predictions are based on the combined effect of the specific traits examined in this study. This means that any addition or removal of traits has the potential to alter the predicted reservoir status of certain species, especially those species that have Shapley values less than 0.1. Similarly, as new information is gathered for missing traits, model predictions will also likely shift. Thus, this study should be viewed as a preliminary step towards identifying current and future RABV reservoirs. In this way, the findings should be used to help focus current and future rabies research and surveillance efforts, but should not replace generalized surveillance. Indeed, some species that the GBM models predicted as non-reservoirs could be reservoirs, but the traits examined and/or missing data prevented the GBM models from identifying them as RABV reservoirs.
In conclusion, by using advances in machine learning, we predicted previously unidentified carnivore and bat reservoirs of RABV that could be targeted in current and future rabies surveillance programs. Further, by investigating the geographic range of known and predicted RABV reservoirs, we provided insight into the locations where RABV in wildlife communities is likely to persist and where future spillover and host shift events are most expected to occur.
Using the Shapley value to understand how each trait contributed to the reservoir status of each species was particularly insightful, and we recommend this approach be used to identify additional reservoirs for RABV as more data become available, and for other zoonotic pathogens. Efforts to control rabies in wildlife should aim to prevent RABV host shifts into carnivore and bat species predicted to be RABV reservoirs.
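To make the Shapley value concrete, the sketch below computes it exactly for a toy case by averaging each trait's marginal contribution over every ordering of the traits. It is an illustration only, not the procedure used in this study: the scoring function, trait names, and numbers are invented, and a trait being "absent" is simply modelled as contributing nothing. For models with many traits, exact enumeration is infeasible and the values are usually approximated by sampling permutations, which introduces the kind of run-to-run variation noted for species with values close to zero.

```python
from itertools import permutations


def shapley_values(traits, value_fn):
    """Exact Shapley values: mean marginal contribution over all trait orderings."""
    totals = {trait: 0.0 for trait in traits}
    orderings = list(permutations(traits))
    for order in orderings:
        present = set()
        for trait in order:
            before = value_fn(frozenset(present))
            present.add(trait)
            after = value_fn(frozenset(present))
            totals[trait] += after - before
    return {trait: total / len(orderings) for trait, total in totals.items()}


def toy_score(present):
    """Hypothetical reservoir score for one species given the traits 'switched on'."""
    score = 0.1  # baseline score with no trait information
    if "litter_size" in present:
        score += 0.2
    if "geographic_range" in present:
        score += 0.3
    if {"litter_size", "geographic_range"} <= present:
        score += 0.1  # interaction, shared equally between the two traits
    return score


TRAITS = ["litter_size", "geographic_range", "diet"]
phi = shapley_values(TRAITS, toy_score)
for trait, value in phi.items():
    print(f"{trait}: {value:.3f}")
```

In this toy case the contributions sum to the difference between the full score and the baseline, which is the property that lets a prediction for a single species be decomposed trait by trait.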
Supporting information
S1 Table. Carnivore species predicted to be RABV reservoirs based on the liberal criteria. Since there is inherent variation when performing permutations, species with Shapley values close to zero (especially those < 0.1) should be considered with caution. (PDF)
S3 Table. Bat species predicted to be RABV reservoirs based on the liberal criteria. Since there is inherent variation when performing permutations, species with Shapley values close to zero (especially those < 0.1) should be considered with caution. (PDF)