Predicting disease risk areas through co-production of spatial models: The example of Kyasanur Forest Disease in India’s forest landscapes

Zoonotic diseases affect resource-poor tropical communities disproportionately, and are linked to human use and modification of ecosystems. Disentangling the socio-ecological mechanisms by which ecosystem change precipitates impacts of pathogens is critical for predicting disease risk and designing effective intervention strategies. Despite the global “One Health” initiative, predictive models for tropical zoonotic diseases often focus on narrow ranges of risk factors and are rarely scaled to intervention programs and ecosystem use. This study uses a participatory, co-production approach to address this disconnect between science, policy and implementation, by developing more informative disease models for a fatal tick-borne viral haemorrhagic disease, Kyasanur Forest Disease (KFD), that is spreading across degraded forest ecosystems in India. We integrated knowledge across disciplines to identify key risk factors and needs with actors and beneficiaries across the relevant policy sectors, to understand disease patterns and develop decision support tools. Human case locations (2014–2018) and spatial machine learning quantified the relative role of risk factors, including forest cover and loss, host densities and public health access, in driving landscape-scale disease patterns in a long-affected district (Shivamogga, Karnataka State). Models combining forest metrics, livestock densities and elevation accurately predicted spatial patterns in human KFD cases (2014–2018). Consistent with suggestions that KFD is an “ecotonal” disease, landscapes at higher risk for human KFD contained diverse forest-plantation mosaics with high coverage of moist evergreen forest and plantation, high indigenous cattle density, and low coverage of dry deciduous forest. Models predicted new hotspots of outbreaks in 2019, indicating their value for spatial targeting of intervention. Co-production was vital for: gathering outbreak data that reflected locations of exposure in the landscape; better understanding contextual socio-ecological risk factors; and tailoring the spatial grain and outputs to the scale of forest use, and public health interventions. We argue this inter-disciplinary approach to risk prediction is applicable across zoonotic diseases in tropical settings.


Introduction
Zoonotic diseases disproportionately affect poor tropical communities [1][2][3], accounting for around 26% of Disability-adjusted Life Years lost to infectious diseases in Lower Middle Income Countries (LMICs). Communities affected by zoonoses often depend on surrounding ecosystems for livelihoods and food security. In India, for example, around 300 million people depend directly on degraded forest ecosystems for food, fuel, livestock fodder and other nontimber forest products (NTFPs) [4]. A key cost of altering forest structure and accessing forest goods and services is the increased exposure of humans and livestock to multi-host zoonotic pathogens [5]. Forest habitats and their ecotones are a significant source of emerging and reemerging infections because they support complex ecological communities, including high wildlife host and vector diversity [6][7][8]. Upsurges in incidence of several high burden zoonotic diseases have been linked to deforestation or reforestation in LMICs (e.g. malaria [9], Leishmaniases [10,11], Crimean-Congo Haemorrhagic Fever Virus) and to forest dependence. Living in or near forests has been linked to unfair accumulation of geographic and social disadvantages including political and economic marginalisation [12]. Forest communities are rendered even more vulnerable by their remoteness from healthcare infrastructure [13]. Disentangling the ecological and social mechanisms by which changes in forest habitat can precipitate impacts of multi-host pathogens is critical for the design of effective intervention strategies.
Integrating land use patterns, ecosystem and social factors into interpretations of past disease patterns at a range of geographical scales [14] can indicate potential mechanisms and facilitate prediction of disease risk across new landscapes or for the same landscapes under future alternative environmental and development policies [15]. Conceptual frameworks, like the global "One Health" paradigm, recognise the interconnectedness of human health, wildlife and domestic animal health and the environment [16]. Despite this, models of zoonotic disease risk in tropical regions have often focussed on a single set of processes that drive variability in disease risk and effectiveness of interventions ( [17] but see [14,15,[18][19][20]). For example, published risk maps and spatial decision support tools for tick-borne zoonoses tend to focus on mapping environmental hazard (or presumed correlates of environmental hazard like tick abundance or presence [21]), but often do not integrate the social factors which drive patterns in exposure [20,22]. Tools and maps are rarely linked to intervention programs at a scale appropriate to sources of epidemics [17] and ecosystem use. Leach & Scoones [17] recommend instead that disciplines, data and models are not only integrated, but "triangulated" with deliberation around framing assumptions, policy narrative, politics and values. The process of coproduction is ideally suited to the development of models to understand and predict zoonotic diseases. Co-production is based on the need to integrate different forms of knowledge into decision-making. It involves active engagement of stakeholders from different sectors and scales as knowledge holders and future model users through three key stages of framing the problem, knowledge integration and experimentation [23].
In our inter-disciplinary One Health Indo-UK partnership, the MonkeyFeverRisk project [24], co-production is used to improve understanding and develop decision support tools for a fatal tick-borne zoonotic disease, Kyasanur Forest Disease (KFD), that is spreading across degraded Western Ghats forest ecosystems in India.
Kyasanur Forest Disease Virus (KFDV; family Flaviviridae, genus Flavivirus) causes debilitating and fatal haemorrhagic disease (around 500 cases p.a., up to 10% mortality [25]) in forest communities. Key affected groups include small-holder farmers engaged in cultivation and grazing of cattle in forests [26], forest-dependent tribal communities who gather NTFP, day labourers in plantations and State forest department workers [27][28][29].
As well as affecting diverse human communities, the transmission cycle of KFDV is complex. KFDV cycles between different life stages of tick species from several genera (principally Haemaphysalis but also some Ixodes species) and amplifying vertebrate hosts including wild rodents and shrews, monkeys and some birds [25]. Humans contract KFDV when bitten by an infected tick, but are incidental hosts for the disease. Monkeys, principally the black-footed grey langur (Semnopithecus hypoleucos) and the bonnet macaque (Macaca radiata), are thought to act as amplifying hosts, by infecting large numbers of larval ticks with the virus [30]. Cattle do not amplify KFD since they do not develop viraemia of long duration [31], but may amplify tick populations through their importance as a blood meal host.
The emergence of KFD in humans has been widely linked to human modification of the forest ecosystem through deforestation [10,26,30]. The initial epidemics in Karnataka in the 1950s and those in the 1980s were preceded by population increases and extensive deforestation, to make way for plantations (such as Areca and cashew), paddy cultivation, housing and roads. This created mosaic tropical evergreen and deciduous forest, interspersed with cultivation (e.g. paddy), and interface scrub habitat between villages and forests that was conducive to both tick populations and cattle grazing [26,30,32]. These conditions are hypothesised to have facilitated the emergence of KFDV into humans from a cryptic sylvatic enzootic cycle involving small mammals and monkeys [30]. However, it is still unknown how tick vectors and potential amplifying vertebrate hosts are linked to different habitats and to human exposure within agro-forest mosaics.
Human epidemics were restricted to focal areas of Karnataka State from 1957 to 2012, but since then human cases have been detected in four neighbouring states (Tamil Nadu, Kerala, Goa and Maharashtra) [33]. Human serological evidence indicates wide KFDV circulation in other states across India (Gujarat, West Bengal, the Andaman and Nicobar Islands) and on the border with China. Therefore, the landscape conditions favouring KFDV transmission are widespread [33], and the subset of these conditions that lead to human disease impacts, need to be delineated urgently. KFD impacts are managed currently through vaccination, awareness campaigns and promotion of tick protection measures in and around recently affected areas. However, constraints on availability and efficacy of the vaccine, and reluctance of local communities to be vaccinated and adopt personal protection measures can exacerbate epidemics [28,29,34]. Thus targeting of interventions towards the most vulnerable communities is critical.
This paper describes the co-production-with actors and beneficiaries across the public health, animal health and forestry sectors-of the landscape-level spatial models and understanding of risk factors for a case study zoonotic disease, Kyasanur Forest Disease. The model is developed for Shivamogga district in Karnataka, which has been affected by KFD since the 1950s, and reports a high proportion of India's human cases (e.g. 656 or 34% of 1929 cases reported between 2010 and 2019). Because of this, health managers of this district have long experience in disease surveillance and control. Using point locations for human cases recorded at sub-village level by health managers between 2014 and 2018, spatial Boosted Regression Tree models [35,36] are used to quantify the relative role of forest characteristics and loss, topography, host densities and public health factors in driving patterns in KFD at the landscape scale.
The co-production process involved framing potential key risk factors for KFD with crosssectoral managers [37]. Spatial proxies of these risk factors were integrated into the model framework. The spatial grain of the model and its output was tailored to the scale at which people use forests (from household surveys) and the scale at which public health managers collect and report outbreak data. The models were then validated during the 2019 outbreak season with health managers, in terms of their predictive accuracy and utility for management.
Furthermore, through geographical thinning [38,39], the model framework accounted for the sparse, spatially clustered recording effort that often arises in public health surveillance datasets [40]. We discuss the extent to which disease patterns were predictable from landscape, topographical host and landscape metrics, whether human cases of KFD are associated with particular forest types, mosaic habitats or forest loss and how model predictions could improve targeting of interventions and surveillance.

Ethics statement
The protocols for this study were approved by the Institutional Ethics Committee of the Institute of Public Health (IPH IEC), Bangalore (Study ID, IEC-FR/04/2017) and received a Favourable Ethical Opinion from the Liverpool School of Tropical Medicine Research Ethics Committee (research protocol 17/062). All workshop participants were adults and provided informed consent via email through acceptance of the workshop invitation. The IPH IEC approved the access and use of confidential patient data from DHFWS and all data were anonymised appropriately prior to analysis.

Study area
Shivamogga covers an area of 8465 km 2 between 13.45 o and 14.65˚Latitude and 74.63 o and 75.73˚Longitude (Fig 1). The district is diverse in topography and vegetation, comprising Western Ghats mountains that are subject to high annual rainfall (900 to 8000 mm per annum) and drier inland plateau areas (range in elevation across Shivamogga is -70 m to 2674 m a.s.l., mean ± s.d. = 460 m ± 379 m a.s.l.), and a corresponding transition from evergreen and semi-evergreen forests, to moist deciduous forest and scrub. The forests of the Western Ghats have been degraded and fragmented throughout the 19 th and 20 th centuries, due to timber extraction, industrial development (roads, railways, dams and mines) and increases in agriculture and plantations [41,42]. The area of forest vegetation in Shivamogga has declined from an estimated 43.8% of the district in 1973 to 22.3% of the district in 2012, producing patch and edge forest [43].

Human case data
Human cases of Kyasanur Forest Disease occur seasonally between December and May when the abundance of infected nymphal ticks in the forest is at a peak. Designated laboratories for processing human samples of KFD from Shivamogga District include the Virus Diagnostic Laboratory (VDL), Shivamogga, the ICMR-National Institute of Virology, Pune and Manipal Centre for Virus Research. Human cases, arising from samples testing positive for Kyasanur Forest Disease by RT-PCR or IgM ELISA in either laboratory in the five years between the December 2013 / May 2014 and December 2017 / May 2018 outbreak seasons, were compiled by co-author, Dr S. K. Kiran, who served as Taluka Medical Officer for Tirthahalli from 2010-2019. Cases were assigned to locations retrospectively using Google Earth, following personal visits to households of affected patients conducted by Dr Kiran, or Medical Officers in other talukas, during the outbreak seasons. These locations were marked on Google Earth to retrieve the geographical coordinates in Latitude and Longitude to an estimated spatial precision of around 300 m. Cases of febrile illness may be reported to Primary Health Centres and from home addresses that are very distant from where infection is acquired. For example, migrant agricultural labourers work in plantations and pilgrims visiting temples in forest areas that can be 10s to 100s of kilometres from their homes [29]. Based on case-tracing that Taluka Medical Officers had performed during the outbreak seasons, such cases (< 10) could be excluded from the analysis. In total, 329 cases from 117 different household or village locations were The presence and number of cases were summarised at a 1 km x 1 km and 2 km x 2 km grid resolution across Shivamogga, resulting in 65 (of a total of 7732) land cells and 53 (of a total of 1926) land cells positive for KFD respectively at these study grains. These two study grains were chosen to reflect the range of distances from households at which forest users may acquire infection from forest habitats during their livelihood activities. Interviews with members of similar forest communities in Wayanad, Kerala revealed that forest users move between 1 and 4 km through forest habitats from their homes during the main risk KFD period(January to March, see S1 File).
During the 2018/2019 transmission season in Shivamogga District, households with cases throughout Shivamogga District were geo-located as a passive independent validation dataset. Geo-location of affected households was achieved directly in the field using the Smartphone Android App "AndLocation" or by health workers who shared their live location on WhatsApp with project staff for transfer to Google Earth and retrieval of coordinates. These 2018/2019 season cases spanned 84 of the 1 km grid cells and 68 of the 2 km grid cells. They provided an ideal test of whether the risk maps developed before the transmission season (using human cases data from the prior five transmission seasons) are capable of predicting new outbreaks, including new geographical foci like the one in Sagara taluka.

Framing key socio-ecological risk factors for KFD with stakeholders
The participatory MonkeyFeverRisk Framing workshop was held on 16th August 2018 in Bengaluru, Karnataka, India. It involved over 20 experts from different KFD-affected districts and states level of Karnataka, Maharashtra and Kerala, including officials from the public and animal health, agriculture, forestry and social welfare sectors [37]. Participants of the workshop were selected based on a stakeholder mapping exercise. This identified key actors from different sectors and working at different scales, likely to play major roles in the understanding and management of KFD. The two aims of the workshop were to (i) identify the key risk factors for KFD as prioritized by stakeholders and (ii) identify key policies that affect KFD transmission and management using participatory approaches, as outlined in S2 File. The key risk factors  for KFD that received four or more votes during the ranking exercise among stakeholders are shown in Table 1, together with their links to particular spatial environmental predictors in the analysis (right hand column). The full suite of gridded spatial environmental predictors that was generated within topographical, landscape, host and public health categories is shown in Table 2, including additional risk factors drawn from scientific literature on KFD epidemiology and ecology. The sources and processing methods for selected spatial environmental predictors, including measures to account for collinearity between predictors, are detailed in S3 File.
Forest loss and degradation was the highest ranked environmental risk factor for KFD by stakeholders, consistent with scientific literature linking forest loss and creation of mosaic forest-paddy-scrub village habitat to human emergence events [26,30,32]. Stakeholders considered that abrupt shifts in land use between forest and village areas made communities more vulnerable to KFD. Metrics of forest change considered in the analysis were area of forest loss and gain since 2000 per grid cell derived from Hansen et al. [44]. Area of forest gain was highly collinear with area of forest loss (Pearson's correlation coefficient, r = 0.968), and much less prevalent (making up 0.16% versus 1.2% of 30m land pixels), so was excluded from the analysis. To quantify mosaic habitat, the amount and diversity of different forest types, of agricultural or fallow land and plantation and overall land use diversity were extracted from the MonkeyFeverRisk Land Use Land Cover map (LULC map), derived from Landsat Thematic Mapper imagery (2016-2017) as detailed in S4 File. Edge metrics for forest types were also calculated as a measure of the amount of interface habitat between forest, agriculture and villages but were highly collinear with amount of individual forest types and so were not included in the final analysis (S3 File).
Stakeholders also ranked human use of forests and living in/around forests as key risk factors. They identified policies (or poor policy implementation) linked to grazing and encroachment in and around forest areas as increasing the risk of KFD. This is consistent with casecontrol studies that have linked grazing cattle inside forests, handling cattle and gathering of dry leaves for animal bedding to higher exposure to KFD [29] and the hypothesised role of cattle in amplifying tick populations. Available spatial proxies for extent of forest use for grazing were the densities of indigenous cattle (since smallholders in Shivamogga keep indigenous breeds) and buffalos per grid cell.
Again consistent with literature [28,34], diverse public health factors were ranked by stakeholders as key risk factors for KFD (Table 1) such as lack of awareness of KFD and preventative measures, low acceptance and coverage of vaccination, poor diagnostics and surveillance including under-reporting of human cases or monkey deaths. Available spatial proxies of access to health services and education and surveillance effort, developed from Indian Government census data (S3 File), were the proximity to a primary health centre, overall human population size and the number of medics available per head of population at village level (S3 File). Some risk factors like poor data management, low vaccine uptake and under-reporting of monkey deaths by the Forest Department, will be less well linked to such spatial proxies.
Consistent with literature linking tick demography and host-seeking behaviour to microclimatic factors [21,45,46], stakeholders also mentioned suitable micro-climates for tick populations among key risk factors for KFD (Table 1). However, available gridded weather station data is too coarse in resolution to define spatial variation in climate at village to district levels. Topographical factors, namely slope and elevation, and associated vegetation types from the LULC map were considered to be better proxies of micro-climatic conditions at village and district scale. Small water bodies of around 10m 2 were hypothesised by disease managers to be key locations in the landscape where monkeys carrying KFD, nymphal ticks and grazing animals might co-occur. Conversely, large water-bodies could constitute barriers to dispersal of wildlife hosts for tick vectors and KFD, and in turn, to epidemic spread [47,48]. The LULC map is based on 30 m resolution Landsat data, thus only detects water-bodies of larger than 30m. Thus, the coverage of such larger water-bodies per grid cell was included a priori amongst predictors, with the expectation that high coverage of such water-bodies would be unfavourable for KFD.

Modelling the distribution of Kyasanur Forest Disease with Boosted Regression Trees
A boosted regression tree (BRT) modelling [35,36] framework was used to determine the sensitivity of patterns in human KFD cases to land use, topographical and host variability, and to generate maps of potential distribution across Shivamogga. BRTs combine regression trees, which build a set of decision rules on the predictor variables by portioning the data into successively smaller groups with binary splits [35,36], and boosting, which selects the tree that minimises the loss of function, to best capture the variables that define the distribution of the input data. BRTs have been shown to have high performance amongst methods used to predict species distributions [49], probably due to their ability to fit complex, non-linear responses to environmental covariates and their robustness to outliers. However, they can be prone to over-fitting data and therefore a number of stringent cross-validation checks were used to avoid this (see below).
Due to the high degree of spatial clustering in the case presence data (Fig 2), it was clear that this presence data should be thinned prior to analysis to avoid inflation of model accuracy and pseudo-replication of particular environmental conditions in the model [38,39]. Presence data can be thinned in geographical or environmental space [38]. The lack of quantitative information on the environmental drivers of KFD makes it difficult to select key environmental axes by which to thin the data. Thus thinning in geographical space was conducted. A rule of thumb is to take the peak distance between pairs of presence records (this is 10 km for KFD in S1 Fig) and thin the points so that they are no closer than half this distance (~5 km for KFD), ensuring that there is only 1 presence in each 5 km grid cell. Thus, at each resolution (1 km grid cell and 2 km grid cell), we randomly selected one of the presence records per 5 km grid cell, for each of the 30 x 5 km grid cells in which presence records were found. The whole process was repeated 100 times to generate 100 presence datasets at each resolution.
Absence data: Since Boosted Regression Trees require both presence and absence data, the presences were matched with an equal number of selected absence grid cells as recommended by Barbet-Massin et al. [50] for BRTs. When selecting absence data, it is important to try to mimic the recording process that gives rise to the presence data. For example, human cases of KFD tend to be reported from rural land areas and communities. Therefore very large water bodies and built-up areas like towns and cities were excluded from the absence selection process by including only 1 km or 2 km grid cells encompassed by the census village boundaries in the selection area. Cells containing presences were also removed from the selection area at each resolution. We then randomly selected 30 absence cells at each resolution, each of which occurred in a different 5 km grid cell, and this process was repeated 100 times to generate 100 absence datasets. Each presence dataset was combined with one of the absence datasets to generate 100 presence-absence datasets at each resolution.
Due to the disparity between local land use maps and global forest loss data (S3 File), at each resolution, a set of 100 BRT sub-models was fitted to all environmental predictors and another set fitted to all predictors except area of forest loss. Models were fitted using the gbm. step function of the dismo package in R [51]. Model settings were as follows: learning rate = 0.001; tree.complexity = 4; bag rate = 0.6; to allow two way interactions between environmental predictors and to ensure that optimum number of trees always exceeded 1000. The gbm.step function automatically identifies the optimum number of trees for a BRT model using ten-fold cross-validation, selecting the number of trees that minimises hold-out deviance (cross-validation deviance) across folds. In addition to the cross-validation deviance, gbm.step reports several metrics of model performance in cross-validation across folds including; (i) the Area Under the Receiver Operator Curve Statistic (AUC) on hold-out dataset [52], or crossvalidation AUC, where an AUC value of 0.5 indicates no discriminative ability between presence and absence, and a value of 1 indicates perfect discrimination; (ii) cross-validation coefficient, which is the Pearson's correlation coefficient between the predicted probability of presence and the true presence/background for the hold-out dataset. We report the mean, median and standard deviation of these metrics across the 100 BRT sub-models in each model set. This provides a full picture of the average performance and consistency across each model set. The true prevalence of the disease across the study region is unknown, and the models are parameterised with ad hoc presence data combined with selected pseudo-absences rather than true absences. Therefore, the models predict the relative rather than absolute probability of presence between cells. This model fitting process was repeated at each resolution for the two sets of environmental predictors as above, giving rise to four different model runs-1 km with forest loss, 1 km without forest loss, 2 km with forest loss, and 2 km without forest loss.
Relative contribution statistics of predictor variables are reported only for the BRT model with the optimum number of trees (not for the folds). Relative importance is defined as the number of times a variable is selected for splitting, weighted by the squared improvement to the model as a result of each split and averaged over all trees [52]. These contributions are scaled to sum to 100, with a higher number indicating a greater effect on the response. Again, we report the average of these values across the 100 BRT sub-models.
The direction of the association between human cases of KFD and particular predictor variables was evaluated from the response curves produced from the BRT model. The response curves for a predictor were averaged across the 100 BRT sub-models by calculating mean and standard deviation of the marginal predicted probabilities within~40 bins of the predictor values.
The extent to which geographical thinning was successful in removing spatial autocorrelation caused by clustering of presence records was examined by plotting correlograms of the residuals of each fitted sub-model using the correlog function in the ncf package in R [53]. The Moran's I values and significance values were then averaged across sub-models within different distance bins (from 0 to 80 km in increments of 5 km), to look for evidence of systematic positive spatial autocorrelation.
To predict the distribution of KFD across Shivamogga in unsampled areas, each BRT submodel was applied to the prediction layers for a given resolution and environmental set using the predict.gbm function of the gbm package in R [54]. The predicted relative probability of presence was averaged across sub-models to produce an ensemble mean ± standard deviation of the relative probability of presence per grid cell. For each BRT model, the threshold relative probability of presence that maximises discrimination between presence/background for the hold-out dataset was calculated by gbm.step and averaged across folds (cross-validation threshold). Occurrence patterns were summarised across sub-models by converting the predicted distributions to binary presence-absence maps per sub-model using the mean cross-validation threshold relative probability of presence for each sub-model, and counting the number of times a cell was predicted to be present across the 100 sub-models. The predicted extent of occurrence in terms of number of study grid squares or pixels was calculated for each predictor resolution and environmental set. The geographical extent of predictions was limited to the region for which the environmental predictors were available and, out of necessity, omitted very large water-bodies and areas classified as towns or cities in the census.
Independent validation of models was achieved using the presence points from the 2018/ 2019 outbreak, by calculating the predicted-to-expected (P/E) ratio for all model sets using the ecospat.boyce function of the ecospat package in R. Here a graph of predicted versus expected for a good model should show a monotonically increasing curve [55], and the correlation between P/E and habitat suitability should be positive if model predictions are consistent with the distribution of presences in the independent validation dataset [56].
Not only does this independent data test the validity of the models, but also the assumption of stationarity that cannot be tested through cross validation. That is to say, the assumption that any relationships derived from the observed data hold true and are consistent for areas/ periods beyond the range of observed data. This assumption is crucial if derived models are to be used for any early warning system.
The validation was first conducted using all the 2018 to 2019 season case locations (n = 84 cells at a 1 km resolution, n = 68 cells at a 2 km resolution). Then, to test whether the models could predict new geographic foci of human cases, we conducted the validation separately and compared results for: (i) locations from the newly affected sub-district of Sagara (61% of 2018 to 2019 season case locations, spanning 37 cells at a 1 km resolution and 25 cells at a 2 km resolution); and (ii) locations from the sub-district of Tirthahalli (36% of 2018 to 2019 season case locations, spanning 40 cells at a 1 km resolution and 36 cells at a 2 km resolution), which recorded 95% of the past cases used to parameterise the model.

Environmental predictors of the recent past distribution of Kyasanur Forest Disease
Models combining topography, landscape metrics, hosts, and public health predictors predicted recent patterns in KFD with a high degree of accuracy. Values of AUC in cross-validation ranged from 0.85 to 0.90 (Tables 3 & 4). Models had a similarly high accuracy whether area of forest loss was included or not.
At a 1 km resolution, predictors which were consistently most important in predicting presence of KFD are the area of moist evergreen forest, the diversity of forest types followed by the density of indigenous cattle and the area of dry deciduous forest (Table 5, left hand columns). Elevation and the area of plantation were also often ranked highly in importance amongst predictors. The geographical variability in these key predictors in relation to the 2014 to 2018 distribution of cases is depicted in Figs 2 and 3.
The response plots in Fig 4 indicate that KFD outbreaks are more likely to occur when the proportional area of moist evergreen forest is intermediate or high (>15% of the area), when the proportion of dry deciduous forest is low (<10% of the area), when the amount of forest loss pixels is higher (> 10% of the area), when indigenous cattle densities are higher, when forest diversity is higher (>1.0) and the proportional area of plantation is higher. Figs 2 and 3 illustrate how the cases between 2014 and 2018 were indeed concentrated in 1 km squares with a diverse mosaic of forest types, namely moist evergreen and plantation that are likely to have undergone rapid change. Once area of forest loss was added to the model, it ranked second in importance to the area of moist evergreen forest but the subsequent order of importance of the other landscape predictors listed above remained the same as for models without forest loss ( Table 5, right hand columns). The addition of area of forest loss to the models at 1 km does not increase the accuracy in cross-validation over models without area of forest loss.
The results at a 2 km resolution were similar in that the area of moist evergreen forest again far outranked the other predictors with a relative importance of around 20% and modal ranked importance of 1 ( Table 6). The area of dry deciduous forest and area of plantation were next in importance followed by forest diversity and elevation. Once area of forest loss was added to models at a 2 km resolution, it again ranked second in importance to the area of moist evergreen forest but the subsequent order of importance of the other landscape predictors listed above remained the same as for models without forest loss ( Table 6, right hand columns). Again, the addition of forest loss to the models at 2 km did not increase the accuracy in cross-validation over models without forest loss.
Response plots (see S5 File) indicate that KFD outbreaks are more likely to occur when the proportional area of moist evergreen forest is intermediate or high (>20% of the area), the proportion of dry deciduous forest is low (<10% of the area), the amount of forest loss is higher (> 10% of the area), elevation exceeds 650-675 m.a.s.l., forest diversity is higher (>1.0) and proportional area of plantation is higher.
There was considerable variation in the rank importance of variables across model runs for all model sets (see rank columns in Tables 5 and 6) which is to be expected given the low sample size that results from the geographical thinning. In all model sets, the public health predictors (proximity to health centres, number of medics per head of population) and human population size had consistently low importance.
In tests for spatial autocorrelation in model residuals, significant but low magnitude (Moran's I < = 0.2) was found at distances between 1 and 15 km but this was not consistent across model runs. Inference from the BRT models is more robust to small spatial correlation due to the flexibility of the modelling approach, the robustness to outliers and the use of importance metrics rather than significance testing. Therefore we expected this remaining small spatial autocorrelation to have a negligible impact on the above inferred role of environmental predictors and estimates of model accuracy (S6 File).

Predicted distribution of Kyasanur forest disease
Geographical patterns in areas predicted to have a high probability of KFD presence are similar between 1 km and 2 km grid cell resolutions (S7 File). There was also a high degree of correlation between mean predicted probability of presence layers between models with and without forest loss (Pearson's r = 0.974, p < 0.005 at 1 km).
Since the area predicted to be suitable for KFD varies between model runs, it is useful to look at the orange and red cells in the bottom panel of the prediction figure, which are cells in which KFD is predicted to be present in more than half of model runs. Suitable habitat for KFD is predicted to occur across a wide area in the south and south east of the district, extending between 5 km and 10 km around current presence locations (see Fig 5 for models at 1 km without forest loss, S7 File for all other model sets). Pockets of suitable habitat for KFD are predicted to occur also to the south and north of the large water-bodies (white cells) in the east of the district and in sporadic isolated locations in the north. The absence of KFD in agricultural areas along the northeastern band of Shivamogga (blue areas in all panels) and the southwestern coast is widely predicted across the models.

Validating predictive models in an outbreak situation
The Boyce Index values i.e. the correlation between the predicted to expected ratio and the predicted probability of presence were uniformly high (r > 0.8) when all case locations in Shivamogga district or only those in the long-affected Tirthahalli taluka were considered in validation. For these areas, the Boyce Index was slightly higher for 1 km models than 2 km models and when area of forest loss was excluded from models (Tables S8A & S8B in S8 File) suggesting that the predictions from these models are most consistent with the distributions of the presences in the independent validation datasets. This result was mirrored in the graphs of predicted versus expected ratios (S8 File) which were closest to a monotonic increase for the 1 km models without forest loss. When the model predictions for Tirthahalli are overlaid with the locations of cases from the 2018-2019 season (red circles, Fig 6), it can be seen that all but 2 of around 40 cases occur in yellow to red areas of medium to high predicted probability of presence. Case locations in the newly affected Sagara taluka were less well captured by the models than case locations in the long-affected Tirthahalli taluka. Boyce Index values exceeded 0.8 for most model sets but were consistently lower for the Sagara data than the Tirthahalli data, (S8 File) and the graphs of predicted versus expected ratios were further from a monotonic increase (S8 File). The 2 km models excluding forest loss were most consistent with the distributions of the presences in the Sagara data. Though the major foci of human cases in central Sagara were captured by model predictions, several isolated case locations in the north, east and west occurred in grid cells predicted to have a low probability of presence of KFD (Fig  7).
Forest loss locations from the global dataset [46] did not correspond well with Landsatderived estimates of current land use for Shivamogga (S3 File). Models including the area of forest loss per grid cell as a predictor had equivalent performance to models excluding area of forest loss in cross-validation, and generally performed worse in validation with the independent outbreak data (S8 File).

Discussion
This study advances understanding of the landscape determinants of human cases of a zoonotic disease by considering wide-ranging topographical, host, land use factors and public health constraints. Human cases of Kyasanur Forest Disease tend to be highly localized, with epidemics lasting three to five years in a particular 30-40km 2 area, in which the disease shifts each season to affect a new handful of villages [28]. As seen in other tick-borne diseases [57], such highly focal patterns in human epidemics results from the interplay of environmental and ecological dynamics underpinning hazard and the human activities that govern exposure across the landscape [14,19].
Our models indicated that focal patterns in human KFD cases can be predicted with a high degree of accuracy from combined metrics of area and diversity of different forest types, plantation and cattle, as well as elevation. Overall, landscapes that were at high risk of occurrence of human KFD cases were characterised by diverse forest-plantation mosaics, containing large amounts of moist evergreen forest and plantation and low amounts of dry deciduous forests, with high indigenous cattle densities, occurring at elevations over 650 m a.s.l. These findings are consistent with prior suggestions that KFD is an "ecotonal" disease [8] and that creation of such habitat mosaics, when forest is removed for paddy cultivation and plantations, precipitates emergence of KFD in humans [25,26]. The models presented here quantify and map "risky" habitat explicitly. The models predicted human case locations with high accuracy in the sub-district from which the input data from the prior four years had been drawn. They also predicted large clusters of human cases in the newly affected sub-district over 30 km away. Their failure to predict some isolated case locations in the newly affected area, suggests that the models, parameterised with four years data from an epidemic in one sub-district, may be under-estimating the "environmental niche" of human spill-over of KFD cases to some degree. Ideally, such correlative, pattern-matching models should be updated annually, preferably prior to the transmission season, as cases arise in new areas. Currently, KFD management activities of vaccination, surveillance and awareness raising are conducted within a 5 km to 10 km radius of prior known cases (from the past 4 years). Given that vaccine resources are often constrained during the outbreak season and the uncertainty in our model predictions (depicted in Fig 5), we would not recommend that Health Departments rely on the resulting maps for spatial targeting of vaccine doses. Instead, pre-season tick surveillance activities of the Health Department and awareness raising in Primary Health Centres (early diagnosis of KFD) could be targeted towards areas of high relative predicted suitability of KFD that fall outside the 5 km radius of known cases. Thus would increase the likelihood that new foci of transmission are detected rapidly before or very early in the season, triggering rapid responses including vaccination.
Since model accuracy did not increase substantially when the area of forest loss was added to the models, this may indicate that the composition of forest and plantation types in the diverse mosaic around a settlement is of greater importance in determining KFD risk than a particular level of deforestation per se. This concurs with our a priori expectations that human contact rates across landscapes of different composition will depend on how human activities and the dynamics and infection rates of key vectors, wildlife and domestic reservoirs are linked to different types of forest and cultivation. However, accurate quantification of the level of forest loss that can lead to emergence of KFD in humans was hampered by the poor correspondence between global spatial data on forest loss and gain [44] and the local patterns in forest types shown on the Land Use Land Cover map. The strong correlation between areas of forest gain and areas of forest loss in the global dataset for our study area suggests that the forest loss metric is more of an indicator of 'intensification', where forest is converted into plantation, rather than permanently lost (to make way for crops or built up areas). Local-scale understanding of these loss and intensification processes can be improved through time series analysis of Landsat Imagery (currently underway in the MonkeyFeverRisk project).
The finding that human cases of KFD are more likely to occur in landscapes with greater coverage of plantations is consistent with suggestions that large KFD epidemics in the 1970s and 1980s were linked to international development projects that replaced evergreen forest with cashew plantations [26]. It is also consistent with observations that migrant agricultural labourers in plantations have been widely affected in recent epidemics in Maharashtra and Goa States [27,58]. There are numerous potential explanations for the finding that human cases or KFD are more likely in landscapes with higher cattle densities. Cattle are thought to amplify local densities of ticks and bring ticks into households (either directly or through dry leaves used as fodder) whilst cattle grazing may bring small-holders into contact with tick habitat in forests, paddies and plantation.
Since our models are developed with disease data at household level, and scaled to the daily movement of local communities of 1-2 km through the landscape, impacts of micro-scale factors such as small water-bodies (<10 km) on disease patterns may not be detectable. Obtaining finer scale data on how locations of exposure to relate to such micro-scale factors is difficult because people are often unable to pinpoint where on their daily routes infected ticks were picked up and close monitoring of burdens by researchers may hamper livelihood activities.
The socio-ecological mechanisms linking human cases of KFD and landscape factors can be understood only through joint study across fragmented forest landscapes of the intersecting (1) human activities and priorities of forest users that underpin exposure and (2) vector and host population dynamics and infection rates that underpin hazard. Such empirical, inter-disciplinary, landscape scale studies for KFD are currently underway within the MonkeyFever-Risk project.
Prior studies have already highlighted several advantages of combining participatory methods and traditional models for understanding and predicting zoonotic diseases [59,60]. These include an improved understanding of contextual risk factors (including social, cultural, political and economic dimensions), both by corroborating a priori knowledge and highlighting hitherto unknown factors, and allowing models to be parameterised with realistic data at field level rather than with aggregated data (that is more often available from Public and Animal Health Systems). Here we highlight the (additional) benefits from linking participatory methods and models within a co-production process. Firstly, throughout the framing and knowledge integration stages, cross-sectoral stakeholders contribute valuable hypotheses about key risk factors and the policy landscape for zoonotic diseases that may have been over-looked in the scientific literature and can be tested within models (Table 1). For example, a key next step is to integrate remote sensing of seasonal dynamics of rice paddies into the MonkeyFeverRisk predictive model framework and field surveys since local disease managers highlighted these landscape features (and harvest time) as key potential nodes of interaction between people, wildlife hosts and tick vectors for KFD. Several risk factors highlighted by stakeholders, including poor data management, low vaccine uptake and under-reporting of monkey deaths, could not be integrated into models, either because they are not measured or because they are poorly linked to available geographical proxies. Awareness of these risk factors still informs treatment of input data, interpretation of model outputs and priorities and partnerships for future collection of both health and environmental data. For example, spatial information on vaccine coverage is now being collected by Primary Health Centres and shared with District Health Officials whilst the Indian Government National Biodiversity Mission aims to address some gaps in data on alternative hosts and vectors for priority zoonotic diseases.
A second advantage of co-production in this context is that the spatial grain of models for zoonotic diseases was tailored to the scale at which forests are used, and to the size and distance between settlements, that govern how forest communities access health services and report cases of KFD. Thus co-production helped to bridge the traditional gap between the scale of model outputs and the scale of intervention and ecosystem use [17] and enabled finer scale relationships to be established between the environment and human disease cases (householdlevel). Many predictive models in the scientific literature that map geographical patterns in infectious diseases draw data from online databases such as ProMed and HealthMap which can often record only the locations of hospital or Primary health centres where cases are reported [40]. By working with disease managers to collate and interpret case data, it becomes clear that visits to health centres can occur far away from the settlements or forest habitat in which infection is acquired (e.g. for KFD may range from 5-10 km for small-holders but some 10s of km for tourists, students travelling between home and college, pilgrims and migrant labourers). If hospital or health centre data are used indiscriminately to parameterise diseaseenvironment relationships, the resulting associations and predictive maps could be spurious and unsuitable for spatial targeting of interventions. Though these data drawbacks have long been noted for tick-borne diseases [61], probably due to the time required to collate fine scale data and engage with local disease managers, predictive maps are still being developed based solely on health centre locations and advocated as disease management tools, including for KFD [62]. We concur with Boden and McKendrick [63], who argued that all models supplied to health policy makers should adhere strictly to the principles of independence, transparency (autonomy), beneficence and justice. It is incumbent on disease modellers to appraise and be transparent about the suitability of available case data for relating and predicting infection processes from environmental conditions. The iterative, reflexive engagement with stakeholders' needs and knowledge, through a co-production process, can facilitate this transparency.
Finally, working directly with disease managers to interpret and validate model outputs means that the scale, appearance and explanation of resulting maps and guidance can be better tailored to their needs, increasingly the likelihood of uptake for spatial targeting of interventions. For example, cross-sectoral stakeholders in the MonkeyFeverRisk project indicated that predictive maps at scales from village level up to clusters of villages, with contextual landscape features like roads and household locations, would be most helpful for planning of vaccination, surveillance and awareness campaigns [37]. As a consequence, preliminary risk layers that could be visualised within Google Earth were supplied to the Shivamogga Health Department for validation and experimentation.
Our approach of using co-production to guide production of risk maps that integrate hazard and exposure factors influencing human disease, harnessing a broad range of stakeholder knowledge and expertise across sectors, represents an important step forward in managing zoonotic disease in LMICs. This approach is applicable across wide ranging individual and interacting zoonotic diseases affecting dependent communities in different ecosystems. It will be imperative to develop context-dependent co-production processes that account for the cultural and political dimensions that affect exposure through ecosystem use, alongside local environmental and ecological factors that determine hazard, and underpin success of inter-sectoral One Health collaboration.