
Quantifying geographic accessibility to improve efficiency of entomological monitoring

  • Joshua Longbottom,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Visualization, Writing – original draft

    Affiliations Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom, Centre for Health Informatics, Computing and Statistics, Lancaster Medical School, Lancaster University, Lancaster, United Kingdom

  • Ana Krause,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom

  • Stephen J. Torr,

    Roles Conceptualization, Funding acquisition, Investigation, Supervision, Writing – review & editing

    Affiliation Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom

  • Michelle C. Stanton

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing – review & editing

    Affiliations Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom, Centre for Health Informatics, Computing and Statistics, Lancaster Medical School, Lancaster University, Lancaster, United Kingdom



Vector-borne diseases are important causes of mortality and morbidity in humans and livestock, particularly for poorer communities and countries in the tropics. Large-scale programs against these diseases, for example malaria, dengue and African trypanosomiasis, include vector control, and assessing the impact of this intervention requires frequent and extensive monitoring of disease vector abundance. Such monitoring can be expensive, especially in the later stages of a successful program where numbers of vectors and cases are low.

Methodology/Principal findings

We developed a system that identifies monitoring sites where pre-intervention densities of vectors are predicted to be high and travel cost to sites is low, highlighting the most efficient locations for longitudinal monitoring. Using remotely sensed imagery and an image classification algorithm, we mapped landscape resistance associated with on- and off-road travel for every gridded location (3m and 0.5m grid cells) within Koboko district, Uganda. We combine the accessibility surface with pre-existing estimates of tsetse abundance and propose a stratified sampling approach to determine the most efficient locations for longitudinal data collection. Our modelled predictions were validated against empirical measurements of travel time and existing maps of road networks. We applied this approach in northern Uganda, where a large-scale vector control program is being implemented to control human African trypanosomiasis, a neglected tropical disease (NTD) caused by trypanosomes transmitted by tsetse flies. Our accessibility surfaces performed well when compared to empirical data, and ~70% of the roads visible in high-resolution (0.5m) imagery were absent from existing road networks.


By integrating such estimates with predictions of tsetse abundance, we propose a methodology to determine the optimal placement of sentinel monitoring sites for evaluating control programme efficacy, moving from a nuanced, ad-hoc approach incorporating intuition, knowledge of vector ecology and local knowledge of geographic accessibility, to a reproducible, quantifiable one.

Author summary

Assessing the impact of vector control programmes requires longitudinal measurements of the abundance of insect vectors within intervention areas. Such monitoring can be expensive, especially in the later stages of a successful program where numbers of vectors and cases of disease are low. Efficient monitoring involves a prior selection of monitoring sites that are easy to reach and produce rich information on vector abundance. Here, we used image classification and cost-distance algorithms to produce estimates of accessibility within Koboko district, Uganda, where vector control is contributing to the elimination of sleeping sickness, a neglected tropical disease (NTD). We combine an accessibility surface with pre-existing estimates of tsetse abundance and propose a stratified sampling approach to determine locations which are associated with low cost (lowest travel time) and potential for longitudinal data collection (high pre-intervention abundance). Our method could be adapted for use in the planning and monitoring of tsetse- and other vector-control programmes. By providing methods to ensure that vector control programmes operate at maximum efficiency, we can ensure that the limited funding associated with some of these NTDs has the largest impact.


Vector-borne diseases (VBDs) are important causes of mortality and morbidity in humans and livestock, particularly for poorer communities and countries in the tropics, accounting for an estimated 17% of the global burden of all infectious diseases [1]. The control of VBDs, or their elimination as a public health problem, is dependent upon effective vector management, which includes pre-intervention surveys and subsequent longitudinal monitoring of vector abundance to assess the effectiveness of an intervention. Such monitoring is an important component of the overall costs of control.

To improve the efficiency of vector control programs, there is a requirement to identify optimal locations for longitudinal monitoring site placement. Ideally, these sites should be in locations that maximise information on the distribution and density of vectors while minimising costs of obtaining these data. In practice, most vector surveillance is opportunistic and lacks a rigorous framework [2]. A more rational method would involve combining information on vector abundance with estimates of geographical accessibility, to identify sites across operational areas where pre-intervention catches are high and sampling costs are low. Towards this goal, we examined the utility of remotely sensed (RS) data to produce contemporary estimates of geographic accessibility to entomological sampling sites, using sleeping sickness control as an example application.

Sleeping sickness control as an example application

Human African trypanosomiasis (HAT) is a neglected tropical disease (NTD) affecting remote areas of sub-Saharan Africa. The disease, also termed ‘sleeping sickness’, is caused by the protozoan parasite Trypanosoma brucei with two sub-species, T.b.gambiense and T.b.rhodesiense, causing Gambian (gHAT) and Rhodesian (rHAT) human African trypanosomiasis respectively. The burden of the Gambian form of the disease, for which humans are the main hosts, is >10 times that of the Rhodesian form, with annual reported cases being in the region of 2–3,000 [3]. The World Health Organization (WHO) has targeted the elimination of gHAT as a “public health problem” by 2020, which is defined as a 90% reduction in areas reporting >1 case in 10 000 compared to 2000–2004, and <2000 annually reported cases globally [4]. Several countries appear to be on track to achieve this target [5]. Uganda is unique in that it is the only country where both gHAT and rHAT occur, albeit in geographically distinct areas [6, 7]. Vector control forms an important part of Uganda’s efforts against both forms of HAT [8, 9].

The important vectors of gHAT are Palpalis-group species of tsetse, which concentrate in riverine vegetation where, consequently, interventions are focused. In Uganda, tsetse control is being achieved through the deployment of Tiny Targets, small (20 x 50 cm) panels of insecticide-treated material which are deployed at 50-100m intervals along rivers [9, 10]. Prior work produced estimates of tsetse abundance across Northern Uganda, identifying locations of high pre-intervention abundance [11], which has informed the identification of operational control areas.

Methods to quantify accessibility largely involve cost-distance analyses, which have been widely used within the field of public health in analyses mapping accessibility to healthcare [12–15]. Such analyses require an input surface of landscape friction (‘resistance’): estimates of the travel cost associated with each gridded cell within a Cartesian plane. The cost-distance analysis identifies the cumulative cost of travelling from an origin location to each cell based on the given resistance surface, opting to traverse through cells associated with the lowest resistance values. The use of accessibility mapping in the planning and implementation of control programmes for vector-borne disease is novel and has the potential to improve the efficiency of monitoring VBD interventions.
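The cost-distance idea can be made concrete with a small example. The sketch below uses Dijkstra’s algorithm to compute the cumulative travel cost from an origin over a small resistance grid. It is a minimal Python illustration, not the ArcGIS or QGIS tooling used in this study. It assumes that each grid value is a cell crossing time, and that moving between neighbouring cells costs the mean of their two crossing times (scaled by √2 for diagonal moves), which is similar to the convention of common GIS cost-distance tools.

```python
import heapq
import math

def cost_distance(resistance, origin):
    """Cumulative travel cost from `origin` to every cell of a grid.

    resistance[r][c] is the time needed to cross cell (r, c).
    Assumption: a move between neighbouring cells costs the mean of their
    two crossing times, multiplied by sqrt(2) for diagonal moves.
    """
    n_rows, n_cols = len(resistance), len(resistance[0])
    dist = [[math.inf] * n_cols for _ in range(n_rows)]
    r0, c0 = origin
    dist[r0][c0] = 0.0
    heap = [(0.0, r0, c0)]
    # Eight-neighbour moves (orthogonal and diagonal).
    moves = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale entry: this cell was already reached more cheaply
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                step = (resistance[r][c] + resistance[nr][nc]) / 2
                if dr != 0 and dc != 0:
                    step *= math.sqrt(2)
                if d + step < dist[nr][nc]:
                    dist[nr][nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return dist

# Example: a cheap "road" along the top row and right-hand column.
grid = [[1, 1, 1],
        [10, 10, 1],
        [10, 10, 1]]
print(cost_distance(grid, (0, 0))[2][2])  # about 3.41 (2 + sqrt(2))
```

In the example, the low-resistance cells act like a road. The cheapest route to the bottom-right corner therefore follows them instead of cutting diagonally through the high-resistance block.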

In this paper, we use remotely sensed (RS) satellite data to derive a contemporary road network within Koboko district, Northern Uganda, where an existing tsetse control programme is in operation. To obtain a road network within this district, we compare the utility of RS data at two differing spatial resolutions (one source characterising locations within the district as 3 × 3m grid cells on a Cartesian plane, and another as 0.5 × 0.5m grid cells) [16, 17], and an existing open source dataset detailing road locations [18]. Image classification algorithms, specifically maximum likelihood classification, were used to detect dirt and tarmac roads within the RS imagery [19]. Ground truth tracking (GPS) data detailing motorbike speeds along roads within the district were used to assign on-road travel costs to each grid cell. We used published estimates of time taken to traverse different densities of vegetation to assign resistance values to off-road grid cells [20, 21]. Resistance surfaces were validated using withheld ground-truth tracking data, comparing observed and predicted travel times within a linear regression. The resulting resistance surfaces were used within a least-cost path algorithm to identify cumulative costs to locations of high tsetse abundance [11]. We apply a stratified sampling approach to determine locations which are associated with low cost (lowest travel time) and potential for rich longitudinal data collection (high pre-intervention abundance).

Here, by combining field data on travel time along varying road types and remotely sensed imagery, we describe the process of producing a high-resolution accessibility surface. By integrating such estimates with predictions of tsetse abundance, we propose a methodology to determine the optimal placement of sentinel monitoring sites for evaluating the efficacy of a tsetse control programme, moving from a nuanced, ad-hoc approach incorporating intuition, knowledge of vector ecology and local knowledge of geographic accessibility to a reproducible, quantifiable one. The work described here is presented in the context of tsetse control, but the methods used are applicable to a wide range of vector-borne diseases.

Materials and methods

Study area

The focal area of this study was Koboko District, located within the West Nile Region of Uganda. The West Nile region consists of eight districts, with current and planned intervention initiatives (i.e. the Tiny Target programme) operating in seven. Koboko district covers roughly 860km2 and has a population of 229,200 people [22]. Between 2000 and 2018, 14.6% (620/4235) of gHAT cases reported from Uganda occurred in Koboko, but the incidence of gHAT is in decline as a consequence of an integrated programme of screening and treatment of the human population and, more recently, vector control [23]. A map showing the location of existing and planned intervention areas within West Nile Region is provided as S1 Fig, highlighting the position of Koboko within these intervention districts.

Field methodology and data collection

To obtain data on variation in speeds by road class, technicians making routine visits to traps within Koboko were provided with GPS devices. The recording of GPS tracks was performed during three time periods in the dry season: May-June 2017, February-April 2018, and December 2018-January 2019. Trap attendants within Koboko operate using motorbikes; therefore, observed speeds were representative of motorbike-based travel. Devices were configured to record track points at ~15-second intervals.

Obtaining remotely sensed satellite data

To compare the effect of different spatial resolutions of satellite data on the ability to identify roads, we used two differing sources of remotely sensed imagery. Imagery obtained from PlanetScope satellites, captured on February 12th, 2018, was used. PlanetScope imagery is provided at a 3m × 3m resolution, and includes the following four spectral bands: blue (455–515 nm), green (500–590 nm), red (590–670 nm), and near infrared (780–860 nm) [16, 24]. PlanetScope data are freely accessible through an education and research program account.

Data captured through the Pléiades-1A satellite, available at a 0.5m × 0.5m resolution and captured on 27th December 2016 were used to represent high-spatial resolution imagery [25]. Imagery captured on this date was the most contemporary data available. The Pléiades-1A imagery similarly consists of the same four spectral bands as PlanetScope. Data obtained by Pléiades-1A is available by request through Airbus (previously known as the European Aeronautic Defence and Space Company) [17].

GPS data review and cleaning

To calculate travel speeds, the time difference between subsequent points within a track and the Euclidean distance between these points were used within the following formula (Eq 1):

$$v_i = \frac{\lVert x_{i+1} - x_i \rVert}{t_{i+1} - t_i} \qquad \text{(Eq 1)}$$

where x_i represents the GPS coordinate of point i, t_i represents the time recorded for point i, and ‖·‖ represents the Euclidean distance.

Recorded points with a speed <1km/hr were assumed to be stationary points (based on average walking speeds [26]), and were removed from the track dataset. Similarly, we removed data points for which the speed exceeded 150 km/hr (93.2 mph) as these were likely to be artefacts created due to errors with location positioning and are not representative of true travel speed.
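Eq 1 and the two filtering thresholds can be sketched as follows. This is an illustrative Python version, not the code used in the study. It assumes that coordinates have already been projected to metres (for example, to a UTM zone) so that a Euclidean distance is meaningful, and that timestamps are in seconds. The function name `track_speeds` is hypothetical.

```python
import math

def track_speeds(points, min_kmh=1.0, max_kmh=150.0):
    """Speeds (km/h) between consecutive GPS fixes, following Eq 1.

    `points` is a list of (x, y, t) tuples: x and y in projected metres
    (assumption), t in seconds. Returns (i, speed) pairs for the points i
    kept after removing stationary (< min_kmh) and implausible (> max_kmh)
    speeds.
    """
    kept = []
    for i in range(len(points) - 1):
        x0, y0, t0 = points[i]
        x1, y1, t1 = points[i + 1]
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamps
        metres = math.hypot(x1 - x0, y1 - y0)  # ||x_{i+1} - x_i||
        kmh = metres / dt * 3.6                # m/s -> km/h
        if min_kmh <= kmh <= max_kmh:
            kept.append((i, kmh))
    return kept

# Fixes ~15 s apart: one plausible move, one stationary gap, one positioning jump.
track = [(0, 0, 0), (100, 0, 15), (100, 0, 30), (5000, 0, 45)]
print(track_speeds(track))  # only the first interval (about 24 km/h) survives
```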

Open street map validation

To determine the accuracy of currently available open source data, OpenStreetMap (OSM) geolocated roads were compared with roads visible within 0.5m and 3m satellite data. Shapefiles detailing mapped roads hosted by OSM were retrieved from Geofabrik OSM Data Extracts on March 3rd, 2018, to align with the dates during which field-obtained tracking data were collected [18]. A 1 km × 1 km fishnet constructed for Koboko district was used to produce a random sample of 25 grid squares for manual digitisation. The digitisation process consisted of tracing over visible roads and tracks, as seen in the 0.5m resolution imagery (metric one), or as seen in the 3m resolution imagery (metric two). The length of digitised road obtained from each of the three sources was calculated in metres.
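The sketch below shows one way to compare per-square road lengths from two sources. The definition of percentage agreement used here is an assumption: the total length of one source expressed as a percentage of the other, summed over the sampled squares. The helper name is hypothetical. Only the paired t-statistic is computed; a p-value could be obtained from a statistics library, for example `scipy.stats.ttest_rel`.

```python
import math
import statistics

def compare_road_lengths(reference_m, other_m):
    """Compare road lengths (metres) digitised from two sources over the same
    sampled grid squares.

    Returns (percent_agreement, t_statistic, degrees_of_freedom), where
    percent_agreement is the total length in `other_m` as a percentage of the
    total in `reference_m` (an assumed definition of 'similarity') and
    t_statistic is the paired t-statistic on per-square differences.
    """
    if len(reference_m) != len(other_m) or len(reference_m) < 2:
        raise ValueError("need at least two paired grid squares")
    diffs = [a - b for a, b in zip(reference_m, other_m)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        # All differences identical: t is either 0 or unbounded.
        t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t_stat = mean_diff / (sd_diff / math.sqrt(n))
    percent = 100 * sum(other_m) / sum(reference_m)
    return percent, t_stat, n - 1

# Hypothetical lengths for four grid squares (not data from this study).
imagery = [2000, 1500, 0, 3000]
osm = [1000, 1500, 0, 1000]
print(compare_road_lengths(imagery, osm))
```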

Remote sensing image preparation

In total, 14 PlanetScope scenes covering an area of 745.8 km2 were downloaded. To produce one complete surface, overlapping scenes were merged using ArcGIS (version 10.4), and the composite image was cropped to district boundaries. Imagery obtained from Pléiades-1A (0.5m) was provided as a pre-prepared mosaic.

Image classification

To aid image classification, image segmentation utilising a mean-shift approach was first performed within ArcGIS (version 10.4). Mean-shift segmentation is a process that identifies segments in imagery by grouping adjacent pixels that have similar spectral characteristics; a detailed introduction and theory related to mean-shift segmentation algorithms can be found within Demirović 2019 [27]. We utilised the “Segment Mean Shift” tool within the “Spatial Analyst Toolbox” in ArcGIS, with the following default parameters: spectral detail = 15.5, spatial detail = 15, minimum segment size (in pixels) = 20. Following mean-shift segmentation, we applied a maximum likelihood (ML) classification algorithm using an equal a priori probability weighting to identify the class in which each cell had the highest probability of being a member. The ML classification algorithm considers both the variances and covariances of pixels assigned to ‘classes’ (groups of pixels relating to a specific type of land-cover, in this instance), selected within a signature training file [19]. Under the assumption that the distribution of a class sample is normal, each class was characterized by the mean vector and the covariance matrix. Given these characteristics, for each cell value within the remotely sensed imagery, the statistical probability of a cell belonging to each class is calculated and an appropriate classification is assigned [19]. We opted to use the following classes within this analysis: dirt road and/or track, tarmac road, dense vegetation (for example: woodlands, forest, bushwood and shrubwood), grassland (for example: grassland, meadow, steppe and savannah) and barren land. Signature files for use in the ML classification were produced by manually tracing and assigning pixels within the remotely sensed imagery to one of the five classes described above. 
Classification was performed using the “Train Maximum Likelihood Classifier” tool within the “Spatial Analyst Toolbox” in ArcGIS. To account for “salt and pepper” speckling effects representative of potentially misclassified and/or isolated cells, we performed post-classification processing. This processing stage included filtering to remove isolated cells [28], smoothing to even out ragged class boundaries [29], and generalising to reclassify small regions of isolated cells [30]. Post-classification cleaning was performed in ArcGIS.
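A small NumPy sketch can illustrate the maximum likelihood rule described above. Each class is summarised by the mean vector and covariance matrix of its training pixels, and each pixel is assigned to the class with the highest Gaussian log-likelihood. Because equal prior probabilities are used, the prior term is the same for every class and is omitted. This is not the ArcGIS implementation. It operates on individual pixel vectors rather than mean-shift segments, and the function names and synthetic band values are illustrative.

```python
import numpy as np

def train_ml_classifier(training):
    """Summarise each class by the mean vector and covariance matrix of its
    training pixels. `training` maps class name -> array (n_pixels, n_bands)."""
    return {name: (x.mean(axis=0), np.cov(x, rowvar=False))
            for name, x in training.items()}

def classify(pixels, params):
    """Assign each row of `pixels` (n_pixels, n_bands) to the class with the
    highest Gaussian log-likelihood. Equal priors are assumed, so the prior
    term is identical for all classes and is omitted."""
    names = list(params)
    scores = []
    for name in names:
        mean, cov = params[name]
        inv_cov = np.linalg.inv(cov)
        _, log_det = np.linalg.slogdet(cov)
        diff = pixels - mean
        # Squared Mahalanobis distance of every pixel from the class mean.
        mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores.append(-0.5 * (log_det + mahalanobis))
    best = np.argmax(np.vstack(scores), axis=0)
    return [names[k] for k in best]

# Example with synthetic four-band values (illustrative only).
rng = np.random.default_rng(0)
training = {
    "dirt road": rng.normal([200, 180, 160, 150], 5, size=(50, 4)),
    "dense vegetation": rng.normal([50, 90, 40, 220], 5, size=(50, 4)),
}
params = train_ml_classifier(training)
print(classify(np.array([[198.0, 182.0, 161.0, 149.0], [52.0, 88.0, 41.0, 218.0]]), params))
```

Using the full covariance matrix, and not only per-band variances, is what distinguishes this rule from a simpler minimum-distance classifier. This matches the description above, which states that both variances and covariances are considered.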

Classification validation

A total of 500 accuracy assessment points were randomly generated for each classified surface (i.e. 3m × 3m and 0.5m × 0.5m imagery). A step-by-step comparison was then made for each randomly selected point, noting the algorithm-derived class and the manually assigned (ground-truth) class. Utilising this information, a confusion matrix was constructed for each image source. Accuracy was calculated with respect to both omission and commission rates, where omission refers to instances where a feature (point) is omitted from the evaluated category, and commission refers to instances where a feature is incorrectly assigned to the category being evaluated.
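The accuracy assessment can be sketched as follows, assuming that the ground-truth and classified labels for the sampled points are available as two parallel lists. The omission error for a class is the share of its ground-truth points that were assigned elsewhere. The commission error is the share of points labelled as that class that actually belong to another class. The function name is hypothetical.

```python
from collections import Counter

def accuracy_assessment(reference, predicted, classes):
    """Confusion matrix, overall accuracy and per-class (omission, commission)
    errors from paired ground-truth and classified labels."""
    counts = Counter(zip(reference, predicted))
    # matrix[truth][label] = number of points with that combination
    matrix = {r: {p: counts[(r, p)] for p in classes} for r in classes}
    overall = sum(matrix[c][c] for c in classes) / len(reference)
    errors = {}
    for c in classes:
        truth_total = sum(matrix[c].values())             # points truly in c
        label_total = sum(matrix[r][c] for r in classes)  # points labelled c
        omission = 1 - matrix[c][c] / truth_total if truth_total else float("nan")
        commission = 1 - matrix[c][c] / label_total if label_total else float("nan")
        errors[c] = (omission, commission)
    return matrix, overall, errors

truth = ["road", "road", "veg", "veg", "bare"]
label = ["road", "veg", "veg", "veg", "road"]
matrix, overall, errors = accuracy_assessment(truth, label, ["road", "veg", "bare"])
print(overall, errors["road"])  # 0.6 (0.5, 0.5)
```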

Road network update

Using the outputs from the image classification process, the GPS tracking data, and available OSM data, two contemporary road networks (one per remotely sensed data source) were produced. Cleaned, field-obtained tracking points were used to inform estimates of average travel speeds along selected roads as follows. Tracking points were converted to polylines, consisting of line segments constructed from five trailing points. Each segment was assigned a mean observed speed by dividing its Euclidean length by the time elapsed between its start and end points. These segments were then rasterised, the resulting cells were stacked, and overlapping cells resulting from replicate trips across all tracking days were averaged. This produced a surface indicating the average observed speed for each cell. Tracks obtained during December 2018 were withheld from this network and were used for validation (see below). A surface detailing urban and rural locations [31] was used to categorise roads as being within urban or rural areas. This classification was paired with data from the Ugandan Traffic and Road Safety Act, detailing maximum speed limits for roads within urban/built-up areas and rural areas. Characterising roads in this way assigns each road a legal maximum speed, which we assumed to be representative of true travel speeds. Classified urban and classified rural cells were assigned the speeds given in Table 1, as informed by the official Traffic and Road Safety Act 2004 [32] and the Highway code [33].
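The speed-assignment steps above might look like the following Python sketch. It assumes that each segment has already been rasterised, so that the set of grid cells it crosses is known. It uses hypothetical `urban_kmh` and `rural_kmh` parameters in place of the Table 1 speeds, which are not reproduced here. All function names are illustrative.

```python
import math
from collections import defaultdict

def segment_speed_kmh(vertices):
    """Mean speed (km/h) of a track segment from its (x, y, t) vertices,
    with x/y in projected metres and t in seconds."""
    metres = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(vertices, vertices[1:]))
    return metres / (vertices[-1][2] - vertices[0][2]) * 3.6

def average_cell_speeds(segments):
    """Average the speeds of all segments overlapping each cell.
    `segments` is a list of (cells, speed_kmh), where `cells` is the set of
    (row, col) cells the rasterised segment covers."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cells, speed in segments:
        for cell in cells:
            totals[cell] += speed
            counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

def road_speed(cell, observed, is_urban, urban_kmh, rural_kmh):
    """Observed mean speed where tracking data exist; otherwise a fallback
    speed for urban or rural roads (hypothetical stand-ins for Table 1)."""
    if cell in observed:
        return observed[cell]
    return urban_kmh if is_urban else rural_kmh

observed = average_cell_speeds([({(0, 0), (0, 1)}, 20.0), ({(0, 1), (0, 2)}, 40.0)])
print(observed[(0, 1)])  # 30.0: mean of two overlapping trips
```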

Table 1. Assigned travel speeds to roads lacking ground-obtained tracking data.

Normalized Difference Vegetation Index analysis

As the majority of mapped roads do not lead directly to a river or tributary, trap attendants are required to traverse off-road in order to reach suitable habitats for trap placement. We therefore aimed to characterise the cost associated with off-road travel within our analysis. Utilising the two differing imagery sources, two separate NDVI surfaces were generated (Eq 2). By construction, NDVI values range between -1.0 and 1.0, with higher values representing greater greenness. Generally, NDVI values ≤0 represent waterbodies including lakes and major rivers; values between 0.1 and 0.2 represent barren land, including areas of rock, sand, or snow; values between 0.2 and 0.3 represent shrub and grassland (areas of moderate vegetation); and values between 0.3 and 0.8 represent areas of dense vegetation (for example, temperate and tropical rainforest) [34, 35].

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad \text{(Eq 2)}$$

where NIR represents the near infrared band, and R represents the red band within the RS imagery.
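The NumPy sketch below implements Eq 2 and the indicative NDVI ranges described above. The class boundaries are a simplification. Values between 0 and 0.1, which those ranges leave unassigned, are grouped with barren land, and values above 0.8 are grouped with dense vegetation. Both of these choices are assumptions of this sketch.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R) per cell (Eq 2); cells where
    NIR + R == 0 are returned as NaN instead of raising a division error."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

def ndvi_class(value):
    """Indicative land-cover class for one NDVI value (assumed boundaries)."""
    if value <= 0:
        return "water"
    if value < 0.2:
        return "barren"             # includes the unassigned 0-0.1 range
    if value < 0.3:
        return "shrub/grassland"
    return "dense vegetation"       # includes values above 0.8

values = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
print(values, ndvi_class(values[0]))
```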

Assigning off-road resistance values

Resistance values are values associated with a specific cost to traverse through a cell (time, in seconds). For this study, off-road resistance values were assigned utilising the NDVI outputs, with cost values ranging based on indicative terrain. Locations which contain dense vegetation are generally slower to navigate and therefore cells representative of these areas were associated with a higher resistance value; conversely, cells which represent areas with little to no vegetation were presumed to be easier to traverse and were assigned a lower resistance value. Average off-road walking speeds for differing terrains were obtained from published literature [20, 21] (Table 2).
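Converting walking speeds to resistance values is a unit conversion: the crossing time for a cell is its side length divided by the travel speed. The sketch below illustrates this with hypothetical speeds for each NDVI-derived class (the speeds used in the study are those in Table 2). It also treats classes that have no speed, such as open water, as impassable. That treatment is an assumption of this sketch, not part of the method described.

```python
import math

# Hypothetical off-road walking speeds (km/h) by NDVI-derived class; the
# values used in this study are those in Table 2 and are not reproduced here.
WALK_KMH = {"barren": 5.0, "shrub/grassland": 4.0, "dense vegetation": 2.0}

def crossing_time_s(cell_size_m, speed_kmh):
    """Seconds needed to cross one cell of side cell_size_m at speed_kmh."""
    return cell_size_m / (speed_kmh / 3.6)

def off_road_resistance(class_grid, cell_size_m, speeds=WALK_KMH):
    """Cell crossing times (s) for a grid of class labels; classes without a
    speed (e.g. open water) are treated as impassable (assumption)."""
    return [[crossing_time_s(cell_size_m, speeds[c]) if c in speeds else math.inf
             for c in row]
            for row in class_grid]

grid = off_road_resistance([["barren", "water"],
                            ["dense vegetation", "shrub/grassland"]], cell_size_m=3)
print(grid)  # slower terrain -> larger crossing time per 3 m cell
```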

Table 2. Resistance values (cell crossing time) associated with off-road travel.

Resistance surface and cost-distance analysis

The updated road networks, featuring a cell crossing time based on assigned speeds (representative of on-road resistance), were combined with their respective off-road resistance surfaces. To validate the generated surfaces, we used field-obtained tracking data (obtained December 2018) withheld from the road network construction. Sixty-three segments along the withheld tracks were used to create validation points. Using each resistance surface, the travel time from the start to the end point of each segment was generated using a least-cost path algorithm within QGIS 3.4.4 [36], via the plugin “Least-Cost Path” (produced by FlowMap Group [37]). The specific algorithm implemented is Dijkstra’s algorithm, an approach from graph theory that identifies the shortest path between two nodes; the algorithm is described in detail in Dijkstra 1959 [38]. A linear regression model was then fitted to the observed travel times, with predicted travel time as the only covariate, to quantify the relationship between the two measures. This regression was used to test for an association between predicted and observed travel times, and to provide a means of adjusting the generated surface values if necessary. The accuracy of each resistance surface was assessed using the coefficient p-values and the root-mean-square error (RMSE). Using these resistance surfaces, two separate cost-distance analyses were performed (one per spatial resolution), each using the location of our district entomologist’s base as the origin. The cost-distance analysis again implemented Dijkstra’s algorithm, calculating the cumulative cost of travel from the origin to each grid cell in the resistance surface.
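The validation regression can be illustrated with the short NumPy sketch below. It fits observed travel time against predicted travel time and reports R2 and RMSE. The text does not specify whether RMSE is taken from the regression residuals or from the raw differences between predicted and observed times, so the sketch returns both. The function name is hypothetical. Coefficient p-values would require a statistics package such as statsmodels.

```python
import numpy as np

def validate_surface(predicted, observed):
    """Fit observed = a + b * predicted by least squares and return
    (a, b, R^2, RMSE of regression residuals, RMSE of raw differences)."""
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(observed, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    residuals = y - (intercept + slope * x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1 - ss_res / ss_tot
    rmse_fit = float(np.sqrt(np.mean(residuals ** 2)))
    rmse_raw = float(np.sqrt(np.mean((y - x) ** 2)))
    return intercept, slope, r2, rmse_fit, rmse_raw

print(validate_surface([1, 2, 3, 4], [3, 5, 7, 9]))
```

A slope near 1 and an intercept near 0 would indicate that a surface needs no adjustment. A consistent departure from those values gives a simple linear correction that could be applied to the surface.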

Identifying optimal sentinel site placement

We performed a spatially stratified sampling approach to aid the identification of 104 least-cost, high-abundance locations for sentinel site placement across 25km2 strata. Firstly, we produced a fishnet consisting of 5 km × 5 km grid squares across Koboko district, and assigned each grid square a sequential stratum identification number (see S2 Fig for strata distribution). For each stratum within the proposed intervention area, we ranked each cell by its predicted tsetse abundance value [11], and by its predicted travel time from the origin, as obtained from the cost-distance output. To account for spatial clustering, and to ensure a more even spatial distribution of sentinel sites, we retained the cell with the highest predicted abundance and lowest associated cost per 50m × 50m area. We calculated the cumulative rank for each cell within the de-clustered dataset, where predicted abundance values were ranked from high to low, and accessibility values ranked from low to high. We retained the two locations (paired sites) with the lowest cumulative rank per sampled stratum, with these locations being identified as the optimal placement for sentinel monitoring sites.
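The ranking procedure above can be sketched in plain Python. Several details are assumptions of this illustration: the input format (cells as dictionaries with projected coordinates in metres, predicted abundance, and travel time in minutes), the tie-breaking rule used during de-clustering (highest abundance first, then lowest travel time), and ranking within each stratum.

```python
from collections import defaultdict

def select_sentinel_sites(cells, block_m=50.0, stratum_m=5000.0, per_stratum=2):
    """Return {stratum: [selected cells]} using cumulative abundance and
    travel-time ranks. Each cell is a dict with keys x, y (metres),
    abundance and minutes (travel time from the origin)."""
    # 1. De-cluster: keep one cell per block_m x block_m block, preferring the
    #    highest abundance and then the lowest travel time.
    best = {}
    for c in cells:
        key = (int(c["x"] // block_m), int(c["y"] // block_m))
        score = (-c["abundance"], c["minutes"])
        if key not in best or score < best[key][0]:
            best[key] = (score, c)
    # 2. Group the retained cells into square strata.
    strata = defaultdict(list)
    for _, c in best.values():
        strata[(int(c["x"] // stratum_m), int(c["y"] // stratum_m))].append(c)
    chosen = {}
    for sid, members in strata.items():
        # 3. Rank abundance high -> low and travel time low -> high (1 = best).
        rank = defaultdict(int)
        for i, c in enumerate(sorted(members, key=lambda c: -c["abundance"]), 1):
            rank[id(c)] += i
        for i, c in enumerate(sorted(members, key=lambda c: c["minutes"]), 1):
            rank[id(c)] += i
        # 4. Keep the cells with the lowest cumulative rank.
        chosen[sid] = sorted(members, key=lambda c: rank[id(c)])[:per_stratum]
    return chosen

cells = [
    {"name": "A", "x": 10, "y": 10, "abundance": 5, "minutes": 30},
    {"name": "B", "x": 20, "y": 20, "abundance": 2, "minutes": 10},  # same block as A
    {"name": "C", "x": 100, "y": 10, "abundance": 4, "minutes": 20},
    {"name": "D", "x": 200, "y": 10, "abundance": 1, "minutes": 60},
]
print({sid: [c["name"] for c in sel] for sid, sel in select_sentinel_sites(cells).items()})
```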

Utilising the travelling salesperson problem (TSP) to identify the optimal route

Once the optimal location of monitoring sites was identified, we applied the travelling salesperson problem (TSP) to identify the most efficient order in which to visit each site. The TSP is an optimisation problem in which the following question is addressed: “Given a list of cities and distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?” [39]. We adapt this problem to answer “Given a list of monitoring locations and travel times between each pair of locations, what is the shortest possible route that visits each monitoring site and returns to the origin location?”. We solve this through the implementation of “Concorde’s algorithm” [39], through the TSP package in R [40]. First, the 3 × 3m friction surface was converted into a transition matrix using the “transition” function in the gdistance R package [41]. Second, the pairwise distances between all sites were calculated to produce a distance matrix, using the “costDistance” function in the gdistance package. We then constructed the TSP using the function “TSP” and the distance matrix, and solved it with “solve_TSP”; both functions are from the TSP R package. By following the route identified by solving the TSP, and incorporating 30-minute stays at each pair of sites to deploy traps and/or collect samples, we grouped sites into ‘clusters’ which are feasible to visit within a 5-hour sampling day.
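The study solved the TSP using Concorde through the R TSP package. As a dependency-free stand-in, the Python sketch below builds a closed tour from a travel-time matrix with a nearest-neighbour heuristic and then improves it with 2-opt moves. This heuristic is not Concorde and does not guarantee an optimal route. The function names are hypothetical.

```python
def tour_length(tour, d):
    """Total travel time along a tour given as a list of node indices."""
    return sum(d[a][b] for a, b in zip(tour, tour[1:]))

def solve_tsp_heuristic(d, origin=0):
    """Closed tour from `origin` through every node of travel-time matrix d.
    Nearest-neighbour construction followed by 2-opt improvement: a
    heuristic stand-in, not the exact Concorde solver."""
    unvisited = set(range(len(d))) - {origin}
    tour = [origin]
    while unvisited:
        nxt = min(unvisited, key=lambda j: d[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(origin)
    improved = True
    while improved:
        improved = False
        # Reverse inner sub-sequences; the origin stays fixed at both ends.
        for i in range(1, len(tour) - 2):
            for k in range(i + 1, len(tour) - 1):
                candidate = tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]
                if tour_length(candidate, d) < tour_length(tour, d) - 1e-12:
                    tour, improved = candidate, True
    return tour

# Four sites on a unit square, with the origin at index 0.
pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
d = [[((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 for b in pts] for a in pts]
route = solve_tsp_heuristic(d)
print(route, tour_length(route, d))
```

Because the full tour length is recomputed for each candidate, the sketch also works with asymmetric travel times. Such asymmetry can arise when cost-distance values differ by direction of travel.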


GPS data collection

To inform estimates of on-road travel cost for each 3m × 3m and 0.5m × 0.5m cell within Koboko district, Northern Uganda, we obtained tracking data during three periods: May-June 2017, February-April 2018, and December 2018-January 2019. Tracks collected between May 2017–April 2018 were used to inform road speeds, and tracks collected between December 2018-January 2019 were withheld for validating the resistance surfaces (S3 Fig).

OpenStreetMap accuracy assessment

Analyses evaluating the accuracy of an existing, community-driven, open-source road network (from OpenStreetMap) indicate that at least one road exists within the OpenStreetMap (OSM) dataset for 17 out of 25 randomly sampled 1km2 grid squares across Koboko district (mean road length = 1.97 km). Only one out of 25 grid squares contained no visible roads across sources (i.e. 0.5m imagery, 3m imagery, and OSM). When comparing total road length visible in 3 × 3m imagery with that charted by OSM, the two sources show close agreement (97.43% similarity [total road length across 25km2], paired t-test p = 0.91); however, when comparing the 0.5 × 0.5m imagery and the OSM dataset, only 28.16% of digitised roads are charted by OSM (paired t-test p < 0.001, Fig 1, S1 Table, S4 Fig). This section of the analysis provided the rationale for classifying the 0.5m imagery: over 71% of the roads digitised from this imagery are absent from OSM and could potentially be captured by its inclusion.

Fig 1. Example of composite images of digitised road networks within Koboko district.

Purple roads represent roads visible in 0.5m imagery [17], as digitised in this study; black roads represent roads visible in 3m imagery [24], as digitised in this study; and light blue roads represent roads available within the OSM dataset [18]. The overlap of all three colours indicates areas of consistency across sources.

Image classification

Classification of two differing sources of remotely sensed imagery (0.5 × 0.5m and 3 × 3m) yielded varying accuracies across classes, and across spatial resolutions, with accuracy values ranging from 38% to 89% for dirt roads and 5% to 84% for tarmac roads for 3m and 0.5m imagery respectively (Table 3; Fig 2). Overall image classification accuracy, considering all five classes utilised (dirt road and/or track, tarmac road, dense vegetation, grassland and barren land), ranged from 53% (3m) to 78% (0.5m), with 0.5m imagery proving to be more effective at identifying both dirt and tarmac roads than the 3m imagery.

Fig 2.

Confusion matrices for the classification of each surface (Left: 3m, Right: 0.5m). Diagonal squares (bottom left to top right) indicate the percentage of correctly classified cells per class.

Table 3. Maximum likelihood classification (MLC) accuracy assessment validation values for each class.

Values represent the percentage of correctly classified cells (classified vs ground truth) for the five classes of interest.

Resistance surface and cost-distance analysis

The accuracy of the resistance surfaces was assessed by investigating the relationship between observed travel times and predicted travel times using withheld field-obtained GPS tracks and a linear regression. Predicted values produced using the 3m resistance surface aligned much more closely with ground-truth (observed) values (root-mean-square error [RMSE] = 3.93) than those produced using the 0.5m resistance surface (RMSE = 6.01). In separate regressions with validation data from each surface, we identified a significant association between observed and predicted values (p < 0.001 for both the 0.5m and 3m surfaces), indicating a high performance of each surface, with the 3m surface showing a stronger relationship with less variability (R2 = 0.66 vs R2 = 0.49, 3m and 0.5m respectively). Summaries of resistance surface validation are provided within S5 Fig and Table 4. Output cost-distance surfaces detailing the travel time from the location of our field station to each gridded cell within Koboko district are provided as Fig 3.

Fig 3. Cost-distance surfaces.

Figures show the cumulative travel time from the field site origin (black point), to each subsequent cell within the surface. Left: 3m cost-distance surface, Right: 0.5m cost-distance surface. This figure was generated using ArcGIS version 10.4 [42].

Table 4. Model summaries for resistance surface validation.

Summary statistics from four separate linear regressions are provided.

Identification of optimal sentinel site placement

Utilising the 3m cost-distance surface and a predictive surface of tsetse abundance [11], we identified the optimal placement of 104 sentinel sites within the current intervention area (52 paired locations) (Fig 4). Such sites are positioned within the most easily accessible, high-abundance locations for 26 unique 5 x 5 km strata across the intervention area. Optimal sentinel-site placement identifies locations with abundance values ranging from 0.04–19.57 (mean = 5.21) flies per cell, and locations which are within 5.55–151.81 (mean = 68.42) minutes from the field station location.

Fig 4. Optimal placement of sentinel sites (max two sites per grid square [25km2]) within Koboko district.

Location of optimal sites visualised alongside the 3m accessibility surface (this study) and tsetse abundance surface [11], dashed lines represent the 5 x 5km sampling strata used to allocate optimal sites. This figure was generated using ArcGIS version 10.4 [42].

Identification of the optimal route

Using the coordinates of the 52 paired monitoring locations derived above, we solved a traveling salesperson problem (TSP) to identify the optimal route for visiting these sites. The result of the TSP is shown in Fig 5. Assuming that the field team will spend up to 5 hours per day on sampling and travel combined, we grouped sites into sampling clusters to be visited each day. A sampling period of four days is required to visit all sample locations.
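
Dedicated heuristics such as chained Lin-Kernighan [39] are the usual tools for this kind of routing problem. The sketch below swaps in a simpler technique: a nearest-neighbour construction followed by 2-opt improvement. It is easy to read but is not guaranteed to find the optimal tour. It assumes a symmetric travel-time matrix (for example, one derived from the cost-distance surface) with the field station at index 0. The function names are hypothetical.

```python
def tour_length(tour, times):
    """Total travel time of a closed tour that returns to its first stop."""
    return sum(times[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def solve_tsp(times, start=0):
    """Approximate a closed tour through every location (not guaranteed optimal).

    times: square, symmetric matrix of travel times between locations;
    start: index of the field station, kept as the first stop.
    """
    n = len(times)
    # 1) Nearest-neighbour construction: always drive to the closest unvisited site
    unvisited = set(range(n)) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: times[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)

    # 2) 2-opt improvement: reverse a segment whenever that shortens the tour
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for k in range(i + 1, n):
                a, b = tour[i - 1], tour[i]
                c, d = tour[k], tour[(k + 1) % n]
                # swap edges (a, b) and (c, d) for (a, c) and (b, d)
                delta = times[a][c] + times[b][d] - times[a][b] - times[c][d]
                if delta < -1e-9:
                    tour[i:k + 1] = reversed(tour[i:k + 1])
                    improved = True
    return tour
```

Because segment reversal only ever starts at index 1, the field station stays at the head of the tour. This makes the result easy to split into daily trips.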

Fig 5.

Left: Optimal route for sampling the identified sentinel sites within Koboko district. Right: Clusters of sentinel sites identified by enforcing a maximum sampling and travel period of 5 hours per day within Koboko district. This figure was generated using ArcGIS version 10.4 [42].
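
The daily grouping can be sketched as a greedy pass along the tour. Sites are added to the current day until adding the next site would exceed the 5-hour budget. The check counts the travel to that site, an assumed fixed sampling time at the site, and the drive back to the field station. When the budget would be exceeded, a new day begins. The fixed per-site sampling time, the rule that each day starts and ends at the field station, and the function name are assumptions for illustration.

```python
def split_into_days(tour, times, sample_minutes, budget_minutes=300.0, depot=0):
    """Greedily cut a tour into daily trips that each fit within a time budget.

    tour: visiting order of location indices (the depot, if present, is skipped).
    times: travel-time matrix in minutes, with `depot` the field station.
    sample_minutes: assumed time spent sampling at each site (hypothetical).
    Each day starts and ends at the depot.
    """
    days, current, elapsed = [], [], 0.0
    for site in (s for s in tour if s != depot):
        prev = current[-1] if current else depot
        # total day length if this site is added and the team then returns to base
        candidate = elapsed + times[prev][site] + sample_minutes + times[site][depot]
        if current and candidate > budget_minutes:
            days.append(current)  # close the day and start a new one from base
            current, elapsed, prev = [], 0.0, depot
            candidate = times[depot][site] + sample_minutes + times[site][depot]
        if candidate > budget_minutes:
            raise ValueError(f"site {site} cannot be visited within one day")
        current.append(site)
        elapsed += times[prev][site] + sample_minutes
    if current:
        days.append(current)
    return days
```

A greedy cut keeps the tour order intact. Re-solving the routing problem within each day, or using a vehicle-routing formulation, could reduce total travel further.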

Discussion

This analysis investigated the ability of high-resolution satellite imagery to inform estimates of accessibility to entomological sampling sites, using tsetse control as an example application. We began by scrutinising the completeness of an existing open-source road network for Koboko district, Uganda, comparing charted roads with those obtained by manually digitising remotely sensed (RS) imagery at two spatial resolutions. For this region of Uganda, roads visible within 3m imagery matched 97.43% of roads identified in OpenStreetMap (OSM) (paired t-test p = 0.91) (Fig 1, S1 Table). Comparing roads visible within 0.5m RS imagery with those charted by OSM yields 28.16% consistency across sources (paired t-test p < 0.001) (S1 Table).
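
The paired t-test used in this comparison asks whether the mean difference between matched measurements is zero. The sketch below assumes that the matched measurements are road lengths from two sources, summarised over the same spatial units (for example, sampling strata). That pairing, the function name and the test values are hypothetical. The sketch returns only the statistic and its degrees of freedom. SciPy's `scipy.stats.ttest_rel` performs the full test, including the p-value.

```python
import math
from statistics import fmean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b.

    Under the null hypothesis of no mean difference, t follows a
    t distribution with n - 1 degrees of freedom.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least 2 matched pairs")
    diffs = [x - y for x, y in zip(a, b)]  # e.g. road km per unit: source A - source B
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = fmean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```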

As data published on OSM are the result of community contributions incorporating local knowledge, coverage is often inconsistent. The recent establishment of several refugee camps across the West Nile Region has resulted in increased road-mapping efforts within this area, which explains the high coverage seen here [43]. OpenStreetMap completeness varies globally, and the analyses we have developed will be particularly useful where OSM and standard sources of road-network information are scant [44].

Part of our analysis aimed to infer the effect of including higher-resolution (spatially disaggregated) data on estimates of accessibility, testing whether the extra information obtainable from 0.5m imagery produces refined estimates. The results of a maximum likelihood classification algorithm indicate a high ability to identify roads and associated features within the 0.5m imagery, mirroring the results of manual digitisation (Table 3; Fig 2). Image classification results also indicate that the spatial detail available within 3m imagery is too coarse to classify roads in this district accurately (38% and 5% accuracy for dirt and tarmac roads, respectively). This result is expected because most roads within Koboko district rarely exceed 3m in width and are therefore poorly resolved in 3m imagery; narrow roads are likely to be common across large parts of rural Africa [45]. The utility of 3m imagery may be greater in more developed areas, where roads exceed 3m in width.
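
Maximum likelihood classification assigns each pixel to the class whose multivariate normal distribution, estimated from training samples, gives that pixel's band values the highest likelihood [19]. The dependency-free sketch below assumes equal prior probabilities for all classes and a full covariance matrix per class. It illustrates the technique and is not the GIS implementation used in this study. The function names and training values are hypothetical.

```python
import math


def _cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def fit_class(samples):
    """Mean vector and Cholesky factor of the sample covariance for one class."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[b] for s in samples) / n for b in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, _cholesky(cov)


def log_likelihood(pixel, mean, L):
    """Gaussian log-density of a pixel's band values, up to a shared constant."""
    d = len(mean)
    # Forward substitution solves L y = (pixel - mean); y . y is the Mahalanobis distance
    y = []
    for i in range(d):
        y.append((pixel[i] - mean[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (log_det + sum(v * v for v in y))


def classify(pixel, models):
    """Assign the class with the highest likelihood (equal prior probabilities)."""
    return max(models, key=lambda name: log_likelihood(pixel, *models[name]))
```

Usage: build `models = {"road": fit_class(road_pixels), ...}` from training polygons, then call `classify` on each pixel's band values.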

Despite a higher image classification accuracy and a better model fit to the training data, the 0.5m resistance surface appears to under-perform the 3m resistance surface when evaluated against withheld GPS tracking data (Table 4, S5 Fig). Both resistance surfaces show a significant linear relationship between observed and predicted values; however, the 3m resistance surface has a lower root-mean-square error (3.93 vs 6.01 for the 3m and 0.5m surfaces, respectively). This under-performance may be due to the larger number of roads within the 0.5m resistance surface and to some of the assumptions made regarding travel along roads of differing class. Although we have used the best information available to us, there will invariably be additional factors that affect how accessible a location is. Should a location prove more difficult to access in practice than predicted by our approach, an alternative location would be selected based on both the outcome of this approach and field-based information. We envisage this process being somewhat iterative, with new GPS data collected during the first visit to a proposed monitoring site. These new data may be used to improve surface validation and to refine some of the assumptions made in the approach described here. When using the surfaces to identify the optimal placement of sentinel sites, the relative travel time to each cell is as informative as the actual travel time. Despite the differing RMSEs, the significant relationship between predicted and observed travel times supports the utility of the generated surfaces.

By combining the generated 3m accessibility surface (Fig 3) with previously published estimates of tsetse abundance [11], we provide a novel framework for identifying efficient locations for sentinel monitoring sites (Fig 4). Previous methods for placing sentinel monitoring sites have been based on intuition, incorporating knowledge of tsetse ecology and local knowledge of roads within an intervention area. Here, we quantify this process, providing a more robust approach that can be applied to a range of vector-borne diseases. Moving from an ad-hoc process to an evidence-based one will allow more efficient assessment of tsetse control programmes. Although we have provided a quantifiable approach for prioritising spatial sampling of disease vectors, we recognise that knowledge of additional country- and context-specific factors, such as varying vector behaviours and geographic accessibility, is invaluable for designing and implementing an effective monitoring programme. Such approaches should be tailored to the vector, disease and country of interest, with the work described here providing a framework on which to build. Local knowledge remains useful in the design and implementation of this approach, for example in identifying changes in accessibility (such as the creation or disuse of roads) or in refining selected sites. The application of these methods to intervention monitoring and assessment is novel, and refining the results has several implications for cost-effectiveness as vector control expands to other areas within the region.

The distribution and abundance of disease vectors change dynamically in response to variations in biotic and abiotic conditions [46, 47]. The methodology described here can incorporate new surfaces describing expanding or contracting species ranges; however, our approach focuses on identifying static monitoring sites based on conditions at the time the intervention is implemented. Periodic updates to OSM data may be used to generate contemporary geographic accessibility surfaces reflecting the creation of new roads or the disuse of others. Although this methodology has the potential for dynamic updates, our approach requires a technical understanding of GIS and remote sensing, factors that may limit its uptake and application in developing countries outside the framework of internationally supported programmes. These factors may be addressed through capacity-strengthening programmes in which GIS skills are integrated into the curriculum.

Several important vector-borne NTDs have been targeted for elimination as a public-health problem by 2020 in the WHO NTD roadmap [4]. However, the burden of numerous VBDs will continue beyond the ambitious 2020 target [48–50]. As is evident from the WHO roadmap, both disease and vector surveillance form large components of most elimination strategies, and the Strategic and Technical Advisory Group (STAG) for NTDs also recognises the need for a better understanding of the economic aspects of NTD control. By providing methods that help vector control programmes operate at maximum efficiency, we can help ensure that the limited funding associated with some of these NTDs has the largest possible impact.

Although this analysis is not an economic evaluation of methods for assessing control programme efficacy, previous work has shown that vehicle running and travel costs are among the five largest costs of running a tsetse control programme [51, 52], with staff salaries being the most expensive element. By strategically placing sentinel monitoring sites in locations with a low accessibility cost, programmes can reduce travel costs (e.g., fuel and maintenance) and staff expenses; tsetse monitoring currently costs ~$9.0/km²/year (10.6% of tsetse control programme budgets) [52]. The accessibility surface may also contribute to cost-effective planning of pre-intervention surveys, which account for roughly 6% of control programme budgets [52]. Furthermore, by also using additional metrics, such as pre-intervention abundance, to position these sites, we identify locations that may provide more accurate evaluations of control efficacy. Further research is needed to evaluate the precise economic gains of this approach.

Accessibility is a widely sought-after metric. Although the methodology applied here is currently restricted to one district in northern Uganda and to identifying accessible tsetse monitoring sites, it could inform other accessibility analyses within the area, such as access to HAT diagnostic centres, and may be applied to a range of vector-borne diseases.

Supporting information

S1 Fig. Existing and planned intervention areas.

Blue areas identify both current and planned Tiny Target intervention areas within the West Nile Region of Northern Uganda.


S2 Fig. Distribution of 5 × 5 km sampling strata across Koboko district.


S3 Fig. Spatial distribution of ground truth data.

Left: Tracks used to inform speeds along select roads (training data). Right: Tracks used to validate the generated cost-distance surfaces (validation data).


S4 Fig. Composite images of digitised road networks within Koboko district.

Purple roads represent roads visible in 0.5m imagery; black roads represent roads visible in 3m imagery; and light blue roads represent roads available within the OSM dataset. Areas where all three colours overlap indicate consistency across sources.


S5 Fig. Regression plots.

Plots from a linear regression using observed travel time data with predicted travel time as the only covariate. Top Left: Regression using 3m within-sample (training) data. Top Right: Regression using 3m out-of-sample (validation) data. Bottom Left: Regression using 0.5m within-sample (training) data. Bottom Right: Regression using 0.5m out-of-sample (validation) data.


S1 Table. Results of OpenStreetMap data validation.


S1 File. 3 meter resolution resistance surface.

Available via FigShare: 10.6084/m9.figshare.11837019.


S2 File. 0.5 meter resolution resistance surface.

Available via FigShare: 10.6084/m9.figshare.11837070.


Acknowledgments
The authors wish to thank Dr Simon Wagstaff and Mr Andrew Bennett for providing the computational resources to perform this analysis.

References
1. World Health Organization. Vector-borne diseases 2017 [cited 2019 22nd February 2019]. Available from:
2. Sedda L, Lucas ER, Djogbenou LS, Edi AVC, Egyr-Yawson A, Kabula BI, et al. Improved spatial ecological sampling using open data and standardization: an example from malaria mosquito surveillance. bioRxiv. 2018:465963.
3. Franco JR, Cecchi G, Priotto G, Paone M, Diarra A, Grout L, et al. Monitoring the elimination of human African trypanosomiasis: Update to 2014. PLoS Negl Trop Dis. 2017;11(5):e0005585. pmid:28531222
4. World Health Organization. Accelerating work to overcome neglected tropical diseases: a roadmap for implementation. Geneva: 2012.
5. Moloo A. Eliminating sleeping sickness as a public health problem is on track 2017 [18th June 2018]. Available from:
6. Picozzi K, Fèvre E, Odiit M, Carrington M, Eisler MC, Maudlin I, et al. Sleeping sickness in Uganda: a thin line between two fatal diseases. BMJ. 2005;331(7527):1238–41. pmid:16308383
7. Berrang-Ford L, Odiit M, Maiso F, Waltner-Toews D, McDermott J. Sleeping sickness in Uganda: revisiting current and historical distributions. Afr Health Sci. 2006;6(4):223–31. PMC1832067. pmid:17604511
8. Welburn SC, Coleman PG, Maudlin I, Fevre EM, Odiit M, Eisler MC. Crisis, what crisis? Control of Rhodesian sleeping sickness. Trends in parasitology. 2006;22(3):123–8. Epub 2006/02/07. pmid:16458071.
9. Tirados I, Esterhuizen J, Kovacic V, Mangwiro TNC, Vale GA, Hastings I, et al. Tsetse control and Gambian sleeping sickness; implications for control strategy. PLoS Negl Trop Dis. 2015;9(8):e0003822. pmid:26267814
10. Lehane M, Alfaroukh I, Bucheton B, Camara M, Harris A, Kaba D, et al. Tsetse control and the elimination of Gambian sleeping sickness. PLoS Negl Trop Dis. 2016;10(4):e0004437. pmid:27128795
11. Stanton MC, Esterhuizen J, Tirados I, Betts H, Torr SJ. The development of high resolution maps of tsetse abundance to guide interventions against human African trypanosomiasis in northern Uganda. Parasit Vectors. 2018;11(1):340. pmid:29884213
12. Juran S, Broer PN, Klug SJ, Snow RC, Okiro EA, Ouma PO, et al. Geospatial mapping of access to timely essential surgery in sub-Saharan Africa. BMJ Glob Health. 2018;3(4):e000875. pmid:30147944
13. Agbenyo F, Marshall Nunbogu A, Dongzagla A. Accessibility mapping of health facilities in rural Ghana. J Transp Health. 2017;6:73–83.
14. Ouma PO, Maina J, Thuranira PN, Macharia PM, Alegana VA, English M, et al. Access to emergency hospital care provided by the public sector in sub-Saharan Africa in 2015: a geocoded inventory and spatial analysis. Lancet Glob Health. 2018;6(3):e342–e50. pmid:29396220
15. Delamater PL, Messina JP, Shortridge AM, Grady SC. Measuring geographic access to health care: raster and network-based methods. International journal of health geographics. 2012;11(1):15. pmid:22587023.
16. Planet. Planet imagery product specification: PlanetScope & RapidEye. 2016.
17. Airbus. Satellite Data 2019 [cited 2018 7th January 2018]. Available from:
18. OpenStreetMap contributors. Geofabrik OpenStreetMap Data Extracts. 2018.
19. ESRI. ArcMap 10.3: Spatial Analyst Toolbox: How Maximum Likelihood Classification works 2016 [13th June 2018]. Available from:
20. Houben RM, Van Boeckel TP, Mwinuka V, Mzumara P, Branson K, Linard C, et al. Monitoring the impact of decentralised chronic care services on patient travel time in rural Africa—methods and results in Northern Malawi. International journal of health geographics. 2012;11:49. Epub 2012/11/17. pmid:23153311; PubMed Central PMCID: PMC3517381.
21. Soule RG, Goldman RF. Terrain coefficients for energy cost prediction. J Appl Physiol. 1972;32(5):706–8. pmid:5038861.
22. Uganda Bureau of Statistics. Projections of demographic trends in Uganda 2007–2017. Kampala, Uganda: Uganda Bureau of Statistics, 2016.
23. World Health Organization. Global Health Observatory data repository: Human African Trypanosomiasis: World Health Organization; 2019 [cited 2019 18th February 2019]. Available from:
24. Planet. Planet Explorer 2018 [7th June 2018]. Available from:
25. ASTRIUM. Pléiades Imagery—User Guide. 2012.
26. Bohannon RW. Comfortable and maximum walking speed of adults aged 20–79 years: reference values and determinants. Age Ageing. 1997;26(1):15–9. pmid:9143432
27. Demirović D. An Implementation of the Mean Shift Algorithm. Image Processing On Line. 2019;9:251–68.
28. ESRI. Filtering classified output 2019 [6th December 2019]. Available from:
29. ESRI. Smoothing class boundaries and clumping classified output 2019 [6th December 2019]. Available from:
30. ESRI. Generalizing classified output by removing small isolated regions 2019 [6th December 2019]. Available from:
31. Facebook Connectivity Lab, Center for International Earth Science Information Network—CIESIN—Columbia University. High Resolution Settlement Layer. 2016.
32. Traffic and Road Safety Act (Uganda). The Traffic and Road Safety (Speed Limits) Regulations, 2004. 2004 Contract No.: Section 131(k) of the Traffic and Road Safety Act, Cap. 361.
33. Ministry of Works and Transport. The Highway Code. Uganda: 2009.
34. Sellers PJ. Canopy reflectance, photosynthesis and transpiration. Int J Remote Sens. 1985;6(8):1335–72.
35. Myneni RB, Hall FG, Sellers PJ, Marshak A. The interpretation of spectral vegetation indexes. IEEE Trans Geosci Remote Sens. 1995;33(2):481–6.
36. QGIS Development Team. QGIS Geographic Information System version 3.4.4. Open Source Geospatial Foundation Project. http://qgis.osgeo.org. 2019.
37. FlowMap Group. QGIS Python Plugins Repository: Least-Cost Path. 2019 [6th December 2019]. Available from:
38. Dijkstra EW. A note on two problems in connexion with graphs. Numerische Mathematik. 1959;1:269–71.
39. Applegate D, Cook W, Rohe A. Chained Lin-Kernighan for Large Traveling Salesman Problems. INFORMS Journal on Computing. 2003;15(1):82–92.
40. Hahsler M, Hornik K. TSP Infrastructure for the Traveling Salesperson Problem. Journal of Statistical Software; Vol 1, Issue 2 (2008). 2007.
41. van Etten J. gdistance: Distances and Routes on Geographical Grids. R package version 1.2-2. 2018 [10th December 2019]. Available from:
42. ESRI. ArcGIS Desktop: Release 10.4.1. Redlands, CA: Environmental Systems Research Institute.; 2011.
43. Kateregga G. Community mapping for refugees in Uganda 2018 [updated 19th March 2019; cited 19th March 2019]. Available from:
44. Barrington-Leigh C, Millard-Ball A. The world’s user-generated road map is more than 80% complete. PLoS One. 2017;12(8):e0180698. pmid:28797037
45. Riverson J, Gaviria J, Thriscutt S. Rural Roads in Sub-Saharan Africa: Lessons from World Bank Experience. Washington, D.C.: 1991.
46. Burkett-Cadena ND, McClure CJW, Estep LK, Eubanks MD. Hosts or habitats: What drives the spatial distribution of mosquitoes? Ecosphere. 2013;4(2):art30.
47. Smith DL, Dushoff J, McKenzie FE. The Risk of a Mosquito-Borne Infection in a Heterogeneous Environment. PLOS Biology. 2004;2(11):e368. pmid:15510228
48. Ferguson NM. Challenges and opportunities in controlling mosquito-borne infections. Nature. 2018;559(7715):490–7. pmid:30046071
49. Hotez PJ, Basáñez M-G, Acosta-Serrano A, Grillet ME. Venezuela and its rising vector-borne neglected diseases. PLoS Negl Trop Dis. 2017;11(6):e0005423. pmid:28662038
50. Mackey TK, Liang BA, Cuomo R, Hafen R, Brouwer KC, Lee DE. Emerging and reemerging neglected tropical diseases: a review of key characteristics, risk factors, and the policy and innovation environment. Clin Microbiol Rev. 2014;27(4):949–79. pmid:25278579
51. Muhanguzi D, Okello WO, Kabasa JD, Waiswa C, Welburn SC, Shaw AP. Cost analysis of options for management of African animal trypanosomiasis using interventions targeted at cattle in Tororo district; south-eastern Uganda. Parasit Vectors. 2015;8:387. Epub 2015/07/23. pmid:26198109; PubMed Central PMCID: PMC4510899.
52. Shaw APM, Tirados I, Mangwiro CTN, Esterhuizen J, Lehane MJ, Torr SJ, et al. Costs of using “Tiny Targets” to control Glossina fuscipes fuscipes, a vector of Gambiense sleeping sickness in Arua district of Uganda. PLoS Negl Trop Dis. 2015;9(3):e0003624. pmid:25811956