Estimating and Mapping the Population at Risk of Sleeping Sickness

Background: Human African trypanosomiasis (HAT), also known as sleeping sickness, persists as a public health problem in several sub-Saharan countries. Evidence-based, spatially explicit estimates of population at risk are needed to inform planning and implementation of field interventions, monitor disease trends, raise awareness and support advocacy. Comprehensive, geo-referenced epidemiological records from HAT-affected countries were combined with human population layers to map five categories of risk, ranging from "very high" to "very low," and to estimate the corresponding at-risk population.

Results: Approximately 70 million people distributed over a surface of 1.55 million km² are estimated to be at different levels of risk of contracting HAT. Trypanosoma brucei gambiense accounts for 82.2% of the population at risk, the remaining 17.8% being at risk of infection from T. b. rhodesiense. Twenty-one million people live in areas classified as moderate to very high risk, where more than 1 HAT case per 10,000 inhabitants per annum is reported.

Discussion: Updated estimates of the population at risk of sleeping sickness were made, based on quantitative information on the reported cases and the geographic distribution of human population. Due to substantial methodological differences, it is not possible to make direct comparisons with previous figures for at-risk population. By contrast, it will be possible to explore trends in the future. The presented maps of different HAT risk levels will help to develop site-specific strategies for control and surveillance, and to monitor progress achieved by ongoing efforts aimed at the elimination of sleeping sickness.


Introduction
Human African trypanosomiasis (HAT), or sleeping sickness, is a vector-borne disease caused by two sub-species of the parasitic protozoa Trypanosoma brucei (i.e. T. b. gambiense and T. b. rhodesiense).
Trypanosomes are transmitted to humans by the infected bite of various species of tsetse fly (genus Glossina). Transmission of the disease only takes place in sub-Saharan Africa, in discrete areas of endemicity, or 'foci', within the geographic distribution of the tsetse fly. The Gambian form of sleeping sickness is normally characterized by a long asymptomatic period and it is found in western and central Africa. The Rhodesian form, which is encountered in eastern and southern Africa, displays a much more rapid onset of overt symptoms and a faster progression.
In the early 1960s, the reported incidence of the disease was at a trough, with only a few thousand cases being reported annually. However, a decline in surveillance in the post-independence period allowed sleeping sickness to regain ground. By the end of the 20th century, the World Health Organization (WHO) estimated that 300,000 people contracted the infection every year [1]. Since then, a global alliance led by WHO has set elimination as the goal of its strategy against HAT [2,3]. This renewed commitment by international and national institutions, including the private sector, succeeded in reversing the trend. Compared with the peak in 1998, when 37,991 new cases of HAT were reported at the continental level, 6,743 cases were reported in 2011, corresponding to a reduction of 82.3%. Also, many countries considered as endemic have not reported any cases in recent years [4].
The magnitude of the recent advances in HAT control and surveillance is such that up-to-date estimates of the number and geographic distribution of people at risk are urgently needed.
In the past, estimates of sleeping sickness risk at the continental, regional and national levels could only be based on educated guesses and rough expert estimates, rather than on a clearly laid out, objective analysis of the epidemiological evidence. In 1985, a WHO Expert Committee indicated that a population of 78.5 million was at risk of HAT in sub-Saharan Africa [5]. This figure was based on national-level information provided by the Ministries of Health of affected countries. In 1995, a new WHO Expert Committee indicated that 60.8 million people were at risk of contracting sleeping sickness [1], thus providing what had been, until now, the most recent global estimate of HAT risk. To derive this figure, a semi-quantitative method was used, whereby rural populations involved in agricultural activities within known HAT transmission areas were considered at risk. In both estimates, subjectivity remained high and the link to the epidemiological evidence loose.
Since the latest estimations were made, HAT control and surveillance were scaled up [6], and data collection and reporting were substantially improved, with WHO coordinating the efforts of the National Sleeping Sickness Control Programmes (NSSCPs), bilateral co-operation, Non-Governmental Organizations (NGOs), Research Institutes and the private sector [7]. Also, over the last 10 to 15 years, the increased availability and utilization of the Global Positioning System (GPS), remote sensing data and Geographical Information Systems (GIS) triggered the development of novel, more objective methodologies to map the risk of many diseases [8,9,10,11].
Till recently, geospatial analysis had never been used to estimate HAT risk at the regional or African scale. In 2008, the Atlas of HAT was launched, aiming at assembling, harmonizing and mapping datasets on the geographic distribution of sleeping sickness in sub-Saharan Africa [12]. Comprehensive and accurate epidemiological maps were generated [4,13], which laid the foundations for more objective, evidence-based estimations of sleeping sickness risk. Thereafter, a GIS-based methodology for risk estimation was developed and tested in six Central African countries [14]. In this methodology, harmonized epidemiological data and global human population layers were combined, thus enabling different levels of HAT risk to be estimated and mapped. 'Risk' was regarded as the likelihood of infection, and the likelihood was estimated as a function of disease intensity and geographical proximity to HAT reported cases.
In the present study, the methodology tested in the six Central African countries was applied at the continental level in order to map the risk of sleeping sickness in sub-Saharan Africa and to estimate the at-risk population. In an effort to generate comparable estimates for both T. b. gambiense and T. b. rhodesiense infections, the same methodology was applied to all HAT-endemic countries and to both forms of the disease.

Input data
Reported HAT cases for the period 2000-2009 were obtained from the Atlas of HAT [4]. The Atlas provided village-level mapping for 81.0% of the cases, corresponding to 19,828 different locations mapped. The average spatial accuracy for reported cases mapped was estimated at ≈1,000 m using methods already described [4].
For the remaining 19.0% of the cases, village-level information was unavailable but the area of occurrence was known (e.g. focus, parish, health zone, etc.). For the purpose of risk estimation, these cases were apportioned among the endemic villages of their area of occurrence by means of proportional allocation [14].
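The proportional allocation step can be illustrated with a minimal Python sketch. The exact weighting used in [14] is not restated here, so the sketch assumes that area-level cases are split in proportion to the cases already mapped at village level; the function name, the village labels and the equal-split fallback are all hypothetical.

```python
def apportion_cases(area_cases, village_cases):
    """Split cases known only at area level among the endemic villages of that area.

    ASSUMPTION: weights are the village-level case counts already mapped;
    the weighting actually used in [14] may differ.
    """
    if not village_cases:
        raise ValueError("the area has no endemic villages to allocate to")
    total = sum(village_cases.values())
    if total == 0:
        # ASSUMPTION: with no village-level counts to weight by, split equally.
        share = area_cases / len(village_cases)
        return {village: share for village in village_cases}
    return {village: area_cases * n / total for village, n in village_cases.items()}


# Hypothetical health zone: 20 cases without village-level information,
# three endemic villages with 6, 3 and 1 mapped cases respectively.
villages = {"A": 6, "B": 3, "C": 1}
allocated = apportion_cases(20, villages)
```

Allocated values are fractional by design, which is harmless here because the cases are subsequently smoothed into an intensity surface rather than counted per village.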
Reported cases also included those diagnosed in non-endemic countries (most notably in travellers and migrants), which in the Atlas of HAT are mapped in the probable place of infection and flagged as 'exported' [15]. For T. b. rhodesiense exported cases, the place of infection most frequently corresponds to a park or another type of protected area. For the sole purpose of risk estimation, T. b. rhodesiense exported cases were randomly distributed within the boundaries of their respective park/protected area of origin.
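One simple way to place cases at random inside a protected-area boundary is rejection sampling: draw points uniformly in the bounding box and keep those that fall inside the polygon. The Python sketch below is illustrative only; the paper does not state which procedure was used, and the triangular 'park' boundary, the function names and the random seed are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, n_points, rng):
    """Rejection sampling: uniform draws in the bounding box, kept only if inside."""
    xs = [vertex[0] for vertex in polygon]
    ys = [vertex[1] for vertex in polygon]
    points = []
    while len(points) < n_points:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical triangular boundary in projected coordinates (km).
park = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
exported_cases = random_points_in_polygon(park, 5, random.Random(42))
```

The sketch assumes projected (planar) coordinates; sampling uniformly in longitude and latitude would not be exactly uniform in area.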
The geographic distribution of human population was derived from Landscan™ databases [16]. Landscan provides global grids where census counts are allocated to grid nodes on the basis of probability coefficients. The spatial resolution of Landscan is 30 arcseconds (≈1 km at the equator), and the population layer is updated on a yearly basis.
To delineate risk areas, an average of the ten Landscan population datasets from 2000 to 2009 was used. Subsequently, Landscan 2009 was combined with the risk map to provide estimates of people at risk at the end of the study period [14].
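The averaging of the yearly population layers amounts to a cell-by-cell mean. A minimal Python sketch follows, using tiny 2 x 2 grids with invented counts in place of the real Landscan rasters; the function name and all values are hypothetical.

```python
def average_grids(grids):
    """Cell-by-cell mean of equally shaped grids, each given as a list of rows."""
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(grid[r][c] for grid in grids) / n for c in range(cols)]
        for r in range(rows)
    ]


# Ten hypothetical yearly layers (2000 to 2009); two cells grow over time.
yearly_layers = [
    [[100 + 10 * year, 0], [50, 200 + 20 * year]]
    for year in range(10)
]
mean_population = average_grids(yearly_layers)
```

In practice the same operation would be applied to full raster arrays; plain lists are used here only to keep the example self-contained.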

Author Summary
The present thrust towards the elimination of human African trypanosomiasis (HAT, or sleeping sickness) requires accurate information on how many people are at risk of contracting the disease, and where they live. This information is crucial to target field interventions effectively and efficiently, as well as to monitor progress towards the elimination goal. In this paper, a Geographic Information System was used to delineate areas at different levels of risk. To this end, accurate data on the spatial distribution of HAT cases (period 2000-2009) were collated and combined with maps of human population. A total of 70 million people are estimated to be at risk of contracting sleeping sickness in Africa. This population is distributed over a surface of one and a half million square kilometres, an area six times that of the United Kingdom. Half of the people and of the areas at risk are found in the Democratic Republic of the Congo.

Spatial smoothing
Both input layers (i.e. sleeping sickness cases and human population) can be regarded as spatial point processes, and are thus amenable to spatial smoothing.
Spatial smoothing methods are used in epidemiology to facilitate data analysis, and they allow point layers to be transformed into continuous surfaces of intensity. In this context, the intensity $\lambda(s)$ of a point process is the mean number of events per unit area at the point $s$ [17]. The term 'event' is used to distinguish the location of an observation ($s_i$) from any other arbitrary location $s$ within a study region $R$. Spatial smoothing techniques can be based on localized averages or more complex, three-dimensional mathematical functions (e.g. kernels), but they all rely on a moving window, whose size and shape determine how far the effect of an event will reach [18]. For this study, intensity was estimated through a kernel function $k(\cdot)$, so that the intensity estimate $\hat{\lambda}_\tau(s)$ could be expressed as:
$$\hat{\lambda}_\tau(s) = \sum_{i=1}^{n} \frac{1}{\tau^2}\, k\!\left(\frac{s - s_i}{\tau}\right)$$
Here, $s$ was a location anywhere in the study region $R$, $s_1, \ldots, s_n$ were the locations of the $n$ observed events, and $k(\cdot)$ represented the kernel weighting function. $\tau > 0$ is normally referred to as the bandwidth or search radius, and $s_i$ were the events that lay within the area of influence as controlled by $\tau$.
There are various shapes of kernel to choose from, all usually represented by symmetric bivariate functions decreasing radially. The choice of shape has relatively little effect on the resulting intensity estimate $\hat{\lambda}_\tau(s)$ [19,20] and we used a quadratic kernel [20]. A more important choice is the selection of the bandwidth $\tau$, the rule being that the higher $\tau$, the smoother the intensity surface. Although different techniques are available for selecting $\tau$ [21,22], no optimal value exists, and characteristics of the biological process under study are often better suited to guide the choice, so that the smoothed surface provides insights into the underlying data [18].
By taking into account the epidemiological features of HAT, the behaviour of the tsetse vector and the mobility of people in the average rural African milieu where HAT occurs, a search radius of 30 km was chosen [14]. In particular, a few studies investigated the daily distance covered by people living in HAT foci [23,24,25] and revealed that this tends not to exceed 15 km. The distance of 30 km also made it possible to take into account, at least in part, movements of people that do not occur on a daily basis. Figure 1 provides a three-dimensional illustration of the output of spatial smoothing. In the example, the point layer used as input comprised one single 'event' (i.e. one HAT case) localized at the centre of the grid.
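The single-event example can be reproduced numerically with a short Python sketch of a kernel intensity estimate. The exact form of the quadratic kernel is not given in the text, so the sketch assumes the commonly used radially symmetric form k(u) = (3/π)(1 − |u|²)² for |u| < 1 and zero elsewhere, which integrates to one over the unit disc. The per-event weight and all names are illustrative additions.

```python
import math


def quadratic_kernel(u_sq):
    """ASSUMED kernel shape: (3/pi) * (1 - u^2)^2 inside the unit disc, 0 outside.

    u_sq is the squared scaled distance (d / tau)^2.
    """
    if u_sq >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u_sq) ** 2


def intensity(x, y, events, tau):
    """Kernel intensity estimate at (x, y), in events per km^2.

    events: iterable of (x, y, weight) tuples with coordinates in km; the
    weight (e.g. average cases per annum at that location) is a hypothetical
    extension, and weight 1.0 gives the unweighted estimator.
    """
    total = 0.0
    for ex, ey, weight in events:
        d_sq = (x - ex) ** 2 + (y - ey) ** 2
        total += weight * quadratic_kernel(d_sq / tau ** 2) / tau ** 2
    return total


TAU_KM = 30.0
single_case = [(0.0, 0.0, 1.0)]  # one HAT case at the centre of the grid

peak = intensity(0.0, 0.0, single_case, TAU_KM)   # highest value, at the case location
edge = intensity(30.0, 0.0, single_case, TAU_KM)  # zero at the search radius

# Summing the surface over a 1 km grid should recover roughly one event.
mass = sum(
    intensity(float(i), float(j), single_case, TAU_KM)
    for i in range(-30, 31)
    for j in range(-30, 31)
)
```

Because the kernel integrates to one, smoothing redistributes each case over the 30 km neighbourhood without changing the total number of cases.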

Delineation of risk areas
Prior to spatial smoothing, the number of HAT cases reported in 2000-2009 was divided by ten, thus providing the average number of cases per annum (p.a.). Similarly, Landscan human population layers from 2000 to 2009 were averaged [14]. Both averaged layers were subjected to spatial smoothing using the same quadratic kernel function. Importantly, both intensity surfaces were generated using the same 30 km bandwidth [26].
Spatial smoothing resulted in the two surfaces $D_\tau(s)$ and $P_\tau(s)$, which represent the average annual estimates of disease intensity and population intensity respectively. The input to and output of spatial smoothing are exemplified in Figure 2.
The ratio between the intensity of HAT cases and the population intensity can be defined as the disease risk [18], so that a risk function was estimated as:
$$R_\tau(s) = \frac{D_\tau(s)}{P_\tau(s)}$$
Thresholds were applied to the risk function $R_\tau(s)$ in order to distinguish and map different categories of risk, ranging from 'very low' to 'very high' (Table 1). Outside the areas mapped as at risk of HAT, i.e. in areas where <1 HAT case per $10^6$ inhabitants p.a. was reported, the risk of contracting the disease was considered 'marginal'. These marginal areas were not taken into account further in this study. The term 'marginal' was chosen because, in such areas, risk could not be considered as non-existent, since residents of these zones could still expose themselves to infection if visiting transmission areas.
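The ratio and the thresholding can be sketched in a few lines of Python. The text fixes two boundaries: 1 case per 10^6 inhabitants p.a. as the lower limit of 'very low', and 1 case per 10,000 as the lower limit of 'moderate'. The remaining cut-offs below are assumed to be spaced by factors of ten; Table 1 remains the authoritative source. Function names and the zero-population convention are hypothetical.

```python
# Lower bounds in cases per inhabitant per annum, highest first.
# ASSUMPTION: decade-spaced thresholds; only 1e-4 and 1e-6 are stated in the text.
THRESHOLDS = [
    (1e-2, "very high"),
    (1e-3, "high"),
    (1e-4, "moderate"),
    (1e-5, "low"),
    (1e-6, "very low"),
]


def risk(disease_intensity, population_intensity):
    """Ratio of smoothed disease intensity to smoothed population intensity."""
    if population_intensity <= 0:
        # ASSUMPTION: cells with no smoothed population are given zero risk.
        return 0.0
    return disease_intensity / population_intensity


def classify(risk_value):
    """Map a risk value to a category; below every threshold it is 'marginal'."""
    for lower_bound, label in THRESHOLDS:
        if risk_value >= lower_bound:
            return label
    return "marginal"


# Hypothetical cell: 0.5 cases p.a. per km^2 against 2,000 people per km^2.
example_category = classify(risk(0.5, 2000.0))
```

Here the example cell has a risk of 2.5 cases per 10,000 inhabitants p.a. and therefore falls in the 'moderate' category under the assumed thresholds.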

Estimates of people at risk
The map depicting the different categories of HAT risk was combined with the Landscan 2009 dataset to estimate the number of people at risk at the end of the study period [14].
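Combining the categorical risk map with a population grid is, in essence, a zonal sum: the population of every cell is added to the total of the risk category that cell falls in. A minimal Python sketch with invented 2 x 2 grids follows; the function name and all values are hypothetical.

```python
from collections import defaultdict


def population_by_category(category_grid, population_grid):
    """Sum population per risk category, skipping cells classed as 'marginal'."""
    totals = defaultdict(int)
    for category_row, population_row in zip(category_grid, population_grid):
        for category, population in zip(category_row, population_row):
            if category != "marginal":
                totals[category] += population
    return dict(totals)


# Hypothetical risk categories and end-of-period population counts per cell.
categories = [["moderate", "low"], ["marginal", "low"]]
population = [[1200, 300], [5000, 450]]
at_risk = population_by_category(categories, population)
```

Both grids must share the same extent and resolution for the cell-by-cell pairing to be meaningful.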

Results
An area of 1.55 million km² in Africa is estimated to be at various levels of HAT risk, ranging from 'very high' to 'very low' (Table 2 and Table 3). Areas at 'very high' to 'moderate' risk account for 719 thousand km² (46.3%) and areas at 'low' to 'very low' risk account for the remaining 833 thousand km² (53.7%).
The total population at risk of sleeping sickness is estimated at 69.3 million (Table 4 and Table 5). The categories at 'very high' to 'moderate' risk account for a third of the people at risk (21 million), whilst the remaining two thirds (48.3 million) are at 'low' to 'very low' risk.
The geographic distribution of risk areas in central Africa, western Africa and eastern-southern Africa is presented in Figure 3, Figure 4 and Figure 5 respectively. Country-level risk maps are provided in Supporting Information (Map S1). Focus-level risk maps will be provided at the HAT/WHO website: http://www.who.int/trypanosomiasis_african/country/en/.

Trypanosoma brucei gambiense
A total of 57 million people are estimated to be at risk of contracting Gambian sleeping sickness (Table 4). This population is distributed over a surface of 1.38 million km² (Table 2). Approximately 19.6 million (34.4%) of the people at risk live in areas classified at moderate risk or higher.
The risk patterns in Cameroon, Central African Republic, Chad, Congo, Equatorial Guinea, and Gabon have already been described in some detail elsewhere [14]. In essence, areas at very high to high risk are localized in southeastern and northwestern Central African Republic, southern Chad, along lengthy stretches of the Congo river north of Brazzaville, and by the Atlantic coast on both sides of the border between Gabon and Equatorial Guinea.
The Democratic Republic of the Congo is, by far, the country with the highest number of people at risk (≈36.2 million) and the largest at-risk area (≈790 thousand km²). Areas at risk can be found in the provinces of Bandundu, Bas Congo, Équateur, Kasai-Occidental, Kasai-Oriental, Katanga, Kinshasa, Maniema, Orientale, and South Kivu. More details on the risk and the geographic distribution of sleeping sickness in the Democratic Republic of the Congo will be provided in a separate paper.
In South Sudan, a sizable area (≈100 thousand km²) and over a million people are estimated to be at risk of sleeping sickness, including a number of high to very high risk areas in Central and Western Equatoria provinces. These findings highlight the need for continued surveillance in this country [27]. In neighbouring Uganda, the area at risk of T. b. gambiense infection (≈17 thousand km²) is located in the north-west of the country. It mostly falls in the category 'moderate', and it accounts for over two million people at risk.
In Angola, sleeping sickness is found in the northwestern part of the country (≈180 thousand km²; 4.8 million people at risk), and most of the high-risk areas are located in the Provinces of Bengo, Kwanza Norte, Uige and Zaire.
In western Africa, the most affected endemic areas are categorized at moderate risk and they are localized in coastal Guinea and central Côte d'Ivoire (Figure 4). Areas at lower risk fringe the main foci, but they are also found in other zones such as southern Guinea and southern Nigeria.

Trypanosoma brucei rhodesiense
Rhodesian sleeping sickness is estimated to threaten a total of 12.3 million people in eastern and southern Africa (Table 5). This population is distributed over a surface of 171 thousand km² (Table 3 and Figure 5). Of the total population at risk of T. b. rhodesiense, a minor proportion (≈1.4 million; 11.8%) live in areas classified at moderate risk or higher, whereas the rest (≈10.9 million; 88.2%) live in areas classified at low to very low risk.
In Uganda, Rhodesian HAT threatens a population of ≈7.9 million, and the risk area (29 thousand km²) stretches from the northern shores of Lake Victoria up to Lira District, north of Lake Kyoga. The areas in Uganda where risk is relatively higher (i.e. 'moderate') broadly correspond to the districts of Soroti, Kaberamaido and northwestern Iganga.
Because of a comparatively lower human population density, some areas in the United Republic of Tanzania are estimated to be characterized by higher levels of risk than Uganda, despite fewer reported cases of HAT. In particular, risk is estimated to be high in proximity to the Ugalla River Forest Reserve (Tabora Province). All of the other risk areas in the United Republic of Tanzania are also associated, in one way or another, with protected areas, most notably the Moyowosi Game Reserve and the nature reserves in the northeast of the country (i.e. Serengeti, Ngorongoro and Tarangire). Overall, ≈1.8 million people (66 thousand km²) are estimated to be at risk in this country.
In Kenya, HAT risk ranging from low to very low is localized in the western part of the country, adjacent to risk areas in neighbouring Uganda. Also, although no cases were reported from the Masai Mara National Reserve during the study period, part of its area is estimated to be at risk, as influenced by the risk observed in the neighbouring Serengeti National Park (United Republic of Tanzania). Interestingly, two cases have been reported recently (2012) in travellers visiting the Masai Mara [28].
Nature reserves also shape the patterns of HAT risk at the southernmost limit of T. b. rhodesiense distribution, most notably in Malawi, Zambia and Zimbabwe. In this region, the highest number of people at risk is found in Malawi (≈0.9 million people), where risk is associated with the wildlife reserves of Vwaza Marsh, Nkota-Kota, and the Kasungu National Park. In Zambia (≈0.4 million people at risk), risk areas are scattered across the country, predominantly in the east and most notably around the North and South Luangwa National Parks. In Zimbabwe, an area of 7.8 thousand km² is estimated to be at risk (94 thousand people). This risk zone is associated with the Mana Pools National Park and Lake Kariba.

Discussion
Approximately 70 million people (1.55 million km²) are estimated to be at various levels of HAT risk in Africa. This corresponds to 10% of the total population and 7.4% of the total area of the endemic countries. This figure is not far from estimates made by WHO over the last thirty years (78.5 million in 1985 [5] and 60.8 million in 1995 [1]). However, the meaning and interpretation of these various figures substantially differ, and it is unwarranted to make comparisons between the results of the present study and previous figures, especially if the goal is to explore trends. In the early 1980s, the only way to derive country- and continental-level estimates of people at risk of HAT was to collate heterogeneous information from the Ministries of Health of the affected countries [5]. A decade later, an attempt was made to update the estimates [1], but the degree of subjectivity in the methodology and the reliance on expert opinion remained high.
By contrast, the present methodology is quantitative, reproducible, based on evidence and provides a categorization of risk. The use of global human population layers [16] and the regular update of the Atlas of HAT [4] will enable regular and comparable updates to be made.
The presented maps of different HAT risk categories will help to plan the most appropriate site-specific strategies for control and surveillance, and they will contribute to ongoing efforts aimed at the sustainable elimination of sleeping sickness.
However, the reported incidence levels underpinning the different risk categories differ by orders of magnitude, so that a more accurate representation of HAT risk can be given by focusing on the different risk categories. For example, 21 million people (0.7 million km²) are estimated to live at 'moderate' to 'very high' risk of infection. These are the areas where the most intensive control measures need to be deployed. Low to very low risk categories account for ≈48 million people (0.8 million km²). In these areas, cost-effective and adapted measures must be applied for sustainable control.
From the methodological standpoint, assumptions affect all estimates of disease risk, including those presented in this paper. One important assumption in the proposed methodology is that it is possible to use the same approach based on human cases of trypanosomiasis to estimate risk of both forms of sleeping sickness. This assumption met the primary goal of generating continental risk estimates in a consistent fashion. However, especially for T. b. rhodesiense, different approaches could be explored, explicitly addressing the pronounced zoonotic dimension of this form of the disease.
Another important choice in the proposed methodology is that of the 30 km bandwidth, i.e. the distance from affected locations beyond which disease intensity is considered zero. Sensitivity analysis conducted for six central African countries showed that there is a positive linear relationship between bandwidth on the one hand, and the extent of risk areas and the at-risk population on the other [29,30]. However, the categories at higher risk were shown to be the least affected by bandwidth. Therefore, as a rule, increasing the bandwidth would inflate the low-risk categories, but it would have a more limited effect on the delineation of areas at higher risk. The estimates presented here also rest on the assumption of isotropy for the risk function. In the future, anisotropy may be explored in an effort to account for the linear nature of some important landscape features such as rivers or roads.
When interpreting the presented risk estimates it is important to acknowledge the uncertainty inherent in the human population datasets used as denominator [31]. Also, it has to be borne in mind that no attempt was made to model HAT under-detection and under-reporting, which, despite recent progress in surveillance [12], are still known to occur. HAT under-detection can occur both in areas covered by active or passive surveillance and in areas that, because of remoteness or insecurity, are off the radar of health care services, and are therefore sometimes referred to as 'blind spots'. These two types of under-detection are expected to have different effects on risk estimation and mapping. The former is likely to impinge mainly on the level of risk, with a limited effect on the delineation of risk areas and on the estimates of the total population at risk. By contrast, if under-detection occurs in zones where no surveillance is in place, a few areas at risk will fail to be captured and mapped, which is bound to result in underestimation of the total population at risk. In the proposed risk mapping methodology, the latter areas would have been included in the 'marginal' risk category. Efforts should be made to identify and accurately delineate these hypothetical transmission zones, finding adaptive strategies to cope with the constraints of remoteness and insecurity that affect them. Knowing the true epidemiological status of these areas has vast implications not only for risk estimation but most crucially for the prospects of HAT elimination.
For the chronic T. b. gambiense infection [32], under-detection can be addressed by continuous passive case detection and regular active screening surveys. The fact that we took into consideration ten-year data on disease occurrence and control activities should contribute to the robustness of the T. b. gambiense risk estimates. However, in the case of T. b. rhodesiense, due to the acuteness and rapid progression of infection, under-detection poses more serious challenges. Although attempts were made to model under-detection for T. b. rhodesiense [33], both data and methodological constraints prevent these methods from being applied at the continental level. In the future, methodologies should be developed to estimate and map the coverage of active and passive surveillance. These would provide valuable information complementing risk maps, whilst also assisting in optimizing field interventions.
The temporal dimension is also crucial when interpreting risk maps. The proposed estimates were based on an average of HAT reported cases for a ten-year period. No weighting for the different reporting years was applied, despite the fact that a reduction in reported cases was observed during the last years of the study period. As a result, all cases contributed equally regardless of when exactly they were reported.
Importantly, the estimates of people at risk presented in this paper, being based on reported cases, cannot account for the possible future spread of HAT, and the risk thereof, into presently unaffected areas. Other approaches to risk modelling could focus on predicting the future risk of sleeping sickness, addressing the environmental suitability for HAT rather than its present occupancy. To this end, the relationships are to be explored between HAT occurrence and a range of factors, including human and livestock population movements [34], environmental, climatic and socio-economic variables, as well as disease and vector control. The potential of this type of model has been investigated in a few local contexts, for example in southeastern Uganda for T. b. rhodesiense [35,36,37,38], and coastal Guinea for T. b. gambiense [25]. Recent attempts have also tried to address risk forecasts at the regional level in relation to climate change [39]. The potential of various modelling frameworks could be explored for modelling the future risk of HAT [40,41]. The growing range of spatially explicit environmental datasets [42] and increased computational power enable these models to be applied even across large geographical areas. Interpretation of model outputs will probably be the most serious challenge. In fact, incompleteness and biases in the real-world epidemiological records often blur the line between concepts such as the theoretical fundamental niche of a pathogen and its realized niche.
Where estimates of prevalence are available, most notably in T. b. gambiense areas, model-based geostatistics could also be applied, which utilize Bayesian methods of statistical inference and enable rigorous assessment of uncertainty [11]. Their potential for, and applicability to, a low-prevalence, focal disease such as HAT would be interesting to explore.

Supporting Information
Map S1. Maps of the distribution of the population at risk of human African trypanosomiasis in 21 disease-endemic countries where any level of risk was identified during the period 2000-2009. Countries are organized in geographical order, west to east and north to south, and from T. b. gambiense to T. b. rhodesiense endemic countries: Guinea, Sierra Leone, Côte d'Ivoire, Nigeria, Cameroon, Chad, Central African Republic, South Sudan, Equatorial Guinea, Gabon, Congo, The Democratic Republic of the Congo, Angola, Uganda, Kenya, United Republic of Tanzania, Burundi, Zambia, Malawi, Mozambique and Zimbabwe. (PDF)