Spatio-Temporal Analysis of Smear-Positive Tuberculosis in the Sidama Zone, Southern Ethiopia

Background: Tuberculosis (TB) is a disease of public health concern, with a varying distribution across settings depending on socio-economic status, HIV burden, and the availability and performance of the health system. Ethiopia is a country with a high burden of TB, with regional variations in TB case notification rates (CNRs). However, TB program reports are often compiled and reported at higher administrative units that do not show the burden at lower units, so there is limited information about the spatial distribution of the disease. We therefore aimed to assess the spatial distribution and the presence of spatio-temporal clustering of the disease in different geographic settings over 10 years in the Sidama Zone in southern Ethiopia.
Methods: Retrospective spatial and space-time analyses were carried out at the kebele level (the lowest administrative unit within a district) to identify spatial and space-time clusters of smear-positive pulmonary TB (PTB). Scan statistics, Global Moran's I, and Getis-Ord (Gi*) statistics were used to analyze the spatial distribution and clusters of the disease across settings.
Results: A total of 22,545 smear-positive PTB cases notified over 10 years were used for spatial analysis. In a purely spatial analysis, we identified the most likely cluster of smear-positive PTB in 192 kebeles in eight districts (RR = 2, p<0.001), with 12,155 observed and 8,668 expected cases. The Gi* statistic also identified the clusters in the same areas, and the spatial clusters showed stability in most areas in each year during the study period. The space-time analysis also detected the most likely cluster in 193 kebeles in the same eight districts (RR = 1.92, p<0.001), with 7,584 observed and 4,738 expected cases in 2003-2012.
Conclusion: The study found variations in CNRs and significant spatio-temporal clusters of smear-positive PTB in the Sidama Zone. The findings can be used to guide TB control programs to devise effective TB control strategies for the geographic areas characterized by the highest CNRs. Further studies based on individual-level locations and case investigation are required to understand the factors associated with clustering.


Introduction
Tuberculosis (TB) is an infectious disease that affects and claims the lives of millions, with developing countries hit hardest [1]. The magnitude of the problem varies across settings, possibly due to unfavorable socio-economic conditions, overcrowding, poverty, poor access to health services, socio-cultural barriers and HIV infection [1][2][3][4][5][6].
Increasing evidence about disease distribution is being generated using Geographic Information Systems (GIS) and scan statistics to analyze and detect spatial and spatio-temporal variations and clustering of diseases [7]. Various studies have reported spatial [8][9][10][11][12][13] and spatio-temporal clustering [12][14][15][16] of TB, thereby generating important information about the distribution of the disease, its transmission pattern and risk factors, and the evaluation of intervention efforts [11][12][13][14][15][16][17][18][19][20][21]. However, most studies were conducted in urban settings over a short period of time, which limits their ability to detect the pattern of disease distribution in predominantly rural areas.
TB reports from Ethiopia show variations in trends and case notification rates by region [22], although little is known about whether these variations reflect the spatial and spatio-temporal pattern of the disease. Moreover, the reports are based on basic management units (BMUs). BMU reports may include cases from outside the administrative catchment, or miss cases from the catchment who enrolled in neighboring health facilities, which could cause over- or underreporting. This is because patients may cross administrative boundaries to seek health services, depending on access, quality of care and patient preference. The national TB prevalence survey of Ethiopia reported a lower smear-positive TB prevalence (108 per 100,000 people) than WHO estimates [23]; however, the report did not show the spatial distribution and burden of the disease within and between the lower administrative units.
Consequently, the national TB program implements similar interventions across settings regardless of the burden of the disease in the community, which could be due to a lack of evidence on the distribution pattern of the disease in different settings. Furthermore, information about the spatial distribution of the disease is limited, with the exception of a single publication reporting spatio-temporal variation in the northern part of the country [16]. Understanding the spatial pattern and spatio-temporal variations of the disease in wider geographic settings, including both urban and rural areas, may help policy and decision-making in resource-constrained settings such as Ethiopia. We therefore aimed to assess the spatial distribution and look for spatio-temporal clustering of the disease over the past 10 years in the Sidama Zone in southern Ethiopia.
administrations with a population of over 3.4 million and covers a geographic area of 6,982 square kilometers [24], while 92% of the population live in rural areas (Fig 1). There were 563 kebeles, which are the smallest administrative units within districts, with a population of 5,000 on average. In the study area, there were 39 urban and 524 rural kebeles.

Data collection and analysis
Data were collected from August 2012 to February 2013. We collected the data from unit TB registers at all health facilities that provided Directly Observed Treatment, Short-course (DOTS) services from 2003 to 2012, and matched individual cases to their place of residence using codes given by the Central Statistical Agency of Ethiopia (CSA). Addresses with similar names but in different locations were also identified and linked to their true location using location codes. The data were collected by university graduates, after four days of practical training, using a pretested semi-structured questionnaire. The data were double entered and checked by the principal investigator (PI) and health management information system (HMIS) experts. In addition, the data were checked by year, district and health facility against unit TB registers for consistency and completeness throughout the data collection process.
The data were entered into Microsoft Access, and the analysis was done using SPSS 20. The number of cases and the patient information entered for each year were checked page by page, and by year of treatment, against the information in the TB registers. We carried out an exploratory analysis, looked for errors and corrected them; the errors corrected were duplicated cases and incomplete or missing information. Smear-positive PTB CNRs were computed for each district and kebele by dividing the number of TB cases reported by health facilities by the population in a given year and multiplying by 100,000 [1]; the total population of each kebele and district was used as the denominator for the respective CNRs. Patient records were anonymized and de-identified prior to analysis.
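The CNR calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the kebele labels and the counts are hypothetical and are not study data.

```python
def case_notification_rate(cases, population, per=100_000):
    """Return notified cases per `per` people for one area and one year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * per


# Hypothetical records: (kebele label, notified smear-positive PTB cases, population)
kebeles = [
    ("kebele_A", 6, 5000),
    ("kebele_B", 2, 4800),
    ("kebele_C", 11, 5200),
]

# CNR per 100,000 people for each kebele
cnr_by_kebele = {
    name: case_notification_rate(cases, pop) for name, cases, pop in kebeles
}

# A district-level CNR pools cases and population before dividing,
# rather than averaging the kebele-level rates.
district_cnr = case_notification_rate(
    sum(c for _, c, _ in kebeles), sum(p for _, _, p in kebeles)
)
```

Pooling numerators and denominators at the district level keeps large and small kebeles weighted by their populations.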
Kebele centroids, the geographically weighted central locations, were used as coordinates. We also prepared an attribute table containing the population of each kebele, the number of cases, the case notification rate and the coordinates, and joined the variables of interest in ArcGIS 10.1. The projection of the coordinates was defined as World Geodetic System (WGS) 1984, Universal Transverse Mercator (UTM) Zone 37N. All TB cases were geocoded and matched to the kebele-level polygon and point layers using ArcGIS 10.1. The mean and yearly CNRs of TB in the districts and kebeles were computed. Spatial empirical Bayes smoothing (SEBS) was also applied in GeoDa to overcome the variance instability of rates in small areas, which arises from differences in population size and from the small number of cases in some areas [25].
In the SEBS technique, the prior distribution used to correct for variance instability is localized: it is based on a locally varying reference mean and variance. We used the number of smear-positive PTB cases as the event (numerator) variable and the population of each location as the base (denominator) variable. A queen weights matrix (which defines a location's neighbors as those sharing either a border or a vertex) was used for the spatial weights [25]. We used box plots and a comparison of raw and smoothed rates to assess the sensitivity of the smoothing. Furthermore, we mapped the annualized and the smoothed rates [7] to explore the pattern of the disease distribution.
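To make the idea of a locally varying reference mean and variance concrete, the following Python sketch shrinks each raw rate toward the pooled rate of its neighborhood. It assumes the common method-of-moments form of the spatial empirical Bayes estimator; GeoDa's implementation may differ in detail, and the function name, neighbor lists and numbers are hypothetical.

```python
def spatial_eb_smooth(cases, pops, neighbors):
    """Shrink each raw rate toward the mean rate of its own neighborhood.

    cases[i], pops[i]: events and population at risk in location i.
    neighbors[i]: indices adjacent to i (e.g. from queen contiguity).
    """
    smoothed = []
    for i in range(len(cases)):
        window = [i] + list(neighbors[i])          # location plus its neighbors
        win_cases = sum(cases[j] for j in window)
        win_pop = sum(pops[j] for j in window)
        mu = win_cases / win_pop                   # local reference mean

        # Population-weighted spread of raw rates around the local mean,
        # minus the part expected from Poisson sampling noise alone.
        var = sum(pops[j] * (cases[j] / pops[j] - mu) ** 2 for j in window) / win_pop
        var -= mu / (win_pop / len(window))
        var = max(var, 0.0)                        # a variance cannot be negative

        raw = cases[i] / pops[i]
        denom = var + mu / pops[i]
        weight = var / denom if denom > 0 else 0.0  # small populations get low weight
        smoothed.append(weight * raw + (1 - weight) * mu)
    return smoothed


# Hypothetical three-kebele chain: 0 - 1 - 2
example_cases = [0, 10, 12]
example_pops = [500, 5000, 5000]
example_neighbors = [[1], [0, 2], [1]]
example_smoothed = spatial_eb_smooth(example_cases, example_pops, example_neighbors)
```

The smaller the population of a location, the lower the weight on its own raw rate, which is why the smoothing mainly affects small kebeles with few cases.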

Spatial autocorrelation analysis
We applied the Global Moran's I statistic in ArcGIS 10.1 to investigate the spatial autocorrelation and distribution pattern of smear-positive PTB in the study area. We also employed the local Getis-Ord Gi* statistic to examine local clusters and to determine the locations of clusters or hot spots. The Gi* statistic looks at each feature within the context of its neighboring features. The local sum for a feature and its neighbors is compared proportionally to the sum of all features. When the local sum differs from the expected local sum by more than random chance would explain, a statistically significant Z-score results [7,26]. We used the mean rate of smear-positive PTB as the input field.
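For readers unfamiliar with the global test, the sketch below computes Global Moran's I from a list of rates and a spatial weights matrix in plain Python. It uses the textbook definition and is not the ArcGIS implementation; the four-location chain and its values are hypothetical.

```python
def global_morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)            # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical chain of four locations: 0 - 1 - 2 - 3 (binary adjacency)
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = global_morans_i([1, 1, 5, 5], chain)     # similar values adjacent
alternating = global_morans_i([1, 5, 1, 5], chain)   # dissimilar values adjacent
```

Positive values indicate that similar rates tend to sit next to each other, and negative values indicate a checkerboard-like pattern; significance testing is a separate step not shown here.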
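The equation for the Gi* statistic is [26]:

$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\, x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^{2} - \left(\sum_{j=1}^{n} w_{i,j}\right)^{2}}{n-1}}}$$

where $x_j$ is the attribute value (CNR) for feature $j$, $w_{i,j}$ is the spatial weight expressing the closeness between features $i$ and $j$, $n$ is the total number of features, and

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^{2}}{n} - \bar{X}^{2}}.$$

The $G_i^*$ statistic is therefore a Z-score.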
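As an illustration of the local statistic, the following Python sketch evaluates Gi* for every feature, assuming the standard Getis-Ord form with binary weights in which each feature counts as its own neighbor. The function name, the five-location chain and the rates are hypothetical.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* Z-score of every feature.

    weights[i][j] is the spatial weight between i and j; for Gi* the
    diagonal weights[i][i] is non-zero, so a feature is its own neighbor.
    Assumes the values are not all identical and that no feature is a
    neighbor of every other feature (both would give a zero denominator).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        row = weights[i]
        w_sum = sum(row)
        w_sq_sum = sum(w * w for w in row)
        local_sum = sum(row[j] * values[j] for j in range(n))
        numerator = local_sum - mean * w_sum          # observed minus expected local sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical chain of five locations with self-weights on the diagonal
chain_with_self = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
rates = [10, 10, 10, 100, 100]          # illustrative CNRs
z_scores = getis_ord_gi_star(rates, chain_with_self)
hot_spots = [i for i, z in enumerate(z_scores) if z >= 1.96]
```

Features whose Z-score reaches 1.96 are the candidate hot spots at the 5% level.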
A computed Gi* value ≥ 1.96 with a P-value < 0.05 was considered to indicate a statistically significant high rate.

Spatial analysis
Kulldorff's scan statistic (SaTScan 9.2) [27] was used for the spatial and space-time analyses. Kulldorff's scan statistics are widely used for spatial and space-time cluster analysis of diseases in different settings [28,29]. The scan statistic carries out a cluster analysis, detects the size and location of clusters, computes the relative risk (RR) and provides a P-value using Monte Carlo simulation. We used the number of cases (smear-positive PTB cases aggregated at the kebele level), the population and the coordinates as input files, together with the discrete Poisson model, under the assumption that the number of cases at each location was Poisson distributed with a known population at risk. Scan circles of various sizes, including the default setting of the scan statistic, were used to identify the most likely spatial clusters of smear-positive PTB. The maximum spatial cluster size was set to the upper limit, 50% of the population at risk. The likelihood ratio was calculated for each window [27], and the most likely and secondary clusters were identified and reported when the P-value was less than 0.05. The results of the analyses are presented in tables and on maps to depict the locations where unusually high rates of the disease occurred.
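The logic of the purely spatial scan under the discrete Poisson model can be sketched as follows. This is a simplified illustration, not SaTScan itself: it grows circular windows around each centroid up to 50% of the population, scores each window with the Poisson log likelihood ratio, and keeps the best one. All function names, coordinates and counts are hypothetical, and covariate adjustment and secondary clusters are omitted.

```python
import math


def poisson_llr(observed, expected, total_cases):
    """Log likelihood ratio of a window under the discrete Poisson model."""
    if observed <= expected:
        return 0.0                       # only scanning for high rates
    llr = observed * math.log(observed / expected)
    if total_cases > observed:
        llr += (total_cases - observed) * math.log(
            (total_cases - observed) / (total_cases - expected)
        )
    return llr


def relative_risk(observed, expected, total_cases):
    """Risk inside the window divided by risk outside it."""
    return (observed / expected) / (
        (total_cases - observed) / (total_cases - expected)
    )


def most_likely_cluster(coords, cases, pops, max_pop_share=0.5):
    """Return (llr, member indices) of the highest-scoring circular window."""
    total_cases, total_pop = sum(cases), sum(pops)
    best = (0.0, [])
    for cx, cy in coords:
        # Visit locations in order of distance from the current centroid.
        order = sorted(
            range(len(coords)),
            key=lambda j: math.hypot(coords[j][0] - cx, coords[j][1] - cy),
        )
        c = p = 0
        members = []
        for j in order:
            c += cases[j]
            p += pops[j]
            if p > max_pop_share * total_pop:
                break                    # window would exceed the size limit
            members.append(j)
            expected = total_cases * p / total_pop
            llr = poisson_llr(c, expected, total_cases)
            if llr > best[0]:
                best = (llr, list(members))
    return best


# Hypothetical centroids (projected units), cases and populations
demo_coords = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
demo_cases = [30, 28, 10, 9, 11, 12]
demo_pops = [5000] * 6
demo_llr, demo_members = most_likely_cluster(demo_coords, demo_cases, demo_pops)
```

In practice the significance of the best window is then assessed by Monte Carlo simulation, because the maximum over many overlapping windows has no simple closed-form distribution.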

Space-time analysis
The space-time scan statistic applies a cylindrical window to search for potential clusters, in which the circular geographic base corresponds to space and the height corresponds to time [27]. Under the null hypothesis, the RR of smear-positive PTB is the same inside and outside the window. The Poisson probability model was used, in which the number of events in an area is Poisson distributed according to a known population at risk [27]. The geographic size of the window was limited to half the expected number of cases, and the time was limited to the total time period. Significance was tested by comparing the likelihood ratio test statistic against a null distribution computed by Monte Carlo simulation. The number of Monte Carlo replications was set to 999, and P<0.05 was considered statistically significant. In 2010, an intensive case finding campaign was conducted in nine districts of the study area, and since 2011 a community-based active case finding intervention has been implemented in all districts of the study area to increase TB case notification. We therefore carried out the space-time analysis for the whole period 2003-2012 and for the sub-periods 2009-2010 and 2011-2012 to investigate the transmission pattern and the presence of recent space-time clusters of the disease.
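The Monte Carlo step can be illustrated with a short Python sketch. It assumes the usual rank-based P-value, in which the observed statistic is ranked among the statistics of data sets simulated under the null hypothesis; the function names and inputs are hypothetical, and the scan itself is left out.

```python
import random


def simulate_null_cases(total_cases, pops, rng):
    """Scatter cases over locations with probability proportional to population."""
    counts = [0] * len(pops)
    for idx in rng.choices(range(len(pops)), weights=pops, k=total_cases):
        counts[idx] += 1
    return counts


def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank of the observed statistic among the simulated ones, as a P-value."""
    rank = 1 + sum(1 for s in simulated_stats if s >= observed_stat)
    return rank / (len(simulated_stats) + 1)


# Hypothetical usage with 999 replications and a placeholder test statistic
# (the largest single-location count stands in for the maximum likelihood ratio).
rng = random.Random(2012)
demo_pops = [5000, 4800, 5200, 5100]
replicates = [
    max(simulate_null_cases(40, demo_pops, rng)) for _ in range(999)
]
demo_p = monte_carlo_p_value(25, replicates)
```

With 999 replications the P-value is a multiple of 1/1000, so the observed statistic must rank among the top 50 values to be significant at the 5% level.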

Ethical clearance
We obtained ethical approval and clearance from the ethical review committee of the Public Health Research and Technology Transfer Support Process at the Regional Health Bureau of southern Ethiopia. We also obtained a letter of support from the Sidama Zone Department of Health to obtain information from all districts and health facilities. Personal identifiers of the cases were coded prior to analysis and medical records were kept in a secure place to help maintain the confidentiality of the clinical information of cases.

Results
A total of 37,333 cases were diagnosed and treated during the period from 2003 to 2012. Of these, 37,070 (99.3%) cases were from the study area, whereas 263 cases were from neighboring areas. Most cases (22,545; 61%) were smear-positive PTB, while 7,996 (22%) were smear-negative and 6,464 (17%) were extrapulmonary TB. We used a total of 22,545 smear-positive PTB cases for spatial analysis over 10 years. The mean age of smear-positive PTB cases was 29 years (SD = 14); of the 22,545 smear-positive PTB cases, 10,296 (46%) were women and 12,240 (54%) were men, giving a male to female ratio of 1.2:1. Ninety-five percent (21,302 cases) of the cases were new, and 5% (1,190 cases) were retreatment cases. Fifty-eight percent of the cases were from seven districts: Boricha, Dale, Shebedino, Wondo Genet, Chuko, Aleta wondo and Bensa, which together constituted 48% of the study area population (Table 1). Urban areas accounted for 11% of cases (2,448 cases), whereas the urban population was only 8% of the total population of the study area. Ninety-seven percent of the cases (21,793 cases) had a kebele address; the remaining 3% (752 cases) had no kebele address and were excluded from the spatial analysis.

Spatial autocorrelation analysis and spatial clustering of smear-positive PTB in the Sidama Zone
The Global Moran's I autocorrelation analysis showed that smear-positive PTB was significantly auto-correlated in each year (Table 3). In a purely spatial analysis, we identified a significant most likely cluster for a high occurrence of smear-positive PTB, which consisted of 192 locations in eight districts (Fig 5 and Table 4). The overall RR of the cluster was 2, with 12,155 observed cases notified during 2003-2012 compared with 8,668 expected cases. We found secondary clusters of smear-positive PTB in the Wondo Genet, Aroresa, Hula, Chire and Bensa districts during 2003-2012, and all of these locations were in urban settings (Fig 5). The districts where the most likely cluster was identified accounted for 60% of the cases reported during 2003-2012. We examined the pattern and stability of the spatial clusters in each year of the study period, and the clusters were stable in most districts except in 2010 and 2012 (Fig 6). The clusters were detected in the Shebedino, Dale, Aleta wondo, Dara, Hula, Boricha, Hawassa Zuriya, Wonsho, Loka Abaya and Chuko districts (from 2003-2009 and in 2011), the Dale district (in 2010) and the Chire district (on the southeastern border of the study area) in 2012. Furthermore, the most likely clusters were accompanied by secondary clusters during the study period, and secondary spatial clusters were detected in all years except 2003 (Fig 6 and Table 5). The Gi* statistic also identified local clusters of smear-positive PTB in the same areas identified by the scan statistic, except for differences in a few locations (Fig 7).

Space-time clustering
In a space-time cluster analysis of smear-positive PTB during 2003-2012, we found the most likely cluster at 193 locations in eight districts (RR = 1.92, p<0.001), with 7,584 observed and 4,738 expected cases (Fig 8 and Table 6). The locations of the space-time clusters were the same as the locations in which the purely spatial clusters were detected, except for the secondary clusters. We looked into the pattern of recent space-time clusters in sub-time-phases from   Table 6).

Discussion
We found spatial and spatio-temporal clusters and variations in the distribution of smear-positive PTB in the northwestern and east central districts of the study area. These clusters were stable over the years, with the exception of differences in a few locations. This could be explained by high transmission over many years due to a disproportionate concentration of risk factors and to varying program performance. In our findings, most cluster locations were in urban areas, in rural areas with a high population density, and in areas close to towns or near the road networks that connect major towns. Various studies have reported that poor socioeconomic conditions such as social inequality, low income, poverty, poor housing conditions, overcrowding and social unrest could all be risk factors for the high burden and variations of disease occurrence [2,3,5,6,15,20,[30][31][32][33]. In addition, patient care factors [34] and poor access to health care and TB control services could also contribute to a high rate of the disease [35], since infectious cases may remain undiagnosed and untreated, which could in turn contribute to the transmission dynamics of the disease. Furthermore, most urban kebeles in which clusters were identified were district capitals, had marketplaces and public transportation routes, and were hubs for different socio-economic activities. Better road access and travel by public transportation in a crowded and poorly ventilated environment may facilitate contact with infectious cases, which could favor transmission of the disease in the locations where clusters were detected [36][37][38].
Moreover, TB can be associated with HIV and with other risk factors such as decreased immunity [39,40]; however, we could not compare the clustering of smear-positive TB with the geographic distribution of HIV prevalence because data on the spatial distribution of HIV were not available. The inclusion of these factors in future cluster analyses may therefore improve our understanding of their effect on the clusters of the disease in the study area. The space-time statistic also identified clustering in the same locations detected by the purely spatial scan statistic and the Gi* statistic. Our finding agrees with other studies that report the existence of both spatial and space-time clusters in the same areas, which supports the evidence of an uneven distribution and burden of the disease [8,9,13,15]. Likewise, both methods of local analysis of disease clustering (Kulldorff's scan statistic and the Gi* statistic) identified unusually high rates of smear-positive PTB in the same districts, with the exception of a few differences in the number of locations. These methods (the spatial and space-time scan statistics and the Gi* statistic) can therefore be useful and robust tools for detecting areas of unusually high disease occurrence. Moreover, cluster analysis and mapping of the disease distribution can add value beyond what can be obtained by presenting disease rates in a table, because a cluster analysis helps identify areas with unusually high disease rates that are unlikely to have occurred by chance [7].
A study from Mexico reported that improving TB control efforts could help reduce transmission and change the geographic distribution of TB [17]. In our study, despite different intervention programs aimed at reducing disease transmission and improving case detection over many years, the unusually high rates of the disease persisted in the same places, and the most likely spatial clusters showed a stable pattern throughout the study period except in 2010 and 2012. This could indicate, at least in part, that the interventions were not properly focused on influencing the disease epidemiology, or it could be due to low case finding and poor treatment of infectious cases, which may indicate continued transmission of the disease. In 2008, a new secondary spatial cluster was identified along the southwestern border of the study area. This area is an urban setting with neighboring kebeles; it has a marketplace, better road and public transportation access, a high population density and a lower altitude than other areas in the southwest, which could contribute to a higher CNR. In 2010, there was a change in the distribution pattern of the clusters, possibly because of the intensive case finding campaign conducted in nine districts of the study area. Since 2011, a community-based TB case finding intervention has been implemented in all districts, and the intervention has improved access to TB care and increased the CNR in the study area [41]. In 2012, after one year of implementation of the intervention, the most likely cluster shifted to the southeastern border (the Chire district) of the study area; nonetheless, the secondary clusters persisted in locations where the most likely clusters had been detected in the preceding years. This could be because access to and utilization of health services had previously been limited and the intervention increased access to the services, as reflected by the increased case finding. This implies that devising future interventions focused on the disease burden could be effective in tackling transmission in the areas where the clusters were detected. On the other hand, improved surveillance and improved access to and utilization of TB control services may increase TB case notification rates, or could contribute to a decline in disease transmission, which might also affect the notification rates. Further investigation is therefore needed to better understand the role of health service access in the case notifications and clustering of smear-positive PTB in the study area.
Conversely, the eastern, east central and southern parts of the study area had lower smear-positive PTB rates than the west central and northern parts. This is possibly due to poor access to TB control services, or possibly to a lower burden of the disease. Studies have shown that environmental factors such as altitude are also correlated with the incidence of tuberculosis [42,43], and further analysis is required to understand the factors that could contribute to the lower notification rates, such as access to and availability of TB control services, as well as environmental and socio-economic factors.
It has been suggested that "one-size-fits-all" interventions may not be equally effective in areas with different levels of disease occurrence, because areas with high disease rates that are unlikely to have occurred by chance may need more attention when targeting interventions than areas with a low risk. Therefore, our findings suggest that policymakers and health authorities should identify and prioritize the areas that need focused interventions and strengthened TB surveillance.
One limitation of our study is that it was not a population-based survey, but was instead based on surveillance data of smear-positive PTB cohorts. Hence, cases that were not diagnosed or were diagnosed late, and smear-negative but culture-positive cases that were not captured by sputum-smear microscopy, could have been missed, leading to an underestimate of the CNRs. Some cases may not have been registered after diagnosis; however, this is likely to be rare because health facilities are required to link diagnosed TB cases to DOTS services and to report them to the next administrative level, and treatment is free of charge.
The strengths of our study were that the study was based on true CNRs since cases were linked to their home address; in addition, cases that were registered outside the study area (nearby areas) were also linked to their actual address in the study area. We used the kebele (small scale) as a unit of analysis and included urban and rural settings, which helped in improving our understanding of the spatial epidemiology of the disease in a wider geographic context. The long study period (10 years) enabled us to assess the spatial pattern and stability of clusters of the disease in the study area. Lastly, missing information in our data for the unit of analysis was only 3%, so the percentage was too small to affect our results.

Conclusion
We found spatio-temporal clusters and spatial variations of smear-positive PTB in the Sidama Zone. TB in the study area therefore did not occur uniformly across geographic settings and exhibited a non-random distribution. The findings can be used to guide TB control programs in devising effective TB control strategies for the geographic areas characterized by the highest CNRs. Further investigations based on individual-level locations are needed to identify the presence of localized spatial clustering and the causes of the unusually high rates in those areas, incorporating socioeconomic factors, type of TB strain and access to health services.