Optimizing community screening for tuberculosis: Spatial analysis of localized case finding from door-to-door screening for TB in an urban district of Ho Chi Minh City, Viet Nam

Background Tuberculosis (TB) is the deadliest infectious disease globally. Current case finding approaches may miss many people with TB or detect them too late. Data and methods This study was a retrospective, spatial analysis of routine TB surveillance and cadastral data in Go Vap district, Ho Chi Minh City. We geocoded TB notifications from 2011 to 2015 and calculated theoretical yields of simulated door-to-door screening in three concentric catchment areas (50m, 100m, 200m) and three notification window scenarios (one, two and four quarters) for each index case. We calculated average yields, compared them to published reference values and fit a GEE (Generalized Estimating Equation) linear regression model to the data. Results The sample included 3,046 TB patients. Adjusted theoretical yields in 50m, 100m and 200m catchment areas were 0.32% (95%CI: 0.27,0.37), 0.21% (95%CI: 0.14,0.29) and 0.17% (95%CI: 0.09,0.25), respectively, in the baseline notification window scenario. Theoretical yields in the 50m-catchment area for all notification window scenarios were significantly higher than a reference yield from literature. Yield was positively associated with treatment failure index cases (beta = 0.12, p = 0.001) and short-term inter-province migrants (beta = 0.06, p = 0.022), while greater distance to the DTU (beta = -0.02, p<0.001) was associated with lower yield. Conclusions This study is an example of inter-departmental collaboration and the application of repurposed cadastral data to progress towards the End TB objectives. The results from Go Vap suggest that spatial analysis may be able to identify areas where targeted active case finding in Viet Nam can help improve TB case detection.


Introduction
Tuberculosis (TB) is one of the most intractable public health challenges and a leading cause of avoidable deaths worldwide. In 2016, there were an estimated 10.4 million incident cases of TB and 1.7 million TB deaths worldwide. [1] Despite advances in treatment and prevention programs, TB incidence is declining by only 1.65% per annum. [2] At this rate, global TB elimination may only be achieved near the end of the 22nd century. [3] A major cause of the slow decline is the estimated 4 million people who develop TB each year but are missed by national TB control programs (NTPs). For this reason, early detection and improved diagnosis of TB are vital components of WHO's End TB Strategy. [4] burden in geographic proximity to index patients, possible cluster size of such TB hotspots or the number needed to screen to find a TB case within these disease clusters. [43,44] We similarly lack evidence on targeting proximal neighbor(hood) contacts for TB screening. [5,45,46] This opens an avenue to explore "catchment areas" as a means to quantify the number needed to screen (NNS) and the theoretical yield from targeted ACF. However, applying spatial restriction methods, such as catchment areas defined by concentric circles, will require the ability to quantify their potential effectiveness and to assess the associated resource implications. [47][48][49] The advent of GIS and mobile communications technology in low- and middle-income countries warrants applying them to screening activities. We conducted an exploratory analysis of spatial and temporal relations of notified TB cases in an urban district of Ho Chi Minh City, Viet Nam, to quantify the search parameters, i.e., the size of the area and the incubation time after notification, and to propose a pragmatic neighborhood contact screening strategy that could improve coverage while maintaining an acceptably high yield.

Study design & aims
This study was a retrospective cross-sectional, spatial analysis of routine TB surveillance and digital cadastral data. The aim of this study was to determine the existence and size of a catchment area around an index case, in which door-to-door screening could theoretically yield significantly more cases compared to population-wide screening. Our objectives were to assign individual GPS coordinates to all index cases in our sample and calculate the average theoretical yield from simulated door-to-door screening in three catchment area sizes, which we then compared to a published reference value. Lastly, we identified secondary index case parameters that were positively and negatively associated with theoretical yield.

Study setting
The study took place in Go Vap district, Ho Chi Minh City (HCMC). Go Vap is an urban district with a population of 650,000 people in an area of ~21km², for a population density of approximately 31,000 persons per km². Go Vap is therefore one of the most densely populated areas in HCMC, far exceeding the average densities of HCMC (4,020 persons/km²) and Viet Nam (290 persons/km²) overall. In 2014, 768 TB cases were notified in Go Vap, for a notification rate of 136.7 per 100,000 people, compared to 188.3 in HCMC and 112.8 in Viet Nam. [50]

Data sources & processing
The Center for Applied GIS of Ho Chi Minh City within the HCMC Department of Science and Technology maintains a GIS database of the city with detailed geospatial vector data (shapefiles) of real estate property lots. [51] In this study, we repurposed this database, which typically informs urban planning and construction projects, for public health use. We included all drug-sensitive and drug-resistant TB cases notified in Go Vap between 1 November 2011 and 30 November 2015 with a recorded address in the district. These patients served as index cases throughout the analysis. We excluded patients re-enrolled immediately, i.e., within one month after a recorded treatment failure. We geocoded patient addresses automatically using a matching algorithm and manually geocoded addresses that did not produce an exact match. We excluded patients for whom both automated and manual geocoding were unsuccessful. Using GIS software, we then calculated concentric catchment areas around each patient's residence with radii of 50m, 100m and 200m. We selected these discrete radii because they corresponded to the time (a week, a month and a quarter, respectively) an outreach worker needed to screen all households, based on prior experience. [52] A catchment area with a radius of 50m included on average 74.5 (IQR: 49-95) households, while areas of 100m and 200m included an average of 250.5 (IQR: 184-311) and 873.9 (IQR: 696-1062) households, respectively.
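The catchment-area step can be illustrated without a full GIS stack. The minimal Python sketch below is not the study's workflow: it assumes coordinates already projected to metres and approximates each property lot by a hypothetical axis-aligned bounding box (`Lot`), counting a lot when any part of that box falls inside the circle around the index residence. With real shapefile polygons, the same test would be run on the actual geometries, for example with Shapely's `Point(x, y).buffer(r)` and `intersects`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Lot:
    """Hypothetical property lot, approximated by its bounding box in projected metres."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def lot_touches_circle(lot: Lot, cx: float, cy: float, radius: float) -> bool:
    """True if any part of the lot's bounding box lies within `radius` metres of (cx, cy)."""
    # Clamp the circle centre into the box to find the box's nearest point to the centre.
    nearest_x = min(max(cx, lot.xmin), lot.xmax)
    nearest_y = min(max(cy, lot.ymin), lot.ymax)
    return math.hypot(cx - nearest_x, cy - nearest_y) <= radius


def lots_per_catchment(lots, cx, cy, radii=(50, 100, 200)):
    """Count lots touching each concentric catchment area around an index residence."""
    return {r: sum(lot_touches_circle(lot, cx, cy, r) for lot in lots) for r in radii}


# Illustrative lots around an index residence at the origin (not study data).
example_lots = [
    Lot(10, 10, 20, 20),      # nearest point ~14 m away
    Lot(60, 0, 70, 10),       # nearest point 60 m away
    Lot(45, 45, 55, 55),      # nearest point ~64 m away
    Lot(150, 0, 160, 10),     # nearest point 150 m away
    Lot(300, 300, 310, 310),  # nearest point ~424 m away
]
print(lots_per_catchment(example_lots, 0.0, 0.0))  # {50: 1, 100: 3, 200: 4}
```

Because the circles are concentric, each lot counted at 50m is also counted at 100m and 200m, so the counts are non-decreasing with radius.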
For each index case, we counted the number of TB cases who resided inside each of the three catchment areas and who were notified after the index case within three predefined notification windows (one, two and four quarters). We used the total number of real estate property lots as a proxy for households in each catchment area. We counted a property lot as lying within a catchment area if any part of its boundary vectors fell inside the area. We multiplied the number of property lots by the national average urban household size to estimate the population in each catchment area. [53] We calculated the theoretical yield from door-to-door screening in these areas as the number of notified TB cases divided by the estimated number of residents in each catchment area (S1-S4 Figs).
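The yield calculation for one index case, one radius and one notification window might look like the sketch below. The household size and the 91-day approximation of a quarter are placeholders chosen for illustration only (the study's household-size value from [53] is not reproduced here), and `theoretical_yield` is a hypothetical helper name. `n_lots` is assumed to be the number of property lots counted for the same radius.

```python
import math
from datetime import date

# Placeholder household size for illustration only; the study used the national
# average urban household size from [53], which is not reproduced here.
ASSUMED_HOUSEHOLD_SIZE = 3.5
# Assumption: a "quarter" is approximated as 91 days after the index notification.
DAYS_PER_QUARTER = 91


def theoretical_yield(index_xy, index_date, other_cases, n_lots, radius_m, quarters,
                      household_size=ASSUMED_HOUSEHOLD_SIZE):
    """Secondary notifications in the catchment area divided by its estimated population.

    other_cases: iterable of (x, y, notification_date), coordinates in projected metres.
    """
    ix, iy = index_xy
    window_days = quarters * DAYS_PER_QUARTER
    secondary = 0
    for x, y, notified_on in other_cases:
        lag_days = (notified_on - index_date).days
        # Count only cases notified after the index case and inside the window...
        in_window = 0 < lag_days <= window_days
        # ...whose residence lies inside the circular catchment area.
        in_area = math.hypot(x - ix, y - iy) <= radius_m
        if in_window and in_area:
            secondary += 1
    estimated_residents = n_lots * household_size
    return secondary / estimated_residents


# Illustrative notifications around an index case at the origin (not study data).
cases = [
    (10, 0, date(2013, 2, 1)),   # 10 m away, 31 days after the index case
    (80, 0, date(2013, 3, 1)),   # 80 m away, 59 days after
    (30, 0, date(2013, 6, 1)),   # 30 m away, 151 days after
    (0, 5, date(2012, 12, 1)),   # notified before the index case: ignored
]
```

For an index case notified on 1 January 2013 with 100 lots in its 50m catchment area, `theoretical_yield((0, 0), date(2013, 1, 1), cases, n_lots=100, radius_m=50, quarters=1)` counts one secondary case among 350 assumed residents, a yield of about 0.29%.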

Data analysis
We calculated summary statistics for the counts of notifications and households, and descriptive statistics for the patient covariates in the sample. The dataset consisted of multidimensional panel data with nine repeated measures for each index case (three catchment areas × three notification windows). The response variable, theoretical yield for each index case and catchment area-notification window combination, showed a semi-continuous distribution resembling a negative binomial distribution. We used generalized estimating equation (GEE) methods to adjust standard errors for non-normality and within-subject correlation of the repeated measures. [54,55] We chose GEE methods over mixed effects models because of the time-invariant nature of the study and its parameters, the large sample with few repeated measures per subject and the population-level nature of the response variable. [56] Given the continuous nature of the response and the absence of missing data in the primary exposure and response variables, the analyses were not exposed to typical GEE limitations. [57] We used univariate regression to describe the association of theoretical yield (primary response) with catchment area size and notification window (primary exposures), and with secondary covariates. Age and gender were included in the multivariate model regardless of significance; other secondary covariates were included if they had a p-value of less than 0.2 in the univariate GEE regression model. The model accounted for interaction between the two primary exposure variables. Based on the Quasi-likelihood Information Criterion (QIC), we used an exchangeable correlation structure and a Gaussian variance function as model specifications and calculated localized, theoretical detection yields from the model's coefficients. [58,59] We expressed these theoretical TB detection yields in terms of the number needed to screen (NNS), calculated as the inverse of the theoretical yield.
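Two of the rules in this paragraph are simple enough to sketch directly: the covariate screening rule (age and gender forced in, others kept at univariate p < 0.2) and the conversion of yield to NNS. The helper names and p-values below are hypothetical. The GEE fit itself is not sketched here; in Python, for example, statsmodels' `GEE.from_formula` accepts an `Exchangeable` covariance structure and a `Gaussian` family, but this is an illustration rather than the software used in the study.

```python
def select_covariates(univariate_p, forced=("age", "gender"), threshold=0.2):
    """Forced covariates plus any other covariate with univariate p < threshold."""
    selected = list(forced)
    for name, p_value in univariate_p.items():
        if name not in forced and p_value < threshold:
            selected.append(name)
    return selected


def number_needed_to_screen(yield_proportion):
    """NNS as the inverse of theoretical yield (yield given as a proportion, not %)."""
    if yield_proportion <= 0:
        raise ValueError("NNS is undefined when the yield is zero")
    return 1.0 / yield_proportion


# Hypothetical univariate p-values, for illustration only.
example_p = {"age": 0.64, "gender": 0.41, "hiv": 0.35, "migrant": 0.01, "distance_km": 0.08}
print(select_covariates(example_p))                 # ['age', 'gender', 'migrant', 'distance_km']
print(f"{number_needed_to_screen(0.0032):.1f}")     # 312.5
```

Applied to the baseline yields reported in the abstract, 0.32%, 0.21% and 0.17% correspond to NNS values of roughly 313, 476 and 588 for the 50m, 100m and 200m catchment areas.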

Ethical considerations
We obtained written permission for analysis of the TB patient data from the Go Vap District Preventive Health Center, the administrative authority of the District TB Unit and legal owner of the data. The ethics committee of the Ho Chi Minh City Provincial HIV/AIDS Committee provided ethical approval for this study.

Results
In the period from 2011 to 2015, the Go Vap District TB Unit (DTU) notified 3,133 people with TB. Among these, 79 people were retreatment cases who enrolled immediately after treatment failure and did not meet the inclusion criteria. We geocoded 2,513 (82%) cases automatically and 533 (18%) manually by locating them at the nearest main street and primary alley. We were unable to geocode 8 addresses, so the final sample included 3,046 people (97% of all notified and >99% of eligible people). Table 1 shows descriptive statistics of the sample. About one-third (n = 976) of notified cases were female, and the median age was 40 years (IQR: 28-52). People with TB/HIV coinfection comprised 6% (198) of cases, while 5% (145) reported comorbid diabetes. The majority of the sample was unemployed (42%, 1,275) or employed as unskilled or semi-skilled labor (44%, 1,320). Temporary residents comprised 29% (869) of the sample; the majority of these (675) were short-term, inter-province migrants, defined as persons whose household registration is in a district or province different from the one in which they currently reside. The median distance to the DTU was 2.8km (IQR: 1.5-3.5). Smear-positive cases comprised 54% (1,631) of the sample, among whom 2% (39) were diagnosed with MDR-TB, while about a quarter of the cases were extra-pulmonary TB.
Table 1 notes: ¥ Individual parameters may include missing data, which were excluded from the regression analysis. ¶ Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome. § AFB(+) = sputum smear positive; AFB(−) = sputum smear negative; EP = extra-pulmonary TB. # LTFU = loss to follow-up. ┼ Inbound transfers and referrals with prior uncertain exposure to anti-TB drugs.
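The participant flow in this paragraph can be checked with simple arithmetic. The sketch uses only the counts reported above; the variable names are illustrative.

```python
notified = 3133
excluded_retreatment = 79      # re-enrolled immediately after treatment failure
not_geocoded = 8
geocoded_auto, geocoded_manual = 2513, 533

eligible = notified - excluded_retreatment
final_sample = eligible - not_geocoded
# Both routes to the final sample must agree.
assert final_sample == geocoded_auto + geocoded_manual == 3046

print(f"share of eligible: {final_sample / eligible:.1%}")            # 99.7%
print(f"share of notified: {final_sample / notified:.1%}")            # 97.2%
print(f"geocoded automatically: {geocoded_auto / final_sample:.1%}")  # 82.5%
```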
In addition to the strong association between theoretical yield and the two primary exposures, catchment area size and notification window, results from the fitted GEE linear regression model in Table 3 showed associations between theoretical yield and other index patient characteristics. The model showed a significant negative association between theoretical yield and distance to the DTU (beta = -0.02, p<0.001), and higher theoretical yield among short-term inter-province migrants (beta = 0.06, p = 0.022) and among people whose treatment had failed (beta = 0.12, p = 0.001).

Discussion
This was an exploratory study combining routine TB surveillance and urban planning data to explore opportunities to optimize active case finding (ACF) for TB. We found no other studies that have attempted this enumeration at the level of the index patient household or individual catchment area. ACF will clearly be needed to reach the people with TB who are currently missed by the facility-based case finding used by NTPs, whose reach is limited. [1,4,7,9,60] However, ACF is inherently more expensive, and indiscriminate screening is not productive. [61][62][63][64] Approaches that improve the yield of ACF interventions are therefore needed. We used estimated localized notification rates, i.e., TB cases notified over the total estimated population in a catchment area, to illustrate spatial heterogeneity and the existence of TB disease clusters in our sample, as an alternative to standard spatial autocorrelation methods, which may be included in further analyses. [65,66] In a review of ACF interventions, the weighted mean NNS for community and population-wide screening was 603, corresponding to a detection yield of 0.17%. [67] The theoretical yields in the 50m catchment area across all notification window scenarios were significantly higher than this reference (S5 Fig) but, on their own, are unlikely to merit community screening.
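One simple way to express the comparison with the review's reference value is to convert the reference NNS into a yield and check whether a scenario's 95% CI lies entirely above it. This criterion is an assumption made for illustration; the study's exact significance test is not described in this section. The baseline figures are those reported in the abstract.

```python
REFERENCE_NNS = 603                        # weighted mean NNS from the ACF review [67]
REFERENCE_YIELD_PCT = 100 / REFERENCE_NNS  # ~0.17%

# Baseline-scenario yields (%) as (point estimate, lower 95% CI, upper 95% CI).
baseline = {
    "50m": (0.32, 0.27, 0.37),
    "100m": (0.21, 0.14, 0.29),
    "200m": (0.17, 0.09, 0.25),
}


def ci_above_reference(lower_pct, reference_pct=REFERENCE_YIELD_PCT):
    """True if the whole 95% CI lies above the reference yield (assumed criterion)."""
    return lower_pct > reference_pct


for radius, (point, lower, upper) in baseline.items():
    print(radius, point, ci_above_reference(lower))
```

Under this criterion, only the 50m baseline interval lies wholly above the reference; the 100m and 200m intervals include it.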
Our results suggest that hotspots may be identified even in catchment areas with 100m and 200m radii in densely populated urban settings, if a sufficiently long notification window is permitted. An important consideration for the economic viability of this type of spatially restricted door-to-door screening is the identification and avoidance of zero-yield catchment areas. Our results showed that neighborhood contact screening in a 50m catchment area within one quarter of notification would have yielded no additional case for almost four-fifths of index patients. Targeting the right index cases and avoiding zero-yield catchment areas increased yield two- to three-fold in our study. Given the rudimentary nature of the routine surveillance data available for this study, we identified only three index patient covariates that were significantly associated with theoretical yield and may improve targeting of the right catchment areas. The first was treatment failure of the index patient, which was positively associated with yield, possibly because of prolonged transmission. Short-term, temporary residency status was also associated with non-zero community cases, which may reflect the propensity of economic migrants to reside in boarding homes and urban slum communities upon arrival in the city. [68][69][70][71][72] Consistent with other studies, temporary residency may be an appropriate indicator of a higher likelihood of finding a TB hotspot. [73][74][75][76][77] The third significant, albeit negatively associated, parameter was the index case's distance to the District TB Unit. This finding may relate to our use of routine notifications, which have been shown to be lower at greater distances from the TB treatment facility. [78,79] As such, this result may reflect localized under-detection rather than a true under-representation of TB patients.
In summary, the results of this study suggest that conducting door-to-door screening in a 50m radius around an index case with temporary residency status and history of treatment failure may be an economically viable strategy to expand coverage at acceptable case detection yields. In concordance with our results, studies have similarly evaluated and identified neighborhood contacts [80], and specifically those within 50m of an index case [43], to comprise a viable target population for intensified screening with productive yields.
While the government of Viet Nam passed legislation with the goal of reducing TB prevalence to 20 per 100,000 by 2030, the current prevalence and annual rate of reduction of 4.6% suggest that the country may miss this deadline by over three decades. [81,82] Implementing neighborhood screening in 50m catchment areas around retreatment and migrant index cases may be one rapidly implementable strategy to bend the curve. This strategy may also be applicable outside of Viet Nam in other high-burden countries with similar urbanization trends and sociocultural attributes. Studies have shown that proximal clustering of first-degree relatives and high degrees of social interaction in the immediate neighborhood and neighborhood establishments, e.g., bars, cafes and karaoke shops, are significant contributors to tuberculosis transmission, particularly in high-burden settings. [83][84][85][86] However, given the limited geographic scope and retrospective nature of the study, further research on this topic is warranted. A follow-up study could aim to prospectively validate the theoretical yields obtained from our analysis. Such a prospective study may employ rapid molecular diagnostics instead of smear microscopy to diagnose TB among neighborhood contacts, and genotypic fingerprinting to confirm the index case as the source of transmission. The prospective study could further evaluate interventions of differing intensity. For example, it could evaluate the effectiveness and cost-effectiveness of screening a catchment area with mobile radiography units rather than door-to-door screening by community health workers.
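The projection that Viet Nam may miss its 2030 target rests on compounding. Assuming, for illustration, that prevalence falls by a constant relative rate r each year (real trends need not be constant), prevalence after t years and the time needed to reach a target prevalence P* from a current prevalence P_0 are:

```latex
P_t = P_0 (1 - r)^t
\qquad\Longrightarrow\qquad
t^{*} = \frac{\ln\left(P^{*} / P_0\right)}{\ln(1 - r)}
```

With r = 0.046, each halving of prevalence takes ln(0.5)/ln(0.954) ≈ 14.7 years, so a current prevalence several times above the target of 20 per 100,000 implies several decades to reach it. The current prevalence P_0 is taken from [81,82] and is not reproduced here.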
An inherent limitation of this study is that theoretical yield should be discounted for cases that would have been notified through routine case finding over time. In our sample of routine notification data from 2011-2015 in Go Vap, we identified a total of 356 notified people (12% of total notifications) living in 170 households with more than one notified case. Of the 170 households, 155 (91%) included the index case and one other notified household contact. In 14 (8%) households there were three notified patients, and one household contained four notified cases. Identifying households with multiple notified cases may help identify "super-spreaders" for whom more intensified outbreak investigation may be warranted. [87,88] A nationally representative cluster-randomized controlled trial of facility-based household contact investigation in Viet Nam reported a relative risk of 2.5 between active and routine household contact investigation, suggesting that approximately 40% of household contacts with TB may be notified through routine case finding, while the remainder would be missed or detected later. [9] We re-analyzed the dataset excluding all 356 household contact notifications. The theoretical yields of this subset across all nine catchment area-notification window scenarios did not change significantly. This suggests that household contact investigation may not affect the yield of catchment area screening at a population level, likely because of the limited proportion of TB cases stemming from intra-household transmission compared with other community sources in moderate- and high-prevalence settings.
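The household figures and the 40% estimate can be reproduced with simple arithmetic. The sketch below uses only the counts reported in this paragraph; reading the relative risk of 2.5 as "routine case finding captures about 1/2.5 of household cases" assumes that active investigation detects essentially all of them, an assumption stated here for illustration.

```python
# Number of notified cases per household -> number of such households (reported above).
households_by_case_count = {2: 155, 3: 14, 4: 1}
total_notifications = 3046

n_households = sum(households_by_case_count.values())
n_cases = sum(k * n for k, n in households_by_case_count.items())
share_of_notifications = n_cases / total_notifications

relative_risk = 2.5
# Assumes active investigation finds (nearly) all household cases.
routine_fraction = 1 / relative_risk

print(n_households, n_cases, f"{share_of_notifications:.0%}", routine_fraction)  # 170 356 12% 0.4
```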
This study has several limitations. The theoretical yields are based on passive notification in the public sector, with cases primarily detected by microscopy; all of these factors are associated with under-detection of incident cases [1,89,90], meaning that the yields are likely underestimates. One of the variables associated with a lack of community cases was distance to the health facility. This may mean that people with TB were missed by the passive system rather than reflecting a true association with fewer community cases, as other studies measuring this have shown similar results. [75] In addition, we did not differentiate between residential and commercial property lots. We were also not able to differentiate between property lots with single-family or congregate housing, particularly informal boarding home communities. The analysis did not take property lot sizes into consideration for the population estimate of the catchment areas. These factors may have contributed to an over- or underestimation of the total number of residents in a catchment area and, consequently, of theoretical yield. However, these uncertainties may have been mitigated by the large sample size of index cases and the high granularity of the cadastral data. Investigating transmission patterns via genotyping of notified cases and health-seeking behaviors through additional primary data collection, as in similar spatial analysis studies, may help explain these results. [91,92]

Conclusions
To reach the people with TB currently missed by NTPs, we need new strategies that detect people with TB earlier and in greater numbers. There is strong agreement that eliminating TB will require intensified case finding beyond the status quo. Using geospatial mapping to model and enhance theoretical case finding yields may be useful for optimizing active case finding approaches.