Genotyping and spatial analysis of pulmonary tuberculosis and diabetes cases in the state of Veracruz, Mexico

Background: Genotyping and georeferencing in tuberculosis (TB) have been used to characterize the distribution of the disease and the occurrence of transmission within specific groups and communities.
Objective: The objective of this study was to test the hypothesis that diabetes mellitus (DM) and pulmonary TB may occur in spatial and molecular aggregations.
Material and methods: Retrospective cohort study of patients with pulmonary TB. The study area included 12 municipalities in the Sanitary Jurisdiction of Orizaba, Veracruz, México. Patients with acid-fast bacilli in sputum smears and/or Mycobacterium tuberculosis in sputum cultures were recruited from 1995 to 2010. Clinical (standardized questionnaire, physical examination, chest X-ray, blood glucose test and HIV test), microbiological, epidemiological, and molecular evaluations were carried out. Patients were considered “genotype-clustered” if two or more isolates from different patients were identified within 12 months of each other and had six or more IS6110 bands in an identical pattern, or < 6 bands with identical IS6110 RFLP patterns and a spoligotype with the same spacer oligonucleotides. Residential and health care center addresses were georeferenced using a Jeep handheld GPS. The coordinates were transferred from the GPS files to ArcGIS using ArcMap 9.3. We evaluated global spatial aggregation of patients in IS6110-RFLP/spoligotype clusters using global Moran's I. Since the global distribution was not random, we evaluated “hotspots” using the Getis-Ord Gi* statistic. Using bivariate and multivariate analysis, we analyzed sociodemographic, behavioral, clinical and bacteriological conditions associated with “hotspots”. We used STATA® v13.1 for all statistical analyses.
Results: From 1995 to 2010, 1,370 patients older than 20 years were diagnosed with pulmonary TB; 33% had DM. The proportion of isolates that were genotyped was 80.7% (n = 1105), of which 31% (n = 342) were grouped in 91 genotype clusters with 2 to 23 patients each; 65.9% of clusters were small (2 members), involving 35.08% of genotype-clustered patients. An estimated 22.7% of cases were attributed to recent transmission. Moran's I indicated that the distribution of patients in IS6110-RFLP/spoligotype clusters was not random (Moran's I = 0.035468, Z value = 7.0, p < 0.001). Local spatial analysis showed statistically significant spatial aggregation of patients in IS6110-RFLP/spoligotype clusters, identifying “hotspots” and “coldspots”. The Gi* statistic showed that the hotspot for spatial clustering was located in Camerino Z. Mendoza municipality; 14.6% (50/342) of patients in genotype clusters were located in a hotspot; of these, 60% (30/50) lived with DM. Using logistic regression, the statistically significant variables associated with hotspots were DM [adjusted odds ratio (aOR) 7.04, 95% confidence interval (CI) 3.03–16.38] and attending the health center in Camerino Z. Mendoza (aOR 18.04, 95% CI 7.35–44.28).
Conclusions: The combination of molecular and epidemiological information with geospatial data allowed us to identify the concurrence of molecular clustering and spatial aggregation of patients with DM and TB. This information may be highly useful for TB control programs.

Introduction
Tuberculosis (TB) remains one of the main causes of morbidity and mortality in low- and middle-income countries, where the number of individuals with diabetes mellitus (DM) is rapidly increasing [1,2]. In 2017, the TB incidence rate in Mexico was 17 per 100,000 inhabitants, indicating that the disease remains a public health problem, while the DM prevalence of 9.17% among individuals older than 20 years of age ranks sixth worldwide among adults [3,4]. The convergence of both diseases in Mexico has led the International Diabetes Federation (IDF) to conclude that more than 10% of TB cases can be attributed to DM [5].
Many studies have explored the association between DM and TB, including a recent systematic review demonstrating that the risk of TB among people with DM triples that of people without DM [6]. Moreover, available evidence indicates that DM comorbidity worsens the clinical outcomes of TB patients [7,8].
Genotyping of Mycobacterium tuberculosis together with conventional epidemiological methods has contributed to the characterization of strains [9][10][11] and has broadened the understanding of the dynamics of TB transmission in different places and specific populations [12][13][14]. Current evidence suggests that, in addition to individual and social risk factors for genetic grouping of TB cases [12,15-19], several ecologic, geographic, climatic and socioeconomic factors also have a critical impact on TB prevalence [20][21][22].
Geographical information systems (GIS) have become important tools in research and planning of TB programs [23][24][25]. GIS can be used to map rates of disease, define populations at risk, identify outbreaks and map social and environmental risk factors [23,26]. Spatial aggregation of TB patients has been shown to occur in high-, middle- and low-income settings [24,26-28].
Several research groups have combined genotypes with geospatial data to characterize TB distribution and transmission within specific groups [28][29][30][31][32]. To our knowledge, the hypothesis that recent transmission and spatial aggregation occur among TB patients with DM has not been previously tested [33,34]. However, clinical manifestations among patients with TB and DM, such as delayed sputum and culture conversion [8,35,36], a higher likelihood of pulmonary (versus extra-pulmonary) forms [37] and cavitation [8] due to dysfunctional innate and acquired immunity, indicate that patients with DM might have an important role in TB transmission. In addition, patients with DM and hyperglycemia have a greater risk of infection and disease [38], which would increase their likelihood of participating in chains of transmission. Identification of high-risk subpopulations and areas where TB transmission is occurring would help focus prevention and control strategies where they might be more effective.
Since 1995, we have been conducting a population-based study of pulmonary TB in Southern Mexico, where almost one-third of TB patients have been previously diagnosed with DM [39]. The purpose of the present study was to test the hypothesis that DM and pulmonary TB may occur in spatial and molecular aggregations. If confirmed, spatially heterogeneous interventions may be considered to interrupt transmission in specific locations [40].

Study population and recruitment
We carried out a retrospective analysis of a prospectively recruited cohort of pulmonary TB patients. The methods of selection, recruitment and follow-up have been previously described [8,39]. The study area includes 12 municipalities of the Sanitary Jurisdiction of Orizaba in the State of Veracruz, Mexico. The site has an area of 618.11 km² with 413,223 inhabitants, of which 26.3% live in rural communities [8]. TB incidence in Veracruz ranked among the ten highest in Mexico in 2015, with 27.3 cases per 100,000 inhabitants [3]. In 2012, the prevalence of DM among individuals 20 years and older in Veracruz was 10.7%, the third highest among states [4]. Health coverage for the study area is provided by public and private institutions. Public institutions include the Ministry of Health (SSA) for the uninsured population; Seguro Popular, a program of comprehensive national health insurance for the previously uninsured; the Mexican Social Security Institute (IMSS) for the employed population; the State Workers Social Security Services Institute (ISSSTE) for government employees; and Petróleos Mexicanos (Pemex) for oil workers. Because antituberculosis medications are available free of charge through the public health institutions, private physicians provide care for relatively few tuberculosis patients. Primary health care is provided through 48 rural and urban health centers.
Briefly, between March 1995 and April 2010, ongoing recruitment was performed by community health workers who were trained to identify patients with cough persisting more than two weeks among members of the community. Additionally, shelters, jails, orphanages, and self-support groups for users of alcohol or illegal drugs and patients living with diabetes were visited periodically to explain the purpose of the study and identify patients with respiratory symptoms. Study personnel invited participation of patients older than 20 years of age with acid-fast bacilli (AFB) in sputum smear or Mycobacterium tuberculosis in sputum culture. Each consenting patient completed a clinical exam (including a standardized questionnaire, physical examination, chest x-ray, blood glucose and HIV test) and provided sputum samples for microbiological and molecular tests. Subsequent episodes of TB in the same patient were also documented. Chest x-rays were independently evaluated by certified radiologists. Trained personnel administered previously validated standardized questionnaires. Sputum cultures were conducted from 1995 to 1999 in samples with acid-fast bacilli (AFB), from 2000 to 2005 in all sputum samples, and from 2005 to 2010 in previously treated TB patients, household contacts of patients with drug resistance and patients with DM.
Drug susceptibility test results were sent to treating physicians. Patients received treatment in the local health centers. The majority of patients received treatment in a single center.

Definitions
Diabetes mellitus: Patients were considered to have DM if they had received a previous diagnosis from a physician, had been prescribed oral hypoglycemic medication or insulin before TB diagnosis, or had a random blood glucose of 200 mg/dl or higher when diagnosed with TB.
Voluntary HIV testing and counseling were offered to all participants. Patients were informed of their results and, in case of a positive result, were referred to receive appropriate treatment. Testing for HIV was done as per the Mexican HIV Prevention and Control Program using two different tests [41]. All positive results were confirmed by Western blot. Previous HIV diagnosis was also considered.
Rural residence (towns with fewer than 2,500 inhabitants) and homelessness were defined as in the Population and Household Census [42]. Usage of alcohol (> 10 drinks per week), smoking (> 10 cigarettes per week) and usage of illegal drugs (marijuana, cocaine and its derivatives, heroin, methamphetamines, hallucinogens, inhalants and other drugs) were defined as in the National Survey of Addictions (NSA) (SSA, 2005). This information was obtained through self-report.
Body mass index was calculated as weight in kilograms divided by the square of the height in meters (kg/m²). To evaluate health care access, we assessed the distance to the nearest health center and the time elapsed between the onset of symptoms and the beginning of treatment.

Mycobacteriology and genotyping
Ziehl-Neelsen staining, mycobacterial culture, species identification and drug susceptibility tests (DST) were conducted using standardized procedures [43]. Isolates were genotyped and compared using IS6110-based restriction fragment-length polymorphism (RFLP) analysis; spacer oligonucleotide typing (spoligotyping) was performed if the isolate's IS6110 RFLP pattern had fewer than 6 bands [44]. Patients were considered "genotype-clustered" if two or more isolates from different patients were identified within 12 months of each other and had six or more IS6110 bands in an identical hybridization banding pattern, or < 6 bands with identical IS6110 RFLP patterns and a spoligotype with the same spacer oligonucleotides [45]. We used a single isolate for each episode of TB. We used the "n minus one" method to estimate ongoing transmission [46]. Microbiology and molecular tests were conducted at the Mycobacteriology Laboratory of the Instituto Nacional de Ciencias Médicas y de Nutrición Salvador Zubirán.
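The clustering rule and the "n minus one" estimate can be illustrated with a short Python sketch. It is not the procedure used in the study: the function names, the representation of each isolate as a (patient, pattern, date) tuple, the 365-day window standing in for 12 months, and the chaining of consecutive isolates with the same pattern are all assumptions made for illustration. The pattern key is assumed to already encode the matching rule (identical IS6110 RFLP pattern, plus spoligotype for low-copy isolates). The counts passed to the second function are those reported in this article.

```python
from collections import defaultdict
from datetime import date


def genotype_clusters(isolates, window_days=365):
    """Group isolates sharing a pattern when consecutive diagnoses fall
    within `window_days` of each other (assumed chaining rule).

    isolates: iterable of (patient_id, pattern, diagnosis_date).
    Returns a list of clusters, each a list of patient ids (size >= 2).
    """
    by_pattern = defaultdict(list)
    for patient_id, pattern, diagnosed in isolates:
        by_pattern[pattern].append((diagnosed, patient_id))

    clusters = []
    for members in by_pattern.values():
        members.sort()  # chronological order
        chain = [members[0]]
        for current in members[1:]:
            if (current[0] - chain[-1][0]).days <= window_days:
                chain.append(current)
            else:
                if len(chain) >= 2:
                    clusters.append([pid for _, pid in chain])
                chain = [current]
        if len(chain) >= 2:
            clusters.append([pid for _, pid in chain])
    return clusters


def n_minus_one(n_clustered, n_clusters, n_genotyped):
    """'n minus one' estimate: one case per cluster is treated as the source."""
    return (n_clustered - n_clusters) / n_genotyped


# Hypothetical isolates: p1 and p2 cluster; p3 is too late; p4 is unique.
example = [
    ("p1", "A", date(2001, 1, 10)),
    ("p2", "A", date(2001, 9, 1)),
    ("p3", "A", date(2003, 5, 1)),
    ("p4", "B", date(2002, 3, 3)),
]
example_clusters = genotype_clusters(example)

# Counts reported in this article: 342 clustered patients, 91 clusters,
# 1105 genotyped isolates.
recent_fraction = n_minus_one(342, 91, 1105)  # about 0.227, i.e. 22.7%
```

With the reported counts, (342 - 91) / 1105 gives approximately 22.7%, which agrees with the proportion of recent transmission stated in the Results.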

Spatial analysis
We georeferenced all patients' residential addresses at the time of diagnosis. We also georeferenced all health centers in the study area. We used a Jeep handheld GPS. The coordinates were transferred from the GPS files to ArcGIS using ArcMap 9.3. The statistical analysis was done using ArcMap 10.3.
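As an illustration of how georeferenced coordinates can yield the distance to the nearest health center (one of the health care access measures listed under Definitions), the following Python sketch uses the haversine great-circle formula. The coordinates, the center names and the function names are hypothetical, and the spherical-Earth approximation is an assumption; the article does not describe how this distance was computed.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_center(home, centers):
    """Return (name, distance_m) of the center closest to home = (lat, lon)."""
    return min(
        ((name, haversine_m(home[0], home[1], lat, lon))
         for name, (lat, lon) in centers.items()),
        key=lambda item: item[1],
    )


# Hypothetical coordinates, for illustration only.
centers = {"center_a": (18.80, -97.18), "center_b": (18.85, -97.10)}
home = (18.81, -97.17)
name, dist_m = nearest_center(home, centers)
```

For distances of a few kilometers, as in this study area, projected coordinates and planar distances would serve equally well; the haversine form simply avoids assuming a particular projection.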
We analyzed spatial aggregation of genotype-clustered patients (defined as described above: IS6110-RFLP/spoligotype clustered TB patients identified within 12 months of each other) and patients harboring unique patterns using global Moran's I, which is a measure of spatial autocorrelation [47]. This tool tests whether the distribution of patients is spatially aggregated, dispersed, or random. Moran's I values vary from -1 to 1, with 1 being the maximum positive association and -1 the maximum negative association. A value of 0 indicates no spatial correlation; higher positive values indicate stronger spatial aggregation, and negative values indicate dispersion. The Z-score was used to evaluate the significance of the estimated Moran's I. We considered TB cases to be spatially and statistically aggregated when Moran's I was >0 and the Z-score was ≥1.96 [48].
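The global Moran's I described above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the ArcMap computation: the binary distance-band weights, the function names and the toy 4 x 4 grid are hypothetical, and the Z-score here is obtained from random permutations rather than from the analytical variance that GIS software typically reports.

```python
import numpy as np


def distance_band_weights(coords, threshold):
    """Binary spatial weights: 1 if two points lie within `threshold`, else 0."""
    pts = np.asarray(coords, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= threshold).astype(float)
    np.fill_diagonal(w, 0.0)  # a point is not its own neighbor
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), with z the centered values."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


def permutation_z(values, w, n_perm=999, seed=0):
    """Z-score of the observed I against I computed on shuffled values."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    sims = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    return (observed - sims.mean()) / sims.std(ddof=1)


# Toy 4 x 4 grid with unit spacing; neighbors share an edge (threshold = 1).
grid = [(r, c) for r in range(4) for c in range(4)]
w = distance_band_weights(grid, threshold=1.0)

clustered = np.array([1.0 if c < 2 else 0.0 for _, c in grid])  # left half vs right half
checker = np.array([float((r + c) % 2) for r, c in grid])       # alternating pattern

i_clustered = morans_i(clustered, w)  # 2/3: positive autocorrelation
i_checker = morans_i(checker, w)      # -1: perfectly dispersed
z_clustered = permutation_z(clustered, w)
```

The two toy patterns show the interpretation given in the text: values aggregated in one half of the grid give a positive I, while an alternating pattern gives the most negative value.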
Hotspot analysis. Since the results of Moran's I showed a non-random distribution, we investigated whether patients forming genotype clusters showed local indicators of spatial aggregation (LISA or hotspots) using the Getis-Ord Gi* tool [49,50]. This tool identifies spatial aggregations which are statistically significant with high scores (hotspots) and low scores (coldspots) and confidence level bins (Gi_Bin). Features in the ±3, ±2 and ±1 bins were statistically significant at the 99%, 95%, and 90% confidence levels, respectively [51]. Spatial aggregation for features with 0 for the Gi_Bin field was not statistically significant. To determine spatial aggregation using this tool, we used the identifier of each IS6110-RFLP/spoligotype molecular cluster as "Analysis Field." Therefore, hotspots refer to spatial aggregations of patients in the same IS6110-RFLP/spoligotype molecular cluster. We also analyzed data filtering the study population (in molecular clusters) according to whether patients had been diagnosed with DM or not. It is important to note that this method does not allow analysis of patients with unique patterns since, according to our genotype cluster definition (two or more patients sharing identical patterns), patients with unique patterns were not assigned an identifier of IS6110-RFLP/spoligotype molecular cluster. We chose to use a default distance equivalent to the minimum distance to ensure that every genotype-clustered patient had at least one neighbor of the same molecular cluster. The default neighborhood search threshold was 3774.6351 meters.
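For readers unfamiliar with the Gi* statistic, the following Python sketch computes Gi* z-scores with a fixed distance band and maps them to confidence bins. It is illustrative only: the function names, the toy points on a line and their numeric values are hypothetical (they are not molecular cluster identifiers), the bin cut-offs 1.65, 1.96 and 2.58 are the usual two-sided normal critical values and are assumed here, and no correction for multiple testing is applied.

```python
import numpy as np


def gi_star(values, coords, threshold):
    """Getis-Ord Gi* z-scores with binary distance-band weights.

    Each point counts as its own neighbor (distance 0), which is what
    distinguishes Gi* from Gi. Undefined if a neighborhood contains all points.
    """
    x = np.asarray(values, dtype=float)
    pts = np.asarray(coords, dtype=float)
    n = x.size
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= threshold).astype(float)

    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


def gi_bin(z):
    """Map a z-score to a bin in {-3..3}: 3 = 99%, 2 = 95%, 1 = 90%, 0 = not significant."""
    a = abs(z)
    level = 3 if a >= 2.58 else 2 if a >= 1.96 else 1 if a >= 1.65 else 0
    return level if z > 0 else -level


# Toy data: ten points on a line, with high values at one end.
coords = [(float(i), 0.0) for i in range(10)]
values = [10, 10, 10, 1, 1, 1, 1, 1, 1, 1]

z_scores = gi_star(values, coords, threshold=1.0)
bins = [gi_bin(z) for z in z_scores]
```

In this toy example only the first two points, which sit inside the run of high values, reach a significant positive bin; the rest of the line is not significant.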

Statistical analysis
We used bivariate analyses to test for differences in sociodemographic, behavioral, clinical and bacteriological characteristics between patients who were fingerprinted and those who were not; between patients in IS6110-RFLP/spoligotype clusters and those harboring unique fingerprints; and between genotype-clustered patients in hotspots and genotype-clustered patients not in hotspots. Among patients in genotype clusters, we used logistic regression to assess variables associated with hotspots. Variables with p ≤ 0.20 in the bivariate analysis and biological plausibility were included in multivariate models. We estimated the odds ratio (OR) and 95% CI and identified the covariates that were independently associated with each outcome. We built five models: one using the overall population and one for each of the following subgroups: patients with diabetes, patients without diabetes, patients diagnosed between 1995 and 1999, and patients diagnosed between 2000 and 2010. In the subgroup analyses, we used the same covariates as in the overall model since we wanted to investigate whether DM diagnosis and attendance at the urban health center of Camerino Z. Mendoza remained associated with hotspots in each of the subgroups. All data analysis was performed using STATA 13.1.
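As a worked illustration of an odds ratio with its 95% CI, the sketch below computes a crude (unadjusted) OR for DM and hotspot membership from counts given in the text of this article: 108 of 342 genotype-clustered patients had DM, and 30 of the 50 patients in hotspots had DM. The function name and the use of the Woolf (log) method for the interval are assumptions. This crude estimate is not one of the adjusted ORs from the multivariate models and is not expected to match them.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf confidence interval for a 2 x 2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts derived from the text (exposure = DM, outcome = living in a hotspot).
dm_hotspot = 30
dm_not_hotspot = 108 - 30                 # 78
no_dm_hotspot = 50 - 30                   # 20
no_dm_not_hotspot = (342 - 108) - 20      # 214

crude_or, ci_low, ci_high = odds_ratio_ci(
    dm_hotspot, dm_not_hotspot, no_dm_hotspot, no_dm_not_hotspot
)
```

The crude OR is about 4.1. The adjusted estimates reported in the article additionally account for the other covariates retained in the logistic models.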

Ethical approval
Participants provided written informed consent to participate in this study. Ethical approval was obtained from the Ethical Commission of the Instituto Nacional de Salud Pública (approval number = 527). All participants were referred to health facilities to receive treatment in accordance with the stipulations of the National Program for the Prevention and Control of TB.

Results
Between 1995 and 2010, 1370 patients older than 20 years were diagnosed with pulmonary TB; of these, 33% had DM. Of the M. tuberculosis isolates, 80.66% (1105/1370) were genotyped; 31% of genotyped isolates (342/1105) formed IS6110-RFLP/spoligotype clusters. The flow diagram shows the number of individuals at each stage of the study (Fig 1). There were no differences regarding other demographic, socioeconomic, epidemiological or clinical variables (S1 Table). We compared characteristics of patients with and without fingerprints according to the three different methods of diagnosis of patients used during the study period and found some additional differences. Patients without fingerprints were less likely to have cavities in chest X-ray, and more likely to have earthen floors and live in rural areas in the period 1995 to 1999; and they were older and less likely to smoke and use drugs in the period 2000 to 2005 (S2-S4 Tables). Table 1 shows five categories of patients in IS6110-RFLP/spoligotype clusters according to the number of members of each genotype cluster. In total, 342 genotype-clustered patients were distributed in 91 IS6110-RFLP/spoligotype clusters involving 2 to 23 patients; 65.9% of molecular clusters were small (2 members), involving 35.08% of genotype-clustered patients. The proportion of cases attributed to recent TB transmission in the cohort was estimated at 22.7%. Table 2 shows the comparison between patients in IS6110-RFLP/spoligotype clusters and patients with a unique pattern. Patients in IS6110-RFLP/spoligotype clusters were more likely to have been diagnosed after 2000 (77% (262/342) vs. 68% (516/763), p = 0.002), to be drug users (8% (26/342) vs. 4% (33/762), p = 0.025), to harbor pan-susceptible strains (82% (250/304) vs. 76% (489/641), p = 0.039) and less likely to harbor MDR strains (4% (11/304) vs. 8% (50/641), p = 0.015). The frequency of DM did not differ significantly between the two groups (32% (108/342) vs. 36% (272/763), p = 0.188).

Spatial distribution and hotspot analysis
The geospatial distribution of the 342 IS6110-RFLP/spoligotype clustered patients, including 108 patients with DM, is shown in Fig 2. Patients were concentrated around the Orizaba municipality and extended toward the surrounding municipalities (Río Blanco, Nogales, and Camerino Z. Mendoza). Even though municipalities such as Huiloapan de Cuauhtemoc and Tlilapan had a higher incidence rate, we did not observe spatial aggregation in these localities.
Of the 91 genetic clusters, 20 (22%) had their first identified or "index" case located within the Mendoza or Nogales municipalities (10 each). Mendoza municipality concentrated 2 of 15 cases and 12 of 21 cases of the two largest molecular clusters (D and E).
Local spatial analysis using the Getis-Ord Gi* statistic showed statistically significant spatial aggregation of patients in IS6110-RFLP/spoligotype clusters forming "hotspots" and "coldspots". Of the patients in IS6110-RFLP/spoligotype clusters, 14.6% (50/342) were part of the hotspot in Camerino Z. Mendoza and Nogales municipalities; of these, 60% (30/50) were patients with DM. Orizaba municipality was identified as a coldspot, that is, the distance between patients in IS6110-RFLP/spoligotype clusters was larger, and the frequency of spatial aggregation was lower (Fig 3).
Patients with both TB and DM showed statistically significant spatial aggregation in the Camerino Z. Mendoza and Nogales municipalities (Fig 4). We reviewed clinical files of patients in hotspots and found that the largest share (38%, 11/29) received clinical care in one urban health center in Mendoza (Fig 4).
The local spatial analysis of patients without DM in IS6110-RFLP/spoligotype clusters (234/342) identified a higher likelihood of concentration of patients belonging to the same molecular cluster in Camerino Z. Mendoza, but without forming a hotspot. This is shown as orange dots (Fig 4). When expanded, this area showed that patients mainly attended the urban health center of Mendoza (40%, 8/20) (Fig 5).
Table 3 shows the comparison between patients in IS6110-RFLP/spoligotype clusters belonging to a hotspot and those not belonging to a hotspot. The main significant differences were distance to the health center (958 m (486-1462) vs. 627 m (410-918), p = 0.005), receiving [...]. Belonging to hotspots was associated with DM (OR 12.85 (95% CI 1.19-138.65), p<0.05) and attending clinics in Camerino Z. Mendoza municipality (OR 17.08 (95% CI 1.55-188.77), p<0.05) (Table 4).

Discussion
The combination of molecular and epidemiological information with geospatial data allowed us to identify the occurrence of molecular clustering and spatial aggregations of patients with DM and TB. These results indicate the need to conduct prospective research to determine whether health care might be facilitating transmission among DM patients and to implement broadly the collaborative framework recommended by WHO and the Union to prevent and control TB among patients with DM [53]. Most of our patients were diagnosed with DM before or at the same time as their diagnosis with TB. Therefore, it is biologically plausible that immunological dysfunction associated with DM might favor their participation in transmission chains when exposed to infectious patients. Our hypothesis that health centers might be the source of TB infection for patients with DM is based on the finding that patients with TB and DM carrying the same genotype were geospatially aggregated around the health center where they received health care. The finding of localized transmission did not correspond to the most densely populated areas; on examination of population densities of the study area, we found that Camerino Z. Mendoza municipality ranked third after Orizaba and Río Blanco municipalities [52]. Moreover, spatial autocorrelation of genotypically unclustered patients revealed a pattern not significantly different from random. The finding that transmission might be occurring in localized areas can have several alternative explanations. We limited geocoding to the patient's residential address and to the health centers where each patient had received health care. Therefore, we did not consider other settings where transmission might have occurred, such as transportation routes, workplaces or social gatherings in the same "hotspot" areas. Patients with the hotspot genotypes occurring outside the hotspot could be seeding larger areas. Therefore, timely detection of hotspots could prevent further dissemination.
As in the rest of Mexico and other countries worldwide [54][55][56], the frequency of DM among TB patients in the study was high (32.84%). If our hypothesis that patients with TB and DM occur in spatial and molecular aggregations proves to be correct, control efforts could be implemented in specific high-risk areas. Patients with DM have not been specifically studied, although clinical manifestations among patients with TB and DM indicate that patients with DM might have an important role in TB transmission. Moreover, patients with DM have been described as index and secondary cases in TB outbreaks [57,58]. Usage of molecular epidemiologic techniques has previously allowed us to show that the increased risk of TB among patients with DM is due to both reactivation and recently transmitted infection [39]. More recently, we demonstrated that, in one-fifth of the cases, patients with DM were reinfected with a strain different from the one that caused the initial episode [8]. We conjectured that exogenous TB reinfection in DM patients might be due to health care-associated TB transmission, occurring as a result of DM patients attending clinics where there is a high prevalence of diagnosed and undiagnosed TB, as has been described for HIV-infected patients [59].
The low percentage of clustering and attributable recent transmission found in this study could result from decreasing morbidity or from effective interventions such as contact investigation, increased detection or high DOTS compliance [19,60,61]. As has been previously described, we found that most of our genotype clusters were small [12,30,62,63]. Yuen et al. have recently proposed that small genotype clusters (fewer than five individuals) represent limited recent transmission, based on the hypothesis that populations that have limited TB transmission, mostly due to effective TB control strategies, differ from populations with uncontrolled transmission [64].

Strengths and limitations of the study
We conducted a large molecular epidemiological study of pulmonary TB in an area where TB is endemic and DM is increasing. Our study provided sociodemographic, clinical and epidemiological information that allowed us to obtain adjusted odds ratios associated with belonging to hotspots. Several studies have sought to determine the degree of correspondence between spatial aggregation and molecular clustering of patients with TB and to test the hypothesis that both phenomena concur. While some authors have found that genotype-clustered cases share locations [28,62,65], other authors have not confirmed this finding [29,66]. A partial explanation for these discrepancies may be that multiple characteristics affect the frequency of genotype clustering. These characteristics include, among others, TB incidence, study duration, the intensity of contact tracing, migration patterns into the study area, size of molecular clusters, sampling fraction, the occurrence of endemic strains, the frequency of strains with low copy numbers, and age of study populations [67]. In the present study, we tried to deal with some aspects that might have limited our findings. We attempted to reduce the impact of the long duration of our study by requiring that cases be identified within 12 months of each other to be considered genotype-clustered. Throughout our study, we performed passive case finding supported by active case finding conducted by community-based health workers to decrease incomplete sampling. The finding that attendance at the health center in Mendoza did not differ between patients whose isolates were or were not genotyped indicates that the study was not biased regarding the probability of genotyping according to health centers. Therefore, we consider that genetic clustering is not explained by increased coverage of specific clinics. Migration into and out of the area may have limited our findings.
Limitations include that, although the cohort of patients was prospectively recruited, the present analysis was conducted retrospectively, with the inherent problems of data quality that such a design entails. The descriptive design of the study limits the possibility of determining whether TB transmission occurred in the health centers. We were unable to culture and genotype all isolates. The main reason was the delay in receiving the sample in the laboratory due to the remoteness of the patient's home and the consequent low quality of the sample. We used strict criteria to define genetic clustering. Our definition might represent an underestimate of 'true' clustering. A definition that allowed nearly identical patterns might have included more isolates in the hotspot analysis. However, the impact on the hotspot analysis is more difficult to predict. Finally, since eligibility criteria for culturing differed over the course of the study, we compared the proportion of patients who underwent genotyping in each of the study periods; we found that the proportions of genotyped patients were all above 75%. There were some differences when we compared patients whose isolates were fingerprinted with those whose isolates we were unable to fingerprint. This limitation might have affected the representativeness of our results for some subgroups, such as individuals living in rural areas, particularly during 1995 to 1999.

Conclusions
We successfully applied a combination of sociodemographic, clinical, molecular and geospatial analyses to study TB, providing evidence of the usefulness of these strategies for control programs of TB and DM in settings where both conditions are endemic. National TB programs emphasize the importance of timely detection, treatment and contact tracing. If transmission occurs more frequently in certain settings, further prevention and control measures might be targeted there, increasing their cost-effectiveness. For example, if our hypothesis on health centers proved correct, it would be necessary to adopt specific administrative, environmental, and personal protection measures in health units that provide care to patients with DM and patients with TB.