The Effects of Socioeconomic Status, Clinical Factors, and Genetic Ancestry on Pulmonary Tuberculosis Disease in Northeastern Mexico

Diverse socioeconomic and clinical factors influence susceptibility to tuberculosis (TB) disease in Mexico. The role of genetic factors, particularly those that differ between the parental groups that admixed in Mexico, is unclear. The objectives of this study are to identify the socioeconomic and clinical predictors of the transition from latent TB infection (LTBI) to pulmonary TB disease in an urban population in northeastern Mexico, and to examine whether genetic ancestry plays an independent role in this transition. We recruited 97 pulmonary TB disease patients and 97 LTBI individuals from a public hospital in Monterrey, Nuevo León. Socioeconomic and clinical variables were collected from interviews and medical records, and genetic ancestry was estimated for a subset of 142 study participants from 291,917 single nucleotide polymorphisms (SNPs). We examined crude associations between the variables and TB disease status. Significant predictors from crude association tests were analyzed using multivariable logistic regression. We also compared genetic ancestry between LTBI individuals and TB disease patients at 1,314 SNPs in 273 genes from the TB biosystem in the NCBI BioSystems database. In crude association tests, 12 socioeconomic and clinical variables were associated with TB disease. Multivariable logistic regression analyses indicated that marital status, diabetes, and smoking were independently associated with TB status. Genetic ancestry was not associated with TB disease in either crude or multivariable analyses. Separate analyses showed that LTBI individuals recruited from hospital staff had significantly higher European genetic ancestry than LTBI individuals recruited from the clinics and waiting rooms. Genetic ancestry differed between individuals with LTBI and TB disease at SNPs located in two genes in the TB biosystem. 
These results indicate that Monterrey may be structured with respect to genetic ancestry, and that genetic differences in TB susceptibility in parental populations may contribute to variation in disease susceptibility in the region.


Introduction
Approximately one third of the world's population is infected with TB [1]. Among the majority of infected people, the immune response effectively neutralizes the bacteria in the lungs; these individuals are asymptomatic and not contagious [2]. Approximately 10% of infected individuals progress from latent TB infection (LTBI) to disease status, and, in the absence of treatment, about half of these individuals die.
Most TB-related deaths occur in low-income countries in Africa, Asia, and the Americas [1]. Rates of TB disease and death are comparatively lower in Western Europe and in people of Western European ancestry in the Americas and other regions. U.S. whites, for example, experience lower rates of TB disease than other national, racial, and ethnic groups. This disparity is largely attributed to socioeconomic factors associated with place of birth, income, education, and health care access [3][4][5][6][7]. Related factors that predispose individuals to exposure to TB and risk of developing disease include crowding, poor nutrition, and disease co-morbidities [8][9][10]. Genetic factors are also associated with TB susceptibility worldwide [11][12][13][14][15], although the role of regional differences in genetic susceptibility is less clear [16][17][18]. The objectives of this study are to identify the socioeconomic and clinical predictors of the transition from LTBI to pulmonary TB disease in the urban setting of Monterrey in northeastern Mexico, and to examine whether genetic ancestry plays an independent role in this transition.
The Monterrey Metropolitan Area (MMA) is the urban and industrial hub of Nuevo León. Rates of TB disease are high in the MMA compared to other regions of North America, and even other regions of Mexico [19]. The current rate of 16 new TB cases per 100,000 [19,20] contrasts with a rate of 3.4 cases in the United States [21], and 5.8 in U.S. Hispanics [21]. Despite the fact that the MMA is one of the wealthiest areas in Mexico, the state of Nuevo León ranks ninth highest in the country for TB incidence (out of 31 states plus the Federal District) and sixth highest for TB deaths [19]. Rates of drug-resistant TB are also excessive compared to other Mexican states [22]. Additionally, the MMA is a rapidly growing urban center with substantial variation in the socioeconomic and clinical factors that impact health.
Previous studies have documented variation in African, European, and Native American genetic ancestry in the MMA [23][24][25][26]. This variation is the product of intermixing between predominately Spanish men and Native American women starting around 1519, and with Africans starting in the early 16th century [26,27]. A recent meta-analysis identified variation in associations between risk alleles and TB disease among these regional "parental" groups [16], suggesting that genetic ancestry has the potential to be informative about TB susceptibility in the MMA and other admixed populations in the Americas.
To accomplish our research objectives, we collected in-depth socioeconomic, clinical, and genetic data from 194 MMA residents with pulmonary TB disease and LTBI. The data included region-specific measures of socioeconomic status for individuals and households, clinical data related to health and lifestyle, and genetic ancestry estimated from 291,917 SNPs. We used crude association tests and multivariable logistic regression analyses to examine associations between these data and TB disease status, and we examined differences in genetic ancestry between individuals with TB disease and LTBI at SNPs located in 273 genes listed in the TB biosystem in the NCBI BioSystems database. To our knowledge, no previous studies have examined the association between genetic ancestry and TB disease in a Hispanic population in the Americas while accounting for region-specific socioeconomic and clinical factors. Our findings have broad implications for exploring the correlates of multifactorial disease in admixed groups in urban centers throughout the Americas.

Study design and participants
We recruited adults with pulmonary TB disease (n = 97) and LTBI (n = 97) from the Universidad Autónoma de Nuevo León (UANL) José E. González Hospital between January 2010 and February 2011. The public hospital is located in Monterrey, which is a moderate- to low-socioeconomic status municipality in the MMA [28]. The open-door policy of treating patients independent of insurance status or income attracts residents from all municipalities of the MMA. The hospital treats approximately one quarter of all new TB cases in the MMA each year [29].
TB disease status was confirmed by a positive culture. LTBI status was defined by a positive TB skin test with an induration of 10 mm or greater [30]. Among the LTBI participants, 40 were recruited from hospital clinics and waiting rooms, and 57 were recruited from hospital staff (see Table 1). The clinic-waiting room LTBI individuals were visiting the hospital for other medical reasons or were family members of patients in other clinics. The hospital staff LTBI individuals included physicians, nurses, secretaries, medical students, researchers, laboratory technicians, and custodians. LTBI individuals had no history of conversion to disease. None of the 194 participants were biologically related.

Ethics statement
This study was approved by the University of New Mexico (UNM) and UANL Institutional Review Boards (UNM Human Research Review Committee #09-318, UANL Ethics Committee #IN09-001). Participants gave informed written consent.

Socioeconomic and clinical data
The socioeconomic and clinical data were collected during face-to-face interviews in private settings at the UANL Hospital. Interview questions were compiled from established Mexican national and Latin American surveys [31,32] and TB risk assessments used by the UANL Hospital [33,34]. The variables included 15 measures of socioeconomic status, 30 clinical variables related to health and lifestyle, three measures of indigenous ethnicity, and several demographic measures. The measures of ethnicity were based on self-report and included indigenous language capacity. Indigenous language capacity is rarely measured in disease studies; it potentially reflects recent and direct connections to Native American communities that might not be captured by self-reported ethnicity alone.
Socioeconomic variables included region-specific, household-based measures from a 10-item survey developed by the Mexican Association of Marketing Research and Public Opinion Agencies (AMAI) [31]. The survey ascertained computer and colored television ownership, type of floor, number of rooms, functioning shower, exclusive bathroom, number of lights, type of stove, number of automobiles, and education of the highest income earner in the household. The survey assigns points to each item; the points were summed to create an overall measure of socioeconomic status. We also created a three-category ordinal variable (low, medium, and high) from the point total, and nominal versions (low vs. high) of each AMAI-survey item. Clinical data were collected from interviews and medical record reviews; they included measures of tobacco use, alcohol consumption, substance abuse, disease co-morbidities, and BCG vaccination status. Individuals with extra-pulmonary TB and HIV were excluded from the study.

Crude association tests and multivariable logistic regression
We conducted crude association tests between each variable and TB disease status to assess the unadjusted effects of socioeconomic factors, clinical measures, and genetic ancestry [18,35]. Crude associations were assessed by Pearson's chi-square and Fisher's exact tests for categorical variables, and continuous variables were converted to Z scores and assessed with t-tests. We also used logistic regression to examine crude associations; we report odds ratios (OR) with 95% confidence intervals (CI) as measures of association. Body mass index (BMI) and body weight were excluded from analyses because weight loss is a well-known symptom of TB disease.
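As an illustration of the crude odds ratios described above, the following minimal sketch computes an unadjusted OR and a Wald-type 95% CI from a 2x2 table. The function name and the cell counts are hypothetical and are not taken from the study data; the study's own estimates were produced in SAS and PASW, which may use different interval methods.

```python
import math

def crude_odds_ratio(a, b, c, d, z_crit=1.96):
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 table.

    a: cases with the factor       b: cases without the factor
    c: comparison group with it    d: comparison group without it
    All four cells must be non-zero for this formula to apply.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_, lo, hi = crude_odds_ratio(20, 10, 10, 20)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 corresponds to a crude association that is significant at the 0.05 level under this approximation.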
Before conducting these crude association tests, we examined the distribution of the social factors, clinical measures, and genetic ancestry in the LTBI sample. This examination revealed that, for many of these measures, LTBI individuals recruited from the hospital's clinics and waiting rooms differed significantly from those recruited from hospital staff (Table S2). This heterogeneity was due in part to the fact that the hospital staff included higher paid medical doctors and nurses. Based on these analyses, we determined that LTBI individuals recruited from clinics and waiting rooms were more representative of the population that was susceptible to developing TB disease.
We therefore limited multivariable logistic regression analyses to TB disease patients and clinic-waiting room LTBI individuals. We used a significance cutoff of ≤0.1 from the crude association tests for variable entry into multivariable regression, as recommended when exploring large numbers of risk factors [35][36][37]. Prior to building the multivariable model, we assessed multicollinearity among socioeconomic and clinical variables using a variance inflation factor cutoff of 2.5. TB disease was the dependent variable in the multivariable analyses, and we used both forward selection and backward elimination methods. We report OR and their 95% CI for each statistically significant variable. Statistical analyses were conducted in SAS 9.2 and PASW Statistics 18.0.
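The multicollinearity check can be sketched for the simplest case of two predictors, where the variance inflation factor reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The helper names and the example values below are hypothetical; with more than two predictors, each variable would instead be regressed on all of the others, as statistical packages do.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

VIF_CUTOFF = 2.5  # cutoff stated in the text

# Hypothetical scores for two socioeconomic items, for illustration only.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]
vif = vif_two_predictors(item_a, item_b)
print(f"VIF = {vif:.2f}, exceeds cutoff: {vif > VIF_CUTOFF}")
```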

Genetic Ancestry
The genetic data consisted of 291,917 SNPs [38] assayed from mouthwash samples. DNA was extracted using a modified Puregene extraction protocol. Extracts were genotyped at the University of Michigan's DNA Sequencing Core on the Illumina HumanCytoSNP-12 DNA Analysis BeadChip Kit. The chip contained a subset of 2.2 million SNPs common in Yoruban, Utah Mormon, Chinese and Japanese individuals in the International HapMap Project [39]. All SNP call rates exceeded 99%. The SNPs were genotyped in a subset of 142 individuals of which 83 had TB disease and 59 had LTBI (Table 1). The TB disease subset excluded TB patients for whom we were unable to collect DNA because poor health prevented them from successfully completing the mouthwash-collection protocol. The genotyped LTBI subset was comprised of 35 individuals recruited from the hospital's clinics and waiting rooms and 24 individuals recruited from hospital staff. The SNPs were also assayed in 40 Africans, 54 Europeans, and 45 Native Americans from the CEPH-Human Genome Diversity Panel [40]; the three groups served as parental populations in the genetic ancestry analyses. The three parental groups were comprised of individuals from the following populations: Europeans: French, Adygei, Orcadian, Russian, Sardinian, and Tuscan; Native Americans: Mexican Pima, Maya, Colombian, Karitiana, and Surui; Africans: Yoruba, Mandenka, Bantu, and San.
We performed Hardy-Weinberg tests and various quality-control tests for each SNP in Plink [41]. Native American, European, and African genetic ancestry proportions were estimated for each individual using the fast, model-based approach of Alexander and colleagues [42]. We also estimated genetic ancestry at each SNP in each LTBI individual and TB disease patient using the LAMP package [43]. The analyses were performed separately for each chromosome. We used the default LAMP settings of 1E-8 for the recombination rate and 0.2 for the offset of adjacent windows. In different runs, we set the time of admixture to between 5 and 20 generations. The results were effectively the same for all times; we report the results for 17 generations, roughly corresponding to the midpoint between the initial arrival of Spaniards and Africans.
A fraction of the SNPs (n = 1,314) was located in 273 genes listed in the NCBI BioSystems database [44]. The database collates information from other NCBI sources about genes and proteins associated with disease and other biological systems. The database entry for TB contained all 138 of the TB candidate genes listed in the Genetic Association Database [45], an archive of the results of genetic association studies of multifactorial diseases. We used the Genetic Association Database and dbSNP [46] to identify additional potential candidate genes in areas of large difference in genetic ancestry between TB patients and LTBI individuals. We also examined SNPs located in the FcGR1B gene; it was recently identified in expression studies as a potentially important TB susceptibility gene [47]. We combined the results for the individual chromosomes into a single plot showing the average difference in genetic ancestry at each SNP in individuals with TB disease vs. LTBI. We considered a SNP to be potentially informative about TB-disease status if the difference in genetic ancestry between TB patients and LTBI individuals at the SNP was three standard deviations above or below the mean difference in genetic ancestry across all SNPs.
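The three-standard-deviation criterion can be sketched as follows. This is a minimal illustration, assuming the per-SNP differences in mean ancestry (TB patients minus LTBI individuals) are already available as a list; the function name and the example values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
import statistics

def flag_informative_snps(ancestry_diffs, n_sd=3.0):
    """Return indices of SNPs whose ancestry difference lies more than
    n_sd standard deviations above or below the mean difference."""
    mean_diff = statistics.fmean(ancestry_diffs)
    sd_diff = statistics.stdev(ancestry_diffs)  # sample SD (assumption)
    return [
        i for i, diff in enumerate(ancestry_diffs)
        if abs(diff - mean_diff) > n_sd * sd_diff
    ]

# Hypothetical per-SNP differences in Native American ancestry
# (TB disease minus LTBI); only the last value is an outlier.
diffs = [0.04, 0.05, 0.05, 0.06] * 5 + [0.60]
print(flag_informative_snps(diffs))
```

Because the threshold is measured from the genome-wide mean difference rather than from zero, a SNP is flagged only when it departs from the overall ancestry shift between the two groups.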

Crude association tests and multivariable logistic regression
In the crude association tests of TB patients and LTBI individuals recruited from clinics and waiting rooms, 12 variables were associated with TB disease status at a critical value ≤0.1 (Table S1). For the socioeconomic variables, TB disease was associated with low levels of education, a history of non-professional employment or unemployment, and lower measures for three of the individual AMAI survey items. For the clinical variables, TB disease status was positively associated with drug and alcohol abuse, smoking, diabetes, and history of incarceration. Marital status was also predictive; individuals with TB disease were less frequently married or in a civil union than LTBI individuals. Notably, self-reported indigenous ancestry, indigenous language capacity, and genetic ancestry were not associated with TB disease. Of the 97 individuals with TB disease, 13 (13.4%) were living outside of the MMA at the time of the study; place of residence, however, was not a significant predictor of TB status.
The results of the multivariable logistic regression analyses of TB disease patients and the clinic-waiting room LTBI individuals are shown in Table 2 (N = 137). The final multivariable model revealed that marital status, diabetes, and smoking were independently associated with TB status (p ≤ 0.1). The odds of being married or in a civil union were lower among TB disease patients compared to clinic-waiting room LTBI individuals (OR 0.31, 95% CI 0.14, 0.72), while the odds of having diabetes (OR 2.94, 95% CI 1.05, 8.24) and smoking (OR 1.65, 95% CI 0.91, 2.97) were greater among TB disease patients.
As noted in the Methods section, we found that LTBI individuals recruited from clinics and waiting rooms differed significantly from LTBI individuals recruited from hospital staff with respect to many of the socioeconomic and clinical variables. These differences are reported in Table S2. Clinic-waiting room LTBI individuals had, for example, consistently lower socioeconomic status and higher incidence of disease and substance abuse than LTBI individuals recruited from hospital staff. They also had significantly lower European genetic ancestry and significantly higher Native American genetic ancestry (Table S2).
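The relationship between the reported intervals and the p ≤ 0.1 criterion can be checked with a short calculation. The sketch below assumes the reported 95% CIs are Wald intervals that are symmetric on the log-odds scale, which the text does not state; under that assumption, the standard error, z statistic, and two-sided p-value can be recovered from each OR and its interval. The function name is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover the log-odds SE, Wald z, and two-sided p-value from an
    odds ratio and its 95% CI (assumes a Wald interval on the log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# OR and 95% CI values as reported in the text above.
reported = {
    "married or civil union": (0.31, 0.14, 0.72),
    "diabetes": (2.94, 1.05, 8.24),
    "smoking": (1.65, 0.91, 2.97),
}
for name, (or_, lo, hi) in reported.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under this approximation the p-values come out at roughly 0.005, 0.04, and 0.10, which is consistent with the smoking interval including 1 while still meeting the p ≤ 0.1 criterion.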

Genetic ancestry
The genetic ancestry estimates are plotted in Figure 1 and listed in Table 3. The top-left panel of Figure 1 shows the estimates for the full sample; these estimates fall within previously reported ranges in the MMA and northeastern Mexico by Martinez-Fierro and colleagues [48], though Cerda-Flores and colleagues reported higher European and lower Native American ancestry values [23,24]. The standard errors of our estimates for each individual were low (mean = 0.7%), but there was substantial heterogeneity in genetic ancestry across individuals. The mean Native American genetic ancestry for the clinic-waiting room LTBI sample was 56.1% (s.d. = 14.0%), and the mean European ancestry was 39.6% (Table 3).
The remaining panels of Figure 1 show the Native American, European, and African genetic ancestry ranges for the full sample (labeled "All"), for individuals with TB disease, and for the two LTBI samples. The plots highlight the statistically significant differences in Native American and European genetic ancestry for the two LTBI samples. These differences are also documented in Table 3 (columns 3 and 4) and Table S2. Native American genetic ancestry was significantly higher in the clinic-waiting room LTBI sample, and European genetic ancestry was significantly lower. Overall, these results suggest that the MMA may be structured with respect to socioeconomic status and genetic ancestry, and that this structure may have health-related consequences.

Genetic ancestry in genes in the TB biosystem
These analyses were restricted to 83 TB disease patients and the 35 clinic-waiting room LTBI individuals. Native American genetic ancestry ranged between 52.5% and 56.5% on the autosomes. It was relatively high on the X chromosome at 61.0%, consistent with a history of directional mating between Spanish males and Native American females. Native American genetic ancestry was higher in TB patients than in LTBI individuals on all chromosomes, and there was a corresponding deficit of European genetic ancestry. Figure 2 shows the results of the analyses of locus-specific genetic ancestry. It plots differences in Native American, European, and African genetic ancestry between TB patients and LTBI individuals for each SNP vs. physical position on the chromosome, beginning with position zero on chromosome 1 and terminating at the end of chromosome 23. Compared to LTBI individuals, people with TB disease had on average 4.65% greater Native American genetic ancestry (P = 2E-16, Wilcoxon signed-rank test) and 4.64% lower European genetic ancestry (P = 2E-16). African genetic ancestry did not significantly differ between the two groups (average = 29E-5).
Asterisks mark 1,314 SNPs located in 273 genes in the NCBI's TB biosystem and two SNPs in the FcGR1B gene. Overall, only two genes showed differences in genetic ancestry between TB disease patients and LTBI individuals at the three standard deviation level. Four SNPs in the IL-12B gene (Chr 5) had, on average, 3.53 standard deviations lower Native American genetic ancestry in TB disease patients than in LTBI individuals, and 3.09 standard deviations greater European genetic ancestry. Two SNPs in the ATP6AP1 gene (Chr 23) had an average of 3.59 standard deviations lower European genetic ancestry in TB disease patients. SNPs in the surrounding regions of each gene also showed substantial differences in genetic ancestry between TB disease patients and LTBI individuals.
Twenty additional genes in the TB biosystem showed differences in genetic ancestry at the two standard deviation level. While this number is unremarkable given the fact that we examined 273 genes, many of the SNPs were located in or near regions of particularly sharp transitions in genetic ancestry, and they were linked to SNPs that also showed this sharp transition. The genes include IL-1B and IL-1RN (Chr 2), RAB5A (Chr 3), NOS3 (Chr 7), NAT2 (Chr 8), CARD9 (Chr 9), and NOS2A (Chr 17).
Several other genes that have been implicated in immune system disorders (other than TB) were located in regions of large differences and sharp transitions in genetic ancestry between TB patients and LTBI individuals. These genes include CNTN2 (Chr 1), CBLB (Chr 3), PRKG1 (Chr 10), ATXN2 (Chr 12), SH2B3 (Chr 12) and G6PD (Chr 23). Previous studies [49,50] have linked polymorphisms in two of these genes to Type 1 diabetes (CBLB and SH2B3).

Discussion
Each of the statistically significant socioeconomic and clinical factors from our crude association tests has been identified in other studies as an important determinant of the transition from LTBI to TB disease [3,4,6,7,9][51][52][53][54]. Most of the socioeconomic variables are interrelated, and it is difficult to disentangle their independent effects on TB disease susceptibility, but they reflect living conditions, health-care access, and health-related lifestyle factors [8,9,52].
In multivariable logistic regression analyses comparing TB disease patients to clinic-waiting room LTBI individuals, we found that marital status, diabetes, and smoking were independently predictive of TB status (Table 2). Smoking is a well-known risk factor for developing TB disease [55]. Smoking impairs immune functions related to cellular defenses against TB, especially in the lungs [56]. Being married or in a lifetime partnership is increasingly recognized as a protective factor against TB disease [10,57]. Horwitz (1971) found that being married mitigated TB disease severity and mortality, possibly due to spousal influence on treatment completion [58,59] and the beneficial impact of "cohesive marriages" on physical and mental health [60]. Conversely, it is possible that individuals with chronic illnesses like TB may be less likely to be in a lifetime partnership due to the strain that the disease creates on relationships [60]. A recent review of Medline literature found that diabetes increased the risk or odds of TB between 1.5- and 7.8-fold [61]. A study conducted in Southern Mexico concluded that diabetes may be on par with HIV co-infection in terms of co-morbidity with TB in the country [62]. While diabetes may predispose people to TB through impaired immune function, TB may also predispose people to diabetes through impaired glucose tolerance [63].
An important finding of this study is the difference in genetic ancestry and socioeconomic status between the two LTBI groups. This result may provide evidence for assortative mating by genetic ancestry or socioeconomic status in the MMA. European ancestry is also correlated with socioeconomic status in other large urban centers in Mexico [64,65]. A previous study of spousal choice in Hispanic populations in Mexico City and the San Francisco Bay Area showed strong correlations for assortative mating by European ancestry and Native American ancestry [66]. These results suggest that studies of the social and genetic causes of diseases in admixed populations in the Americas should account for population structure.
Genetic ancestry was not predictive of the transition from LTBI to TB disease. This finding suggests one or more of the following: 1) genetic differences in TB-causing alleles do not exist between the ancestral populations that formed the Monterrey population, 2) any genetic differences that do exist contribute proportionately little to variation in TB disease compared to socioeconomic and clinical factors, 3) genome-wide genetic ancestry fails to capture locus-specific genetic differences that do exist between the parental populations, or 4) power was too low to detect existing associations.
With respect to the first two possibilities, independent of our findings, there is conflicting evidence for the existence of macrogeographic differences in the frequencies of TB-susceptibility alleles. Previous studies have identified associations between several polymorphisms in the SLC11A1 gene and TB risk among different regions [67][68][69][70][71]. One recent meta-analysis identified statistically significant associations between four SLC11A1 variants and TB risk in Asians and Africans, but not Europeans [16]. In contrast, a second meta-analysis in 2011 found no heterogeneity among African and European groups for TB disease risk [17]. Even if the SLC11A1 alleles contribute to regional differences, their overall effect sizes are small; the mean odds ratio across all regions for the four SLC11A1 variants, for example, was only 1.29 (range 1.04-1.59) [17]. Similarly low effect sizes are reported in meta-analyses for variants of other TB-related candidate genes, such as SP110 [72], P2X7 [73], TIRAP S180L [74], and a vitamin D receptor gene [13]. The measures of association are often considerably higher for socioeconomic predictors of TB. For example, in rural Mexico, drivers of TB risk include long-term indoor air pollution exposure in homes that use biomass cook stoves (OR 3.3), and households with only one room (OR 15.4) [75]. Case-control studies in other countries have reported significant odds ratios as high as 15 for social factors like ethnicity [18] and education [57]. Substance abuse also plays a major role, and a recent meta-analysis by Rehm and colleagues [76] concluded that 10% of all TB cases in the world could be attributed to alcohol.
Even though individual-level genetic ancestry was not predictive of TB status in this study, genetic ancestry differed sharply between TB disease patients and LTBI individuals at SNPs in the IL-12B and ATP6AP1 genes, and at SNPs in the surrounding regions of both genes. The IL-12B gene encodes a subunit common to interleukin IL-12 and IL-23, both of which protect against infectious diseases and cancer [77]. Variants of the gene have been linked to TB, type 1 diabetes, childhood asthma, and malaria [78][79][80][81][82]. ATP6AP1 is located in the TB disease biosystem, but the role of polymorphisms in this gene in TB pathogenesis has not yet been specified.
There are several important limitations to this study. First, our sample sizes were relatively small. As a result, we lacked the power to identify variables with relatively small effect sizes, and to sort out the complex interactions among the socioeconomic and clinical variables. Despite these limitations, we had sufficient power to detect modest associations between TB status and genetic ancestry (in the range of 1-3%), but smaller differences in genetic ancestry may still be independently predictive of TB status in the MMA. Second, our study sample represented an urban population with access to a public hospital in a developing country, so findings might not be generalizable to rural populations, those seeking private health care, or those in developed countries. In this respect, while we attempted to examine region-specific socioeconomic factors using the AMAI survey, this effort failed to capture several factors that are likely to impact TB disease risk in Monterrey and other urban centers in developing nations. Future work would benefit, for example, from measuring community-specific measures of income inequality [9] and health service disparities [54]. Third, by restricting our locus-specific analyses to genic regions, we may have missed potential TB-risk SNPs in intergenic regions, which potentially account for variation in TB risk [12].