Determinants of the Sympatric Host-Pathogen Relationship in Tuberculosis

Major contributions from pathogen genome analysis and host genetics have raised the possibility of Mycobacterium tuberculosis co-evolution with its human host, leading to more stable sympatric host-pathogen relationships. However, the attribution to either sympatric or allopatric categories depends on the resolution or grain of genotypic characterization. We explored the influence of clinical (HIV infection and multidrug-resistant tuberculosis [MDRTB]) and demographic (gender and age) factors on the sympatric host-pathogen relationship with regard to the genotypic grain, using spacer oligonucleotide typing (spoligotyping) to classify M. tuberculosis strains within the Euro-American lineage. We analyzed a total of 547 tuberculosis (TB) cases from six-year consecutive sampling in a setting with high TB-HIV coinfection (32.0%). Of these, 62.0% were caused by major circulating pathogen genotypes. The sympatric relationship was defined according to spoligotype in comparison to the international spoligotype database SpolDB4. While no significant association with the Euro-American lineage was observed for any of the factors analyzed, increasing the resolution with spoligotyping revealed a significant association of MDRTB with sympatric strains, regardless of HIV status. Furthermore, distribution curves of the prevalence of sympatric and allopatric TB in relation to patients' age showed an accentuation of the relevance of the age of onset in the allopatric relationship, as reflected in the trimodal distribution. In contrast, sympatric TB was characterized by a tendency towards a typical (standard) distribution curve. Our results suggest that, within the Euro-American lineage, finer genotyping resolution is necessary in modeling the biological processes behind the host-pathogen interplay. Furthermore, the prevalence distribution of sympatric TB by age was suggestive of host genetic determinisms driven by more common variants.

Introduction
change the nature of human immune responses to infection [26][27][28]. Other than the differences in the host's immunological response to infection, the influence of the infective strain on disease phenotype has been gaining attention. An increasing number of studies reported associations between human genetic variants and particular M. tuberculosis lineages [29][30][31][32][33]. Thus, these have been classified as sympatric host-pathogen relationships, when host and pathogen share a common ancestral geographic origin, or allopatric, when they originate from non-overlapping geographic areas [34]. Accordingly, the more stable associations between these lineages and their human populations would characterize the sympatric host-pathogen relationship [6][7][35]. Although useful in helping to model divergence, attribution of host-pathogen relationships to one of these two categories depends on the resolution of strain genotypic characterization, having subsequent implications on the understanding of divergence and the modeling of genetic events [36]. However, in spite of these many contributions, "co-evolution" of TB with its human host is a model at the dawn of its understanding.
The occurrence of immune-compromising diseases is likely to influence the host-pathogen relationship. In a recent molecular-epidemiological study, Fenner and collaborators identified human immunodeficiency virus (HIV) infection as a possible disruptor of the relationship of locally adapted M. tuberculosis strains to the host [37]. They considered the combination of a particular strain lineage and its corresponding patient population as sympatric (e.g. Euro-American lineage in Europeans) or allopatric (e.g. East-Asian lineage in Europeans) and concluded that HIV infection was associated with the less adapted allopatric lineages among patients born in Europe, providing evidence that the sympatric host-pathogen relationship in TB was disrupted by HIV.
Here, we reanalyze our current understanding of the M. tuberculosis population structure in Portugal and its particular phylogeographic characteristics. Using a retrospective molecular epidemiology approach, we investigated the influence of HIV infection and other clinical and demographic factors on the sympatric host-pathogen relationship. Finally, we discuss our findings in terms of their possible impact on future investigations of the TB host-pathogen relationship.

Patient characteristics
This retrospective molecular epidemiology study included 859 TB patients with positive culture for M. tuberculosis, having been diagnosed and treated according to recommended procedures in Portugal, between the years 1999 and 2006. The corresponding M. tuberculosis isolates included 665 isolates obtained between 1999 and 2005 through consecutive sampling of one isolate per TB patient hospitalized in the Greater Lisbon area [23], and a convenience sample from 2006 of 69 isolates from the Lisbon district, 74 from the Porto district and 51 from various locations within Portugal. Socio-demographic data (age, sex) and laboratory parameters (M. tuberculosis drug susceptibility status to first line drugs, HIV infection status) were obtained from the hospital fully anonymized and associated only with the M. tuberculosis isolate number.
A total of 547 TB patients with known HIV status, 82.3% (547/665) of the total number of cases, were included in the study. Of these 32.0% (175/547) were HIV-infected and 68.0% (372/547) HIV-negative. Conventional susceptibility testing for first line anti-TB drugs (streptomycin, isoniazid, rifampicin, ethambutol) was carried out using the Bactec 960 TB system (Becton Dickinson, Quilaban, Portugal) and was available for 86.7% (474/547) of the cases.
Since important spoligotype variability was observed [23], to avoid artifacts, a "major subgroup" of 339 cases was defined as those cases in which the M. tuberculosis shared international types (SITs) grouped 15 or more isolates, representing 62.0% (339/547) of the patients included in the study. This arbitrary cutoff for the number of isolates was applied to the entire 859 TB patient dataset and not to the 547 subgroup of patients with known HIV status, in order to avoid any bias that may have occurred in the prescription of HIV diagnosis directed towards risk groups. Drug susceptibility status was available for 86.7% (294/339) of these cases.
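The selection of the "major subgroup" described above can be sketched as follows. This is a minimal illustration only: the function names, the dictionary layout of a case and the toy SIT counts are hypothetical and are not taken from the study data. The one point it aims to show is that the cutoff of 15 isolates is counted on the full isolate collection, while the resulting filter is applied to the cases with known HIV status.

```python
from collections import Counter

MIN_ISOLATES = 15  # cutoff stated in the text


def major_sits(all_isolate_sits, min_isolates=MIN_ISOLATES):
    """Return the SITs that group at least `min_isolates` isolates.

    `all_isolate_sits` is the list of SIT numbers for every isolate in the
    full collection, not only the cases with known HIV status.
    """
    counts = Counter(all_isolate_sits)
    return {sit for sit, n in counts.items() if n >= min_isolates}


def major_subgroup(cases, all_isolate_sits, min_isolates=MIN_ISOLATES):
    """Keep only the cases whose SIT passes the cutoff in the full collection."""
    keep = major_sits(all_isolate_sits, min_isolates)
    return [case for case in cases if case["sit"] in keep]


# Hypothetical toy data: SIT 999 groups only 14 isolates and is excluded.
toy_collection = [20] * 16 + [42] * 15 + [999] * 14
toy_cases = [
    {"sit": 20, "hiv": True},
    {"sit": 999, "hiv": False},
    {"sit": 42, "hiv": False},
]
toy_major = major_subgroup(toy_cases, toy_collection)
```

Counting on the full collection keeps the definition of a "major" genotype independent of which patients happened to be tested for HIV.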
All the spoligotype patterns were subject to phylogenetic analysis using the MIRU-VNTRplus database (http://www.MIRU-VNTRplus.org) [39][40]. A minimum spanning tree was calculated by the program according to Kruskal's algorithm and a force-directed graph layout used for visualization. MIRU-VNTRplus spoligotype derived clonal complexes (CC) and Singletons (not grouped) were obtained through single locus variation (SLV), reflecting a single spoligotyping spacer difference.
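The two grouping steps named above, a minimum spanning tree built with Kruskal's algorithm and clonal complexes linked by single locus variation, can be sketched in a few lines. This is not the MIRU-VNTRplus implementation, whose internals are not described here; it assumes that the distance between two spoligotypes is the number of spacers at which their binary patterns differ. The pattern names and the 8-spacer toy patterns are hypothetical (real spoligotypes have 43 spacers).

```python
from itertools import combinations


def spacer_distance(a, b):
    """Number of spacer positions at which two binary patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of spacers")
    return sum(x != y for x, y in zip(a, b))


class DisjointSet:
    """Union-find structure used to detect whether two nodes are connected."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.parent[root_b] = root_a
        return True


def kruskal_mst(patterns):
    """Kruskal's algorithm: add edges by increasing distance, skipping cycles."""
    edges = sorted(
        (spacer_distance(patterns[u], patterns[v]), u, v)
        for u, v in combinations(sorted(patterns), 2)
    )
    forest = DisjointSet(patterns)
    return [(u, v, d) for d, u, v in edges if forest.union(u, v)]


def slv_complexes(patterns):
    """Group patterns linked by chains of single-spacer differences."""
    forest = DisjointSet(patterns)
    for u, v in combinations(sorted(patterns), 2):
        if spacer_distance(patterns[u], patterns[v]) == 1:
            forest.union(u, v)
    groups = {}
    for name in patterns:
        groups.setdefault(forest.find(name), []).append(name)
    complexes = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return complexes, singletons


# Hypothetical 8-spacer toy patterns.
TOY = {
    "A": "11111111",
    "B": "11111011",
    "C": "11110011",
    "D": "00001111",
    "E": "00001110",
    "F": "10100101",
}
toy_tree = kruskal_mst(TOY)
toy_complexes, toy_singletons = slv_complexes(TOY)
```

In this toy set, A, B and C form one complex and D and E another, while F differs from every other pattern by at least four spacers and remains a singleton.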

Sympatric versus allopatric classification of strain genotypes
Consecutive sampling of one isolate per patient from 1999 to 2005 was used in attributing local specificity of SITs [23]. A dataset of 665 SITs was generated representing all the TB cases diagnosed during this time period at the Fernando Fonseca Hospital, which serves a densely populated area of greater Lisbon. From the analysis of this dataset, SITs were considered characteristic of Portuguese or Portuguese-related settings when their relative frequency weighed heavily in that observed in the European countries of traditional Portuguese immigration (Belgium, France, Germany, Great Britain, Ireland, Luxembourg, Netherlands, Spain and Switzerland), with over 60% of the isolates, or in SpolDB4 as a whole, with over 10% of the isolates [23]. The Portuguese-related SITs all belonged to the LAM and T spoligotype families, included in the Euro-American lineage [8][9]. These included the LAM1 sub-family SIT20, where the Portuguese subset alone accounted for 21% of the worldwide representativeness of the genotype, the T1 sub-family SIT244 accounting for 53%, the LAM6 sub-family SIT64 accounting for 11%, the LAM1 sub-family SIT389 accounting for 68% and the LAM9 sub-family SIT1106 accounting for 83%. In this study, following the proposal of Fenner and collaborators [37], strains with these highly prevalent locally adapted genotypes were referred to as sympatric. Historically, Portugal was not a country of immigration. The major wave was related to the decolonization period in the 1970s. Today, these populations constitute Portuguese-born second and third generations. Therefore, in our setting, place of birth was not relevant as a proxy of the ancestry of the study population, as it has been in other settings [6][7][35,37].
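One possible reading of the threshold rule above is sketched below. It assumes that each share is the number of local (Portuguese) isolates of a SIT divided by the number of isolates of that SIT in the reference set, and that the reference counts include the local isolates; the exact denominators are given in the earlier report [23], so this is an assumption and not the authors' procedure. The function names and all counts are hypothetical.

```python
EUROPE_SHARE_MIN = 0.60  # "over 60% of the isolates" (stated in the text)
SPOLDB4_SHARE_MIN = 0.10  # "over 10% of the isolates" (stated in the text)


def share(local_count, reference_count):
    """Fraction of the reference isolates of a SIT that come from the local set.

    Assumption: `reference_count` already includes the local isolates.
    """
    if reference_count <= 0:
        return 0.0
    return local_count / reference_count


def is_portuguese_related(local_count, europe_count, spoldb4_count):
    """Apply the two thresholds; "over" is read as strictly greater than."""
    return (
        share(local_count, europe_count) > EUROPE_SHARE_MIN
        or share(local_count, spoldb4_count) > SPOLDB4_SHARE_MIN
    )


# Hypothetical counts: (local, European reference, SpolDB4 as a whole).
examples = {
    "heavy in Europe": (30, 40, 500),      # 75% of the European isolates
    "heavy worldwide": (21, 100, 100),     # 21% of the SpolDB4 isolates
    "rare everywhere": (5, 50, 400),       # 10% and 1.25%
    "exactly at cutoff": (10, 100, 100),   # 10% is not "over 10%"
}
flags = {name: is_portuguese_related(*counts) for name, counts in examples.items()}
```

Under this reading a SIT qualifies if either threshold is exceeded, which matches the "or" in the text.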

Influence of the infectious strain on patient age distribution
We analyzed the distribution of TB caused by particular pathogen genotypes, looking at the frequency of cases as a function of the age group categories in years: 0 to 4, 5 to 14, 15 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, and over 65. As HIV infection is a more recent epidemic and is associated with specific, usually younger, risk groups in the Portuguese population [39], these curves were analyzed in HIV-negative individuals, considered as the reference group, so as not to distort observations regarding evolutionary tendencies of the sympatric relationship between M. tuberculosis and its human host. Therefore, the typical trimodal distribution curves of incidence by age group were verified for genotypic groupings of M. tuberculosis isolates in HIV-negative individuals in order to further qualify the sympatric compared to allopatric host-pathogen combinations.
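The age grouping above can be expressed as a small binning helper. This is an illustrative sketch: the function names are hypothetical, ages are assumed to be whole years, and a patient aged exactly 65 is assumed to fall in the last group, since the text gives "55 to 64" and "over 65" without stating where 65 itself belongs.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound (in years) of each of the eight age groups listed in the text.
LOWER_BOUNDS = [0, 5, 15, 25, 35, 45, 55, 65]
LABELS = [
    "0 to 4", "5 to 14", "15 to 24", "25 to 34",
    "35 to 44", "45 to 54", "55 to 64", "over 65",
]


def age_group(age):
    """Return the age-group level (1 to 8) for an age in years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return bisect_right(LOWER_BOUNDS, age)


def cases_per_age_group(ages):
    """Count cases in each of the eight groups, in order of increasing age."""
    counts = Counter(age_group(age) for age in ages)
    return [counts.get(level, 0) for level in range(1, len(LABELS) + 1)]


# Hypothetical ages of six cases.
toy_counts = cases_per_age_group([3, 20, 22, 40, 70, 65])
```

Plotting such counts against `LABELS`, separately for sympatric and allopatric strains, gives the kind of age distribution curve discussed in the text.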

Statistical analysis
For descriptive statistics, the binary variables were expressed as percentages and the continuous variables as means.
All data analysis was performed with R (R Development Core Team 2005). We tested the association of the M. tuberculosis Euro-American lineage, considered by previous authors to be in a sympatric relationship with Europeans, with HIV infection (two levels: infected or negative), MDRTB (two levels: presence or absence), gender (two levels: male, female) and age (eight levels: 1: 0 to 4 years; 2: 5 to 14 years; 3: 15 to 24 years; 4: 25 to 34 years; 5: 35 to 44 years; 6: 45 to 54 years; 7: 55 to 64 years; 8: over 65 years). For this purpose we used the generalized linear model Tuberculosis with a Euro-American lineage strain ~ HIV infection + MDRTB + gender + age group, assuming a binomial distribution of the error. Differences for the factors tested in the overall generalized linear model were considered significant when the p-value was < 0.05 (Anova). We then tested the association with MIRU-VNTRplus spoligotype-derived clonal complexes, using the generalized linear model Tuberculosis with strains from MIRU-VNTRplus Clonal Complexes ~ HIV infection + MDRTB + gender + age group, assuming a binomial distribution of the error as above. For our final model, we tested sympatric TB SITs in the same manner, Tuberculosis with a sympatric strain ~ HIV infection + MDRTB + gender + age group, for HIV-infected and HIV-negative patients combined and for the HIV-negative patients alone. In order to avoid artifacts resulting from rare genotypes, the model was tested for the entire dataset (n = 547) as well as for the more frequent genotypes included in the "major subgroup" (n = 339). Odds ratios were derived for our final models. The interactions between factors were tested but, because they were not statistically significant, they were not included in our final models.
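Written out, the final binomial model and the odds ratios derived from it can be sketched as below. The notation is introduced here for illustration only: p_i is the probability that case i is caused by a sympatric strain, the x terms are 0/1 indicators, and the logit link and the use of age group 1 as the reference level are assumptions (they are the usual defaults for a binomial model in R, but the text does not state them).

```latex
\[
\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i}
  = \beta_0
  + \beta_{\mathrm{HIV}}\, x_{i,\mathrm{HIV}}
  + \beta_{\mathrm{MDR}}\, x_{i,\mathrm{MDR}}
  + \beta_{\mathrm{sex}}\, x_{i,\mathrm{sex}}
  + \sum_{k=2}^{8} \gamma_k\, \mathbf{1}[\mathrm{age}_i = k]
\]
\[
\mathrm{OR}_{\mathrm{MDR}} = e^{\beta_{\mathrm{MDR}}}
\]
```

Under these assumptions, an odds ratio above 1 for MDRTB means that the odds of infection with a sympatric strain are higher among MDRTB cases than among non-MDRTB cases, with the other factors held fixed.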
We also tested the effect of age on TB prevalence using the generalized linear model Tuberculosis prevalence ~ Sympatric tuberculosis + Age group, using the HIV-negative patients as the reference group and assuming a Poisson distribution of the error, with sympatric TB (two levels: presence or absence) and age group (eight levels: 1: 0 to 4 years; 2: 5 to 14 years; 3: 15 to 24 years; 4: 25 to 34 years; 5: 35 to 44 years; 6: 45 to 54 years; 7: 55 to 64 years; 8: over 65 years) as factors. When significant differences were found for the factors in the overall models (Anova, p-value < 0.05), we performed post-hoc pairwise comparisons between factor levels using Tukey's honest significant differences (HSD) tests (alpha = 0.01).
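The Poisson model can be sketched in the same way. Here Y_{s,k} denotes the number of HIV-negative cases with sympatric status s (1 for sympatric, 0 for allopatric) in age group k; this notation, the log link and the choice of age group 1 as the reference level are assumptions made for illustration and are not stated in the text.

```latex
\[
Y_{s,k} \sim \operatorname{Poisson}(\mu_{s,k}), \qquad
\log \mu_{s,k} = \alpha_0 + \alpha_{\mathrm{S}}\, s
  + \sum_{j=2}^{8} \delta_j\, \mathbf{1}[k = j]
\]
```

Under this parameterization, each exponentiated age coefficient is the ratio of expected case counts in that age group to those in the reference group, which is what the post-hoc pairwise comparisons between age levels examine.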

Results
Molecular analysis of the infectious strains was obtained for a full set of 547 TB cases (S1 and S2 Tables). In these tables, genotypes were listed according to spoligotype pattern (SIT, shared international type). Spoligotype families [38] as well as the Large Sequence Polymorphism (LSP) defined phylogeographic lineages [7] are shown for each SIT. The predominant lineage was the Euro-American lineage, 92.1% (504/547). The MIRU-VNTRplus program was used for additional subdivision of the Euro-American lineage into spoligotype-derived clonal complexes [39][40].
Descriptive analysis of demographic and clinical factors between the allopatric and sympatric strains is shown for the "major subgroup" (n = 339) in Table 1. HIV status was 33.3% (113/339) HIV-infected and 66.7% (226/339) HIV-negative. Age distribution is shown relative to the HIV-infected and HIV-negative groups. Furthermore, 56.3% (191/339) were infected with allopatric strains and 43.7% (148/339) with sympatric strains. In this "major subgroup", all but the Beijing strains belong to the Euro-American lineage, 95.6% (324/339). A phylogenetic tree representing the infectious strains was obtained with the MIRU-VNTRplus software (Fig 1). The major spoligotype family was LAM, mainly grouped into the MIRU-VNTRplus spoligotype-derived clonal complex CC1. As described in the methods section, the sympatric strains were represented by SITs 20, 64, 389, 244 and 1106, whereas the allopatric strains were represented by SITs 42, 150, 17, 53, 50, 1, 34 and 47. The sympatric strains were all classified by the SPOT-CLUST program as LAM strains except SIT244 (T2 82% T1 18%), included within the T family, and SIT1106 (LAM9 60% T2 40%). Corroborating the M. tuberculosis population structure of the six-year consecutive sampling from TB cases in the greater Lisbon area [23], the sympatric strains, except SIT244, were all identified in the independent convenience samples from the Lisbon and Porto districts [41][42] (S3 and S4 Tables, respectively).
Statistical analysis of strain classification according to genotype and various parameters (HIV infection, MDRTB, gender and age), performed for the full dataset (n = 547), showed no statistically significant association between these parameters and the Euro-American lineage (Table 2). A statistically significant association with male gender was observed for two clonal complexes derived from MIRU-VNTRplus analysis, CC1 (p = 0.24) and CC4 (p = 0.03). Furthermore, an association with HIV co-infection was observed for the Singletons (p = 0.03) and CC4 (p < 0.001).
The degree of resolution of the genotypic patterns observed was further increased in subsequent analysis using our earlier classification of locally adapted sympatric strains according to SIT. From analysis of the S1 and S2 Tables, the association of CC4 with HIV was attributed to SIT244-infected cases. As HIV-related genotypes have previously been associated with allopatric strains [37], two subsequent analyses were considered in our final model, one in which SIT244 was included in the sympatric group of strains and the other in which it was grouped with the allopatric strains. In the former case, a positive association between HIV infection and sympatric strains was observed. However, if SIT244 was excluded from the sympatric and included in the allopatric group of SITs, neither a positive nor a negative relationship was observed with HIV infection. In contrast, a significant association of MDRTB with sympatric strains was observed in the analysis of the entire dataset (n = 547) (S5 Table) and, in order to avoid artifacts due to rare spoligotype patterns, in the analysis restricted to the "major subgroup" (n = 339) (Table 3). The association was significant whether SIT244 was included in the sympatric (p < 0.001) or the allopatric (p < 0.001) group, and was not affected when HIV-infected cases were excluded (p < 0.001). Thus, the association of sympatric strains with MDRTB was not dependent on HIV status.
For allopatric tuberculosis, prevalence distribution according to age showed three peaks, corresponding to the very young children, young adults and the elderly, generally referred to as a trimodal curve (Fig 2). Prevalence differences between age groups were less pronounced for TB caused by sympatric strains, due to a drop in prevalence in the very young and elderly and alterations of the central peak. This translated into a tendency towards a typical (standard) distribution curve. This tendency was further accentuated when SIT244 was included in the allopatric group of strains.
The MIRU-VNTRplus web application (http://www.miru-vntrplus.org) was used to analyse spoligotype data of the M. tuberculosis isolates [39][40]. For minimum spanning tree analysis, Kruskal's algorithm and a force-directed graph layout were used for visualization according to SIT; a single locus variation from the node, SLV(1), was used to generate groupings into clonal clusters (CC).
3 M. tuberculosis lineages were classified using spoligotype data. Spoligotype designation was obtained from the SpolDB4 database (http://www.pasteur-guadeloupe.fr:8081/SITVITdemo) [14][15] and using the Spotclust program (http://www.rpi.edu/~bennek/EpiResearch) [38]. Lineage designation, East-Asian and Euro-American, was based on Large Sequence Polymorphism (LSP) analysis [7,9].
4 Drug susceptibility data was available for 86.7% (294/339) of the cases and was used to determine the percentage of multidrug-resistant tuberculosis (MDRTB) cases by strain genotype.
5 HIV = human immunodeficiency virus infection
6 Age in years was expressed as the mean ± sd, sd = standard deviation.

Discussion
Portugal has remained endemic for TB, with higher incidence in large urban areas. Corroborating a slow decline of TB, the average age of incidence has increased [43]. During the time frame of this investigation, the estimated incidence of 28.2 cases per 100,000 inhabitants remained relatively high compared to the average of 17 cases per 100,000 inhabitants reported for the other European countries [43][44]. The highest reported incidence was 45.8 per 100,000 in the Porto district, followed by Lisbon with 35.5 per 100,000. In this setting, an important epidemiological concern was the 15% rate of TB/HIV co-infection, by far the highest in Europe, and this level was certainly under-reported [43,45]. HIV susceptibility has been related in Portugal and other southern European countries to a genetic background [46]. Lineage attribution based on whole genome LSP analysis considered six main M. tuberculosis phylogenetic lineages associated with different geographic regions [6][7]. This approach, however, appears less discriminative than the spoligotyping approach, since the latter is able to resolve clinical isolates within the branch of the modern strains that are not resolved by LSP. Although a better understanding of the M. tuberculosis population structure may be obtained with the concomitant use of several genotyping methods, spoligotyping alone is considered a tool for classification of strains within the Euro-American lineage [47]. Herein, spoligotyping allowed a finer grain of resolution for the designation of sympatric strains in association analyses where other authors used the Euro-American lineage as a whole [37].
The interactions between factors were tested but because they were not significant they were not included in the final model. Age and sex were not significant.
Within the Euro-American lineage, spoligotype-defined LAM strains have accounted for the vast majority of the cases in settings related to the Portuguese and Spanish colonization history. This family is particularly prevalent in the Mediterranean region and Latin America. From our previous findings it appears that Portugal may have one of the highest global proportions of LAM endemic strains [23,25]. Spoligotyping data restricted to the metropolitan area of Lisbon revealed a 51% prevalence of the LAM family, and a high proportion of SIT20 (LAM1) and SIT42 (LAM9) sub-families [23]. Furthermore, a major deletion, RD Rio, characteristic of LAM1 strains, although also associated with other LAM sub-families, was shown to have particular expression in settings related to early Iberian overseas expansion [18,25][47][48][49]. Interestingly, these strains were associated with ongoing transmission and higher bacillary load [50][51].
Accordingly, with reference to previous studies [23], M. tuberculosis LAM patterns related to the Portuguese population were considered as belonging to the sympatric group of strains, whereas others without this specific distribution were designated as allopatric. Therefore, considering the population structure specificities in our sample, characterized by a high prevalence of the LAM family, it was not unexpected that we should identify both allopatric and sympatric strains within the coarser grain of resolution of the Euro-American lineage. Possibly due to this overlap, we did not observe a significant relationship between the Euro-American lineage and HIV infection. We further investigated the statistical relevance of the association of sympatric strains with HIV infection, age, gender and MDRTB at a finer grain of analysis, using MIRU-VNTRplus spoligotype phylogenetic clusters. An association of HIV with the spoligotype clonal complex containing SIT244 was found (p < 0.001). This T spoligotype family genotype, contrary to the LAM sympatric genotypes, was not detected in the independent convenience samples from other regions in Portugal. These facts argue against classifying SIT244 among the sympatric strains. Moreover, in the SpolDB4 database, SIT244 was important in only two settings, Portugal and Bangladesh [15,52]. Portuguese traders and missionaries were the first Europeans to settle in Bengal (what is now known as Bangladesh) as far back as the 15th century. In the 16th century, Chittagong (Porto Grande) was a thriving Portuguese and Eurasian community of over 5000 people. The Portuguese presence remained in the region until the mid-17th century, when they were forced to leave by local opposition.
Therefore, the nature of the ties between the two populations, limited to a particular socioeconomic sector of the population during a specific time frame, is unlikely to have resulted in continuous sharing of genetic features. SIT244 may have resulted from an imported allopatric strain that spread among HIV-infected patients in the Lisbon area.
Consequently, with SIT244 removed from the sympatric group of strains, our results corroborated previous findings in that TB with sympatric strains was not associated with HIV infection [37]. Although a high proportion of SIT244 strains was found in HIV-infected patients, no association of the allopatric strains including this SIT with HIV status was demonstrated. The finding of Fenner and collaborators [37] that HIV associates preferentially with allopatric strains was not confirmed in our setting when considering the non-Euro-American lineage strains as allopatric versus the Euro-American lineage as sympatric. The association of HIV with the MIRU-VNTRplus spoligotype-based clonal complex CC4 was found at a greater degree of genotypic resolution than lineage. Our findings do not invalidate the conclusion of Fenner and collaborators [37], but highlight the need to increase the genotypic resolution in order to detect relevant associations.
A significant association of MDRTB with sympatric strains was found. Sympatric strains are prone to outperform allopatric pathogens, notably through increased and more frequent ongoing transmission compared to allopatric host-pathogen combinations [7,37]. The major spoligotype signature in our study, SIT20, belongs to the LAM1 sub-family. Also, in our previous studies, the RD Rio deletion was detected in up to 60% of the LAM isolates in Portugal and was associated with MDR and XDR transmission clusters [25,53]. So, although active transmission of isolates from this study was not evaluated, evidence for active transmission of the major LAM spoligotypes relevant to this study was available from other reports [22,25,48,53].
Additionally, the statistically significant association with male gender, observed for two clonal complexes, has been previously described in this and other settings. This tendency has been described as characteristic of disease endemicity [43], although the reasons remain unclear [54].
In order to further qualify the host-pathogen relationship, distribution curves of prevalence in relation to patients' age were analyzed with regard to the sympatric and allopatric host-pathogen relationship in HIV-negative patients. The typical trimodal curve has previously proven useful as a marker of the epidemic evolution of the disease, showing a gradual peak shift to older persons [4], also observed in Portugal in the decade from 1997 to 2006 [43]. Age-related peaks in the lower and upper extremes of disease prevalence distribution curves have typically been attributed to impaired immunity, and the central peak to socioeconomic factors. At a higher age, the increase in cases is frequently attributed to reactivation. Although new cases versus reactivation was not assessed in this age group, this idea has come to be questioned in other settings since the generalized application of molecular epidemiological tools, which revealed a high proportion of clustered isolates suggestive of recent transmission [55][56][57][58].
The trimodal distribution mentioned above was accentuated in the case of allopatric genotypes (Fig 2A) and maintained when SIT244 was included in this group of strains (Fig 2B). However, for TB caused by sympatric strains, a tendency towards a typical (standard) distribution curve was observed. It would appear that age is less relevant for susceptibility in the case of these highly transmissible sympatric strains.
On the other hand, the human genetics approach to mycobacterial infections has brought new insight into the genetic predisposition to disease, with practical implications for the development of personalized medicine [59][60]. Genetic association studies represent a powerful tool in the identification of host genetic variants implicated in infection and disease, and have provided evidence of the association between clinical disease and genes of both innate and adaptive anti-mycobacterial immunity [59][60][61][62][63][64][65][66][67][68]. This approach is being increasingly discussed in terms of the host-pathogen relationship, where pathogen genotype is also considered [29][30][31][32][33].
Clinical onset at an early age has been attributed to rare mutations responsible for monogenic Mendelian traits, and late onset to common polymorphisms having a milder effect on the risk of clinical disease. These, however, have been considered as the two ends of a continuous spectrum of genetic susceptibility. Within this continuum we would find relatively rare variants having a major gene effect [62,64]. For the allopatric strains, we may assume that there has been less exposure, reflected in an accentuation of the relevance of the age at risk, as seen in the trimodal distribution of TB prevalence. In contrast, for the sympatric strains, given the greater stability of the host-pathogen relationship characteristic of populations having survived historical exposure, the attenuation of the central peak and loss of the peaks at the extremes of the age distribution could be suggestive of genetic determinisms due to more common polymorphisms. Moreover, a major gene effect governing the host-pathogen sympatric relationship, as suggested by these results from our particular study setting, would be concordant with aspects governing the geographic specificity of Europe's human genetic variation [69] and appears particularly relevant for future study design.
An important strength of our study was the availability of a large dataset resulting from the length of the sampling period (7 years) and the use of a consecutive mode of sampling of one isolate per TB patient from the major hospital serving one of the most densely populated urban areas of Lisbon. Information on country of birth (or ancestry) was not available from our dataset. To address this limitation, we relied on nationwide surveys reported by the Portuguese National School of Public Health for the same time frame as this study, indicating that the incidence and endemic levels of TB in Portugal are mainly due to TB in nationals, contrary to what has been observed in some of the Northern European countries [43,45,70]. Generally speaking, a patient's country of birth has been used as a surrogate of the ancestry of the study population in M. tuberculosis phylogeography [8,34,37]. However, this does not apply to our setting, as the wave of immigration in Portugal dates to decolonization in the 1970s, so these populations now constitute Portuguese-born second and third generations.
In TB, molecular epidemiological evidence suggestive of host-pathogen co-evolution has allowed pathogen genotypes to be grouped into distinct lineages in a way that correlates with ethnicity and geography. A central aspect of this analysis is the idea that the immunological response of the host can vary according to the infectious strain. This premise has important implications, namely suggesting there could be a need for different vaccines in different parts of the world (precision medicine). Less virulent strains can be associated with immune deficiencies. As a working hypothesis, allopatric strains could be successful against a host background with less history of exposure and higher vulnerability.
Moreover, populations less fitted to the epidemic strains of the industrial revolution would have given way to the modern Europeans adapted to their sympatric strains. In terms of public health, the question of the duration of sympatric strain transmission clusters which, as we have seen, may be associated with severe MDRTB cases, remains unanswered. The reduction in number of TB cases accompanied by the steady rise of the age of the at-risk group, as observed on national scales, reflects the decline of an earlier epidemic. Would this be consistent with the view of coevolution between M. tuberculosis and its human host? How could the consequences that may be expected in terms of the genetic predisposition to the disease be translated into testable models using complementary approaches to provide the needed insight to deal with modern TB epidemics?

Conclusions
Increased recognition of the phylogeographic distribution of M. tuberculosis has brought much attention to pathogen genotype and how it may articulate with the host genetic background in the host-pathogen relationship. M. tuberculosis strains can be classified within sympatric or allopatric host-pathogen relationships depending on the genotypic resolution. Here, we argue that considering the M. tuberculosis Euro-American lineage as a whole, ignoring lineage subdivisions into genotypic families, as published by Fenner and collaborators [37], may compromise future assumptions in this type of approach. Associations with clinical and demographic factors were found using more discriminative strain genotyping and were not discernible at the lineage level. The impact on modeling divergence was also reflected by the study of the distribution of genotype prevalence among different age groups. This was discussed in terms of host genetic predisposition to disease, which may have important consequences for the design of genetic association studies. The choice of determinants in the definition of sympatric or allopatric host-pathogen relationships may thus affect working models of infection and disease in TB and, likewise, the interpretation of epidemiology and clinical outcome, with important implications for dealing with the modern TB epidemics.
Supporting Information S1