The Association between Mycobacterium Tuberculosis Genotype and Drug Resistance in Peru

Background The comparison of Mycobacterium tuberculosis bacterial genotypes with phenotypic, demographic, geospatial and clinical data improves our understanding of how strain lineage influences the development of drug-resistance and the spread of tuberculosis. Methods To investigate the association of Mycobacterium tuberculosis bacterial genotype with drug-resistance, drug susceptibility testing, together with genotyping using both 15-loci MIRU-typing and spoligotyping, was performed on 2,139 culture positive isolates, each from a different patient in Lima, Peru. Demographic, geospatial and socio-economic data were collected using questionnaires, global positioning equipment and the latest national census. Results The Latin American Mediterranean (LAM) clade (OR 2.4, p<0.001) was significantly associated with drug-resistance and alone accounted for more than half of all drug resistance in the region. Previously treated patients, prisoners and genetically clustered cases were also significantly associated with drug-resistance (ORs 2.5, 2.4 and 1.8; p<0.001, p<0.05 and p<0.001 respectively). Conclusions Tuberculosis disease caused by the LAM clade was more likely to be drug resistant independent of important clinical, genetic and socio-economic confounding factors. Explanations for this include the preferential co-evolution of LAM strains in a Latin American population, a LAM strain bacterial genetic background that favors drug-resistance, or the "founder effect" from pre-existing LAM strains disproportionately exposed to drugs.


Introduction
The Mycobacterium tuberculosis Beijing clade is thought to be hyper-virulent [1,2] and in some regions has been associated with drug resistance [3]. One of the explanations for this association is the hypothesis that Beijing strains mutate more rapidly than other strains such as those of the Euro-American lineage (lineage 4) [4]. However, recent evidence for the rapid mutation rate of Beijing strains was derived from the comparison of Beijing strains against lineage 4 strains primarily of laboratory origin (CDC 1551, H37Rv and Erdman). Although two clinical strains were also included in this analysis, these strains cannot necessarily be regarded as representative of the broad diversity within lineage 4 [4].
The association of Beijing strains (or indeed any other strain) with drug resistance could also be biased by a 'founder effect'. In the Eastern ex-Soviet states where MDRTB is particularly prevalent, strains of the M. tuberculosis Beijing family have always been dominant while Latin American Mediterranean strains of the Euro-American lineage are under-represented [5,6]. When the national tuberculosis programs of the former Soviet Union failed, Beijing strains were disproportionately subjected to failing treatment regimens as well as being exposed to poor treatment within the prison system [7].
The Latin American Mediterranean family has been associated with drug resistance in subpopulation level studies in KwaZulu-Natal [8], Brazil [9], Russia [10] and Ukraine [6]. The Latin American Mediterranean family and the Haarlem family, both of the Euro-American lineage, are dominant in South America. However, Peru has Latin American Mediterranean strains, Haarlem strains and the highest proportion of Beijing strains in the Americas [11]. Peruvian Beijing strains were imported to the country in the mid-19th century, well before the advent of antibiotics, along with significant Chinese migration [12]. Beijing, Haarlem and Latin American strains in Peru were then subjected to an inadequate national TB program in the 1980s and early 1990s [13], which preceded a dramatic rise in drug resistance.
To determine which M. tuberculosis genotypes were most associated with drug resistance, we undertook a prospective population level molecular epidemiological study of incident tuberculosis cases in Lima, Peru.

Laboratory Methods
Strains from positive MODS liquid cultures were sub-cultured onto solid Ogawa medium and transported to the research laboratory at Universidad Peruana Cayetano Heredia (Lima, Peru) for DNA extraction [18] and spacer-oligonucleotide typing ('spoligotyping') as described previously [19]. Automated 15-loci MIRU-VNTR was performed at the Kobe Institute (Kobe, Japan) following established protocols [20]. The denominator of all culture positive sputum samples sent to both regional laboratories was obtained from the national reference laboratory database. All drug resistant strains were sent to the national reference laboratory to confirm the diagnosis of drug resistance by the proportions method, and a subset of half of all drug resistant samples was sent for Illumina HiSeq sequencing to confirm the diagnosis by molecular methods.
Strains were named by uploading the combined MIRU and spoligotype data to the MIRU-VNTRplus website (www.miru-vntrplus.org) and following their protocol. Strains that could not be named because they did not match a clade in the MIRU-VNTRplus database were termed 'unknown' strains. Any strains that failed spoligotyping or MIRU-VNTR (because one or more loci were ambiguous by either technique) were genotyped again by both techniques; if they failed a second time, they were excluded from the analysis.

Spatial and Statistical Analysis
Socio-economic status per city block (upper, middle and lower tertile) and population density (number of people per city block) were obtained from the latest Peruvian National Census [21]. All cases that could be mapped were mapped to their place of residence at the level of the city block, either manually with a handheld GPS device or, where possible, directly onto Google Maps using Google Earth 9.0. Mapped cases were combined with census data by spatially merging the geographic coordinates using ArcGIS.
Data were analyzed in 'R' (R Foundation for Statistical Computing, Vienna, Austria 2011, www.R-project.org) and Stata (Release 11, StataCorp. 2009). The Manhattan distance (the sum of all absolute pair-wise differences between loci) was used to determine the genetic distance between genotypes. Minimum spanning trees were constructed using Cytoscape rather than BioNumerics in order to present the data more effectively using the organic graph drawing algorithm in Cytoscape. The UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm was chosen to construct the phylogeny because it allowed us to make a like-for-like comparison of our phylogeny with the phylogeny of the strain collection of the MIRU-VNTRplus website.
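As an illustration of the distance measure and tree construction described above, the following is a minimal Python sketch, not the study's own code (the analysis was run in R, Stata and Cytoscape). The function names, the use of Prim's algorithm, and the four 5-locus profiles are all assumptions made for the example; the study used 15 loci.

```python
def manhattan(a, b):
    """Sum of absolute per-locus differences between two repeat-count profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Build a minimum spanning tree with Prim's algorithm (an assumed choice).

    `profiles` maps a genotype name to its list of repeat counts.
    Returns a list of (node_in_tree, added_node, distance) edges.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge joining the tree to a genotype outside it.
        best = min(
            ((u, v, manhattan(profiles[u], profiles[v]))
             for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical 5-locus profiles, for illustration only.
profiles = {
    "A": [2, 3, 4, 2, 5],
    "B": [2, 3, 4, 2, 6],
    "C": [2, 1, 4, 2, 6],
    "D": [5, 3, 1, 2, 5],
}
mst = minimum_spanning_tree(profiles)
# Edges of weight 1 link genotypes that differ by a single repeat at one locus.
single_step_links = [(u, v) for u, v, d in mst if d == 1]
```

In this sketch, edges with a distance of 1 correspond to genotypes that differ by one repeat at a single locus, which is the kind of link a minimum spanning tree makes visible.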
Determinants of drug resistance (any drug resistance vs drug susceptibility) were tested in the context of a multivariate logistic regression. All predictor variables with a significance of p<0.2 on univariate analysis and any potentially important confounding variables were included in a multivariate logistic regression analysis. All biologically plausible interactions were also examined for significance in the model. A significance value of p<0.05 was chosen for predictor variables in the multivariate model. The Haarlem clade was chosen as the reference comparison for all other clades as it was the second most prevalent clade in the population. The model was also re-run with a dichotomized clade variable (e.g. Beijing vs non-Beijing) to make comparisons between the clade in question and all other clades.
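For readers less familiar with the odds ratios reported from such models, the following minimal Python sketch computes an unadjusted odds ratio and a Woolf (log-based) 95% confidence interval from a 2x2 table, as would arise from a dichotomized clade variable. It is not the multivariate model used in the study, and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log) confidence interval.

    a: resistant, in the clade        b: susceptible, in the clade
    c: resistant, not in the clade    d: susceptible, not in the clade
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, NOT taken from the study.
estimate, lower, upper = odds_ratio(a=60, b=140, c=30, d=170)
print(f"OR {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An adjusted odds ratio from a multivariate logistic regression will generally differ from this unadjusted value, because it accounts for the other predictors in the model.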

Independent Data Set Comparison
The principal study outcome was compared to an independent dataset compiled by the Institut Pasteur Guadeloupe [22] of 2192 different strains collected across South America, all of which had the strain genotype and phenotype available for comparison. The study dataset was also independently analyzed by DC and NR at the Institut Pasteur Guadeloupe.

Ethics Statement
Ethical approval was obtained from the institutional review board of Universidad Peruana Cayetano Heredia before the study began. Consent was not obtained from participants because the data were analyzed anonymously after checking for duplicated patients. Institutional approval for the study was obtained from the Peruvian Ministry of Health.

Study Recruitment
The first positive culture from each of 2,139 different tuberculosis patients was genotyped by both 15-loci MIRU-VNTR and spoligotyping. A total of 2086 strains were successfully genotyped (53 genotypes were excluded because either MIRU typing or spoligotyping failed or generated ambiguous results) and formed the core of the analysis. The demographic data of these patients are given in Table 1 (Fig 1). Other strains included those from the X family (135/2086, 6.5%, 95% CI 5.4%-7.6%) and T strains (196/2086, 9.4%, 95% CI 8.2%-10.7%).
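The clade proportions above are reported with 95% confidence intervals. The text does not state which interval method was used, so the following Python sketch assumes the Wilson score interval and applies it to the counts given above; its bounds may differ slightly from the published values.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the text: X family and T strains among 2086 genotyped strains.
for label, count in [("X family", 135), ("T strains", 196)]:
    p, lower, upper = wilson_interval(count, 2086)
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lower:.1f}%-{100 * upper:.1f}%)")
```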
More than a quarter of the strains in the LAM sub-groups LAM1, LAM4, LAM5 and LAM9 (as defined by the SpolDB4 international database) were drug resistant. However, not all LAM subgroup strains were associated with drug resistance (Fig 2b). A minimum spanning tree of all 2086 strains coloured according to drug resistance demonstrates the clusters with the highest proportion of drug resistance (Fig 3). Linked nodes that differed by a Manhattan distance of 1 demonstrate possible strain evolution between patients. The largest cluster in the minimum spanning tree was part of the Haarlem clade. Both clusters and unique strains that were part of the LAM clades had the highest proportions of drug resistance, and the Beijing clade demonstrated a high level of clonality.

Discussion
This large prospective molecular epidemiological study of tuberculosis conducted in a high incidence multidrug-resistant tuberculosis setting demonstrates a significant clade specific association with drug resistance independent of genetic clustering and of important clinical and socioeconomic confounding factors. The Latin American Mediterranean (LAM) clade was highly associated with drug resistance while the Haarlem and Beijing strains were less likely to be drug resistant. The founder effect could explain the association of LAM strains with drug resistance in South America and the association of Beijing strains with drug resistance in Asia [23,24]. However, in Peru, because of European and Asian immigration, a diversity of strains existed that predated the use of antibiotics. Strains from the Haarlem, Beijing and Latin American Mediterranean families were very likely to have been circulating in the population for over a century prior to the use of antibiotics, yet only the Latin American Mediterranean family has emerged as being highly associated with drug resistance. The association of these strains with drug resistance remained statistically significant in both clustered and non-clustered strains, excluding the role of a purely clonal expansion of drug resistant strains. One possible explanation for our finding is that LAM strains could harbor advantageous mutations that allow them to maintain fitness while also becoming drug resistant. Differential host-pathogen co-evolution may also explain why Latin American strains are so associated with drug resistance in South America. Latin American strains may more easily become drug resistant within a Latin American host. The prison within the study area had a prevalence of Beijing strains that was three times that of the surrounding community and prisoners were twice as likely to be drug resistant, which suggests that the prison could act as an amplifier of drug resistant Beijing strain transmission in the community.
A sub-population level study of drug resistant strains in South Africa suggested that the LAM4 clade is one of the main contributors to the extensively drug resistant (XDR) tuberculosis outbreak [8]. This association has also recently been described in another study of 237 isolates in Brazil that also lacked the denominator of all strains needed to compare the proportion of drug resistance between clades [9]. Interestingly, the LAM subtypes most associated with drug resistance in the Brazilian study were the LAM1, LAM4, LAM5 and LAM9 strains, exactly the same subtypes that had the highest levels of drug resistance in our data set. Pre-existing evidence of this association in other regions and our confirmation of the association in an independent dataset make our findings relevant across South America. One previous study undertaken in Peru by our research group with an independent set of hospital derived strains also highlighted the association of LAM9 with MDRTB [25], while another study by Taype et al. supports our finding that Beijing strains are not associated with drug resistance in Lima [26].
This study benefited from the population level implementation of a sensitive and inexpensive diagnostic test, which enabled data to be gathered on a large proportion of new tuberculosis cases in a study area of approximately 2.3 million people. However, whilst epidemiological links are not necessary for tuberculosis to be transmitted [27], the knowledge of epidemiological links between cases would have improved our cluster definition. The duration of high population coverage in this study was 14 months; a longer duration of high coverage would limit the contribution of the strains diagnosed at the beginning and the end of the study and ensure that saturation of the clustered proportion was reached [28].
It is also necessary to acknowledge the limitations of MIRU-typing and spoligotyping. These techniques do not provide the same level of phylogenetic quality and resolution as 24-loci MIRU typing, whole genome sequencing and large sequence polymorphism typing. This risks misclassification of some strains into clades that are closely related by MIRU-typing and spoligotyping but determined to be more distantly related by whole genome sequencing or large sequence polymorphism typing. However, despite the limitations, MIRU-VNTR has been demonstrated to generate phylogenetic relationships that are broadly congruent with those of SNP typing, spoligotyping and large sequence polymorphisms [29]. MIRU-VNTR is still widely used for population level molecular epidemiological genotyping [20] and the resolution provided by MIRU-typing is also maximized in strains of the Euro-American lineage most frequently observed in our data set [30].
This population level study of tuberculosis in Lima has identified the Latin American Mediterranean clade as being highly associated with a drug resistant phenotype. Patients with drug sensitive disease caused by LAM strains may be more likely to acquire drug resistance or cause secondary cases of drug resistant disease after having become drug resistant. The standard Directly Observed Therapy short course (DOTS) is failing to prevent the onset and spread of drug resistant tuberculosis, particularly among these patients. In settings where LAM1, LAM4, LAM5 and LAM9 strains are highly prevalent, it is particularly important that robust mechanisms are in place for drug resistance testing and surveillance.