Genetic Diversity of Mycobacterium tuberculosis in Peru and Exploration of Phylogenetic Associations with Drug Resistance

Background There is limited available data on the strain diversity of M tuberculosis in Peru, though there may be interesting lessons to learn from a setting where multidrug resistant TB has emerged as a major problem despite an apparently well-functioning DOTS control programme. Methods Spoligotyping was undertaken on 794 strains of M tuberculosis collected between 1999 and 2005 from 553 community-based patients and 241 hospital-based HIV co-infected patients with pulmonary tuberculosis in Lima, Peru. Phylogenetic and epidemiologic analyses permitted identification of clusters and exploration of spoligotype associations with drug resistance. Results Mean patient age was 31.9 years, 63% were male and 30.4% were known to be HIV+. Rifampicin mono-resistance, isoniazid mono-resistance and multidrug resistance (MDR) were identified in 4.7%, 8.7% and 17.3% of strains respectively. Of 794 strains from 794 patients there were 149 different spoligotypes. Of these there were 27 strains (3.4%) with novel, unique orphan spoligotypes. 498 strains (62.7%) were clustered in the nine most common spoligotypes: 16.4% SIT 50 (clade H3), 12.3% SIT 53 (clade T1), 8.3% SIT 33 (LAM3), 7.4% SIT 42 (LAM9), 5.5% SIT 1 (Beijing), 3.9% SIT 47 (H1), 3.0% SIT 222 (clade unknown), 3.0% SIT1355 (LAM), and 2.8% SIT 92 (X3). Amongst HIV-negative community-based TB patients no associations were seen between drug resistance and specific spoligotypes; in contrast HIV-associated MDRTB, but not isoniazid or rifampicin mono-resistance, was associated with SIT42 and SIT53 strains. Conclusion Two spoligotypes were associated with MDR particularly amongst patients with HIV. The MDR-HIV association was significantly reduced after controlling for SIT42 and SIT53 status; residual confounding may explain the remaining apparent association. These data are suggestive of a prolonged, clonal, hospital-based outbreak of MDR disease amongst HIV patients but do not support a hypothesis of strain-specific propensity for the acquisition of resistance-conferring mutations.


Introduction
Molecular fingerprinting of M. tuberculosis (MTB) permits investigation of the epidemiology of tuberculosis to a previously unattainable level of detail, revealing insights into the differential transmission success of strains whilst observation and analysis of this epidemiology can generate testable hypotheses about strain biology [1]. There is limited data on the epidemiology and strain diversity of M tuberculosis in Peru [2], though there may be interesting lessons to learn from a setting where multidrug resistant TB has emerged as a major problem despite an apparently wellfunctioning DOTS control programme [3]. Here we report the results of an exercise to spoligotype all the strains of a large bank of well-characterized MTB strains derived from research projects conducted in Lima, Peru between 1999 and 2005, conduct spoligotype-based phylogenetic analyses and explore phylogenetic associations with HIV infection and drug resistance.

Strain bank
The sampling framework for this study was opportunistic, making use of a strain bank of anonymised but phenotypically well characterized strains of M tuberculosis collected in the course of four clinical research studies conducted amongst adults with pulmonary tuberculosis in hospital and community-based studies in Lima, Peru. All studies had been reviewed and approved by the Institutional Review Board of Universidad Peruana Cayetano Heredia (Lima, Peru). Spoligotyping was undertaken on 794 strains of M tuberculosis (1 per patient) which came from (1)  . Recruitment for each of these studies has been reported previously [4,5,6]. Strains were stored at 270uC in the Laboratorio de Investigación de Enfermedades Infecciosas of Universidad Peruana Cayetano Heredia (UPCH) in Lima. Available patient data was limited to gender, age and HIV status; strain data included first line drug susceptibility profile and name of source study with date of collection of original clinical sample.
Spoligotyping was undertaken at UPCH in batches over several months and films were interpreted by two independent readers; for the rare occasions where there was lack of independent agreement and subsequent failure to resolve discrepancies between both readers spoligotyping was repeated and the new film was used. Phylogenetic analyses and the construction of phylogenetic trees and spoligoforests (drawn using the SpolTools software available through http://www.emi.unsw.edu.au/spolTools; [7,8]) permitted identification of clusters and orphan strains by comparison with the SITVIT2 database (Institut Pasteur de Guadeloupe). The minimum spanning tree (MST), based on spoligotyping patterns, was drawn using BioNumerics software version 3.5. The MST is an undirected connected graph which links all the strains together with the fewest possible linkages between nearest neighbours. Contrarily to the MST, the spoligoforest trees are directed graphs which only evolve by loss of spacers. In these trees, nodes are not necessarily all connected (indeed, in case of too many changes between two strains, there are no edges linking them. In combination with SpolTools software, GraphViz software (http://www.graphviz.org) [9] was used to colour the orphan strains on the spoligoforest trees. Strains were categorized into spoligotype international types (SIT) and clades for the purpose of reporting strain diversity within the strain bank. In a univariate analysis odds ratios were computed for associations between strain groupings (by clade and by SIT separately) and patient gender, HIV status, strain year of origin, isoniazid mono-resistance, rifampicin mono-resistance and multidrug resistance (with each compared to drug susceptible reference group); in subsequent multivariate logistic regression only those clade or SIT associations with a p value,0.1 on univariate analysis were included in the model.

Strain diversity
Of 794 strains from 794 patients there were 149 different spoligotypes identified. Of these there were 27 strains with novel, unique orphan spoligotypes, 718 which mapped to spoligotypes already described at least twice, and 49 which were newly created shared types either within the present study or after a match with an orphan in the database. Descriptions of the orphan strain spoligotypes and the demographics of the source patients are given in table 1 The phylogenetic relationships between strains are illustrated in minimum spanning trees ( Figure 1) which demonstrate that PGG2/3 (H, LAM, T, X, S) strains are highly predominant (representing 83.8% of all strains), most notably H (n = 228, 28.7%), LAM (n = 225, 28.3%) and T (n = 161, 20.3%). (figure S1 includes SIT numbers which can be seen by zooming in on pdf file). The spoligoforests shown in Figure 2 (and figure S2) highlight (regardless of layout technique) that SIT50 (H3) is the largest node (n = 130), followed by SIT53 (T1, n = 98), SIT33 (LAM3, n = 66), SIT42 (LAM9, n = 59) and SIT1 (Beijing, n = 44).

Strain clade associations
There was no predominant spoligotype associated with MDR amongst TB patients without HIV co-infection (table 4) -the odds of MDR were highest in those with disease caused by the LAM9 spoligotype SIT 42 though this was not statistically significant. Amongst patients with HIV co-infection this spoligotype was associated with by far the highest odds of MDR (87.5% of HIV patients with SIT42 disease had MDR); the T1 spoligotype SIT 53 was also associated with a increased odds of MDR, though only amongst patients with HIV (60.0% of HIV patients with SIT53 disease had MDR compared to 14.0% of HIV uninfected patients).
HIV infection was strongly associated with MDRTB in this analysis. This association was significantly reduced (though incompletely mitigated) after adjustment for confounders, an effect largely mediated by inclusion of SIT42 and SIT53 (Table 5). Indeed even after adjustment for HIV and other covariates SIT42 and SIT53 were independently associated with MDRTB, though not with either isoniazid or rifampicin resistance. On univariate analysis male gender was associated with MDRTB but this effect was entirely driven by the increased HIV prevalence in males and disappeared after adjustment in the multivariate model. Neither study site nor year of sample collection were associated with drug resistance in the multivariate model.

Discussion
In this report of strain diversity from Peru covering the period 1999-2005 3.4% of spoligotypes observed were from novel, orphan strains. The nine most frequently observed spoligotypes (out of 149 observed) accounted for over 60% of all disease and the eight of these also featured amongst the nine most frequently    LAM3 Clustered *A total of 100/122 SITs (n = 718) matched a preexisting shared-type in the database, whereas 22/122 SITs (n = 49 isolates) were newly created either within the present study or after a match with an orphan in the database. A total of 66 SITs containing 711 isolates were clustered within this study (2 to 130 isolates per cluster), while 56 SITs contained a unique strain within this study. Note that SITs followed by an asterisk indicates ''newly created shared-type'' (n = 22 containing 49 isolates) due to 2 or more strains belonging to an identical new pattern within this study or after a match with an orphan in the database. SIT designations followed by number of strains: 2961* this study (n = 2); 3000* this study (n = 3); 3001* this study (n = 8) and USA (n = 1); 3004* this study (n = 2); 3005* this study (n = 2); 3006* this study (n = 2) and South Africa (n = 1); 3007* this study (n = 1) and USA (n = 1); 3008* this study (n = 1) and USA (n = 1); 3009* this study (n = 2); 3010* this study (n = 1) and USA (n = 1); 3011* this study (n = 4); 3012* this study (n = 2); 3013* this study (n = 3) and USA (n = 1); 3014* this study (n = 1) and Argentina (n = 1); 3015* this study (n = 2); 3016* this study (n = 1) and Panama (n = 1); 3017* this study (n = 3); 3089* this study (n = 2) and Mexico (n = 1); 3168* this study n = 1, Sweden (n = 1); 3431* this study (n = 2); 3432* this study (n = 2); 3433* this study (n = 2), BRA (n = 1). **Lineage designations according to SITVIT2 using revised SpolDB4 rules; ''Unknown'' designates patterns with signatures that do not belong to any of the major clades described in the database. ***Clustered strains correspond to a similar spoligotype pattern shared by 2 or more strains ''within this study''; as opposed to unique strains harboring a spoligotype pattern that does not match with another strain from this study. Unique strains matching a preexisting pattern in the SITVIT2 database are classified as SITs, whereas in case of no match, they are designated as ''orphan'' (see Table 1). doi:10.1371/journal.pone.0065873.t002 *Worldwide distribution is reported for regions with more than 3% of a given SITs as compared to their total number in the SITVIT2 database. The definition of macro-geographical regions and sub-regions (http://unstats.un.org/ unsd/methods/m49/m49regin. observed in a previous study in north Lima [2]; 5.5% were SIT1/ Beijing family. Strains belonging to Haarlem, T, LAM and Beijing families predominated, and drug-resistance was not shown to be associated with any specific family, including Beijing, findings consistent with the single previous report from Peru [2]. With the exception of the Beijing family strains, recently examined in greater detail [10] very few PGG1 strains (AFRI, BOV, EAI, Manu) were found in this study (n = 9, 1.1%). One may notice that these PGG1 strains are not located at central positions on the trees (spoligoforests and MST). Instead, they mostly occupied terminal leaves of the trees (in the MST), or were isolated with few or no connections with other strains in the spoligoforests.
However, PGG2/3 group (n = 665, 83.8%) strains which are predominant in the study occupied a more visible and central position on the trees. Spoligoforest trees have been used to highlight the predominance of some specific well known shared types (SIT). These trees can also give us an overview on the parental links that probably exist between strains belonging to different lineages. For example, one may notice on the top left of the hierarchical layout figure (2A), that SIT19/EAI2-Manila may lead to SIT1/Beijing, through loss of many spacers.
The MST very well shows the similarity (or the distance) between each strain, and clearly defines the major evolution of the MTB lineages present in the study. For example, one can notice that in Figure 1, the Beijing family group is very far from the strains present in the central nodes of strains belonging to the PGG2/3 group (Euro American). At the very bottom of the MST, we can note the presence of the only two strains belonging to EAI lineage (SIT19/EAI2-Manila and SIT11/EAI3-IND).
The spoligoforest tree demonstrates that most of the orphans belong to the modern PGG2/3 group (H, T, LAM, T, X, S). The orphan strains are mostly located at terminal positions on the trees or are located in the top right layer of the hierarchical layout as isolated strains without interconnections with the other strains. The phylogenetic tree connects each genotype based on degree of changes required to go from one allele to another. The structure of the tree is represented by branches (continuous vs. dotted lines) and circles representing each individual pattern. Note that the length of the branches represents the distance between patterns while the complexity of the lines (continuous, black dotted and gray dotted) denotes the number of allele/ spacer changes between two patterns: solid lines, 1 or 2 changes (thicker ones indicate a single change, while the thinner ones indicate 2 changes); dotted lines, three or more changes (black dotted for 3, and grey dotted for 4 or more changes). The color of the circles is proportional to the number of clinical isolates in our study, illustrating unique isolates (sky blue) versus clustered isolates (Blue, 2-5 strains; dark blue, 6-9 strains; Bordeaux, 10-19 strains; Red, 20 and more). Note that orphan patterns are circled with the letter ''o'' in red. Patterns marked by an asterisk (*) indicate a strain with an unknown signature (unclassified). doi:10.1371/journal.pone.0065873.g001 Indeed, none of the orphans explicitly belonged to the PGG1 group (considering that some orphans have an unknown lineage).
Given the celebrated performance of the Peruvian National Tuberculosis Control Programme in demonstrating the effectiveness of DOTS in bringing about a reduction in TB incidence [3], the emergence of MDRTB in Peru as a major threat might be viewed as surprising [11]; conventional wisdom suggests MDR is driven by weak healthcare systems. We were therefore interested in exploring whether other factors, such as strain-specific biological propensity for resistance, might be relevant. In unselected community-based TB patients (largely HIV-negative) there was no association observed between drug resistance and specific spoligotypes. However amongst patients with HIV recruited from a hospital setting MDR was particularly frequently seen amongst the SIT42 and SIT53 strains. After multivariate analysis to control for the effects of HIV infection, gender and year Figure 2. Discrete spoligotypes relationships for all isolates (n = 794) presented through spoligoforest tree drawn as a ''hierarchical layout'' using the SpolTools software (available through http://www.emi.unsw.edu.au/spolTools; Reyes et al. 2008 [11]). Each spoligotype pattern from the study is represented by a node with area size being proportional to the total number of isolates with that specific pattern. Changes (loss of spacers) are represented by directed edges between nodes, with the arrowheads pointing to descendant spoligotypes. In this representation, the heuristic used selects a single inbound edge with a maximum weight using a Zipf model. Solid black lines link patterns that are very similar, i.e., loss of one spacer only (maximum weight being 1.0), while dashed lines represent links of weight comprised between 0.5 and 1, and dotted lines a weight less than 0.5. Orphan isolates, indicated in cyan, are isolated strains without interconnections with the other strains. This presentation illustrates for example the parental links for PGG2/3 strains such as SIT53 and SIT42, showing how SIT53 may be considered as the precursor of all other modern PGG2/3 patterns. SIT53 leads to SIT50/H3 by the loss of spacer 31, and it leads to SIT42 by the loss of four spacers (spacers 21-24), which in turn leads to SIT1355/LAM via SIT64/LAM6 then SIT95/LAM6. Through other spacer deletions, SIT53 leads to SIT91/X3 via SIT119/X1 and SIT92/X3. Lastly, SIT222/Unknown has no parental SITs in our study. doi:10.1371/journal.pone.0065873.g002 the effect size increased; given the lack of such an association in the community one hypothesis to explain this would be that this is highly suggestive of a prolonged nosocomial clonal outbreak with strains of these two spoligotypes. The alternative hypothesis of a biological predisposition of these specific strains to acquire drug resistance-conferring mutations is much less likely given the absence of an association with isoniazid or rifampicin monoresistance. It is noteworthy that the association of HIV with MDR, though diminished after adjustment for SIT42 and SIT53, remained significant indicating that an outbreak with strains from these two spoligotypes is insufficient to explain the whole HIV-MDR association. We cannot exclude the possibility of residual confounding as the explanation for this apparent association.
There are acknowledged limitations of the data presented here. Most importantly our sampling strategy was opportunistic, making use of a strain bank derived from several studies with different designs so the study populations differed and though all relevant subgroups (community and hospital based, HIV infected and uninfected) were included the sample could not be considered representative. There are advantages in having a strain bank which is delinked from patient identifiers but the drawback is that only limited clinical data is available and returning to clinical notes for further detail is not possible -it would have been interesting to differentiate between new and retreatment cases and to investigate patient outcomes by strain, for example. Finally, because the samples were all from studies in adults we were unable to describe the strains causing paediatric disease thus missing an opportunity to clearly identify currently/recently circulating strains, and because all strains were from pulmonary TB patients we were unable to investigate whether extrapulmonary disease phenotype was associated with any particular strain in Peru as has been suggested elsewhere [12].
There are important strengths in our strain bank: (1) each sample evaluated here is a single strain from a unique patientthough serial strains are available in the bank, for this analysis we were careful to only examine one strain per patient, (2) availability of drug susceptibility data and HIV status for every strain enabled the analysis we report here with very few missing values, in contrast to an earlier report for which only 70% of strains had drug susceptibility data and HIV status was not reported [2], (3) the spread of strains includes diverse but well characterized patient demographic groups which are also geographically spread across metropolitan Lima (home to one third of the population of Peru and more than 75% of the incident TB), (4) the collection reported here span a time period of 6 years (indeed the bank continues to accumulate strains to the present day, extending the collection to more than 13 years) enabling investigation of temporal trends (none were found here).
In summary, we report the strain distribution of M tuberculosis isolates in Lima, Peru, highlight a significant proportion of novel spoligotypes, and hypothesize a prolonged, clonal, hospital-based outbreak of MDR disease amongst HIV patients but no evidence to support a hypothesis of strain-specific propensity for the acquisition of resistance-conferring mutations. Figure S1 MST from manuscript Figure 1 presented in PDF format which (through zooming in) enables reading of SIT labels. (PDF) Figure S2 Discrete spoligotypes relationships for all isolates (n = 794) presented through a Fruchterman Reingold spoligoforest tree drawn using the SpolTools software (available through http://www.emi.unsw.edu. au/spolTools; Reyes et al. 2008 [11]). Each spoligotype pattern from the study is represented by a node with area size being proportional to the total number of isolates with that specific pattern. Changes (loss of spacers) are represented by directed edges between nodes, with the arrowheads pointing to descendant spoligotypes. In this representation, the heuristic used selects a single inbound edge with a maximum weight using a Zipf model. Solid black lines link patterns that are very similar, i.e., loss of one spacer only (maximum weight being 1.0), while dashed lines represent links of weight comprised between 0.5 and 1, and dotted lines a weight less than 0.5. Orphan isolates, indicated in cyan, appear at terminal positions on the tree, as isolated strains without interconnections with the other strains. (PDF) Table S1 Detailed genotyping and drug-resistance data and demographic information on M. tuberculosis strains (n = 794) isolated from adults with pulmonary tuberculosis in Lima, Peru. (PDF)

Table S2
A comparison of the proportion of the most predominant SITs found in Peru as compared to neighbouring countries (Brazil, Colombia) and regions (Central America and Caribbean), recorded in the SITVIT2 database as consulted on 9 April 2013. (PDF)