Analysis of Mycobacterium tuberculosis Genotypic Lineage Distribution in Chile and Neighboring Countries

Tuberculosis (TB), caused by the pathogen Mycobacterium tuberculosis (MTB), remains a disease of high importance to global public health. Studies of the population structure of MTB have become vital for monitoring possible outbreaks and for developing disease control strategies. Although Chile has a low incidence of TB, current migration rates have the potential to change this scenario. We collected and analyzed a total of 458 M. tuberculosis isolates (1 isolate per patient) originating from all 15 regions of Chile. The isolates were genotyped by spoligotyping, and the data were analyzed and compared with the SITVIT2 database. A total of 169 different patterns were identified, of which 119 patterns (408 strains) corresponded to Spoligotype International Types (SITs) and 50 patterns corresponded to orphan strains. The most abundantly represented SITs/lineages were SIT53/T1 (11.57%), SIT33/LAM3 (9.6%), SIT42/LAM9 (9.39%), SIT50/H3 (5.9%) and SIT37/T3 (5%); analyses of the spoligotyping minimum spanning tree and spoligoforest suggested a recent expansion of SIT42, SIT50 and SIT37, all of which potentially evolved from SIT53. The most abundantly represented lineages were LAM (40.6%), T (34.1%) and Haarlem (13.5%). LAM was more prevalent among isolates from Santiago (43.6%) and Concepción (44.1%) than among those from Iquique (29.4%). The proportion of the X lineage was appreciably higher in Iquique and Concepción (11.7% in both) than in Santiago (1.6%). A global comparison of MTB lineage distribution in Chile versus neighboring countries showed that evolutionarily recent lineages (LAM, T and Haarlem) together accounted for 88.2% of isolates in Chile, a pattern that mirrored MTB lineage distribution in neighboring countries (n = 7378 isolates recorded in the SITVIT2 database for Peru, Brazil, Paraguay and Argentina, and in published studies), highlighting the epidemiological advantage of Euro-American lineages in this region.
Finally, we observed the emergence of patterns SIT4014/X1 and SIT4015 (unknown lineage signature), which have so far been found only in Chile, suggesting that conditions specific to Chile, along with the unique genetic makeup of the Chilean population, might have allowed a possible co-evolution leading to the success of these emerging genotypes.


Introduction
Mycobacterium tuberculosis (MTB), a Gram-positive bacterium, is the causative agent of tuberculosis (TB), a disease responsible for millions of new cases and deaths every year and considered a major threat to public health [1]. Although rapid and accurate diagnosis is essential for early treatment and disease management, many people lack access to the facilities required for adequate or early diagnosis. In this context, molecular typing of MTB has greatly improved our knowledge and control of tuberculosis by allowing (i) the detection of unsuspected transmission, (ii) the identification of false-positive cultures, (iii) discrimination between reinfection and relapse [2], and (iv) the study of the global biodiversity and phylogeographical variation of the tubercle bacilli [3]. Over the last two decades, the cumbersome "gold standard" IS6110-RFLP method, which requires Southern blotting [4,5], has progressively been replaced by PCR-based methods, namely spoligotyping and mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) typing [6][7][8][9][10]. These methods show satisfactory discriminatory power and reproducibility [9], particularly when used in combination, which also allows the identification of MTB genetic lineages, a prerequisite for studying TB population structure at local, regional and global scales [11].
In the Americas, TB accounted for approximately 285,200 new cases in 2013, corresponding to 29 new cases per 100,000 inhabitants [12]. About two-thirds (69%) of all reported cases were from South American patients, with wide epidemiological differences, including varying incidence rates, between and within countries; the highest incidence rates were reported in Brazil (46/100,000), Bolivia (123/100,000) and Peru (124/100,000) [12]. In Chile, incidence rates ranged from 7.5/100,000 to 23.7/100,000, with the highest rates reported in the northern regions of the country [13]. According to the latest available data, TB incidence in Chile in 2014 was 12.3 per 100,000, which remains well above the threshold recommended for declassifying TB as a public health problem (<5 per 100,000).
Investigations of MTB genotypic lineage distribution in South America have shown that, although the Euro-American lineage is the most widely represented, regional differences in the distribution of lineages/sublineages are frequently observed both between and within countries [14][15][16][17][18][19][20][21][22]. According to the SITVIT2 database, the LAM, Haarlem and T families are the most commonly observed members of the Euro-American lineage in South and Central America and the Caribbean, a distribution profile shared with Europe and Middle Africa [23]. Nonetheless, some regional specificities can be cited, such as the LAM family strain designated RD Rio, associated with multidrug resistance (MDR) in Brazil [18], and the significant presence of the Beijing family almost exclusively in Peru and Colombia, associated with both MDR and extensively drug-resistant (XDR) cases [20,21]. Thanks to international genotyping databases and web tools for the molecular epidemiology of tuberculosis [23,24], it is now possible to pinpoint such specificities as well as to underline finer differences in MTB population structure at local, regional and macro-regional levels.
The few MTB genotyping studies conducted in Chile, primarily in metropolitan areas, suggested that Chile is dominated by three major MTB lineages: LAM, T and Haarlem [25][26][27]. To provide a more comprehensive snapshot of the MTB lineages circulating in Chile, we collected and analyzed a total of 458 M. tuberculosis isolates (1 isolate per patient) originating from all 15 regions of Chile, including the Metropolitan Region and large cities in the north (e.g., Iquique) and the south (e.g., Concepción). MTB lineage distribution in Chile was compared with that of neighboring countries using the SITVIT2 database [3,28], in order to highlight the exclusive emergence of certain genotypes and the potential associated risks in Chile.

Sample collection
A total of 458 M. tuberculosis isolates (1 isolate per patient) were collected from 15 regions of Chile (S1 Fig). The isolates, obtained from treatment-naive patients (VT), were collected in 2011 (n = 338, 73.80%) and 2012 (n = 120, 26.20%); all isolates were included in the study. Of these, 304 isolates were from male patients (66.38%) and 154 from female patients (33.62%). The isolates were obtained from clinical specimens (sputum: 95%, bronchoalveolar lavage fluid: 3.7%, tissue: 0.9%, blood: 0.2% and pleural fluid: 0.2%). TB was diagnosed in the clinical centers of the respective cities, and the diagnosis was confirmed at the Supranational Reference Mycobacteria Laboratory of the Institute of Public Health of Chile, which also performed first- and second-line drug susceptibility testing on Lowenstein-Jensen medium using the proportion method [29].
The strains used in this study were collected routinely during activities of the state TB control program. No patients were contacted to request additional information. The study was reviewed and approved by the review board of the Biomedical Department of the Institute of Public Health, which granted permission to use the MTB isolates and clinical data for the purposes of the study and waived the need for written informed consent from participants. This study is part of the monitoring carried out by the Institute of Public Health of Chile for the diagnosis and characterization of infectious agents.

DNA extraction and genotyping
MTB isolates obtained from Lowenstein-Jensen medium cultures were resuspended in 500 μL TE (10 mM Tris pH 8.0, 1 mM EDTA pH 8.0). Suspensions were inactivated by heating at 95°C for 15 minutes. Bacterial DNA was isolated by treatment with cetyltrimethylammonium bromide (CTAB) in the presence of 0.7 M NaCl as described previously [30].
Spoligotyping was performed at the Molecular Genetics sub-department of the Institute of Public Health of Chile using the Spoligotyping kit (Ocimum Biosolutions) as described by Kamerbeek et al. [8]. Briefly, the spacers located between the direct repeats of the target region were amplified using biotinylated primers, and the PCR products were hybridized to a membrane carrying 43 spacer-specific oligonucleotides and visualized by chemiluminescence. The M. tuberculosis H37Rv and M. bovis BCG strains included in the kit were used as controls in each run. Spoligotyping results were converted into octal code for comparison with the SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, an updated version of the previously released SITVITWEB database (available at: http://www.pasteur-guadeloupe.fr:8081/SITVIT_ONLINE/) [23]. Within this database, a SIT is created when 2 or more isolates share an identical spoligotyping pattern.
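The conversion to octal code follows the usual spoligotyping convention: spacers 1-42 are read in 14 consecutive groups of three as 3-bit binary numbers (digits 0-7), and spacer 43 is appended as a final 0 or 1, giving a 15-digit code. The Python sketch below assumes each spoligotype is given as a 43-character string of '1' (spacer present) and '0' (spacer absent); the function name spoligo_to_octal is illustrative and is not part of the kit software or of SITVIT2.

```python
def spoligo_to_octal(pattern: str) -> str:
    """Convert a 43-spacer spoligotype into its 15-digit octal code.

    `pattern` is a string of 43 characters in spacer order 1..43,
    '1' = spacer present (hybridization signal), '0' = spacer absent.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected exactly 43 characters, each '0' or '1'")
    # Spacers 1-42: 14 consecutive triplets, each read as a 3-bit number (0-7).
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 stands alone as the 15th digit (0 or 1).
    digits.append(pattern[42])
    return "".join(digits)


if __name__ == "__main__":
    # Usage example: the H37Rv control lacks spacers 20-21 and 33-36.
    h37rv = "".join("0" if s in (20, 21, 33, 34, 35, 36) else "1" for s in range(1, 44))
    print(spoligo_to_octal(h37rv))  # 777777477760771
```

Reading the pattern in fixed triplets makes the code compact while keeping it reversible, which is what allows exact-match lookups of octal codes against a database.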

Phylogenetic and statistical analysis
To study the phylogenetic relationships among the spoligotypes obtained in this study (n = 458 isolates), a minimum spanning tree (MST) was constructed using BioNumerics version 6.6 (Applied Maths, Sint-Martens-Latem, Belgium; available at: http://www.applied-maths.com/bionumerics). In addition, a spoligoforest tree was constructed using the Fruchterman-Reingold algorithm in the SpolTools software (http://www.emi.unsw.edu.au/spolTools) [31]; the tree was reshaped and colored using the GraphViz software (http://www.graphviz.org) [32]. Unlike the MST, the spoligoforest is a directed graph that tentatively highlights parent-to-descendant relationships between spoligotypes. Lastly, Fisher's exact and Pearson's chi-squared tests were performed in R. A p-value < 0.05 was considered statistically significant.
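For a 2x2 comparison (for example, isolates with versus without a given SIT in two cities), Fisher's exact test can be sketched in plain Python. This is an illustrative re-implementation, not the R code used in the study: it assumes a 2x2 table and computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table, which, as far as we know, is also the rule R's fisher.test applies to 2x2 tables. The function name and the counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one, with a
    # small relative tolerance to absorb floating-point rounding.
    p = sum(
        pr for pr in (table_prob(x) for x in range(lo, hi + 1))
        if pr <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: pattern present/absent in city A (3, 1) and city B (1, 3).
    print(fisher_exact_2x2(3, 1, 1, 3))  # about 0.486 (34/70)
```

For tables larger than 2x2, or for the chi-squared test, a statistics library or R itself remains the more appropriate tool.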
Spoligotyping of the 458 isolates revealed a total of 169 different patterns: 119 patterns (408 strains, 89.08% of the sample) corresponded to shared types (SITs) in the SITVIT2 database, and 50 patterns (50 strains, 10.92% of the sample) corresponded to orphan strains not previously reported. A total of 95/119 SITs (n = 357 isolates) matched a preexisting shared type recorded in the database, whereas 24/119 SITs (n = 51 isolates; SIT4014-SIT4037) were newly created, either because 2 or more isolates in this study shared the same new pattern or because the pattern matched an orphan already recorded in the database. The remaining 50 orphan strains were also recorded in the database. A detailed description of the 119 SITs (n = 408 isolates) and their corresponding spoligotyping-defined lineages/sublineages is shown in Table 1.
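The classification rules used here (preexisting SIT, newly created SIT, orphan) can be written as a short procedure. The sketch below assumes patterns have already been converted to octal codes; the inputs known_sits (codes already defined as SITs) and db_orphans (codes recorded as orphans in the database), the function names and the pattern labels in the example are all hypothetical stand-ins, not the actual SITVIT2 interface.

```python
from collections import Counter


def classify_patterns(study_patterns, known_sits, db_orphans):
    """Assign each distinct pattern in a study to one of three categories.

    study_patterns: one octal code per isolate.
    known_sits: octal codes already defined as SITs in the database.
    db_orphans: octal codes recorded as orphans in the database.
    """
    counts = Counter(study_patterns)
    categories = {}
    for pattern, n_isolates in counts.items():
        if pattern in known_sits:
            categories[pattern] = "preexisting SIT"
        elif n_isolates >= 2 or pattern in db_orphans:
            # Shared by >= 2 isolates in the study, or matches a database orphan.
            categories[pattern] = "new SIT"
        else:
            categories[pattern] = "orphan"
    return categories, counts


def summarize(categories, counts):
    """Return {category: (number of patterns, number of isolates)}."""
    summary = {}
    for pattern, category in categories.items():
        n_patterns, n_isolates = summary.get(category, (0, 0))
        summary[category] = (n_patterns + 1, n_isolates + counts[pattern])
    return summary


if __name__ == "__main__":
    # Hypothetical labels standing in for octal codes.
    cats, counts = classify_patterns(
        ["P1", "P1", "P2", "P3", "P3", "P4", "P5"], {"P1"}, {"P4"}
    )
    print(summarize(cats, counts))
```

Applied to the real data, the same tally would be expected to return 95 preexisting SITs (357 isolates), 24 new SITs (51 isolates) and 50 orphans (50 isolates).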
SIT33/LAM3 was particularly well represented in our study (n = 44; 9.61%). SIT720/LAM3 was also observed in the Chilean population: the 6 isolates in our study represent around 54.55% of the isolates recorded for this SIT in the SITVIT2 database, and this SIT has been reported exclusively in South America (Table 2). Two other SITs, SIT211 and SIT1277, were also well represented in this study relative to the isolates registered in the SITVIT2 database (10.17% and 16.67%, respectively). Overall, our results suggest that the LAM3 sublineage is very well represented in the Chilean cohort. In addition, two novel SITs, SIT4014 and SIT4015, were reported for the first time in this study at a relatively high frequency (n = 7 isolates each).

Table 1 notes:
* A SIT followed by an asterisk indicates a "newly created" SIT, due either to 2 or more strains belonging to an identical new pattern within this study or to a match with an orphan in the database. SIT designations followed by number of strains: 4014* this study n = 7; 4015* this study n = 7; 4016* this study n = 2; 4017* this study n = 3, VEN n = 1; 4018* this study n = 2; 4019* this study n = 1, ESP n = 1; 4020* this study n = 1, ARG n = 1; 4021* this study n = 1, IDN n = 1; 4022* this study n = 1, TUN n = 1; 4023* this study n = 1, PER n = 1; 4024* this study n = 3; 4025* this study n = 4; 4026* this study n = 2; 4027* this study n = 1, ESP n = 1; 4028* this study n = 1, PER n = 1; 4029* this study n = 1, BRA n = 1; 4030* this study n = 2; 4031* this study n = 2; 4032* this study n = 2; 4033* this study n = 1, ITA n = 1; 4034* this study n = 2; 4035* this study n = 2; 4036* this study n = 1, BRA n = 1; 4037* this study n = 1, BRA n = 1. Three-letter country codes follow ISO 3166-1 alpha-3 (http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3).
** Lineage designations according to SITVIT2 using revised SpolDB4 rules; "Unknown" designates patterns with signatures that do not belong to any of the major lineages described in the database.
*** Clustered strains correspond to a spoligotype pattern shared by 2 or more strains within this study, as opposed to unique strains harboring a spoligotype pattern that does not match another strain from this study. Unique strains matching a preexisting pattern in the SITVIT2 database are classified as SITs, whereas in case of no match they are designated as "orphan".

Fig 1 (panel B: spoligotyping-based minimum spanning tree). The tree connects each genotype based on the number of changes required to go from one allele to another. Its structure is represented by branches (continuous, dashed and dotted lines) and by circles representing each individual pattern. The length of a branch represents the distance between patterns, while the line style denotes the number of allele/spacer changes between two patterns: solid lines, 1, 2 or 3 changes (thicker lines indicate a single change, thinner lines 2 or 3 changes); gray dashed lines, 4 changes; and gray dotted lines, 5 or more changes. The size of each circle is proportional to the total number of isolates with that pattern in our study, distinguishing unique isolates (smaller nodes) from clustered isolates (bigger nodes). The color of each circle indicates the phylogenetic lineage of the pattern. Node labels indicate the predominant SITs in the study (5 or more isolates). doi:10.1371/journal.pone.0160434.g001

Table 2. Description of clusters containing >1% (n = 5 or more isolates) in this study, and their worldwide distribution in the SITVIT2 database.

When the proportions of predominant SITs in Chile (n = 458 isolates) and their distribution in the major cities (Santiago, Iquique and Concepción) were compared, strains belonging to SIT91/X3 were localized to Iquique, whereas strains belonging to SIT37/T3, SIT60/LAM4, SIT64/LAM6, SIT283/H1 and SIT720/LAM3 were mostly isolated in Santiago, and strains belonging to the newly created SIT4014/X1 were predominantly found in Concepción (p-value<0.008; Table 3). The distribution of predominant SITs in this study also differed significantly from that of neighboring countries (Peru, Brazil, Paraguay, Argentina) as recorded in the SITVIT2 database (n = 7378 isolates; p-value<0.00001; Table 3 and Fig 1A). SIT33 is well represented in Chile (9.61% of isolates) compared with other countries. A similar pattern was observed for SIT211 and for SIT37, SIT64, SIT91, SIT283, SIT720, SIT1277, SIT4014 and SIT4015, which have either not yet been reported in those countries or are present there at very low percentages. On the other hand, SIT53, SIT50 and SIT42, which are well represented in our study (11.57%, 5.90% and 9.39%, respectively), have also been reported at relatively high percentages in the neighboring countries, indicating that these three SITs are widely distributed. All these results are collated in a phylogeographical distribution map (Fig 1A) of the major M. tuberculosis lineages in three of the most important Chilean cities enrolled in this study (Santiago, Iquique and Concepción) and in neighboring countries (Brazil, Paraguay and Peru; data extracted from the SITVIT2 database). Fig 1B shows the spoligotyping-based MST: LAM and T are the two predominant lineages in our study and include most of the isolates with the frequently observed patterns SIT33 and SIT42 (LAM) and SIT53 and SIT37 (T). Other isolates were classified into the Haarlem (most frequent pattern: SIT50) and X (most frequent pattern: SIT4014) lineages. Isolates of the Haarlem lineage were more distant from one another than those of the LAM or T lineages.
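An MST such as the one in Fig 1B links all patterns while minimizing the total number of spacer differences. The sketch below illustrates the idea with Prim's algorithm on the Hamming distance between binary spacer patterns, and maps each edge to the line styles described in the figure legend (1-3 changes solid, 4 dashed, 5 or more dotted). It is a simplified assumption-based illustration: BioNumerics applies its own distance coefficient and tie-breaking rules, so trees built this way need not reproduce Fig 1B exactly. All function names are hypothetical.

```python
def hamming(p: str, q: str) -> int:
    """Number of spacer positions at which two binary patterns differ."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(p, q))


def minimum_spanning_tree(patterns):
    """Prim's algorithm over Hamming distances; returns (i, j, distance) edges."""
    n = len(patterns)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Attach the outside node with the cheapest link to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Update the cheapest links of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = hamming(patterns[u], patterns[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def edge_style(distance: int) -> str:
    """Line style by number of spacer changes, following the Fig 1B legend."""
    if distance <= 3:
        return "solid"
    if distance == 4:
        return "dashed"
    return "dotted"


if __name__ == "__main__":
    # Toy 4-spacer patterns (hypothetical); real input would be 43 characters.
    for i, j, d in minimum_spanning_tree(["1111", "1110", "1100", "0000"]):
        print(i, j, d, edge_style(d))
```

Because an MST is undirected, it shows which patterns are closest but not which one descends from which; that is the role of the spoligoforest.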
Spoligotype analysis of the common SITs (SIT42, SIT53, SIT33, SIT50 and SIT37) showed that they consist of very closely related isolates, distinguished from one another by changes in only a few alleles/spacers. The spoligoforest tree generated with the Fruchterman-Reingold algorithm is shown in Fig 2. It allows the visualization of the genetic associations and putative mutation events in our strain sample: each node represents a spoligotype colored according to its lineage, while each edge represents a potential mutation event from the parental spoligotype. Spoligoforest analysis therefore not only helped identify genetic associations between MTB strains but also provided insight into their genetic diversity and evolution. Briefly, it confirmed the dominance of SIT42, SIT33 (LAM) and SIT53 (T), and to a lesser extent of SIT50 (H) and SIT37 (T). The SIT42/LAM9 cluster was the largest node derived from SIT53/T1, and multiple spoligotypes, e.g., SIT33/LAM3, appeared to arise from it, while SIT50/H3, the second largest node derived from SIT53/T1, also appeared to originate from it. Lastly, the newly created SIT4014/X1 (found only in this study) appeared to be a precursor of SIT91/X3 (predominantly found in the USA, Peru and Haiti).
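The directed edges of a spoligoforest rest on the assumption that spacers are lost but not regained, so a descendant's spacers must be a proper subset of its parent's. The sketch below illustrates that idea with a deliberately simple, hypothetical ranking of candidate parents: fewest contiguous deletion events first, then fewest lost spacers, then the largest cluster. It is not the Zipf-model weighting used by SpolTools; it only shows how parent-to-descendant links can be derived from spacer loss. Function names are illustrative.

```python
def spacer_set(pattern: str) -> set:
    """Indices of spacers present ('1') in a binary pattern."""
    return {i for i, ch in enumerate(pattern) if ch == "1"}


def deletion_events(lost: set) -> int:
    """Count runs of consecutive spacer indices (one run = one deletion event)."""
    return sum(1 for i in lost if i - 1 not in lost)


def infer_parents(sizes: dict) -> dict:
    """Map each pattern to a putative parent (None for roots).

    sizes: {binary pattern: number of isolates}. A candidate parent must
    carry every spacer of the child plus at least one more (loss-only model).
    Hypothetical ranking: fewest deletion events, then fewest lost spacers,
    then the largest cluster.
    """
    sets = {p: spacer_set(p) for p in sizes}
    parents = {}
    for child, child_set in sets.items():
        best = None
        for cand, cand_set in sets.items():
            if not child_set < cand_set:  # proper subset required
                continue
            lost = cand_set - child_set
            key = (deletion_events(lost), len(lost), -sizes[cand])
            if best is None or key < best[0]:
                best = (key, cand)
        parents[child] = best[1] if best else None
    return parents


if __name__ == "__main__":
    # Toy 4-spacer patterns with hypothetical cluster sizes.
    print(infer_parents({"1111": 5, "1101": 3, "1001": 2, "0111": 1}))
```

Treating a run of adjacent lost spacers as a single event reflects the common assumption that contiguous spacers are often deleted together.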

Discussion
In this study, we analyzed the genetic diversity of MTB isolates obtained from clinical samples from 15 regions of Chile. To the best of our knowledge, this is the first study investigating the genetic diversity of MTB across the entire country. The LAM (40.61%), T (34.06%) and Haarlem (13.54%) lineages were the most abundant among all M. tuberculosis isolates analyzed. Together, these three families accounted for 88.21% of all isolates, whereas all other families, namely Beijing, X, AFRI and Cameroon, accounted for only 4.81%. Interestingly, the X lineage was poorly represented in Santiago (1.66%), whereas it accounted for 11.76% of isolates in both Iquique and Concepción. This difference might be explained by (i) immigration patterns, as immigration from countries such as Peru, Colombia and Ecuador has greatly increased in recent years, or (ii) a genetic predisposition that makes the populations of these cities more susceptible to certain genotypes. The predominance of LAM, T and Haarlem is consistent with data from most South American countries, which report these as the most represented lineages in proportions similar to ours; the exact proportions vary, but these three families invariably dominate epidemiologically. Fig 1A plots the data available from three neighboring countries (Peru, Brazil and Paraguay) together with three of the major Chilean cities included in this study.

Fig 2. Spoligoforest tree based on all spoligotypes (n = 458 isolates). The spoligoforest was drawn using the Fruchterman-Reingold algorithm from the SpolTools software (http://www.emi.unsw.edu.au/spolTools) [31], and reshaped and colored using the GraphViz software (http://www.graphviz.org) [32]. Each spoligotype pattern from the study is represented by a node whose area is proportional to the total number of isolates with that pattern. Changes (loss of spacers) are represented by directed edges between nodes, with arrowheads pointing to descendant spoligotypes. The heuristic used selects a single inbound edge with maximum weight using a Zipf model. Solid black lines link patterns that are very similar, i.e., differing by the loss of one spacer only (maximum weight 1.0), while dashed lines represent links with weights between 0.5 and 1, and dotted lines links with weights below 0.5.

Several published reports on the population structure of MTB in South America have shown that each of these families is well represented from an epidemiological perspective [14][15][16][17][18][19][20][21][22]. In Brazil, Vasconcellos and collaborators showed by spoligotyping that LAM, T and Haarlem accounted for 43.6%, 34.9% and 18.3% of isolates, respectively. Similar results were found in a study of isolates from patients in 11 Brazilian states, in which the predominant MTB lineage was LAM (46%), followed by T (18.6%), Haarlem (12.2%), X (4.7%), S (1.9%) and, lastly, the East African Indian (EAI) (0.85%) families [14]. In Peru and Bolivia, however, the most represented family is reportedly Haarlem, followed by the LAM and T families [20,33,34]. For Argentina, there is comparatively little information in the literature, but available data show that Haarlem is predominant, except among MDR cases, where the LAM family dominates [35,36]. A recent study conducted in the border areas of Brazil, Argentina and Paraguay reported a higher presence of the LAM family [37].
These differences may be explained by (i) the number of isolates analyzed, (ii) the sampling area (a restricted region versus the whole country) and (iii) the sampling period, among other factors.
Fifteen SITs were the most represented (5 or more isolates each) in this study (Table 2). SIT53/T1 was the most prevalent, suggesting that this genotype is actively circulating in Chile, which is consistent with its wide worldwide distribution (Table 2) [22]. SIT33/LAM3 (9.61%) and SIT42/LAM9 (9.39%) are present in Brazil in proportions similar to those reported here, whereas in Peru and Paraguay these two SITs account for only about 3% of isolates (Table 3). Fig 1B shows the main SITs obtained in this study (labeled circles), as well as their genetic relationships based on distance and allele/spacer changes between patterns. The MST allows major evolutionary pathways of the MTB lineages to be inferred from the similarity between strains. It shows that the Beijing family lies far from the Euro-American lineages occupying the central nodes. We also concluded that strains of the LAM lineage show high genetic diversity and that the T and Haarlem lineages are genetically closer to each other.
Three previously published studies have analyzed the genetic diversity of MTB in Chile [25][26][27]. Although each was restricted to a single region of the country, all three concluded that the LAM family was the predominant lineage in Chile. A recent study [27] showed that almost two-thirds of the strains circulating in Santiago belonged to the LAM and T lineages. The most frequent clustered SITs were SIT33/LAM3 (10.7%), SIT53/T1 (8.7%), SIT50/H3 (7.8%) and SIT37/T3 (6.8%), a distribution that parallels our data. Last but not least, our study corroborates the high frequency of SIT37/T3 (n = 23) in the Chilean population, a pattern found mostly in Eastern Africa in the international database.
The National Control Program of Human Tuberculosis in Chile [13] has placed our country among the nations that have brought this disease under control. Compared with other South American countries, TB incidence in Chile is low, at only 12.5 per 100,000. During the 2009-2013 period, incidence was 46 per 100,000 in Brazil, 44 in Paraguay and 124 in Peru. The situation in Peru, with which Chile shares a border, deserves attention given the marked difference in incidence rates, which may reflect migration patterns and differing public health policies. As with other airborne diseases, the spread of MTB is facilitated by high population densities and crowded indoor environments that favor transmission of the pathogen. Other host-related factors that increase the chances of infection include (i) immune suppression, (ii) smoking, (iii) poor nutrition, (iv) diabetes and (v) respiratory comorbidities. All of these factors can also increase the risk of progression from latent infection to active TB [38]. This is a very important issue in the developing world, where overcrowding, malnutrition and HIV infection contribute to a high burden of disease.
Drug susceptibility testing performed as part of our study showed that 87.77% (n = 402) of the isolates were pan-susceptible, 9.61% (n = 44) were resistant to any drug and 1.31% (n = 6) were MDR; data could not be obtained for the remaining 1.31% (n = 6 isolates). Three of the six MDR strains belonged to SIT53/T1. In contrast, LAM (the most prevalent lineage in our study) was apparently overrepresented among MDR cases in the French departments of the Americas, based on data collected over an extended period [39], as well as in Peru [40]. Lastly, half of the SIT4015 isolates (a lineage with unknown signature) showed drug resistance. This pattern should be carefully monitored in the coming years to determine whether its emergence is linked to specific host factors, since different ethnic groups may differ in their susceptibility to various MTB strains [41]. Indeed, both SIT4014/X1 (n = 7 isolates) and SIT4015 (unknown signature; n = 7 isolates) have so far been found only in Chile. Since most of the isolates belonging to these SITs were collected in the city of Concepción and its vicinity, associated factors such as host genetic susceptibility should be taken into account when studying emerging MTB clones. A similar case can be made for SIT720/LAM3, which is so far restricted to South America. In summary, our results underline the importance of tracking circulating and emerging MTB clones, in conjunction with the development of phylogenetic databases, in order to maintain proper surveillance. These measures are a powerful way to safeguard against possible future outbreaks.
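The drug susceptibility percentages can be checked directly against the reported counts. The sketch assumes, as the counts summing to 458 suggest, that the four categories are mutually exclusive (i.e., the 44 isolates resistant to any drug do not include the 6 MDR isolates); the category labels and the helper name percentages are ours.

```python
def percentages(counts: dict, decimals: int = 2):
    """Return the total and each category's share (%) rounded to `decimals`."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, decimals) for k, v in counts.items()}


# Counts as reported in the text; mutual exclusivity of the categories is
# an assumption based on their sum (402 + 44 + 6 + 6 = 458).
susceptibility = {
    "pan-susceptible": 402,
    "resistant to any drug": 44,
    "MDR": 6,
    "no data": 6,
}

if __name__ == "__main__":
    total, shares = percentages(susceptibility)
    print(total, shares)
```

Under that assumption the shares come out as 87.77%, 9.61%, 1.31% and 1.31% of 458 isolates.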
Although spoligotyping can contribute to identifying outbreaks and tracking the spread of disease, it is known to overestimate the clustering of isolates. We therefore believe that a more discriminatory technique, such as MIRU-VNTR typing, would provide more in-depth information and help improve our knowledge of the current epidemiological situation in Chile.
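The discriminatory power of typing methods such as spoligotyping and MIRU-VNTR is commonly quantified with the Hunter-Gaston discriminatory index (HGDI), D = 1 - [1/(N(N-1))] Σ n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates sharing type j. The index was not computed in this study; the sketch below only shows how it could be obtained from one type label per isolate (for example, an octal spoligotype code or a MIRU-VNTR profile). The function name and example labels are hypothetical.

```python
from collections import Counter


def hunter_gaston_index(type_assignments) -> float:
    """Hunter-Gaston discriminatory index for one type label per isolate."""
    counts = Counter(type_assignments)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two isolates are required")
    # Ordered pairs of isolates that share the same type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical labels: two isolates share type "a".
    print(hunter_gaston_index(["a", "a", "b", "c"]))  # about 0.833
```

A value closer to 1 means the method separates unrelated isolates better, which is why combining spoligotyping with MIRU-VNTR typing is expected to reduce the overestimation of clustering.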
To conclude, in this study spanning 15 regions of Chile, we have shown that the LAM, T and Haarlem lineages together represent 88.2% of the isolates. This is consistent with data from neighboring countries, which also show a clear epidemiological predominance of Euro-American lineages. The emergence of SIT4014 and SIT4015, so far found only in Chile, suggests that conditions specific to Chile, along with the unique genetic makeup of the Chilean population, might have allowed a possible co-evolution of certain genotypes, leading to their relative success in causing disease.