Mapping of Mycobacterium tuberculosis Complex Genetic Diversity Profiles in Tanzania and Other African Countries

The aim of this study was to assess and characterize Mycobacterium tuberculosis complex (MTBC) genotypic diversity in Tanzania, in neighbouring East African countries, and in several other African countries. We used spoligotyping to type a total of 293 M. tuberculosis clinical isolates (one isolate per patient) collected in the Bunda, Dar es Salaam, Ngorongoro and Serengeti areas of Tanzania. The results were compared with data in the SITVIT2 international database of the Pasteur Institute of Guadeloupe. Genotyping and phylogeographical analyses highlighted the predominance of the CAS, T, EAI, and LAM MTBC lineages in Tanzania. The three most frequent Spoligotype International Types (SITs) were: SIT21/CAS1-Kili (n = 76; 25.94%), SIT59/LAM11-ZWE (n = 22; 7.51%), and SIT126/EAI5 tentatively reclassified as EAI3-TZA (n = 18; 6.14%). Furthermore, three SITs were newly created in this study (SIT4056/EAI5 n = 2, SIT4057/T1 n = 1, and SIT4058/EAI5 n = 1). We noted that the East-African-Indian (EAI) lineage was more predominant in Bunda, the Manu lineage was more common among strains isolated in Ngorongoro, and the Central-Asian (CAS) lineage was more predominant in Dar es Salaam (p-value<0.0001). No statistically significant differences were noted when comparing the HIV status of patients with the major lineages (p-value = 0.103). However, when grouping lineages into Principal Genetic Groups (PGG), we noticed that the PGG2/3 group (Haarlem, LAM, S, T, and X) was more associated with HIV-positive patients than the PGG1 group (Beijing, CAS, EAI, and Manu) (p-value = 0.03). This study provided a mapping of MTBC genetic diversity in Tanzania (containing information on isolates from different cities), in neighbouring East African countries and in several other African countries, highlighting differences in MTBC genotypic distribution between Tanzania and other African countries.
This work also highlighted spoligotyping patterns tentatively grouped within the newly designated EAI3-TZA lineage (characterized by the absence of spacers 2 and 3, and represented by SIT126), which seems to be specific to Tanzania. However, further genotyping information would be needed to confirm this specificity.


Introduction
Human tuberculosis (TB) remains a leading global cause of devastating, often severe, contagious and chronic respiratory disease. People with immune suppression are more susceptible to TB. With the Global Tuberculosis Report indicating that 13% of new TB cases worldwide are HIV-positive and that up to 78% of TB cases among people living with HIV worldwide are found in the African region [1], the impact of the disease is further magnified, especially in developing countries. In Tanzania, various studies have been conducted using both conventional and nonconventional means to characterize the disease and its spread. Some of these studies focus on epidemiology [2][3][4][5], diagnostics [6][7][8][9], treatment and control [2][10][11][12], strain molecular characterization [13][14][15][16], TB-HIV co-infections [17][18][19][20], challenges and resource limitation [21], and more recently, TB cross-species transmission at the human-animal interface [22][23][24]. In a recent study by our group in the Serengeti ecosystem [15], a strain that could not be assigned to any known spoligotype but resembled the CAS strain family was identified and tentatively named 'Serengeti strain'. Although various TB studies have been conducted in Tanzania so far, none has given detailed information on all the major phylogenetic lineages of tubercle bacilli and their distribution. In addition, none compared Tanzanian strain patterns with those prevailing in neighbouring countries and sub-regions to underline differences relating to the presence of specific lineages. All this information is important for establishing the phylogenetic relatedness of the strains causing disease within and between countries, as well as for establishing transmission links between individuals.
This study aimed specifically to describe the genotypic diversity of Mycobacterium tuberculosis complex (MTBC) in Tanzania, as well as in neighbouring and other African countries (those with spoligotyping data deposited in the SITVIT database), with reference to their global distribution. The study also assessed the association between HIV serological status and M. tuberculosis lineages among TB patients who also had HIV. To better highlight region-specificity or similarity and to get a broader view of MTBC distribution in Tanzania, MTBC genotypic diversity was characterized and compared with the genotypes available from other African countries. Based on genotyping information, this paper also tentatively proposes a sublineage which seems to be phylogeographically specific to Tanzania.

Study areas
The Tanzanian studies were conducted in two geographically different areas. The first is the Serengeti ecosystem, comprising Bunda, Serengeti and Ngorongoro districts in northern Tanzania [15]. The second is the Dar es Salaam region, along the coast in the eastern zone (newly reported). Dar es Salaam is located at 6°48' South, 39°17' East (−6.8000, 39.2833), along the natural harbour on the eastern coast of Africa, with sandy beaches in some areas (Population Distribution by Administrative Units, United Republic of Tanzania, 2013). Population density in Dar es Salaam is 3,133 persons per square kilometre. Increased birth and immigration rates, as well as transient populations, largely influence the relatively high population rate. The population densities in persons per square kilometre, with annual growth rates in brackets, for Bunda, Serengeti and Ngorongoro are 70 (1.8%), 22.4 (3.5%) and 11.2 (3.0%), respectively.

Neighbouring countries
The main neighboring countries of Tanzania are Kenya and Uganda in the north; Rwanda, Burundi, and the Democratic Republic of the Congo in the west; and Zambia, Malawi, and Mozambique in the south (http://en.wikipedia.org/wiki/Tanzania). The description of M. tuberculosis strain profile used the findings from the current study in Tanzania by Mbugi et al [15] and already published data of existing patterns from other countries in the international genotyping database SITVIT2.

Preliminary Laboratory sample processing and Ethics Approval
All preliminary samples processing post field collection for Tanzanian  . Participants consented to enrol in the study after completing informed consent forms. When patients were found to have tuberculosis, treatment was offered as per the Tanzanian National Guidelines for management of tuberculosis. Information on infections other than tuberculosis was also collected from patients; HIV serology was of particular interest.

Spoligotypes and new names assignment
Spoligotyping was performed using a commercially available spoligotyping kit (Isogen, Bioscience BV, Maarssen, The Netherlands) as previously described by Kamerbeek et al [25]. The resulting spoligotypes were reported in octal and binary formats ( Table 1) and compared to existing patterns in an international genotyping database SITVIT2, which is an updated version of previously released SITVITWEB database [26] available at: http://www.pasteurguadeloupe.fr:8081/SITVIT_ONLINE/. Spoligotype patterns were grouped as spoligotype international types (SITs) if they shared identical spoligotype patterns with patterns present in the existing database. Spoligotypes that had no match with previously reported patterns, represented by unique exemplars, were considered as orphans. The database provided tools for queries and conversion into binary or octal values of spoligotypes, as well as geographical distribution maps for worldwide comparisons. Spoligotyping data obtained from Tanzania were compared with patterns from neighbouring countries and sub-regions in order to underline differences relating to the presence or absence of specific lineages. This includes the 'Serengeti strains' that were recently reported in Serengeti ecosystem [15].
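The relationship between the binary and octal formats mentioned above can be illustrated with a short sketch. It assumes the widely used convention in which the 43 spacers are written as a string of 1 (present) and 0 (absent), the first 42 positions are read as 14 triplets giving one octal digit each, and spacer 43 is written as a final 0 or 1. The function names are illustrative only and are not part of SITVIT2 or of the spoligotyping kit; the example pattern (spacers 33 to 36 absent) is a generic illustration and is not taken from Table 1.

```python
def binary_to_octal(binary: str) -> str:
    """Convert a 43-character spoligotype ('1' = spacer present) to a 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42: 14 triplets, each triplet read as one octal digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 stands alone and is written as 0 or 1.
    digits.append(binary[42])
    return "".join(digits)


def octal_to_binary(octal: str) -> str:
    """Inverse conversion: 15-digit octal code back to the 43-character binary string."""
    if (len(octal) != 15
            or any(c not in "01234567" for c in octal[:14])
            or octal[14] not in "01"):
        raise ValueError("expected 14 octal digits followed by 0 or 1")
    bits = [format(int(d), "03b") for d in octal[:14]]
    bits.append(octal[14])
    return "".join(bits)


# Illustrative pattern: all spacers present except spacers 33 to 36.
example = "1" * 32 + "0" * 4 + "1" * 7
print(binary_to_octal(example))  # 777777777760771
```

A round trip through both functions returns the original string, which is a convenient sanity check before querying a database with either format.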

Data analysis
Data were entered using Microsoft Excel and later transferred to the relevant software for analysis. STATA version 12 and MegaStat software were used for the statistical analysis, and a p-value <0.05 was considered significant. Means and standard deviations (SD) were calculated for the age of patients. Odds ratios (OR) and 95% confidence intervals (CI) were calculated for the comparison of HIV status between the PGG1 and PGG2/3 lineage groups. The spoligotype patterns from the Dar es Salaam and Serengeti ecosystem studies were pooled and reanalysed for comparison with already known patterns from neighbouring countries (Kenya, Uganda, Zambia, Malawi, and Mozambique), other African countries (Sudan, Ethiopia, Nigeria, Cameroon, Namibia, South Africa, Zimbabwe, and Madagascar) and those available globally in the SITVIT2 database.

Phylogenetic analysis and mapping
MLVA Compare V1.03 software (Genoscreen; Lille, France) was used to draw the minimum spanning trees (MSTs). The SIT or orphan spoligotype number appeared inside each node, and the distance (number of spacers of difference) between two nodes was shown on the edge linking these nodes. These phylogenetic trees were coloured according to various characteristics such as the MTBC lineages described in SITVIT, the cities of isolation, and the HIV serological data of patients. The MST is an undirected, connected graph that links all isolates together with the fewest possible linkages between nearest neighbours. Furthermore, a spoligoforest was drawn as a "hierarchical layout" using the SpolTools software (available through http://www.emi.unsw.edu.au/spolTools; [27,28]). As opposed to the MST, the spoligoforest is a directed and not necessarily connected phylogenetic tree illustrating the parent-to-descendant relationships between spoligotypes (given that spoligotypes evolve mainly by loss of spacers).
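The minimum spanning tree idea described above (nodes are spoligotype patterns, edge weights are the number of spacers that differ) can be sketched as follows. This is a minimal illustration using Prim's algorithm, not a reproduction of what MLVA Compare does internally, which the text does not specify. The four patterns and their labels P1 to P4 are hypothetical and are not isolates from this study.

```python
from __future__ import annotations


def spacer_distance(a: str, b: str) -> int:
    """Number of spacer positions at which two binary spoligotypes differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(patterns: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm: return edges (node_in_tree, new_node, distance)."""
    names = list(patterns)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge that joins a node in the tree to one outside it.
        dist, u, v = min(
            (spacer_distance(patterns[u], patterns[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical 43-spacer patterns, for illustration only.
patterns = {
    "P1": "1" * 43,                              # all spacers present
    "P2": "1" * 32 + "0" * 4 + "1" * 7,          # spacers 33-36 absent
    "P3": "0" + "1" * 31 + "0" * 4 + "1" * 7,    # as P2, spacer 1 also absent
    "P4": "1" + "00" + "1" * 40,                 # spacers 2 and 3 absent
}

for u, v, dist in minimum_spanning_tree(patterns):
    print(f"{u} -- {v}: {dist} spacer(s) of difference")
```

The tree is undirected, so each edge only states that two patterns are nearest neighbours; it says nothing about which pattern is the ancestor, which is the question the spoligoforest addresses.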
The TBVis tool (available at http://tbinsight.cs.rpi.edu/; [29,30]) was used to visualize and map the spoligotypes shared between different lineages, split by city of isolation and by HIV serology of patients. Maps were reproduced and designed according to the terms of the Creative Commons 3.0 Attribution License (http://creativecommons.org/licenses/by/3.0/).

Table 2). The male/female sex ratio was 156/137 or 1.14. HIV serology distribution was not significantly different between female and male patients (p-value = 0.931), nor between the various cities of isolation (p-value = 0.921).

Distribution of Spoligotype International Types (SITs) in this study
A total of 54/57 SITs containing 264 isolates matched a pre-existing shared type in the SITVIT2 database, whereas 3/57 SITs (n = 4 isolates) were newly created. A total of 27/57 SITs containing 238 isolates were clustered within this study (2 to 76 isolates per cluster), while 30/57 SITs containing 30 isolates were unique. In addition, 25 orphan strains were identified, which brings the number of unclustered isolates in this study to 55/293 (18.8%) and of clustered isolates to 238/293 (81.2%). The distribution of Spoligotype International Types (SITs) in this study is shown in Table 3.
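The counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this section; the variable names are illustrative.

```python
total_isolates = 293

# Isolates assigned to a SIT (pre-existing or newly created) plus orphans.
isolates_in_existing_sits = 264
isolates_in_new_sits = 4
orphan_isolates = 25
assert isolates_in_existing_sits + isolates_in_new_sits + orphan_isolates == total_isolates

# Clustered versus unclustered isolates within this study.
clustered_isolates = 238
unique_sit_isolates = 30
unclustered_isolates = unique_sit_isolates + orphan_isolates
assert clustered_isolates + unclustered_isolates == total_isolates


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(unclustered_isolates, percent(unclustered_isolates, total_isolates))  # 55 18.8
print(percent(clustered_isolates, total_isolates))                          # 81.2
```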

Phylogenetic lineages vs. demographic characteristics
The distribution of phylogenetic lineages by gender showed no significant difference (Table 5). Strains belonging to the CAS lineage were the most prominent, with the highest proportion in Dar es Salaam followed by Ngorongoro, while Bunda and Serengeti had nearly equal proportions. The proportion of strains belonging to the ill-defined T lineage was the second highest, with Bunda having the highest proportion, followed by Serengeti, Ngorongoro and Dar es Salaam. The EAI lineage was also found in relatively high proportions, predominantly in Bunda, Dar es Salaam and Ngorongoro, with Serengeti having the lowest proportion. LAM lineage strains were found in high proportions particularly in Ngorongoro and Bunda. For patients who had HIV in association with TB, the predominant strains seemed to belong to the LAM and T lineages. Other strains were found in small proportions within specific demographic groups (Table 5).

Distribution map of Mycobacterium tuberculosis lineages in African countries
Comparison of the phylogenetic lineages found in our study with those isolated in neighbouring and other African countries (representing a total of 9922 isolates) revealed the predominance of the CAS family, especially in the north (Sudan) and in the Eastern region covering Ethiopia, Kenya, Tanzania and Madagascar (Fig 1). Small proportions of the CAS lineage were also recovered in Uganda, Zambia, Malawi and Mozambique. The EAI lineage was found to be more predominant in Mozambique, Malawi, Sudan, Tanzania and Madagascar, with traces of this lineage found in Kenya, Uganda, Zambia, Zimbabwe and Ethiopia. The predominance of the LAM lineage was notable in Namibia, Zimbabwe, Zambia, Malawi, Mozambique, South Africa, Kenya, Tanzania and Madagascar. Traces of this lineage were found sporadically in Sudan, Ethiopia, Cameroon, Nigeria and Uganda. The T family was notable in all areas where data on TB in Africa were available; however, the smallest proportion of this lineage was noted in Sudan (Fig 1). The Beijing strain spreads out in the Eastern part of Africa (particularly Kenya, Tanzania, and Mozambique) all the way to South Africa including ).

Table footnotes. * SITs followed by an asterisk are "newly created" SITs, due to 2 or more strains belonging to an identical new pattern within this study or after a match with an orphan in the database; SIT designations followed by number of strains: 4056* this study n = 2; 4057* this study n = 1, SDN n = 1; 4058* this study n = 1, NLD n = 1. ** Lineage designations according to SITVIT2; "Unknown" designates patterns with signatures that do not belong to any of the major lineages described in the database. *** Clustered strains correspond to a similar spoligotype pattern shared by 2 or more strains "within this study", as opposed to unique strains harboring a spoligotype pattern that does not match another strain from this study. Unique strains matching a preexisting pattern in the SITVIT2 database are classified as SITs, whereas in case of no match, they are designated as "orphan".

Tanzania) and SIT26/CAS1-Delhi. Among the EAI lineage, we can note the high prevalence of SIT126/EAI5 (tentatively relabeled EAI3-TZA) and SIT8/EAI5. SIT53/T1 was the best representative of the T lineage, and SIT59/LAM11-ZWE was predominant among the LAM lineage. One may notice that the phylogenetic lineages were rather well organized in the MST (Fig 2A). Furthermore, the spoligoforest provided in S1 Fig allows a better visualization of the main SITs and their tentative parent-descendant relationships. Like Fig 2A, Fig 2B illustrated relationships among spoligotypes, but according to the cities of isolation. This figure (Fig 2B) showed the distribution of spoligotypes across the cities of isolation in this study (Bunda, Dar es Salaam, Ngorongoro, and Serengeti). Briefly, SIT21/CAS1-Kili was more common in Dar es Salaam followed by Ngorongoro; SIT26/CAS1-Delhi in Serengeti, SIT53/T1 in Bunda, SIT8/EAI5 in Dar es Salaam, SIT126/EAI3-TZA in Bunda, and SIT59/LAM11-ZWE in both Ngorongoro and Bunda (Fig 2B). A TBVis graph representation is provided in S1 Fig, to illustrate the distribution of spoligotyping patterns and lineages by city of isolation and HIV serological information of patients.

Spoligotype geographic distribution and migration routes
The analysis of spoligotype patterns from Tanzania based on the absence of specific spacers is shown in Fig 3A. The absence of spacers 2 and 3 indicated an EAI sublineage that seemed to be specific to Tanzania (pending further investigations). In total, there were 18 isolates belonging to SIT126 and 12 isolates belonging to SIT8. The peculiarity of SIT126 led to the tentative re-labeling of the lineage as "EAI3-TZA" to indicate its specificity to Tanzania. This strain seemed to dominate in southern Asia, particularly the Indian sub-continent (India and Sri Lanka), in the Middle East (Oman) and in Tanzania. SIT8 seems to be prevalent in Northern Europe (particularly in Denmark and Norway), the Middle East (particularly in Oman), Eastern and Central Africa (particularly in Mozambique and Zambia), as well as the southern part of North America, particularly Mexico. The geographic distribution (intensity) maps of SIT126 and SIT8 (by percentage in country, Fig 3B) were established according to data recorded in the SITVIT2 database.

General spoligotyping results
The present study reports, for the first time, the mapping of MTBC in Tanzania and other African countries. The study combined spoligotyping data from Dar es Salaam city along the coast and from the Serengeti ecosystem, which comprises Bunda, Serengeti and Ngorongoro, with a heterogeneous population engaged in a variety of income-generating activities. Dar es Salaam and Bunda are the busiest cities because their location connects several other parts of Tanzania. The distribution of lineages/sublineages of isolates from Tanzania is shown in Table 1. Generally, our findings are similar to the spoligotyping results previously reported [13,14,16], with the exception that a new EAI3 strain that seems to be specific to Tanzania was identified. We have tentatively re-labeled this sublineage as "EAI3-TZA" with reference to the international genotyping database SITVIT2, which is an updated version of the previously released SITVITWEB database [26].

TB and HIV Status
HIV infection has long been associated with tuberculosis, and this co-infection worsens the outcome of both diseases. Among TB patients whose HIV serology was known (214 strains), HIV-positive patients accounted for 21 (9.8%) cases, a finding that calls for joint consideration in designing control strategies. Although there were no statistically significant differences when comparing individual lineages versus HIV serology of the patients (p = 0.103), the proportion of LAM lineage isolates that came from HIV-positive patients (n = 6/38 or 15.79%) provides clues of HIV-TB strain specificity. Some reports have indicated similar associations [31] while others have not [14]. Given the multi-drug resistance reported for the LAM lineage in some areas [32], this finding is a warning for planning TB control strategies in HIV co-infection. It is important to note that a notable proportion of T lineage isolates (n = 7/57 or 12.28%) also came from HIV-positive patients.

Fig 2 (caption excerpt). The phylogenetic tree connects each genotype based on the degree of change required to go from one allele to another. The structure of the tree is represented by branches (continuous vs. dashed and dotted lines) and circles representing each individual pattern. Note that the length of the branches represents the distance between patterns, while the style of the lines (continuous, gray dashed and gray dotted) denotes the number of allele/spacer changes between two patterns: solid lines, 1 or 2 changes (thicker lines indicate a single change, thinner lines indicate 2 changes); gray dashed lines, 3 changes; and gray dotted lines, 4 or more changes. The size of the circle is proportional to the total number of isolates in our study, illustrating unique isolates (smaller nodes) versus clustered isolates (bigger nodes). The color of the circles indicates the phylogenetic lineage to which the specific pattern belongs. doi:10.1371/journal.pone.0154571.g002
Despite variable reports on the association between TB lineages and HIV [33][34][35][36][37], it seems likely that specific strains are associated with HIV. This association might depend on the dominant strain and is probably region-specific, and may be determined by the transmission chains of infection. Furthermore, when comparing modern Euro-American PGG2/3 strains and ancestral PGG1 strains versus HIV serology, we found a significant difference: PGG2/3 strains were significantly more associated with HIV-positive patients than PGG1 strains (p-value = 0.03; OR = 2.88, 95% CI [1.003-9.454]). This significant association between the PGG2/3 (modern) lineage group and HIV-positive patients may provide some clues regarding the treatment of co-infected TB/HIV cases, which is presently the most commonly anticipated scenario. It could also mean that patients infected by evolutionarily recent M. tuberculosis strains have a greater chance of reactivation from latent infection to disease than those infected with ancestral strains such as M. africanum. The descriptive statistics on population characteristics showed that HIV serology distribution was not significantly different between female and male patients (p-value = 0.931), nor between the various cities of isolation (p-value = 0.921). These findings could mean that exposure to infection is more relevant and critical in establishing a causal relationship than interactions between the disease and population characteristics. Previous reports [1] have indicated that the 15-40 year age range, which is the most active and productive age, is at a higher risk of TB infection than the rest. TB-HIV co-infection is worth noting because each disease exacerbates the impact, or rather the progression, of the other.
For example, the risk of progressing from latent to active TB in people living with HIV is said to be between 12 and 20 times greater than in those without HIV infection (www.who.int/tb/publications/global_report/). Despite the reported challenges in integrating TB and HIV control services (TB Facts.org, http://www.tbfacts.org/tb-hiv.html), it is ideal to have a deliberate disease control approach focused on TB-HIV co-infection in endemic areas. This is particularly critical when HIV infection impacts on the epidemiology of drug-resistant TB. The idea is to reduce the high risk of MDR TB/HIV developing into a co-epidemic [38], worsened by drug-resistant HIV infection. A combined TB/HIV vaccine, which could include a BCG vaccine carrying combinations of both mycobacterial and HIV antigens [39,40], has been suggested; however, integrating knowledge of the mechanistic, interactive immune response in tuberculosis-HIV co-infection remains a challenge [40]. Moreover, the current BCG vaccine does not confer effective and reliable protection against the prevalent pulmonary form of TB in adults in countries near the equator.
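For readers less familiar with the odds ratio reported above for PGG2/3 versus PGG1 strains, the following sketch shows how an odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table. It uses the log (Woolf) approximation, which is an assumption made for illustration: the text does not state which interval method was used, and exact methods give somewhat different limits. The counts in the example are hypothetical and are not the study's data, which are not given in this section.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate CI (Woolf log method) for a 2x2 table.

    a: group 1, outcome positive    b: group 1, outcome negative
    c: group 2, outcome positive    d: group 2, outcome negative
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not taken from this study):
# group 1 = PGG2/3, group 2 = PGG1; outcome = HIV-positive.
or_, lo, hi = odds_ratio_with_ci(a=10, b=40, c=5, d=60)
print(f"OR = {or_:.2f}, approximate 95% CI [{lo:.3f}-{hi:.3f}]")
```

An interval whose lower limit sits just above 1, as in the reported result, indicates an association that is statistically significant but estimated with considerable uncertainty.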

Distribution patterns of phylogenetical lineages
Interestingly, this study revealed a low prevalence (around 3%) of isolates belonging to the Manu family. Furthermore, a visible cleavage was noted in the lineage distribution across the studied cities (Fig 1). Bunda is a city located in the north-western part of the country near Lake Victoria, as opposed to Dar es Salaam, which is located in the eastern part of Tanzania. We noted that the EAI lineage was more predominant in Bunda, the Manu lineage was more common among strains isolated in Ngorongoro, and the CAS lineage was more predominant in Dar es Salaam (p-value<0.001); this observation may point to an agglomeration of EAI around Lake Victoria, but further investigations will be needed to clarify this, as reflected by phylogeographical data.
The EAI strain is a highly polymorphic lineage [41] with considerably low transmission [42] that seems to have emerged over 13,000 years ago (most recent common ancestor), with a markedly different genetic structure compared to other human tuberculosis strains [43]. The strain belongs to lineage 1, the Indo-Oceanic family of TB strains [44], which includes EAI-5, EAI1-SOM, EAI2-Manila, EAI2-Nonthaburi, EAI3-IND, EAI4-VNM, EAI6-BGD1, EAI7-BGD2, EAI8-MDG [45] in the older classification [46], and which is evolutionarily primitive [47,48]. This strain has been proposed to have spread to East Africa from the Asian continent (Southern India); its predominance around the Lake zone might be due to the zone's closeness to the Serengeti National Park. The predominance of the CAS strain in Dar es Salaam could be due to the city being very close to the Indian Ocean, which links our country with the Asian continent. The CAS strain is said to have emerged over 9,000 years ago [43], spreading globally. Available genetic tools, spoligotyping and MIRU-VNTR, have made classification, surveillance of TB transmission trends, and epidemiology possible [30]. The CAS strain, for example, has been classified by other researchers as belonging to the East-African Indian TB strain, lineage 3 [44]. In the SITVIT database, this lineage (labeled CAS) includes CAS1-Delhi, CAS1-Kili, CAS2 and CAS (or CAS-like) [45]. Originally, while the EAI lineage is more ancestral and predominates in the southern part of India, the CAS lineage is more modern and predominant in the north of India [49]. The Manu lineage is a newly described ancient clade, closely related to the ancestral EAI strains of Indo-Oceanic lineage 1 [44]. The presence of the Manu lineage in Ngorongoro could be a new introduction or a descendant of the EAI strain. The two lineages are closely related and have been proposed to share a common ancestor [49] or to represent an intermediate lineage between ancestral and evolutionarily modern M. tuberculosis [50]. The presence of Manu lineage strains in the area could also be a warning sign, as this lineage has been reported to be associated with HIV infection [33].

Tentative designation of EAI3-TZA as a new Tanzania-specific EAI sublineage
East Africa has been exposed to nearly all TB strains from all over the world. As elsewhere in sub-Saharan Africa where HIV/AIDS is endemic, TB caseloads have increased fivefold or more within a decade in eastern and southern African countries [51]. This is partly attributed to historical movements to and from Africa since colonial times, particularly because movements were not strictly restricted. Studies done in different places on the Asian continent have shown the presence of confined local [52,53] and predominant ancestral TB strains [54] whose worldwide spread might indicate their origins. This might signify that, despite the ancestral origins of TB strains, microevolution is taking place within the MTBC lineages, giving rise to new locally evolving strains that share specific signatures [53]. Evolution of M. tuberculosis strains analyzed using spoligotyping occurs generally through the loss of some spacers. We understand that evolution is part of adaptation, but it is still not known whether this evolution contributes to the 'host-pathogen compatibility' previously reported by Gagneux et al [44]. Despite the limited diversity in M. tuberculosis compared to other infectious bacteria [55], evolution in M. tuberculosis complex strains may result in new, largely strain- and lineage-specific characteristics that can influence the host interaction and pathogen virulence and, consequently, determine the pathological potential and sequelae of infection [56]. Furthermore, a large proportion of SIT126/EAI5 isolates was collected for the first time. The particular spoligotyping signature of this profile (absence of spacers 2 and 3) could indicate that it is endemic to East Africa (and particularly Tanzania). SIT126 could also be a precursor of the EAI3-IND and EAI8-MDG sublineages.
We therefore suggest a new EAI sublineage, tentatively designated EAI3-TZA, which would be specific to Tanzania and is mostly represented by SIT126 and its descendant SITs (Fig 3). This lineage, which was initially proposed as 'Serengeti strains' [15], seems to be spread not only in the Serengeti ecosystem but also in other areas of Tanzania and possibly nearby countries. The lineage appeared to have spread in South India, Sri Lanka, and other Southeast Asian countries like Malaysia. This spread could be explained by ancestral Afro-Asian trade networks that existed a long time ago (http://castinet.castilleja.org/users/pmckee/africaweb/swahilistates.html).
This specific family may have spread to other countries bordering the Indian Ocean for reasons of historical trade or common history, as with Oman, which has historical links with Tanzania via Zanzibar (http://fr.wikipedia.org/wiki/Oman). However, the notable proportion of SIT8 detected in Mexico could raise other questions. Several potential migration routes, such as the route from East Africa through the Middle East, South Asia, East Asia and North Asia to the Americas via the Bering Strait, could explain the presence of SIT8 in Mexico. Nonetheless, more investigations are clearly needed to support this hypothesis. Moreover, the large block of absent spacers (2-13) in SIT8 does not allow us to draw significant conclusions on this profile. Parallel evolution might also be taking place within the M. tuberculosis complex, providing another possible explanation for the varying TB strain profiles across regions.

Varying distribution patterns of Mycobacterium tuberculosis lineages in African countries
Variation in the predominance of phylogenetic lineages in neighbouring and other African countries (Fig 1) largely reflects the different ancestral origins of TB strains on the continent. It may also indicate diversity in global TB strains [57]. We noticed that the CAS family was predominant in the north (Sudan) and in the Eastern region covering Ethiopia, Kenya, and Tanzania, and to a lesser extent in Madagascar. This lineage is said to descend from Central and Middle Eastern Asia and consists of the relatively newly defined sub-lineages CAS1 and CAS2 [58]. Similarly, the EAI lineage was mostly present in Mozambique, Malawi, Sudan, Tanzania and Madagascar, and to a lesser extent in Kenya, Uganda, Zambia, Zimbabwe and Ethiopia. These strains are originally Asian, particularly from Central Asia [49], and the region they cover lies near the Eastern route to and from Asia (Fig 3B). On the other hand, the predominance of the LAM lineage was notable in Namibia, Zimbabwe, Zambia, Malawi, Mozambique, South Africa, Kenya, Tanzania and Madagascar. This strain seemed to largely predominate in the southern and western parts of Africa (Fig 1), which lie near the South American-African migratory routes (Fig 3B). This might indicate that these strains originally had focal points of landing in Africa and later spread within Africa due to movements within the continent. We can suggest that while strains like LAM are spreading from the south towards the north along the eastern region, strains such as CAS, EAI, and Beijing spread in the opposite direction (Fig 1). The T family is notably widespread in most of the areas where TB data in Africa are available. This strain might largely signal evolutionary changes that are occurring in the MTBC [53].
The varying predominance of other minor TB strains in Africa could be a result of intermittent migration in and out of Africa due to continued past and recent movements. For example, the Manu family, which was also detected and reported in our study, could originate from the north, as it seems to prevail in Sudan and Uganda. The dynamics of lineage composition may depend on different selection pressures [59]. The X and S families seemed to have a limited distribution across African countries. As noted in Fig 1, unknown TB strains were detected in the majority of the studied countries. These strains might be newly introduced or evolving strains. The other strains, CAM, TUR, AFRI and BOV, which were reported in other African countries, were not detected in our Tanzanian study, further signifying the localization and originality of various TB strains. It could be that, evolutionarily, these various pathogens have adapted to specific human populations [60]. The Beijing strain, for instance, is said to be very successful and virulent, and to pose a potentially high risk for the development of anti-TB drug resistance and multidrug resistance. This strain has an Asian ancestry with a geographical origin centered in northern China, Korea and Japan, spreading out in waves [61] to other areas around the world. The worldwide spread of the Beijing lineage raises concern about the spread of drug resistance because of its reported association with drug resistance as well as multi-drug resistance [62][63][64]. With evolution, new strains can be expected. Newly emerging strains are usually more favoured than their ancestors, making evolution a useful adaptation tool for the pathogen. New strains of the EAI lineage have been reported elsewhere [65][66][67], indicating a continuum of evolution within this lineage. Similarly, in this study, we found a new EAI strain that does not look like other reported strains, and it was tentatively renamed EAI3-TZA.
In conclusion, this study provided a mapping of MTBC genetic diversity in Tanzania (containing information on isolates from different cities) and in several neighbouring African countries, allowing us to underline potentially new spoligotyping patterns. However, further genotyping information, such as MIRU-VNTR typing, would be needed to pinpoint epidemiologically important clusters, which is the next important step in devising an appropriate TB control strategy in Tanzania.

Supporting Information
S1 Fig. Spoligoforest tree drawn using the SpolTools software (available through http://www.emi.unsw.edu.au/spolTools; Reyes et al. [27], Tang et al. [26]), and shown as a Hierarchical Layout. The Figure was drawn on all patterns including orphan patterns (n = 293). Each spoligotype pattern from the study is represented by a node with area size being proportional to the total number of isolates with that specific pattern. Changes (loss of spacers) are represented by directed edges between nodes, with the arrowheads pointing to descendant spoligotypes. In this representation, the heuristic used selects a single inbound edge with a maximum weight using a Zipf model. Solid black lines link patterns that are very similar, i.e., loss of one spacer only (maximum weight being 1.0), while dashed lines represent links of weight between 0.5 and 1, and dotted lines a weight less than 0.5. (PDF)
S1 Table. Detailed demographic, epidemiologic and genotyping information on Tanzanian M. tuberculosis isolates. Note that all strains were pansusceptible, and were isolated from newly diagnosed, sputum smear/culture positive pulmonary TB patients. New SITs are followed by an asterisk (*) and highlighted in yellow. Orphan spoligotypes are highlighted in blue. (PDF)

UK and staff are acknowledged for collaboration. We thank the participants and the authorities in Tanzania for allowing us to conduct our study in Dar es Salaam and the Serengeti ecosystem. The WHO Supranational TB Reference Laboratory, Tuberculosis & Mycobacteria Unit, Institut Pasteur de la Guadeloupe is acknowledged for stimulating ideas on mining data from the TB database for the evaluation of TB strains in Tanzania in relation to other African countries.