A Mycobacterial Perspective on Tuberculosis in West Africa: Significant Geographical Variation of M. africanum and Other M. tuberculosis Complex Lineages

Background
Phylogenetically distinct Mycobacterium tuberculosis lineages differ in their phenotypes and pathogenicity. Understanding the phylogeographic structure of mycobacterial populations is therefore essential for the design, interpretation and generalizability of clinical trials. To date, no comprehensive effort has been made to establish the West African mycobacterial population structure on a sub-continental scale, although such knowledge has diagnostic implications and can inform the design of clinical TB trials.

Methodology/Principal Findings
We collated novel and published genotyping (spoligotyping) data and classified spoligotypes into mycobacterial lineages/families using TBLineage and Spotclust. Phylogeographic analyses comprised logistic regression and a linear axis plot analysis in GenGIS of a phylogenetic tree constructed in MIRU-VNTRplus. Combining spoligotyping data from 16 previously published studies with novel data from The Gambia, we obtained a total of 3580 isolates from 12 countries and identified 6 lineages comprising 32 families. Using stringent analytical tools, we demonstrate for the first time a significant phylogeographic separation between western and eastern West Africa, not only of the two M. africanum lineages (West Africa 1 and 2) but also of several major M. tuberculosis sensu stricto families, such as LAM10 and Haarlem 3. Moreover, a longitudinal logistic regression analysis for grouped data showed that M. africanum West Africa 2 remains a persistent health concern.

Conclusions/Significance
Because of the geographical divide of the mycobacterial populations in West Africa, individual research findings from one country cannot be generalized across the whole region. The unequal geographical distribution of families should be considered in the placement and design of future clinical trials in West Africa.


Introduction
West Africa comprises 15 countries with a combined population of 245 million (S1A Fig); 13 of these countries are among the 42 countries with the lowest human development index worldwide [1]. Consequently, the region faces great challenges in controlling infectious diseases such as tuberculosis (TB). Clinical trials addressing local health needs are urgently required to understand and tackle the TB epidemic in West Africa.
The composition of the endemic mycobacterial population infecting study participants can have a major impact on the outcomes of TB clinical trials and should ideally be accounted for when planning any project [2]. Considering bacterial variation between study sites is also essential for estimating to what extent country-specific results can be generalized to the whole of West Africa.
The M. tuberculosis complex (MTBc) can be divided into six major lineages: the Indo-Oceanic (L1), East-Asian (L2), Central Asian (L3) and Euro-American (L4) lineages, and the two endemic African lineages M. africanum West Africa 1 (MAF1, L5) and M. africanum West Africa 2 (MAF2, L6) [3]. Although MAF1 seems to be disappearing in some countries, the longitudinal development of MAF2 is not known. Each of these phylogenetically distinct lineages can be further subdivided into mycobacterial families, for example the Latin-American-Mediterranean (LAM) and Haarlem families within the Euro-American lineage [3].
Interestingly, and for reasons not yet understood, West Africa is the only region in the world in which all six major human lineages are present. Because of this exceptional diversity, future West African trials need to account for bacterial variability even more than trials in other parts of the world. The aim of the present study was therefore to describe the geographical distribution and spatial variation of mycobacterial families across the region.

Methods
Search strategy and spoligotype analysis
We searched PubMed using the terms "spoligotype" and "spoligotyping" combined with each country name. Studies on pulmonary TB published up to December 2014 were included if spoligotypes were available for all isolates. Individual spoligotypes designated as mixed infections were excluded. Where several publications analysed the same dataset, the most comprehensive collection was selected. Studies of M. bovis conducted in high-risk populations (abattoir staff) were excluded. To assign mycobacterial families to isolates and to ensure comparability between datasets, we re-analysed the extracted spoligotype information using a standardized approach. Isolates were classified into families using the online platform "Spotclust" at default settings. Because Spotclust identifies M. africanum isolates but does not distinguish between MAF1 and MAF2, "TBLineage" was additionally applied to the isolates that Spotclust had identified as M. africanum [4]. Spotclust and TBLineage are mathematical algorithms that have been shown to reliably identify mycobacterial lineages and families from their signature spoligotype patterns; the algorithms and their performance are described in detail elsewhere [4,5]. The lineage/family distribution per country/study site was plotted as choropleth maps generated using QGIS 2.0.1 (http://qgis.osgeo.org).
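To illustrate what a "signature spoligotype pattern" is, the sketch below decodes the 15-digit octal code in which spoligotypes are commonly reported into 43 spacer presence/absence calls and applies a few crude signature rules. It is not Spotclust or TBLineage, which are probabilistic models. The function names and the rule set (Beijing: spacers 1-34 absent and 35-43 present; MAF2: spacers 7-9 and 39 absent; MAF1: spacers 8, 9 and 39 absent) are assumptions chosen for illustration only.

```python
# Sketch only: simplified spacer-signature rules, NOT the Spotclust or TBLineage algorithms.

def octal_to_binary(octal: str) -> str:
    """Convert a 15-digit octal spoligotype code to a 43-character 0/1 string.

    Digits 1-14 each encode three spacers; digit 15 encodes spacer 43 alone.
    """
    if len(octal) != 15 or any(d not in "01234567" for d in octal):
        raise ValueError("expected 15 octal digits")
    if octal[14] not in "01":
        raise ValueError("the last digit encodes spacer 43 and must be 0 or 1")
    return "".join(format(int(d, 8), "03b") for d in octal[:14]) + octal[14]


def spacers_absent(pattern: str, spacers) -> bool:
    """True if every listed spacer (1-based numbering) is absent ('0')."""
    return all(pattern[s - 1] == "0" for s in spacers)


def spacers_present(pattern: str, spacers) -> bool:
    """True if every listed spacer (1-based numbering) is present ('1')."""
    return all(pattern[s - 1] == "1" for s in spacers)


# Hypothetical, simplified signature rules (assumed for illustration):
#   Beijing: spacers 1-34 absent and 35-43 present
#   MAF2:    spacers 7-9 and 39 absent
#   MAF1:    spacers 8, 9 and 39 absent (checked after MAF2)
def classify(pattern: str) -> str:
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character 0/1 spoligotype")
    if spacers_absent(pattern, range(1, 35)) and spacers_present(pattern, range(35, 44)):
        return "Beijing"
    if spacers_absent(pattern, (7, 8, 9, 39)):
        return "MAF2"
    if spacers_absent(pattern, (8, 9, 39)):
        return "MAF1"
    return "unclassified"
```

Rules this crude misassign partially deleted patterns (for example, a Beijing-like pattern that has also lost spacer 39 would be called MAF2 here), which is one reason to rely on dedicated classifiers rather than hand-written rules.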

Statistical analysis of geographical genotype distribution
To investigate geographical differences in mycobacterial families across West Africa, we split West Africa into a Western and an Eastern region. The Western region comprises The Gambia, Guinea-Bissau, Guinea, Sierra Leone, Ivory Coast, Mali and Senegal, and the Eastern region comprises Benin, Burkina Faso, Ghana, Niger and Nigeria (S1 Fig). Using region as the binary response variable, we tested the proportion of each family univariately by logistic regression, with country fitted as a cluster to account for multiple studies per country.
Families whose proportions completely separate the two regions, for example families found in one region but not in the other, cannot be modelled in this way because the maximum likelihood estimate does not exist. We defined families with complete separation between regions as 'perfect predictors'.
A two-sided p-value <0.05 was considered statistically significant, and a two-sided p-value ≥0.05 and <0.10 was considered of borderline significance. No adjustment was made for multiple testing. All analyses were performed using Stata v12.1 (StataCorp. 2011. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP.).

GenGIS analysis
We conducted a phylogeographic linear axis analysis in GenGIS v2.2.2 [6] using the default GenGIS Africa map. A UPGMA phylogenetic tree was constructed from the spoligotyping data (S2 Fig) using the publicly available MIRU-VNTRplus software [7] and uploaded into GenGIS, allowing leaf nodes to be re-ordered. A linear axis plot (10,000 permutations) was run at a significance level of p = 0.001.
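MIRU-VNTRplus builds the tree internally. For readers unfamiliar with UPGMA, the sketch below shows the algorithm on binary spoligotype strings: repeatedly merge the closest pair of clusters, then set the distance from every other cluster to the merged one as the size-weighted average of the old distances. The distance measure (fraction of spacers that differ) is an assumption and may not be the one MIRU-VNTRplus uses; branch lengths are omitted from the Newick output for brevity.

```python
from itertools import combinations


def spacer_distance(a: str, b: str) -> float:
    """Proportion of spacers whose presence/absence differs (assumed distance metric)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(labels, patterns):
    """Cluster spoligotypes by UPGMA and return a Newick string (branch lengths omitted)."""
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (newick text, size)
    dist = {(i, j): spacer_distance(patterns[i], patterns[j])
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair; ties broken by insertion order
        newick_i, size_i = clusters.pop(i)
        newick_j, size_j = clusters.pop(j)
        # UPGMA update: size-weighted average distance to the merged cluster
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            merged[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        for k, d in merged.items():
            dist[(k, next_id)] = d  # k < next_id, so keys stay ordered (small, large)
        clusters[next_id] = (f"({newick_i},{newick_j})", size_i + size_j)
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"
```

Because UPGMA only fixes which leaves are siblings, not their left-to-right order, a tool such as GenGIS is free to flip child nodes when laying the tree against a map.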

Longitudinal analysis of lineages in The Gambia
Gambian isolates collected within a TB case-contact cohort, which recruits all cases from the Greater Banjul area [8], were spoligotyped. Genotyping was approved by the Gambian Government/MRC Joint Ethics Committee. Longitudinal lineage data were modelled using logistic regression for grouped data. The outcome was the number of isolates of a particular lineage out of the total number of samples taken in each year. Lineage and year were both fitted as explanatory variables, and interactions between the two were explored. Multicollinearity between the lineages was avoided by excluding one lineage and fitting the model on the remaining lineages.
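A minimal sketch of logistic regression for grouped data, assuming yearly counts k (isolates of one lineage) out of n (all isolates typed that year): a binomial model whose log-odds are linear in calendar year, fitted by Newton-Raphson. This is simpler than the model described above, which fitted lineage and year jointly and explored interactions; analysing one lineage at a time and treating year as a linear trend are assumptions made to keep the example short, and the function names are hypothetical.

```python
import math


def _score_info(b0, b1, t, k, n):
    """Score vector and information-matrix entries for logit(k/n) = b0 + b1*t."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for ti, ki, ni in zip(t, k, n):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * ti)))
        resid, weight = ki - ni * p, ni * p * (1.0 - p)
        u0 += resid
        u1 += resid * ti
        i00 += weight
        i01 += weight * ti
        i11 += weight * ti * ti
    return u0, u1, i00, i01, i11


def grouped_logit_trend(years, k, n, max_iter=50, tol=1e-10):
    """Binomial logistic regression of k successes out of n per year on (centred) year.

    Returns (b0, b1, se_b1, p_value); b1 is the change in log-odds per year.
    """
    centre = sum(years) / len(years)
    t = [year - centre for year in years]  # centring improves numerical stability
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11 = _score_info(b0, b1, t, k, n)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det  # Newton step = inverse information x score
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    _, _, i00, i01, i11 = _score_info(b0, b1, t, k, n)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # model-based SE from inverse information
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))  # two-sided Wald test
    return b0, b1, se_b1, p_value
```

A slope b1 close to zero with a large p-value would be consistent with a stable lineage proportion over time.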

Results
The M. tuberculosis complex in West Africa
Of 20 original research articles, four were excluded based on the above criteria; the remaining 16 covered 12 of the 15 West African countries. In total, we collected, extracted and re-analysed spoligotype information for 3580 isolates belonging to six major human lineages, of which the Euro-American lineage (L4) and the M. africanum lineages (L5 and L6) were the main causes of pulmonary TB (Table 1). Thirty-two different mycobacterial families were identified, but 84% of all isolates belonged to only eight major families (Fig 1).
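Summary figures such as the share of isolates covered by the most common families, and the per-country proportions plotted in the choropleth maps, are simple tallies. A sketch, assuming each isolate is available as a (country, family) record; the function names are hypothetical and any toy data used with them are invented, not study data.

```python
from collections import Counter, defaultdict


def top_k_share(families, k):
    """Fraction of isolates belonging to the k most frequent families."""
    counts = Counter(families)
    return sum(count for _, count in counts.most_common(k)) / len(families)


def proportions_by_country(records):
    """Map each country to {family: proportion of that country's isolates}."""
    by_country = defaultdict(Counter)
    for country, family in records:
        by_country[country][family] += 1
    return {country: {family: n / sum(counts.values()) for family, n in counts.items()}
            for country, counts in by_country.items()}
```

With the full isolate list, `top_k_share(families, 8)` would give the proportion attributed above to the eight major families.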
The "ill-defined" T1 family is common to most of the countries. We also confirmed the previously described geographical distribution of the two M. africanum lineages [24]: MAF1 (L5) is most prevalent in Nigeria/Benin, whereas MAF2 (L6) is mainly found in The Gambia/Guinea-Bissau. Besides MAF2 as a major cause of TB, a variety of Euro-American families (Haarlem 1 and 3 and LAM9, amongst others) are prevalent in western West Africa. This is in sharp contrast to eastern West Africa, where, besides MAF1, the great majority of TB infections is attributable to only one other dominant family, LAM10. The Beijing family, a recent introduction into West Africa, led to an outbreak in Cotonou, Benin [25]. The only other place with comparably high numbers of Beijing isolates is Dakar, Senegal. Both Dakar and Cotonou have major international ports.

Geographical distribution of mycobacterial families in West Africa
To evaluate whether the identified families are evenly distributed geographically, we divided West Africa into a Western and an Eastern region (S1B Fig). Univariate logistic regression analysis showed that the proportions of several mycobacterial families can serve as predictors of region: 13 of the 32 families were associated with one of the two regions (S1 Table). These included four of the eight major families: LAM10 (perfect predictor at a proportion of 0.12) and MAF1 (p = 0.08) as predictors for the East, and Haarlem 3 (p = 0.07) and MAF2 (p = 0.09) as predictors for the West.
To verify the geographical separation of these four major families, which together account for 51% of all isolates, we carried out an independent phylogeographic analysis using GenGIS (Fig 2). We constructed a UPGMA tree based on 279 unique Haarlem 3, MAF1/2 and LAM10 spoligotypes and superimposed it onto the geographical locations and mycobacterial family distributions of the study sites (Fig 2A). If the families are geographically separated, the lines connecting the tree's leaves to the study sites should cross significantly less often than expected by chance. A linear axis analysis (p<0.001, 10,000 permutations) identified several orientations of the tree's geographical axis that resulted in fewer than the 9759.5 crossings expected by chance. Fig 2B shows that geographical separation occurs at various axis angles, with the fewest crossings (9144) at 228.1° (Fig 2A). Although spoligotyping may have led to minor misclassification of MAF1/MAF2 isolates in our phylogenetic analyses (S2 Fig), we would expect such misclassification to be non-differential and, if anything, to lead to an underestimation of the observed geographical separation.

Discussion
We confirmed that the modern Euro-American lineage predominates, followed by the two M. africanum lineages. Although the polyphyletic T1 family [26] is fairly evenly distributed across the whole region, we found geographical variation in other families. While western West Africa shows high genetic diversity, with a multitude of mycobacterial families, the MTBc of eastern West Africa is mainly composed of two dominant families (LAM10 and MAF1). Although studies from other West and Central African countries have reported a replacement of MAF1 and MAF2 by modern strains [10,11,14,27], our longitudinal analysis from The Gambia did not show such a replacement, and MAF2 remains an important cause of TB in the country. The mechanism by which MAF2 has maintained a stable prevalence of 35% in The Gambia over the last decade, despite its slower progression to disease compared with M. tuberculosis [28], is not fully understood.
Besides the known geographical divide of the two M. africanum lineages, we show for the first time a geographical separation of major Euro-American families in West Africa. Because of this spatial variation, research findings from one West African country or region can hardly be generalized to the sub-region as a whole. The unequal distribution also has important implications for the design of future trials. For instance, western West African countries, with their high genetic diversity, are appropriate settings for research testing whether novel diagnostics or vaccine candidates work equally well against different MTBc families. In contrast, research on host genetics, which benefits from low bacterial diversity, would yield more robust results in eastern West Africa, where LAM10 and MAF1 predominate. To investigate the spread of novel TB families, future work could follow the geographical expansion of LAM10 or of the recently introduced Beijing strains in Benin and Senegal. As initial studies have confirmed that the "ill-defined" T1 family is not a monophyletic clade [26], further research using more robust phylogenetic markers could aim to characterize the endemic MTBc composition in T1-endemic countries.
This phylogeographic analysis also has limitations. First, we combined genotyping data irrespective of the underlying sampling strategies, which ranged from convenience to systematic sampling; the data presented are therefore a cross-sectional compilation of genotyping information from 1986 to 2012. Second, individual patients' treatment history (new or retreatment case) was not systematically collected and has not been accounted for. Third, comparing studies with differing sampling strategies is challenging; to avoid over-interpretation, we therefore limited our discussion to the proportions of families with larger isolate numbers. Lastly, each family consists of a multitude of strains characterised by specific spoligotypes (shared international types, SITs), and we did not study whether the local expansion of a family was driven by one or several proliferating SITs within the family.
Spoligotyping can be used successfully to assign the majority of mycobacterial isolates to one of the major mycobacterial lineages and their families [4]. We appreciate that classification of mycobacteria in West Africa would ideally be based on whole genome sequencing (WGS) data; however, limited bioinformatics capacity, combined with financial and infrastructural constraints, has so far precluded high-throughput sequencing in most resource-limited West African countries.
By summarizing available and novel data, we showed significant geographical variation of the MTBc, which can affect the overall outcome of clinical trials in any specific region. These data allow researchers to take the demonstrated spatial variation into account when planning future clinical TB trials.

Author Contributions
Conceived and designed the experiments: FG SK ME EA. Performed the experiments: ME SK OS. Analyzed the data: FG LK BCdJ SK OS BOA. Contributed reagents/materials/analysis tools: BCdJ MA DB. Wrote the paper: FG LK.