A Molecular Epidemiological and Genetic Diversity Study of Tuberculosis in Ibadan, Nnewi and Abuja, Nigeria

Background
Nigeria has the tenth highest burden of tuberculosis (TB) among the 22 TB high-burden countries in the world. This study describes the biodiversity and epidemiology of drug-susceptible and drug-resistant TB in Ibadan, Nnewi and Abuja, using 409 DNAs extracted from culture-positive TB isolates.

Methodology/Principal Findings
DNAs extracted from clinical isolates of Mycobacterium tuberculosis complex were studied by spoligotyping and 24 VNTR typing. The Cameroon clade (CAM) was predominant, followed by the M. africanum (West African 1) and T (mainly T2) clades. By using a smooth definition of clusters, 32 likely epi-linked clusters related to the Cameroon genotype family and 15 likely epi-linked clusters related to other “modern” genotypes were detected. Eight clusters concerned M. africanum West African 1. The recent transmission rate of TB was 38%. This large study shows that the recent transmission of TB in Nigeria is high, without major regional differences, with MDR-TB clusters. Improvement in the TB control programme is imperative to address the TB control problem in Nigeria.


Introduction
Multi-drug-resistant Mycobacterium tuberculosis (MDR-TB) has emerged as a major global public health problem [1]. WHO estimates that in 2008, between 390,000 and 510,000 persons developed MDR-TB worldwide, with 69,000 cases occurring in Africa and 11,000 in Nigeria [2]. Nigeria has the tenth highest burden of TB among the 22 TB high-burden countries and an estimated TB incidence rate of 320/100,000 population (WHO 2011). MDR-TB is an emerging problem in Nigeria, with as many as 8% of all cultured specimens being MDR-TB in Ibadan, Nnewi and Abuja [2].
Despite the ever-growing importance of TB in Nigeria, available molecular epidemiological studies do not give an extensive picture of TB epi-links in this country because of non-standard genotyping protocols and restricted sampling areas [3][4][5][6]. Until now, molecular diagnostic methods have been poorly adapted to high TB prevalence settings, owing to high costs or to protocols that are suboptimal for detecting epi-links. Hence, African TB molecular epidemiology is poorly described, with the exception of South Africa [7,8].
Recent innovations in molecular diagnostics (e.g. Hain MTBDRplus®, MTBDRsl®, GenXpert®) and genotyping procedures such as the analysis of 24 Mycobacterial Interspersed Repetitive Units-Variable Number of Tandem Repeats (MIRU-VNTR) and high-throughput spoligotyping have made the analysis of TB transmission more efficient and MDR-TB diagnostics easier [9,10]. Multiplexed high-throughput technologies are also emerging as powerful tools both for molecular diagnostics and public health, with whole genome sequencing (WGS) holding promise for this field [11,12]. 24 MIRU-VNTR combined with spoligotyping is a new standard to replace the IS6110-Restriction Fragment Length Polymorphism (RFLP) fingerprinting method. Analyzing hundreds and even thousands of clinical isolates' DNA with limited resources has now become feasible [13,14].
Combining spoligotyping and MIRU-VNTR makes it possible to analyze the genetic diversity and molecular epidemiology of drug-susceptible and MDR-TB strains, with the aim of identifying the population structure of circulating clinical isolates, estimating the recent TB transmission rate, and eventually detecting the transmission of MDR-TB cases [2].
In this manuscript, we present the characterization, using molecular markers, of the drug-susceptible and MDR-TB M. tuberculosis isolates previously reported in Nigeria, and we estimate the recent TB transmission rate in Nigeria [2]. We also identify the main clades of circulating genotypes.

Materials and Methods
Biological Samples, Clinical Isolates, Drug Susceptibility Testing, DNA Extraction
Sputum specimens were obtained through a cross-sectional study aiming at describing TB drug resistance in three cities of Nigeria. These cities are situated in three different geopolitical zones of Nigeria, as reported in Lawson et al., 2011 [2]. A total of 520 patients were recruited, including 433 newly diagnosed patients with pulmonary TB (PTB) and 87 patients with PTB who had failed to respond to first-line TB treatment, all attending TB diagnostic centres in these three zones. Patients over 18 years old with positive smear microscopy attending Directly Observed Treatment Short course (DOTS) treatment centres were enrolled prospectively from August 2009 to July 2010 at 1) Wuse, Maitama, Asokoro and Nyanya General Hospitals in Abuja, 2) the University College Hospital and five DOTS centres in Ibadan and 3) Nnamdi Azikiwe Teaching Hospital and three DOTS centres in Nnewi. Clinical features of these specimens were described in Lawson et al. [2]. The protocol for the study received ethical approval from the Federal Capital Territory Health and Research Ethics Committee and Zankli Ethical Research Review Board. Written consent was obtained from all participants, including permission to store samples to conduct further tests for characterising the strains affecting them.
DNA was extracted from isolates derived from sputum specimens cultured on BACTEC® MGIT960 by a thermolyzate method and sent by express carrier on ice to the Institut de Génétique et Microbiologie UMR8621 CNRS-University Paris-Sud.

Genotyping Methods, Data Analysis
High-throughput spoligotyping was performed on Luminex 200® (Luminex Corp., TX) as previously described [15]. Twenty-four VNTR loci analysis was done on agarose gels after simplex PCR, as previously reported [16], or using an updated in-house duplex procedure (Refrégier et al. unpublished data). The standardized 24 VNTR loci method was used [10]. DNA was available for 423 isolates (353 from new cases and 70 from previously treated patients); complete genotyping, defined as spoligotyping plus at least 20 MIRU-VNTR markers, was obtained for 404 samples (97%). Of these, 154 were obtained from Abuja (38%), 81 from Ibadan (20%) and 169 from Nnewi (42%). All data were entered into Excel® files and transferred to Bionumerics® (v.6.6 Applied Maths, Sint Martin Latems, Belgium). Cluster analysis was performed in Bionumerics® according to the instruction manual (Figures 1, 2, 3 and 4). Clade/Family assignation was done based on SpolDB4 for available Spoligo-International-Types (up to SIT1939) and based on SITVITWEB for SIT no. 2088 and 2550 [17,18]. MIRU-VNTR international type (MIT) designation was done using SITVITWEB [18]. When no SIT no. was available in SpolDB4 or SITVITWEB, the designation "new-x" (lower case) was given for orphan patterns and the designation "NEW-X" for new intra-study clusters. The Recent Transmission Index (RTI) was computed using the (n-1) method, in which the number of isolates in clusters minus the number of clusters, divided by the total number of isolates, represents the recent tuberculosis transmission rate; in our case: RTI(n-1) = (152 - 51)/410 [19]. Statistical analyses (odds ratio, Chi-square, Student's t-test and ANOVA) were performed in R version 2.13.0 (www.r-project.org/).
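The (n-1) computation described above can be sketched in a few lines. This is only an illustrative helper, not the code used in the study (the analyses were run in Bionumerics and R); the function name `recent_transmission_index` and its argument names are hypothetical. The only figures used are those quoted in the text for the worked example.

```python
def recent_transmission_index(clustered_isolates: int, clusters: int,
                              total_isolates: int) -> float:
    """(n-1) method: one isolate per cluster is treated as the source case,
    so recently transmitted cases = clustered isolates - number of clusters."""
    if total_isolates <= 0:
        raise ValueError("total_isolates must be positive")
    if clusters > clustered_isolates:
        raise ValueError("there cannot be more clusters than clustered isolates")
    return (clustered_isolates - clusters) / total_isolates


# Worked example with the figures quoted in the text: (152 - 51) / 410
rti = recent_transmission_index(152, 51, 410)
print(f"RTI = {rti:.3f}")
```

Subtracting the number of clusters rather than counting every clustered isolate reflects the assumption that each cluster contains one index case that was not itself recently infected.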

Population Description
MTBC strains were isolated from 423 patients (163 men, 253 women, and 7 with unregistered gender). Twenty-three patients were <20 years old, 289 between 20 and 40 years, 87 between 40 and 60 years and 17 >60 years old.

VNTR Results; Recent Transmission Index Assessment
Exploitable VNTR results (defined as fewer than 4 missing values) were obtained for 404 of 423 isolates (Table S1). Some isolates did not provide results for many loci. Among the most problematic loci, characterisation of Mtub39 (VNTR3690) frequently failed (n = 78). The failure to amplify Mtub39 was linked to isolates belonging to the CAM family, suggesting an amplification problem specific to this family (Odds Ratio = 4.8; 95% CI = [2.5; 10.1]). Recent results in Indian isolates have pointed to the large variability of copy number of VNTR3690 [22]. A similar observation was made for the 57 isolates in which QuB11b (VNTR2163b) failed to amplify, which included all M. africanum isolates. None of the 53 M. africanum clinical isolates provided results on QuB11b, whereas only 20 of the 370 other strains failed (χ² = 268). As this feature seems to rely on genetic properties shared by all M. africanum isolates, the QuB11b marker was hypothesized to be identical ("X" or not determined "ND" in Table S1 and Table 2). A large proportion of CAM isolates could not be amplified at the Mtub39 locus. The remaining isolates had surprisingly diverse copy numbers. The values ranged from 2 to 28 copies and are noted as "N" (Table 2; cf. also Table S1). Spoligotyping splits the CAM sub-group G-V isolates into two subclusters of 15 and 16 isolates. Among the isolates classified as "Unknown" according to SpolDB4, six sharing the spoligotype SIT1204 also shared 24 VNTR patterns, suggesting a phylogenetic link with the CAM family. The SIT1204 genotype was already described in Cross River State in the South geopolitical zone of Nigeria [4]. These strains differed only at the Mtub39 locus and harboured an intermediate copy number (n = 2.5 copies) at the exact tandem repeat D (ETR-D) locus [23]. They may represent an epi-cluster.
When clusters were defined by 100% identity between isolates according to both spoligotyping and VNTR typing, and missing values were considered unique so that any pattern with a missing value could not belong to a cluster, 134 isolates were grouped in 47 clusters containing 2 to 11 isolates. If these figures are considered a true representation of the epidemiological situation, the (n-1) method gives a recent TB transmission rate of around 22% in Nigeria [19]. However, when single-locus variants (SLVs) are included, the number of clustered isolates rises to 219, grouped in 65 clusters, giving a Recent Transmission Index (RTI) of 38% (Table 3). The analysis using VNTR genotyping data alone did not give significantly lower discriminatory power than the composite one, i.e. adding spoligotyping information (Table 3).
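The two cluster definitions compared above can be illustrated with a small sketch. It is not the Bionumerics procedure used in the study: the profiles below are hypothetical four-locus toy data, the function names are invented for illustration, and two choices are assumptions of this sketch, namely that profiles with missing values are excluded under both definitions and that the relaxed definition uses single-linkage grouping (isolates are chained together whenever they differ at no more than one locus).

```python
from collections import defaultdict
from itertools import combinations


def strict_clusters(profiles):
    """100% identity; a profile with a missing value (None) is treated as unique."""
    groups = defaultdict(list)
    for name, profile in profiles.items():
        if None not in profile:
            groups[tuple(profile)].append(name)
    return [sorted(g) for g in groups.values() if len(g) >= 2]


def relaxed_clusters(profiles, max_diff=1):
    """Single-linkage grouping of isolates differing at <= max_diff loci
    (assumption: incomplete profiles are still excluded)."""
    complete = [n for n, p in profiles.items() if None not in p]
    parent = {n: n for n in complete}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(complete, 2):
        diff = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if diff <= max_diff:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in complete:
        groups[find(n)].append(n)
    return [sorted(g) for g in groups.values() if len(g) >= 2]


# Hypothetical toy profiles (copy numbers at four loci), not study data
profiles = {
    "A": (2, 3, 3, 1), "B": (2, 3, 3, 1), "C": (2, 3, 3, 2),
    "D": (5, 1, 4, 2), "E": (5, 1, 4, 2),
    "F": (2, 3, None, 1),  # missing value: never clustered
    "G": (7, 7, 7, 7),     # orphan pattern
}
strict = strict_clusters(profiles)
relaxed = relaxed_clusters(profiles)
print(len(strict), sum(map(len, strict)))    # clusters, clustered isolates
print(len(relaxed), sum(map(len, relaxed)))
```

In this toy set the relaxed definition pulls isolate C into the A/B cluster, which is the same mechanism by which including SLVs raises the number of clustered isolates in the study.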
Amplification of QuB11b did not work for M. africanum isolates, and clustering analysis was thus conducted without this marker. A high polymorphism of MLVA (multi-locus VNTR analysis) was observed among M. africanum clinical isolates with identical spoligotypes, suggesting that spoligotyping-based clustering reflected common ancestry with no clear epidemiological links in most cases. Indeed, among the 49 M. africanum isolates for which MLVA results were available, only 4 clusters of 2 isolates each were identified using a strict cluster definition (100% identity). VNTR clusters were also found in the T and H families (Table S1), with six out of eight SIT53 (T1) isolates found in one single cluster. Amongst the 41 SIT52 or derived types, also designated as the Ghana family, and using 100% identity, only 17 isolates were found in 4 clusters (4 SIT2088, 4 SIT846, 5 designated as "NEW3" and 4 designated as "NEW5"). Eleven of 13 SIT316 isolates (T2-variant) were found in one cluster, whereas the two other isolates differed at only one VNTR locus, Miru26 [24]. These two isolates are likely to represent a second epidemiologically-linked cluster.

An Evolutionary Scenario of the ''Cameroon'' (CAM) Clade
The CAM clade was first described in Cameroon and shows a typical SIT61 signature [25,26]. The MLVA analysis of the CAM isolates in Nigeria provides evolutionary and epidemiologic information and, together with the 43-spacer spoligotyping, describes a global population structure of at least 7 main clusters. The combination of values obtained on MIRU16 and MIRU40 (greyed out numbers in following text) allows the observation of three main MIRU12 international types (MIT) as described in the SITVITWEB database (see http://www.pasteur-guadeloupe.fr:8081/SITVIT_ONLINE): 223315153323, reported as MIRU international type 12 (MIT12), 223315153321, reported as MIT266, and 223215153323, reported as MIT264. All three major VNTR 12 types were independently reported in Nigeria in another study [4]. QuB11b provides further epidemiological information within the sub-clades (Table 2). Assuming a molecular evolution by loss of copies on MIRU40, the ancestral character of this marker would be 3 and the ancestral MIT signature would be MIT12, which would have independently evolved into MIT264 and MIT266. The larger diversity observed in Mtub39 for MIT12 and MIT266 (from 5 to 28 copies, see also Table S1) than for MIT264 (from 10 to 12 copies) reinforces this hypothesis.
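The relationship between the three MIRU12 signatures can be checked locus by locus. The sketch below assumes the conventional 12-locus MIRU ordering (MIRU02, 04, 10, 16, 20, 23, 24, 26, 27, 31, 39, 40) and one digit per locus; both are assumptions of this illustration rather than statements from the text, and the helper `differing_loci` is hypothetical.

```python
# Assumed conventional order of the 12 MIRU loci in a 12-digit signature
MIRU12_LOCI = ["MIRU02", "MIRU04", "MIRU10", "MIRU16", "MIRU20", "MIRU23",
               "MIRU24", "MIRU26", "MIRU27", "MIRU31", "MIRU39", "MIRU40"]

# Signatures quoted in the text
MIT = {
    "MIT12": "223315153323",
    "MIT266": "223315153321",
    "MIT264": "223215153323",
}


def differing_loci(a: str, b: str):
    """Return (locus, copies_in_a, copies_in_b) for every locus where a and b differ.
    Assumes one digit per locus, i.e. copy numbers below 10."""
    if len(a) != len(MIRU12_LOCI) or len(b) != len(MIRU12_LOCI):
        raise ValueError("expected 12-digit signatures")
    return [(locus, int(x), int(y))
            for locus, x, y in zip(MIRU12_LOCI, a, b) if x != y]


for derived in ("MIT266", "MIT264"):
    print(derived, differing_loci(MIT["MIT12"], MIT[derived]))
```

Under these assumptions each derived type is a single-locus variant of MIT12, with one fewer (MIRU16) or two fewer (MIRU40) copies, which is in line with the loss-of-copies scenario proposed above.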

Distribution of Multi-Drug Resistance Isolates
Twenty-nine (7%) of 407 isolates with phenotypic Drug Susceptibility Testing (DST) in BACTEC-MGIT® (Becton Dickinson, NJ, USA) were MDR-TB isolates, as previously described [2]. Among these, 23 belonged to the CAM family, of which 17 were SIT61 (data not shown). The proportion of MDR-TB within the CAM family is not statistically different from the percentage of the CAM family in the whole population (Student's t-test, t = 0.1; df = 260; p-value = 0.9). Three MDR-TB isolates belonged to T, one to LAM, one to M. africanum and one to U (Unknown). Nine MDR-TB isolates belonged to the subgroup G-III of the CAM family, which contains 21 isolates (i.e. 43% MDR in this subgroup). Nine additional isolates were resistant to at least one of the drugs tested (altogether 85% of resistance). The CAM and T clades exhibited a high resistance level, with respectively 53% and 54% being resistant to at least one drug. Among the 53 M. africanum isolates studied, only one was MDR and 17 (32%) were resistant to one of the drugs tested. The proportion of MDR was higher among the H family (28%, 9 out of 32).

Spatial and Phylogenetical Analysis of Diversity and Transmission
To detect whether specific MTBC clusters were circulating in specific geographical areas, cluster analyses were performed independently for each collecting center (Figures 2, 3 and 4 for Abuja, Ibadan, and Nnewi, respectively). The number of clustered isolates of the 3 centers was reduced to 72 as compared to 134 in the complete study (54%) when considering 100% identity, and 144 as compared to 219 (66%) when allowing for inclusion of SLVs. The prevalence of the main clades was similar in the three cities (p = 0.59). These results confirm that Nigeria can be considered as homogeneous in the three settings investigated regarding the origin of isolates. A linear model was searched for to identify possible differences in transmission depending on the city (Abuja, Ibadan, Nnewi) or large isolate families (CAM, other modern isolates, other isolates namely M. africanum and M. bovis). The clade was found to be significantly linked to the transmission frequency as assessed by clustering, with higher transmission for T isolates (ANOVA, p = 0.012; effect sizes: "M. africanum and bovis" family = -0.25; "CAM" family = -0.01; other modern isolates [T] = +0.32). Indeed, T isolates exhibited the lowest proportion of orphans (59% as compared to 94% for the M. africanum and M. bovis cluster and 87% for CAM). No statistically significant differences were detected regarding transmission in the different centers, although a tendency for higher transmission in Abuja and Nnewi was detected (Table 3).

Discussion
This is the largest and most detailed genetic characterisation of MTBC clinical isolates from patients suffering from TB in Nigeria, relying on the analysis of isolates from three main cities [2]. The genetic diversity of MTBC was characterised by spoligotyping (43 and 68 spacers) and by 24 VNTR loci [10,27].
Spoligotyping is a genotyping method that studies the genetic diversity of the Clustered Regularly Interspersed Palindromic Repeats (CRISPR) within the MTBC (for a review on CRISPR see [28]). It enables reliable subspecies identification [17,29,30]. Its recent transfer from a membrane-based to a microbead-based format has given this method a second lease of life, and a similar "CRISPOL" method has recently been developed to track outbreaks of another pathogen, Salmonella enterica serovar Typhimurium [9,15,31].
Increasing the number of spacers to be analysed in some settings can also improve clustering and reduce the costs of systematic Spoligo+VNTR typing, as recently shown in Cambodia where the number of VNTR loci to be analysed was reduced to 8 without loss of discriminatory power [32].
We have shown in this study that the recent tuberculosis transmission rate could be between 22 and 38% using either a strict (100% identity on 24 VNTR) or smooth (including SLV) definition of clusters. We detected an active transmission of TB especially in Abuja and Nnewi, although these data need to be interpreted with caution given the short (one year) recruitment period [33]. However, looking into the social network of patients found in clusters could not be done here and is a clear limitation of this study.
The CAM genotypes were the most prevalent circulating genotypes (66%). This clade was first described in Cameroon, where it represented 34% of the M. tuberculosis isolates in 2003 [25]. This group of strains was assumed to have emerged recently and homogeneously in the West province of Cameroon. It is characterized by the SIT61 signature (spacers 23-25 and 33-36 missing) and a homogeneous 6-band Ligation-Mediated PCR pattern [25]. The CAM clade belongs to the principal genetic group 2 (i.e. modern strains) and lacks the TbD1 region [26]. Several CAM spoligotype variants (SIT852, SIT808, SIT403) have been reported in Cameroon and in Nigeria [26]. The 12 MIRU-VNTR signatures differ in 4 out of 12 loci (MIRU16, MIRU26, MIRU27, MIRU40) and they have similar IS6110-RFLP patterns (10 to 15 copies) with seven common bands [26]. This group was also shown not to have an IS6110 copy in the DR locus and to have four IS6110 copies in open reading frames coding for adenylate cyclase, phospholipase C, moeY, and ATP-binding proteins [26]. In rare cases, strains with identical IS6110-RFLP patterns had spoligotypes differing by as many as 15 spacers [26]. In a recent study, four clinical isolates belonging to the Cameroon clade were partially sequenced to detect single nucleotide polymorphisms (SNPs), in an attempt to find new markers for molecular evolution and epidemiology. A specific nonsynonymous mutation in the dnaQ gene was found in these four isolates of the CAM clade [34]. Whether this SNP could be used to specifically identify the CAM clade remains to be studied on a large sample in West Africa. The CAM clade had formerly been designated as a subclade of the LAM clade (LAM10-CAM) based on the common absence of spacers 23-24 [17]. However, Dos Vultos and colleagues demonstrated that this clade is unrelated to bona fide LAM, since it does not carry the LAM-specific SNPs [34].
In addition to Cameroon, a high prevalence of the CAM clade had been previously observed in neighbouring countries such as Chad (33%), Burkina Faso (30%) and Ghana (45%) [14,25,35-37], and a study on the genetic diversity of TB in Jos, Plateau state (Nigeria) suggested a frequency of this clade similar to the one found here [3]. Thus our study indicates that the CAM clade has a very high prevalence in Nigeria and suggests that Nigeria is currently the largest reservoir for this genetic family in Africa.
M. africanum remains an important cause of TB in humans. Its presence in every setting of this study confirms that it is still being transmitted in Africa. Its capacity to spread and to cause disease seems restricted, though [38]. Here, the lower frequency of recent transmission was further documented for the subfamily M. africanum West African 1, as all M. africanum isolates but one in Abuja belonged to this type [21]. Its continuing presence could however be due to different transmission dynamics, namely the ability to sustain efficient delayed transmission. This possibility could be investigated using long-term molecular epidemiological studies.
Four spoligotype profiles characteristic of M. bovis isolates were found in Abuja, which is in the Central North area where the population owns large herds of cattle. As in a previous study, all belonged to the M. bovis Afri1 family, which was found to infect humans, cattle, goats and pigs [5]. Another former study of 55 isolates from human samples in Ibadan (South-West) revealed 11% of M. bovis Afri1 and zero Afri2 [39].
Regarding transmissibility of the different families, the fact that the CAM clade is very prevalent is indirect evidence of high fitness. VNTR3690 (Mtub39) copy number was very variable in this clade. Mtub39 is located in the promoter of the lpdA gene, a potential virulence factor. The number of repeats in Mtub39 was found to correlate with the expression level of lpdA [22].
The patho-physiological basis of the success of the CAM clade could also be linked to the polymorphism of the 3R genes. Until now, we have no evidence of such a link, although it is likely that 3R genes are major players in molecular adaptation and evolution [40,41]. Alternatively, the main parameter responsible for the fitness of the CAM clade may be the demographic growth of the Nigerian population, from around 150 million in 2012 to a projected 250 million in 2035. Even though we did not observe strong geographical differences in the prevalence of the clades in the three cities, further analysis of data stratified by language and ethnic/tribal group and a more thorough spatial analysis may allow better investigation of associations between bacterial genotypes and human hosts. Further work to characterise the phenotypic/genotypic links within M. tuberculosis strains circulating in Nigeria is also needed, especially on the critical issue of MDR-XDR-TB control. In this sense, new studies using integrated molecular methods, aimed at MDR-TB prevention, outbreak surveillance and patient care, should be implemented in Nigeria in the near future.