Insights into the genetic diversity of Mycobacterium tuberculosis in Tanzania

Background
Human tuberculosis (TB) is caused by seven phylogenetic lineages of the Mycobacterium tuberculosis complex (MTBC), Lineages 1–7. Recent advances in genotyping of MTBC based on single nucleotide polymorphisms (SNPs) allow for rapid and phylogenetically robust strain classification, paving the way for defining genotype-phenotype relationships in clinical settings. Such studies have revealed that, in addition to host and environmental factors, different strains of the MTBC influence the outcome of TB infection and disease. In Tanzania, however, such molecular epidemiological studies of TB are scarce in spite of a high TB burden.

Methods and Findings
Here we used a SNP-typing method to genotype a nationwide collection of 2,039 MTBC clinical isolates obtained from new and retreatment TB cases diagnosed in 2012 and 2013. Four lineages, namely Lineages 1–4, were identified. The distribution and frequency of these lineages varied across the regions, but overall Lineage 4 was the most frequent (n=866, 42.5%), followed by Lineage 3 (n=681, 33.4%) and Lineage 1 (n=336, 16.5%), with Lineage 2 being the least frequent (n=92, 4.5%). A total of 64 (3.1%) isolates could not be assigned to any lineage. We found Lineage 2 to be associated with female sex (adjusted odds ratio [aOR] 2.25; 95% confidence interval [95% CI] 1.38–3.70, p<0.001) and retreatment (aOR 1.78; 95% CI 1.00–3.02, p=0.040). We found no associations between MTBC lineage and patient age or HIV status. Our sublineage typing based on spacer oligotyping revealed the presence of mainly EAI, CAS and LAM families. Finally, we detected low levels of multidrug-resistant isolates among a subset of retreatment cases.

Conclusions
This study provides novel insights into the influence of pathogen-related factors on the TB epidemic in Tanzania.


Introduction
Tuberculosis (TB) is the leading cause of mortality due to an infectious disease [1]. In 2017, an estimated 10.0 million people developed TB globally, with 1.3 million dying of the disease. More than 80% of the TB burden lies in the thirty high burden countries [1]. Tanzania is among these countries, with a national average TB notification rate of 129 cases per 100,000; however, some regions show higher notification rates [2]. As in most sub-Saharan African countries, the HIV epidemic contributes to the high TB incidence in Tanzania, where a third of the TB patients are co-infected with HIV [2]. In contrast, drug-resistant TB is still rare in this setting [3]. Other risk factors such as poverty also influence the epidemiology of TB in Tanzania [4].

Transmission of TB occurs via infectious aerosols, and upon exposure individuals can either develop active disease or remain latently infected [5]. It is estimated that a quarter of the world's population is latently infected with TB [6], with a 5–10% lifetime risk of developing active TB disease; this risk is 50% in case of HIV co-infected individuals [7].

The complex dynamics of TB infection and disease are determined by the environment, the

to each region annually) and from all retreatment cases were sent to zonal reference laboratories (i.e. Central Tuberculosis Reference Laboratory [CTRL]

We selected a subset of clinical isolates from retreatment cases to perform molecular drug resistance testing. We used a previously described multiplex polymerase chain reaction (PCR) to target the hotspot region of the rpoB gene that confers resistance to rifampicin [27].

The PCR assay targets the rpoB gene of both tuberculous and non-tuberculous mycobacteria (MTBC and NTM, respectively), so we could also use it to rule out the presence of non-tuberculous isolates in our study sample. The amplified rpoB gene product was confirmed by electrophoresis on a 2% agarose gel and sent for Sanger sequencing. We analyzed the

For statistical analyses we applied descriptive statistics to delineate patients' characteristics. We used Chi-square or Fisher's exact tests for assessment of differences between groups in categorical variables, whenever applicable. We used univariate and multivariate logistic
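The group comparison for categorical variables described above can be illustrated with a small sketch of a two-sided Fisher's exact test on a 2x2 table. This is not the authors' analysis code: the function name `fisher_exact_two_sided` and the example counts are hypothetical and serve only to show how such a p-value is obtained from the hypergeometric distribution.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely as the
    # observed one (small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts (not from the study): rows = two groups,
# columns = characteristic present / absent.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

Fisher's exact test is usually preferred over the Chi-square test when expected cell counts are small, which is presumably what "whenever applicable" refers to.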

Patients' demographic and clinical characteristics
The patients' demographic and clinical information in our study included age, sex, geographical location, HIV status and disease category (new or retreatment case). Table 1

Similar to other settings [1], we identified a higher proportion of male TB cases compared to female TB cases. However, the male-to-female ratio in our study population is higher than the national estimates for the two years of sampling (2.2:1 vs. 1.4:1). The striking gender imbalance among TB cases appears from adolescence onwards and is less pronounced among pediatric TB cases (S1 Table). Additionally, a third (32.2%, 517/1604) of the TB cases with available HIV status were HIV co-infected. In contrast to the overall male predominance, TB/HIV co-infected cases were more likely to be female (44.5%, 95% CI 38.3–50.7% vs. 25.8%, 95% CI 20.6–31.0%), which is consistent with HIV being generally more prevalent in females than males [32]. We found that our study population comprised 16.1% (321/2000) of TB retreatment cases, which was four-fold higher than the overall countrywide notifications [31]. Finally, more than half (51.6%, 1029/1996) of the TB patients in our study population were diagnosed in the Coastal zone of Tanzania and about 40% were diagnosed in either the Lake or the Northern zone. In addition to having higher TB notification rates, these three geographical zones contain the country's zonal TB reference laboratories. The remaining 10% of the patients were diagnosed in one of the remaining four geographical zones.
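As an illustration of how a proportion and its 95% confidence interval can be derived from counts such as those above, the following sketch applies the normal-approximation (Wald) interval to the HIV co-infection proportion (517 of 1604). The choice of the Wald method and the helper name `wald_ci` are assumptions made for this example; the interval method actually used in the study is not stated in this section.

```python
from math import sqrt


def wald_ci(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation interval."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the text: 517 HIV co-infected among 1604 with known status.
p, lower, upper = wald_ci(517, 1604)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f}%)")
```

For small samples or proportions close to 0 or 1, a Wilson or exact interval would be a more robust choice than the Wald interval sketched here.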
varying proportions. In our study setting, Lineage 4 and Lineage 3 were the most frequent (866, 42.5% and 681, 33.4%, respectively) followed by Lineage 1 (336, 16.5%). Lineage 2 was the least frequent (92, 4.5%). The remaining 64 clinical isolates (3.1%) could not be assigned to any of the MTBC lineages. Of the seven geographical zones, four (Coastal, Northern, Lake and Southern Highlands) were highly represented with more than 100 clinical strains each (Table 2). The distribution of the M. tuberculosis lineages varied within the geographical zones (Fig 1 and S3 Fig). Our findings reveal that Lineage 1 strains were more

four to Lineage 4, three to Lineage 3 and one was unclassified (S4 Table). Table 4 summarizes the non-synonymous rpoB mutations detected.
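The overall lineage frequencies reported in this section can be recomputed from the counts with a few lines of Python. This is only an arithmetic check of the published figures, not part of the study's analysis pipeline; the variable names are chosen for this example.

```python
# Counts of clinical isolates per lineage, as reported in the text.
lineage_counts = {
    "Lineage 4": 866,
    "Lineage 3": 681,
    "Lineage 1": 336,
    "Lineage 2": 92,
    "Unassigned": 64,
}

total = sum(lineage_counts.values())  # 2039 isolates in total

# Percentage of each lineage, rounded to one decimal place as in the text.
percentages = {
    name: round(100 * count / total, 1) for name, count in lineage_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {lineage_counts[name]} ({pct}%)")
```

The counts sum to the 2,039 isolates in the collection, and the rounded percentages match the values given in the text (42.5%, 33.4%, 16.5%, 4.5% and 3.1%).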