Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

Improved molecular diagnostic methods for detection drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second- line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages therefore local data is required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 (92%) mutation followed by inhA S94A (8%) however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazimamide resistant but only 43% had pncA SNPs. Ethambutol resistant strains predominantly had embB codon 306 (62%) mutations, but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA 91–94 codons in 81% of strains; four strains had only gyrB mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin resistant strains had mutations in ribosomal RNA genes; rpsL codon 43 (42%); rrs 500 region (16%), and gidB (34%) while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt1401 (78%) and nt1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would vary between rifampicin (100%), isoniazid (92%), flouroquinolones (81%), aminoglycoside (78%) and ethambutol (62%); while pncA sequencing would provide genotypic resistance in less than half the isolates. This work highlights the importance of expanded targets for drug resistance detection in MTB isolates.


Introduction
Mycobacterium tuberculosis (MTB) caused 9 million cases of tuberculosis (TB) and 1.5 million death worldwide in 2013 [1]. Drug resistance in MTB increases the burden due to the disease and complicates both diagnosis and treatment. Multidrug-resistant (MDR)-TB is caused by MTB resistant to at least the two first line drugs i.e. rifampicin and isoniazid. Globally, in 2013, an estimated 480,000 people developed TB while, there were about 210,000 deaths due to TB [1]. Extensively drug resistant (XDR)-TB is caused by MDR-TB strains resistant to fluoroquinolone and to any one of the injectable aminoglycosides; amikacin, kanamycin or capreomycin. In Pakistan, MDR-TB comprises 4-5% of all TB cases, and XDR-TB is 4-5% of the MDR-TB cases [2,3]. Appropriate MDR-TB management requires drug susceptibility testing to guide treatment regimens. The usual turnaround time for phenotypic MTB drug susceptibility testing (DST) is about 8-10 weeks, often leading to delays in initiation of appropriate therapy. In high TB burden low resource settings facilities for mycobacterial culture and DST are limited. Therefore, rapid genotypic methods for detection of drug resistant MTB strains are required for more effective management of TB.
The Xpert MTB/RIF assay (Cepheid, USA) assay was rolled out at a global level since 2012; it detects rifampicin resistance by targeting the ribosomal polymerase rpoB gene [4]. The LiPA MTBDRplus (Hain Lifescience GmbH, Germany) assay targets rpoB for rifampicin resistance and katG and inhA genes for isoniazid resistance. The LiPA MTBDRsl (Hain Lifescience GmbH, Germany) targets gyrA, rrs and embB for fluoroquinolone, aminoglycoside and ethambutol resistance respectively [5]. Whilst these assays have significant value and allow rapid diagnosis of MDR-TB it is apparent that further improvements in molecular DSTs are required. MTB is largely clonal [6,7]. The TBDReaM database (http://www.tbdreamdb.com), catalogues greater than 1400 SNPs associated with drug resistance [8]. Studies of drug resistant MTB have shown that mutations in hot-spot regions, and those occurring beyond hypervariable regions vary according to both strain type and geographical location [9,10,11]. Thus, it is important for all possible resistance conferring mutations or single nucleotide polymorphisms (SNPs) related to each anti-tuberculous drug to be identified. Whole genome sequencing (WGS) facilitates the identification of SNPs, insertions and deletions in genomic DNA [12]. Thus, WGS of MTB can reveal new SNPs in genes previously associated with drug resistance and also identify outbreak strains [13].
By providing information on multiple genes simultaneously WGS facilitates the understanding of relationships between genes and their interactions such as in the case of fitness conferring compensatory mutations which occur due to the presence of drug induced selection pressure. Compensatory mutations are believed to play a role in conferring fitness to drug resistant MTB strains by affecting the growth rate of the isolates [14,15,16]. Compensatory mutations in rpoB genes which aid fitness of rifampicin resistant strains are well documented [17]. Knowledge of additional resistance mutation and compensatory mutations in MTB will be important for determining appropriate treatment for MTB strains. Here we have used WGS analysis to describe XDR-TB strains from Pakistan and identify SNPs in genes known to be associated with drug resistance to particular drugs.

Ethics statement
This study was approved by the Ethical Review Committee, The Aga Khan University.

Strain selection
The M. tuberculosis strains studied were those collected from 2004-2009. Mycobacterial strains were obtained from the strain bank of Aga Khan University Clinical Microbiology laboratory. Demographic information on patients was obtained from clinical laboratory records and collected as part of good clinical practice. The hospital and its clinical laboratory are accredited by the Joint Commission of International Accreditation (JCIA, USA). MTB isolates were identified through a random selection method. Each strain had its unique identification number which at the time of testing was anonymised and unlinked with patient demographic information. The XDR-TB (n = 37) and drug susceptible (n = 5) isolates studied here were spoligotyped previously [3].

Spoligotyping
Spoligotype patterns for each strain were obtained as described previously [18], and also confirmed in silico from the sequencing reads using SpolPred software [19].

DNA extraction and Whole Genome Sequencing of MTB isolates
DNA was extracted by the cetyl-trimethyl ammonium bromide (CTAB) method [22]. One microgram of DNA was used for sequencing. All samples underwent WGS with 76-base paired end fragment sizes, using Illumina paired end HiSeq2000 technology, and the raw sequence data is available in the European nucleotide archive (http://www.ebi.ac.uk/ena/data/view/ PRJEB7798). For each sample trimmomatic software (http://www.usadellab.org/cms/?page= trimmomatic) was used to remove low quality reads and trim low-quality 3' ends of reads. Nucleotide positions in the reads with a quality score lower than Q20 were removed. High quality reads were then mapped to the H37Rv reference genome (Genbank accession: AL123456.3) using BWA-MEM software (http://bio-bwa.sourceforge.net). SAMtools (http://samtools. sourceforge.net) and GATK (https://www.broadinstitute.org/gatk) were used to call single nucleotide polymorphisms (SNPs) and small indels. Variants (with quality at least Q30) were then selected as the intersection dataset between those obtained from both programs. Sample genotypes were called using the majority allele (minimum frequency 75%) in positions supported by at least 20-fold total coverage; otherwise they were classified as missing. Samples with a proportion of missing genotype calls greater than 15% were filtered out [21]. Similarly, we excluded positions in the genome with more than 15% missing genotypes across samples. Larger indels were called using a consensus from paired end mapping distance or split read approaches [23,24,25,26]. For more detail, the data processing pipeline has been described [27]. Spoligotypes were inferred from the sequencing reads SpolPred software [19]. Variation density maps were generated using Circos software (http://www.circos.com).

Phylogenetic analysis
Phylogenetic analysis was performed in PHYLIP using a RAxML tree (Maximum likelihood phylogenetic software) with bootstrap to 1000 replicated [28]. Phylogenetic tree was visualized using Dendroscope software [29].
Isolates from Sindh comprised Central Asian Strain (CAS), T1, East African Indian (EAI), X and Orphan types and thus had more diverse lineages as compared to those from other provinces. However, within each province the CAS lineage was predominant, as described previously [30].

Polymorphisms in genes associated with drug resistances
A visual description of SNPs densities throughout the genome of MTB isolates was compiled by a circos plot (Fig. 1). The maximum likelihood analysis is demonstrated by the circular phlyogenetic tree based on the polymorphism in the drug resistance genes using Dendroscope software version 3.2.8 [32], (Fig. 2). The clustering pattern of the strains demonstrated that T1 and X lineage strains clustered together (PGG3 and PGG2 groups respectively), and were distinct from CAS and EAI strains which clustered in a similar node. However, EAI remained distant from other PGG1 isolates in the phylogenetic tree.
By comparison with the H37Rv reference genome, SNPs were identified in 40 genes associated with drug resistance (Fig. 3). All synonymous SNPs (sSNPs) were removed from the analysis so that only non-synonymous SNPs (nsSNPs) are displayed in S2 Dataset. This nsSNP dataset is further described in Tables 1-5, which focus on the known resistance-determining regions in particular MTB genes.

Polymorphisms conferring rifampicin and isoniazid resistance
Resistance to rifampicin is associated with mutations in the RNA polymerase gene sub-unit B, rpoB. However, there is a fitness cost associated with the acquistion of rpoB mutation and rifampicin resistant MTB strains often acquire compensatory mutations in rpoC and rpoA genes to their improve growth [33]. Similarly, there is a fitness cost associated with acquisition of isoniazid resistance, and while katG followed by the inhA gene mutations are most common, a number of other genes are thought to contribute to resistance [34]. We first evaluated SNPs in the context of MDR by studying the rifampicin-resistance determining region (RRDR) in rpoB, and compensatory mutations in rpoC and rpoA. For isoniazid resistance we analysed the katG and inhA genes, and also investigated sequences of oxyR-ahpC, fbpC, Rv1592C, ndh, Rv2242, fabD, accD, fad E24 and nat. Combinations of mutations observed in the XDR strains related to rifampicin and isoniazid are illustrated in Table 1. All the XDR isolates exhibited mutations in the RRDR of rpoB gene; codons 531 (n = 24), 516 (n = 12) and 526 (n = 1). The rpoB codon 531 mutation was present in PGG1, PGG2 and PGG3 isolates whereas, a mutation at codon 526 was observed in one PGG1 group EAI3-IND isolate. Mutations at codons 515 and 516 were present concurrently in each case, in CAS and Orphan strains.
Twenty-two XDR-TB isolates with compensatory mutations in rpoC and rpoA genes had the low fitness cost mutation rpoB S531L. One isolate with the rpoB H526R mutation had two rpoC SNPs. In all, we observed 15 different rpoC SNPs and one rpoA SNP. The most common was rpoC A172V which present in 5 isolates and rpoC P601L which was present in 3 isolates.
Polymorphisms at codon 463 of katG were observed among all the PGG1 isolates. Resistance to isoniazid could be attributed to the katG codon in 34/37 of cases and most strains (86%) had low fitness cost mutation, katG S315T while, two strains had katG S315N. Two strains had a inhA S94A mutation in the absence of katG mutations. We did not observe any inhA promoter mutations amongst the XDR isolates studied. One EAI3-lineage strain had a katG Y413H change.

Polymorphisms conferring resistance to pyrazinamide
All of the XDR strains were resistant to pyrazinamide treatment (Table 1). Non-synonymous SNPs in the pncA gene were observed in 14 (38%) strains, whereby 11 different SNPs were observed (S2 Dataset). Twenty-three pyrazinamide resistant isolates did not reveal any nsSNP in pncA; two such strains had the rpsA mutation Q1228R.
Both pyrazinamide and isoniazid are pro-drugs which are converted to their active forms by enzymatic activity within the mycobacterium. We explore an association between isoniazid and pyrazinamide resistance by comparing katG and pncA mutations within strains. Of the XDR strains, thirteen had pncA and katG S315T mutations, while one strain with a pncA mutation had inhA S94A (Table 1).

Polymorphisms conferring resistance to streptomycin
The mechanism of streptomycin resistance in the XDR strains was investigated by analyzing sequences of the ribosomal genes rpsL, rrs (500 region) and gidB. The rpsL mutation K43R was found in 41% of streptomycin resistant isolates; this SNP was found independently (n = 11) and also in combination with rrs A514C (n = 2), Table 3. The rpsL mutation K88R was found in combination with rrs A906G and gidB P75S. Three strains had rrs A514C mutations only. There were eight different SNPs in gidB. Eight streptomycin resistant strains had gidB mutations only, without rpsL and rrs (500 region) SNPs. However, six streptomycin resistant and five sensitive strains were without mutations in rrs, rpsL or gidB (S2 Dataset).

Polymorphisms in the genes for fluoroquinolone resistance
Mutations in gyrA and gyrB genes were analysed to investigate flouroquinolone resistance in the XDR strains. The most common mutation in the quinolone-resistance determining region (QRDR) of gyrA gene covering codons 90-98 was A94G, followed by A90V and A94Y (Table 4). All CAS family strains had gyrA QRDR mutations. One X3 strain had gyrA D94Y together with gyrB T500N. In total, 81% of isolates had SNPs in the QRDR of gyrA, which is the target region in currently available commercial diagnostic assays for flouroquinolone resistance.  Polymorphism at codon 95 of gyrA was observed among PGG1 and PGG2 isolates (S2 Dataset). The gyrA, G668D polymorphism was present in PGG3 (n = 3) isolates belonging to T1 lineage.
Of the eight strains which did not have gyrA QRDR SNPs, four had SNPs in the gyrB gene. One T1 strain only had the gyrB S447F mutation, two EAI3 strains only had the M291I mutation and an Orphan isolate had a gyrB M291I and A432V mutations. In one strain, a three nucleotide deletion was observed at codon 683 of gyrB gene at 7158 position. In all, four CAS1-Delhi XDR strains without gyrA or gyrB mutations. Polymorphisms conferring resistance to aminoglycosides To investigate the mechanism for aminoglycoside resistance codon 1400 region of the rrs gene was analysed. Thirteen XDR strains were resistant to all three aminoglycosides (amikacin, kanamycin and capreomycin). Twenty-four XDR strains were resistant to amikacin and kanamycin but sensisitive to capreomycin. The most common rrs mutation across lineages was A1401G. Of strains resistant to amikacin, kanamycin and capreomysin, eleven had the SNP rrs A1401G and one had G1484T, whilst 2 strains had no mutation in the 1400 region of rrs (Table 5). Of the strains resistant to amikacin and kanamycin, eighteen had the SNP A1401G whilst five did not have a SNP in the rrs 1400 region. As current commercial diagnostic assays for aminoglycoside resistance target rrs nt 1401, 78% of resistance would be detected in these XDR isolates. In all, 6 aminoglycoside resistant isolates did not reveal any polymorphism in hyper-variable 1400 region of rrs gene (S2 Dataset).

Polymorphisms in the genes for ethionamide resistance
Five XDR isolates resistant to ethionamide but none showed any polymorphism in ethA gene, but one one ethionamide resistant isolate had the mutation ethA R239G which was also present in an ethionamide sensitive isolate (S2 Dataset).

Discussion
Our results present the first WGS based characterization of XDR-TB strains from Pakistan. We focused on SNPs in genes known to confer resistance to first-and second-line drugs and present data relevant to molecular diagnostic assays for drug susceptibility testing. Consistent with published reports all the rifampicin resistant MTB isolates in this study revealed at least one rpoB gene codon mutation i.e. 531, 526, 516 or 515, in the RRDR region [35,36,37,38]. The most common mutation was the rpoB S531L, corroborating with previous reports [33]. In 22/23 isolates, compensatory mutations in rpoC and rpoA genes were coincident with the low fitness cost rpoB531 mutation, supporting previous reports which demonstrate increased fitness in rifampicin resistant isolates as a consequence of these compensatory mutations [39,40,41]. One EAI lineage isolate had the rpoB H526R mutation together with two rpoC mutations. As the rpoB codon 526 mutation has been shown to have a higher fitness cost than the codon 531 mutation [17] therefore, it would be an advantage for such an isolate to accumulate the rpoC compensatory mutations.
Based on the katG463-gyrA95 polymorphism 91% of the MTB isolates belonged to PGG1 of which the majority were CAS family isolates, as previously reported [30]. PGG1 is considered the oldest of MTB lineages and has been previously been reported from Asia [42]. The predominance of CAS strains suggests that they are well adapted to human populations in this region [43,44,45,46].
The most frequently observed isoniazid resistance associated mutation observed in our strains was katG S315T, which has previously been shown to be associated with a growth advantage in strains [47]. Three isolates with inhA codon 94 mutations did not reveal katG315 mutations, supporting a causal relationship of this mutation for isoniazid resistance [48,49].
Pyrazinamide is important as a first-line anti-tuberculous drug but phenotypic and genotypic discordance exists regarding testing its drug susceptibility. Consistent with previous reports, we found that more than 50% of resistant isolates did not reveal any mutation in pncA gene [50]. Mutations in pncA gene are thought to be the primary mechanism for pyrazinamide resistance, but our data consistent with others indicates that other resistance mechanisms may exist [51]. We found rpsA mutations in two XDR isolates which did not have pncA mutation, suggesting that this gene may play a role in resistance to pyrazinamide [52]. Additional pyrazinamide resistant strains had neither pncA nor rpsA mutations, and further supports the suggestion that mechanisms such an efflux pump for pncA wild type resistant MTB isolates may be relevant [53].
The embB306 SNP was found to be the most common mutation associated with ethambutol resistance in MTB strains, as reported previously [54]. The embB306 mutation has been shown to be associated with high-level ethambutol resistance [55]. In addition, we also observed embB codon 378, 406 and 497 mutations; which have reported previously [56] [55,57]. Our observations of distinct combinations of embB and compensatory mutations in XDR isolates of EAI-IND and Orphan types suggest that some of these may lineage associated mutations. Especially, we observed a particular pattern of embB, embA, embC, rmlD and ini mutations was in four Orphan types, of which three were resistant and one was sensitive to ethambutol.
The most common streptomycin resistance associated mutation we observed was rspL K43R. This mutation has been shown to play a role in reduced fitness cost and enhanced growth rate of resistant MTB isolates [47]. The K43R mutation has also been associated with high-level streptomycin resistance [58]. In the rrs gene we identified point mutations at positions 514 and 906, encoding for streptomycin resistance [58,59]. Previous studies have reported role of gidB polymorphism for low level of streptomycin resistance in MTB [60]. Here we found eight different gidB SNPs. Three strains had compensatory gidB SNPs with rpsL and/or rrs mutations. Six streptomycin resistant isolates had gidB SNPs but did not have either rpsL or rrs gene mutations. The occurrence of gidB mutations in strains reduces drug pressure for selection of highly resistant isolates, therefore it has been suggested that high-level streptomycin resistant strains (e.g. rpsL mutants) arise more frequently in gidB mutants than in wild-type cells [61]. We observed six streptomycin resistant strains without mutations in rpsL, rrs or gidB. The aminoglycoside antibiotics inhibit protein translation therefore it is likely that in additional mutations may be relevant for inactivating the effect of streptomycin in inhibiting tRNA translocation in mycobacteria.
In the strains resistant to amikacin, kanamycin and capreomycin we found 16S rRNA gene rrs1401 and 1484 mutations to be the most common. These rrs mutations have previously been associated kanamycin and capreomycin [39,56,62]. In addition, it has been shown that the tlyA and rrs gene mutations jointly play a role for capreomycin resistance [63]. However, we did not observe any tlyA mutations amongst our strains.
We found that across MTB lineages, the most common fluoroquinolone resistance associated mutation was at gyrA codon 94, as shown previously [56,64]. Lineages other than CAS (X3, T1, EAI and Orphan) had gyrB SNPs in the 400 region. The gyrB gene has previously been shown to result in flouroquinolone resistance [65]. Four XDR strains did not have gyrA or gyrB mutations. This may be explained by the fact that additional mechanisms such as efflux pumps may be involved in a generating resistance flouroquinolone [66]. The predominant lineage amongst our XDR isolates was CAS and there is less representation of the other strain types. However, this data is suggestive of the association of flouroquinolone resistance in CAS lineages with gyrA QRDR region.
In the case of ethionamide, we found that none of the isolates had ethA SNPs which were particular to ethionamide resistance, as one isolate had ethA R239G mutation which was also present in an ethionamide sensitive isolate. A low frequency of mutation in ethA gene of ethambutol resistant isolates have also been reported earlier and may suggest existence of some other mechanisms such as efflux pump for ethionamide resistance [56,67].
Overall, the most frequently seen drug resistance mutations observed for isoniazid, rifampicin, streptomycin, ethambutol, aminoglycosides and ofloxacin were katG315, rpoB531, rpsL43, embB306, rrs1401 and gyrA94 respectively as previously reported [35,36,37,38,55,56,68,69]. Currently available commercial assays for detection of drug resistance (MTBDRplus and MTBDRsl) target the rpoB531, 516 and 526, katG315, gyrA94, 91 and 90 and rrs1401 regions. Based on our data the frequency of drug resistance detection using such assays for these strains would have been 100% for rifampicin resistance, 92% for isoniazid resistance, 81% for fluoroquinolone resistance, 78% for aminoglycoside and 62% for ethambutol resistance. Our data indicates that SNPs in drug resistance genes vary with different MTB lineages. Therefore, it highlights the relevance for WGS data from prevalent global strain types in order to develop improved molecular diagnostic assays for resistance in MTB. Additional knowlegde of mutations that are compensatory to those that confer drug resistance will be important for prediction of levels of drug resistance and therefore, to guide treatment of drug resistance isolates.