Genetic Mutations Associated with Isoniazid Resistance in Mycobacterium tuberculosis: A Systematic Review

Background
Tuberculosis (TB) incidence and mortality are declining worldwide; however, poor detection of drug-resistant disease threatens to reverse current progress toward global TB control. Multiple, rapid molecular diagnostic tests have recently been developed to detect genetic mutations in Mycobacterium tuberculosis (Mtb) genes known to confer first-line drug resistance. Their utility, though, depends on the frequency and distribution of the resistance associated mutations in the pathogen population. Mutations associated with rifampicin resistance, one of the two first-line drugs, are well understood and appear to occur in a single gene region in >95% of phenotypically resistant isolates. Mutations associated with isoniazid, the other first-line drug, are more complex and occur in multiple Mtb genes.

Objectives/Methodology
A systematic review of all published studies from January 2000 through August 2013 was conducted to quantify the frequency of the most common mutations associated with isoniazid resistance, to describe the frequency at which these mutations co-occur, and to identify the regional differences in the distribution of these mutations. Mutation data from 118 publications were extracted and analyzed for 11,411 Mtb isolates from 49 countries.

Principal Findings/Conclusions
Globally, 64% of all observed phenotypic isoniazid resistance was associated with the katG315 mutation. The second most frequently observed mutation, inhA-15, was reported among 19% of phenotypically resistant isolates. These two mutations, katG315 and inhA-15, combined with ten of the most commonly occurring mutations in the inhA promoter and the ahpC-oxyR intergenic region explain 84% of global phenotypic isoniazid resistance. Regional variation in the frequency of individual mutations may limit the sensitivity of molecular diagnostic tests. Well-designed systematic surveys and whole genome sequencing are needed to identify mutation frequencies in geographic regions where rapid molecular tests are currently being deployed, providing a context for interpretation of test results and the opportunity for improving the next generation of diagnostics.

Introduction
resistance explained by the existing canonical mutations. It is critically important to understand the frequency and geographic distribution of mutations associated with INH resistance. A failure to account for these variations limits the local effectiveness of molecular diagnostic tools currently available and constrains the development of improved genotypic diagnostic tests [28]. The aims of our systematic review were to quantify the frequency and co-occurrence of the most common mutations associated with phenotypic INH drug resistance and to describe the regional differences in the distribution of these mutations as reported in the published literature.

Methods
A PubMed search was conducted to identify all English-language peer-reviewed publications that assessed mutations associated with INH resistance. In order to capture publications not aggregated in previous reviews, the search was restricted to studies published between January 1, 2000 and August 31, 2013. PubMed key search terms included: (INH OR isoniazid) AND (resistance or resistant) AND (tuberculosis) AND (mutation OR sequence).

Publication Selection Criteria
Publications were selected for inclusion if they met the following criteria: 1) presented original data; 2) used clinical strains of Mtb (not laboratory strains); 3) described the phenotypic DST method used as reference standard; 4) used DNA sequencing (including pyrosequencing) as a means of characterizing mutations; and 5) included individual level amino acid mutation data. Data from studies evaluating commercial or in-house molecular drug susceptibility tests were included if the results were verified by DNA sequencing.
Our primary goal was to assemble mutation data that were representative of the global distribution of drug resistance phenotypes. Studies were excluded if DNA sequencing was performed only on isolates with discrepancies between phenotypic and genotypic tests, as data from these studies would not be generalizable to the pathogen population. Studies were also excluded if DNA sequencing was performed on non-random isolate subsets or if studies reported sequential sequencing, i.e., sequencing katG on all isolates, then sequencing inhA among isolates without katG mutations, then sequencing ahpC-oxyR among isolates without mutations in either inhA or katG. An exception was made for studies in which the sequencing results for the first gene sequenced were easily discernible; in these instances, only the results for that first gene were included.
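The selection rules above can be read as a simple screening procedure. The following is a minimal sketch of that logic, not the tool used in the review; the `Study` fields and the `screen` function are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative assumptions."""
    original_data: bool
    clinical_strains: bool
    dst_method_described: bool
    sequencing_used: bool
    isolate_level_data: bool
    discrepant_isolates_only: bool = False
    non_random_subset: bool = False
    sequential_sequencing: bool = False
    first_gene_discernible: bool = False


def screen(study: Study) -> str:
    """Return 'include', 'include_first_gene_only', or 'exclude'."""
    meets_inclusion = all([
        study.original_data,
        study.clinical_strains,
        study.dst_method_described,
        study.sequencing_used,
        study.isolate_level_data,
    ])
    if not meets_inclusion:
        return "exclude"
    # Sequencing restricted to discrepant or non-random isolates is not generalizable.
    if study.discrepant_isolates_only or study.non_random_subset:
        return "exclude"
    # Sequential sequencing is kept only when the first gene's results can be isolated.
    if study.sequential_sequencing:
        return "include_first_gene_only" if study.first_gene_discernible else "exclude"
    return "include"
```

Under this sketch, a study that meets all five inclusion criteria but sequenced genes sequentially would be retained only for its first gene, and only if those results can be separated out.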

Data Extraction
The following information was extracted from each publication: primary author, publication year, geographic origin of specimens, year(s) of specimen collection, reference strains, phenotypic drug susceptibility testing method and cutoff, genotypic testing method, gene loci sequenced, primers used for DNA sequencing, and total number of resistant and susceptible isolates tested. Individual isolate mutation information included: location of gene mutation, amino acid and/or nucleotide changes, number of resistant and susceptible isolates with identified mutations, and number of resistant and susceptible isolates tested. Mutations in codon 463 of katG were not extracted as previous studies have identified katG463 as a genetic marker and not a marker of resistance [29,30]. All data were extracted and compiled using MS Excel software (Microsoft, Redmond, WA).

Mapping of Primer Sequences and Mutation Locations
In order to accurately assess which gene fragments had been sequenced for each individual isolate, the exact start and end points of the sequenced sections were determined using one of two methods. If primer sequences were included or referenced in the selected studies, sequence endpoint coordinates were identified by entering primer sequences into the NCBI Basic Local Alignment Search Tool (BLAST) and mapping the primers onto the complete Mtb H37Rv genome (Accession number NC_000962.3) [31]. If primer sequences were not published, sequence endpoint coordinates were inferred by using the outermost identified mutations as sequence endpoints. If several primers were included and sequenced fragments overlapped, the final dataset included only the outermost/inclusive primer coordinates. Mutations identified in studies by traditional gene and nucleotide or codon nomenclature were converted to NCBI Mtb H37Rv complete genome (Accession number NC_000962.3) coordinates using the conversions described in S1 Table. In short, mutations occurring in the promoter or regulatory regions of the identified genes were mapped by exact nucleotide position, and mutations occurring in the coding region of the gene were mapped by the center nucleotide position of the codon. Due to inconsistencies in the nomenclature and reporting of insertions, deletions, and frameshifts, these mutation types were excluded from analysis.
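The coordinate conventions described above can be illustrated with a short sketch. It is not the conversion given in S1 Table, which is not reproduced here; the function names are hypothetical. The katG coordinates in the usage note (2153889 to 2156111, minus strand, in NC_000962.3) are an assumption taken from the public annotation rather than from this review.

```python
from typing import Iterable, Tuple


def codon_center_coordinate(gene_start: int, gene_end: int, strand: str, codon: int) -> int:
    """Genome coordinate (1-based) of the middle nucleotide of a codon.

    gene_start and gene_end are the inclusive genome coordinates of the coding
    sequence, with gene_start < gene_end regardless of strand.
    """
    if codon < 1:
        raise ValueError("codon numbers are 1-based")
    # 0-based offset of the codon's middle nucleotide from the first coding base.
    offset = 3 * (codon - 1) + 1
    if strand == "+":
        position = gene_start + offset
    elif strand == "-":
        # On the minus strand the first coding base sits at gene_end.
        position = gene_end - offset
    else:
        raise ValueError("strand must be '+' or '-'")
    if not gene_start <= position <= gene_end:
        raise ValueError("codon lies outside the gene")
    return position


def outermost_span(fragments: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Collapse overlapping sequenced fragments to their outermost coordinates."""
    fragments = list(fragments)
    if not fragments:
        raise ValueError("at least one fragment is required")
    return min(start for start, _ in fragments), max(end for _, end in fragments)
```

With the assumed katG coordinates, `codon_center_coordinate(2153889, 2156111, "-", 315)` gives 2155168, the position commonly cited for the katG315 substitution. Promoter mutations need no such conversion because they are mapped by exact nucleotide position.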

Calculation of Single Mutation Frequencies
As described in 2012 by Georghiou et al., mutation frequencies among phenotypically resistant isolates were calculated by aggregating the number of occurrences of a specific mutation among phenotypically resistant isolates and dividing it by the total number of phenotypically resistant isolates that could have identified that mutation, i.e. all phenotypically resistant isolates that had been sequenced at the location of the specific mutation [32]. Mutation frequencies among phenotypically susceptible isolates were calculated using the same method; the number of occurrences of a specific mutation among phenotypically susceptible isolates was divided by the total number of phenotypically susceptible isolates sequenced at the location of the mutation being assessed. Additionally, mutation frequency data were stratified by WHO region based on specimen origin and regional frequencies were calculated as described above [33].
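The key point of this calculation is the denominator: an isolate counts only if it was actually sequenced at the position in question. The sketch below illustrates that rule on toy data. The `Isolate` structure and function names are hypothetical, and the coordinates are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Isolate:
    """Hypothetical record: phenotype, sequenced spans, and mutated positions."""
    resistant: bool
    sequenced: List[Tuple[int, int]]           # inclusive genome coordinate spans
    mutations: Set[int] = field(default_factory=set)


def covers(isolate: Isolate, position: int) -> bool:
    """True if any sequenced span of the isolate includes the position."""
    return any(start <= position <= end for start, end in isolate.sequenced)


def mutation_frequency(isolates: List[Isolate], position: int,
                       resistant: bool = True) -> Tuple[int, int, Optional[float]]:
    """Return (isolates with the mutation, isolates sequenced there, frequency)."""
    eligible = [i for i in isolates
                if i.resistant == resistant and covers(i, position)]
    if not eligible:
        return 0, 0, None  # frequency is undefined without a denominator
    hits = sum(position in i.mutations for i in eligible)
    return hits, len(eligible), hits / len(eligible)


# Toy data: one resistant isolate was not sequenced at position 500,
# so it is left out of the denominator for that position.
toy = [
    Isolate(True, [(400, 600)], {500}),
    Isolate(True, [(400, 600)]),
    Isolate(True, [(700, 900)], {800}),
    Isolate(False, [(400, 600)]),
    Isolate(True, [(450, 550)], {500}),
]
resistant_result = mutation_frequency(toy, 500, resistant=True)
susceptible_result = mutation_frequency(toy, 500, resistant=False)
```

Regional frequencies follow from the same function applied to the isolates of one WHO region at a time.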

Calculation of Cumulative Mutation Frequencies
Studies that reported co-occurring or multiple mutations in phenotypically resistant isolates were further analyzed to assess cumulative frequencies of the most commonly occurring mutations. Cumulative mutation frequencies were calculated first for individual genes, then across multiple genes simultaneously. Isolates were only included in data subsets if they had been sequenced at each of the mutation locations being evaluated. Individual mutation frequencies were reassessed in each data subset for global comparison, then the cumulative frequency for all mutations being evaluated was calculated by aggregating the number of isolates containing any of the mutations being assessed and dividing it by the total number of isolates in the subset.
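As a rough illustration of the subset rule, the sketch below restricts the data to isolates sequenced at every locus under evaluation and then counts isolates carrying any of the mutations. The dictionary layout and the toy isolates are assumptions made for this example, not the review's data.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

IsolateRecord = Dict[str, Set[str]]  # hypothetical layout: {"sequenced": ..., "mutated": ...}


def cumulative_frequency(isolates: List[IsolateRecord],
                         loci: Iterable[str]) -> Tuple[int, int, Optional[float]]:
    """Return (isolates with any listed mutation, subset size, cumulative frequency)."""
    loci = set(loci)
    # Keep only isolates sequenced at every locus being evaluated.
    subset = [i for i in isolates if loci <= i["sequenced"]]
    if not subset:
        return 0, 0, None
    # An isolate with several of the mutations is still counted once.
    hits = sum(1 for i in subset if i["mutated"] & loci)
    return hits, len(subset), hits / len(subset)


all_three = {"katG315", "inhA-15", "ahpC-48"}
toy = [
    {"sequenced": all_three, "mutated": {"katG315"}},
    {"sequenced": all_three, "mutated": {"inhA-15"}},
    {"sequenced": all_three, "mutated": {"ahpC-48"}},
    {"sequenced": all_three, "mutated": set()},
    {"sequenced": {"katG315"}, "mutated": {"katG315"}},   # dropped from multi-locus subsets
    {"sequenced": all_three, "mutated": {"katG315", "inhA-15"}},
]

two_loci = cumulative_frequency(toy, ["katG315", "inhA-15"])
three_loci = cumulative_frequency(toy, ["katG315", "inhA-15", "ahpC-48"])
```

In the toy data, adding the third locus raises the cumulative frequency because one isolate carries only that mutation; co-occurring mutations add nothing, since each isolate is counted once.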

Results
Of the 466 publications identified, 348 were excluded as they did not meet inclusion criteria (S1 Fig.). Mutation data, including gene, mutation location, original amino acid and/or nucleotide, and mutated amino acid and/or nucleotide, were extracted for 11,411 Mtb isolates described in the 118 selected publications. Study size ranged from 4 to 502 isolates per publication. Seventy-seven percent (8,786) of the isolates were phenotypically resistant to INH. Geographic origin of the isolates was reported for 94% (10,745) of the isolates and included 49 countries (S4 Table). Primers were extracted (>96% of publications included primer information) for the three most commonly inspected genes and their surrounding regions: furA-katG, fabG1-inhA, and ahpC-oxyR.

Single Mutation Frequencies
Global frequencies for all mutations occurring in the katG coding region, inhA coding region, inhA promoter region, and ahpC-oxyR intergenic region were evaluated; however, only mutations occurring at a cumulative frequency of >0.1% and identified in at least two publications are summarized in Table 1 (for a complete listing of all mutations see S2 Table). Of the 8,416 phenotypically resistant isolates sequenced at katG315, 5,400 (64.2%) harbored a mutation in this codon. A total of 4,505 isolates had nucleotide information reported in addition to amino acid information for the katG315 mutation (Table 2). The most common nucleotide change for the katG315 codon, AGC to ACC (serine to threonine), occurred among 93.4% of isolates. AGC to AAC (serine to asparagine), the second most common nucleotide change, occurred among 3.6% of isolates. Mutation data at the country level were sparse, allowing for meaningful aggregation only at the regional level. Aggregating mutation frequencies by WHO region revealed geographic differences in the individual frequencies for the two most commonly occurring mutations, katG315 and inhA-15 (Table 3). Among phenotypically INH resistant isolates, South East Asia had the highest frequency (78.4%) and the Western Pacific Region had the lowest frequency (55.5%) of the katG315 mutation. Frequencies also varied widely for the inhA-15 mutation, with the Americas region reporting the highest frequency (24.6%) and the Eastern Mediterranean region reporting the lowest frequency (13.0%) of this mutation in phenotypically INH resistant isolates.

Cumulative Mutation Frequencies
The cumulative frequencies of multiple or co-occurring mutations associated with INH resistance were assessed first by individual genes and then in combination across multiple gene regions (Table 4).

Discussion
Few studies to date have attempted to assess the prevalence of the most commonly occurring "canonical" mutations associated with phenotypic INH resistance among a globally representative set of isolates, making it difficult to compare the global frequencies calculated in this systematic review to previous global estimates. However, a study conducted by Lin et al. using 127 INH resistant isolates from California, a population which is thought to mirror global MDR-TB diversity due to immigration, identified a global frequency of 61% for katG315, 23% for inhA -15 mutations, and 83% for the cumulative frequency of either mutation, approximating the frequencies of these mutations as quantified in this systematic review [34]. In contrast, Campbell et al., using 212 INH resistant isolates from both WHO and CDC laboratory archives, estimated the global frequency of the katG315 mutation to be 85%, inhA -15 to be 17%, and their cumulative frequency 91%; however, isolates used for that study were selected to provide a diverse set of mutation patterns, and therefore may not accurately represent true global frequencies [35]. Finally, a more recent study conducted by Rodwell et al. using 348 INH resistant isolates from four geographically diverse countries estimated the global frequencies of katG315 and inhA -15 to be significantly higher at 86% and 34%, respectively. The cumulative frequency of both mutations with the addition of inhA -17 was 96% [36]. However, as with the previously mentioned study, isolates were selected to maximize diversity, potentially affecting the generalizability of the frequency estimates. Prevalence of drug resistant TB is commonly understood to vary significantly by region, making it reasonable to expect regional variation in the frequencies of specific mutations that drive INH resistance [37,38]. Based on this systematic review, frequencies of specific mutations do in fact appear to vary substantially by geographic region.
Although the total number of isolates analyzed varied by country and for some countries data were very sparse, we suspect that regional differences in the frequencies of the most common mutations could play a significant role in the performance of molecular-based diagnostic tests in some regions. For example, of the 808 INH resistant isolates sequenced from Japan and Korea, only 43% harbored the katG315 mutation. In contrast, katG315 mutations were identified in 94% of the 751 INH resistant isolates analyzed from the post-Soviet states of Belarus, Lithuania, Kazakhstan, Latvia, Moldova, and Russia. This regional variability of the katG315 mutation frequency may contribute to the poor performance of molecular diagnostics for INH resistance detection in Japan [39]. However, due to the lack of large, geographically diverse cross-sectional studies, regional patterns in single mutations should be interpreted with caution.
The frequency patterns of the most common mutations associated with INH resistance appear to differ between individual genes. The majority (64%) of phenotypic INH resistance among Mtb isolates is associated with a single mutation, katG315. The dominance of this mutation is hypothesized to be the result of a low or zero fitness cost for this mutation, allowing it to propagate without negative selection pressure [40,41]. Mutations other than katG315 in the katG gene appear to occur at low (<1%) frequencies and occur overwhelmingly in conjunction with the katG315 mutation. Patterns of co-occurring mutations in the inhA promoter region appear to differ markedly from co-occurring mutations in the katG gene. Although the inhA -15 mutation is the dominant (19%) mutation in the inhA promoter region, other resistance associated mutations (~1%) in the inhA promoter region appear to occur independently of the inhA -15 mutation and frequently contribute to the detection of INH resistance. In contrast, no single mutation in the ahpC-oxyR intergenic region showed an individual frequency above 1.3%; rather, multiple mutations (at positions -48, -39, -15, -12, -10, -9, and -6) in the ahpC-oxyR intergenic region occurred at a total cumulative frequency of 5.4% among phenotypically resistant isolates.
Equally important to our understanding of the distribution of INH resistance associated mutations is their cumulative frequency across genes. Analysis of the cumulative frequencies of mutations in katG, inhA, and ahpC-oxyR was performed only in a small subset of the data (n = 1,582), as few publications sequenced portions of all three genes of interest. In this subset, 80% of the isolates had at least one mutation reported at the following locations: katG315, inhA -15, inhA -8, inhA -47, or inhA -17. With the addition of mutations occurring at ahpC -48, ahpC -39, ahpC -15, ahpC -12, ahpC -10, ahpC -9, or ahpC -6, the percentage of isolates with at least one reported mutation increased to 84%. This approaches the cumulative frequency previously documented by Lin et al. of 88% [42]. The increased detection capacity achieved by including mutations in the ahpC-oxyR intergenic region appears to contradict previous studies, which found no significant contribution to INH resistance detection from mutations occurring in the ahpC-oxyR intergenic region [18,27,43].

Limitations
Given the nature of a systematic review, we were unable to obtain data from all geographic regions, thereby introducing a theoretical potential for bias in the global cumulative frequencies calculated. However, given that the total number of isolates reviewed exceeded 11,000, the cumulative frequencies most likely approximate true global frequencies of the most common mutations associated with INH resistance. Additionally, some publications may have selectively sequenced isolates that did not show the most common mutations on rapid molecular tests, which may have led to artificially lower cumulative frequencies of the most common mutations. To reduce the potential for this reporting bias, we specifically excluded publications that indicated any non-random selection of isolates for sequencing analysis.

Conclusions
The performance of molecular-based diagnostic tests for drug resistant TB is intrinsically linked with not only the regional frequencies of mutations being detected, but also the diversity of resistance-conferring mutations being detected by diagnostic tests. Both of these factors influence the maximum sensitivity which rapid molecular tests can be expected to achieve.
Based on the data we evaluated, approximately 80% of global Mtb isolates with phenotypic resistance to INH appeared to contain mutations in codon 315 of the katG gene or position -15 in the inhA promoter. Taken together with other independently occurring mutations in the inhA promoter and the ahpC-oxyR intergenic region that have been previously documented to confer INH resistance, a minimum of 84% of all global Mtb isolates with INH resistance phenotypes should be detectable with molecular diagnostics based on analysis of these mutations. Additionally, as these mutations almost never (<0.1%) occurred in INH susceptible isolates, they should have specificities in excess of 99% as markers of phenotypic INH resistance. However, the cumulative frequencies of these mutations do appear to differ by region, which could lead to variation in the sensitivity of molecular diagnostics if they are based only on these mutations. Further research, especially in regions with low prevalence of katG and inhA canonical mutations, is needed to determine if unknown genes or specific mutations account for the unexplained phenotypic INH resistance, or if a more inclusive combination of mutations in previously identified genes such as the inhA promoter, inhA coding, ahpC-oxyR intergenic region, or katG gene account for the remaining phenotypic resistance. Regional surveys of the most commonly occurring mutations, including mutations reported in the ahpC-oxyR intergenic region, should continue to be conducted in geographic regions where rapid molecular tests are being deployed. This would allow for tailoring of molecular tests to specific regions, better interpretation of the molecular tests being used, and improved therapy recommendations.
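The link between mutation frequencies and expected test performance can be made explicit with a small sketch. The frequency of a mutation panel among phenotypically resistant isolates bounds the sensitivity of a test built on that panel, and its frequency among susceptible isolates determines specificity. The function name and the counts below are hypothetical and chosen only to mirror the rough magnitudes discussed above; they are not data from this review.

```python
from typing import Tuple


def marker_performance(resistant_with_marker: int, resistant_total: int,
                       susceptible_with_marker: int, susceptible_total: int) -> Tuple[float, float]:
    """Sensitivity and specificity of a mutation panel against phenotypic DST."""
    if resistant_total <= 0 or susceptible_total <= 0:
        raise ValueError("both phenotype groups must contain isolates")
    # Sensitivity: share of phenotypically resistant isolates carrying a panel mutation.
    sensitivity = resistant_with_marker / resistant_total
    # Specificity: share of phenotypically susceptible isolates without any panel mutation.
    specificity = (susceptible_total - susceptible_with_marker) / susceptible_total
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sensitivity, specificity = marker_performance(840, 1000, 1, 2000)
```

Applying the same calculation region by region would show how a panel with a fixed set of mutations can reach different sensitivities in different settings.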