Detection of drug resistant Mycobacterium tuberculosis by high-throughput sequencing of DNA isolated from acid fast bacilli smears

Background Drug susceptibility testing for Mycobacterium tuberculosis (MTB) is difficult to perform in resource-limited settings where Acid Fast Bacilli (AFB) smears are commonly used for disease diagnosis and monitoring. We developed a simple method for extraction of MTB DNA from AFB smears for sequencing-based detection of mutations associated with resistance to all first and several second-line anti-tuberculosis drugs. Methods We isolated MTB DNA by boiling smear content in a Chelex solution, followed by column purification. We sequenced PCR-amplified segments of the rpoB, katG, embB, gyrA, gyrB, rpsL, and rrs genes, the inhA, eis, and pncA promoters and the entire pncA gene. Results We tested our assay on 1,208 clinically obtained AFB smears from Ghana (n = 379), Kenya (n = 517), Uganda (n = 262), and Zambia (n = 50). Coverage depth varied by target and slide smear grade, ranging from 300X to 12000X on average. Coverage of ≥20X was obtained for all targets in 870 (72%) slides overall. Mono-resistance (5.9%), multi-drug resistance (1.8%), and poly-resistance (2.4%) mutation profiles were detected in 10% of slides overall, and in over 32% of retreatment and follow-up cases. Conclusion This rapid AFB smear DNA-based method for determining drug resistance may be useful for the diagnosis and surveillance of drug-resistant tuberculosis.

The funding sources mentioned above did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section.
AFB smears and then used in TB DST analysis using molecular and other DNA sequencing based methods [5][6][7][8][9][10][11][12]. However, most of these studies involved small sample sizes (<100 slides) and only demonstrated the performance of their assays with one or two TB specific gene targets. Additionally, some studies used complex extraction and amplification protocols involving phenol-chloroform extractions, ethanol precipitations, and nested Polymerase Chain Reaction (PCR) amplifications requiring multiple primers for each target region, which would be impractical for widespread use.
We have developed a simple and effective method to extract Mycobacterium tuberculosis (MTB) DNA from clinically obtained AFB smears with sufficient quality for PCR amplification and next-generation DNA sequencing. We have also streamlined the process for PCR amplification of the isolated DNA for subsequent high-throughput sequencing of 17 regions of the MTB genome, enabling rapid detection of mutations associated with resistance to INH, RIF, ethambutol (EMB), PZA, FQ, and injectable aminoglycoside antibiotics. Here, we describe the method and assess the utility of our approach with 1,208 clinical AFB smears from Ghana, Kenya, Uganda, and Zambia. Since sequencing is becoming increasingly established as a definitive, reproducible, and reliable predictor of phenotypic DST, we also report the results of our new test disaggregated by region and patient treatment status.

AFB direct sputum microscopy smears
A convenience sample of clinically obtained direct AFB sputum smears (n = 2,227), collected between 2013 and 2016, was provided for evaluation with our assay. We utilized 635 of the slides to develop our assay. We processed an additional 1,208 smears (one per subject) with the final version of our assay, which is the focus of this manuscript. A summary of the study sample population and methods is presented in Fig 1. We excluded 384 smears from processing (Fig 1). We excluded scanty [1 to 9 AFB in 100 fields] smears because a preliminary evaluation of our assay showed suboptimal performance with such smears (S1 Fig). The slides included in this study consisted of 379 smears from the Ghanaian Armed Forces Health Care Beneficiary population seen at 37 Military Hospital, Accra, 517 smears from a PEPFAR population in the Western highlands of Kenya, 262 smears from the Uganda Peoples Defence Force and Ministry of Health hospitals and the Central Public Health Laboratory, Uganda, and 50 smears from Maina Soko Military Hospital serving the greater Lusaka area of Zambia.
In each country, smears were prepared based on standard protocols recommended for clinical use by their national TB control programs. Types of stains used included Kinyoun, Auramine-Rhodamine, and Ziehl-Neelsen. All smears were centrally graded at the Uniformed Services University of the Health Sciences (USUHS) based on the International Union Against Tuberculosis and Lung Disease (IUATLD) grading system [13]; this required stripping and restaining the Auramine-Rhodamine stained slides with Kinyoun stain (Remel, Lenexa, KS, USA). In the remainder of this manuscript, wherever a reference is made to stain type, we are specifically referring to the original stain type used on the smear.

Ethics reviews
This study was reviewed and approved as non-human subjects research by the Institutional Review Boards (IRB) at the USUHS and Rutgers New Jersey Medical School, and it was reviewed and designated as an exempt protocol by the IRBs at the Naval Health Research Center, Walter Reed Army Institute of Research, Kenya Medical Research Institute, Makerere University, 37 Military Hospital, and the University of Zambia. Therefore, informed consent was waived by all aforementioned institutions. All data was anonymized before access by this study.

Smear extraction and DNA isolation
Prior to extraction, the slides were washed with Xylene (Sigma Aldrich, Inc., Burlington, MA, USA) to remove immersion oil. For the extraction, 200 μL of Instagene Matrix (Bio-Rad Laboratories, Inc., Philadelphia, PA, USA) with 0.1% Triton X-100 (Sigma Aldrich, Inc., Burlington, MA, USA) was aliquoted into 1.5 mL Eppendorf tubes. Without disturbing the Chelex pellet, up to 100 μL of the Instagene Matrix and Triton X-100 solution was aspirated from the tube and dispensed onto the smear. The smear was then scraped off using a razor blade (Fisherbrand™ Razor Blades, Thermo Fisher Scientific, Inc., Hampton, NH, USA), aspirated off the slide, and transferred into the Eppendorf tube containing the Chelex pellet. To isolate the DNA, the extracted smear material was pulse vortexed at high speed for 30 seconds, then boiled for 20 minutes at 90˚C, pulse vortexed on medium speed for 10 seconds, and centrifuged for 5 minutes at 15,000 rpm using Eppendorf Centrifuge 5424 (Eppendorf, Hamburg, Germany). The supernatant was transferred to a clean Eppendorf tube and purified using Qiagen QIAamp DNA Micro Kit (Qiagen, Inc., Germantown, MD, USA) according to manufacturer's instructions for cleanup of genomic DNA. The purified DNA was eluted in 45 μL of AE elution buffer from the kit.

Amplification targets and primers
The drug resistance phenotypes, gene target regions, and the associated segment names and primers are shown in Table 1. These target regions were selected to cover single nucleotide polymorphisms (SNPs) which have been reported to be associated with drug resistance in previous studies [14][15][16]. Mutation H57D in pncA, which occurs naturally in M. bovis, was excluded from downstream SNP analyses in this study. For pncA, the entire gene, including upstream and downstream flanking regions, was targeted for amplification (Table 1).

Polymerase Chain Reactions (PCR)
All samples were amplified in 20 μL reactions. Smears graded 1+ were amplified in uniplex reactions containing 2 μL of target DNA to increase PCR efficacy and sequencing depth for these paucibacillary samples. For smears graded 2+ or 3+, all gene target sequences were amplified in duplexed reactions using 3 or 4 μL of target DNA per reaction, except for gyrB, which was only amplified in a uniplex reaction to maintain amplification efficiency. The PCR mix consisted of the following: 2.5 mM magnesium chloride (MgCl2), 0.25 mM deoxyribonucleotides, 5% glycerol, 1X PCR buffer without , and genomic DNA extracted from a mutant positive control drug-resistant strain (TB-TDR-0114 or TB-TDR-0115) were included [17]. These controls were also included in the downstream sequencing and SNP detection pipeline.

Data analysis
Reads were aligned to a reference sequence from the H37Rv strain of TB using an in-house bioinformatics pipeline (S2 Appendix). For each DST target region of a given slide, results were classified as interpretable if coverage depth of 20X or more reads was obtained; a mutation frequency of 80% or greater was considered sufficient for calling SNPs. If coverage depth of a DST target region was less than 20X, the results for that target region were classified as uninterpretable. Prevalence and corresponding exact binomial confidence intervals of mutations associated with resistance to first-and second-line anti-tuberculosis drugs were estimated for all slides with interpretable results for all gene targets.
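The interpretation rule described above can be sketched in a few lines of Python. This is only an illustration of the two stated cut-offs (20X coverage, 80% mutation frequency) and not the in-house pipeline of S2 Appendix. The `Variant` record, the `classify_target` function, and the assumption that each target region is summarized by a single depth value are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_DEPTH = 20          # coverage cut-off stated in the text (reads)
MIN_FREQ_PERCENT = 80   # mutation frequency cut-off stated in the text (%)


@dataclass
class Variant:
    """Hypothetical record for one candidate SNP in a target region."""
    name: str        # e.g. "H445D"
    alt_reads: int   # reads supporting the mutant base
    depth: int       # total reads covering the position


def classify_target(region_depth: int, variants: List[Variant]) -> Tuple[str, List[str]]:
    """Return (status, called SNP names) for one DST target region of one slide."""
    if region_depth < MIN_DEPTH:
        # Below the coverage cut-off nothing is called for this region.
        return ("uninterpretable", [])
    called = [
        v.name
        for v in variants
        # Integer comparison avoids floating-point edge cases at exactly 80%.
        if v.depth > 0 and v.alt_reads * 100 >= MIN_FREQ_PERCENT * v.depth
    ]
    return ("interpretable", called)
```

For example, `classify_target(1700, [Variant("H445D", 1683, 1700)])` yields an interpretable region with one called SNP, whereas any region passed with a depth of 15 is returned as uninterpretable regardless of its variants.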

Results
Slide characteristics are summarized by country of origin in Table 2. The 1,208 processed slides consisted of 282 (23.3%) 1+ smears, 352 (29.1%) 2+ smears, and 574 (47.5%) 3+ smears. All Kenya smears and 58% of Ghana smears were from adult patients. Information on adult or child status was missing for Uganda and Zambia smears. Over 61% of the smears were classified as new TB cases. The largest share of the smears (42.8%) was from Kenya (Table 2). Ziehl-Neelsen was the most frequently used stain (Table 2). Coverage depth varied by target region. The percentage of slides per smear grade with interpretable results overall and per target gene segments is reported in Fig 2A. With the exception of gyrB (interpretable results obtained in 79% of 1+ smears), interpretable results were obtained for any given target gene segment in 80% or more of slides, regardless of smear grade (Fig 2A). Interpretable results were obtained for all targets in 194 (68.8%) 1+ smears, 246 (69.9%) 2+ smears, 430 (74.9%) 3+ smears, and 870 (72%) slides overall. The number of mapped reads varied by target gene segment and slide smear grade, ranging from approximately 300X for rrs10 to over 12000X for the eis promoter on average. Median coverage per target region, stratified by smear grade, is shown in Fig 2B. Coverage depth also varied by stain type. Interpretable results were obtained for all targeted gene segments in over 94% of Kinyoun stained smears, but only 64% of Ziehl-Neelsen stained smears, and 59% of Auramine/Rhodamine stained smears (S1 Table).
A summary of all drug resistance associated mutations detected in all gene targets is presented in Table 3. RIF resistance associated mutations in rpoB were detected in 6.3% of new cases, 3.3% of follow-up cases (patients under initial treatment), 20.0% of retreatment cases (S2 Table), and 6.1% of slides overall (Table 3). The highest rate of RIF resistance associated mutations was detected in slides from Uganda (18.3%), followed by slides from Zambia (8.51%). Over 85% of RIF resistance associated mutations occurred in codon 445 (alternative numbering system: 526) of rpoB (Table 3).
INH resistance associated mutations in katG and/or inhA were detected in 6% of slides overall, with the highest rate being among Zambia slides (12.8%), followed by Uganda (10.7%) (Table 3). Most (92%) of the observed INH resistance associated mutations were S315T in katG (Table 3). Overall, fluoroquinolones, moxifloxacin, and ofloxacin resistance associated mutations in gyrA were observed in over 1% of slides (Table 3). PZA resistance associated mutations in pncA were observed in over 2% of slides overall (Table 3). The majority of the distinct pncA mutations were only detected in a single slide. However, the -11 A/G mutation in the pncA promoter was observed in 2.4% of Kenya slides. Detected pncA SNPs for which there currently is little to no evidence of an association with drug resistance are reported in S3 Table. We did not detect any SNPs in our wild-type positive controls or negative controls. Prevalence estimates and associated 95% confidence intervals for detected drug resistance mutation profiles in smears with interpretable results for all DST gene targets, stratified by country, are presented in Table 4. Overall, drug resistance associated SNPs were detected in 88/870 (Estimate: 10.11%; 95% CI: 8.19%, 12.31%) of the slides with interpretable results for all DST gene targets, with mutation rates being highest in slides from Zambia (Table 4). Approximately 58% of the slides in which a drug resistance associated mutation was detected contained a SNP associated with mono-resistance to RIF, INH, or SM, with the most frequent mutation being associated with INH resistance (Table 4). An MDR mutation profile with resistance to at least RIF and INH was also observed in 8% (95% CI: 2.48%, 21.22%) of smears from Zambia and in 1.8% (95% CI: 1.05%, 2.97%) of smears overall (Table 4). No XDR mutation profiles were identified.
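As an illustration of how an exact binomial confidence interval such as the one quoted for 88/870 slides can be obtained, the following Python sketch computes a Clopper-Pearson interval by bisection. The text does not state which software was used, so the function names and the numerical approach here are assumptions made for the example; only the standard library is required.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed through log-gamma to avoid overflow for large n.
    """
    total = 0.0
    for k in range(0, x + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect(p_is_too_small) -> float:
    """Find the boundary in (0, 1) where p_is_too_small switches from True to False."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if p_is_too_small(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson (exact) two-sided interval for x successes in n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(x - 1, n, p) < alpha / 2.0
    )
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) > alpha / 2.0
    )
    return lower, upper


if __name__ == "__main__":
    low, high = exact_binomial_ci(88, 870)
    print(f"88/870: {88 / 870:.2%} (95% CI: {low:.2%}, {high:.2%})")
```

Run on 88 of 870 slides, the sketch should give limits close to the interval reported above; small differences in the last digit may arise from rounding or from the software actually used.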
Presented in Table 5 are overall and country specific prevalence estimates and 95% confidence intervals for phenotypic interpretation of observed mutations in smears with interpretable results for all DST gene targets, stratified by case treatment status. Among smears with interpretable results for all gene targets, 7.8% of new cases, 4.1% of follow up cases, and 28.1% of retreatment cases had mutation profiles for mono-resistance, MDR, or poly-resistance.
We re-sequenced a subset of the slides using Sanger sequencing to confirm the accuracy of our MiSeq based approach. This included 16 slides with a wild type pncA gene, 61 with rpoB mutations, 18 with katG mutations, and 27 additional slides with mutations in other gene targets. Our Sanger sequencing results matched the results of our primary study in every case except for one slide where an H445D mutation was detected in rpoB by MiSeq (coverage depth: 1700X; mutation frequency: 99%) but not Sanger sequencing (S1, S2 and S3 Data).

Discussion
We have demonstrated that DNA of sufficient quality for PCR amplification and next generation DNA sequencing can be isolated from AFB stained direct sputum smears and tested for mutations associated with resistance to all first and several second line anti-tuberculosis drugs. The DNA isolation method we have developed is simple and rapid and does not require much technical expertise. The samples used in this study comprised a diverse set of clinically obtained smears in terms of smear grade, AFB stain type, and geographic origin. Although the percentage of slides that met the coverage cut-off of 20X was similar for all target gene segments, the median number of reads for each target varied. This indicates that while sufficient DNA can be isolated from 1+ to 3+ smears to amplify and sequence all gene segments tested in this study, certain targets (e.g., eis promoter) are amplified more efficiently than others by the selected primer pairs.
Table footnotes:
a A coverage depth cut-off of 20X and a frequency cut-off of 80% were used for all reported single nucleotide polymorphisms. For each target gene segment, the number of smears with SNPs out of the total number of smears with 20X or greater coverage for the given target gene segment is reported per country. For each country and in total, the percent of smears with resistance to a given drug was calculated by dividing the number of smears with at least one mutation associated with resistance to the given drug by the total number of smears in which all relevant target gene segments (listed in Table 1) were successfully screened plus any smears in which at least one associated target gene segment met the coverage cut-off and contained a drug resistance associated mutation.
b For rpoB0, the amino acid number is presented in the MTB numbering system followed by the alternative numbering system in parentheses.
c Two samples from Uganda contained a drug resistance associated mutation in both emb10 and emb20 gene segments.
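The per-drug resistance percentage described in table footnote a can be illustrated with a short Python sketch. It is one reading of that footnote, not the authors' code: the data layout (a dictionary per smear mapping a segment name to a pair of flags) and the function name `percent_resistant` are hypothetical, and the segment keys used in the example are gene names standing in for the segment names of Table 1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: segment name -> (met 20X coverage cut-off, resistance mutation called)
SmearResult = Dict[str, Tuple[bool, bool]]


def percent_resistant(smears: List[SmearResult], segments: List[str]) -> Optional[float]:
    """Percent of smears with resistance to one drug, following footnote a.

    Numerator: smears with at least one resistance mutation in a relevant
    segment that met the coverage cut-off.
    Denominator: smears with all relevant segments screened, plus smears that
    were only partially screened but still carried such a mutation.
    """
    resistant = 0
    denominator = 0
    for smear in smears:
        results = [smear.get(seg, (False, False)) for seg in segments]
        all_screened = all(covered for covered, _ in results)
        has_mutation = any(covered and mutated for covered, mutated in results)
        if has_mutation:
            resistant += 1
        if all_screened or has_mutation:
            denominator += 1
    if denominator == 0:
        return None  # no smear qualifies, so the percentage is undefined
    return 100.0 * resistant / denominator


if __name__ == "__main__":
    example = [
        {"katG": (True, True), "inhA": (True, False)},    # fully screened, resistant
        {"katG": (True, False), "inhA": (True, False)},   # fully screened, no mutation
        {"katG": (True, True), "inhA": (False, False)},   # partially screened, resistant
        {"katG": (False, False), "inhA": (True, False)},  # partially screened, excluded
    ]
    print(percent_resistant(example, ["katG", "inhA"]))
```

In this toy input, two of the three qualifying smears carry a mutation. The fourth smear is left out of the denominator because one relevant segment failed the coverage cut-off and no mutation was seen in the other.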
Kinyoun stained smears enabled better sequencing coverage compared to other stain types. However, the poor performance with Auramine/Rhodamine stained smears could in part be due to subsequent stripping and counterstaining of these slides with Kinyoun stain at USUHS, which may have resulted in DNA loss. Given that each country mainly used one type of stain, the variation observed in coverage depth by stain type could also be due to site-specific factors (Table 2). Our assay demonstrated well-established associations of drug-resistance with geographical region and TB treatment status. Although the number of retreatment cases in our study was small, the rate of RIF resistance associated mutations we observed in retreatment cases as compared to new cases was similar to trends currently reported for the continent of Africa, where the rate of drug resistance mutations in retreatment cases is almost four times that observed in new TB cases [3]. The countries from which slides were analyzed in our study are on the WHO's list of top high burden countries for TB (Kenya; Zambia), MDR-TB (Kenya), and/or TB-HIV concurrent infections (Zambia; Ghana; Uganda) [3]. Comprehensive genotypic drug resistance surveillance data for these countries is currently limited. Based on national surveillance data, less than 2% of new TB cases are drug-resistant in all countries from which slides were included in our study [3]. Although our sample size for each country was small and subject treatment status was only known for a subset of our samples, the estimated prevalence of drug resistance among new TB cases in our study was over ten times higher than reported national rates for Ghana and Uganda. Among Kenyan smears in our sample, the estimated drug resistance rate was almost three times higher than the reported national rate.
It could be that a greater number of drug-resistant cases were evaluated at the facilities from which we received our samples because they are centralized laboratories. The drug resistance rate estimated among Zambian smears in our sample was similar to the national rate of 18% reported for relapse cases in Zambia [3]. However, the subject treatment status associated with the Zambian slides in our study was unknown. Similar to findings from previous phenotypic and molecular drug resistance surveillance studies [18-
A limitation of this study is that our specimens were obtained from central or regional reference laboratories and thus may not be generalizable to the entire population. Another limitation is that the observed drug resistance mutations could not be confirmed via phenotypic DST, given that the study was conducted retrospectively. Nevertheless, we were able to validate the in-house bioinformatics pipeline that was used to analyze our samples by reanalyzing the data for 384 slides with the previously published ASAP pipeline [26]. We observed no discordance between the variant calls made by our pipeline and the ASAP pipeline when adjusting coverage cut-off to 20X and SNP frequency to 80% (S3 Appendix). Also, limiting our screening process only to previously published drug resistance mutations may have led to an underestimation of drug resistance in our samples.
Among the slides processed by Sanger and MiSeq sequencing, one discordant sample was observed, where rpoB mutation H445D (alternate naming system: H526D) was detected by MiSeq but not by Sanger sequencing. Given the length of the rpoB target, coverage of codon 445 was only available from the forward direction from Sanger sequencing. It is possible that the rpoB SNP was missed by Sanger sequencing in this sample. Alternatively, the portion of the discordant sample amplified by MiSeq sequencing may have been contaminated. Regardless, the overall concordance between the two approaches was very high. Although precautions were taken to prevent contamination, some samples could have been contaminated during sample processing.
Our approach has several potential advantages compared to phenotypic DSTs and to currently available rapid molecular DST tests. Our approach is free from the biohazards associated with phenotypic DST and can be performed without use of expensive biocontainment laboratories. Compared to commercially available rapid molecular tests, our use of targeted DNA sequencing enables easy expansion to detect additional resistance mutations as well as mutations conferring resistance to new drugs as needed. The total reagent cost for our approach, including all assay steps, was approximately $50 per slide for slides amplified with duplex PCR and $60 per slide for slides amplified with uniplex PCR. However, with additional multiplexing of the PCR step and with expected decreases in next generation sequencing costs, it is likely that our approach could soon become less expensive than other DST methods. Aside from the smear scraping step, slides can be batch processed at every step of the process. Slide scraping to DNA purification can be completed for at least 12 samples in less than an hour.
In summary, we have demonstrated a simple and rapid method for determining drug resistance in a widely available sample type, AFB stained sputum microscopy smears, which is easily storable, and safely transportable without infectious risk. This approach should prove useful for diagnosing drug-resistant TB as well as for surveillance purposes.
Future research may add mycobacterial speciation targets, which may determine the frequency of positive acid-fast microscopy caused by mycobacteria other than MTB.