Automated IS6110-based fingerprinting of Mycobacterium tuberculosis: Reaching unprecedented discriminatory power and versatility

Background Several technical hurdles and limitations have restricted the use of IS6110 restriction fragment length polymorphism (IS6110 RFLP), the most effective typing method for detecting recent tuberculosis (TB) transmission events. This has prompted us to conceive an alternative modality, IS6110-5’3’FP, a plasmid-based cloning approach coupled to a single PCR amplification of differentially labeled 5’ and 3’ IS6110 polymorphic ends and their automated fractionation on a capillary sequencer. The potential of IS6110-5’3’FP to be used as an alternative to IS6110 RFLP has been previously demonstrated, yet further technical improvements are still required for optimal discriminatory power and versatility.
Objectives Here we introduced critical amendments to the original IS6110-5’3’FP protocol and compared its performance to that of 24-loci multiple interspersed repetitive unit-variable number tandem repeats (MIRU-VNTR), the current standard method for TB transmission analyses.
Methods IS6110-5’3’FP protocol modifications involved: (i) the generation of smaller-sized polymorphic fragments for efficient cloning and PCR amplification, (ii) omission of the plasmid amplification step in E. coli for shorter turnaround times, (iii) the use of more stable fluorophores for increased sensitivity, (iv) automated subtraction of background fluorescent signals, and (v) the automated conversion of fluorescent peaks into binary data.
Results In doing so, the overall turnaround time of IS6110-5’3’FP was reduced to 4 hours. The new protocol allowed detecting almost all 5’ and 3’ IS6110 polymorphic fragments of any given strain, including IS6110 high-copy number Beijing strains. IS6110-5’3’FP proved much more discriminative than 24-loci MIRU-VNTR, particularly with strains of the M. tuberculosis lineage 4.
Conclusions The IS6110-5’3’FP protocol described herein reached the optimal discriminatory potential of IS6110 fingerprinting and proved more accurate than 24-loci MIRU-VNTR in estimating recent TB transmission. The method, which is highly cost-effective, was rendered versatile enough to prompt its evaluation as an automatized solution for a TB integrated molecular surveillance.

Introduction
Tuberculosis (TB), though a curable disease, still represents a major global threat, notably because of the emergence and spread, at an alarming rate, of difficult-to-treat drug-resistant forms. In 2014, 9.6 million new TB cases and 1.5 million deaths were recorded; 480 000 of the new cases were multidrug-resistant TB (MDR-TB), a form of TB that is resistant to at least isoniazid and rifampicin, the two most effective front-line anti-tubercular drugs [1].
Rapid detection of drug-resistant TB and implementation of an effective system to monitor its spread are a great challenge for any national TB program (NTP) [2]. Ideally, to set effective control measures aimed at reducing the spread of drug-resistant TB, NTPs must be guided by a clear picture of TB transmission dynamics and epidemiology, particularly in those regions where drug resistance is prevalent. Molecular fingerprinting of M. tuberculosis could address this challenge but still awaits a versatile, medium- to high-throughput, and cost-effective typing approach that reflects the true transmission picture.
IS6110 restriction fragment length polymorphism (IS6110 RFLP), a typing approach based on the polymorphism generated by the insertion sequence IS6110, has proved highly discriminative and has served as the gold standard for the detection and surveillance of outbreaks, notably those involving MDR-TB strains, for many years [3][4][5]. However, as a hybridization-based approach, IS6110 RFLP is labor-intensive, time-consuming, refractory to automation, and hence not amenable to high-throughput analyses. The method also suffers from the major drawback of not being suitable for data banking and inter-laboratory comparisons [6].
Nowadays, TB molecular surveillance systems rely mainly on multiple interspersed repetitive unit-variable number tandem repeats (MIRU-VNTR) typing for tracing outbreaks and ongoing transmission. Unlike IS6110 RFLP, MIRU-VNTR is a PCR-based genotyping method whose results are expressed in a digital code, enabling their portability and the creation of databases [7]. In its standard 24-locus format, the discriminatory power of MIRU-VNTR was shown to be equal to that of IS6110 RFLP, except in settings where genetically closely related strains predominate [8]. Compared to whole genome sequencing (WGS), virtually the most discriminatory typing approach, 24-loci MIRU-VNTR was shown to overestimate recent TB transmission, thus emphasizing the need for an accessible, cost-effective alternative [9,10].
Towards this end, we reconsidered IS6110-5'3'FP, a typing approach that we have previously developed [11], and which we believe could represent a valuable alternative to 24-loci MIRU-VNTR. IS6110-5'3'FP relies on a single PCR amplification reaction, coupled to simultaneous differential labeling of 5' and 3' IS6110 polymorphic ends, and their automated fractionation on a capillary sequencer. IS6110-5'3'FP addresses all the drawbacks of IS6110 RFLP, while offering increased discriminatory power and flexibility. Indeed, it efficiently and accurately resolves both 5' and 3' polymorphic ends on a capillary sequencer, thus enabling interlaboratory reproducibility, results portability, and hence establishment of databases.
Here we introduced major amendments and refinements to the original IS6110-5'3'FP protocol and demonstrated its higher discriminatory power compared to 24-loci MIRU-VNTR. Aside from relying on a single PCR reaction, the overall IS6110-5'3'FP protocol was rendered versatile enough to make it a viable alternative to 24-loci MIRU-VNTR with the same benefits in terms of data sharing.

Ethics statement
In this study, no interventions were performed. Only fully anonymized data were processed, and hence no further ethical clearance was required.

Bacterial strains
The performance of IS6110-5'3'FP was assessed using a strain collection of M. tuberculosis clinical isolates of the Haarlem, LAM, and Beijing genotypes. Those isolates whose IS6110 RFLP banding pattern is known (some LAM and all Beijing strains) were used to evaluate the ability of IS6110-5'3'FP to detect almost all IS6110 polymorphic fragments. Because IS6110 RFLP targets only the 3' polymorphism, the number of peaks expected to be detected by IS6110-5'3'FP should be at least twice the number of IS6110 RFLP bands (5' and 3' polymorphic ends). The reproducibility assay was performed with a well-characterized IS6110 RFLP 11-banded clinical strain [12,13], referred to herein as the laboratory reference strain, which has been previously used in the optimization steps of the original IS6110-5'3'FP protocol [11].
Characteristics of the whole strain collection, including the year of isolation, origin, and genotypic data, are provided in S1 Table.

IS6110-5'3'FP modified protocol
The original protocol of IS6110-5'3'FP and its amended version are depicted in Fig 1. According to the new protocol, mycobacterial genomic DNA was extracted from a single colony growing on Löwenstein-Jensen medium. Genomic DNA was prepared according to standard recommendations [14]. Next, DNA was fully digested with BstUI, a blunt-end restriction enzyme, which cuts several times within the IS6110 insertion sequence, while leaving intact the 5' and 3' ends, and 52 794 times in the chromosomal DNA of the M. tuberculosis reference strain, H37Rv (NC_000962.3). BstUI-restricted DNA was ligated into EcoRV-linearized and dephosphorylated plasmid pBS SK+ (Stratagene). Typically, 150 ng of restricted DNA was combined with 50 ng of linearized plasmid vector in a 10-μL ligation reaction. All DNA manipulations, including endonuclease restriction and ligation into plasmids, were performed according to standard protocols [15]. Restriction endonucleases and other enzymes were used as recommended by the supplier (Amersham Biosciences).
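The digestion step can be illustrated in silico. The following is a minimal sketch, not part of the published protocol: it locates recognition sites for BstUI (CG^CG) and, for comparison, HincII (GTY^RAC), the enzyme of the original protocol, in a DNA string and returns the fragment lengths of a complete digest. The function names and the toy sequence are hypothetical, and the sequence is treated as linear, whereas the M. tuberculosis chromosome is circular.

```python
import re

# Recognition sites (IUPAC codes expanded to regex) and blunt-cut offsets.
ENZYMES = {
    "BstUI": ("CGCG", 2),           # CG^CG
    "HincII": ("GT[CT][AG]AC", 3),  # GTY^RAC
}

def cut_positions(seq, enzyme):
    """Return 0-based cut coordinates of `enzyme` in a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    # The lookahead also reports overlapping sites, e.g. the two in CGCGCG.
    return [m.start() + offset
            for m in re.finditer(f"(?=({pattern}))", seq.upper())]

def fragment_lengths(seq, enzyme):
    """Lengths of the fragments produced by a complete digest."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Toy sequence for illustration only (not a real M. tuberculosis region).
toy = "ATCGCGTTGTCGACAACGCGCGTA"
print(fragment_lengths(toy, "BstUI"))
print(fragment_lengths(toy, "HincII"))
```

A real analysis would read the H37Rv sequence (NC_000962.3) from a FASTA file and pass it to the same functions.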
To simultaneously target IS6110 3' and 5' polymorphic ends, we performed a single PCR reaction involving the T7 primer and the two outward IS6110-specific primers IS1 (5'GGCTGAGGTCTCAGATCAG 3') and IS2 (5'ACCCCATCCTTTCCAAGAAC 3') [16], fluorescently labeled with FAM and ATTO dyes, respectively. Because cloning is non-directional (blunt-end ligation), effective PCR amplification of any given IS6110 polymorphic fragment takes place only when it is inserted in the right orientation, which occurs about as frequently as the opposite orientation. Therefore, the use of a single primer in the plasmid (T7 primer, herein) ensures efficient amplification of all polymorphic fragments. Typically, 2 μL of the whole ligation reaction was used as template in the PCR amplification. The reaction mix consisted of 1× Taq DNA Polymerase, recombinant buffer (Invitrogen) (4 mM Tris-HCl (pH 8.4), 10 mM KCl), 3 mM MgCl2 (Invitrogen), a 1 μM concentration of each primer (Amersham Biosciences), a 50 μM concentration of each deoxynucleotide triphosphate, and 1 U of Taq DNA Polymerase, recombinant. The reaction was thermal-cycled once at 94˚C for 5 min, then 35 times at 94˚C for 1 min, 55˚C for 1 min, and 72˚C for 1 min, and finally once at 72˚C for 10 min. The amplification reaction was performed in a Perkin Elmer GeneAmp PCR system 9700 (Applied Biosystems Inc., CA). As a negative control, we performed a PCR reaction using as template DNA the product of a pBS SK+ plasmid auto-ligation reaction.
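As a quick check of how much of the workflow the amplification occupies, the cycling program above can be written down as data and its total hold time summed. This is only a sketch: the variable and function names are hypothetical, and ramp times between temperatures, which depend on the instrument, are not counted.

```python
# Each step is (temperature in degrees C, hold time in minutes), as in the text.
INITIAL = [(94, 5)]
CYCLE = [(94, 1), (55, 1), (72, 1)]
FINAL = [(72, 10)]
N_CYCLES = 35

def hold_minutes(initial, cycle, n_cycles, final):
    """Total programmed hold time, excluding temperature ramping."""
    def step_sum(steps):
        return sum(minutes for _temp, minutes in steps)
    return step_sum(initial) + n_cycles * step_sum(cycle) + step_sum(final)

total = hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL)
print(total)  # 120
```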

Automated fragment size fractionation of IS6110-5'3'FP-generated products
A 0.5-μL volume of IS6110-5'3'FP PCR product was mixed with 0.5 μL of GeneScan 600 LIZ and 9 μL of Hi-Di Formamide. The mix was denatured for 2 minutes at 95˚C. Automated fragment analysis was performed using GeneScan 600 LIZ as a DNA size marker on an ABI PRISM 3100 capillary DNA sequencer (Applied Biosystems Inc., CA, USA). The electrophoresis was done using POP-7 Polymer and a 50-cm capillary. Electrophoresis parameters were set according to the 3130_50cm_POP-7_GS600LIZ run module (Applied Biosystems Inc., CA, USA). In each run, an equivalent volume of the auto-ligation reaction was included in order to automatically subtract nonspecific fluorescent peaks (S1 Fig).

Reproducibility study
To assess the reproducibility of IS6110-5'3'FP, PCR amplifications were performed on six independent ligation products using the genomic DNA isolated from the 11-banded M. tuberculosis laboratory reference strain. PCR products were then used in separate capillary runs to test for the reproducibility of the signals in terms of both size and intensity.

Data analysis
Chromatographs showing FAM- and ATTO 532-fluorescing fragments were imported into GeneMapper software V5 (Applied Biosystems). Fluorescent peak data, which were blinded with regard to IS6110 copy number, were analyzed using the AFLP default analysis method and the advanced mode, with the starting point of electrophoresis set at data point 3250. Only fluorescent peaks with RFU > 500 were considered. Background signals from the auto-ligation reaction were automatically subtracted from the chromatographs of typed strains, a new modification that simplifies data analysis, since in the original protocol checkerboard dilution testing was performed to determine the threshold for assigning a specific peak. The size of fragments was determined using GeneScan 600 LIZ version 2 (Applied Biosystems) with a sizing quality higher than 0.95, and verified using the plot determined by the software. GeneMapper also recorded the fragment data in a binary format in Excel files, which were exported into BioNumerics v6.1, visualized as virtual electrophoresis gels, and analyzed. Two IS6110-5'3'FP profiles were considered to be different if they had at least one peak difference. Two peaks with a size difference less than 1 bp were considered identical.

Fig 1. [...] [11]. (B) The new protocol of IS6110-5'3'FP. In this new protocol version, aside from using the frequently cutting BstUI enzyme instead of HincII, there is no need for plasmid library amplification in E. coli, a modification that considerably shortens the method turnaround. Moreover, amplification in E. coli could result in the loss of some IS6110-containing plasmids, most likely because of clone instability. Therefore, omission of this step increases the sensitivity of the method. https://doi.org/10.1371/journal.pone.0197913.g001

Automated, highly discriminatory, IS6110-based typing of M. tuberculosis PLOS ONE | https://doi.org/10.1371/journal.pone.0197913 June 1, 2018
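The peak-handling rules stated in this section (RFU threshold of 500, subtraction of auto-ligation background, 1-bp tolerance for identical peaks, and at least one peak difference for distinct profiles) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the GeneMapper algorithm: the text does not specify how the software performs background subtraction, so the sketch simply drops sample peaks that have a same-dye counterpart within tolerance in the control. All names and peak values are hypothetical.

```python
from typing import List, NamedTuple

class Peak(NamedTuple):
    dye: str        # "FAM" or "ATTO532"
    size_bp: float
    rfu: float

RFU_MIN = 500.0  # peaks must exceed this height (threshold from the text)
TOL_BP = 1.0     # peaks closer than this are treated as identical (from the text)

def same_peak(a: Peak, b: Peak) -> bool:
    """Same dye and a size difference below the tolerance."""
    return a.dye == b.dye and abs(a.size_bp - b.size_bp) < TOL_BP

def clean(sample: List[Peak], control: List[Peak]) -> List[Peak]:
    """Keep peaks above RFU_MIN that have no counterpart in the control run."""
    return [p for p in sample
            if p.rfu > RFU_MIN and not any(same_peak(p, c) for c in control)]

def profiles_match(a: List[Peak], b: List[Peak]) -> bool:
    """True when every peak of each profile has a counterpart in the other."""
    return (all(any(same_peak(p, q) for q in b) for p in a)
            and all(any(same_peak(q, p) for p in a) for q in b))

def to_binary(profile: List[Peak], reference: List[Peak]) -> List[int]:
    """Presence/absence vector of a profile over a list of reference peaks."""
    return [int(any(same_peak(r, p) for p in profile)) for r in reference]

# Hypothetical data for illustration only.
control = [Peak("FAM", 120.2, 900)]
strain_a = clean([Peak("FAM", 120.4, 1500), Peak("FAM", 197.1, 2200),
                  Peak("ATTO532", 197.3, 1800), Peak("FAM", 250.0, 300)], control)
strain_b = clean([Peak("FAM", 197.6, 2000), Peak("ATTO532", 196.9, 1700)], control)
strain_c = clean([Peak("FAM", 197.1, 2100)], control)
reference = [Peak("FAM", 197.0, 0), Peak("ATTO532", 197.0, 0), Peak("FAM", 310.0, 0)]
```

Because peaks are compared per dye, a FAM-labeled and an ATTO 532-labeled fragment of the same size remain distinct, which mirrors the differential labeling of the two IS6110 ends.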

Calculation of the discrimination power
The Hunter-Gaston discriminatory index (HGDI) [17] was used to calculate the discriminatory power of IS6110-5'3'FP and 24-loci MIRU-VNTR.
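For reference, the index is defined as D = 1 - [1 / (N(N - 1))] * sum over types j of n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates of type j. The short sketch below computes it from a list of type labels; the function name and the example labels are hypothetical and are not data from this study.

```python
from collections import Counter

def hgdi(types):
    """Hunter-Gaston discriminatory index for a list of hashable type labels."""
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("HGDI needs at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))

# Hypothetical example: six isolates falling into three types.
example = ["A", "A", "B", "C", "C", "C"]
print(round(hgdi(example), 4))
```

A value of 1 means that every isolate has a unique type and a value of 0 means that all isolates share one type. Binary peak profiles can be passed as tuples so that they are hashable.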

Clustering analysis
IS6110-5'3'FP and 24-loci MIRU-VNTR data were analyzed with the BioNumerics software 6.6 (Applied Maths, East Flanders, BE) in order to construct the similarity matrices and the dendrogram (unweighted pair-grouping method analysis algorithm-UPGMA).
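To make the clustering step concrete, the sketch below builds a similarity matrix from binary peak profiles and clusters them by average linkage (UPGMA). It is not the BioNumerics implementation: the Dice coefficient is an assumption, since the text does not name the similarity coefficient used, and the function names and profiles are hypothetical.

```python
from itertools import combinations

def dice(a, b):
    """Dice similarity between two equal-length 0/1 peak vectors."""
    shared = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * shared / total

def upgma(profiles):
    """Cluster named profiles; return a nested (left, right, distance) tree."""
    names = list(profiles)
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {
        frozenset((i, j)): 1.0 - dice(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        merge_dist = dist.pop(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            # Average linkage: size-weighted mean distance to the merged pair.
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, round(merge_dist, 3)), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree

# Hypothetical binary profiles (1 = peak present).
profiles = {
    "s1": [1, 1, 1, 0, 0, 0],
    "s2": [1, 1, 1, 0, 0, 0],
    "s3": [1, 1, 0, 1, 0, 0],
    "s4": [0, 0, 0, 0, 1, 1],
}
tree = upgma(profiles)
print(tree)
```

Isolates joined at distance 0 (here s1 and s2) have identical profiles and would form a cluster under the one-peak-difference rule.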

Cost estimation of IS6110-5'3'FP
We carried out an estimation of the total cost of IS6110-5'3'FP, from DNA extraction to data acquisition, by taking into account direct costs (e.g., reagents and consumables) and depreciation costs (15%) for the equipment that was directly related to the method and purchased within the last 5 years. This included mainly the automated capillary sequencer used for polymorphic fragment fractionation. In our estimation, labor costs and maintenance fees were included.

Statistical analyses
Means and standard deviation values were calculated using the data analysis package included within Microsoft Excel 2010 software (Microsoft Corporation, Redmond, WA).
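The same summary statistics can be reproduced outside Excel. The sketch below uses Python's standard library to compute the mean and the sample standard deviation of fragment sizes across repeated runs; whether the authors used the sample or the population formula is not stated, so the sample formula is an assumption. The peak label and size values are hypothetical.

```python
from statistics import mean, stdev

def peak_stats(sizes_by_peak):
    """Map each peak label to (mean size, sample SD) across runs, in bp."""
    return {peak: (mean(sizes), stdev(sizes))
            for peak, sizes in sizes_by_peak.items()}

# Hypothetical sizes (bp) of one peak measured in six independent runs.
runs = {"FAM_197": [197.0, 197.2, 196.9, 197.4, 197.1, 197.0]}
stats = peak_stats(runs)
avg, sd = stats["FAM_197"]
print(round(avg, 2), round(sd, 2))
```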

Results and discussion
The advent of WGS of M. tuberculosis and its application in molecular epidemiology has revolutionized our understanding of TB transmission dynamics [8,18]. The ability to distinguish M. tuberculosis isolates differing by a single nucleotide confers to WGS the highest possible level of discrimination and makes it the most valuable tool to investigate tuberculosis outbreaks, uncover unknown transmission events, disclose super spreaders, and refute or confirm epidemiologically suspected transmission links [10,19-22]. WGS has also revealed the limits of 24-loci MIRU-VNTR, the current gold-standard method, in TB transmission analyses. Indeed, WGS showed that 24-loci MIRU-VNTR overestimated recent transmission events [9].
Ideally, WGS must be integrated as the typing method of choice into TB surveillance [19], but much has yet to be done to bring it into routine use, notably with regard to data analysis, cost-effectiveness, versatility, etc. It may take several years or decades to tackle these obstacles, thus stressing the need for an alternative modality to 24-loci MIRU-VNTR with higher discriminatory power that is cost-effective, amenable to routine high-throughput use, and less technically demanding than WGS. For this purpose, we reprised a previously described IS6110-based automated typing modality, IS6110-5'3'FP, and significantly improved its protocol to meet the above requirements. IS6110-5'3'FP was rendered versatile enough to justify its implementation as a medium- to high-throughput alternative to 24-loci MIRU-VNTR, with a more reliable estimation of recent TB transmission rates.
The most critical amendment introduced to the original IS6110-5'3'FP protocol consisted in the generation of smaller-sized polymorphic fragments in order to increase the efficiency of the cloning and PCR amplification steps (Fig 1). Our objective was to detect and resolve the maximum number of IS6110 5' and 3' polymorphic fragments, expected to be at least twice the number of IS6110 RFLP bands. For this purpose, we used the restriction enzyme BstUI, which cuts the chromosomal DNA much more frequently than does HincII (52 794 times vs 7 445 times), thus yielding shorter DNA fragments. As shown in Fig 2, the mean size of IS6110-5'3'FP-generated fragments for the 11-banded outbreak strain was 197.13 bp with BstUI compared to 295.17 bp with HincII. Consequently, the number of polymorphic peaks generated by IS6110-5'3'FP was considerably higher with BstUI (N = 23) than with HincII (N = 16) (Fig 3). Strikingly, in all tested strains, irrespective of their genotype, the number of IS6110-5'3'FP polymorphic peaks was at least twice the number of IS6110 RFLP bands (Table 1; S1 Table for peak details). This result indicates that polymorphisms on both sides of IS6110 copies were efficiently detected using this new protocol.
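The arithmetic behind this paragraph can be made explicit with a small sketch. Two assumptions are made here and are not statements from the text: the genome length of H37Rv (4 411 532 bp for NC_000962.3) is supplied from the reference sequence record, and the peak counts reported for Fig 3 are taken to refer to the 11-banded strain. Function names are hypothetical.

```python
H37RV_LENGTH_BP = 4_411_532  # length of NC_000962.3 (assumption, not from the text)

def mean_spacing(genome_bp, n_cuts):
    """Average distance between cut sites on a circular chromosome."""
    return genome_bp / n_cuts

def detects_both_ends(n_rflp_bands, n_fp_peaks):
    """Each IS6110 copy is expected to yield one 5' and one 3' peak."""
    return n_fp_peaks >= 2 * n_rflp_bands

bstui_spacing = mean_spacing(H37RV_LENGTH_BP, 52_794)   # cut counts from the text
hincii_spacing = mean_spacing(H37RV_LENGTH_BP, 7_445)
print(round(bstui_spacing, 1), round(hincii_spacing, 1))

print(detects_both_ends(11, 23))  # BstUI protocol
print(detects_both_ends(11, 16))  # HincII protocol
```

The genome-wide spacing describes all chromosomal fragments, so it is not expected to equal the mean size of the IS6110-flanking fragments reported in Fig 2, which also include part of the insertion sequence and vector-derived sequence.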
Aside from using a frequently cutting enzyme, the use of more stable fluorophores could have significantly contributed to the increased sensitivity of IS6110-5'3'FP in terms of fragment detection. With the original version, we noticed that 5'-end fragments were much less frequently detected than 3'-end fragments (34.3% vs 65.7%), a finding that was attributed to the relative instability of the JOE dye [11]. Here we replaced JOE with ATTO 532, a fluorophore with higher stability and better spectral properties. In doing so, the detection frequency of 5' polymorphic ends increased to 47.3%. Furthermore, the ability of IS6110-5'3'FP to detect almost all polymorphic fragments also stems from the high resolving power of capillary fractionation, as well as the efficient detection of 5' and 3' polymorphic fragments of identical size owing to their differential labeling. The finding that in 13% of tested strains, IS6110-5'3'FP polymorphic peaks exceeded twice the number of IS6110 RFLP bands (Table 1), was attributed to the lower resolution power of the latter technique, which relies on DNA fragment separation in agarose gels.
More importantly, detection of both IS6110 5' and 3' polymorphisms makes IS6110-5'3'FP more reliable than IS6110 RFLP in studying the clonal relationship of M. tuberculosis isolates. Since matching IS6110 RFLP fingerprint patterns generated by the IS6110 3'-end probe (used in the standard IS6110 RFLP protocol) do not always indicate the same IS6110 insertional events, it has been suggested that a combination of IS6110 3' and 5' probes should increase the reliability of IS6110 fingerprinting in detecting clonally related strains [28]. The reliability of IS6110-5'3'FP in identifying clonally related strains is further enhanced by the accurate resolution of polymorphic fragments in a capillary sequencer, as well as the efficient differentiation of 5' and 3' polymorphic fragments of identical size. Therefore, not only is IS6110-5'3'FP highly discriminatory, it is also likely to be epidemiologically more informative, a hallmark which further reinforces its use in population-based studies.
The IS6110-5'3'FP protocol was considerably improved in terms of technical flexibility and workflow, offering several advantages over existing gold-standard methods. Firstly, the overall turnaround time of the technique was reduced to 4 hours, as we omitted the amplification step of the ligation product in E. coli (Fig 1). In doing so, not only did we significantly reduce the turnaround time, but we also improved the ability of the method to detect an increased number of polymorphic fragments. For more flexibility, the plasmid ligation step can be readily achieved using ready-to-use blunt-end cloning kits. In comparison to other ligation-mediated PCR approaches, cloning in a pre-linearized plasmid vector is much simpler and more efficient than ligation to adapters or linkers [29][30][31].
Unlike 24-loci MIRU-VNTR, which necessitates several PCR reactions and optimizations (for particular loci), IS6110-5'3'FP relies on a single PCR reaction, a characteristic which makes it more amenable to high-throughput analyses.
The use of a capillary sequencer to fractionate fluorescently labeled 5' and 3' polymorphic fragments confers on IS6110-5'3'FP the same potential for data portability as 24-loci MIRU-VNTR, yet sizing is likely to be much more accurate for the former approach. Indeed, the small size of BstUI-generated polymorphic fragments (<500 bp) avoids sizing inaccuracies such as those encountered with long PCR fragments (>750 bp) obtained with some MIRU-VNTR loci [32][33][34][35]. Furthermore, small-sized DNA fragments are much less vulnerable to run-to-run sizing variation in capillary electrophoresis systems. In this respect, it is worth mentioning that IS6110-5'3'FP proved highly reproducible, as the run-to-run standard deviation of fragment size estimates did not exceed 0.67 bp (S2 Table). Since this reproducibility assay was performed using IS6110-5'3'FP products of six independent reactions, each with a new DNA preparation, we anticipate good inter-laboratory reproducibility of IS6110-5'3'FP. Strikingly, IS6110-5'3'FP proved much more cost-effective than 24-loci MIRU-VNTR (12.56 vs 53.6 USD per strain) (Table 3) [35]. This low cost compared to 24-loci MIRU-VNTR stems mainly from the fact that IS6110-5'3'FP relies on a single PCR reaction. At best, 24-loci MIRU-VNTR can be performed in eight triplex PCR reactions, but it remains costly, particularly when commercially available kits are used to ensure reliable PCR amplifications. Furthermore, while IS6110-5'3'FP can genotype as many as 95 M. tuberculosis clinical strains in a single capillary electrophoresis run (plus one auto-ligation control in a 96-well microplate), 24-loci MIRU-VNTR can process only 4 (24 simplex PCR reactions) to 12 strains (8 triplex PCR reactions).
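The throughput and cost figures quoted above follow from simple plate arithmetic, sketched below. The numbers (96 wells, reactions per strain, one control well, and costs per strain) are those given in the text; the function and variable names are hypothetical.

```python
WELLS = 96  # wells per capillary electrophoresis run, as in the text

def strains_per_run(reactions_per_strain, controls=0):
    """Number of strains that fit on one plate, after reserving control wells."""
    return (WELLS - controls) // reactions_per_strain

fp_strains = strains_per_run(1, controls=1)  # IS6110-5'3'FP, one auto-ligation control
miru_simplex = strains_per_run(24)           # 24 simplex PCR reactions per strain
miru_triplex = strains_per_run(8)            # 8 triplex PCR reactions per strain
print(fp_strains, miru_simplex, miru_triplex)  # 95 4 12

cost_ratio = 53.6 / 12.56  # USD per strain, 24-loci MIRU-VNTR vs IS6110-5'3'FP
print(round(cost_ratio, 2))
```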
With regard to WGS, and considering the current costs of ~60 USD per genome (for a batch of at least 300 genomes) (Table 3), IS6110-5'3'FP is clearly much less expensive. Yet, despite the continuously declining costs of WGS, the use of IS6110-5'3'FP as an automated solution for integrated TB molecular surveillance could still be justified, since analysis of fluorescent fragment length is much easier and much less demanding in terms of bioinformatics skills and IT infrastructure.

Conclusions
Numerous previous studies have highlighted the benefit of complementing the classical contact tracing strategy with molecular typing data to get a clear picture of TB transmission and, hence, effective surveillance of outbreaks, particularly those involving MDR-TB strains. One of the most limiting factors in achieving such a goal is the absence of a typing method that reliably estimates TB transmission, while offering sufficient technical flexibility to be routinely used in large-scale population-based studies. Aside from being highly cost-effective, the automated IS6110-based typing protocol developed herein showed an unprecedented discriminatory power and versatility, and could thus represent a step forward for the effective implementation of integrated TB molecular surveillance. Indeed, the new IS6110-5'3'FP protocol, which is executable in a few hours, could be fully automated and could even gain more in terms of flexibility with the use of technically simple and cost-effective new capillary electrophoresis technologies, such as QIAxcel, Bioanalyzer, etc. Importantly, given the small size of IS6110-5'3'FP-generated polymorphic fragments, the method does not suffer from sizing inaccuracies such as those reported for some MIRU-VNTR loci. With the optimized protocol described in this study, IS6110-5'3'FP proved robust enough to undergo performance and inter-laboratory reproducibility assessment.