Optimizing and accelerating the assignation of lineages in Mycobacterium tuberculosis using novel alternative single-tube assays

The assignation of lineages in Mycobacterium tuberculosis (MTB) provides valuable information for evolutionary and phylogeographic studies and makes for more accurate knowledge of the distribution of this pathogen worldwide. Differences in virulence have also been found for certain lineages. MTB isolates were initially assigned to lineages based on data obtained from genotyping techniques, such as spoligotyping or MIRU-VNTR analysis, some of which are more suitable for molecular epidemiology studies. However, since these methods are subject to a certain degree of homoplasy, other criteria have been chosen to assign lineages. These are based on targeting robust and specific SNPs for each lineage. Here, we propose two newly designed multiplex targeting methods—both of which are single-tube tests—to optimize the assignation of the six main lineages in MTB. The first method is based on ASO-PCR and offers an inexpensive and easy-to-implement assay for laboratories with limited resources. The other, which is based on SNaPshot, enables more refined standardized assignation of lineages for laboratories with better resources. Both methods performed well when assigning lineages from cultured isolates from a control panel, a test panel, and a problem panel from an unrelated population, Mexico, which included isolates in which standard genotyping was not able to classify lineages. Both tests were also able to assign lineages from stored isolates, without the need for subculture or purification of DNA, and even directly from clinical specimens with a medium-high bacilli burden. Our assays could broaden the contexts where information on lineages can be acquired, thus enabling us to quickly update data from retrospective collections and to merge data with those obtained at the time of diagnosis of a new TB case.


Introduction
Genotyping and genomic analysis have enabled us to differentiate between Mycobacterium tuberculosis (MTB) strains with varying degrees of discrimination.
The magnitude of discrimination differs according to the objective of the analysis and should be greater when attempting to accurately define recent transmission clusters. In this context, population-based molecular epidemiology was initially based on IS6110 RFLP analysis [1], which was later substituted by MIRU-VNTR analysis [2]. The recent expansion of whole genome sequencing (WGS) means that recent transmission clusters are now being defined using genomic epidemiology, which is based on the quantification of the number of differential single-nucleotide polymorphisms (SNPs) between circulating strains [3].
However, this shift from population-based epidemiology to global epidemiology or phylogeographic analysis requires discriminatory power to be lower, since the objective of the approach is to identify major lineages families instead of clones/strains. In this context, other less discriminatory genotyping strategies, such as spoligotyping, have been extensively used [4] and various platforms and global databases compile extensive collections of isolates with the associated lineage or enable automatic assignation of lineages based on spoligotype or VNTR patterns [5][6][7][8].
Current genotyping approaches tend towards convergent evolution, which could lead to incorrect assignation of lineages [9,10]. On the other hand, using SNPs to classify lineages has proven to be the most robust approach, because SNPs show a low degree of homoplasy [10]. The classification of MTB complex into lineages based on either long sequence polymorphisms or SNPs has been modified over time [11]. However, the approaches used have led to the classification of MTB and M. africanum into six lineages, which have recently grown with the recent description of lineage 7 [12,13]. The distribution of lineages varies throughout the world: some, such as lineage 4, are widely distributed, whereas others are restricted to a specific geographic area (eg, Ethiopia for lineage 7). In addition, differences in virulence associated with diminished immune responses, more severe disease, and enhanced transmission have been described for certain lineages [12]. Therefore, optimal classification of lineages will help us to understand the mechanisms of host-pathogen interactions in TB.
The classification of lineages 1-6 can be supported using a set of six robust markers, which have proven to be lineage-specific in a wide sample of WGS data representative of the global diversity of MTB complex [14]. This set paves the way for the design of novel assays to optimize lineage assignation in MTB complex. Our objective was to design novel assays that could i) provide a simplified response (simultaneous interrogation for all SNPs in single-tube tests), ii) adapt to settings with different resources, and iii) be suitable not only for standard application on cultured isolates, but also for fast direct analysis on retrospective collections or on clinical specimens. We propose to bring phylogenetic analysis closer to the time of diagnosis and to enable classification of lineages in low-resource settings where MTB is not routinely cultured.

Strain selection/sputum samples
The control panel included six strains, each of which was representative of lineages 1-6. The strains corresponded to clinical isolates whose lineage had been assigned by identification of the corresponding marker lineage SNPs [14]  . We examined as wide a distribution of nationalities as possible to increase the variability of the lineages. The problem panel included purified DNA from a collection of 400 isolates (January 2014-February 2016) from Veracruz, Mexico, which were obtained within a molecular epidemiology study (manuscript in preparation). Veracruz has a population of 7 million inhabitants, and around 10% of all cases of TB in Mexico are diagnosed in this state. We randomly selected 60 isolates from the collection. (27 in which lineage assignation based on spoligotyping [4] and MIRU-VNTR 24 -loci [15] was not possible and 33 isolates where the lineage was assigned successfully).
We also selected 19 stain-positive respiratory specimens for analysis. Our selection was taken from the remnants for the decontaminated sputa of the specimens received in the diagnostic laboratory that had been both stain-and culture-positive for MTB.

Lineage assignation
Lineages were preassigned based on Spoligotype and/or MIRU-VNTR data following the instructions from TBminer platform, which provides the assignations according to SITVIT-WEB, MIRU-VNTRPlus, Borile_AffinityPropagation and TBlineage classifications, and offers a final consensus assignation [8].
In order to assign lineages based on lineage marker SNPs, we used the six SNPs reported by Stucki et al ( [14], Table 1) as targets in our assays. These SNPs map in essential genes and are synonymous. Consequently, their stability was maximized, since they are highly unlikely to be involved in selective pressure processes.

Allele-specific oligonucleotide multiplex PCR (ASO-PCR) assay
The final conditions for the ASO-PCR assay (multiplex format) were as follows: 1X PCR Gold buffer (Applied Biosystems [AB], Foster City, CA, USA); 1.8 mM MgCl 2 ; 0.2 μM primers each for lineage 2, lineage 3, lineage 4, and lineage 6; and 0.3 μM primers each for lineage 1 and lineage 5 ( Table 2); 200 μM dNTPs; 0.5 μl DMSO; and 0.4 μl AmpliTaq Gold (Applied Biosystems, Foster City, California, USA) in a final reaction volume of 50 μl. The PCR was run as follows: 95˚C for 10 minutes followed by 29 cycles (95˚C for 1 minute, 62˚C for 1 minute, and 72˚C for 1 minute, and a final extension step at 72˚C for 10 minutes. The GeneAmp PCR system 9700 thermal cycler was used. Amplification products were sized using standard agarose electrophoresis.

Multiplex SNaPshot assay
The assay consisted of a multiplex PCR to obtain the six amplicons, including the region where each lineage marker SNP mapped (Table 3). Each amplicon was subsequently interrogated by a selective extension reaction from a primer whose 3 0 end mapped only one nucleotide before the coordinate for the marker SNP (Tables 4 and 1). Since the marker SNPs are biallelic, one of each two alternative dideoxynucleotide terminators is incorporated in the extension reaction. Each alternative ddNTP is labelled with a different fluorophore, which makes it possible to indirectly identify the allele present in the target, depending on the fluorophore in the extension reaction. The multiplex PCR was performed in a final volume of 50 μl constituted by 1X PCR Gold buffer (Applied Biosystems [AB], Foster City, CA, USA), 2 mM MgCl 2 , 200 μM dNTPs,   (Table 4). Selective extension was performed by 25 cycles of: denaturation at 96˚C for 10 seconds, annealing at 57˚C for 5 seconds, and extension at 60˚C for 30 seconds. Unincorporated ddNTPs were removed by addition of 1 U of alkaline phosphatase (USB Shrimp Alkaline Phosphatase) to the mixtures and incubation at 37˚C for 1 hour.
Ten microliters of the final product was sized and analyzed by capillary electrophoresis in a 3130xl Genetic Analyzer (polymer POP-7, 1200 standard, LIZ SS; Applied Biosystems, Foster City, CA, USA). Alleles were called using GeneMapper, version 4.0.

Sanger sequencing
Lineages classified by applying the tests in the evaluation were finally confirmed by identification of the corresponding SNPs using Sanger sequencing. Confirmation was not possible with the analysis performed directly on sputa owing to material restrictions and, therefore we confirmed the lineage on the corresponding cultured isolates.

Optimization and evaluation of the ASO-PCR and SNaPshot assays on control strains
The first step in the optimization process consisted of ensuring the following: i) correct amplification for the six regions targeted in the multiplex reactions; ii) appropriate size distribution for the amplicons/products; and iii) accurate discrimination of results between the six lineages interrogated. We went through these steps using a control panel including a clinical isolate representative of each of the six lineages analyzed (S1 Table). After a performing a pre-evaluation analysis involving screening for the best Mg 2+ and primer concentration and running a gradient of annealing temperatures, the optimal analysis conditions for the multiplex ASO-PCR and SNaPshot assays were set (see Methods).
For ASO-PCR, the intensity and sizing of the amplification products enabled them to be detected and separated appropriately using standard electrophoresis. The selective performance of the test was optimal: whenever the corresponding lineage-marker SNP interfered with a single PCR (3 0 ends of the selective primers were designed to coincide with the coordinate where the SNP mapped), the corresponding amplification was completely impaired, thus leading to five amplicon patterns, with a single amplicon missing from each of the lineages in the test panel (Fig 1).
In the case of SNaPshot, once optimized the multiplex PCR (see Methods) that rendered the six amplicons including each of the six SNPs to be interrogated, we designed progressively extended primers (using polyA tails, Table 4) for the selective extension reactions. The length of each primer indicated the lineage-specific SNP that we were interrogating (Table 4), whereas the fluorophore/color of the nucleotide included in each extension reaction allowed us to determine the lineage (Fig 2).

Evaluation on a test panel of cultured isolates
The next step was to evaluate both assays on a test blind panel including purified DNA from a collection of 40 cultured isolates. In 37 cases, the lineage had been previously assigned from the spoligotype and/or MIRU-VNTR data (S1 Table) (two to lineage 1, six to lineage 2, two to lineage 3, seventeen to lineage 4, nine to lineage 5, one to lineage 6, and three with no consensus). ASO-PCR enabled the correct assignation of lineages of all 37 isolates preasigned based on standard genotyping and SNaPshot in 36 (S1 Table). Both assays were able to assign lineage in the 3 isolates without a previous assignation. In all cases, the assignations were confirmed by identification of the lineage marker SNPs using DNA sequencing.

Performance of the assays on a problem panel
Once the two assays had been evaluated on test panels, we applied them to another sample from an unrelated population. We analyzed a collection of 60 isolates obtained in Mexico (S1 Table), in which a first attempt to assign lineages based on the combined analysis of their spoligotypes and 24-loci MIRU patterns had failed in 27 cases (45%). The remaining 33 isolates had been correctly assigned. The lineage was correctly assigned by both assays in 29 of the 33 isolates with a preassigned lineage (27 isolates to lineage 4, one to lineage 1 and one to lineage 2). In the remaining four isolates (preassigned as lineage 4), the presence of inaccurate patterns precluded assignation by one or both assays (one of them was assigned only by SNaPshot; S1 Table). 23 of the 27 isolates without a pre-assigned lineage were now successfully assigned by either ASO-PCR and SNaPshot (all isolates to lineage 4 but one to lineage 2). Four isolates remained unassigned after application of both tests owing to unspecific bands in the ASO-PCR pattern and lack of extension products in SNaPshot.

Fig 2. Patterns obtained in SNaPshot with the control panel.
Each peak corresponds to an extended product in the following order from left to right (lineages 6, 1, 3, 5, 4 and 2). The arrow indicates the peak interrogated for each of the lineages. For each peak, two different labelled ddNTPs can be included alternatively in the extension reaction depending on the allele present in the amplicon. Depending on the absence/presence of the marker SNP for each corresponding peak, the two alternative colors expected for each lineage, are as follows: Lineage 1: blue/green; lineage 2: black/red; lineage 3: green/blue; lineage 4: red/blue; lineage 5: blue/black; and lineage 6: blue/green. Based on these features, the colour pattern (from left to right) expected for each lineage would be: Lineage 1: blue, green, green, blue, red, black, Lineage 2: blue, blue, green, blue, red, red, Lineage 3: blue, blue, blue, blue, red, black, Lineage 4: blue, blue, green, blue, blue, black, Lineage 5: blue, blue, green, black, red, black and Lineage 6: green, blue, green, blue, red, black. https://doi.org/10.1371/journal.pone.0186956.g002

Evaluation using crude extracts
To evaluate whether the tests could be applied directly on stored MTB isolates for fast analysis of retrospective collections, we checked whether lineages could be assigned directly on the supernatant from boiled frozen stored isolates, that is, without subculture or DNA purification. Assignation was correct for a representative of each lineage when the test was applied directly (Fig 3A and 3B). Direct analysis on respiratory specimens The possibility of assigning a lineage at diagnosis of a new case, without waiting until culture is available, was evaluated by directly applying the assays on a selection of 19 stain-positive respiratory specimens. ASO-PCR enabled us to assign the lineage from 12 out of 19 sputa samples (Fig 4A) (one to lineage 1 and 11 to lineage 4), most of which had a high and medium bacterial load (3+ and 2+) and one a low bacterial load (1+). SNaPshot assigned the lineage in 14 sputa samples ( Fig  4B) (those assigned by ASO-PCR and, in addition, one lineage 2 in a 3+ specimen and another lineage 4 in a 1+ specimen). The isolates cultured from 9 of the 14 sputa with an assignation were available and the direct assignation on sputa was consistent with the lineage assignation based on MIRU-VNTR data from the corresponding isolates.

Detecting mixed infections
The last feature evaluated was whether the assays could identify mixed infections involving two different lineages. We applied the assays to laboratory mixes (50:50) comprising a lineage 4 isolate (the most prevalent in our context) and each of the other five remaining lineages under analysis. In contrast to the five-amplicon pattern expected when assaying a single lineage, ASO-PCR showed all six amplicons because of the overlap between the patterns expected for the two lineages involved (Fig 5A). Therefore, the observation of this singular six-band amplification pattern alerted us to the presence of a mixed infection with more than one lineage. However, given the design of our ASO-PCR assay, we were unable to determine the two specific lineages involved in the mixed infections. On the contrary, the rationale underlying the SNaPshot design enabled us to resolve the limitations of ASO-PCR. The same 50:50 mixes were then analyzed by SNaPshot, which succeeded in identifying not only mixed infections, but also the specific lineages according to the color patterns of the fluorophores included in the extension reactions (Fig 5B).

Discussion
The degree of discrimination to be applied when identifying a microorganism depends on the aim of the study. In MTB, the microorganism is characterized to species level in the diagnostic setting, even though this is the lowest degree of discrimination. At the other extreme, molecular epidemiology applied to track transmission clusters at the population level demands maximum discriminatory power.
Between these degrees of characterization, which enable diagnosis to be confirmed or transmission to be assessed, we find intermediate levels of discrimination that can be applied in specific circumstances. Such is the case of lineage assignation, which is useful for phylogenetic analysis and surveying the global distribution of MTB at the macro-population level. The consequences of genome diversity in MTB for virulence have been reported [17], with some lineages, such as lineage 2 and lineage 4, being more virulent than other, more geographically restricted lineages [12]. Therefore, the development of methods to optimize lineage classification in MTB would improve the analysis of host-pathogen interactions.
The most robust approach for assigning lineages is that based on identification of lineage marker SNPs, which can be identified directly using methods such as standard direct Sanger sequencing and even more advanced approaches such as WGS. However, adapting laboratory tests to indirectly assess the presence of this set of marker SNPs is more convenient. The various tests and algorithms that have been developed to optimize lineage classification target some of the SNPs previously used for phylogenetic purposes, other genetic elements [18], or a newly proposed set of SNPs [19]. To our knowledge, only one attempt has been made to develop tests targeting the current set of six robust SNPs [14]. The authors, who originally proposed the set of markers, designed a refined single-tube analysis. However, this approach was based on a Luminex device, which is still uncommon in the microbiology laboratory. The authors also designed a second test based on PCR that is well adapted to more widely available platforms (real-time PCR) it was based on singleplex assays, thus creating the need for six independent reactions to classify the lineage.
Current limitations to finding a wider variety of tests to classify lineages led us to attempt a new approach. We defined three strict requirements for developing two new tests. First, both had to have a single-tube reaction design to ensure simplicity and reduce costs. Second, each had to be adapted to fit laboratories with different resources. The multiplex ASO-PCR-based test is inexpensive and easy to implement, with no need for complex equipment, and is especially suitable for laboratories with basic resources. The SNaPshot analysis-based approach offers more accurate and semiautomatic assignation, albeit at a higher cost and requiring capillary electrophoresis. Finally, the new tests had to be able to respond to demands that were broader than the standard classification of cultured isolates, that is, for fast direct analysis of retrospective isolate collections without the need for subculture or direct application on clinical respiratory specimens.
The performance of the two assays was equivalent with cultured isolates, and both were able to correctly assign 100% of cases, thus ensuring their usefulness when applied systematically to prospectively characterize the lineages circulating in a population. In fact, we evaluated the tests on a challenging sample from Mexico in which a high percentage of cases could not be assigned by combining spoligotyping and MIRU-VNTR typing. The tests succeeded in assigning a lineage in most of those that were previously unassigned. The alternative standard approach to assign lineages based on MIRU-VNTR data would have required greater experimental efforts (i.e. 24 PCRs to obtain the MIRU-VNTR pattern) to have the assignation for all the patterns. We also proved that the assays could quickly assign lineages by direct analysis on the stored isolates, without the need for subculture or purifying DNA. We believe that the assays would provide a rapid and inexpensive means of obtaining data on the distribution of lineages in settings where this information is not updated.
Genotyping for diagnosis of MTB infection is undergoing a transformation. Culture to identify the species and subsequent subculture to determine the susceptibility pattern are being replaced by commercial molecular tests that make it possible to identify MTB and assess the presence of the most common resistance mutations directly on clinical specimens. In short, detailed characterization of the strain is possible when TB is diagnosed, that is, when acid-fast bacilli are observed in microscopy.
Following the same approach to in-depth characterization of the strain at diagnosis, we evaluated the possibility of assigning the lineage directly on respiratory specimens. Such an approach would prove useful when an unknown lineage has been identified as recently imported in a population and it is crucial to fast-track the emergence of secondary cases caused by this lineage. Direct assignation is frequently requested when a high-risk strain, such as those belonging to the Beijing sublineage (within the, East-Asian, lineage 2), which is frequently involved in severe outbreaks [20,21], is imported into a population. The two assays allowed us to assign lineages directly on clinical specimens, although they were restricted to specimens with medium-high bacterial content.
As a collateral analysis, we aim to evaluate the usefulness of our tests when mixed infections are suspected. Given the feature targeted-lineages-we are limited by the fact that the tests will be only be able to detect mixed infections when different lineages are involved. Both tests proved able to efficiently identify mixed infections from mixtures artificially generated in the laboratory. Given the specific design of our ASO-PCR, only SNaPshot would also be able to determine the two specific lineages involved in each mixed infection. This ability could be considered useful only in very unusual circumstances. However, in settings with a high disease burden, both mixed infections and those involving different lineages are common [22,23]. Our assays would add value in these settings.
In summary, we developed two tests to optimize the assignation of lineages in MTB. Each test is adapted to a different laboratory resource profile, and both tests increase the possibility of updating the composition of lineages in a population and performing fast prospective assignation, even directly on clinical specimens, thus expanding the information we can currently obtain when a new TB case is diagnosed.