Development of a Hierarchical Variable-Number Tandem Repeat Typing Scheme for Mycobacterium tuberculosis in China

Molecular typing based on variable-number tandem repeat (VNTR) analysis is a promising tool for identifying transmission of Mycobacterium tuberculosis. However, the currently proposed 15- and 24-locus VNTR sets (VNTR-15/24) have only limited resolution and contain too many loci for large-scale typing in high-burden countries. To develop an optimal typing scheme for China, we evaluated the resolution and robustness of 25 VNTR loci, using population-based collections of 1362 clinical isolates from six provinces across the country. The resolution of most loci varied considerably among regions. By calculating the average resolution of all possible combinations of 20 robust loci, we identified an optimal locus set with a minimum of 9 loci (VNTR-9) that achieved resolution comparable to that of the standard VNTR-15. VNTR-9 had consistently high resolution in all six regions, and it was highly concordant with VNTR-15 for defining both clustered and unique genotypes. Furthermore, VNTR-9 was phylogenetically informative for classifying lineages/sublineages of M. tuberculosis. Three hypervariable loci (HV-3), VNTR 3232, VNTR 3820 and VNTR 4120, proved important for further differentiating unrelated strains clustered by VNTR-9. We propose the optimized VNTR-9 as the first-line method and HV-3 as the second-line method for molecular typing of M. tuberculosis in China and surrounding countries. The development of hierarchical VNTR typing methods that achieve high resolution with a small number of loci could be suitable for molecular epidemiology studies in other high-burden countries.


Introduction
Tuberculosis (TB) remains a serious global public health issue, especially in the developing world. Over 95% of cases and deaths caused by TB occur in developing countries [1]. China has the second largest TB burden worldwide. In 2011, there were about one million new cases in China, which accounted for 12% of the global epidemic [1]. The TB epidemic in China has been exacerbated by the dominance of the notorious M. tuberculosis Beijing strains [2], as well as by the prevalence of multidrug-resistant (MDR) cases [3]. Both the Beijing strains and the MDR strains in China were found to be associated with ongoing transmission [3,4]. Therefore, there is an urgent need for reliable genotyping tools to identify and prevent transmission of M. tuberculosis.
Among the genotyping tools, PCR-based variable-number tandem repeat (VNTR) analysis represents a promising method for typing M. tuberculosis [5,6]. The newly proposed 15- or 24-locus VNTR typing sets (VNTR-15/24) have demonstrated adequate discriminatory power for tracing transmission in low-burden areas [7][8][9][10]. However, their usefulness in high-burden settings has been questioned, especially in settings such as China, where Beijing strains are prevalent [11][12][13]. M. tuberculosis Beijing strains are genetically highly similar, which limits the discriminatory power of VNTR-15/24 in settings dominated by these strains [13][14][15]. Several locally optimized VNTR schemes, which include hypervariable loci such as VNTR 3232, VNTR 3820 or VNTR 4120, have been suggested for typing Beijing strains [15][16][17][18]. However, hypervariable loci have defects, such as amplification failures and uninterpretable large amplicons [5,14]. Therefore, these schemes are less applicable as standard typing methods. Using hypervariable loci as a second-line method to subtype clustered strains following VNTR-15/24 represents a more appropriate solution [13,14]. Recently, a consensus set of four hypervariable loci was proposed for second-line typing of Beijing strains following VNTR-24 [19]. However, VNTR-15/24 contains too many loci to be applied extensively in China, which has limited resources but a huge TB burden. Furthermore, a number of loci from VNTR-15/24 have shown poor resolution in Beijing strains [12,15,18,20].
In this study, we evaluated the discriminatory power and robustness of 25 VNTR loci using population-based collections of M. tuberculosis isolates from six provinces in China. Our aim was to develop an optimal VNTR scheme as potential standard for typing M. tuberculosis in China.

Selection of Candidate VNTR Loci
Based on published data from different regions of China and surrounding countries [12,15-18,21-28], we summarized the Hunter-Gaston index (HGI) of 37 VNTR loci and calculated the median for each locus (Table S1). According to Sola et al. [29], the discriminatory power of VNTR loci can be classified as high (HGI > 0.6), moderate (0.3 ≤ HGI ≤ 0.6) or low (HGI < 0.3). When this standard was applied to the HGI medians of the 37 loci, 19 of them were found to have low discriminatory power. To include as many potentially discriminatory loci as possible, we set a threshold of HGI ≥ 0.1 for choosing candidate loci in this study. All loci that belong to the standard VNTR-15 were included. In total, 27 VNTR loci were selected for evaluation in this study.
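The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function names `classify_hgi` and `is_candidate` are hypothetical, the cut-offs are those attributed to Sola et al. [29], and the 0.1 threshold is the one stated for candidate selection.

```python
def classify_hgi(hgi: float) -> str:
    """Classify discriminatory power with the cut-offs quoted from Sola et al. [29]."""
    if hgi > 0.6:
        return "high"
    if hgi >= 0.3:
        return "moderate"
    return "low"


def is_candidate(median_hgi: float, threshold: float = 0.1) -> bool:
    """Keep a locus when its median HGI reaches the (study-specific) threshold."""
    return median_hgi >= threshold


# HGI values reported later in this study for QUB-18 and VNTR 2372
print(classify_hgi(0.691))  # high
print(classify_hgi(0.496))  # moderate
```

Note that a locus classified as "low" can still pass the candidate threshold, which is how loci with a median HGI between 0.1 and 0.3 entered the evaluation.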

Clinical Isolates and VNTR Typing
Population-based collections of 1375 isolates from six provinces (Guangxi, 176; Sichuan, 216; Shanghai, 396; Shandong, 206; Henan, 197; Heilongjiang, 184) were used to evaluate the candidate VNTR loci. Genomic DNA of all isolates was extracted with a boiling lysis method and had been typed with a 16-locus VNTR set in our previous study [4]. Additional typing of 11 loci was performed in this study using the primers listed in Table S2. The PCR reactions for 22 loci were performed in a volume of 10 μL containing 1× Taq PCR MasterMix (CoWin Biotech Co. Ltd., Beijing, China), 0.4 μM of each primer, and 1 μL DNA template. The reactions for the remaining 5 loci (VNTR 3232, VNTR 3820, VNTR 4120, VNTR 3336 and QUB-15) were performed in a volume of 20 μL containing 1× GC buffer I (Takara Biotech Co. Ltd., Dalian, China), 200 μM of each dNTP, 0.5 U of Taq (Takara Biotech Co. Ltd.), 0.4 μM of each primer, and 1 μL DNA template. The thermocycling conditions were as follows: 95°C for 5 min, followed by 30 cycles of 94°C for 30 s, 58°C (64°C for loci ETR-F, QUB-1895, QUB-3232, MIRU 40 and VNTR 4120) for 30 s, and 72°C for 30 s (1.5 min for loci VNTR 3232, VNTR 3820, VNTR 4120 and VNTR 3336), with a final extension at 72°C for 7 min. The sizes of the amplicons were analyzed on 1.0% or 1.5% agarose gels run for 1.5 to 2 hours at 150 V with 50 bp Ladder and 100 bp High Ladder size standards (CoWin Biotech Co. Ltd., Beijing, China). Beijing strains and sublineages were identified by RD105-targeted multiplex PCR and by typing of six single nucleotide polymorphism loci in the previous study [4].
A second set of 69 strains, serially isolated from 31 patients, was typed with all candidate loci. The concordance of VNTR alleles among the serial isolates from the same patient was used to evaluate the clonal stability of each VNTR locus.

Allelic Variability and Genetic Distance Analysis
Custom Perl script A was written to calculate the HGI of individual VNTR loci in different geographical/genetic populations according to the equation derived from elementary probability [30]. Perl script B was written to calculate the HGI for all possible combinations of the candidate VNTR loci. BioNumerics (version 5.0, Applied Maths, Sint-Martens-Latem, Belgium) was used to construct Minimal Spanning Trees (MSTs) based on the VNTR data. The priority rule was set to first link types that had the highest number of single-locus variants (SLVs). Creation of hypothetical types was not allowed. Creation of clonal complexes was defined by setting the maximum number of variations to fewer than two loci, for more than two genotypes. The frequencies of SLVs, double-locus variants (DLVs) and triple-locus variants (TLVs) for different VNTR loci were calculated according to the MSTs.
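The authors' Perl scripts are not reproduced here. As a minimal sketch of the underlying calculation, the Hunter-Gaston index is commonly written as D = 1 - [1 / (N(N-1))] × Σ n_j(n_j - 1), where N is the number of isolates and n_j is the number of isolates of type j. The Python function below, whose name `hunter_gaston_index` and toy allele list are illustrative assumptions, applies this formula to a list of allele calls (or of whole multi-locus genotypes).

```python
from collections import Counter
from typing import Hashable, Sequence


def hunter_gaston_index(types: Sequence[Hashable]) -> float:
    """Probability that two isolates drawn without replacement have different types."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are needed")
    counts = Counter(types)
    # Number of ordered pairs of isolates that share the same type
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical repeat numbers of six isolates at one locus
alleles = [2, 2, 3, 3, 3, 5]
print(round(hunter_gaston_index(alleles), 3))  # 0.733
```

Passing tuples of alleles instead of single alleles gives the HGI of a locus combination, which is the quantity needed for the combination analysis.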

Robustness and Variability of VNTR Loci
Among the 1375 isolates, 1362 (99.1%) had enough DNA extract for additional VNTR typing in this study. Two loci, QUB-15 and VNTR 3336, could not be amplified from most DNA extracts, even when different primers and conditions were tried (Table S2). These two loci were excluded from further analysis. The variability of the remaining 25 loci was evaluated as the HGI in different genetic and geographic M. tuberculosis populations. Genetically, the 1362 strains were classified as Beijing and non-Beijing strains, and the Beijing strains were further divided into "ancient" and "modern" groups according to the single nucleotide polymorphism (SNP) in codon 58 of mutT2 (Table S3) [4,13]. The variability of most loci was higher in non-Beijing strains than in Beijing strains. Among Beijing strains, most loci showed higher variability in "ancient" strains than in "modern" strains. Only 9 of the 25 loci showed high or moderate variability (HGI ≥ 0.3) in "modern" Beijing strains. By contrast, as many as 15 loci in "ancient" Beijing strains and 23 loci in non-Beijing strains had moderate or high variability (Table 1). Geographically, with the exception of the three hypervariable loci, the variability of the loci differed considerably among the six regions (Figure 1). The variability of most loci was highest in Sichuan or Guangxi, and lowest in Henan or Heilongjiang.
The robustness of the candidate loci was evaluated as their typeability (defined as the success rate of amplification) and interpretability (defined as the unambiguity of sizing amplicons by agarose gel electrophoresis). The typeability of the 25 VNTR loci ranged from 90.0% to 100% (Table 1). Loci QUB-11a, ETR A and QUB-11b had the lowest typeability (90.0%, 94.3% and 95.4%, respectively). Among the 1362 isolates, 45 (3.3%) could not be amplified at any of these three loci, which accounts for most of the PCR failures in QUB-11b (45/63) and ETR A (45/77). An additional 91 isolates failed in QUB-11a, which led to the low typeability (90.0%) of this locus. The typeability of the remaining 22 loci was reliable. However, the interpretability of four loci, ETR F, VNTR 3232, VNTR 3820 and VNTR 4120, was relatively low. For ETR F, 132 strains (9.7%) showed an incomplete repeat at this locus. For the three hypervariable loci, 860 (63.1%), 855 (62.8%) and 181 (9.7%) strains in VNTR 3232, VNTR 3820 and VNTR 4120, respectively, showed large alleles (with a repeat number larger than 10), which were difficult or impossible to size unambiguously (Table S4).
Finally, we excluded ETR F from further analysis because of its incomplete repeat alleles and its low variability in Beijing strains. Because of their low typeability or low interpretability, locus QUB-11a and the three hypervariable loci were not suitable for first-line typing. We retained these four loci as potential candidates for second-line typing in view of their high variability. The remaining 20 loci (VNTR-20), with reliable typeability and interpretability, were chosen as candidates for first-line typing.

Relative Evolution Rate of VNTR Loci
VNTR loci with high variability (as represented by HGI) do not necessarily have a high evolutionary rate, as the HGI might reflect separate allelic distributions between distinct lineages rather than high variability within lineages [5,30]. To evaluate the relative evolutionary rate of the 20 first-line candidate loci, we constructed MSTs for the strains of each region based on these loci (data not shown). The relative evolutionary rate of each locus was calculated as its involvement in SLVs, DLVs and TLVs according to the MSTs [5]. Compared with the HGI, the relative evolutionary rates of the 20 loci were relatively consistent among the six regions, indicating similar evolutionary rates (Figure 2). Generally, the evolutionary rate tended to decrease as the HGI decreased. However, several loci, such as MIRU 10, MIRU 39 and Mtub30, showed relatively low evolutionary rates with respect to their HGIs. To explain the discordance, we analyzed the allelic distribution of the 20 VNTR loci in non-Beijing and Beijing strains (Table S4). Most of the loci showed different allelic patterns between the two genetic subpopulations. Therefore, the diversity of these loci in the overall population was partially contributed by variability between subpopulations. Several loci, such as MIRU 39 and Mtub30, showed relatively high variability between subpopulations but very low variability within subpopulations, indicating low evolutionary rates.

Optimization of Locus Set for First-line Typing
Usually, VNTR loci with the highest individual HGIs are selected as the optimal typing set [16][17][18]. However, such a procedure ignores the evolutionary rate, as well as the discriminatory redundancy between loci. To avoid these defects, we calculated the discriminatory power of all possible combinations of VNTR loci in each region. The 1167 strains with unambiguous typing results at all 20 first-line candidate loci were used for the calculation. As we expected, in most cases the combination with the highest HGI at a given number of loci did not include all of the loci that had the highest individual HGIs (Table 2).
VNTR loci that in combination achieve resolution comparable to that of IS6110 restriction fragment length polymorphism (RFLP) are usually selected as final typing sets [16,18]. However, such VNTR schemes usually contain a large number of loci, which makes them less practical, especially in developing settings. According to previous studies, the discriminatory power of the standard VNTR-15 is sufficiently high for first-line typing [13,14]. Therefore, the average HGI of VNTR-15 in the six regions was set as the threshold for choosing the optimal combination in this study. We found that a minimum of nine loci could achieve the discriminatory power of VNTR-15. Among all 167,960 combinations of nine loci, eight had an HGI equal to or slightly higher than that of VNTR-15. The combination of loci QUB-11b, QUB-18, Mtub21, MIRU 26, QUB-26, Mtub04, MIRU 31, VNTR 2372 and MIRU 40 had the highest HGI, and its discriminatory power in both non-Beijing strains and Beijing strains was slightly higher than that of VNTR-15. Consistently, these nine loci had the highest relative evolutionary rates among the 20 candidates (Figure 2), and eight of them were also among the top nine loci in terms of variability. This 9-locus set (VNTR-9) also showed consistently high resolution in all six regions (Table 3). Compared with VNTR-15, VNTR-9 had slightly higher discriminatory power in Guangxi, Heilongjiang and Henan, and equal or slightly lower discriminatory power in Sichuan, Shandong and Shanghai.
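The exhaustive search over locus combinations can be illustrated with a short Python sketch. It is not the authors' Perl script B: the function names, the toy profiles and the locus labels A, B and C are hypothetical, and for brevity the sketch scores a single population rather than averaging the HGI over six regions. The toy data are built so that loci A and B are fully redundant, which shows why the best pair need not consist of the two individually most variable loci.

```python
from collections import Counter
from itertools import combinations
from math import comb


def hgi(types):
    """Hunter-Gaston index of a list of hashable types."""
    n = len(types)
    counts = Counter(types)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def best_combination(profiles, loci, k):
    """Return (HGI, loci) for the k-locus subset with the highest HGI."""
    best = (-1.0, ())
    for subset in combinations(loci, k):
        genotypes = [tuple(p[locus] for locus in subset) for p in profiles]
        score = hgi(genotypes)
        if score > best[0]:  # ties keep the first subset found
            best = (score, subset)
    return best


# Hypothetical profiles: A and B carry identical information, C complements them
profiles = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 1},
    {"A": 3, "B": 3, "C": 1},
    {"A": 3, "B": 3, "C": 2},
]

print(comb(20, 9))  # 167960 nine-locus subsets of 20 loci
print(best_combination(profiles, ["A", "B", "C"], 2))  # (1.0, ('A', 'C'))
```

In the toy data, A and B each have an individual HGI of about 0.83 and C only 0.5, yet the pair (A, B) resolves no more than A alone, whereas (A, C) separates all four isolates.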

Concordance between VNTR-9 and VNTR-15
The standard VNTR-15/24 has proved reliable and effective for tracing transmission of M. tuberculosis by defining genotypic clusters and unique strains [31]. To evaluate the reliability of VNTR-9 for identifying transmission, we calculated its concordance with VNTR-15 for defining both clustered and unique strains, defined as the proportion of isolates with clustered (or unique) VNTR-15 genotypes that also had clustered (or unique) VNTR-9 genotypes. According to VNTR-15, 466 of the 1167 strains were grouped into 156 clusters, among which 34 clusters were subtyped into singletons and/or smaller clusters by the two extra loci (QUB-18 and VNTR 2372) of VNTR-9. Overall, VNTR-9 classified 409 of the 466 strains into 148 clusters, a concordance of 87.8% (409/466) with VNTR-15. The remaining 57 strains were subtyped as singletons by QUB-18 and VNTR 2372, and all of them showed additional variations in the three hypervariable loci (VNTR 3232, VNTR 3820 and VNTR 4120). There were 701 unique strains based on VNTR-15, and 629 of them were also classified as singletons by VNTR-9, a concordance of 89.7% (629/701). The remaining 72 strains were assigned to 38 clusters according to VNTR-9, and 67 of them could be further differentiated as singletons by the three hypervariable loci.
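The concordance measure defined above can be written down compactly. The Python sketch below is an illustration under stated assumptions: the function names and the toy genotype labels are hypothetical, and an isolate counts as clustered when at least one other isolate shares its genotype.

```python
from collections import Counter


def clustered_flags(genotypes):
    """True for each isolate whose genotype is shared with another isolate."""
    counts = Counter(genotypes)
    return [counts[g] > 1 for g in genotypes]


def concordance(reference, candidate):
    """Return (clustered concordance, unique concordance) of candidate vs reference.

    Assumes the reference scheme yields at least one clustered and one unique isolate.
    """
    ref = clustered_flags(reference)
    cand = clustered_flags(candidate)
    ref_clustered = sum(ref)
    ref_unique = len(ref) - ref_clustered
    both_clustered = sum(r and c for r, c in zip(ref, cand))
    both_unique = sum((not r) and (not c) for r, c in zip(ref, cand))
    return both_clustered / ref_clustered, both_unique / ref_unique


# Hypothetical genotype labels for six isolates under two typing schemes
reference = ["a", "a", "b", "b", "c", "d"]
candidate = ["x", "x", "y", "z", "w", "v"]
print(concordance(reference, candidate))  # (0.5, 1.0)

# The percentages reported in the text follow from the stated counts
print(round(409 / 466 * 100, 1), round(629 / 701 * 100, 1))  # 87.8 89.7
```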
We further evaluated the usefulness of VNTR-9 to predict M. tuberculosis lineages/sublineages. We constructed MSTs of all 1167 strains based on VNTR-20, VNTR-15 and VNTR-9 respectively. The three MSTs showed similar topologies (Figure 3). Similar to VNTR-15 and VNTR-20, the MST based on VNTR-9 successfully differentiated the four major complexes that represent Beijing strains and three sublineages of non-Beijing strains.

Optimization of Hypervariable Locus Set for Second-line Typing
To select VNTR loci for second-line typing, we evaluated the variations of QUB-11a and the three hypervariable loci (VNTR 3232, VNTR 3820 and VNTR 4120) in clustered strains defined by VNTR-9. Among the 488 clustered strains, 441 had unambiguous typing results at these loci. Considerable variation within each cluster was detected at these loci, and most of it involved multi-locus variations (Table S5). A number of SLVs were found in the three hypervariable loci (28 in VNTR 3232, 19 in VNTR 4120 and 20 in VNTR 3820), and these loci classified the 441 strains into 106 clusters (238 strains) and 103 singletons. In contrast, only two SLVs were detected in QUB-11a, and adding QUB-11a to the three other loci differentiated only two additional clusters, indicating a high level of discriminatory redundancy. Since QUB-11a was also associated with serious amplification problems, we excluded this locus and kept the three hypervariable loci (HV-3) as the final set for second-line typing.
The usefulness of HV-3 was further evaluated in cross-regional clusters of "modern" Beijing strains. The "modern" Beijing strains were prevalent in all field sites and accounted for 766 of all 1362 strains. A total of 678 "modern" Beijing strains had unambiguous typing results, and we constructed their MST based on 17 VNTR loci that include all loci of VNTR-15 and the two additional loci (VNTR 2372 and QUB-18) of VNTR-9. The MST formed a star-like network, indicating that "modern" Beijing strains from different regions are highly homogeneous (Figure 4A). The MST was characterized by two major genotypes (genotypes A and B) in the center, surrounded by other minor genotypes. A total of 39 cross-regional clusters (241 strains) containing strains from at least two different field sites were identified, and 29 of them belonged to the major genotypes A and B and their single-locus variants (Figure 4B). Given the large geographical distances and low population mobility between field sites, these clusters are unlikely to indicate cross-regional transmission. According to HV-3, the remaining 94 strains were further classified into 40 clusters, 29 of which contained strains from only a single region. As an example, the 49 strains of cluster A and the 23 strains of cluster B were subtyped into 37 unique genotypes and 13 smaller clusters, among which eight clusters contained strains from only a single region (Figure 4C).
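The two-step logic of first-line clustering followed by second-line subtyping can be sketched as follows. This is a minimal Python illustration, not the procedure used to generate the results above: the function names, isolate names and locus labels (L1, L2 for first-line loci, H1 for a hypervariable locus) are hypothetical.

```python
from collections import defaultdict


def group_by(isolates, loci):
    """Group isolate names by their allele profile at the given loci."""
    groups = defaultdict(list)
    for name, profile in isolates.items():
        groups[tuple(profile[locus] for locus in loci)].append(name)
    return list(groups.values())


def hierarchical_typing(isolates, first_line, second_line):
    """Split first-line clusters with second-line loci; keep first-line singletons."""
    final = []
    for group in group_by(isolates, first_line):
        if len(group) == 1:
            final.append(group)  # already unique, no second-line typing needed
        else:
            subset = {name: isolates[name] for name in group}
            final.extend(group_by(subset, second_line))
    return final


# Hypothetical repeat numbers
isolates = {
    "s1": {"L1": 2, "L2": 3, "H1": 12},
    "s2": {"L1": 2, "L2": 3, "H1": 12},
    "s3": {"L1": 2, "L2": 3, "H1": 14},
    "s4": {"L1": 4, "L2": 3, "H1": 12},
}
print(hierarchical_typing(isolates, ["L1", "L2"], ["H1"]))
# [['s1', 's2'], ['s3'], ['s4']]
```

Only isolates that cluster at the first-line loci are typed at the second-line loci, which is what keeps the number of hypervariable-locus reactions small in a hierarchical scheme.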

Clonal Stability of VNTR Loci
The clonal stability of the 25 individual loci was evaluated using 69 serial isolates obtained from 31 patients. The time span between the initial and follow-up isolates ranged from one month to 24 months among patients (Table S6). Single-locus variations were observed in two patients. Simultaneous occurrence of double alleles in VNTR 3232 was detected in the first isolate of patient 2 (with an allele of 17 or 19) and in the second isolate of patient 17 (with an allele of 9 or 11), which indicates evolution and coexistence of clonal variant strains within each patient.
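A comparison of serial isolates of this kind can be expressed as a simple per-locus check. The Python sketch below is illustrative only: the function name `unstable_loci` and the allele values are hypothetical, and each isolate is assumed to carry a single allele call per locus (mixed or double alleles would need a richer representation).

```python
def unstable_loci(serial_profiles, loci):
    """Return the loci whose allele differs among serial isolates of one patient."""
    return [
        locus
        for locus in loci
        if len({profile[locus] for profile in serial_profiles}) > 1
    ]


# Hypothetical initial and follow-up isolates from one patient
patient = [
    {"QUB-18": 8, "VNTR 2372": 3, "VNTR 3232": 17},
    {"QUB-18": 8, "VNTR 2372": 3, "VNTR 3232": 19},
]
print(unstable_loci(patient, ["QUB-18", "VNTR 2372", "VNTR 3232"]))
# ['VNTR 3232']
```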
Among the loci of VNTR-9, loci QUB-18 and VNTR 2372 have rarely been evaluated in previous typing schemes. In this study, the HGIs of QUB-18 (0.691) and VNTR 2372 (0.496) were lower than those of the hypervariable loci and of some loci of VNTR-15. Furthermore, the relative evolutionary rates of both loci were not extraordinarily high compared with those of the VNTR-15 loci, indicating reliable clonal stability. We further tested the stability of QUB-18 and VNTR 2372 in 119 clusters (409 strains) defined by VNTR-15 and HV-3, which contained genetically closely related strains [13,14]. No variation of these two loci was detected in any of these clusters (data not shown).

Discussion
The large TB burden and limited resources in developing countries call for applicable and highly discriminatory genotyping methods for molecular epidemiology studies. Here, by systematically evaluating 25 VNTR loci in six regions across China, we propose an optimized 9-locus VNTR set (VNTR-9), combined with three hypervariable loci (HV-3), as a potential standard for nationwide genotyping of M. tuberculosis in China.
The discriminatory power of VNTR loci, especially those of the standard VNTR-15/24, has been evaluated in many regions of China [12,16,18,21-25,27,28]. Several optimized VNTR sets containing the loci with the highest discriminatory power in local regions have been proposed [16,18,27,28]. These VNTR sets may be good for typing M. tuberculosis strains in a specific region, but their discriminatory power in other regions of China cannot be guaranteed. Furthermore, most previous studies evaluated the discriminatory power of VNTR loci using hospital-based collections of strains [12,22,24,27,28], which are less representative for choosing loci for population-based molecular epidemiological studies. In this study, we systematically evaluated the discriminatory power of 25 VNTR loci using population-based collections of M. tuberculosis strains from six field sites that cover different regions across China [4]. We found that the variability of most VNTR loci varied considerably among the six sites, which could be explained by the genetic differences of M. tuberculosis between regions. The non-Beijing strains in China are heterogeneous and could be classified into three distinct complexes based on their VNTR profiles (Figure 3). By contrast, the VNTR profiles of Beijing strains are highly similar and form a single large clonal complex, which, in turn, explains the high variability of VNTR loci in regions where non-Beijing strains are prevalent (Sichuan and Guangxi) and the relatively low variability in regions dominated by Beijing strains (Heilongjiang and Henan) (Table S3).
Table 2. The optimal VNTR combinations at different locus numbers and their average discriminatory powers in all strains or genetic subpopulations of M. tuberculosis from six field sites. The combination in bold indicates the optimal 9-locus VNTR set. doi:10.1371/journal.pone.0089726.t002
In this study, we applied a strategy that identifies the optimal typing set by calculating the discriminatory power of all possible combinations. We found that a minimum of nine loci could achieve the discriminatory power of the standard VNTR-15. The optimal combination contains eight, but not all, of the nine loci with the highest HGIs. Traditionally, the VNTR loci with the highest individual HGIs have been treated as the optimal combination. Using that approach, we found that a minimum of ten loci was needed to achieve resolution comparable to that of VNTR-15 (data not shown), which highlights the limitation of the traditional method used to determine the optimal set in previous studies [15][16][17][18]. Seven of the loci of VNTR-9 are shared with VNTR-15, which enables informative comparisons between the two typing sets. On the basis of these seven common loci, the addition of loci QUB-18 and VNTR 2372 achieved discriminatory power comparable to that provided by the extra eight loci of VNTR-15. Furthermore, VNTR-9 proved highly concordant with VNTR-15 in defining both clustered and unique strains. Since VNTR-9 has six fewer loci than VNTR-15, it would substantially reduce the cost of genotyping, making VNTR-9 more applicable in developing countries. Of the two extra loci of VNTR-9, locus QUB-18 was reported to be associated with unstable amplification of large alleles in a previous study that included a global sample of M. tuberculosis strains [5]. However, we found that the repeat numbers of most alleles of QUB-18 were equal to or less than 10 in this study and that almost all of them could be stably amplified. Similarly, QUB-18 was found to have high typeability and interpretability in Beijing strains in a more recent study [19]. The discordance might thus be caused, at least in part, by genetic differences among the M. tuberculosis strains studied.
Due to the prevalence of the M. tuberculosis Beijing family in China, it is necessary to use hypervariable loci to differentiate epidemiologically unrelated strains. Recently, a consensus set of four hypervariable loci (VNTR 3232, VNTR 3820, VNTR 4120 and VNTR 1982) was proposed for second-line typing of Beijing strains following the standard VNTR-24 [19]. The loci of HV-3 (VNTR 3232, VNTR 3820 and VNTR 4120) proposed in this study are all contained in the consensus set. The extra locus, VNTR 1982, which is the same locus as QUB-18, was found to be highly robust in terms of typeability and interpretability both in this study and in the recent study [19]. Therefore, it is suitable to include QUB-18 for first-line typing in this study. Considering the high variability of QUB-18 in all six areas, the inclusion of this locus would largely reduce the number of loci needed for the first-line scheme. HV-3 proved very useful for further differentiating epidemiologically unrelated strains in this study. The 441 VNTR-9 based clustered strains were further classified into 103 singletons and 106 clusters by HV-3. Furthermore, the cross-regional clusters of "modern" Beijing strains could mostly be differentiated into singletons or smaller clusters that contained strains from only a single region. However, several HV-3 based clusters still contained strains from different regions, which may indicate convergent evolution or recent cross-regional transmission. We also evaluated the usefulness of another locus, QUB-11a, for second-line typing. We found that the inclusion of this locus only marginally improved the resolution in addition to HV-3 and, thus, we excluded it from the final typing scheme. A similar result was observed in a recent study, in which QUB-11a was found to be almost fully redundant with other hypervariable loci [19].
Table 3. Discriminatory powers of VNTR-9 and VNTR-15 among six field sites and their concordance for defining clustered and unique strains.
QUB-11a was also associated with serious amplification failure in this study. This locus, together with QUB-11b and ETR A, is located in Rv1917c, which belongs to the PPE family. According to a recent study, the sequence of Rv1917c is highly polymorphic [32], which may explain the relatively low typeability of these loci. Considering the high discriminatory power of hypervariable loci, their clonal stability has been questioned [5]. In this study, we found that these loci were quite stable in serial isolates. Similar results have been observed in previous studies [14,19,33], indicating that these loci are not extraordinarily unstable. Another concern associated with hypervariable loci is the large alleles that cannot be interpreted unambiguously. Counting stutter peaks on a capillary electrophoresis sequencer provides an effective way to measure the repeat numbers [14]. Alternatively, we previously provided a method to minimize potential artificial variations by electrophoresing the amplicons of clustered strains defined by first-line typing in adjacent lanes on the same gel [13].
There are two major limitations of this study. First, two potential hypervariable loci, VNTR 3336 and QUB-15, were not evaluated in this study because of amplification failure under our PCR conditions. Amplification failure of locus QUB-15 was also reported in a previous study [34]. In a recent study, these two loci were found to have low variability (0.22 for QUB-15 and 0.18 for VNTR 3336) and were not involved in any SLV events over a global panel of Beijing strains [19]. Thus, these two loci would probably not affect the determination of the typing sets in this study. Second, loci QUB-18 and VNTR 2372 of VNTR-9 have been evaluated only in this study and a limited number of previous studies [17][18][19], and more typing data from broader areas are needed to evaluate their discriminatory power and robustness.

Conclusion
Our study proposes a hierarchical VNTR typing scheme to study the transmission of M. tuberculosis in China. First, VNTR-9 can be used as the first-line method for large-scale genotyping. Then, HV-3 can be used to subtype the strains clustered by VNTR-9. Since Beijing strains are highly prevalent in East Asia, this genotyping scheme could also be suitable for molecular epidemiology studies in other East Asian countries. The strategy of developing hierarchical VNTR typing methods that achieve high resolution with a small number of loci could be suitable for molecular epidemiology studies in other high-burden countries.