Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Verification of emerging genomic mutations in Mycobacterium tuberculosis allows transmission chains to be distinguished in an epidemiological typing cluster extending over thirty years

  • Rina de Zwaan ,

    Contributed equally to this work with: Rina de Zwaan, Gerard de Vries

    Roles Data curation, Formal analysis, Writing – review & editing

    Affiliation Tuberculosis Reference Laboratory, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands

  • Gerard de Vries ,

    Contributed equally to this work with: Rina de Zwaan, Gerard de Vries

    Roles Conceptualization, Data curation, Methodology, Writing – review & editing

    Affiliation Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands

  • Ella Ubbelohde,

    Roles Data curation

    Affiliations Tuberculosis Reference Laboratory, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands, KNCV Tuberculosis Foundation, The Hague, The Netherlands

  • Arnout Mulder,

    Roles Data curation, Software

    Affiliation Tuberculosis Reference Laboratory, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands

  • Miranda Kamst-van Agterveld,

    Roles Project administration, Supervision

    Affiliation Tuberculosis Reference Laboratory, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands

  • Karin Rebel,

    Roles Data curation

    Affiliation TB Department, Municipal Public Health Services (GGD), Utrecht, The Netherlands

  • Saskia Kautz,

    Roles Conceptualization

    Affiliation TB Department, Municipal Public Health Services (GGD), Rotterdam, The Netherlands

  • Kristin Kremer,

    Roles Formal analysis, Project administration, Supervision, Writing – review & editing

    Affiliation KNCV Tuberculosis Foundation, The Hague, The Netherlands

  • Richard M. Anthony ,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    richard.anthony@rivm.nl

    ‡ RMA and DVS also contributed equally to this work.

    Affiliation Tuberculosis Reference Laboratory, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands

  • Dick van Soolingen

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review & editing

    ‡ RMA and DVS also contributed equally to this work.

    Affiliation Tuberculosis Reference Laboratory, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands

Abstract

Whole genome sequencing (WGS) is able to identify epidemiological links between Mycobacterium tuberculosis isolates. Recent clustering can be ruled out using a pre-defined single nucleotide polymorphism (SNP) threshold. If WGS clusters grow significantly over time limited genetic variability hampers epidemiological investigations. Newly emerging (informative) SNPs in isolates of an extended cluster growing for more than 30 years to >150 cases in the Netherlands were analysed. WGS data was analyzed from 61 sequencing files from 54 patients. Genomic positions that varied within the cluster isolates were carefully screened for minority populations in other isolates from the cluster. A transmission scheme was generated on the basis of WGS data alone then compared to the epidemiological information available. Fifty-two informative SNPs were identified, eight of which were also detected as mixed variants. One emerging SNP in dnaA (1199G > A R400H) has been observed in other transmitted strains and may be under selection. There was high concordance between the transmission chains suggested on basis of the newly emerging SNPs and scenarios identified using classical epidemiological cluster investigations. Analysis of filtered SNPs accumulating in the genome of M. tuberculosis in large clusters contains information on transmission dynamics and can be used to support epidemiological investigations.

Introduction

In the Netherlands, all Mycobacterium tuberculosis complex isolates have been subjected to epidemiological typing by whole genome sequencing (WGS) at the national tuberculosis reference laboratory since 2016 [1]. Isolates within a 12 single nucleotide polymorphism (SNP) distance are reported as ‘clustered’ to the Municipal Public Health Services (MPHs) who perform interviews to identify and confirm epidemiological links [2]. However, most international WGS analysis pipeline’s nowadays apply a stricter threshold of six SNPs to rule in a possible recent transmission [3,4]. Furthermore, SNP filtering in our pipeline is less strict than in most other pipelines, which results in an increased sensitivity to genetic variation and a parallel reduction in specificity (overcalling of SNPs). We estimate the increased SNP calling in our pipeline to be between 0 and 3 SNPs based on observations in international quality control exercises [1,5].

In low transmission settings such as the Netherlands epidemiological clusters mostly consist of two to three cases, but occasionally large, longer-term clusters occur. In general, the detectable genetic variability between M. tuberculosis isolates in even large transmission clusters is limited [1,6,7]. Moreover, SNP variability between isolates from a single individual can be as large as between isolates from different individuals in a cluster [5,8] Nonetheless, as the population structure of M. tuberculosis is clonal, in principle a higher resolution can be achieved by tracking truly emerging SNPs, as non-fixed alleles in a part of the isolates. This approach should enable a more precise reconstruction of transmission chains [912]. This was already suspected in 2010 by investigating single colony cultures of M. tuberculosis isolates associated with the large Harlingen outbreak in the Netherlands; the occurrence of new mutations in a part of the colonies of a sub set of the isolates was proven by PCR and could be used to distinguish different sources of spread in the extending cluster [9]. Thus if SNP calling is sufficiently accurate, a higher resolution in typing can be achieved by following the accumulation of specific emerging SNPs and tracking them as fixed or non-fixed alleles in isolates within transmission chains.

Here, we investigate WGS data of the second largest typing cluster in the Netherlands that has been followed by three different molecular typing methods (initially IS6110 RFLP, then VNTR-MIRU 24, then WGS) over a time period of 30 years [2], with the objective to study the accumulation of new SNPs and to test their utility to distinguish transmission forks introduced by subsequent sources of spread. The predicted evolutionary branching of the strain was compared to information available at MHSs on possible sources of spread in the cluster and subsequent cases, in order to investigate the utility of the cluster sub-branching information in the epidemiological investigation of MHSs.

Methods

Since 1993, DNA fingerprinting of all M. tuberculosis isolates has been performed in the Netherlands, using RFLP typing from 1993–2008, VNTR-MIRU from 2004–2018 and WGS from 2016 to the present. TB public health nurses systematically investigate possible epidemiological links between clustered cases. This activity, the so called ‘cluster investigation’, is prioritised for recent transmission defined as cases that occur within two years after another case. A link is confirmed if the investigation reveals that the newly clustered case was indeed in contact with another clustered and infectious case (e.g., a family member, friend) or if the person was at a particular location at the same time with a clustered infectious case. Since 2009, results of cluster investigations are reported to the National TB Register (NTR), including the identified source case.

A single large cluster, that existed throughout the whole epidemiological typing period since 1993 and with a substantial number of confirmed epidemiological links between clustered cases, was selected for detailed study. We requested the NTR Registration Committee to provide demographic data of cases in this cluster and the results of cluster investigations. All isolates in this cluster had been typed with WGS since 2016. Additionally, patients from before 2016 with a confirmed epidemiological link were selected for re-typing by WGS. MPHs provided information from routine cluster investigation on the most likely place of transmission of patients in the cluster with a confirmed epidemiological link. The laboratory technicians were blinded to information on possible epidemiological links between the selected isolates.

From the cluster, 61 WGS sequencing files of M. tuberculosis isolates from 54 individuals were analysed. From four individuals the sequences from two different isolates were included; two individuals had two isolates (<6 months apart) and two others had two isolates as a result of a reactivation (>6 months apart). An additional three isolates were included twice as duplicate controls for the WGS sequencing and data analysis procedure.

The isolates were sequenced using an Illumina (CA, USA) HiSeqTM 2500, NextSeqTM 500 or NextSeqTM 550 sequencer. Unpaired reads were mapped against the H37Rv reference genome using Bowtie2, (GenBank accession: AL123456.3). Breseq v0.28.1 was used to detect SNPs and INDELs (insertions and deletions). The in-house pipeline called SNPs using a previously validated arbitrary ≥80% allele frequency cut off [1]. Called SNPs were excluded from the SNP distance calculation when present in ‘difficult to sequence’ genomic regions according to an exclusion list or when they were within 12 bp from another SNP [4]. All sequencing data were subjected to our routine quality check, specifically an average coverage of 50x or more, no contamination detected (i.e., < 3 SNPs called in ribosomal genes rrs or rrl genes and no more than five specific reads of another lineage in any of the 62 Coll SNPs [13]). Also 15 high confidence first line resistance mutations (supplementary S2 Table) were screened for mutant reads at <80% to exclude a mixed resistance genotype/ emerging resistance.

The sequencing data generated is available online, NCBI accession numbers are listed in supplementary S1 Table.

Additional analysis & WGS transmission scheme

Any SNPs not present in all sequenced isolates from the cluster called by our pipeline vs the reference genome were graded ‘confident’ or ‘non-confident’ based on visual inspection of the SAMtools pile ups for probable assembly errors [14]. These SNPs were then screened for their distribution in the Dutch WGS database (containing at the moment of screening 3,981 isolates from 53 different lineages [13]). Allele frequencies for any confident SNPs not present in all sequenced isolates in the cluster were then examined in all the isolates from the cluster sequenced. These allele frequencies were used to identify emerging SNPs; an allele frequency over 95% was considered ‘fixed’ and below 5% as wild type, an allele frequency between 5% and 95% was considered a mixed SNP.

A WGS transmission scheme was then produced based on the distribution of all ‘confident’ (both fixed and mixed) SNPs based on the assumptions that;

  1. (i). SNP acquisition is unique, i.e., the same SNP did not occur randomly twice.
  2. (ii). Fixed SNPs are not lost in the transmission scheme.
  3. (iii). Mixed SNPs are either transmitted or not transmitted to a secondary case.

The transmission scheme, generated from the WGS data was then compared to the transmission inferences obtained on the basis of cluster investigations and the known relationships between the isolates (including duplicates and multiple isolates from the same individual).

Ethical clearance

The NTR Registration Committee approved our request and provided the WGS and epidemiological data. As the routine collected data were pseudonymized, the study was not subjected to the Dutch law for Medical Research Involving Human Subject Act (WMO).

Results

Cluster description based on epidemiological information

The selected cluster was identified in 1993 and composed of 157 tuberculosis cases by the end of 2022 (overview in S1 Fig). In February 2003, infectious TB was diagnosed in a new patient (case 5, Fig 1,transmission chain 1; red) who frequently visited pubs in one specific street in Rotterdam. This patient was lost to follow-up during treatment and traced in November 2003, being again highly infectious (case 23). Between 2003–2014, the cluster increased by 121 patients; 80 (66%) were living in the city of Rotterdam, of which 30 (38%) reported frequent visits to the pubs in the mentioned street. A new transmission chain occurred in Amsterdam (Fig 1 transmission chain 7; dark blue Rotterdam, light blue Amsterdam) with two highly infectious cases. Case 96 was diagnosed in 2009 by contact investigation around a child with TB (case 94). A contact of this person developed tuberculosis in 2012 (case 114) and had recurrent tuberculosis (case 129), more than two years after completing the first treatment. Several new transmission locations were identified by cluster investigation, such as a pub in another town (3 cases; not shown), a penitentiary institute (3 cases), a coffee shop (5 cases), a school (4 cases, Fig 1, transmission chain 6; purple) and a hospital (one case of nosocomial transmission). Case 153 developed tuberculosis nine years after being diagnosed with a TB infection in a contact investigation and having received preventive treatment (Fig 1, transmission chain 5; green).

thumbnail
Fig 1. Transmission scheme based on epidemiological information. Cases sequenced with an epidemiological link confirmed by MHSs are present as ovals, with arrows showing the direction of transmission. Relapse cases in the same individual are linked with a solid black line. Oval borders indicate the location of the case; solid thick border Rotterdam; dashed thick border Amsterdam; thin border; other towns or villages in the Netherlands. The arrow colour indicates a distinct epidemiological transmission chain. Dotted arrows indicate situations in which transmission was considered likely. The cases and epidemiological transmission chains are presented using the same colours in Fig 2 (except case 27 shaded, for which WGS data was unanalysable due to contamination of the culture). The year and number of isolates from the cluster is indicated on the left hand side (and in S1 Table).

https://doi.org/10.1371/journal.pone.0319630.g001

Cluster description based on WGS data (SNPs)

A total of 773 SNPs vs the reference genome was called by our pipeline in all the sequenced isolates from the selected cluster, 706 of these SNPs were shared by all 61 WGS files (available online S1 Table) investigated from the cluster on basis of WGS and uninformative. All duplicate controls showed exactly the same confident SNPs as fixed, wild type or mixed. Of the remaining 67 SNPs 53 were manually graded as confident, 14 as non-confident (Table 1). One SNP (Table 1, SNP 64), which was confidently graded, was present in all members of the cluster but not always reliably called by our pipeline; this SNP was thus considered uninformative and not used in the construction of the transmission scheme. The final analysis was thus based on 52 SNPs (Table 1).

thumbnail
Table 1. Overview of all SNP diversity detected in the cluster, the visual grading of the SNPs and the distribution of the diverse identified in the clustered isolates in the NL database of 3981 isolates.. * = relative to the reference genome H37Rv (AL123456.3).

https://doi.org/10.1371/journal.pone.0319630.t001

Subsequent comparison of the distribution of all 67 variable SNPs in the entire Dutch WGS database showed that all but three of the 14 SNPs manually graded as non-confident SNPs were randomly distributed throughout the database, and present in more than 100 unrelated isolates spread over at least seven different lineages, as a result of poor mapping/sequencing at these positions (Table 1). The non-confident SNPs e.g. included three SNPs in the fhaA gene (Table 1, SNPs 57–59) which had already been removed from our routine cluster analyses by the pipeline based on the fact they were within 12 base pairs from each other.

In total 46 of the 52 SNPs manually classified as ‘confident’ were exclusive to the cluster in the Dutch WGS database. Five SNPs (Table 1; SNPs 8, 17, 18, 26, 29) were only shared by one or two isolates from one different lineage in the Dutch WGS database. One confidently graded SNP in dnaA (Table 1, SNP 9) was shared by eight other isolates from four different lineages.

The accumulation of the 52 confident SNPs in a proportion of the isolates was used to inform the WGS-based transmission scheme (Fig 2) based on the assumptions outlined in the methods. A multiple sequence alignment of the SNPs used to create this figure is available in S3 Table.

thumbnail
Fig 2. Transmission scheme based on WGS data.

Numbers inside boxes indicate NTR cases (Fig 1), cases with zero fixed SNP differences are inside the same box. NTR case numbers appearing more than once indicate duplicate sequences generated from the same isolate are indicated by an *. Follow up isolates from the same NTR case are indicated in bold with a “b”. Reactivations have a different NTR case number but are underlined with the second episode in bold. Numbers inside white ovals indicate a fixed SNP (Table 1); unfixed SNPs inside shaded symbols; a grey symbol if the SNP was mixed and a black symbol when in subsequent cases where the SNP was fixed, they have their frequency indicated as a percentage after a comma inside the box. Solid lines linking boxes indicate the inferred transmission of fixed SNPs, dashed lines the inferred transmission of mixed SNPs. Coloured lines indicate isolates in epidemiologically linked transmission chains(Fig 1).

https://doi.org/10.1371/journal.pone.0319630.g002

There were 44 confident SNPs that were not present in all isolates in the cluster and were fixed (>=95% of reads) in all the files in which they were observed. The remaining eight confident SNPs were present in at least one file as a mixed SNP (<95% of the reads). Six of the eight mixed SNPs were fixed in at least one isolate (Fig 2, grey [mixed] and black [fixed] symbols). The remaining two SNPs were only found as mixed in only a single isolate and never fixed (SNPs 22 and 34).

The duplicate controls were identical and placed together on the WGS transmission tree with pair 96 both containing mixed SNP 9 (Fig 2, triangle symbol) at 72% and 84% respectively.

Only two of the eight mixed SNPs were present as a mixed SNP in more than one sequencing file; one duplicate control (SNP 9 in sample 96 mentioned above) and one (SNP 15, Fig 2 round symbols) in two different sequencing files. The two files with a shared mixed SNP 15 were serial isolates from the same individual (with a reactivation) (Figs 1 and 2; 114, 129). These results were derived from isolates collected in September 2012 (SNP 15; 8%) and March 2015 (SNP 15; 68%) from the same patient. Four files also contained a fixed SNP 15(135, 139, 154 and 154*); two of these were duplicate controls (154, 154*). The first isolate in which SNP 15 was mixed (September 2012) also contained a mixed SNP 14 (17%). SNP 14 was fixed in one additional isolate (134) without SNP 15. In the second isolate (Fig 2 129; March 2015) two other additional mixed SNPs were present (SNP 17; 54%, SNP 18; 58%). One isolate (Fig 2; 135) contained both SNPs 17 and 18 fixed along with SNP 15.

Comparison of the epidemiological and WGS based transmission schemes.

The comparison of the information from the WGS analysis and the epidemiological investigations included 33 confirmed epi-links (68 cases) and two relapses. The serial isolates case 5 and its relapse (case 23) were identical, while the isolates from case 114 and its relapse (case 129) contained mixed SNPs; (SNP 14; 17%, and SNP 15; 8%, in the first episode SNP 15; 68%, SNP 17; 54% and SNP 18; 58% in the second isolate (Fig 2).

WGS failed for one isolate selected for WGS analysis (Fig 1, case 27 shaded oval). Out of the remaining 33 of the confirmed epi-links and two repeat isolates screened for SNPs, 16 were identical, 11 contacts gained one SNP, four contacts gained two SNPs and two contacts gained three new confident SNPs (Table 2). There were two epi-linked isolates (cases 85 and 97) that based on the WGS scheme lost a unique fixed SNP identified in the isolate of putative source case 77 (Table 2, Figs 1 and 2). In retrospect it was concluded that these two cases (85, 97) were related cases to case 77 but could also very well have been infected by case 69, who was the source of infection of case 77, which would be consistent with the WGS transmission scheme (Fig 2; supplementary S1 Table).

thumbnail
Table 2. Distribution of SNP distances between isolates considered to have an epidemiological link based on contact investigations undertaken by the MHSs.

https://doi.org/10.1371/journal.pone.0319630.t002

All cases that occurred since 2015 in patients residing in Amsterdam and surrounding areas contained SNPs 4–7 and for most epidemiological links were identified (Fig 1 and 2, transmission chain 7 dark and light blue lines).

Discussion

This study shows that with proper verification of variable mutations identified in a large cluster useful information was revealed with respect to the micro-evolution of the strain. The resulting branching divided the large cluster in several satellite-clusters, which is relevant for monitoring the development of the outbreak and to steer public health actions. The inferences based on a detailed WGS analysis were almost entirely compatible with the possible scenarios developed by the MHSs based on information extracted from epidemiological investigations. The initial outbreak occurred in Rotterdam, the second largest city in the Netherlands, and was associated with known risk factors like frequent visits to pubs in one area. From 2009 onwards, cases emerged in a sub-cluster in Amsterdam. Over time the outbreak strain acquired four confident SNPs, which were present in isolates linked to subsequent cases in Amsterdam, but not in the initial cases in Rotterdam.

We classified all 67 SNPs that were variably present in the cluster using two independent approaches; firstly visual inspection of the sequencing reads (SAMtools Pileups [14]) and secondly an analysis of distribution of the SNPs in the WGS database in the Netherlands comprising 3,981 diagnostic isolates (Table 1). Of the 14 visually non-confident called SNPs, 11 were also identified as non-informative based on their presence in a large number of genetically diverse isolates. A proportion of these SNPs were found in genomic regions that have been previously recognized as poorly mapping [15]. The remaining three SNPs in rrs (Table 1; SNP 53–55) were considered unreliable, presumably as a result of contamination with other bacteria. Finally, one confidently called SNP (64; Rv2662_P86P) was found in 486 other isolates in the WGS database, all belonging to one lineage and was likely present in all members of this lineage but not called by our pipeline in a small proportion of lineage 4.1.2.1 isolates due to poor mapping of a A > G substitution directly next to a string of G nucleotides (Table 1).

Six of the 52 remaining confident SNPs were not unique to the cluster but present in one, two and in one case eight additional isolates in the WGS database of 3,981 unique isolates. Positions under selection in clonal organism such as M. tuberculosis are subject to homoplasy, most notably mutations associated with drug resistance [16]. These six SNPs are thus potential candidates for positions under selection. Strikingly, SNP 9 [dnaA_R400H] emerged in the cluster and was identified in eight unrelated isolates in diverse lineages, present in the WGS database (Table 1). Additionally, a second SNP also emerged in this cluster in the dnaA gene (M348I), this SNP was only identified in isolates from the cluster studied (Table 1, SNP 15). Mutations in dnaA have previously been observed in spontaneous M. tuberculosis drug resistant laboratory mutants [10], and have been associated with increased survival during INH treatment and more robust growth at the MIC of isoniazid [17]. Furthermore, dnaA mutations have also been observed in more efficiently transmitted variants in another cluster of isolates that was subject to detailed SNP analysis [11]. Notably, both dnaA mutations observed in this cluster were initially identified as mixed SNPs and transmitted to other individuals. Furthermore, the dnaA_M348I mutation was also present in two isolates from a treatment failure/ reactivation case (cases 114, 129) isolated more than two years apart (Fig 2, transmission chain 7; blue) with at least four mixed SNPs which were identified as fixed in other individuals who were presumably infected by this case.

After filtering, either manually or screening our database for unexpected homoplasy, high confidence informative SNPs present in duplicate controls were identical and serial/reactivation isolates were all directly linked in the WGS scheme.

We identified the SNP variability within the cluster on the basis of SNP thresholds and then filtered this short list of SNPs based on a screen for variability of these variable positions in a larger database allowing “noisy” SNPs to be identified and filtered out of the analysis. In this way true potentially epidemiologically informative SNPs within the cluster were identified. The resulting short list of confident informative positions could then be analysed in all other isolates sequenced from the cluster in detail with the aim of identifying the isolates in the cluster in which the SNPs were emerging/not fixed [911,18]. This approach in cancelling sequence noise using a large database to screen for epidemiologically informative variation, adds to the epidemiological typing. Comparing the epidemiological data available to WGS based transmission scheme supports the assumption that careful analysis can identify epidemiologically informative variability in closely related sequences.

Limitations

Even with a permissive SNP calling algorithm for a significant proportion of individuals (20 of 54, 37%) in the cluster analysis we were unable to identify any confident SNP variation with at least one other case. Long read sequencing or the use of pan genome analysis may allow the analysis of more variable regions of the genome that currently cannot be confidently analysed using short read sequencing [15] allowing the detection of large genome rearrangements as well as potentially allowing analysis of regions not present in the reference genome currently used could potentially increase the typing resolution [19,20] within recently clustered isolates. We sequenced cultured isolates raising the possibility that sub-populations of bacteria present in the sample may grow at different rates, be lost, or additional mutations accumulate during culture [21]. Finally, we did not screen the whole genomes for mixed SNPs to streamline the analysis, targeted only SNPs that were variable/informative within the clustered isolates sequenced. This detailed analysis was performed on a single larger cluster. The identification of emerging SNPs will only be informative when the majority of cases in the cluster are identified and sequence data is available for at least the majority of the potentially linked cases.

Conclusion

In conclusion, although clustering can be effectively ruled out on the basis of SNP distance thresholds alone we demonstrate there is additional information available in micro variability within the WGS data from isolates in a large cluster that can be useful to identify possible links and target epidemiological investigations. We confirm mixed SNPs, if present, may be fixed in subsequent isolates and strongly suggest a transmission direction and direct epidemiological link as supported by the epidemiological information available from the cluster studied and recently proposed by others [18,22,23]. Screening for these mixed SNPs in a long established cluster can therefore provide epidemiologically useful information. Strategies to automate and validate this type of SNP analysis could be implemented. Marker SNPs characteristic of specific transmission chains [24] can be identified in larger clusters and are potentially more precise than SNP thresholds to identify epidemiologically linked isolates.

Supporting information

S1 Fig. Overview of the numbers of tb episodes in the National TB Register (NTR) linked to the cluster studied with the typing method used and year of detection.

https://doi.org/10.1371/journal.pone.0319630.s001

(PDF)

S1 Table. Details of all the files analysed, the sequence file accession numbers, epidemiological links and the SNPs called in each sample.

https://doi.org/10.1371/journal.pone.0319630.s002

(XLSX)

S2 Table. The 15 high confidence resistance associated SNPs called at any frequency by the routine analysis pipleline.

https://doi.org/10.1371/journal.pone.0319630.s003

(XLSX)

S3 Table. A multiple sequence alignment of the confident SNPs called ordered from SNP 1 to SNP 52 + SNP 64 (Table 1).

https://doi.org/10.1371/journal.pone.0319630.s004

(TXT)

Acknowledgments

We thank the MHSs in the Netherlands for their highly valuable contribution to this paper and ongoing collaboration.

References

  1. 1. Jajou R, de Neeling A, van Hunen R, de Vries G, Schimmel H, Mulder A, et al. Epidemiological links between tuberculosis cases identified twice as efficiently by whole genome sequencing than conventional molecular typing: a population-based study. PLoS One. 2018;13(4):e0195413. pmid:29617456
  2. 2. de Vries G, van Beurden K, Kamst-van Agterveld M, Rebel K, Veulings Y, van Soolingen D. Tuberculoseclusters in kaart gebracht door whole genome sequencing. Infectieziekten Bulletin. 2020;31(2).
  3. 3. Walker TM, Ip CLC, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13(2):137–46. pmid:23158499
  4. 4. Jajou R, Kohl TA, Walker T, Norman A, Cirillo DM, Tagliani E, et al. Towards standardisation: comparison of five whole genome sequencing (WGS) analysis pipelines for detection of epidemiologically linked tuberculosis cases. Euro Surveill. 2019;24(50):1900130. pmid:31847944
  5. 5. Anthony RM, Tagliani E, Nikolayevskyy V, de Zwaan R, Mulder A, Kamst M, et al. Experiences from 4 years of organization of an external quality assessment for Mycobacterium tuberculosis whole-genome sequencing in the European Union/European Economic Area. Microbiol Spectr. 2023;11(1):e0224422. pmid:36475728
  6. 6. Nikolayevskyy V, Niemann S, Anthony R, Van Soolingen D, Tagliani E, Ködmön C. Role and value of whole genome sequencing in studying tuberculosis transmission. Clin Microbiol Infection. 2019;25(11):1377–82.
  7. 7. Bjorn-Mortensen K, Soborg B, Koch A, Ladefoged K, Merker M, Lillebaek T. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting: a retrospective population-based study in East Greenland. Scientific Reports. 2016;6(1):33180.
  8. 8. Pérez-Lago L, Comas I, Navarro Y, González-Candelas F, Herranz M, Bouza E, et al. Whole genome sequencing analysis of intrapatient microevolution in Mycobacterium tuberculosis: potential impact on the inference of tuberculosis transmission. J Infect Dis. 2014;209(1):98–108. pmid:23945373
  9. 9. Schürch AC, Kremer K, Kiers A, Daviena O, Boeree MJ, Siezen RJ, et al. The tempo and mode of molecular evolution of Mycobacterium tuberculosis at patient-to-patient scale. Infect Genet Evol. 2010;10(1):108–14. pmid:19835997
  10. 10. Bergval I, Coll F, Schuitema A, de Ronde H, Mallard K, Pain A, et al. A proportion of mutations fixed in the genomes of in vitro selected isogenic drug-resistant Mycobacterium tuberculosis mutants can be detected as minority variants in the parent culture. FEMS Microbiol Lett. 2015;362(2):1–7. pmid:25670707
  11. 11. Abascal E, Genestet C, Valera A, Herranz M, Martinez-Lirola M, Muñoz P. Assessment of closely related Mycobacterium tuberculosis variants with different transmission success and in vitro infection dynamics. Scientific Reports. 2021;11(1):11041.
  12. 12. Specht IO, Petros BA, Moreno GK, Brock-Fisher T, Krasilnikova LA, Schifferli M. Inferring viral transmission pathways from within-host variation. medRxiv. 2023.
  13. 13. Coll F, McNerney R, Guerra-Assunção JA, Glynn JR, Perdigão J, Viveiros M, et al. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun. 2014;5:4812. pmid:25176035
  14. 14. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. pmid:19505943
  15. 15. Meehan CJ, Goig GA, Kohl TA, Verboven L, Dippenaar A, Ezewudo M, et al. Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat Rev Microbiol. 2019;17(9):533–45. pmid:31209399
  16. 16. Hazbón MH, Brimacombe M, Bobadilla del Valle M, Cavatore M, Guerrero MI, Varma-Basil M, et al. Population genetics study of isoniazid resistance mutations and evolution of multidrug-resistant Mycobacterium tuberculosis. Antimicrob Agents Chemother. 2006;50(8):2640–9. pmid:16870753
  17. 17. Hicks ND, Giffen SR, Culviner PH, Chao MC, Dulberger CL, Liu Q, et al. Mutations in dnaA and a cryptic interaction site increase drug resistance in Mycobacterium tuberculosis. PLoS Pathog. 2020;16(11):e1009063. pmid:33253310
  18. 18. Ortiz AT, Kendall M, Storey N, Hatcher J, Dunn H, Roy S. Within-host diversity improves phylogenetic and transmission reconstruction of SARS-CoV-2 outbreaks. Elife. 2023;12:e84384.
  19. 19. Phelan JE, Coll F, Bergval I, Anthony RM, Warren R, Sampson SL, et al. Recombination in pe/ppe genes contributes to genetic variation in Mycobacterium tuberculosis lineages. BMC Genomics. 2016;17:151. pmid:26923687
  20. 20. Gómez-González PJ, Campino S, Phelan JE, Clark TG. Portable sequencing of Mycobacterium tuberculosis for clinical and epidemiological applications. Brief Bioinform. 2022;23(5):bbac256. pmid:35894606
  21. 21. Vargas Jr R, Farhat MR. Antibiotic treatment and selection for glpK mutations in patients with active tuberculosis disease. Proc National Academy Sci. 2020;117(8):3910–2.
  22. 22. Walter KS, Cohen T, Mathema B, Colijn C, Sobkowiak B, Comas I. Signatures of transmission in within-host Mycobacterium tuberculosis complex variation: a retrospective genomic epidemiology study. The Lancet Microbe. 2024;6(1).
  23. 23. Wyllie D, Do T, Myers R, Nikolayevskyy V, Crook D, Peto T. M. tuberculosis microvariation is common and is associated with transmission: analysis of three years prospective universal sequencing in England. J Infection. 2022;85(1):31–9.
  24. 24. de Neeling AJ, Tagliani E, Ködmön C, van der Werf MJ, van Soolingen D, Cirillo DM. Characteristic SNPs defining the major multidrug-resistant Mycobacterium tuberculosis clusters identified by EuSeqMyTB to support routine surveillance, EU/EEA, 2017 to 2019. Eurosurveillance. 2024;29(12):2300583.