Fig 1.
Matrix of M. tuberculosis variants associated with the outbreak.
Locations of all SNPs and indels found in the 22 isolates are shown in the colour-coded matrix. Deletions, or the absence of an insertion, are indicated with a single dash (-). Genome positions of variants are given relative to the reference strain H37Rv. Variants are colour-coded by their differences from H37Rv: all variants that first appear in a given isolate share one colour, and that colour is retained when the same variants appear in subsequent isolates, to help visualise patterns of SNP accumulation.
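The colour-coding rule can be expressed as a small sketch. It assumes each isolate is represented as a set of variant labels and that isolates are processed in the order they appear in the matrix. The function name, the variant label format and the isolate names are hypothetical. The returned isolate label stands in for a colour: every variant is tagged with the first isolate in which it was seen.

```python
from typing import AbstractSet, Dict, List


def colour_by_first_appearance(
    isolate_variants: Dict[str, AbstractSet[str]], order: List[str]
) -> Dict[str, str]:
    """Map each variant to the isolate in which it first appears.

    The isolate label acts as the colour: all variants first seen in the
    same isolate share it, and the label is kept when those variants recur
    in later isolates (hypothetical helper, not the authors' code).
    """
    first_seen: Dict[str, str] = {}
    for isolate in order:
        for variant in sorted(isolate_variants[isolate]):
            # setdefault keeps the earliest isolate and ignores recurrences
            first_seen.setdefault(variant, isolate)
    return first_seen


# Hypothetical isolates that accumulate variants over time
example = {
    "A": {"1001:G>A"},
    "B": {"1001:G>A", "2002:C>T"},
    "C": {"1001:G>A", "2002:C>T", "3003:ins"},
}
colours = colour_by_first_appearance(example, ["A", "B", "C"])
```

In this example, `1001:G>A` keeps isolate A's colour wherever it recurs, and `2002:C>T` is coloured by isolate B.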
Fig 2.
Unrooted maximum parsimony tree of variants.
The tree depicts the relative genetic distances between cluster isolates, estimated from maximum parsimony analyses performed on concatenated variants. Isolates that were genetically indistinguishable based on variant analyses are grouped together. Branch lengths are proportional to the number of variants separating isolates; individual SNPs, insertions, and deletions are represented by black, yellow, and red dots, respectively.
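The two quantities the tree displays can be illustrated with a short sketch: grouping of isolates that carry identical variant sets, and the number of variants separating each pair of isolates. The pairwise count is taken as the size of the symmetric difference of the two variant sets. This is not a maximum parsimony tree search. It only shows the variant counts that parsimony branch lengths reflect when there is no homoplasy. Isolate names and variant labels are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, Dict, FrozenSet, List, Tuple


def group_indistinguishable(
    isolate_variants: Dict[str, AbstractSet[str]]
) -> List[List[str]]:
    """Group isolates whose variant sets are identical."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for isolate, variants in isolate_variants.items():
        groups.setdefault(frozenset(variants), []).append(isolate)
    return sorted(sorted(members) for members in groups.values())


def pairwise_variant_distances(
    isolate_variants: Dict[str, AbstractSet[str]]
) -> Dict[Tuple[str, str], int]:
    """Count variants present in exactly one isolate of each pair."""
    return {
        (a, b): len(set(isolate_variants[a]) ^ set(isolate_variants[b]))
        for a, b in combinations(sorted(isolate_variants), 2)
    }


# Hypothetical isolates: c01 and c02 are indistinguishable
isolates = {
    "c01": {"snp1"},
    "c02": {"snp1"},
    "c03": {"snp1", "snp2", "del1"},
}
groups = group_indistinguishable(isolates)
distances = pairwise_variant_distances(isolates)
```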
Fig 3.
Transmission pathways derived from unrooted maximum parsimony tree.
Each circle (or node) represents a sequenced isolate. Nodes are positioned according to the year the original specimen was collected. Dashed lines connect nodes that are indistinguishable based on variant analyses. Solid lines indicate at least one observed variant between two nodes. Putative transmission events are indicated by arrows based on: (a) variant analyses and assumptions of no homoplasy and no introductions after 2003; and (b) variant analyses, no homoplasy, no introductions after 2003 and further epidemiological assumptions. The further epidemiological assumptions applied are (i) chronological transmission; (ii) transmission could not occur between cases that were diagnosed within six months of each other; and (iii) secondary cases arose within three years of exposure to a possible source case. The application of these assumptions indicated that at least two unidentified cases would have been required to sustain cluster transmission ("Missing Case(s)" boxes). However, if, for example, the insertion found in the c15 library had arisen after transmission, then even with these assumptions no missing cases would be required after 2003.
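Applying the epidemiological assumptions (i) to (iii) to candidate links derived from the variant analyses can be sketched as a filter. This sketch makes several assumptions of its own. Exposure time is unknown, so diagnosis dates are used as a proxy. Six months is approximated as 183 days and three years as 3 × 365 days. Any case diagnosed after 2003 that is left with no plausible source is flagged as needing a missing case, which reflects the "no introductions after 2003" assumption. The case names, dates and helper names are all hypothetical.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

MIN_GAP_DAYS = 183        # ~6 months (assumed approximation)
MAX_GAP_DAYS = 3 * 365    # ~3 years (assumed approximation)
NO_INTRODUCTIONS_AFTER = date(2003, 12, 31)


def plausible_links(
    diagnosis: Dict[str, date], candidates: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep (source, recipient) links that satisfy assumptions (i)-(iii)."""
    kept = []
    for source, recipient in candidates:
        gap = (diagnosis[recipient] - diagnosis[source]).days
        # gap >= MIN_GAP_DAYS enforces (i) chronology and (ii) the 6-month rule;
        # gap <= MAX_GAP_DAYS approximates (iii) the 3-year window
        if MIN_GAP_DAYS <= gap <= MAX_GAP_DAYS:
            kept.append((source, recipient))
    return kept


def needs_missing_case(
    diagnosis: Dict[str, date], kept: List[Tuple[str, str]]
) -> Set[str]:
    """Cases after the introduction cutoff with no remaining plausible source."""
    recipients = {recipient for _, recipient in kept}
    return {
        case for case, when in diagnosis.items()
        if when > NO_INTRODUCTIONS_AFTER and case not in recipients
    }


# Hypothetical cases and candidate links from variant analyses
dx = {
    "A": date(2001, 1, 1),
    "B": date(2001, 3, 1),
    "C": date(2003, 6, 1),
    "D": date(2008, 1, 1),
}
links = plausible_links(dx, [("A", "B"), ("A", "C"), ("C", "A"), ("C", "D")])
missing = needs_missing_case(dx, links)
```

Here A→B is rejected (59 days apart), C→A violates chronology and C→D exceeds about three years. D is therefore flagged as requiring an unidentified source.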
Fig 4.
Low-frequency variant detection.
LoFreq was used to detect SNPs present at frequencies ≥ 10%. The top row indicates the SNP position in the reference genome H37Rv, the second row shows the nucleotide present in H37Rv, and the third row shows the variant nucleotide identified by LoFreq. The matrix indicates the frequencies of the SNPs detected in the isolates shown.
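The 10% frequency cut-off can be sketched as a filter over VCF records. This sketch assumes that the allele frequency is reported as an `AF=` key in the INFO column, which is the form LoFreq's VCF output typically uses. The records below are hypothetical and are not taken from the study, and the parser handles only the minimal case.

```python
from typing import List, Tuple

MIN_FREQUENCY = 0.10  # the >= 10% threshold from the caption


def filter_by_frequency(
    vcf_lines: List[str], min_af: float = MIN_FREQUENCY
) -> List[Tuple[int, str, str, float]]:
    """Return (position, ref, alt, AF) for records with AF >= min_af."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt, info = int(fields[1]), fields[3], fields[4], fields[7]
        info_items = dict(
            item.split("=", 1) for item in info.split(";") if "=" in item
        )
        af = float(info_items.get("AF", "0"))
        if af >= min_af:
            kept.append((pos, ref, alt, af))
    return kept


# Hypothetical records in VCF column order: CHROM POS ID REF ALT QUAL FILTER INFO
records = [
    "##fileformat=VCFv4.0",
    "NC_000962\t1000\t.\tC\tT\t100\tPASS\tDP=200;AF=0.35",
    "NC_000962\t2000\t.\tG\tA\t50\tPASS\tDP=180;AF=0.05",
]
low_freq = filter_by_frequency(records)
```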
Fig 5.
Bayesian inference tree from multiple sequence alignment of de novo-assembled cluster genomes and reference genomes.
A multiple sequence alignment of the H37Rv genome with repetitive elements censored (NC_000962.RRE), eight other lineage 4 reference genomes and the de novo-assembled cluster libraries was used to generate Bayesian inference trees using BEAST. A consensus tree generated using relaxed clocks and the coalescent skyline population model is shown. Branch labels give the proportion of sampled trees in which each subclade appears, and substitutions per site appear on the y-axis. Subclades that appeared in fewer than half of the sampled trees are not shown. The SNPs that determine the characteristics of this tree are a subset of the variants shown in Fig 1.
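The branch labels and the "fewer than half of the sampled trees" rule can be illustrated by counting how often each subclade occurs across sampled trees. This sketch assumes each sampled tree has already been reduced to a set of clades, where each clade is a frozenset of taxon names. It does not parse BEAST output, and it is not a description of how TreeAnnotator builds its summary tree. Taxon names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(
    sampled_trees: List[Set[Clade]], min_support: float = 0.5
) -> Dict[Clade, float]:
    """Fraction of sampled trees containing each clade, keeping those >= min_support."""
    counts: Counter = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))  # count each clade at most once per tree
    n = len(sampled_trees)
    return {
        clade: count / n
        for clade, count in counts.items()
        if count / n >= min_support
    }


ab, ac, bc = frozenset("ab"), frozenset("ac"), frozenset("bc")
abc = frozenset("abc")
# Hypothetical posterior sample of four trees over taxa a, b, c
trees = [{ab, abc}, {ab, abc}, {ac, abc}, {bc, abc}]
support = clade_support(trees)
```

Clades `ac` and `bc` each occur in only one of four trees, so they fall below the one-half threshold and are dropped.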