Fig 1.

Overview of the workflow used.

For each species, we selected between 3 and 8 publicly available closed whole-genome sequences as references and 20 sets of short reads from whole-genome sequencing projects. Reads were mapped to each selected reference genome of the species, and consensus sequences were obtained from the quality SNPs of each mapping. Consensus sequences from mappings to the same reference genome were added to the multiple sequence alignment (MSA) of all references of that species. For the analysis of each MSA, (a) we considered only those genome regions present in the reference used for mapping and (b) we obtained a ‘core’ MSA by removing all regions absent from any of the reference sequences. Finally, we studied the impact of reference choice on the maximum-likelihood (ML) trees inferred from each MSA, on recombination rates calculated on ‘core’ MSAs, and on dN/dS ratios calculated from coding sequences only.
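Step (b) of the workflow can be illustrated with a minimal sketch. It assumes that the MSA is held as a dictionary of equal-length aligned strings and that a region absent from a reference appears as a gap character (`-`) in that reference's row. The function names `core_columns` and `core_msa` are hypothetical, and the pipeline actually used for the figure may differ.

```python
def core_columns(msa, reference_ids, gap_chars="-"):
    """Return indices of alignment columns where no reference has a gap."""
    refs = [msa[ref_id] for ref_id in reference_ids]
    length = len(refs[0])
    return [
        i for i in range(length)
        if all(seq[i] not in gap_chars for seq in refs)
    ]


def core_msa(msa, reference_ids, gap_chars="-"):
    """Keep only the columns present in every reference sequence.

    Assumption: `msa` maps sequence names to aligned strings of equal
    length, and absent regions are encoded as gap characters.
    """
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = core_columns(msa, reference_ids, gap_chars)
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in msa.items()
    }


# Toy alignment: two references and one consensus sequence (made-up data).
toy_msa = {
    "refA": "ACGT-A",
    "refB": "AC-TTA",
    "iso1": "ACGTNA",
}
toy_core = core_msa(toy_msa, ["refA", "refB"])
```

In this toy case, columns 2 and 4 are dropped because one of the two references has a gap there, so every sequence is reduced to the four columns shared by both references.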

Fig 2.

Distribution of proportion of mapped reads depending on reference choice.

Fig 3.

Distribution of coverage of the reference genome depending on reference choice.

Table 1.

Proportion of significant (P<0.05) comparisons depending on reference choice.

Fig 4.

Distribution of the average depth depending on reference choice.

Fig 5.

Distribution of the number of SNPs depending on reference choice.

Fig 6.

Comparison of Robinson-Foulds (RF) and matching clusters (MC) normalized distances calculated between trees from the same species.
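As a rough illustration of the RF distance named in this caption, the sketch below computes a normalized Robinson-Foulds distance between two trees on the same leaf set. It assumes that trees are given as nested tuples of leaf names and are treated as unrooted, and it normalizes by 2(n − 3), the maximum for fully resolved unrooted trees with n leaves. The helper names are hypothetical, the normalization used for the figure may differ, and the matching clusters (MC) distance is not covered.

```python
def leaf_set(tree):
    """Collect the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that does NOT contain an arbitrary
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - leaves if anchor in leaves else leaves
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return leaves

    visit(tree)
    return splits


def normalized_rf(tree_a, tree_b):
    """Normalized RF distance: splits found in only one tree / 2(n - 3)."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    max_distance = 2 * (len(leaves) - 3)
    if max_distance <= 0:
        return 0.0
    return len(bipartitions(tree_a) ^ bipartitions(tree_b)) / max_distance


# Two toy five-leaf trees that differ in the placement of B and C.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
distance = normalized_rf(tree_1, tree_2)
```

The two toy trees share the split separating D and E from the rest and differ in one split each, which gives a normalized distance of 0.5.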

Table 2.

Descriptive statistics of topological distances per species.

Fig 7.

Comparison of RF distances against ANI calculated between the reference genomes selected for each species.

Table 3.

Congruent comparisons according to ELW test.

All the other pairwise comparisons were not congruent (P<0.05).

Fig 8.

Impact of reference choice on phylogenetic trees of L. pneumophila.

ML trees included the selected reference sequences of L. pneumophila and the consensus sequences obtained from mappings against strains (A) Philadelphia 1, (B) Paris, (C) Alcoy and (D) Lansing 3. Clusters of isolates related to references Paris (red) and Alcoy (blue) are coloured in the first three phylogenies. Isolates 28HGV and 91HGV (highlighted in yellow) were placed in different clades in the trees when using references Paris and Alcoy. The clade of references obtained when using Lansing 3 as the reference genome is coloured in red.

Fig 9.

Impact of reference choice on phylogenetic trees of K. pneumoniae.

ML trees included the selected reference sequences from K. pneumoniae and the consensus sequences obtained from mappings against strains (A) HS11286 and (B) NTUH-K2044. Isolates HGV2C-06 and HCV1-10 (yellow) changed their placement depending on reference choice.

Fig 10.

Impact of reference choice on phylogenetic trees of P. aeruginosa.

ML trees included the selected reference sequences of P. aeruginosa and the consensus sequences obtained from mappings against strains (A) M18 and (B) 12939. The phylogenetic relationships of reference M18 and isolate P5M1 (yellow) changed depending on reference choice.

Fig 11.

Impact of reference choice on phylogenetic trees of S. marcescens.

ML trees included the selected reference sequences from S. marcescens and the consensus sequences calculated from alignments against strains (A) UMH9 and (B) WW4. Outbreak clade is shown in red.

Fig 12.

Distribution of recombination rates depending on reference choice for ‘core’ MSAs including sequences from N. gonorrhoeae.

Fig 13.

Distribution of dN/dS depending on reference choice.
