Fig 1.
TPE strains of NHP origin from Tanzania chosen for whole-genome sequencing.
The figure is adapted from Chuma et al. [15]. The original figure (https://www.nature.com/articles/s41598-019-50779-9/figures/1) was licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Median-joining network based on a 1,773-bp concatemer of the TP0488 and TP0548 loci from 57 Tanzanian NHP samples. The number of nucleotide differences, when >1, is shown next to the branches. Inferred allelic variants (median vectors) are shown as small black circles. Contiguous indels were counted as single events. The number of individual sequences per variant, when >1, is shown inside the circle and reflected by circle size. Colors indicate origin: Issa Valley (blue, n = 3), Tarangire National Park (NP) (orange, n = 1), Ruaha NP (brown, n = 4), Lake Manyara NP (red, n = 34), Serengeti NP (grey, n = 7), Ngorongoro Conservation Area (green, n = 5), and Gombe NP (violet, n = 3). Shapes indicate species: Papio anubis (circle, n = 46), Papio cynocephalus (square, n = 5), Chlorocebus pygerythrus (triangle, n = 5), and Cercopithecus mitis (hexagon, n = 1). Sequenced samples are marked by black arrows; one arrow marks two strains (49F8190407 and 22LMF5290815).
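The caption states that contiguous indels were counted as single events. The Python sketch below illustrates that counting rule for two aligned sequences. It is a hypothetical helper, not the software used to build the network. The gap symbol `-` is an assumption, and so is the treatment of adjacent gaps that fall in different sequences, which are counted here as separate events.

```python
def count_differences(a: str, b: str, gap: str = "-") -> int:
    """Count differences between two aligned sequences (hypothetical helper).

    Substitutions count once per column; a run of contiguous indel columns
    with the gap on the same sequence counts as a single event (assumption
    about how 'contiguous' is applied).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    indel_side = None  # which sequence carries the open gap run: "a", "b", or None
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # shared gap column: not a difference, does not break a run
        if x == y:
            indel_side = None  # identical bases close any open indel run
            continue
        if x == gap or y == gap:
            side = "a" if x == gap else "b"
            if side != indel_side:
                diffs += 1  # start of a new indel event
            indel_side = side
        else:
            diffs += 1  # substitution
            indel_side = None
    return diffs

print(count_differences("ACGTACGT", "AC---CGA"))  # 2: one 3-column indel + one substitution
```

Under this rule a three-base deletion contributes one difference rather than three, so a single indel does not lengthen network branches out of proportion.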
Table 1.
Genome regions in the assembled genomes that were checked and filled using Sanger sequencing.
Table 2.
Primers used for arp sequencing.
Table 3.
Treponema pallidum subspecies pertenue (TPE) strains analyzed in this study.
Fig 2.
Comparison of human and NHP strains.
Samples of human and NHP origin are marked in brown and yellow, respectively. (A) Comparison of genomic features between TPE of human and NHP origin. Only genomic regions with shared differences detected in two or more genomes are shown. For more details, see S1 Data–Comparison of genomic features. Cells of the same color indicate the same sequence variant. Changes present in only one genome are listed in S2 Table. (B) Differences in the length of homopolymeric tracts among the analyzed genomes. Coordinates correspond to the LMNP-1 reference genome. The length of each homopolymeric tract was determined from the dominant variant among the sequencing reads covering that homopolymer. Only tracts that differ in length in at least one genome are shown; the corresponding tract lengths in the SS14 genome are shown for comparison. The maximum and minimum lengths are shown in red and yellow, respectively, and intermediate values in orange. All length values are listed at the bottom of the figure. Exact tract lengths in each genome for homopolymers with three or more length variants are shown in S1 Fig. *Homopolymers containing a nucleotide substitution: 1T4 = CTC CCC, 3T2 = CCC TCC, 1A10 = CAC CCC CCC CCC, 2T9 = CCT CCC CCC CCC. (C) Pseudogenes identified in the analyzed TPE strains/isolates. Pseudogenes are highlighted in red; functional genes are shown in yellow.
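Panel B relies on two rules: a tract's length is taken from the dominant read variant, and colors mark the maximum, minimum, and intermediate lengths. The Python sketch below illustrates both rules. The function names, genome labels, and read lengths are hypothetical, and the tie-breaking behavior (the first length encountered wins) is an assumption, not a documented step of the analysis.

```python
from collections import Counter

def dominant_tract_length(read_lengths):
    """Most frequent homopolymer length observed across reads (hypothetical helper).

    Ties resolve to the length encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not read_lengths:
        raise ValueError("no reads cover this homopolymer")
    length, _count = Counter(read_lengths).most_common(1)[0]
    return length

def color_code(lengths_by_genome):
    """Map each genome's tract length to the panel B color scheme."""
    longest = max(lengths_by_genome.values())
    shortest = min(lengths_by_genome.values())
    colors = {}
    for genome, n in lengths_by_genome.items():
        if n == longest:
            colors[genome] = "red"      # maximum length
        elif n == shortest:
            colors[genome] = "yellow"   # minimum length
        else:
            colors[genome] = "orange"   # anything in between
    return colors

# Hypothetical read-level lengths for one tract in one genome
print(dominant_tract_length([9, 10, 10, 11, 10, 9]))  # 10
print(color_code({"G1": 9, "G2": 10, "G3": 11}))
# {'G1': 'yellow', 'G2': 'orange', 'G3': 'red'}
```

If every genome had the same length, all of them would be labeled red. That case does not arise in the panel, which shows only tracts that differ in at least one genome.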
Fig 3.
Comparison of the number of repeats in TP0304, TP0433 (arp), TP0470, TP0967, and IGR TP0488-9 between TPE strains of human and NHP origin.
For TP0433 and TP0470, whose data were normally distributed, the mean numbers of repeats (red lines) were 6.25 and 12.5 (TP0433) and 25.8 and 29.5 (TP0470) for TPE isolates from humans and NHPs, respectively. For TP0304, TP0967, and IGR TP0488-9, whose data were not normally distributed, the median values (red lines) were 1 and 1.5 (TP0304), 2 and 3 (TP0967), and 3 and 3 (IGR TP0488-9) for TPE isolates from humans and NHPs, respectively. The differences for TP0470 and IGR TP0488-9 were not statistically significant (p > 0.05). The p-values are shown above the column scatter plot for each locus.
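The caption summarizes normally distributed loci by their mean and all other loci by their median. The Python sketch below illustrates that convention. The caption does not say how normality was assessed or which significance tests produced the p-values, so the sketch leaves the normality decision to the caller and does not compute p-values. The function name and the input values are hypothetical, not the study's data.

```python
import statistics

def group_summary(human, nhp, normally_distributed):
    """Summarize repeat counts per host group (hypothetical helper).

    Mean for normally distributed data, median otherwise. The normality
    decision is supplied by the caller (assumption: it comes from a
    separate test not shown here).
    """
    if normally_distributed:
        return "mean", statistics.mean(human), statistics.mean(nhp)
    return "median", statistics.median(human), statistics.median(nhp)

# Hypothetical repeat counts, not the study's data
print(group_summary([1, 1, 2, 1], [1, 2, 2, 1], normally_distributed=False))
# ('median', 1.0, 1.5)
```

With an even number of isolates, `statistics.median` averages the two middle values. This is how a median such as 1.5 can arise from integer repeat counts.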
Fig 4.
Sequence analysis of tprC, D, F, and I genes.
A. Full-length tprC, D, F, and I genes with the seven variable sections (a–g) shown in grey. Each section was defined as a DNA region occurring in two or more sequence versions that differ at three or more positions. Black lines mark the single nucleotide variants (SNVs) at the start and end of each section. SNVs that do not follow the section pattern, whether within or outside sections, are shown as red lines. B. Modular structure of the tprC, D, F, and I genes. White and grey indicate the two versions of each section; dark grey represents the tprD2 allele. Different versions of the sections combine into different patterns of the tprC, D, F, and I genes in different TPE isolates. Overall, the tprF and tprI genes of TPE isolated from NHPs combine section versions more variably than those of TPE of human origin. The Fribourg-Blanc isolate, CDC-2, Samoa D, and Kampung Dalan K363 have identical modular structures of tprC, D, F, and I. C. Coordinates of the tprC, D, F, and I genes relative to the LMNP-1 reference. D. Section coordinates, given as tprC gene coordinates in the LMNP-1 reference, together with the number of SNVs that differentiate the sections and the number of additional SNVs.
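The section definition in panel A can be read as a pairwise rule on aligned sequence versions. The Python sketch below illustrates that reading. It assumes gap-free aligned versions of equal length and counts substitutions only. The threshold of three comes from the caption, while the helper names and example sequences are hypothetical.

```python
from itertools import combinations

def snv_positions(a, b):
    """0-based positions at which two aligned sequence versions differ."""
    if len(a) != len(b):
        raise ValueError("versions must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def is_section(versions, min_snvs=3):
    """True if at least two distinct versions differ at >= min_snvs positions."""
    distinct = sorted(set(versions))
    return any(len(snv_positions(a, b)) >= min_snvs
               for a, b in combinations(distinct, 2))

def section_bounds(a, b):
    """First and last differing positions (the black lines in panel A)."""
    pos = snv_positions(a, b)
    return (pos[0], pos[-1]) if pos else None

region = ["ACGTACGT", "ACCTAGGA", "ACGTACGT"]  # hypothetical versions of one region
print(is_section(region))                      # True (3 differing positions)
print(section_bounds("ACGTACGT", "ACCTAGGA"))  # (2, 7)
```

Identical versions are collapsed before comparison, so the rule depends on how many distinct versions exist and not on how many isolates carry each one.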
Fig 5.
Phylogenetic analysis of TPE genomes of human and NHP origin.
Phylogenetic relatedness of TPE genomes inferred from complete (left tree) and draft (right tree) genomes. Draft genomes from other studies were included [11–13,46–48]; only draft genomes with more than 83.7% coverage were used for tree construction. TPA strains SS14 and Nichols were used as outgroups. For more details, see S1 Data–Overview of strains (phylogeny). The evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model [45]. The percentage of bootstrap replicate trees in which the associated taxa clustered together is shown next to the branches. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps or missing data were removed. The final datasets comprised 1,092,985 and 651,557 positions for the trees of 19 and 80 nucleotide sequences, respectively, including 1,561 and 4,311 variable sites. The color legend next to the trees indicates the geographical origin of the samples; human and NHP origin is shown in brown and yellow, respectively. TPE genomes cluster predominantly according to geographical origin.
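The caption describes three steps: filtering draft genomes by coverage, removing every position with gaps or missing data (complete deletion), and counting variable sites. The Python sketch below illustrates them on a toy alignment. It is not the tree-building software used in the study. The set of missing-data symbols (uppercase `N`, `?`, and `-`) is an assumption, and the helper names and example data are hypothetical.

```python
MISSING = set("-N?")  # assumed gap/missing-data symbols (uppercase only)

def filter_by_coverage(coverage_pct, threshold=83.7):
    """Keep draft genomes whose coverage strictly exceeds the threshold (percent)."""
    return [name for name, pct in coverage_pct.items() if pct > threshold]

def complete_deletion(alignment):
    """Remove every column in which any sequence has a gap or missing base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    keep = [i for i in range(length) if all(s[i] not in MISSING for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}

def count_variable_sites(alignment):
    """Number of columns with more than one distinct nucleotide."""
    return sum(1 for column in zip(*alignment.values()) if len(set(column)) > 1)

aln = {"a": "ACGT-A", "b": "ACNTTA", "c": "TCGTTA"}  # toy alignment
clean = complete_deletion(aln)
print(clean)                                         # {'a': 'ACTA', 'b': 'ACTA', 'c': 'TCTA'}
print(len(clean["a"]), count_variable_sites(clean))  # 4 1
```

Complete deletion drops a column if any single sequence lacks data there. This helps explain why the 80-sequence draft-genome dataset retains fewer positions than the 19-sequence complete-genome dataset.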