Complete genome sequences of two strains of Treponema pallidum subsp. pertenue from Ghana, Africa: Identical genome sequences in samples isolated more than 7 years apart

Background Treponema pallidum subsp. pertenue (TPE) is the causative agent of yaws, a multi-stage disease, endemic in tropical regions of Africa, Asia, Oceania, and South America. To date, four TPE strains have been completely sequenced including three TPE strains of human origin (Samoa D, CDC-2, and Gauthier) and one TPE strain (Fribourg-Blanc) isolated from a baboon. All TPE strains are highly similar to T. pallidum subsp. pallidum (TPA) strains. The mutation rate in syphilis and related treponemes has not been experimentally determined yet. Methodology/Principal findings Complete genomes of two TPE strains, CDC 2575 and Ghana-051, that infected patients in Ghana and were isolated in 1980 and 1988, respectively, were sequenced and analyzed. Both strains had identical consensus genome nucleotide sequences raising the question whether TPE CDC 2575 and Ghana-051 represent two different strains. Several lines of evidence support the fact that both strains represent independent samples including regions showing intrastrain heterogeneity (13 and 5 intrastrain heterogeneous sites in TPE Ghana-051 and TPE CDC 2575, respectively). Four of these heterogeneous sites were found in both genomes but the frequency of alternative alleles differed. The identical consensus genome sequences were used to estimate the upper limit of the yaws treponeme evolution rate, which was 4.1 x 10−10 nucleotide changes per site per generation. Conclusions/Significance The estimated upper limit for the mutation rate of TPE was slightly lower than the mutation rate of E. coli, which was determined during a long-term experiment. Given the known diversity between TPA and TPE genomes and the assumption that both TPA and TPE have a similar mutation rate, the most recent common ancestor of syphilis and yaws treponemes appears to be more than ten thousand years old and likely even older.

Introduction Treponema pallidum subsp. pertenue (TPE) is the causative agent of yaws, a multi-stage disease transmitted through direct skin contact between children or young adults; it is characterized by skin nodules and ulcerations and later accompanied by joint, soft tissue, and bone affections (reviewed in [1]).
To date, four TPE strains have been completely sequenced including three TPE strains of human origin (Samoa D, CDC-2, and Gauthier) [2] and one TPE strain (Fribourg-Blanc) isolated from a baboon (Papio papio) in West Africa [3]. Compared to syphilis-causing strains of T. pallidum subsp. pallidum (TPA), the observed genetic differences between TPA and TPE represent less than 0.2% of the total genome sequence length [2]. In addition, TPE strains have been shown to be closely related to the T. pallidum subsp. endemicum (TEN) strain Bosnia A [4].
The origin of syphilis and other treponematoses (i.e., diseases caused by TPE and TEN) remain enigmatic and historically there have been several hypotheses about the origin of these diseases (reviewed in [5]). One of the limitations was the fact that the mutation rate in syphilis and related treponemes was unknown, although estimates have been published based on either paleopathological findings [6] or on phylogenetic analyses of TPA strains/isolates with known isolation dates [7].
In this communication, we compared the complete genome sequences of two strains of TPE which came from infected patients living in Ghana in 1980 and 1988. While there were more than 7 years between isolation dates of the two treponemal strains, the assembled consensus genome sequences of both TPE strains were identical, although both strains exhibited intrastrain heterogeneity at a limited number of nucleotide positions. These data were used to estimate the upper limit of the yaws treponeme evolution rate, as well as evaluate the possible evolutionary implications.

Ethics statement
No vertebrate animals were used in the study. The study was approved by the ethics committee of the Faculty of Medicine, Masaryk University. The research was conducted in accordance with the Czech law standards.

Strains used in this study
Two TPE strains (TPE Ghana-051 and CDC 2575) were used in this study. In 1980, TPE strain CDC 2575 was isolated in Akorabo, Ghana, from a 6-year-old female patient from a papillomatous lesion and inoculated into hamsters [8], and later propagated in New Zealand White rabbits [9]. TPE strain Ghana-051 was isolated in 1988 from a papillomatous lesion of 9-yearold girl who was infected at an unspecified location in Ghana six months prior to admission to Sophia Children's Hospital in Rotterdam, The Netherlands [10]. An extract from a biopsy of this patient in secondary yaws stage was inoculated into New Zealand White rabbits in the Netherlands [10]. DNA from the TPE strains CDC 2575 was isolated from rabbit testes lysates containing 10 7 treponemes per ml. TPE Ghana-051 was provided as a DNA isolate that was previously isolated directly from rabbit testes containing 10 9 treponemes per ml. Both strains came from the laboratory strain collection of Dr. Gerda Noordhoek.

Amplification of TPE genomic DNA
The DNA of both TPE strains was isolated from treponemes obtained during experimental infection of New Zealand White rabbits. The genomic DNA was amplified using the multiple displacement amplification approach (REPLI-g kit, QIAGEN, Valencia, CA, USA) according to the manufacturer's instructions and then used as a template (50x diluted) for pooled segment genome sequencing (PSGS) as described previously [2][3][4]11]. Briefly, the TPE DNA was amplified with PrimeSTAR GXL DNA Polymerase (Takara Bio Inc., Otsu, Japan) with 278 pairs of specific primers to obtain overlapping PCR products (S1 Table) covering the entire genome in both isolates. PCR products were amplified using touchdown PCR at the following cycling conditions: initial denaturation at 94˚C for 1 min; 8 cycles: 98˚C for 10 s, 68˚C for 15 s (annealing temperature gradually reduced by 1˚C/every cycle), and 68˚C for 6 min; 35 cycles: 98˚C for 10 s, 61˚C for 15 s, and 68˚C for 6 min (43 cycles in total); followed by the final extension at 68˚C for 7 min. PCR products were subsequently purified using a QIAquick PCR Purification Kit (QIAGEN, Valencia, CA, USA) and mixed in equimolar amounts into four distinct pools (S1 Table). Prior to Illumina sequencing, the PCR products from each pool were labeled with multiplex identifier (MID) adapters and sequenced as four different samples.

DNA sequencing and de novo assembly of the TPE genomes
Illumina Nextera XT library preparation and sequencing on a NextSeq 500 (2x150 bp) was performed at CEITEC (Brno, Czech Republic). The results were: (1) for CDC 2575, 15,629,124 paired reads, 4,688,737,200 total bases, with an average coverage depth of 1028x, and (2) for Ghana-051, 13,194,731 paired reads, 3,958,419,300 total bases, with an average coverage depth of 868x. The sequencing results are summarized in S2 Table. Resultant data were pre-processed with Trimmomatic (0.32) [12]. Low quality bases were removed with a sliding window having a length of 4, with an average quality of at least Phred = 17. After pre-processing, sequencing reads shorter than 50 bp were removed.
The Illumina sequencing reads were handled separately with respect to the 4 distinct pools and were separately assembled de novo using SeqMan NGen v4.1.0 software (DNASTAR, Madison, WI, USA). A total of 470, 783, 607, and 667 contigs for Ghana-051 and 72, 348, 57, and 215 contigs for TPE CDC 2575 were obtained for Pools 1-4, respectively. Alternatively, all Illumina sequencing reads were mapped to the TPE Samoa D genome (GenBank Acc. No. CP002374.1). All genome gaps and discrepancies were resolved by location-specific PCR products that were Sanger sequenced. Altogether, 26 and 23 regions of the Ghana-051 and CDC 2575 genome were Sanger sequenced, respectively. The individual overlapping pool sequences were joined together, resulting in the complete genome sequence of both isolates.

Identification of genetic heterogeneity within TPE genomes
The individual Illumina reads were mapped to the final version of the corresponding complete genome sequences using SeqMan NGen v4.1.0 software (DNASTAR, Madison, WI, USA) with default parameters and requiring at least a 93% read identity relative to the reference genome. To determine the frequency of each nucleotide (allele frequency) in every single genome position, the haploid bayesian method was used for SNP calculation with default parameters using SeqMan NGen v4.1.0 software (DNASTAR, Madison, WI, USA). Nucleotide positions located within homopolymeric tracts (defined as a stretch of 6 or more identical nucleotides) were omitted from the analysis. Chromosomal loci showing genetic heterogeneity within TPE genomes were defined as those containing more than 8% alternative reads in regions having coverage greater than 100x. The resulting candidate sites for heterogeneous nucleotide positions were subsequently visually inspected using SeqMan NGen v4.1.0 software (DNASTAR, Madison, WI, USA).

Gene identification, annotation, and classification
Geneious software v5.6.5 [13] was used for gene annotations, as described previously [4]. Genes were tagged with TPEGH051_ and TPECDC2575_ prefixes with the locus tag numbering corresponding to the tag numbering of the orthologous genes annotated in the TPE CDC-2 genome ([2]; GenBank Acc. No. CP002375.1). As in other TPE genomes, tprK showed intrastrain variability [14] and the corresponding nucleotide positions were denoted as "N" in the complete genome sequences. For proteins with unpredicted functions, a gene size limit of 150 bp was applied.

Sequencing of rabbit DNA
Trimmed sequencing reads that passed quality threshold for both TPE Ghana-051 and CDC 2575 were mapped to the rabbit reference chromosome 1 (CM000790.1) using BWA-MEM (version 0.7.5a; [16]) software with default parameters. Altogether, sequencing of Ghana-051 and CDC 2575 resulted in 2,150,741 and 1,731,654 reads that aligned to the rabbit DNA, respectively. Single nucleotide variants in the rabbit chromosome-aligned sequences were detected using SAMtools (version 0.1.19; [17]) and VarScan (version v2.4.2; [18]) software by setting a minimum read depth of 8, strand filter on, and a mutant frequency of 100%. Altogether, 21 positions in the rabbit DNA were found to be heterogeneous between the Ghana-051 and CDC 2575 samples.
A region differing in 7 nucleotide positions and spanning approximately 1 kb was selected and analyzed in additional TPE samples containing rabbit DNA with one pair of primers including 5'-AAAGCCCCTTTGCCTAGTCC-3' (positions 53848016-53848035 in the genome of Oryctolagus cuniculus (rabbit) according to reference sequence CM000790.1) and 5'-CCTG CGGCCCCATTATTGTA-3' (positions 53849015-53848996 in CM000790.1). The reaction mixture contained 2.5 μl of ThermoPol Reaction Buffer, 0.5 μl of 10 mM deoxynucleoside triphosphate (dNTP) mixture, 0.25 μl of each primer (100 pmol/μl), 2 μl of the DNA-containing sample and 0.1 μl of Taq DNA polymerase (5,000 U/ml; New England BioLabs, Ipswich, MA, USA). The reaction mixture was supplemented with PCR-grade water to a final volume of 25 μl. The DNA was amplified under the following cycling conditions: denaturation at 94˚C for 2 min; 35 cycles: 94˚C for 15 s, 57.5˚C for 15 s, 68˚C for 1.25 min; followed by the final extension at 68˚C for 2 min. The PCR products were purified using a QIAquick PCR Purification Kit (QIAGEN, Valencia, CA, USA) and sequenced using the Sanger method (GATC Biotech, Germany).

Nucleotide sequence accession numbers
The complete genome sequences of the TPE Ghana-051 and CDC 2575 strains were deposited in the GenBank under accession number CP020365 and CP020366, respectively. The sequenced chromosomal rabbit DNA from TPE Ghana-051, CDC 2575, CDC-2, and Sei Geringging samples were deposited in GenBank under accession numbers KY972336, KY972337, KY972338, and KY972339, respectively.

Isolation of TPE strains
Both TPE strains (CDC 2575 and Ghana-051) used in this study were isolated from papillomatous lesions of young patients of comparable age (6 and 9 years old, respectively), and both patients were infected in Ghana, West Africa. While TPE strain CDC 2575 was isolated in Akorabo, Ghana, in 1980 [8,9], TPE strain Ghana-051 was isolated in 1988 after its introduction, via an infected patient, to the Netherlands [10]. The time and place of the isolations, clinical data, as well as the species of experimentally infected animals and the known number of passages, are shown in Fig 1. While Liska et al. [8] is provided as an original reference for TPE CDC 2575, this strain is not specifically mentioned in this study. According to the lab record from Dr. Gerda Noordhoek, CDC 2575 was provided by Dr. Peter Perine from CDC, one of the coauthors of Liska et al. [8] study, and strain details referred to Liska et al. [8]. Moreover, clinical data from Dr. Noordhoek´s notebook are in accordance with the description of the patient no. 1 in the Liska et al. [8] study (see Table 1 in Liska et al. [8]). Therefore, CDC 2575 represents one of the three successfully hamster propagated samples in the Liska et al. [8] study. The other two TPE strains, named as CDC-1 and CDC-2, were successfully transferred to rabbits and the third isolate (likely CDC 2575) was lost [8]. It is not clear if CDC 2575 represents the lost and perhaps again found sample. Nevertheless, the mislabeling or swapping of CDC 2575 with CDC-2 or CDC-1 can be excluded since all these strains differ in the genomic sequence [2].
In the original article by Engelkens et al. [10], strain Ghana-051 is also not specifically mentioned. However, according to the lab record of Dr. Noordhoek containing clinical data and reference, TPE Ghana-051 is the same strain as described by Engelkens et al. [10].

Whole genome sequencing of TPE Ghana-051 and TPE CDC 2575
Both TPE Ghana-051 and TPE CDC 2575 genomes were sequenced using the previously described PSGS approach, originally described by Weinstock et al. [11] and later modified by others [2][3][4]. The average coverage of Illumina reads reached 868x and 1028x for TPE Ghana-051 and TPE CDC 2575 genome, respectively. The two sequencing projects were performed separately and by different people; MS analyzed TPE Ghana-051, while LM analyzed the TPE CDC 2575 genome. Both sequencing projects (i.e., TPE Ghana-051 and TPE CDC 2575) revealed identical consensus genome sequences, except for regions showing intrastrain heterogeneity, including tprK variable regions. The basic characteristics of the TPE Ghana-051 and TPE CDC 2575 genomes are shown in Table 1, and these are identical or highly similar to other completely sequenced TPE genomes [2][3]. The TPE Ghana-051 and CDC 2575 strains clustered with TPE strains isolated from Africa, especially with the TPE Gauthier strain (Fig  2A), which differed from it in 129 nucleotide positions (S3 Table). Intrastrain heterogeneity in TPE Ghana-051 and TPE CDC 2575 genomes Both TPE Ghana-051 and TPE CDC 2575 genomes were assessed for the presence of intrastrain heterogeneity, as described previously [21]. The frequency of minor alleles was set to at least 8% and the list of heterogeneous sites is shown in Table 2. While the genome of TPE Ghana-051 had 13 heterogeneous sites, the genome of TPE CDC 2575 had 5 heterogeneous sites. Four heterogeneous sites at positions 134948, 522943, 700634, and 997894, relative to strain Samoa D, were found at the same positions in both genomes, but differing in the frequency of both alleles ( Table 2). All minor alleles were supported by at least 19 independent No. of annotated pseudogenes 7 6 6 No. of tRNA loci 45 45 45 No. of rRNA loci 6 (2 operons) 6 (2 operons) 6 (2 operons) No. of ncRNAs 3 3 3 https://doi.org/10.1371/journal.pntd.0005894.t001 An unrooted tree constructed from whole genome sequence alignments from available TPE and TEN genomes. The tree was constructed using the Maximum Likelihood method based on Tamura-Nei model and MEGA7 software [15]. Bootstrap values based on 1,000 replications are shown next to the branches. A. An unrooted tree constructed from the whole genome sequence alignment of TPE and TEN genome sequences. The bar scale corresponds to a difference of 0.00010 nucleotides. B. An unrooted tree constructed from the whole genome sequence alignment of TPE and TEN genome sequences. The bar scale corresponds to a difference of 0.00005 nucleotides. tprK sequences, both rRNA operons, tprD, arp, and TP0470 genes were omitted from the analysis. Deletion of these regions resulted in a modified tree topology due to: (1) differences between the two possible constitutional states for the rRNA operons [19], (2) the presence of two tprD alleles in the TPE population (tprD and tprD2; [20]), and (3)  Illumina reads with a mean value of the number of minor allele Illumina reads of 278.5. The average sequencing error rates for these positions, calculated from alternative reads (i.e., other than minor and major alleles) for nucleotide positions (presented in Table 2) was 0.094%. Therefore, a cutoff of 8% for alternative reads was more than two orders of magnitude higher than the average error rate of Illumina reads for these positions.
One of the 14 polymorphic sites (235835) was in an intergenic region. Most of the alternative alleles (11 out of 13) resulted in non-synonymous mutations in genes encoding proteins involved in substrate transport and metabolism (Table 3).

Molecular analysis of rabbit chromosomal DNA in Ghana-051 and CDC 2575 samples
Since consensus genome sequences of both TPE samples were identical (although both strains contained several intrastrain heterogeneous sites) and since these samples were also contaminated by rabbit DNA derived from the experimentally inoculated rabbits, the rabbit chromosomal DNA in Ghana-051 and CDC 2575 samples was analyzed to verify the independent character of both samples. Altogether, sequencing of Ghana-051 and CDC 2575 resulted in 2,150,741 and 1,731,654 reads aligning to Chromosome 1 rabbit DNA, respectively. Altogether, 21 positions in rabbit DNA were different between Ghana-051 and CDC 2575 samples. From these reads, a region differing in 7 nucleotide positions and spanning less than 1 kb was selected, amplified, and analyzed in additional TPE samples from the same laboratory and time period. The sequenced region covered positions 53848036-53849014 in the genome of Oryctolagus cuniculus (CM000790.1) (Table 4), i.e., the region between the DMRT1 gene (encoding Doublesex and mab-3 related transcription factor 1; coordinates 53687806-53804010) and the KANK1 gene (encoding the KN motif and ankyrin repeat domain-containing protein 1; coordinates 53898320-53958377). Within this region, 46 and 141 Illumina sequencing reads were mapped to the rabbit reference in the Ghana-051 and CDC 2575 samples, respectively. Another TPE sample that was propagated in rabbits in the same laboratory and at about the same time (1990; TPE Sei Geringging, [22]) showed sequences in these nucleotide positions that were identical with the rabbit DNA from the TPE Ghana-051 sample (1988). Similarly, the TPE CDC-2 sample (isolated in 1980, [8]) showed identical rabbit sequences to the rabbit sequences from the CDC 2575 sample (isolated in 1980).

Estimation of the upper limit of the mutation rate in yaws treponemes
There was a time span of 7 years and 3 months between isolation of the two strains and, during this time, the TPE strain Ghana-051 multiplied in infected humans. Using this time period   [23,24], and the optimal proliferating conditions during the entire 7.25 year period, one can assume that 63,510 hours (7.25 years) between the isolation of the two samples corresponded to 2,117 treponemal generations. This number corresponds to a TPE mutation rate of 4.1 x 10 −10 per site per generation (assuming 292 generations per year).

Discussion
Two TPE strains isolated more than 7 years apart from each other, from young patients who were originally infected in Ghana, Africa, were analyzed and their genomes were completely sequenced with PSGS [2][3][4]11]. The consensus genome sequences were identical despite the fact that the isolation of DNA, genome amplifications, sequence assembly, and genome analyses were performed separately at different times and by different people. This finding indicates that (1) there is a very low error rate in PSGS sequencing, i.e., on the order of 10 −6 or lower and (2) the genomes of uncultivable pathogenic treponemes are quite stable over time.
The identical consensus whole genome sequences raised the question as to whether the Ghana-051 and CDC 2575 samples really represented two different strains. There are several Noordhoek's lab record, this sample was described in the study by Engelkens et al. [10]. The sequenced sample of TPE Ghana-051 showed similar signatures of rabbit DNA as rabbits used in the Netherlands at the same time period (1988) and the original tube from which the TPE Ghana-051 was sequenced originated from 1989. Both these facts support strain TPE Ghana-051 as the original sample isolated in the Netherlands. Taken together, the possible duplication of either CDC 2575 or Ghana-051 samples can be excluded. Several previous studies [25][26][27] described partial sequences of CDC 2575 (n = 24) and Ghana-051 (n = 22) comprising 8,733 bp and 7,583 bp, respectively. Altogether, 5 nucleotide discrepancies, located in genes TP0617 (1 nucleotide difference; GenBank Acc. No. EU101912.1 for CDC-2575 and EU101917.1 for Ghana-051) and TP0620 (4 nucleotide differences; GenBank Acc. No. JN582339.1 for CDC 2575), were identified. However, the TPE CDC 2575 sequence of TP0620 (JN582339.1), published in a study by Pillay et al. [27], was likely mislabeled since this sequence was also identical to the TP0620 of the TPA Nichols strain, which had also been used in the aforementioned study [27]. The remaining difference (C vs. T) at position 671367 corresponding to the complete genome sequence of CDC 2575 and Ghana-051 in the TP0617 gene was analyzed in individual sequencing reads and revealed a cytosine at this position, which was supported by 97.9% (1266x coverage) and 96.9% (2080x coverage) of the reads in CDC 2575 and Ghana-051, respectively. A thymine at this position was detected in 2.1% and 2.9% of the reads in CDC 2575 and Ghana-051, respectively, suggesting that the observed difference may represent an intrastrain heterogeneous site, a sequencing error, or both.
Genetic diversity within individual treponemal strains, i.e. intrastrain genetic diversity, was detected in several previous studies and mostly affected tpr genes and sequences in their vicinity or sequences paralogous to tpr genes [28][29][30][31][32][33][34]. A previous genome-wide study [21] on genetic heterogeneity revealed 17 genes in 5 treponemal genomes with 23 intrastrain heterogeneous sites; the number of identified heterogeneous sites in individual genomes varied between 0 and 7. However, the average sequence coverage of analyzed genomes in the previous study [21] ranged between 21x and 194x, while in the present study the coverage was well over 800x, which may explain the higher detection rate of intrastrain variable sites. Another recent genome-wide study [35] revealed 63 intrastrain heterogeneous sites in 25 TPA genomes. The median average depth of coverage of the genomes analyzed in this study [35] was also relatively high compared to the original study [21] and reached 131x (ranging from 20x to 1,196x). While it is clear that the number of identified intrastrain heterogeneous sites somehow correlates with the average depth of coverage, it is not clear if the number of heterogeneous sites also reflects the T. pallidum subspecies, as suggested by Čejková et al. [21], where the majority of heterogeneous sites were found among TPA strains and not among TPE strains. As with the previously identified heterogeneous sites, the alternative alleles primarily encoded non-synonymous replacements of amino acid residues in the corresponding protein sequences, suggesting an adaptive character for this genetic variability. Interestingly, TPE strain Ghana-051 showed more heterogeneous sites (n = 13) compared to TPE strain CDC 2575 (n = 5). A previous study showed that when DNA preparations originated from two different rabbit passages, the relative proportions of alleles differed [21]. It is therefore not clear if the observed differences in occurrence and frequency of heterogeneous sites reflect differences associated with when (i.e., the year) the patient's sample was isolated, or if it was the differences in the number of hamster and rabbit passages, or both.
Seven years and 3 months elapsed between the isolation of the two samples and this corresponds to a mutation rate of 1.21 x 10 −7 per TPE nucleotide site per year or lower or 4.1 x 10 −10 per site per generation (assuming 292 generations per year). For our calculations, we omitted the time period over which the two TPE strains evolved separately, i.e., the time between TPE infection and the appearance of clinical symptoms in the patient that originated the older sample (CDC 2575), and also the time that was needed for isolation of both strains during experimental hamster and rabbit infection. The reason for this was the lack of data indicating the exact time of infection and the exact number and duration of animal passages. However, both these factors would act to further lower the calculated mutation rate. In E. coli, an in vitro experiment revealed about 100 fixed genome mutations per 40,000 generations, i.e., one mutation per approximately 400 generations [36]. As shown by Maughan [37], differences in generation time between E. coli and TPE do not have to affect the evolutionary rates of other organisms with different generation times. The genome size of E. coli is 4,629,812 nt, i.e., 4.06 times larger than the TPE genomes analyzed in this study and the corresponding mutation rate corresponds to 5.3 x 10 −10 per site per generation. Interestingly, the mutation rates calculated for both organisms were surprisingly similar. However, in TPE, this estimate represents the upper limit of the mutation rate, which is likely to be lower. Based on paleo-pathological findings and phylogenetic analyses, de Melo et al. [6] estimated the evolutionary rate of syphilis treponemes to be 8.82 x 10 −8 substitutions per site per year, i.e., a number corresponding to 3.3 x 10 −10 per site per generation. This estimate is close to the highest substitution rate revealed by the present study. Moreover, the mutation rates in microbes with DNA-based chromosomes were estimated to be close to 1/300 per genome per replication [38], i.e., 2.92 x 10 −9 per site per generation, less than an order of magnitude higher than that estimated in this study. Altogether, the estimated upper limit of the TPE mutation rate appears to be close to that of other bacteria and to other mutation rate predictions in treponemes (based on different data). Since the evolutionary rate differs between bacterial species [39], it is not clear how the estimated upper limit of the TPE mutation rate is applicable to related treponemal subspecies, including syphilis treponemes (TPA), TEN, and T. paraluisleporidarum [40] treponemes. Uncultivable pathogenic treponemes are monomorphic bacteria [5] that have extremely high sequence similarity; therefore, it is likely that other related treponemes show similar mutation rates. However, a study published by Arora et al. [7] estimated the mean evolutionary rate of TPA as 6.6 x 10 −7 substitutions per site per year, which was based on sample isolation dates and a Birth-death serial skyline model [41]. This value is about 5.5 times higher than the upper limit estimated in this study (1.21 x 10 −7 per nucleotide site per year). The differences in the estimated TPA error rate could be related to the model used and/or the differences between TPA and TPE mutation rates.
Several studies showed that non-human primates can represent possible reservoirs of human yaws [42][43][44][45][46]. Since yaws treponemes appear to be genetically stable for years, molecular typing of yaws-causing strains will help in future to prove or exclude a transfer of yaws strains between non-human primates and humans. Moreover, epidemiological mapping of TPE strains could help to identify transmission networks also within humans which might be valuable especially in recent yaws eradication efforts. In 2012, the WHO launched a plan for eradication of yaws using macrolide antibiotic azithromycin [47]. So far, azithromycin was found to be effective in treatment of yaws [48-51], although in syphilis-causing strains, two mutation were associated with macrolide resistance [52-53]. Since resistance to azithromycin is caused by point mutation in the 23S rRNA gene and since both copies of the 23S rRNA genes need to be mutated, we have previously estimated the probability of emergence of macrolide-resistant yaws treponemes to be a rather rare event (with probability 10 −4 -10 −5 per infected patient) [54].
Another implication of the yaws treponemes genome stability during human infections is the possible implication for syphilis treponemes typing studies [54-60], where the stability of arp, tprEGJ, TP0136, TP0548, and 23S rRNA loci were repeatedly discussed. If the major finding of this study could be extrapolated to TPA strains, then all loci used, so far, for molecular typing of syphilis treponemes, should be sufficiently stable to infer epidemiological relationships.
There are over 2,000 nucleotide differences between TPE and TPA strains [2] and the majority of them are scattered throughout the genomes with only a minority of loci positively selected or recombinant. Based on the known genetic diversity between TPA and TPE genomes, the observation-derived estimate of the upper limit of the mutation rate, and the assumption that the mutation rate in syphilis treponemes is similar to that of TPE, the most recent common ancestor of the syphilis and yaws treponemes is more than ten thousand years old. Assuming an example that uses a 10-times lower mutation rate compared to the upper limit revealed in this study, the most recent common ancestor of syphilis and yaws treponemes would have appeared at a time that would have fallen within the beginning of evolution of modern humans. Taken together, the data presented in this study are consistent with the relatively slow evolution rate of yaws treponemes and suggest that the appearance of the most recent common ancestor of syphilis and yaws treponemes took a very long time, perhaps even a length of time comparable to the evolution of modern humans.
Supporting information S1