Historic Treponema pallidum genomes from Colonial Mexico retrieved from archaeological remains

Treponema pallidum infections occur worldwide causing, among other diseases, syphilis and yaws. In particular sexually transmitted syphilis is regarded as a re-emerging infectious disease with millions of new infections annually. Here we present three historic T. pallidum genomes (two from T. pallidum ssp. pallidum and one from T. pallidum ssp. pertenue) that have been reconstructed from skeletons recovered from the Convent of Santa Isabel in Mexico City, operational between the 17th and 19th century. Our analyses indicate that different T. pallidum subspecies caused similar diagnostic presentations that are normally associated with syphilis in infants, and potential evidence of a congenital infection of T. pallidum ssp. pertenue, the causative agent of yaws. This first reconstruction of T. pallidum genomes from archaeological material opens the possibility of studying its evolutionary history at a resolution previously assumed to be out of reach.


Introduction
. We estimated the age of these three individuals using a method based on dentition, as dental development is considered a more reliable indicator of age than developmental or degenerative skeletal changes [20]. The dental aging chart established by Ubelaker [21] was developed using American indigenous populations; however, when evaluating dental aging methods with a population of known ages, the Schour and Massler atlas [22] was found to have a higher accuracy of age estimations in individuals in below the age of 5.4 years [23]. Both methods were, therefore, employed to estimate the age of individuals as recommended previously [20,21]. Our results reveal one perinatal individual (94B) and two infants (94A and 133) [21,24]. Archaeological context as well as pH-EH quantification studies suggest a burial date during the Colonial period [19]. Radiocarbon dates of the individuals 94B and 133 suggest that both individuals were buried less than 350 years ago (younger than cal AD 1654 for 94B and younger than cal AD 1669 for 133).
These individuals present signs of infection in the form of abnormal skeletal changes such as periosteal reactive bone formation and diaphyseal expansion of long bones (Fig 1). Pathological skeletal changes were observed macroscopically and descriptively recorded as outlined by Buikstra and Ubelaker [25]. We recorded abnormal bone formation and bone destruction, status at time of death (active or healing) and location and extent of the affected bone. The resulting data was referenced to paleopathological and clinical literature [25][26][27][28][29][30][31]. The individuals displayed varied skeletal completeness and differential preservation due to taphonomic damage. The most consistent skeletal feature suggestive of infection present in the three individuals was periostitis, which included a multilayered periosteal reaction and dactylitis (Fig 1) [ [32][33][34][35]. Periostitis, though not specific to treponemal disease is strongly associated with treponematosis. It has been described as a "third category" of skeletal indicators of treponematosis and is present in greater frequency compared to other changes associated with treponemal disease such as cranial and dental changes [27]. Although these individuals (S1 Fig) do not display pathognomonic dental changes such as Hutchinson's incisors or "mulberry" (or Founier's) molars which are known to have a broad occurrence (33% and 27% respectively) in individuals bearing a congenital infection of syphilis, they do display abnormal periosteal bone development of the long bones and cranial bones [28]. To our knowledge, dental features associated with congenital syphilis are restricted to permanent teeth and would only be confidently visible around 6-9 months of age as only the sites of initiated ameloblast activity are visible before then [27,36]. Although each individual was represented by a differing number of skeletal elements, all individuals presented layered periostitis. Individual 94A, estimated to be a six month-old infant based on skeletal development [24], displays a systemic, fulminating periosteal reaction present on the axial as well as the appendicular skeleton as illustrated by the severely affected tibia in Fig 1. Also present in this individual was dactylitis (expansive periostitis of the metacarpals), which is associated with treponematosis in the first year of life [27], among other conditions [26]. Furthermore, this individual displays abnormal reactive bone on the endocranial surface of the pars basilaris (Fig 1). Individual 94B is represented by fewer skeletal elements; however, periostitis suggestive of treponemal infection is still present on the ribs and an unidentified long bone (Fig 1). Based on the crown formation of the incisors and molars this individual is estimated to have died at birth +/-2 months. Individual 133 presents bilateral periosteal reaction appearing to be active at time of death with extensive involvement of the appendicular and axial skeleton. As illustrated in Fig 1, the right femur of this individual displays significant periosteal bone formation with some cortical resorption. This individual is represented by a nearly complete, albeit fragmentary, skeleton, with four mandibular molars, two maxillary incisors, and two canines. Dental elements from the individual were used to estimate the age at death as six months (+/-3 months) post partum [21].
The skeletal changes displayed by the three individuals suggest treponemal infection that is possibly congenital given their young ages [26,37]; however, the nature of the skeletal changes, being non-specific yet suggestive of treponemal infection emphasizes the importance of analysis through molecular paleopathological methods. Ancient DNA provides the possibility to both identify a causative infectious organism and explore its ecological and evolutionary history.

Genome-wide enrichment, assembly and analysis of the T. pallidum DNA
Long bones with lesions suggestive of treponemal infections were sampled for all five individuals; ancient DNA was extracted [38] and double stranded next generation sequencing libraries were constructed for all samples [39,40]. To reduce the background of environmental DNA, whole-genome array capture for T. pallidum subspecies [41] was conducted. After Illumina sequencing to a depth of 98 to 204.5 million pre-processed pair-end reads, mapping to the reference genome (NC_021490.2) and duplicate removal, between 71,524 and 128,419 unique DNA sequencing reads were retrieved for individuals 133, 94A and 94B, resulting in an average of 3.25 to 7.72-fold genome coverage with a duplication factor of 28.64 to 37.25 and 57-94% of the reference genome covered at least 3-fold (Table 1).

Analysis and authenticity of the retrieved T. pallidum and human DNA
To assess the authenticity of the captured T. pallidum DNA we analyzed nucleotide misincorporation patterns characteristic of ancient DNA [42,43]. The observed DNA damage patterns of 10% to 17% C to T damage at the terminal base of the DNA fragments support the ancient origin of the retrieved T. pallidum DNA (Table 1). Human mitochondrial fragments were simultaneously enriched via a streptavidin-based bead capture [44] resulting in complete mitochondrial genomes with haplogroups found today in Central America (D1i2 for 94A & 94B and H1c+152 for 133). Consistent with the ancient pathogen, 11% to 17% DNA damage was observed at the terminal ends, thus supporting their authenticity ( Table 1). Contamination of the human mitochondrial DNA calculated using the program Schmutzi [45] in all three samples was estimated at less than 2% (Table 1).

Phylogenetic analysis of the historic T. pallidum genomes
We reconstructed a phylogeny for the three historic and 39 publicly available modern T. pallidum genomes [41] using different reconstruction methods. Our Maximum Likelihood ( Table). In the Maximum Likelihood tree they branch within the SS14 clade, while they fall basal on the syphilis branch in the  Maximum Parsimony tree with reduced bootstrap support. This phenomenon could be linked to DNA mosaic patterns, such as those identified previously in three T. pallidum strains [46][47][48], leading to the suggestion that T. pallidum strains may horizontally recombine [49].

Analysis of ambiguous SNP positions in the historic genomes
To further investigate this phenomenon we performed several analyses. We constructed a Bayesian phylogenetic tree from 1,061 SNP positions after deletion of positions with missing data from a SNP alignment of the historic strains 94A and 94B with Fribourg-Blanc, Bosnia_A, Nichols, Sea81-4, Mexico_A and SS14 using BEAST2 [50]. The uncertainty in the branching pattern of 94A and 94B was visualized using Densitree [51] by overlaying the different tree topologies from the BEAST2 run ( Fig 2B). We found the highest support for an ancestral placement in 82.16% of the tree states ( Fig 2B, blue color). Only 17.66% of the tree states ( Fig  2B, red color) support the branching of the historic strains within the SS14 clade, and 0.18% of the tree states support a trifurcation of the 94A and 94B strains, the SS14 clade (SS14, Mexi-co_A) and the Nichols clade (Nichols, Sea81-4).
To study the derived SNPs that the historic strains share with different clades in more detail we also conducted a comparative SNP analysis using 2,294 SNPs among a complete set of 42 T. pallidum strains and identified the following: for 94A and 94B, 22 SNPs and 30 SNPs, respectively, support an ancestral position (Fribourg-Blanc clade), 19 and 24 SNPs, respectively, support a position in the SS14 clade, and 4 and 5 SNPs, respectively, support a position in the Nichols clade ( Fig 2C). In general, the loci of these shared SNPs are spread across the genome and are not specific to certain regions.
ClonalFrameML [52] showed evidence of recombination as previously described [41]: depending on the applied model we detected between 13 (standard model) and 16 (per-branch model) predicted recombination events corresponding to the common ancestor of 94A and 94B (S3 Table). The combination of recombination events from both models (standard and perbranch) was analyzed by comparing these regions to SNP positions that 94A and 94B share with different clades (S1 Table). Most of the regions are shared with Fribourg-Blanc and thereby further support the ancestral positioning of the historic strains. We also found one recombination event (346215-347572; TPANIC_RS01590) that is congruent with previous reports on recombination in two genes (TPANIC_RS01590, TPANIC_RS02370) in the contemporary Mexico_A strain [46]. Interestingly, for TPANIC_RS02370 no recombination event was observed in the common ancestor of 94A and 94B pointing to the possibility of a recombination event that happened later in time. We identify multiple regions where the common ancestor of 94A and 94B shows putative recombination events with the ancestors of TPA (pallidum subspecies), TPE (pertenue subspecies) and TEN (endemicum subspecies) suggestive of inter-strain recombination as has been proposed earlier [46][47][48]. Furthermore, the genes TPANIC_RS04240, TPANIC_RS04770 and TPANIC_RS05095, which were previously proposed to be recombining [47][48][49], were also detected as part of potentially recombining regions in our analysis (S3 Table). three subspecies Treponema pallidum ssp pallidum (TPA), pertenue (TPE) and endemicum (TEN). Strains of subspecies pallidum cause syphilis, subspecies pertenue cause yaws and subspecies endemicum causes bejel. (B) Bayesian trees visualized in Densitree overlaying phylogenetic trees based on the most probable topologies. Blue colored trees represent the most probable topology followed by red colored trees. For the ancient strains 94A and 94B two conflicting topologies are visible. The bars represent the 95% highest probability density intervals of the heights of the clades. The support value given at each clade is the fraction of trees in the tree-set that contain the clade. (C) Circos plot showing the shared SNP positions with specific clades and the coverage of the three ancient strains. From outer circle to the inner circle regions of possible recombination detected by ClonalFrameML are denoted on the outermost circle (purple). 'ORI' refers to the origin of replication. The genome coverages of the ancient strains 94B, 94A and 133 are represented in orange magenta and brown respectively from outward to inwards. Based on the SNPs that are specifically shared with different clades, colored bars are shown for strains 94B and 94A respectively in the innermost circles. Red bars highlight the SNP positions specifically shared with Fribourg-Blanc (supporting a phylogenetic position ancestral to the two syphilis clades). The green bars highlight the SNP positions shared with the SS14 clade while the blue bars highlight the SNP positions shared with the Nichols clade. To assess the effects of these findings in the context of single-gene phylogenies, we tested different model topologies (S2B and S2C Fig) on 1,028 genes in a TREE-PUZZLE analysis [53]. Out of 1,028 genes, 544, 564 and 509 genes in strains 94A, 94B and 133 have at least one informative SNP in the gene alignment, of which 221, 180 and 246, respectively, do not reject any of the model trees. Among the remaining genes, 199 genes of strain 133 rejected any positioning of the strain within the syphilis clades, while for strains 94A and 94B, 265 and 322 genes, respectively, reject any position within the yaws clade. For all remaining genes, the rejected tree topologies are listed in S1 Table. We found the highest support for an ancestral placement of 94A and 94B with only seven and eight genes rejecting this positioning (S1 Table). For the third historic strain, 133, its positioning in the yaws clade has the highest support only rejected by one gene (proA; TPANIC_RS01715), which contains a SNP (position 376,586) that is specifically shared with Nichols (S1 Table).

Virulence factor analysis
The presence of 61 genes previously thought to be related to virulence based on other studies was evaluated in the ancient and modern strains. This included the Tpr family, various outer membrane proteins, adhesion proteins, lipoproteins and a few other classes [54][55][56][57][58][59][60]. The ancient strains identified as syphilis harbor all of these factors except TPANIC_RS04235 a homolog of the FadL family that enables the transport of long-chain fatty acids (LCFAs) as well as other hydrophobic nutrients [61]. In general, Treponema pallidum seems to harbor altogether five FadL orthologs [62], of which the other four are present in the ancient strains. In addition, previous studies have described a deletion of the gene TPANIC_1030 in the pertenue strains [41,63], which is also confirmed in the ancient pertenue strain 133 (S3 Fig and S4 Table).

Discussion
The origin of syphilis and the evolutionary history of treponemal diseases are subjects of ongoing debate. Ancient T. pallidum genomes have the potential to provide evidence to address many questions raised in this context. Here we provide the first historic genomes of T. pallidum subspecies. Our data is of sufficient resolution to allow for a global genomic and phylogenetic analysis. Although our historic genomes cannot directly contribute to discussions about the origin of syphilis due to their post-Columbian age, our study demonstrates the potential of successfully obtaining authentic historic T. pallidum genomes when focusing on treponemal cases in young individuals. More historic T. pallidum genomes, in particular from the pre-Columbian era, could be able to settle the debate. However, the selection of samples, retrieval of ancient T. pallidum DNA and comparative genomic analyses pose multiple challenges.
We find strong evidence that the bone lesions in two historical treponematosis cases, one in a perinate and another in an infant, were caused by T. pallidum ssp. pallidum. Furthermore, we present skeletal and genomic evidence of T. pallidum ssp. pertenue infection in an infant individual estimated to be six months (+/-3 months) of age at time of death. This individual calls to attention the possible existence of congenital yaws, which has been debated in the literature due to the apparent inability of T. pallidum ssp. pertenue to cross the placental barrier, despite the description of potential historical examples [64,65]. Variation in the ability for vertical transmission has been demonstrated within T. pallidum subspecies, for example the Haiti B strain of T. pallidum ssp. pallidum also seems unable to cross the placenta [66,67] suggesting the possibility of such variation within the pertenue subspecies strains. Thus, the debate of whether T. pallidum ssp. pertenue can [64,65], or cannot [67,68] cross the placental barrier to cause a form of congenital treponematosis remains unresolved. The severity of the skeletal response in individual 133 together with the young age at death (6 months, +/-3 months) supports a diagnosis of congenital yaws; however, the known pathophysiology of T. pallidum ssp. pertenue infections, namely that skeletal changes can occur in infants infected up to four months post partum [64,65], challenges an unequivocal diagnosis of congenital yaws for this case and highlights the importance of further attention to this research question.
Although the individuals included in our study present skeletal changes consistent with but not limited to treponematosis, producing a differential diagnosis of a greater resolution was not possible due to the shared signs of yaws and syphilis and the differential preservation of the skeletons, which compromised the ability to visualize diagnostic patterning of the pathological changes. Likewise, such challenges exist in the diagnosis of clinical cases whereby evidence of the overlapping modes of transmission and clinical features of syphilis and yaws can be found in biomedical research on modern T. pallidum strains, such as T. pallidum ssp. pallidum causing yaws-like frambesiform lesions [69] or T. pallidum ssp. pertenue invading neurological and cardiovascular tissue as seen in syphilis [64,70]. However, our genetic results allowed us to identify syphilis infection in two of the individuals (94A, 94B) whereas one individual (94B) presents a likely congenital infection of syphilis. In addition, we identified yaws in one infant individual (133). Our work demonstrates the value of molecular identification of ancient pathogens, particularly as applied to treponemal diseases where skeletal responses to the various pathogenic subspecies are often shared, challenging the development of a confident diagnosis through osteological observation [27]. Molecular analyses not only provide greater resolution to the paleopathological diagnosis of osteological changes but they also provide relevant details related to the history and evolution of pathogens we still encounter today.
In the phylogenetic comparison of our historic strains with contemporary T. pallidum strains, we detected ambiguous SNP patterns in our strains that may suggest a number of recombination events in its evolutionary past. Although T. pallidum subspecies are thought to be clonal [71], several studies on modern treponemal DNA have reported evidence for recombination events in various T. pallidum subspecies [41,[46][47][48]. As recombination mechanisms are active during treponemal infections, TPA strains could have integrated some genomic regions from TPE strains during a co-infection of the same host [46,48]. The presence of yaws in sample 133 and syphilis in samples 94A and 94B indicates that pertenue and pallidum infections were both prevalent among individuals in this region, which could have opened the possibility of recombination events facilitated by co-infections with these subspecies. Further research on ancient and modern T. pallidum strains is needed to assess this aspect of their evolution, which should be borne in mind in phylogenetic assessments.
Various studies on the pathogenicity of treponemes have suggested a number of genes potentially involved in virulence, most of which encode proteins that reside at the host-pathogen interface. Regarding the presence and absence of these genes we did not detect a difference in our ancient strains compared to modern ones with the exception of the FadL homolog TPA-NIC_RS04235, which is absent in our ancient syphilis strains. FadL facilitates the transport of long-chain fatty acids (LCFA), which are metabolic energy sources and can also aid in pathogenesis. Uptake of high concentrations of LCFAs released by host cells would assist in suppression of inflammatory responses thereby enabling the pathogen to colonize the host more efficiently [72,73]. However, there are four other FadL homologs present and it is unclear if they target different compounds with varying specificity [57]. Thus, it currently remains an open question if the absence of one of the FadL homologs could influence phenotype.
In conclusion, our study demonstrates that retrieval of authentic ancient T. pallidum DNA from historic tissues to the resolution of genomic reconstruction is indeed possible, despite earlier pessimism [74]. The presence of T. pallidum ssp. pertenue in old world monkeys [17,75] and our finding that two T. pallidum subspecies likely caused similar osteological manifestations in the past may suggest a more complex evolutionary history of T. pallidum than previously assumed. Furthermore, our detection of possible recombination events in the past highlights an important, and currently underrepresented, analytical component that should be accommodated in future models of T. pallidum's history. Our results point out the necessity of ancient T. pallidum genome-level data to better understand both osteological manifestations caused by infection of the various forms and their potential for genomic recombination in the past. Ultimately, these analyses may be able to resolve the origin of syphilis and the evolutionary history of treponemal diseases in general.

DNA extractions and library preparation
For all five samples, DNA from 50 mg of bone powder was extracted in clean room facilities dedicated to ancient DNA work at the University of Tübingen using a silica purification protocol [38] with the following modifications: the Zymo-Spin V funnels (Zymo Research) were bleached and UV irradiated for 60 minutes and the total elution volume was raised to 100 μl. To obtain doublestranded Illumina libraries, aliquots of 20 μl of extract were used in a well-established protocol [39] and sample-specific barcodes were added to both library adapters by amplification after adapter ligation [39,40]. For all samples, two sets of double-stranded Illumina libraries were created: with and without UDG treatment [76]. Extraction and library blanks were treated accordingly. After quantification of the indexed libraries using the IS5 and IS6 primer set [39], the DyNAmo Flash SYBR Green qPCR Kit (Biozym) and the Lightcycler 96 (Roche), a second amplification was carried out in 100 μl reactions for each library with 5 μl of library template, 4 units of AccuPrime Taq DNA Polymerase High Fidelity (Invitrogen), 1 unit of 10X AccuPrime buffer (containing dNTPs) and 0.3 μM of IS5 and IS6 primers [39] and the following thermal profile: 2-min initial denaturation at 94˚C, followed by 6 to 16 cycles consisting of 30-sec denaturation at 94˚C, a 30-sec annealing at 60˚C and a 2-min elongation at 68˚C and a 5-min final elongation at 68˚C. Subsequently, a purification using the MinElute PCR purification kit (Qiagen, Hilden, Germany) and quantification with Agilent 2100 Bioanalyzer DNA 1000 chips was performed.
For the three samples positive for T. pallidum DNA (94A, 94B and 133), a second extract from 50 mg of bone powder as described above and additional single-stranded libraries were created [77], and the previously described procedures were followed after the library preparation step.

Genome wide enrichment and sequencing of the T. pallidum DNA
The genome wide enrichment for T. pallidum DNA was performed via two rounds of hybridization capture using a one-million-feature Agilent microarray and a well-established protocol [78]. The design of the microarray was previously detailed by Arora and coauthors [41]. After array enrichment the libraries were amplified, purified and quantified as described above.
For the positive samples a second enrichment including an additional set of single-stranded libraries and further sequencing was performed as detailed before.

Processing of the sequencing reads and reconstruction of the T. pallidum genomes
Paired-end reads from the three ancient samples, 28 modern samples [41] and simulated reads (reads of length 100 and offset 1) from 11 complete genomes were pre-processed, mapped and post-processed using the EAGER pipeline version 1.92 [79]. After trimming of adaptors and filtering for reads with a minimum base quality of 20 and a minimum read length of 30bp, overlapping forward and reverse reads were merged. These merged and unmerged forward and reverse reads were mapped to the T. p. ssp. pallidum Nichols (NC_021490.2) complete genome as a reference with BWA [80] using a mapping stringency of 0.1 (-n) and a mapping quality filter of 37. Duplicated reads were eliminated using PICARD tools Mark Duplicates [81]. Reads from non-UDG treated libraries were used to estimate the authenticity of the ancient DNA samples. DNA damage in the ancient samples was quantified using MapDamage 2.0 [82]. SNPs were called using UnifiedGenotyper of the Genome Analysis Toolkit (GATK) [83]. These resulting VCF files were combined and compared using an in-house tool (Multi-VCFAnalyser). Reference bases and SNPs were called with a minimal coverage of 3 and a minimal major allele frequency of 90%. Calls below this threshold are represented as "N". The annotation and effect of SNPs within protein-coding regions are identified using SnpEff [84]. All SNPs were concatenated for phylogenetic analysis and downstream analysis.

Enrichment and sequencing of the human mtDNA
Along with bacterial DNA, human mitochondrial DNA from the three ancient samples was sequenced. The paired-end reads were mapped to the HG19 mitochondrial genome (NC_012920.1) using the EAGER pipeline version 1.92 [79]. MapDamage 2.0 [82] was used to estimate DNA damage. Estimation of contamination with modern DNA and generation of consensus sequences was performed using Schmutzi version 1.0 [45]. Haplogroups were determined using Haplogrep 2 [85].

Phylogenetic analysis of the T. pallidum genomes
The concatenated full genome alignment (equal in length to the reference NC_021490.2) of all 42 samples (includes the three ancient samples) was used to assess the relationship of the ancient samples with modern treponemes. Phylogenetic trees were built using RAxML [86] and MEGA 7.014 [87]. Maximum Likelihood trees were built with the GTR+GAMMA+I substitution model with a proportion of Invariant sites of 0.01 with 1000 bootstrap replicates. Evolutionary rates of the sites were estimated based on Gamma distributions with 8 categories. Maximum Parsimony trees were generated for all sites of the full genome alignment using the Subtree-Pruning-Regrafting (SPR) search method and 1000 bootstrap replicates. BEAST2.4.2 [50] was used to reconstruct phylogenies for the SNP alignment of 94A, 94B and selected representative genomes (Nichols, Sea81-4, SS14, Mexico_A, Fribourg-Blanc, BosniaA) after complete deletion of sites with missing data. Bayesian phylogenetic tree states were generated with a GTR-based substitution model with 8 Gamma categories and the proportion of invariant sites estimated from the data. Substitution frequencies of all nucleotide pairs are estimated with CT pair initial frequency of 1.0. A strict clock model was set with an initial rate of 0.005. The priors for the trees were set to Coalescent Constant Population model. All default parameters were used for priors after setting the population model. For the MCMC runs the chain length was set for 1,000,000,000 steps with a burn-in of 10% (100,000,000) and cataloguing tree states and log information for every 10,000 steps. Results were analyzed for stability using Tracer version 1.6 [88]. All the trees were visualized using Densitree version 2.2.5 [51].

Detection of mosaic patterns in the T. pallidum genomes
The contribution of SNPs of the ancient samples was analyzed in detail to resolve the uncertainty of topological differences in the phylogenetic trees using Maximum Likelihood and Bayesian approaches. SNPs from ancient samples 94A and 94B were compared with Fribourg-Blanc (representing the Yaws clade), Nichols (representing the Nichols clade) and SS14 (representing the SS14 clade) using an in-house tool (MultiVCFAnalyser). SNPs supporting the ancestral position are SNPs that are shared specifically with Fribourg-Blanc (ancestral SNPs) but are derived in SS14 and Nichols. SNPs supporting the SS14 position are SNPs that are shared specifically with SS14 (derived SNPs) but are ancestral in Fribourg-Blanc and Nichols. SNPs supporting the Nichols position are SNPs that are shared specifically with Nichols (derived SNPs) but are ancestral in Fribourg-Blanc and SS14.
ClonalFrameML version 1.-178 [52] was used to predict recombination patterns in the ancient strains. Using a Maximum Likelihood tree generated with RAxML as the initial tree and the full genome alignment of 42 samples (three ancient samples included), recombination events were predicted by using the standard model of the Baum-Welch expectation maximization algorithm (EM) for 1000 iterations with default parameters. These parameters were applied to all branches. The recombination parameters were also estimated for each individual branch (per-branch model) with a variability of recombination parameters of 0.1. The predicted recombination events from the standard run and the per-branch run were compared and combined.
For more in depth analysis of any ambiguity in branching patterns that contradict the Maximum Likelihood phylogenetic tree we used TREE-PUZZLE version 5.3 [53]. It compares various models of tree topologies to the constructed maximum likelihood tree and finds the rejected tree models using various statistical tests. For each of the three ancient samples an individual full genome alignment fasta file was generated (equal in length to the reference NC_021490.2) with Nichols, Sea81-4, SS14, Mexico_A, Bosnia_A, and Fribourg-Blanc using an in-house tool (MutliVCFAnalyser). All ambiguous bases are denoted as "N" in the full genome alignment. For each gene, the alignment spanning the gene coordinates from the NC_021490.2 annotation file was extracted and modified to phylip 3.0 format. TREE-PUZZLE [53] was then used to reconstruct phylogenetic trees from the gene alignments and to compare them with our nine pre-defined model tree topologies. With the Fribourg-Blanc, Bosnia_A, Nichols, Sea81-4, SS14 and Mexico_A taxa, a backbone tree topology was created using RAxML. Independently, 94A, 94B and 133 were placed at every possible position on the tree backbone to generate nine model trees that were used to reconstruct the Maximum Likelihood tree from the gene alignment (S2B Fig). Model trees 1, 2, 6, 7, 8 and 9 represent the branching of 94A, 94B and 133 with Fribourg-Blanc, Bosnia_A, SS14, Mexico_A, Nichols and Sea81-4 respectively. Model tree 3 represents an ancestral branching point that splits the sexually transmitted treponemes (T. p. ssp. pallidum) and non-sexually transmitted treponemes (T. p. ssp. pertenue and T. p. ssp. endemicum). Model tree 4 represents a branching that is at a basal position within the SS14 clade (ancestral to the split of SS14 and Mexico_A). Model tree 5 represents a branching that is at a basal position within the Nichols clade (ancestral to the split of Nichols and Sea81). TREE-PUZZLE [53] generated a Maximum Likelihood tree with the GTR substitution model and evolutionary rate estimation using Gamma distribution with eight categories. The search procedure was set to evaluate the user-defined trees (nine model trees here). Initial Parameters were estimated using a Quartet sampling and neighbor-joining tree internally. TREE-PUZZLE generated a comparison table of statistical tests by comparing the topology of the nine gene trees reconstructed from the nine model trees. TREE-PUZZLE reports parsimoniously informative SNPs. For each gene, SNPs were identified and compared with the parsimoniously informative SNPs from TREE-PUZZLE. Genes without any SNPs were discarded and remaining genes were evaluated. Using Expected Likelihood Weight (ELW) statistics, a phylogenetic gene tree was considered rejecting any of the nine model trees based on a threshold of 0.05. TREE-PUZZLE was run for 1028 genes. For robust results, the process was repeated 100 times with the same parameters for 94A, 94B, 133.

Virulence factor analysis
A gene presence/absence analysis was performed to assess the virulence factors harbored by the ancient genomes. A set of 61 gene sequences potentially related to virulence [55][56][57] were extracted from the Nichols reference genome (NC_021490.2) with 200 bases upstream and downstream (S4 Table). Reads from the 42 genomes including the ancient genomes were mapped to the gene sequences of the set with a mapping quality threshold of 0 using BWA. Read duplicates were removed using MarkDuplicates as described above. The coverage over each gene was calculated using genomeCoverageBed in BEDTools version 2.250 [89]. Gene coverage by the reads is estimated at 1x and visualized using ggplots2 package in R (S3 Fig).

Ethic statement
Research on historic human remains presents the ethical problem that informed consent can no longer be obtained from the individuals on whom the research is being conducted. In many cases it is also impossible to identify down close relatives that would be able to provide an informed consent to conduct genetic or anthropological research of their ancestors. Our research, however, does not include culturally sensitive material such as Native American-or Australian Aboriginal populations, whereby it is necessary to acquire the informed consent of the descendants in order to conduct research. Furthermore, we focus on infectious diseases that potentially affect all human populations regardless of cultural identity. The samples ana-  Table. Comprehensive listing of genes that contradict specific tree topologies. For strains 94A and 94B all genes are listed that reject tree topologies other than those that place the strains within the yaws clade. For strain 133 all genes are listed that reject tree topologies other than those that exclude all possible positions within the syphilis clade. In case a gene rejects varying sets of tree topologies during multiple TREE-PUZZLE runs the different patterns are listed (separated by '//'). If a gene does not reject any of the tree topologies for one of the strains the rejection pattern is denoted as '0'. SNP positions of 94A and 94B that are specifically shared with Fribourg (a representative of the Yaws clade), Nichols (a representative of the Nichols clade) or SS14 (a representative of the SS14 clade) are listed for affected genes. For each position, the alternative allele and the representative strain that shares this SNP position are represented for 94A and 94B. If multiple SNP positions from a gene contribute to the ambiguity, all the SNP positions are specified (separated by ';'). In cases were one of the two strains has missing data this is indicated by a ' Ã '. The effect of the SNPs along with the nucleotide change (specified in sense strand direction) and codon change is specified as well as the annotated gene function. If a gene is within genomic regions predicted to be recombining by ClonalFrameML in strains 94A and 94B, the region is specified. Pseudogenes are labeled with '(a)'. (XLSX) S2 Table. List of SNPs that are specific to 94A, 94B and 133. 27 unique SNPs were identified for 94A and 94B while three unique SNPs were identified in 133. The reference allele (Nichols) and the alternative allele, the gene, gene annotation and the effect of the variant are provided for each SNP. The position within the gene and its corresponding amino acid change is also provided. (XLSX) S3 Table. List of recombination events as detected by ClonalFrameML. A list of 16 recombination events detected by ClonalFrameML in the common ancestor of the ancient genomes 94A and 94B is provided. For each region the length and number of SNPs within that region are provided. For each region the number of SNPs per base pair is given as a factor of the genome-wide average. For each region, recombination events occurring in both internal and terminal branches of the tree that are sharing the event with the common ancestor of 94A and 94B are specified. Genes located within the predicted recombinant regions are listed together with their annotated functions. (XLSX) S4