Identification of positively selected genes in human pathogenic treponemes: Syphilis-, yaws-, and bejel-causing strains differ in sets of genes showing adaptive evolution

Background Pathogenic treponemes related to Treponema pallidum are both human (causing syphilis, yaws, bejel) and animal pathogens (infections of primates, venereal spirochetosis in rabbits). A set of 11 treponemal genome sequences including those of five Treponema pallidum ssp. pallidum (TPA) strains (Nichols, DAL-1, Mexico A, SS14, Chicago), four T. p. ssp. pertenue (TPE) strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), one T. p. ssp. endemicum (TEN) strain (Bosnia A) and one strain (Cuniculi A) of Treponema paraluisleporidarum ecovar Cuniculus (TPeC) were tested for the presence of positively selected genes. Methodology/Principal findings A total of 1068 orthologous genes annotated in all 11 genomes were tested for the presence of positively selected genes using both site and branch-site models with CODEML (PAML package). Subsequent analyses with sequences obtained from 62 treponemal draft genomes were used for the identification of positively selected amino acid positions. Synthetic biotinylated peptides were designed to cover positively selected protein regions and these peptides were tested for reactivity with the patient's syphilis sera. Altogether, 22 positively selected genes were identified in the TP genomes and TPA sets of positively selected genes differed from TPE genes. While genetic variability among TPA strains was predominantly present in a number of genetic loci, genetic variability within TPE and TEN strains was distributed more equally along the chromosome. Several syphilitic sera were shown to react with some peptides derived from the protein sequences evolving under positive selection. Conclusions/Significance The syphilis-, yaws-, and bejel-causing strains differed relative to sets of positively selected genes. Most of the positively selected chromosomal loci were identified among the TPA treponemes. The local accumulation of genetic variability suggests that the diversification of TPA strains took place predominantly in a limited number of genomic regions compared to the more dispersed genetic diversity differentiating TPE and TEN strains. The identification of positively selected sites in tpr genes and genes encoding outer membrane proteins suggests their role during infection of human and animal hosts. The driving force for adaptive evolution at these loci thus appears to be the host immune response as supported by observed reactivity of syphilitic sera with some peptides derived from protein sequences showing adaptive evolution.


Methodology/Principal findings
A total of 1068 orthologous genes annotated in all 11 genomes were tested for the presence of positively selected genes using both site and branch-site models with CODEML (PAML package). Subsequent analyses with sequences obtained from 62 treponemal draft genomes were used for the identification of positively selected amino acid positions. Synthetic biotinylated peptides were designed to cover positively selected protein regions and these peptides were tested for reactivity with the patient's syphilis sera. Altogether, 22 positively selected genes were identified in the TP genomes and TPA sets of positively selected genes differed from TPE genes. While genetic variability among TPA strains was predominantly present in a number of genetic loci, genetic variability within TPE and TEN strains was distributed more equally along the chromosome. Several syphilitic sera were shown to react with some peptides derived from the protein sequences evolving under positive selection. PLOS  Introduction Adaptive evolution including positive selection plays crucial roles in the evolution of bacterial human pathogens and both have been well documented on a genome-wide scale in a number of bacterial genera including Escherichia, Helicobacter, Neisseria, Listeria, Salmonella, Streptococcus, Campylobacter, and Actinobacillus [1][2][3][4][5][6][7][8].
Pathogenic treponemes are both human and animal pathogens. Human pathogens include Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, T. p. ssp. pertenue (TPE, the causative agent of yaws), and T. p. ssp. endemicum (TEN, the causative agent of bejel) while animal pathogens include TPE causing non-human primate infections [9-12], and T. paraluisleporidarum ecovar Cuniculus (TPeC; formerly denoted as Treponema paraluiscuniculi), the causative agent of venereal spirochetosis in rabbits, and T. paraluisleporidarum ecovar Lepus, which infects hares [13][14][15]. Although the recent work of Edmondson et al. [16] reported successful long-term cultivation of T. pallidum in a tissue culture system, most of the data on treponemal genetics comes from the whole genome sequencing studies [17,18].
The genomes of pathogenic treponemes related to T. pallidum contain no prophages or insertion sequence-elements [21,34], or plasmids [35]. Therefore, recombination is expected to be quite infrequent among these treponemes due to a lack of mobile genetic elements [21]. However, traces of both intragenomic DNA recombinations via gene conversion [36][37][38] and intergenomic homologous recombination after DNA horizontal gene transfer have been described [20,25]. In addition, traces of positive selection have been detected in previously published papers including TPA and TPE comparisons [19], comparisons within TPA strains [39], and detailed intrastrain analysis [40]. However, no comprehensive analysis of positively selected loci in the genomes of pathogenic treponemes has been performed to date.
Despite the fact that treponemes related to Treponema pallidum are monomorphic bacteria with extremely low level of genetic diversity [18], it has been shown that human immunity does not protect against different subspecies and not even against different syphilis strains [41]. Therefore, divergent genes encoding differences in proteomes of individual treponemal strains and subspecies are likely of importance for development of syphilis vaccine.
In this communication, the whole genome sequences of 11 treponemal strains were systematically analyzed for the presence of positive selection. The identified genes were further reanalyzed relative to all sequences available in 62 draft genomes published to date. The causative agents of syphilis, yaws, and bejel differed in sets of positively selected genes. Moreover, several synthetic peptides covering positively selected protein regions were found to interact with syphilitic sera.

Strains used in this study
A set of 11 treponemal genomic sequences was examined in this study and included genomes of five TPA strains (Nichols, DAL-1, Mexico A, SS14, Chicago B), four TPE strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), one TEN strain (Bosnia A), and one strain of TPeC (Cuniculi A). An overview of the complete genome sequences used is shown in Table 1. Subsequent analysis of selected genes was performed on additional 62 draft genomes (Arora et al. [30]; Pinto et al. [31]; Sun et al. [32]; GenBank genome TPA sequences [UW074B, UW189B, UW228B, UW254B, UW391B] and TEN Iraq B [CP032303.1]). The overall algorithm is shown in Fig 1.

Identification of genes under adaptive evolution
The TPE Samoa D was used as a reference genome, and all 1068 orthologous genes were extracted from the other 10 complete genomes using given annotation coordinates. The orthologous sequences from the complete genomes were aligned at the codon level using Matlab R2013a software and the Bioinformatics Toolbox. Only genes where at least three nonsynonymous mutations at different sites occurred were further analyzed with respect to the presence of positive selection. A BLAST search was used to determine orthologous sequences in the draft genomes. These sequences were aligned in Matlab at the nucleotide level because large number of ambiguous sites precluded proper automatic ORF localization. Then the sequences from the draft genomes were scanned for nucleotide differences. After filtering sites with unknown nucleotides, insertions, and deletions, only orthologs with more than five nucleotide differences at different sites were analyzed further. These orthologs were aligned with the corresponding genes from the complete genomes to determine the ORF and identify the number of nonsynonymous mutations. At the same time, the TPeC orthologs were excluded in this step due to frequent sequential diversity and due to the lack of pathogenicity of TPeC to humans. The removal of the TPeC orthologs did not change the number of detected positively selected genes. Compared to an analysis of whole genomes, no new locus with more than three nonsynonymous mutations at different sites was identified during the analysis of draft genomes.   Table 1. Subsequently, orthologous gene sequences extracted from published treponemal draft genomes were used and the Cuniculi A orthologs were removed due to frequent sequential diversity and due to lack of pathogenicity of TPeC to humans. Orthologs from draft genomes were used when available and positively selected genes were analyzed within treponemal subspecies using branch-site PAML model analysis. For each analyzed gene from the complete genomes, a maximum likelihood phylogenetic tree with 50 bootstrap replicates was constructed using MEGA 6 [48]. Different evolution models (Kimura 2-parameter [49], , and Tamura-Nei [51]) were applied to each gene. The trees of each gene were compared by calculating Robinson-Foulds distances [52] using R software (packages phytools and phangorn) [53]. The comparison showed that the choice of the evolution model did not significantly change the topology of the tree; the Tamura-Nei evolution model was chosen for the construction of all phylogenetic trees.
A calculation of mutational rate ratio ω between two gene sequences was the basis for the positive selection analysis. The ω was calculated as a ratio of nonsynonymous to synonymous mutational rates. The ratio indicates negative purifying selection (0 < ω < 1), neutral evolution (ω = 1), and positive selection (ω > 1) [54]. A set of selected genes from complete genomes was tested relative to positive selection using the maximum likelihood method using the CODEML of the PAML software package [55]. PAML version 4 [56] and its user interface PAMLX [57] were used in our study. For each analyzed gene, its maximum likelihood phylogenetic tree was used as an input tree. The CODEML offers several different codon evolutionary models, and the statistical likelihood ratio test (LRT) was used to compare the codon evolutionary model to the null model. The Bayes empirical Bayes method (BEB) [58] was then used to evaluate the posterior probability of sites considered to have been positively selected.
The CODEML models could produce different results (i.e., a list of sites under positive selection) since they calculate different parameter estimates. Site models allow ω to vary in each site (codon) within the gene. Statistical testing was required for sites with ω > 1. Two pairs of models were predominantly used since their LRTs have low false-positive rates. M1a (nearly neutral evolution) was compared to M2a (positive selection) [58,59] and M7 (beta) was compared to M8 (beta & ω) [60]. Our preliminary testing found that the two model pairs gave the same or very similar results. Therefore we chose to use the M7-M8 model pair. The M7 model is a null model that allows 10 classes of sites with a ω beta-distribution within the interval 0 � ω � 1. Sites with ω > 1 are not allowed. The alternative M8 model adds an eleventh class of sites with ω > 1. Each site was tested to determine the class to which it belongs. The LRT compares twice the log-likelihood difference 2Δl = 2(l 1 -l 0 ) between the M7 model (log likelihood value l 0 ) and the M8 model (log likelihood value l 1 ) to the χ 2 distribution [61]. If the twice log-likelihood difference is above a critical χ 2 value, then the null model is rejected, and the positive selection is statistically significant.
A considerable disadvantage of the site models is that ω was calculated as an average over all codons of the site. Therefore, the site models are not suitable for the data where ω also varies between lineages. In contrast, the branch-site models search for positive selection in sites and pre-specifies lineages where different rates of ω may occur [62]. Sequences of lineages are a priori divided into a group of foreground lineages where positive selection may occur and group of background lineages where only purifying selection or neutral evolution occurs. We used branch-site model A, which allows four classes of sites and different setups of foreground lineages to be tested depending on the gene phylogeny. In branch-site model A, all lineages under purifying selection with a low value of ω 0 belong to site class 0. Weak purifying selection and neutral evolution with ω 1 near to value 1 are allowed in site class 1. In site class 2a, a proportion of class 0 sites in foreground lineages is under positive selection with ω 2 > 1. Similarly, site class 2b is a proportion of class 1 sites under positive selection with ω 2 > 1. The null model for LRT has ω 2 = 1. Critical values of LRT (2Δl) are 2.71 at 5% and 5.41 at 1% [63]. The posterior probabilities of suggested sites under positive selection were calculated using the BEB method.
The average pairwise p-distances (APD) and average number of mutations (transitions and transversions), calculated using MEGA-X [64], were used to evaluate genetic diversity. A pairwise deletion of sites with gaps/missing data was used. The Fisher exact statistical test was used to assess the significance of the changes between average numbers of mutations.

Synthetic peptides and chemiluminescent detection of serological reactions
Synthetic biotinylated peptides, covering protein regions containing positively selected residues, were designed. Peptide synthesis was performed by JPT Peptide Technologies (Berlin, Germany) on a 50-200 nmol scale. The lyophilized peptides were resuspended in TBS buffer (25 mM Tris, 150 mM NaCl, pH = 7.2) at 1 mM concentrations and were stored at −20˚C. Prior to further use, synthetic peptides were diluted 1000x in TBS buffer.
Streptavidin-coated 96 well plates (Pierce Streptavidin Coated High Binding Capacity White 96-Well Plates; Thermo Scientific, Rockford, USA) were washed three times with 200 μl of washing buffer (TBS buffer containing Tween 20 (0.05%) and Bovine Serum Albumin, BSA (0.1%) (Sigma-Aldrich, Prague, Czech Republic); then 100 μl of diluted peptide was added to each well and incubated for 30 min at room temperature as recommended by the manufacturer (Thermo Scientific) with mild shaking. Subsequently, each well was washed three times with washing buffer (200 μl in each step); then 100 μl of blocking buffer SuperBlock Blocking Buffer in TBS (Thermo Scientific) were added and incubated for 30 min at room temperature with mild shaking. Each well was washed three times with washing buffer (200 μl in each step) and 100 μl of diluted sera (1:500 in washing buffer) were added and incubated for 30 min at room temperature with mild shaking. Each well was washed three times with washing buffer (200 μl in each step) and 100 μl of diluted secondary antibody conjugated with horseradish peroxidase were subsequently added (1:2000 in washing buffer; Goat Anti-Human IgG/IgA/ IgM Horseradish Peroxidase Conjugate; Life Technologies, Carlsbad, USA) and incubated for 30 min at room temperature with mild shaking. Each well was then washed three times with washing buffer (200 μl in each step); then 100 μl of chemiluminescent detection solution (Super Signal ELISA Pico Chemiluminescent Substrate; Thermo Scientific) was added. Luminescence was measured on a TriStar 2 LB 942 luminometer with a Modular Multimode Microplate Reader (Berthold Technologies, Bad Wildbad, Germany). Each experiment was performed at least three times. A signal was considered positive when it was higher than the average of the three lowest values for each serum plus five standard deviations of the average value.

Ethics statement
The human sera were collected from adult patients diagnosed with syphilis at the Department of Dermatology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic. Sera from child patients diagnosed with Lyme disease were obtained from the Department of Children's Infectious Diseases, Faculty of Medicine and University Hospital, Masaryk University, Brno, Czech Republic. All clinical samples were obtained after patients or parents of involved children signed an informed consent. The design of the study was approved by the ethics committee of the Faculty of Medicine, Masaryk University. All human sera were collected under established guidelines.

Identification of positively selected genes
A comparison of 47 orthologous gene sequences from the complete genomes where at least three nonsynonymous mutations at different sites occurred was used to identify positively selected genes using the site and branch-site models of the CODEML in PAML package [55].
The completely sequenced genomes are listed in Table 1 and include 11 genomes. In addition, 25 draft TPA genomes [31], 23 draft TPA and TPE genomes [30], 8 TPA genomes [32], 5 TPA genomes from GenBank (UW074B, UW189B, UW228B, UW254B, UW391B), and one TEN (Iraq B) were also analyzed. The overall algorithm is shown in Fig 1. In all cases of complete treponemal genomes, the genome structure was identical or very similar allowing straightforward identification of gene orthologs. However, in many cases, draft genome sequences were either incomplete or contained many ambiguous bases precluding their use in analyses. This resulted in a variable number of sequences used for identification of positively selected sites within individual loci. Altogether, 22 positively selected genes were identified in the TP genomes using site model analysis in PAML ( Table 2). The number of positively selected amino acid sites varied from 1 to 65, with a median value of 8.5. A list of positively selected protein sites identified using PAML software (site and branch-site models) as well as PAML-identified positively selected protein sites within treponemal subspecies are shown in S1 Table. Average pairwise p-distances (APD) were calculated for each of the 22 genes from 10 complete genomes (Cuniculi A genome was removed from analysis) and from the draft genomes. The APD value for each gene was compared with APD w10 = 0.000525 of the 10 complete genomes without 54 variable loci (listed in S2 Table) which can be considered as a background level of polymorphism. All 22 genes evolving under adaptive evolution had elevated nucleotide substitution density.

Positively selected genes with no recombination events described so far
A list of the 8 positively selected genes is shown in Table 4. These genes include tprF gene, genes encoding outer membrane proteins (TP0515, TP0733, TP0859), subtilisin-like proteins (TP0314, TP0462), enzymes (TP0619), and hypothetical protein (TP0126b). Evidence of positive selection from analyses performed within the strains belonging to different treponemal subspecies and analyses from the PAML branch-site model revealed that positive selection was found mostly between TPA and TPE strains (n = 6), and within TPA strains or isolates (n = 2). In one case (TP0859), positive selection was found both between TPA and TPE strains (n = 1), and between TEN and TPA/TPE (n = 1).
Identified positively selected genes in various treponemal species and subspecies are shown in Table 5 and Fig 2. While Table 5 lists all identified recombinant and positively selected genes in TPA, TPE, and TEN groups of strains or isolates, Fig 2 shows only recombinant or positively selected genes that were identified within a particular subspecies.

Distribution of genetic diversity among TPA, TPE, and TEN strains
To test whether genetic diversity in treponemal genomes was distributed evenly along the chromosome, an average pairwise p-distance (APD) and average number of mutations (ANM, transitions and transversions) were calculated ( Table 6) for concatenation of 54 genes (S2 Table) representing a total length of 86.8 kbp (7.6% of the genome length). These 54 genes included the tpr genes, genes with traces of possible recombination events (according to Gray et al. [38], and Čejková et al. [19]), genes showing inter-strain variability between TPE and TPA strains and their paralogs (according to Čejková et al. [19]), genes showing intra-strain variability [40], and previously identified positively selected genes (according to Čejková et al. [19]) (S2 Table). The TPA strains contained greater genetic diversity within these genes compared to TPE strains (APD (TPA) = 420.0 � 10 −5 versus APD (TPE) = 384.2 � 10 −5 ). Note that the average pairwise p-distances within TPA strains (compared to TPE/TEN strains) are lower when complete genomes were analyzed but higher within selected 54 loci (Table 6). In addition, genetic distance within TPA strains (compared to TPE/TEN strains) is markedly lower in complete genomes without selected 54 genes compared to whole genome analyses (p = 0.0008).

Serological reactivity of human syphilis and Lyme disease sera with synthetic biotinylated peptides derived from positively selected proteins
Synthetic biotinylated peptides were designed to cover protein regions where positively selected amino acid residues were detected. These peptides were tested for reactivity with the patient's syphilis sera ( Table 7). As a control, serum from a patient with Lyme disease was used. A positive result was obtained for one to seven peptides (out of 8 tested) depending on the serum used. At the same time, serum obtained from Lyme disease patient failed to recognize peptides derived from treponemal proteins but did react with peptides derived from borrelial protein ErpA.

Discussion
In this study, 22 genes showing traces of positive selection were identified among the TPA, TPE, and TEN genomes. Within this group of genes, recombination was previously reported in 14 genes. Genes with previously detected recombination events were often found to contain positively selected amino acid residues both among the recombinant and the non-recombinant sequences, which indicates that both recombination and positive selection are different mechanisms of treponemal adaptive molecular evolution. Adaptive evolution is common to many bacterial pathogens and can usually be found in genes important to the interaction between the host and the pathogen, i.e., where new protein variants are of selective advantage for the survival of the pathogenic strain. The immune pressure from the host favors, in microbial genes encoding proteins exposed on the surface of the pathogen, emerging genetic variants, which due to immune evasion, get positively selected. In Escherichia coli, positive selection is limited to a few dozen genes [3] while in several other genomes, including Streptococcus [1] and Campylobacter [2], traces of positive selection have been found in more than half of the core genome. The extent to which adaptive evolution of different bacterial pathogens differs depends on several bacterial features including bacterial mutation rate, frequency of genetic recombination and horizontal gene transfer, and genome size. Genetic recombination occurs more frequently in Neisseria [8] and Helicobacter [5] compared to several bacterial pathogens such as Escherichia [3], Salmonella [4], and Listeria [6]. Moreover, compared to E. coli, Helicobacter pylori has about a 100-times higher mutation rate due to the lack of a highly efficient DNA repair system [72]. Treponemes related to T. pallidum represent bacterial pathogens with small genomes, with an extreme paucity of outer membrane proteins [73], high genetic similarity, and a relatively low mutation rate [28]. Moreover, there are no known mechanisms of horizontal gene transfer in syphilis, bejel, and yaws treponemes [18]. These features of pathogenic treponemes are consistent with a relatively small number of positively selected genetic loci, which consists of just 22 genes (2.1% of all protein-encoding genes). Moreover, this situation also partly reflects the fact that the number of determined treponemal genomes is quite low due to the difficulties in long-term cultivation of treponemes [16]; the sequenced genomes currently available do not reveal the entire genetic variability present among human pathogenic treponemes. A recently developed MLST typing of TPA treponemes [74,75] revealed a number of genetic variants of the TP0705 gene and the vast majority of these variants resulted in amino acid replacements, which suggests positive evolution at this locus. This locus was not identified in Table 5. Positively selected genes and the corresponding proteins in different treponemal species and subspecies. Proteins previously reported as recombinant out of the positively selected proteins are also shown. Positively selected proteins   TPA  TprC, G, J, BamA, Mcp-2,  TP0136   TprC, F, G, J, L, BamA, Mcp-2  TP0126b, TP0133, TP0136, TP0314, TP0462, TP0515,  TP0548, TP0619, TP0733, TP0859, TP0865   TPE  TprI  TprE, F, I  TP0126b, TP0133, TP0314, TP0548, TP0619, TP0733,  TP0858, TP0859,  this study because of the limited variability present in the currently available genome sequences. It is therefore expected that the list of positively selected/recombinant treponemal loci will grow larger as the number of additional genomes accumulate. Positively selected genes as well as positively selected genes with previously identified recombination event that were identified within particular subspecies. Genes identified as recombinant in a particular treponemal subspecies are shown in bold. Positively selected genes with no evidence of recombination are shown in regular version. Positively selected genes identified between subspecies of treponemes, but not within any of them, are not shown. Note that positively selected genes occur mostly within the TPA and the recombinant genes are within the TEN genomes. The TP0548 and TP0865 genes were found to be positively selected within TPA and also within TPE subspecies.

Positive selection in treponemes
Among the genes identified in this study, a substantial number of genes were shown to have evidence of recombination event. Several recent studies revealed that genetic recombination in pathogenic treponemes is not only limited to intra-genomic homologous recombination and gene conversion events involving rDNA loci, tpr genes and their paralogs, and the TP0856 and TP0858 genes [36,38,68]. In addition to intra-genome recombinations, two genes, TP0326 and TP0488, in the TPA Mexico A genome, were found as a result of recombination with exogenous DNA, likely as a result of DNA uptake and chromosome incorporation during coinfection with a different treponemal subspecies [25]. Similar recombinations were detected in a TPA isolate from South Africa [66]. In the work of Grange et al. [76], TEN strain 11qj within the TP0548 locus and a nucleotide sequence almost identical to TPE strains [67,77,78] indicating that the TP0548 locus in other TEN strains is also a result of an interstrain recombination event. In the genome of TPA Sea84-1, the TP0621 locus revealed sequences identical to TPE [23]. Moreover, whole-genome sequencing of TEN Bosnia A revealed several genomic loci similar to TPA strains [20]. All these findings suggest that genetic recombinations of exogenous DNA into the chromosomal loci of treponemes are rare but detectable events. In addition, these findings suggest that the corresponding recombination could be of selective advantage during host infection given the fact that the recombination of foreign DNA without available horizontal gene transfer mechanisms is quite infrequent.
In the work of Arora et al. [30], the authors find the genes with predicted putative recombinant regions (e.g., TP0136, TP0462, TP0548, TP0733, TP0894-898) using a phylogenetic incongruence method, ClonalFrameML, and Gubbins, overlapping with genes identified in our study. However, several predicted recombinant genes comprising TP0179, TP0313, TP0315, TP0967, TP0968 [30], were not identified in this study reflecting the fact that not all recombination events result in detectable positive selection signal.
Out of the 22 genes showing adaptive evolution identified in this study, nine genes (40.9%) were identified in the set of 71 genes showing intra-strain heterogeneity (reviewed in Šmajs et al. [18]). Since most of the observed intra-strain heterogeneity resulted in non-synonymous amino acid changes, identified intra-strain heterogeneity should be considered as ongoing Table 7. Serological reactivity of a patient's syphilis and Lyme disease sera recognizing synthetic peptides corresponding to protein regions containing positively selected amino acid residues. Peptides were designed to cover protein regions containing several positively selected amino acid positions. Positive selection in treponemes adaptive evolution and, as such, it is not surprising that both sets of positively selected/recombinant genes and genes showing intra-strain heterogeneity overlapped considerably. Functionally, the genes showing adaptive evolution identified in this study were the tpr genes (tprC, D, F, G, I, J, L), outer membrane proteins (TP0133, TP0136, TP0515, TP0548,  TP0733, TP0859, TP0865) and genes encoding outer membrane biogenesis protein (BamA), lipoproteins (TP0462, TP0856, TP0858), methyl-accepting chemotaxis protein (TP0488), and three other proteins (TP0126a, TP0314, TP0619). The driving force for adaptive evolution at these loci thus appears to be the host immune response. The observed reactivity of syphilitic sera with some peptides derived from protein sequences evolving under positive evolution supports this prediction.

Peptide
This study has several limitations. First, the number of analyzed genomes was quite limited, especially regarding the TEN genomes. In addition, in recently published draft genomes of TPA, the candidate genes are often only partially sequenced. With a much larger set of treponemal sequences, one can expect a higher number of positively selected genes. The second major limitation was the disproportionality in the number of analyzed genes per species/subspecies, which reflects genome availability. Another limitation is related to the fact that the analyzed complete genomes were obtained from treponemal strains propagated in rabbits and could therefore reflect adaptation of treponemes to this host. However, the analysis of draft genome sequences in this study obtained directly from clinical material suggests that the observed traces of positive selections are present also during infection of humans. Moreover, the identified positively selected positions may represent recent mutations that were not yet removed by negative selection.
In this study, a detailed analysis of traces of positive selection in 3 T. pallidum subspecies including ssp. pallidum (TPA), ssp. pertenue (TPE), and ssp. endemicum (TEN) enabled us to classify most of the identified positively selected genes to a particular subspecies when analyses were performed separately within strains and isolates of the same subspecies or when a PAML branch-site model was used to identify lineages with positively selected loci. The majority of positively selected genes were identified within the TPA and TPE genomes, likely as a result of the highest number of available sequences for these subspecies. However, TPA sets of positively selected genes differed from TPE genes. Among TPA, members of the paralogous tpr family (tprCGJ) and the TP0136 paralogous gene family (TP0136, TP0462) prevailed, while among TPE, a paralogous gene family containing TP0856, TP0858, TP0859, TP0865, showed adaptive evolution. Interestingly, the genes belonging to the latter family (TP0548, TP0856, TP0858, TP0859) were found to be recombinant among TEN genomes. These findings suggest that genomic loci showing signs of adaptive evolution could differ between TPA and TPE/ TEN strains/isolates. This finding supports the observed and consistent genetic differences between treponemal subspecies TPA, TPE, and TEN, and shows that the ways TPA and TPE strains interact with a host during infection is different. Although some authors suggest that the subspecies classification is a case of opportunity and not the consequence of genetic and biological differences [79,80], our findings support the latter explanation.
Supporting information S1 Table. A list of positively selected protein positions identified by PAML software using site and branch-site models and PAML analysis within treponemal subspecies.