Insights into the mysterious genetic variation profile of tprK in Treponema pallidum under the development of natural human syphilis infection

Although variation in the tprK gene of Treponema pallidum is considered to play a critical role in the pathogenesis of syphilis, how tprK actually varies during natural human infection to enable the pathogen's survival has remained unclear. Here, we performed next-generation sequencing (NGS) to investigate tprK of T. pallidum directly from primary and secondary syphilis samples. Compared with strains from primary syphilis samples, strains from secondary syphilis samples contained more mixed variants within the seven variable (V) regions of tprK; the frequencies of the predominant sequences within the V regions were generally lower (less than 80%), and the proportion of minor variants with frequencies of 10-60% was higher. Notably, variations within the V regions always followed a strict 3 bp changing pattern. In addition, tprK in strains from both stages retained some stable amino acid sequences within the V regions. In particular, the amino acid sequences IASDGGAIKH and IASEDGSAGNLKH in V1 were not only shared by a high proportion of populations but also present at a relatively high frequency (above 80%) within those populations. By contrast, V6 always showed remarkable variability at both the intra- and inter-strain levels regardless of disease stage. These findings reveal that the tprK profile of T. pallidum differs between primary and secondary syphilis samples, indicating that T. pallidum constantly varies its dominant tprK sequences throughout the development of syphilis to adapt to the host, while this variation remains subject to a strict gene conversion mechanism that keeps the tprK reading frame intact. The highly stable peptides found in V1 are promising potential vaccine components, and the highly heterogeneous regions (e.g. V6) could provide insight into the role of tprK in immune evasion.
Author summary
Although variation in the tprK gene of Treponema pallidum is considered to play a critical role in the pathogenesis of syphilis, how tprK actually varies during natural human infection to enable the pathogen's survival has remained unclear. Here, we used next-generation sequencing, a more sensitive and reliable approach, to investigate tprK of T. pallidum directly from primary and secondary syphilis patients, and found that the tprK profile differed between the two stages. Strains from secondary syphilis patients contained more mixed variants within the seven V regions of tprK; the frequencies of their predominant sequences were generally lower, and the proportion of minor variants with frequencies of 10-60% was higher. Variations within the V regions always followed a strict 3 bp changing pattern. Notably, the amino acid sequences IASDGGAIKH and IASEDGSAGNLKH in V1 were shared by a high proportion of populations and present at a relatively high frequency within those populations. The V6 region always showed remarkable variability at the intra- and inter-patient levels regardless of disease stage. These findings provide insight into the role of TprK in immune evasion and a basis for further exploring potential vaccine components.



Introduction
Syphilis is a complex chronic disease caused by infection with Treponema pallidum subsp. pallidum. The disease progresses through a series of highly distinct clinical stages [1], which in untreated individuals usually include the localized primary stage (chancre), the disseminated secondary stage, and the late tertiary stage [2]. This characteristic pattern of successive episodes is reminiscent of antigenic variation during pathogen infection, which accounts for such repeated cycles of pathology [3,4]. Studies have indicated that antigenic variation in outer membrane antigens is a hallmark of many chronic multistage infectious diseases [5-7].
Previous investigations of tprK, a member of the 12-paralogue T. pallidum repeat (tpr) gene family, have revealed that tprK is highly heterogeneous at both the inter- and intra-strain levels, with sequence diversity in seven discrete variable regions (V1-V7) that are separated by conserved sequences [8-10]. Although a surface location for TprK is still controversial [11-13], many studies hypothesize that antigenic variation of TprK facilitates T. pallidum's ability to escape immune clearance, thus permitting the pathogen to persist in the host, and results from rabbit models support this hypothesis [14-16]. In this regard, Reid et al. [17] explored the role of tprK in the development of secondary syphilis in a rabbit model using a clone-based Sanger approach, demonstrating that the rampant variants of TprK were instrumental in the development of later stages of syphilis. However, the authors inevitably encountered the problem that the inoculum did not maintain an exactly identical tprK clone, as the design required. Additionally, important information about the variants, especially low-frequency ones, would be lost if only the clone-based Sanger approach was used. Therefore, our understanding of the tprK variations that facilitate the development of syphilis is incomplete. In our previous study [18], we employed next-generation sequencing (NGS) to explore the tprK gene directly from primary syphilis samples, demonstrating that the tprK profile in primary syphilis patients generally comprises one high-frequency sequence (frequency above 80%) and many low-frequency minor variants (frequency below 20%) within each region. Only some sequences had frequencies between 20% and 80%. This led us to ask whether this characteristic distribution of tprK variants changes as the disease develops and whether tprK keeps some relatively stable components amid these rampant variations.
In the present study, we used NGS to comprehensively investigate the characteristic variations of tprK in T. pallidum directly from patients with primary and secondary syphilis. The aim was to reveal how genetic variation of tprK in T. pallidum is associated with disease progression, and thereby to provide insight into the immune evasion and persistence of this pathogen and into potential vaccine components for human immunology studies.

NGS of tprK directly from primary and secondary syphilis patients
The samples (n=28) were collected at Zhongshan Hospital, Xiamen University. Of the 28 samples, 14 (P1-P14) were from patients diagnosed with primary syphilis, and 14 (S1-S14) were from patients with secondary syphilis. The clinical information for all 28 patients is shown in Table 1. The qPCR data for the target gene tp0574 showed that the amount of treponemal DNA in each clinical sample was sufficient for amplification of full-length tprK. Based on the sequencing data of tp0136, most strains belonged to the SS-14-like group, and only five belonged to the Nichols-like group. The median sequencing depth of the tprK segments across samples ranged from 9810.91 to 56676.38, and the coverage ranged from 99.34% to 99.61% (S1 Table).
Table 1 abbreviations: RPR, rapid plasma reagin; TPPA, T. pallidum particle agglutination; IQR, interquartile range.

The characteristic profile of tprK in T. pallidum from the samples of the two clinical stages
Using this NGS strategy, distinct nucleotide sequences within the seven V regions of the tprK gene were captured from each sample. In total, 491 sequences were obtained from the 14 secondary syphilis samples, more than the 335 captured from the primary syphilis samples (S2 Table). The distribution of the number of distinct sequences across the seven V regions was roughly the same in the two stages; in both, the highest number of distinct sequences was found in V6 and the lowest in V1 (Fig 1).
When the frequencies of distinct sequences within each V region in each strain were calculated, each region generally contained one predominant sequence across all the samples (Fig 2). However, the frequency of the predominant sequence within the variable regions of tprK was generally lower in secondary syphilis samples than in primary syphilis samples, especially in V7, where the number of predominant sequences with a frequency below 60% increased (8/14 vs 1/14). Notably, the frequency of the predominant sequence in V1 remained above 80% in almost all 28 samples.
We applied three frequency ranges (1-5%, 5-10% and 10-60%) to investigate the distribution of minor variants. As Fig 3 shows, most of the minor variants were concentrated in the 1-5% range in both groups. However, the proportions in the other two ranges (5-10% and 10-60%) in secondary syphilis samples were reversed relative to the distribution pattern in primary syphilis samples; that is, the minor variants in the 10-60% range unexpectedly increased in secondary syphilis samples, so that 5-10% became the range with the lowest proportion (37/393). The proportion of minor variants with frequencies above 20% correspondingly increased, with 21/393 in the secondary syphilis samples relative to 6/237 in the primary syphilis samples.
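The binning of minor variants described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the input format (a mapping from sequence to read count for one V region of one sample) and the half-open bin boundaries are assumptions made only for illustration.

```python
from collections import Counter

# Frequency ranges taken from the text; treating them as half-open
# intervals [low, high) is an assumption of this sketch.
BINS = [("1-5%", 0.01, 0.05), ("5-10%", 0.05, 0.10), ("10-60%", 0.10, 0.60)]


def variant_frequencies(read_counts):
    """Convert per-sequence read counts into relative frequencies."""
    total = sum(read_counts.values())
    return {seq: n / total for seq, n in read_counts.items()}


def bin_minor_variants(freqs):
    """Return the predominant sequence and a tally of minor variants per range."""
    predominant = max(freqs, key=freqs.get)
    tally = Counter()
    for seq, freq in freqs.items():
        if seq == predominant:
            continue  # only minor variants are binned
        for label, low, high in BINS:
            if low <= freq < high:
                tally[label] += 1
                break
    return predominant, dict(tally)


# Hypothetical read counts for one V region in one sample.
counts = {"A": 850, "B": 80, "C": 40, "D": 30}
predominant, tally = bin_minor_variants(variant_frequencies(counts))
```

Summing such tallies over all regions and samples of one stage would give counts comparable in form to the proportions reported in the text (for example 37/393), although the exact filtering used in the study is not specified here.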
Then, the lengths of all captured sequences within the regions were analyzed, showing that length changes followed a pattern of 3 bp or a multiple of 3 bp in both primary and secondary syphilis samples (Table 2).
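As a minimal sketch of the length analysis, assuming the captured sequences of one V region are available as plain strings, the 3 bp pattern can be checked by testing whether all sequence lengths are congruent modulo 3. The function name and the example sequences are hypothetical.

```python
def follows_3bp_pattern(sequences):
    """True if every pairwise length difference is a multiple of 3 bp."""
    lengths = {len(seq) for seq in sequences}
    if not lengths:
        return True  # nothing to compare
    reference = min(lengths)
    # If every length differs from the shortest by a multiple of 3,
    # all pairwise differences are multiples of 3 as well.
    return all((n - reference) % 3 == 0 for n in lengths)


# Hypothetical sequences: lengths 6, 9 and 3 differ by multiples of 3.
in_frame = follows_3bp_pattern(["ATGAAA", "ATGAAACCC", "ATG"])
# Lengths 6 and 5 differ by 1 bp, which would shift the reading frame.
shifted = follows_3bp_pattern(["ATGAAA", "ATGAA"])
```

A length change that is not a multiple of 3 would shift the downstream reading frame, which is why this pattern is consistent with variants that keep tprK in frame.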

The amino acid sequence within each V region of the tprK gene
The captured nucleotide sequences within the seven V regions from each sample were translated into amino acid sequences in silico. No sequence yielded a tprK frameshift or premature termination, and synonymous sequences were rare and found only in V2 and V5 (S3 Table). Among the populations from the two stages, sequence diversity in each V region followed a parallel pattern. Overall, V1, V2 and V4 showed strongly parallel sequence diversity, and V6 was the region with the least consistent sequence diversity (Table 3).
When all the unique amino acid sequence types within the regions from the two-stage samples were aligned, some sequence types were found repeatedly across primary and secondary syphilis samples (S4 Table). As described above, V1, V2 and V4 showed a strong capacity for inter-population sequence sharing, regardless of the stage the samples came from. Interestingly, the sequences in these three regions, especially in V2, showed a high proportion of identity across the samples from both groups (Fig 4a). Moreover, the identical sequence types included the predominant sequences of some populations (Fig 4b).
[...] And some where the predominant sequences were the same in V6 (Fig 4b). Then, the levels of nucleotide diversity in V6 between each pair of samples (Dxy) were calculated using DnaSP v.6.12.01. As shown in Fig 5, the Dxy nucleotide diversity in V6 was above 0.15 for almost all sample pairs. This is in agreement with the proposed high diversity in V6 among most T. pallidum strains.
[...] found that the molecular localization in the N-terminal region (containing the region V1) of tprK displays promising partial protection in a rabbit model. Therefore, the most stable peptides in V1 could be a potential vaccine component.
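The between-sample nucleotide diversity (Dxy) reported for V6 was computed by the authors with DnaSP. As a rough illustration of the quantity, the sketch below uses the standard definition of Dxy as the average per-site difference between sequences drawn from two populations, weighted by their frequencies. It assumes the V6 sequences have already been aligned to a common length, and it skips gap positions pair by pair, which is a simplification and may differ from how DnaSP treats gaps. All names and the example data are hypothetical.

```python
from itertools import product
from typing import Mapping


def per_site_difference(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in compared) / len(compared)


def dxy(pop_x: Mapping[str, float], pop_y: Mapping[str, float]) -> float:
    """Frequency-weighted average per-site difference between two samples.

    Each mapping goes from an aligned sequence to its relative frequency
    in that sample (the frequencies of one sample should sum to 1).
    """
    return sum(
        fx * fy * per_site_difference(sx, sy)
        for (sx, fx), (sy, fy) in product(pop_x.items(), pop_y.items())
    )


# Hypothetical aligned sequences and frequencies for two samples.
sample_a = {"ACGT": 1.0}
sample_b = {"ACGA": 0.5, "ACGT": 0.5}
divergence = dxy(sample_a, sample_b)
```

With a function of this kind, a pairwise matrix over all samples could be built by calling `dxy` on every pair, which is the form of result summarized in Fig 5.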

In this study, we also found that V6 always presented high heterogeneity at the [...] Table.

Library construction and next-generation sequencing
The four subfragment amplicons corresponding to each sample were mixed in equimolar amounts into one pool to produce a separate library, using a barcode to distinguish each sample. Library construction and sequencing were performed by the