Identification of Functional Candidates amongst Hypothetical Proteins of Treponema pallidum ssp. pallidum

  • Ahmad Abu Turab Naqvi,

    Affiliation Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi—110025, India

  • Mohd Shahbaaz,

    Affiliation Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi—110025, India

  • Faizan Ahmad,

    Affiliation Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi—110025, India

  • Md. Imtaiyaz Hassan

    Affiliation Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi—110025, India


14 May 2018: Naqvi AAT, Shahbaaz M, Ahmad F, Hassan MI (2018) Correction: Identification of Functional Candidates amongst Hypothetical Proteins of Treponema pallidum ssp. pallidum. PLOS ONE 13(5): e0197452. View correction


Abstract

Syphilis is a globally occurring venereal disease that is propagated through sexual contact. The causative agent of syphilis, Treponema pallidum ssp. pallidum, a Gram-negative spirochaete, is an obligate human parasite. The genome of the T. pallidum ssp. pallidum SS14 strain (RefSeq NC_010741.1) encodes 1,027 proteins, of which 444 are hypothetical proteins (HPs), i.e., proteins of unknown function. Here, we performed functional annotation of the HPs of T. pallidum ssp. pallidum using various databases, domain architecture predictors, protein function annotators and clustering tools. We analyzed the sequences of all 444 HPs and predicted the functions of 207 HPs with a high level of confidence; the functions of the remaining 237 HPs were predicted with lower confidence. Among the annotated HPs we found various enzymes, transporters and binding proteins that may facilitate the survival of the pathogen and are therefore possible molecular targets. Our comprehensive analysis helps to understand the mechanism of pathogenesis and may point to novel therapeutic interventions.


Introduction

Treponema pallidum ssp. pallidum has been experimentally established as the cause of venereal syphilis, a globally occurring sexually transmitted disease (STD) [1–4]. T. pallidum ssp. pallidum is a Gram-negative bacterium classified as a member of the family Spirochaetaceae [5]. Syphilis is most frequently transmitted through sexual contact, which has resulted in the pandemic spread of the disease [6]. The primary effects of infection appear as skin lesions at the site of infection [4]. The secondary and tertiary stages of syphilis are considered potentially lethal because of the prevalence of the organism in the body of the host [7,8]. The burden of the disease is severe: the World Health Organization reported 12 million new cases of venereal syphilis in 1999, most of them in developing countries [4].

The SS14 strain of T. pallidum ssp. pallidum was first isolated from the skin lesion of a patient with secondary syphilis [2,9]. The genome sequence of T. pallidum ssp. pallidum is available in the NCBI database and contains 1,087 genes encoding 1,027 proteins. Among these, the functions of 444 proteins have not been experimentally determined so far, and these proteins are termed hypothetical proteins (HPs). A hypothetical protein is one predicted to be encoded by an identified open reading frame, but for which no protein product has been confirmed or characterized [10]. However, HPs possibly play important roles in the survival of the pathogen, and hence in disease progression [10,11]. T. pallidum ssp. pallidum is very difficult to work with experimentally because it depends completely on a mammalian host to survive. The genomic sequence of T. pallidum ssp. pallidum therefore offers a wealth of basic information that can be analyzed further [3]. Functions of HPs from several pathogenic organisms have already been reported using sequence- and structure-based methods [11–14].

The sequenced genome of T. pallidum ssp. pallidum was used in our study to explore the functions of these HPs using well-optimized bioinformatics tools described elsewhere [15]. To predict the functions of HPs with high confidence, their sequences were retrieved from the NCBI and analyzed using various bioinformatics tools for the prediction of physicochemical properties, sub-cellular localization, sequence similarity, virulence factors, etc. Moreover, HPs may act as virulence factors, which can be predicted by bioinformatics tools and targeted further in structure-based rational drug design [16–20]. The predicted functions of HPs were further validated using receiver operating characteristic (ROC) analysis, a statistical technique that helps to assess the performance of the bioinformatics tools used. We believe that such analyses expand our knowledge of the functional roles of HPs of T. pallidum ssp. pallidum and provide an opportunity to discover novel potential drug targets [21].

Materials and Methods

Here we used our well-optimized series of tools for the functional annotation of HPs [11,15,22]. The sequences of all HPs were obtained from the NCBI. The sequences of all 444 HPs were retrieved in FASTA format from the UniProt database using their primary accession numbers.
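As an illustration of how retrieved FASTA records might be loaded for downstream analysis, the following is a minimal sketch. The function name `parse_fasta` and the toy records are assumptions made for this example; they are not part of the pipeline described here, and the toy sequences are not real T. pallidum proteins.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.

    The identifier is taken as the first whitespace-delimited token
    after '>' (an assumption; header layouts differ between databases).
    """
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (hypothetical identifiers and sequences).
toy_fasta = """>hp_example_1 hypothetical protein
MKVLAAGIV
KLTE
>hp_example_2 hypothetical protein
MSTNPKPQR
"""
toy_records = parse_fasta(toy_fasta)
```

Sequences that wrap over several lines are joined into a single string, so the length of each dictionary value is the protein length in residues.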

Analysis of physicochemical properties

Physicochemical parameters of all HPs were analyzed using Expasy’s ProtParam server. This online server computes theoretical values of various physicochemical parameters such as molecular mass, isoelectric point, extinction coefficient, instability index, aliphatic index and grand average of hydropathicity (GRAVY). The predicted properties of HPs are listed in the S1 Table.
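Two of these parameters are simple enough to compute directly, which may help in reading the S1 Table. The sketch below assumes the usual definitions: GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percentage of each residue. The function names are ours, and the code is an illustration rather than a substitute for the server.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide (hypothetical, for illustration only).
toy_peptide = "MKVLAAGIV"
toy_gravy = gravy(toy_peptide)
toy_aliphatic_index = aliphatic_index(toy_peptide)
```

A positive GRAVY value indicates a predominantly hydrophobic sequence and a negative value a hydrophilic one; non-standard residue codes raise a `KeyError` in this sketch rather than being silently ignored.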

Sub-cellular localization

The precise estimation of the sub-cellular localization of a protein (such as cytoplasm, periplasm, inner membrane, outer membrane or extracellular space) is helpful in predicting its function at the cellular level. Previous studies suggest that proteins present in the cytoplasm can serve as drug targets, whereas membrane proteins found on the surface are considered vaccine targets [23]. An array of online subcellular localization tools was used to predict the location of HPs in T. pallidum ssp. pallidum. PSORTb, CELLO (v2.5) and PSLpred are effective tools for predicting the subcellular localization of a protein. SignalP 4.1 was used to predict signal peptide cleavage sites. SecretomeP 2.0 was used to predict non-classical protein secretion, i.e., signal peptide independent secretion. TMHMM and HMMTOP were used to predict transmembrane helices, which helps in the identification of membrane proteins. Detailed information on subcellular localization is listed in S2 Table.
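When several localization predictors are run on the same protein, their outputs have to be reconciled. The sketch below shows one possible way to do this, a simple majority vote; this rule, the function name `consensus_localization` and the example predictions are assumptions for illustration, since the text does not state how disagreements between the tools were resolved.

```python
from collections import Counter


def consensus_localization(predictions):
    """Return the compartment predicted by the most tools.

    `predictions` maps a tool name to its predicted compartment.
    Returns None when there are no predictions or the top vote is tied.
    """
    votes = Counter(predictions.values())
    ranked = votes.most_common()  # sorted by descending vote count
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no consensus under this simple rule
    return ranked[0][0]


# Hypothetical predictions for a single protein.
example_predictions = {
    "PSORTb": "cytoplasm",
    "CELLO": "cytoplasm",
    "PSLpred": "inner membrane",
}
example_consensus = consensus_localization(example_predictions)
```

Returning `None` on a tie keeps ambiguous cases visible so that they can be inspected by hand, for example against the transmembrane helix predictions.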

Sequence comparisons

To search for known functional homologues of the HPs, we performed sequence similarity searches using BLASTp against the non-redundant (nr) protein database. We also performed HMM-based similarity searches using HMMSCAN, a module of the HMMER server used to search for similar domains and families. It works as an interface for searching the Pfam, TIGRFAMs, Gene3D and Superfamily databases of protein families and domains. Results of the sequence comparisons are listed in the S3 Table.

Domain and function assignment

Proteins are classified into families and superfamilies on the basis of their sequence, structure and function by various protein classification tools such as CATH and SCOP. Here, we used a variety of tools to predict the functions of HPs. We used PANTHER, a database that classifies proteins into families and subfamilies and provides GO-based function assignment. Furthermore, the Pfam database was used to predict the function of proteins based on sequence similarity. We also performed protein classification by clustering, using SYSTERS and ProtoNet. SYSTERS is a protein family database that uses BLASTp to search for similar sequences and provides clusters of proteins formed on the basis of functional similarity, whereas ProtoNet provides a hierarchical classification of proteins. The CDART tool, which searches the query sequence against the Conserved Domain Database (CDD), was used to find conserved domains in HPs. We also analyzed HPs using the Simple Modular Architecture Research Tool (SMART), which predicts the function of a protein based on its domain architecture. Motif searches in protein sequences were carried out using InterProScan, which searches various available databases for function prediction. Results of function prediction based on these tools are listed in the S4 Table.

Virulence factor analysis

Identification of bacterial virulence factors can help to understand the mechanism of pathogenesis and search for potential therapeutic targets [23,24]. We used VICMpred [25] and VirulentPred [26] for identification of HPs which may be responsible for virulence in the T. pallidum ssp. pallidum. Virulent HPs from T. pallidum ssp. pallidum are listed in the S5 Table.

Prediction of protein interaction network

Functional association among proteins is necessary to complete any biological process; knowledge of protein-protein interactions is therefore also helpful for predicting the function of a protein. Here we used STRING (version 9.1) [27] to predict the proteins that interact with HPs and hence the involvement of the HPs in particular metabolic processes.

Performance assessment

The predicted functions of HPs from the genome of T. pallidum ssp. pallidum were validated using receiver operating characteristic (ROC) analysis. This statistical analysis was performed using 100 protein sequences with known function (S6 Table). The functions of these proteins were predicted using the same pipeline adopted for the annotation of the HPs. The diagnostic efficacy was evaluated at six levels. True positive and true negative predictions were coded with the binary numerals "0" or "1". In addition, confidence ratings of 1, 2, 3, 4 and 5 were adopted. The average accuracy of the pipeline was found to be 93.91% (S8 Table). The ROC analysis indicates high reliability of the bioinformatics tools used here (S7 and S8 Tables).
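For readers unfamiliar with this kind of assessment, the sketch below shows how accuracy, sensitivity, specificity and ROC area can be derived from binary outcomes paired with confidence ratings on a 1 to 5 scale. It is a generic illustration under our own assumptions (the function names, the rating threshold and the toy data are invented here); it does not reproduce the values in the S6 to S8 Tables, and the original analysis may have been carried out with different software.

```python
def roc_auc(labels, ratings):
    """Area under the ROC curve via the Mann-Whitney statistic.

    `labels` are 1 (correct prediction) or 0 (incorrect prediction);
    `ratings` are confidence scores. Tied ratings count as half a win.
    """
    positives = [r for lab, r in zip(labels, ratings) if lab == 1]
    negatives = [r for lab, r in zip(labels, ratings) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def confusion_metrics(labels, ratings, threshold):
    """Accuracy, sensitivity and specificity, calling rating >= threshold positive."""
    pairs = list(zip(labels, ratings))
    tp = sum(1 for lab, r in pairs if lab == 1 and r >= threshold)
    fn = sum(1 for lab, r in pairs if lab == 1 and r < threshold)
    tn = sum(1 for lab, r in pairs if lab == 0 and r < threshold)
    fp = sum(1 for lab, r in pairs if lab == 0 and r >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (hypothetical): six predictions with confidence ratings 1-5.
toy_labels = [1, 1, 1, 1, 0, 0]
toy_ratings = [5, 4, 4, 2, 3, 1]
toy_auc = roc_auc(toy_labels, toy_ratings)
toy_metrics = confusion_metrics(toy_labels, toy_ratings, threshold=3)
```

The ROC area summarizes performance over all possible rating thresholds, whereas accuracy, sensitivity and specificity depend on the single threshold chosen.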

The level of confidence for each prediction was assigned on the basis of the number of tools predicting a similar function. For a particular HP, if the same function was clearly predicted by four or more tools, the prediction was considered to have a high level of confidence. If the function was predicted by fewer than four tools, the HP was not included in Table 1; such low-confidence predictions are provided separately in the S9 Table.
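The four-tool rule can be expressed compactly. In the sketch below, the function name `assign_confidence`, the exact string matching of function labels and the example tool outputs are assumptions for illustration; in practice, deciding whether two tools report "a similar function" requires manual or curated comparison of their outputs.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

HIGH_CONFIDENCE_MIN_TOOLS = 4  # "four or more tools", as stated in the text


def assign_confidence(
    tool_predictions: Dict[str, Optional[str]]
) -> Tuple[Optional[str], str]:
    """Return (most supported function, 'high' or 'low').

    `tool_predictions` maps a tool name to the function label it reported,
    or to None when the tool returned no hit.
    """
    votes = Counter(label for label in tool_predictions.values() if label)
    if not votes:
        return None, "low"
    function, support = votes.most_common(1)[0]
    level = "high" if support >= HIGH_CONFIDENCE_MIN_TOOLS else "low"
    return function, level


# Hypothetical tool outputs for one protein.
example_hits = {
    "Pfam": "phosphoribosyl transferase",
    "SMART": "phosphoribosyl transferase",
    "InterProScan": "phosphoribosyl transferase",
    "PANTHER": "phosphoribosyl transferase",
    "CDART": None,
}
example_call = assign_confidence(example_hits)
```

Under this rule a protein supported by exactly three concordant tools falls into the low-confidence group, however strong the individual hits are.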

Results and Discussion

The genome of the SS14 strain was sequenced to high accuracy by Matejková et al. [2] in 2008 using an oligonucleotide array strategy. However, errors in key features such as start codons (alternate or otherwise) and stop codons (due to sequencing errors) were observed. Recently, the complete genome sequence of the TPA Mexico A strain was reported by Pětrošová et al. [28] using the Illumina sequencing technique. Moreover, a recent report on the resequencing of T. pallidum ssp. pallidum strains Nichols and SS14 identified errors in 11.5% of all annotated genes, which were subsequently corrected [29]. Hence, we assume that the genome sequence of T. pallidum ssp. pallidum available in the database is free from experimental sequencing errors. Extensive sequence analysis of all 444 HPs with the above-mentioned tools allowed us to assign functions to 207 HPs with high confidence (Table 1). We also predicted functions for 237 HPs with a low level of confidence (S9 Table). We annotated the functions of these HPs using protein classification databases such as CATH, Superfamily, Pfam, PANTHER and SYSTERS. Recent experimental analyses of the T. pallidum ssp. pallidum genome (Nichols strain) provide evidence that supports most of the predictions of this work [30]. All of these studies were performed using the Nichols strain, which shows slight variations from the SS14 strain of T. pallidum ssp. pallidum [2]. Despite slight variations in some regions, we found substantial agreement between the data provided by these studies and the functions predicted in the present work. We categorized the 207 HPs into functional classes comprising 83 enzymes, 58 binding proteins, 28 transporters, 21 proteins involved in various cellular processes such as regulatory mechanisms, and 17 proteins exhibiting miscellaneous functions (Fig 1). These functional classes are described below.

Fig 1. Classification of 207 HPs into various groups by utilizing the functional annotation results of various bioinformatics tools.

The chart shows that, among the 207 HPs from T. pallidum ssp. pallidum, there are 83 enzymes, 28 proteins involved in transport, 58 binding proteins, 21 proteins involved in cellular processes such as transcription, translation and replication, and 17 proteins with miscellaneous functions.
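The class counts can be checked against the total of 207 and converted to percentage shares with a few lines of arithmetic. This is only a convenience sketch; the variable names are ours and the counts are those given in the figure legend.

```python
# Counts per functional class, as given in the legend of Fig 1.
class_counts = {
    "enzymes": 83,
    "binding proteins": 58,
    "transporters": 28,
    "cellular processes": 21,
    "miscellaneous": 17,
}

total_annotated = sum(class_counts.values())

# Share of each class among the annotated HPs, in percent (one decimal place).
class_shares = {
    name: round(100 * count / total_annotated, 1)
    for name, count in class_counts.items()
}
```

The five classes add up to the 207 HPs annotated with high confidence, and enzymes account for roughly 40% of them.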


Enzymes

Enzymes play a vital role in many key biochemical processes. About 40% of the annotated HPs are enzymes. T. pallidum ssp. pallidum is an obligate parasite and therefore depends on the host for most of its nutritional requirements [4]. Enzymes may facilitate its survival in the host by carrying out various cellular processes that keep it viable over the course of infection.

We found six oxidoreductases among these HPs of T. pallidum ssp. pallidum. These enzymes presumably play an essential role in the pathogenesis. B2S298 (HP TPASS_0151) is NADH-quinone reductase (NQR2/RnfD) which regulates expression of virulence factors in Vibrio cholerae [31]. It is also involved in sodium translocation and electron transport [31]. Most of the oxidoreductases are involved in iron-sulphur cluster transport [31].

There are 27 HPs predicted as transferases. Many members of this class are involved in lipid biosynthesis, RNA processing and other significant cellular processes, and may thus contribute to bacterial pathogenesis and virulence. There are various kinases such as B2S2P4 (HP TPASS_0296), which take part in coenzyme A biosynthesis [32]. B2S1Z8 (HP TPASS_0050) is predicted to be a phosphoribosyl transferase. Members of the PRTase family are involved in DNA processing and nucleotide metabolism [33]. Titz et al. [30] reported a similar function for the TP0050 gene product of the Nichols strain of T. pallidum ssp. pallidum, which shows significant similarity with HP TPASS_0050. B2S2Q5 (HP TPASS_0307) is a PASTA domain containing protein; this domain is found in penicillin binding proteins and serine/threonine kinases and has a special affinity for β-lactam antibiotics [34]. McKevitt et al. [35], in their study of T. pallidum ssp. pallidum (Nichols strain) antigens, predicted TP0307 as a conserved hypothetical protein. They also characterized TP0750 and TP0494 as conserved HPs [35]. In the present work, we have assigned functions to their homologues in the SS14 strain, i.e. HP TPASS_0750 (B2S3Y0) and HP TPASS_0494 (B2S389), as nicotinate-nucleotide adenylyltransferase and zinc ribbon domain containing protein, respectively. B2S389 (HP TPASS_0494) and B2S3H9 (HP TPASS_0592) exhibit DNA-directed polymerase activity, suggesting a role in bacterial pathogenesis through regulatory processes. B2S492 (HP TPASS_0860) is a HAMP domain containing protein; HAMP is a characteristic domain of signal transduction proteins and helps in signal conversion [36].

The third class of enzymes is the hydrolases, which account for more than 50% of all characterized enzymes. The majority of the proteins in this class are membrane-bound proteins involved in significant processes such as transmembrane transport, metal ion binding and cell wall degradation, and are thus associated with various virulence factors. A number of proteins with peptidase activity contain a LysM domain, which is responsible for cell wall degradation in prokaryotes [37] and helps various transmembrane transporters to carry out their functions. There are six phosphohydrolases in this group. They contain a conserved HD motif, which is characteristic of signal transduction systems [38], and have metal ion binding properties [39]. We found that B2S4K0 (HP TPASS_0963) and B2S4K9 (HP TPASS_0972) exhibit antibiotic resistance capacity and are involved in macrolide antibiotic transport [40]. Titz et al. [30] predicted TP0936, a counterpart of HP TPASS_0963 in the Nichols strain, as an ABC transporter and described its involvement in membrane biogenesis. We predicted HP TPASS_0444 (B2S340) as a peptidoglycan-binding protein. The homologue of HP TPASS_0444 in the Nichols strain (TP0444) was predicted as a conserved HP in the above-mentioned study. We have assigned the function of glycoprotease to HP TPASS_0877, the SS14 homologue of TP0877, which was characterized as a conserved HP in the gene expression analysis of Smajs et al. [41].

Lyases also play a key role in bacterial pathogenesis as they are involved in various biosynthetic processes. B2S3A6 (HP TPASS_0512) shows 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase activity and is involved in isoprenoid synthesis. It may act as a potential drug target [42].


Transporters

Transporter proteins are involved in the transport of nutrients that are needed for various metabolic processes, and hence for the survival of the organism. These proteins also facilitate the transfer of virulence factors and are directly involved in infection [43]. We found 28 proteins functioning as transporters, possibly involved in the transport of metal ions, virulence factors and biosynthesis assembly proteins. Some of the HPs are members of the ABC transporter class. B2S3C6 (HP TPASS_0534) is a V-type ATP synthase (subunit C) that may be involved in ATP synthesis and hence in providing energy for various metabolic processes of the pathogen [44]. B2S3F9 (HP TPASS_0567) is an MgtE N-terminal domain containing protein and helps in magnesium transport [45]. McKevitt et al. and Smajs et al. characterized its counterpart (TP0567) as an HP in their experimental studies [35,41]. Similarly, B2S3G4 (HP TPASS_0580) is an FMN-binding domain protein found to be involved in the electron transfer pathway [46]. Titz et al. [30] predicted the gene product of the Nichols strain (TP0580) as an ABC transporter, whereas Smajs et al. [41] characterized it as a conserved hypothetical integral membrane protein. B2S3L4 (HP TPASS_0625) is an outer membrane protein (OmpA) that works as a receptor for T-even like phages. It also acts as a porin with low permeability, allowing penetration of small solutes [47]. B2S460 (HP TPASS_0826) is predicted to be a mechanosensitive ion channel, which allows efflux of solvent and solutes from the cytoplasm and is thus significant for the survival of the pathogen [48]. B2S478 (HP TPASS_0846) contains a major facilitator superfamily domain and represents a class of membrane transporters involved in the transport of sugars, amino acids, drugs, various metabolites and a variety of ions [49]. B2S4D8 (HP TPASS_0906) and B2S4M3 (HP TPASS_0986) are multidrug transporters that exhibit multiple drug resistance capability, thus keeping the pathogen viable in the presence of drugs [50]. A detailed understanding of the functional mechanism of all these transporters will help in discovering effective drugs against them.

Binding proteins

We have characterized 58 of the 207 functionally annotated HPs as binding proteins. We further divided these into 13 DNA binding, nine RNA binding, 31 protein binding, three ion binding and two adhesion proteins. The DNA and RNA binding proteins are involved in various cellular and regulatory processes such as transcription, translation and recombination, and thus play a vital role in the survival and propagation of the pathogen in the host. Thirty-one HPs are protein binding in nature, and 29 of them are tetratricopeptide repeat (TPR) containing proteins. TPR containing proteins are involved in protein-protein interactions and various metabolic and regulatory processes, and thus play an important role in virulence [51]. B2S214 (HP TPASS_0066) and B2S215 (HP TPASS_0067) are tetratricopeptide repeat containing proteins. Titz et al. [30] predicted their homologues in the Nichols strain (TP0066 and TP0067) to be involved in DNA metabolism. Homologues of the proteins predicted here to contain tetratricopeptide repeats were characterized as HPs by the McKevitt and Smajs groups [35,41]. Therefore, proteins showing 100% similarity may be considered to exhibit similar functions in the Nichols strain, which lends experimental support to these predictions. We found that B2S2J3 (HP TPASS_0246) and B2S3Y9 (HP TPASS_0752) show similarity with von Willebrand factor, which contains a type A domain and has been found to be responsible for various blood disorders [52–54]. The presence of the type A domain makes these proteins likely to be involved in significant activities such as cell adhesion and immune defense [55]. Thus, such HPs may be possible therapeutic targets because they contribute to bacterial pathogenesis through cell adhesion and immune defense mechanisms.

Cellular processes/regulatory proteins

There are 21 HPs presumably involved in various cellular and regulatory mechanisms that are important for the pathogenesis of T. pallidum ssp. pallidum. Most of these proteins are involved in cell division, chromosome segregation and condensation, sporulation and intercellular signaling, and the group also includes various flagellar proteins involved in transport activity. These proteins may also be important for bacterial pathogenesis and can be treated as possible drug targets [56]. B2S2P5 (HP TPASS_0297) is presumably involved in sporulation and cell division. Titz et al. [30] predicted the involvement of its counterpart TP0297 (Nichols strain) in cell wall metabolism. B2S3T0 (HP TPASS_0702) is the prokaryotic chromosome segregation/condensation protein ScpA, whereas its homologue in the Nichols strain (TP0702) was characterized as an HP in the study of the T. pallidum ssp. pallidum transcriptome by Smajs et al. [41].

Proteins with miscellaneous functions

We found 17 HPs exhibiting miscellaneous functions such as cell signaling and solvent tolerance. B2S234 (HP TPASS_0086) is a PilZ domain containing protein that serves as a receptor for cyclic di-GMP, which acts as a second messenger in bacteria [57,58]. Cyclic di-GMP is involved in the regulation of exo-polysaccharide synthesis, bacterial motility, gene expression and host-pathogen interaction [57,58]. Hence, these HPs may also be significant in the pathogenesis of T. pallidum ssp. pallidum. B2S3A9 (HP TPASS_0515) and B2S424 (HP TPASS_0796) are organic solvent tolerance proteins responsible for antibiotic resistance [59]. Smajs et al. [41] characterized the homologue of HP TPASS_0796 in the Nichols strain (TP0796) as a conserved HP. B2S3B5 (HP TPASS_0522) is a colicin V production protein; colicin V is a bacterial toxin that disrupts the membrane potential of other sensitive cells, leading to their death [60]. B2S3F5 (HP TPASS_0563) is a DnaJ domain containing protein; this domain is an exclusive feature of the hsp40 family of molecular chaperones [61]. These molecular chaperones are involved in significant processes such as protein folding, polypeptide translocation and protein degradation [61]. Knowledge of these HPs will be helpful in drug discovery by completing the mosaic of knowledge regarding host-pathogen interaction, especially in the case of T. pallidum ssp. pallidum.

We compared the group of HPs annotated with high confidence (Table 1) with the group not annotated with high confidence (S9 Table). For the comparison, we considered several characteristic features: average gene length, the number of predicted protein-protein interactions, gene expression level and predicted antigens. Surprisingly, a difference between the average gene lengths of the two groups was observed. The average length of the polypeptide chains in the second group is less than 40 amino acids, which corresponds to a gene length of 120 bp, whereas in the group of HPs predicted with a high level of confidence (n = 207) the average gene length is relatively high. We infer that the relatively short gene lengths have affected the confidence level of the second group.
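The relation between polypeptide length and gene length used above is simply three nucleotides per codon. The following sketch makes the conversion explicit and shows how mean lengths of two groups of sequences could be compared; the function names and the toy sequences are assumptions for illustration and are not data from this study.

```python
from statistics import mean


def coding_length_bp(protein_length_aa, include_stop_codon=False):
    """Length of the coding sequence in base pairs (3 bp per codon)."""
    codons = protein_length_aa + (1 if include_stop_codon else 0)
    return 3 * codons


def mean_protein_length(sequences):
    """Mean length, in residues, of a collection of protein sequences."""
    return mean(len(seq) for seq in sequences)


# Toy groups (hypothetical sequences).
group_one = ["MKVLAAGIVKLTEQ", "MSTNPKPQRKTKRNTNRRPQ"]
group_two = ["MKV", "MSTNP"]
difference_aa = mean_protein_length(group_one) - mean_protein_length(group_two)
```

With this convention a 40-residue polypeptide corresponds to 120 bp, or 123 bp if the stop codon is counted.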

We further used STRING [27] to predict protein-protein interactions. Comparing the number of predicted protein-protein interactions in the two groups, we found no characteristic difference that could affect the confidence level of function prediction. For instance, STRING predicted 10 functional partners for HP TPASS_0017 (B2S1W5), whereas it predicted four functional partners for HP TPASS_0004 (B2S1V4), an HP from the group for which functions were assigned with a low level of confidence. It predicted only two functional partners for HP TPASS_0022 (B2S1X0), which is from the first group, whereas it predicted 10 functional partners for HP TPASS_0008 (B2S1V7), an HP from the second group.

We checked the expression levels of genes from both groups on the basis of the study of Smajs et al. [41] and did not find any correlation with the confidence level of prediction. We also checked the number of predicted antigens using the investigation of T. pallidum antigens by McKevitt et al. [35]. We found 17 predicted antigens in the group of HPs for which functions were predicted with a high level of confidence. Contrary to expectations, we found a relatively higher number of predicted antigens, i.e., 24, in the second group. The comparison of the two groups in terms of gene length, predicted protein-protein interactions, gene expression levels and predicted antigens established no characteristic difference except for gene length, which is relatively low in the second group (n = 237). It should be noted, however, that no differences between the group of genes with predicted function and the group of genes with less accurately predicted function are observed when these results are compared with previously published experimental studies [35,41]. This may suggest that the degree of prediction accuracy does not necessarily allow functional genes to be identified unequivocally and has to be interpreted with caution.

Virulent proteins

Gram-negative pathogens have frequently evolved features such as increased motility and cell adhesion, as well as mechanisms to counter the immune response of the host, thus increasing their virulence inside the host environment [62]. We used the VICMpred and VirulentPred servers to predict virulence factors in this group of 444 HPs. Nineteen HPs (out of 207) were found to be virulent on the basis of the consensus sequence analysis (Table 2). It has already been hypothesized that targeting virulence factors provides a better therapeutic intervention against bacterial pathogenesis [63]. HPs predicted to have virulence characteristics are potential targets for therapies designed to clear an existing infection; such therapies could also serve as adjuncts to existing antibiotics or as potentiators of the host immune response [64]. Recently reported proof-of-concept results for antivirulence molecules at the preclinical stage should allow the antivirulence concept to become a reality as a new antibacterial approach.

Table 2. List of HPs with virulence factors in T. pallidum ssp. pallidum.


Conclusions

Functional annotation of 444 HPs from T. pallidum ssp. pallidum was carried out using various in silico approaches, and functions were assigned to 207 HPs with high confidence. The performance of the bioinformatics tools was assessed using ROC analysis and reported in terms of the accuracy and sensitivity of the predicting tools. We do not consider further the HPs annotated with a low level of confidence. Our predictions indicate the functional importance of the HPs for the survival of the pathogen in the host. Our study facilitates rapid identification of the hidden functions of HPs, which are potential therapeutic targets and may contribute to a better understanding of host-pathogen interactions. Once these HPs are established as novel drug or vaccine targets, further research on new inhibitors and vaccines can be conducted.

Supporting Information

S1 Table. List of computed physicochemical properties of 444 HPs from T. pallidum ssp. pallidum.


S2 Table. List of predicted subcellular localizations of 444 HPs from T. pallidum ssp. pallidum.


S3 Table. List of predicted results of Blast, STRING, HMMER, SMART and INTERPROSCAN for 444 HPs from T. pallidum ssp. pallidum.


S4 Table. List of predicted results of CATH, SUPERFAMILY, PANTHER, CDART, Pfam, SYSTERS and ProtoNet for 444 HPs from T. pallidum ssp. pallidum.


S5 Table. List of predicted virulence factors among 444 HPs from T. pallidum ssp. pallidum using VICMpred and VirulentPred.


S6 Table. List of annotated function of 100 proteins with known function from T. pallidum ssp. pallidum using BLASTp, HMMER, SMART and INTERPROSCAN for ROC analysis.


S7 Table. List of functionally annotated domain of 100 proteins with known function from T. pallidum ssp. pallidum by CATH, SUPERFAMILY, PANTHER, CDART, Pfam, SYSTERS, and ProtoNet for ROC analysis.


S8 Table. List of accuracy, sensitivity, specificity and ROC area of various bioinformatics tools.


S9 Table. Functionally annotated HPs from T. pallidum ssp. pallidum with low level of confidence.


Author Contributions

Conceived and designed the experiments: AATN MIH. Performed the experiments: AATN MS MIH. Analyzed the data: AATN MS FA MIH. Contributed reagents/materials/analysis tools: AATN MIH. Wrote the paper: AATN FA MIH. Contributed to workstation management: MS MIH.


  1. McGill MA, Edmondson DG, Carroll JA, Cook RG, Orkiszewski RS, Norris SJ (2010) Characterization and serologic analysis of the Treponema pallidum proteome. Infection and immunity 78: 2631–2643. pmid:20385758
  2. Matejkova P, Strouhal M, Smajs D, Norris SJ, Palzkill T, Petrosino JF, et al. (2008) Complete genome sequence of Treponema pallidum ssp. pallidum strain SS14 determined with oligonucleotide arrays. BMC Microbiol 8: 76. pmid:18482458
  3. Fraser CM, Norris SJ, Weinstock GM, White O, Sutton GG, Dodson R, et al. (1998) Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281: 375–388. pmid:9665876
  4. Peeling RW, Hook EW 3rd (2006) The pathogenesis of syphilis: the Great Mimicker, revisited. The Journal of pathology 208: 224–232. pmid:16362988
  5. Paster BJ, Dewhirst FE (2000) Phylogenetic foundation of spirochetes. J Mol Microbiol Biotechnol 2: 341–344. pmid:11075904
  6. Hertel M, Matter D, Schmidt-Westhausen AM, Bornstein MM (2014) Oral syphilis: a series of 5 cases. J Oral Maxillofac Surg 72: 338–345. pmid:24045192
  7. Singh AE, Romanowski B (1999) Syphilis: review with emphasis on clinical, epidemiologic, and some biologic features. Clin Microbiol Rev 12: 187–209. pmid:10194456
  8. Abell E, Marks R, Jones EW (1975) Secondary syphilis: a clinico-pathological review. Br J Dermatol 93: 53–61. pmid:1191529
  9. Stamm LV, Kerner TC Jr., Bankaitis VA, Bassford PJ Jr. (1983) Identification and preliminary characterization of Treponema pallidum protein antigens expressed in Escherichia coli. Infection and immunity 41: 709–721. pmid:6347894
  10. Desler C, Suravajhala P, Sanderhoff M, Rasmussen M, Rasmussen L (2009) In Silico screening for functional candidates amongst hypothetical proteins. BMC Bioinformatics 10: 289. pmid:19754976
  11. Kumar K, Prakash A, Tasleem M, Islam A, Ahmad F, Hassan MI (2014) Functional annotation of putative hypothetical proteins from Candida dubliniensis. Gene 543: 93–100. pmid:24704023
  12. Shahbaaz M, Ahmad F, Imtaiyaz Hassan M (2014) Structure-based functional annotation of putative conserved proteins having lyase activity from Haemophilus influenzae. 3 Biotech: 1–20.
  13. Kumar K, Prakash A, Islam A, Ahmad F, Hassan MI (2014) Structure based Functional Annotation of Hypothetical Proteins from Candida dubliniensis: A Quest for Novel Drug Target. 3Biotech (In Press).
  14. Sinha A, Ahmad F, Hassan MI (2014) Structure Based Functional Annotation of Putative Conserved Proteins from Treponema pallidum: Search for a Potential Drug Target. Letters in Drug Design & Discovery 12: 46–59.
  15. Shahbaaz M, Hassan MI, Ahmad F (2013) Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PloS one 8: e84263. pmid:24391926
  16. Hassan MI, Kumar V, Singh TP, Yadav S (2007) Structural model of human PSA: a target for prostate cancer therapy. Chem Biol Drug Des 70: 261–267. pmid:17718721
  17. Hassan MI, Kumar V, Somvanshi RK, Dey S, Singh TP, Yadav S (2007) Structure-guided design of peptidic ligand for human prostate specific antigen. J Pept Sci 13: 849–855. pmid:17890654
  18. Thakur P, Kumar J, Ray D, Anjum F, Hassan MI (2013) Search of potential inhibitor against New Delhi metallo-beta-lactamase 1 from a series of antibacterial natural compounds using docking approach. J Nat Sci Biol Med.
  19. Thakur PK, Hassan I (2011) Discovering a potent small molecule inhibitor for gankyrin using de novo drug design approach. Int J Comput Biol Drug Des 4: 373–386. pmid:22199037
  20. Thakur PK, Prakash A, Khan P, Fleming RE, Waheed A, Ahmad F, et al. (2013) Identification of Interfacial Residues Involved in Hepcidin-Ferroportin Interaction. Lett Drug Des Discov 11: 363–374.
  21. 21. Sinha A, Ahmad F, Hassan MI (2014) Structure Based Functional Annotation of Putative Conserved Proteins from Treponema pallidum: Search for a Potential Drug Target. Letters in Drug Design & Discovery (In Press).
  22. 22. Mazandu GK, Mulder NJ (2012) Function Prediction and Analysis of Mycobacterium tuberculosis Hypothetical Proteins. Int J Mol Sci 13: 7283–7302. pmid:22837694
  23. 23. Vetrivel U, Subramanian G, Dorairaj S (2011) A novel in silico approach to identify potential therapeutic targets in human bacterial pathogens. Hugo J 5: 25–34. pmid:23205162
  24. 24. Zheng LL, Li YX, Ding J, Guo XK, Feng KY, Wang YJ, et al. (2012) A comparison of computational methods for identifying virulence factors. PLoS One 7: e42517. pmid:22880014
  25. 25. Saha S, Raghava GP (2006) VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition. Genomics Proteomics Bioinformatics 4: 42–47. pmid:16689701
  26. 26. Garg A, Gupta D (2008) VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics 9: 62. pmid:18226234
  27. 27. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41: D808–815. pmid:23203871
  28. 28. Petrosova H, Zobanikova M, Cejkova D, Mikalova L, Pospisilova P, Strouhal M, et al. (2012) Whole genome sequence of Treponema pallidum ssp. pallidum, strain Mexico A, suggests recombination between yaws and syphilis strains. PLoS Negl Trop Dis 6: e1832. pmid:23029591
  29. 29. Petrosova H, Pospisilova P, Strouhal M, Cejkova D, Zobanikova M, Mikalova L, et al. (2013) Resequencing of Treponema pallidum ssp. pallidum strains Nichols and SS14: correction of sequencing errors resulted in increased separation of syphilis treponeme subclusters. PloS one 8: e74319. pmid:24058545
  30. 30. Titz B, Rajagopala SV, Goll J, Hauser R, McKevitt MT, Palzkill T, et al. (2008) The binary protein interactome of Treponema pallidum—the syphilis spirochete. PLoS One 3: e2292. pmid:18509523
  31. 31. Hase CC, Mekalanos JJ (1999) Effects of changes in membrane sodium flux on virulence gene expression in Vibrio cholerae. Proceedings of the National Academy of Sciences of the United States of America 96: 3183–3187. pmid:10077658
  32. 32. Obmolova G, Teplyakov A, Bonander N, Eisenstein E, Howard AJ, Gilliland GL (2001) Crystal structure of dephospho-coenzyme A kinase from Haemophilus influenzae. Journal of structural biology 136: 119–125. pmid:11886213
  33. 33. Anantharaman V, Iyer LM, Aravind L (2012) Ter-dependent stress response systems: novel pathways related to metal sensing, production of a nucleoside-like metabolite, and DNA-processing. Molecular bioSystems 8: 3142–3165. pmid:23044854
  34. 34. Yeats C, Finn RD, Bateman A (2002) The PASTA domain: a beta-lactam-binding domain. Trends in biochemical sciences 27: 438. pmid:12217513
  35. 35. McKevitt M, Brinkman MB, McLoughlin M, Perez C, Howell JK, Weinstock GM, et al. (2005) Genome scale identification of Treponema pallidum antigens. Infection and immunity 73: 4445–4450. pmid:15972547
  36. 36. Stewart V (2014) The HAMP signal-conversion domain: static two-state or dynamic three-state? Molecular microbiology 91: 853–857. pmid:24417364
  37. 37. Joris B, Englebert S, Chu CP, Kariyama R, Daneo-Moore L, Shockman GD, et al. (1992) Modular design of the Enterococcus hirae muramidase-2 and Streptococcus faecalis autolysin. FEMS microbiology letters 70: 257–264. pmid:1352512
  38. 38. Galperin MY, Natale DA, Aravind L, Koonin EV (1999) A specialized version of the HD hydrolase domain implicated in signal transduction. Journal of molecular microbiology and biotechnology 1: 303–305. pmid:10943560
  39. 39. Aravind L, Koonin EV (1998) The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends in biochemical sciences 23: 469–472. pmid:9868367
  40. 40. Lin HT, Bavro VN, Barrera NP, Frankish HM, Velamakanni S, van Veen HW, et al. (2009) MacB ABC transporter is a dimer whose ATPase activity and macrolide-binding capacity are regulated by the membrane fusion protein MacA. J Biol Chem 284: 1145–1154. pmid:18955484
  41. 41. Smajs D, McKevitt M, Howell JK, Norris SJ, Cai WW, Palzkill T, et al. (2005) Transcriptome of Treponema pallidum: gene expression profile during experimental rabbit infection. Journal of bacteriology 187: 1866–1874. pmid:15716460
  42. 42. Kishida H, Wada T, Unzai S, Kuzuyama T, Takagi M, Terada T, et al. (2003) Structure and catalytic mechanism of 2-C-methyl-D-erythritol 2,4-cyclodiphosphate (MECDP) synthase, an enzyme in the non-mevalonate pathway of isoprenoid synthesis. Acta crystallographica Section D, Biological crystallography 59: 23–31. pmid:12499535
  43. 43. Franke K, Nguyen M, Hartl A, Dahse HM, Vogl G, Wurzner R, et al. (2006) The vesicle transport protein Vac1p is required for virulence of Candida albicans. Microbiology 152: 3111–3121. pmid:17005990
  44. 44. Rappas M, Niwa H, Zhang X (2004) Mechanisms of ATPases—a multi-disciplinary approach. Curr Protein Pept Sci 5: 89–105. pmid:15078220
  45. 45. Hattori M, Tanaka Y, Fukai S, Ishitani R, Nureki O (2007) Crystal structure of the MgtE Mg2+ transporter. Nature 448: 1072–1075. pmid:17700703
  46. 46. Liepinsh E, Kitamura M, Murakami T, Nakaya T, Otting G (1997) Pathway of chymotrypsin evolution suggested by the structure of the FMN-binding protein from Desulfovibrio vulgaris (Miyazaki F). Nature structural biology 4: 975–979. pmid:9406543
  47. 47. MacIntyre S, Henning U (1990) The role of the mature part of secretory proteins in translocation across the plasma membrane and in regulation of their synthesis in Escherichia coli. Biochimie 72: 157–167. pmid:1974149
  48. 48. Naismith JH, Booth IR (2012) Bacterial mechanosensitive channels—MscS: evolution's solution to creating sensitivity in function. Annual review of biophysics 41: 157–177. pmid:22404681
  49. 49. Pao SS, Paulsen IT, Saier MH Jr. (1998) Major facilitator superfamily. Microbiology and molecular biology reviews: MMBR 62: 1–34. pmid:9529885
  50. 50. Ninio S, Rotem D, Schuldiner S (2001) Functional analysis of novel multidrug transporters from human pathogens. The Journal of biological chemistry 276: 48250–48256. pmid:11574548
  51. 51. Cerveny L, Straskova A, Dankova V, Hartlova A, Ceckova M, Staud F, et al. (2013) Tetratricopeptide repeat motifs in the world of bacterial pathogens: role in virulence mechanisms. Infection and immunity 81: 629–635. pmid:23264049
  52. 52. Ruggeri ZM, Ware J (1993) von Willebrand factor. FASEB J 7: 308–316. pmid:8440408
  53. 53. Ahmad F, Jan R, Kannan M, Obser T, Hassan MI, Oyen F, et al. (2013) Characterisation of mutations and molecular studies of type 2 von Willebrand disease. Thromb Haemost 109: 39–46. pmid:23179108
  54. 54. Hassan MI, Saxena A, Ahmad F (2012) Structure and function of von Willebrand factor. Blood Coagul Fibrinolysis 23: 11–22. pmid:22089939
  55. 55. Colombatti A, Bonaldo P, Doliana R (1993) Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins. Matrix 13: 297–306. pmid:8412987
  56. 56. Singer HM, Kuhne C, Deditius JA, Hughes KT, Erhardt M (2014) The Salmonella Spi1 virulence regulatory protein HilD directly activates transcription of the flagellar master operon flhDC. J Bacteriol 196: 1448–1457. pmid:24488311
  57. 57. Amikam D, Galperin MY (2006) PilZ domain is part of the bacterial c-di-GMP binding protein. Bioinformatics 22: 3–6. pmid:16249258
  58. 58. Ryjenkov DA, Simm R, Romling U, Gomelsky M (2006) The PilZ domain is a receptor for the second messenger c-di-GMP: the PilZ domain protein YcgR controls motility in enterobacteria. J Biol Chem 281: 30310–30314. pmid:16920715
  59. 59. Pourahmad Jaktaji R, Ebadi R, Karimi M (2012) Study of Organic Solvent Tolerance and Increased Antibiotic Resistance Properties in E. coli gyrA Mutants. Iranian journal of pharmaceutical research: IJPR 11: 595–600. pmid:24250484
  60. 60. Yang CC, Konisky J (1984) Colicin V-treated Escherichia coli does not generate membrane potential. J Bacteriol 158: 757–759. pmid:6373733
  61. 61. Frydman J (2001) Folding of newly translated proteins in vivo: the role of molecular chaperones. Annu Rev Biochem 70: 603–647. pmid:11395418
  62. 62. Livorsi DJ, Stenehjem E, Stephens DS (2011) Virulence factors of gram-negative bacteria in sepsis with a focus on Neisseria meningitidis. Contributions to microbiology 17: 31–47. pmid:21659746
  63. 63. Clatworthy AE, Pierson E, Hung DT (2007) Targeting virulence: a new paradigm for antimicrobial therapy. Nat Chem Biol 3: 541–548. pmid:17710100
  64. 64. Marra A (2006) Targeting virulence for antibacterial chemotherapy: identifying and characterising virulence factors for lead discovery. Drugs R D 7: 1–16. pmid:16620133