Corynebacterium diphtheriae (Cd) is a Gram-positive human pathogen responsible for diphtheria and was once a leading cause of mortality worldwide. Fatalities gradually decreased with improved living standards and declined further when immunization programs were introduced. However, numerous drug-resistant strains have emerged recently, decreasing the efficacy of current therapeutics and vaccines and obliging the scientific community to investigate new therapeutic targets in pathogenic microorganisms. In this study, our contributions include the prediction of the modelome of 13 C. diphtheriae strains using the MHOLline workflow. A set of 463 conserved proteins was identified by combining the results of pangenomics-based core-genome and core-modelome analyses. Further, using subtractive proteomics and modelomics approaches for target identification, a set of 23 proteins was selected as essential for the bacteria. Considering human as the host, eight of these proteins (glpX, nusB, rpsH, hisE, smpB, bioB, DIP1084, and DIP0983) were considered essential and non-host homologs, and were subjected to virtual screening using four different compound libraries (extracted from the ZINC database, plant-derived natural compounds and Di-terpenoid Iso-steviol derivatives). The proposed ligand molecules showed favorable interactions, low energy values and high complementarity with the predicted targets. Our proposed approach expedites the selection of putative C. diphtheriae proteins for the broad-spectrum development of novel drugs and vaccines, since some of these targets have already been identified and validated in other organisms.
Citation: Jamal SB, Hassan SS, Tiwari S, Viana MV, Benevides LdJ, Ullah A, et al. (2017) An integrative in-silico approach for therapeutic target identification in the human pathogen Corynebacterium diphtheriae. PLoS ONE 12(10): e0186401. https://doi.org/10.1371/journal.pone.0186401
Editor: Alexandre G. de Brevern, UMR-S1134, INSERM, Université Paris Diderot, INTS, FRANCE
Received: December 6, 2016; Accepted: September 29, 2017; Published: October 19, 2017
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: The study was supported by a grant from the TWAS-CNPq Postgraduate Fellowship Programme (https://twas.org/opportunity/twas-cnpq-postgraduate-fellowship-programme), which granted a fellowship for doctoral studies, and by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil: http://www.capes.gov.br/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: BLAST, Basic Local Alignment Search Tool; Cd, Corynebacterium diphtheriae; DNA, Deoxyribonucleic acid; LDC, Lysine decarboxylase; MVD, Molegro Virtual Docker; PDB, Protein Data Bank; RNA, Ribonucleic acid; NP, Natural Product
Corynebacterium diphtheriae is responsible for causing diphtheria, which remains a major global cause of death (http://www.who.int/immunization_monitoring/diseases/diphteria/), and has conventionally been divided into four biovars, i.e., gravis, intermedius, mitis and belfanti, based on biochemical characteristics according to Funke et al., 1997 and Whitman et al., 2012. It was once a major cause of infant mortality, which spread as an epidemic and resulted in thousands of deaths. The death rates dropped over time, specifically in countries where living standards improved, and declined rapidly after the introduction of immunization programs. Despite these measures, it remains a significant pathogen around the globe, even today. A variety of mechanisms were responsible for such death rates; for example, the ‘strangling angel’ effect on children that arose from the wing-shaped pseudo-membranes formed in the oropharynx. Dislodgement and impaction of these pseudo-membranes triggers acute airway obstruction and can result in sudden death [3, 4]. Since numerous cases of both non-lethal and lethal diphtheria have been reported across various countries in the past few years, and significant population displacements in the form of immigration are occurring, more such cases are likely to follow. Adequate handling requires rapid progress in discovering diphtheria antitoxin and antibiotic treatments.
Computational methods and other approaches, like reverse vaccinology, have been established for the rapid identification of novel targets in the post-genomic era [6, 7]. Approaches like subtractive and comparative microbial genomics, as well as differential genome analysis, are being used for the identification of targets in a number of human pathogens like M. tuberculosis, Burkholderia pseudomallei, Helicobacter pylori, Pseudomonas aeruginosa, Neisseria gonorrhoeae and Salmonella typhi.
The main principle is to find genes/proteins that are essential for the pathogen and possess no homologous counterpart in the host, such that drugs targeting these “pathogen-essential non-host homologs” can be applied with few (or no) off-target effects in the host. Some pathogen-essential proteins may nevertheless possess a certain degree of homology to host proteins. They might still be selected as potential molecular targets for structure-based selective inhibitor development, because significant differences in the active sites or in other druggable pockets might exist, such that the pathogen protein could still be targeted selectively [16, 17].
Here, we apply an integrative in silico approach to the predicted proteome of C. diphtheriae to associate genomic information with the identification of putative therapeutic targets based on their three-dimensional structure. The approach can be used to identify potent inhibitors, which might lead to the discovery of compounds that inhibit pathogen growth. The predicted proteomes from the 13 genomes of C. diphtheriae were modeled (pan-modelome) using the MHOLline workflow as proposed by Hassan et al., 2014. Furthermore, intra-species conserved proteins with adequate 3D models (core-modelome) were filtered on the basis of predicted essentiality for the bacteria, which led to the identification of eight essential bacterial proteins. They were found to be non-homologous to all host proteins and were subjected to virtual screening using multiple compound libraries.
We provide a list of putative targets in C. diphtheriae and possible mechanisms to design peptide vaccines, and we suggest novel lead, natural and drug-like compounds that could bind to the proposed target proteins.
Materials and methods
Thirteen C. diphtheriae strains, covering three of the four biovars (gravis, mitis and belfanti; Table 1), were included in this study. The gene and protein sequences of these thirteen C. diphtheriae strains were retrieved from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria). The different steps involved in this computational approach for genome-scale modelome prediction and for the prioritization of putative drug and vaccine targets are given in Figs 1 & 2.
The table represents the total number of protein sequences as an input data fed to the MHOLline workflow (upper red arrow). The blue arrow represents the core genes of thirteen Cd strains. The rectangular boxes show how this workflow processes and filters a large quantity of genomic data for putative drug and vaccine target identification of a pathogen.
Prediction of core-modelome and identification of core genome
To construct the core-modelome of C. diphtheriae, we followed a slightly modified version of the protocol described by Hassan et al., 2014. The high-throughput structural modeling workflow MHOLline (http://www.mholline.lncc.br) was used to predict the modelome (the whole-proteome set of protein 3D models) for each strain. MHOLline uses a comparative modeling approach for protein 3D structure prediction through MODELLER. Our workflow also includes BLASTp (Basic Local Alignment Search Tool for Protein), HMMTOP (Prediction of transmembrane helices and topology of proteins), BATS (Blast Automatic Targeting for Structures), FILTERS, ECNGet (Get Enzyme Commission Number), MODELLER, and PROCHECK.
MHOLline works on the basis of available templates. It is therefore probable that MHOLline cannot detect all the common conserved proteins when no template is available. To address this, we also used EDGAR (an Efficient Database framework for comparative Genome Analyses using BLAST score Ratios for pan-genomics analysis) to collect the common conserved genome of all Cd strains. Later, the results from MHOLline and EDGAR were compared and crosschecked to obtain the final dataset of common conserved proteins.
Identification of intra-species conserved proteins
First, for the identification of highly conserved proteins with available 3D models in all Cd strains (≥ 95% sequence identity), the standalone release of NCBI BLASTp+ (v2.2.26) was obtained from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) and installed on a local machine. A search was then performed using NCTC13129 as a randomly chosen reference genome for all strains. A comparative genomics/proteomics approach was next adopted for selecting the highly conserved proteins using an all-against-all BLASTp analysis with a cut-off value of E = 0.0001, as in many other essentiality studies before [6, 13, 15, 18, 24].
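The conservation filter described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the BLASTp results were written in the standard 12-column tabular format (`-outfmt 6`), with one result set per strain searched against the reference, and the names `Hit`, `parse_tabular` and `conserved_in_all` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str       # reference protein (e.g. from NCTC13129)
    subject: str     # matching protein in the compared strain
    identity: float  # percent identity
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> list[Hit]:
    """Parse BLAST tabular output (12 default columns of -outfmt 6)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        # columns: qseqid sseqid pident ... evalue (index 10) bitscore (index 11)
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11])))
    return hits


def conserved_in_all(hits_by_strain: dict[str, list[Hit]],
                     min_identity: float = 95.0,
                     max_evalue: float = 1e-4) -> set[str]:
    """Return reference proteins with a qualifying hit in every strain."""
    conserved = None
    for hits in hits_by_strain.values():
        passing = {h.query for h in hits
                   if h.identity >= min_identity and h.evalue <= max_evalue}
        conserved = passing if conserved is None else conserved & passing
    return conserved or set()
```

Intersecting the per-strain sets keeps only proteins that pass both cutoffs in every compared strain, which is the sense in which a protein is treated as conserved here.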
Essential and non-host homologous (ENH) protein targets
A subtractive genomics approach was next followed for the selection of conserved targets that were essential to the bacteria. Concisely, the set of proteins derived from the core-modelome of C. diphtheriae was subjected to the Database of Essential Genes (DEG) for homology analyses. The DEG encompasses experimentally validated data of currently available essential genomic elements like protein-coding genes and non-coding RNAs, from bacteria, archaea and eukaryotes. For a bacterium, essential genes form a minimal genome, i.e., a set of functional modules that has key roles in the emerging field of synthetic biology. The cutoff values used for BLASTp were: E-value = 0.0001, bit score ≥ 100 and identity ≥ 25% [15, 18].
The pool of essential genes was then subjected to NCBI-BLASTp (E-value = 0.0001, bit score ≥ 100 and identity ≥ 25%) against the human genome for filtering pathogen-essential host-homologs. The remaining set of pathogen-essential non-host homologs was additionally crosschecked with the NCBI-BLASTp PDB database using the default values to find any remote structural similarity with the existing host homolog protein structures, keeping the cutoff level to ≤ 15% for query coverage. The biochemical pathways of these proteins were checked using KEGG (Kyoto Encyclopedia of Genes and Genomes), functionality using UniProt (Universal Protein Resource), virulence using PAIDB (Pathogenicity island database), and cellular localization using CELLO (subCELlular LOcalization predictor). The final list of targets was based on criteria described by Barh et al., 2011 & Hassan et al., 2014 [15, 18].
Essential and host homologous (EH) protein targets
We further extended our analyses to the set of protein targets that were essential to C. diphtheriae but homologous to host proteins. The essential protein targets deviating from the cutoff values for essential non-host homologous proteins were treated as host homologous proteins. This set of targets was also checked for pathway involvement, functional annotation, virulence, and cellular localization as mentioned above.
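The two subtractive steps (essentiality against DEG, then host homology against human) amount to a simple partition of the core set. The sketch below assumes each protein's BLASTp hits are available as dictionaries with `evalue`, `bitscore` and `identity` keys; this data layout and the function names are hypothetical, and only the cutoffs are taken from the text.

```python
def passes(hit: dict, max_evalue: float = 1e-4,
           min_bitscore: float = 100.0, min_identity: float = 25.0) -> bool:
    """Apply the stated cutoffs: E-value, bit score and percent identity."""
    return (hit["evalue"] <= max_evalue
            and hit["bitscore"] >= min_bitscore
            and hit["identity"] >= min_identity)


def classify(core: set[str],
             deg_hits: dict[str, list[dict]],
             human_hits: dict[str, list[dict]]) -> tuple[set[str], set[str]]:
    """Split core proteins into essential non-host (ENH) and essential host (EH) homologs."""
    # Essential: at least one qualifying hit against DEG.
    essential = {p for p in core if any(passes(h) for h in deg_hits.get(p, []))}
    # Host homologous: essential and at least one qualifying hit against human.
    eh = {p for p in essential if any(passes(h) for h in human_hits.get(p, []))}
    enh = essential - eh
    return enh, eh
```

Proteins without a qualifying DEG hit drop out entirely, so the ENH and EH sets together account for every essential protein exactly once.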
Computational identification of druggable pockets
The information obtained from 3D structures and druggability analyses is important for prioritizing and validating putative pathogen targets [30, 31]. For the druggability analyses, the final list of essential non-host and host homologous protein targets was submitted to DoGSiteScorer in PDB format. DoGSiteScorer is an automated pocket detection and analysis tool for calculating the druggability of protein cavities. For each detected cavity, the tool returns the pocket residues and a druggability score ranging from 0 to 1. Values closer to 1 indicate a highly druggable protein cavity, i.e., the predicted cavity is likely to bind ligands with high affinity. DoGSiteScorer also calculates volume, depth, surface area, lipophilic surface, and further parameters for each predicted cavity.
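Selecting pockets by druggability score can be illustrated with a short sketch. The `Pocket` record and its field names are hypothetical stand-ins for whatever is exported from DoGSiteScorer, and the 0.80 threshold is the one used later in the docking setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pocket:
    name: str          # pocket label (hypothetical field)
    drug_score: float  # druggability score in [0, 1]
    volume: float      # pocket volume (hypothetical field)


def druggable(pockets: list[Pocket], threshold: float = 0.80) -> list[Pocket]:
    """Keep pockets whose druggability score reaches the threshold."""
    return [p for p in pockets if p.drug_score >= threshold]


def best_pocket(pockets: list[Pocket]) -> Optional[Pocket]:
    """Return the pocket with the highest druggability score, if any."""
    return max(pockets, key=lambda p: p.drug_score) if pockets else None
```

Keeping the threshold filter separate from the best-pocket choice matters for targets with no cavity above the threshold, where the highest-scoring cavity can still be taken forward.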
Ligand libraries preparation, virtual screening and docking analyses
The ligand libraries were prepared from four different sources: two sets of compounds from the ZINC database (ZINC drug-like molecules and ZINC Natural Product), natural compounds from a literature survey, and the Di-terpenoid Iso-steviol derivatives (S1 Table). The ZINC drug-like library contains 11,193 molecules, with a Tanimoto cutoff level of 60%, and the ZINC Natural Product library contains 11,203 molecules. The small library of natural compounds contained 28 molecules and the library of Di-terpenoid Iso-steviol derivatives contained 31 molecules. The structures of these molecules were constructed using the MOE-Builder tool. The 3D structures were modeled and partial charges were calculated using MOE (Molecular Operating Environment). The energies of the modeled molecules were minimized using the energy minimization algorithm of MOE (gradient: 0.05, force field: MMFF94X, chiral constraint). The modeled molecules were saved in the .mol2 file format and subjected to docking analysis.
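For readers unfamiliar with the Tanimoto cutoff mentioned for the drug-like library, the following sketch shows the coefficient on binary fingerprints and one simple way a similarity cutoff can be used to thin a library. It assumes fingerprints are given as sets of on-bit indices; the greedy selection is an illustrative assumption and is not claimed to be the procedure used to build the ZINC subsets.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def diverse_subset(fingerprints: dict[str, frozenset],
                   cutoff: float = 0.60) -> list[str]:
    """Greedily keep molecules whose similarity to every kept one is below the cutoff."""
    kept: list[str] = []
    for name, fp in fingerprints.items():
        if all(tanimoto(fp, fingerprints[k]) < cutoff for k in kept):
            kept.append(name)
    return kept
```

With a cutoff of 0.60, any two molecules in the returned subset share fewer than 60% of their combined fingerprint bits.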
The 3D structures of the proteins were examined for structural errors such as missing atoms, wrong bonds and protonation states in MVD (Molegro Virtual Docker). The consensus set of protein cavities and those predicted with DoGSiteScorer (druggability ≥ 0.80) were compared with the MVD-detected cavities for all Cd targets. The DoGSiteScorer residues falling within the cavities detected by MVD were merged, and the final grid for docking was generated based on the consensus between the highest scoring pocket from DoGSiteScorer and the cavities detected by MVD. The most druggable cavity was subjected to virtual screening using MVD. The program comprises three search algorithms for molecular docking analyses, namely MolDock Simplex Evolution (SE), MolDock Optimizer and Iterated Simplex (IS). We employed the MolDock Optimizer search algorithm, which is based on a differential evolutionary algorithm, using the default parameters: a) population size = 50, b) scaling factor = 0.5 and c) crossover rate = 0.9. The orientations of docked molecules from the library of natural compounds and from the derivatives of Di-terpenoid Iso-steviol were analyzed in Chimera. The 200 top ranked compounds (ZINC drug-like molecules, ZINC Natural Product) for each target protein were evaluated for shape complementarity and hydrogen bond interactions. This led to the selection of a final set of compounds with polypharmacology and polypharmacy characteristics for target proteins in C. diphtheriae.
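To make the role of the three search parameters concrete, the sketch below implements a generic differential evolution loop (the textbook rand/1/bin variant) with the stated defaults. It is not the MolDock Optimizer implementation: the `score` function stands in for a docking score, and `bounds`, `generations` and `seed` are assumptions introduced for illustration.

```python
import random
from typing import Callable, Sequence


def differential_evolution(score: Callable[[Sequence[float]], float],
                           bounds: Sequence[tuple[float, float]],
                           pop_size: int = 50,      # population size
                           scaling: float = 0.5,    # scaling factor F
                           crossover: float = 0.9,  # crossover rate CR
                           generations: int = 200,
                           seed: int = 0) -> tuple[list[float], float]:
    """Minimize `score` over the box `bounds` with rand/1/bin differential evolution."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # ensure at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    v = pop[a][d] + scaling * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clamp to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            f = score(trial)
            if f <= fitness[i]:  # greedy selection: keep the better solution
                pop[i], fitness[i] = trial, f
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]
```

In a docking setting the vector being optimized would encode ligand position, orientation and torsions, and lower scores would correspond to more favorable predicted poses.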
Results and discussion
Modelome prediction and conserved targets identification in C. diphtheriae
Among 13 strains of C. diphtheriae species, our employed methodology produced high-confidence 3D structural models from orthologous proteins in C. diphtheriae species through the efficient MHOLline workflow (Fig 3). A comparative structural genomics approach was followed where all the G2 sequences classified as “Very High”, “High”, “Good” and “Medium to Good quality” by MHOLline, from the 12 Cd strains, were aligned to the Cd NCTC13129 strain as a reference genome. First, we identified a set of common conserved proteins with a pre-defined sequence similarity of 95–100%. This resulted in a set of 463 protein sequences, being conserved in all Cd strains (S3 Table).
Predicted proteomes from the genomes of 13 C. diphtheriae strains were fed to the MHOLline workflow in FASTA format. The grey bars represent the number of input data. The remaining bars (MHOLline output data) show the number of not aligned sequences (G0, green bars), sequences for which there is a template structure available at RCSB PDB (blue bars), and sequences with acceptable template structures that were modeled in the MHOLline workflow (G2, red bars).
Protein targets as putative drug and vaccine candidates
The identification of essential proteins in C. diphtheriae was carried out where the core-modelome was compared to DEG (Database of Essential Genes). This filter drastically reduced the number of selected targets to 23 final targets. Further comparison of the corresponding protein sequences to the human host proteome resulted in a set of 8 targets as essential non-host homologous (ENH, Table 2) and a set of 15 targets as essential host homologous proteins (EH, Table 3).
Prioritization parameters for drug targets and vaccine candidates
There are several factors that can aid in determining potential therapeutic targets. For vaccine candidates, information about subcellular localization is important: proteins that contain transmembrane motifs are favored [24, 30, 38, 39]. The 23 essential proteins have a low molecular weight and all are localized in the cytoplasmic compartment of C. diphtheriae (Tables 2 & 3). After the druggability evaluation using DoGSiteScorer for both essential non-host and host homologous conserved targets from C. diphtheriae, we could predict at least one druggable cavity for each Cd target. Using host homologous proteins as therapeutic targets could adversely affect the host. Therefore, the first step in numerous in silico drug target identification approaches is to filter out proteins homologous to the host proteome. Thus, we only considered the eight pathogen-essential non-host homologs for the docking studies [13, 15, 40]. For the eight pathogen-essential non-host homologs (S2 Table) glpX, nusB, rpsH, hisE, DIP1084, DIP0983, smpB, and bioB, 3, 0, 1, 0, 2, 0, 1 and 3 cavities with score > 0.80 were predicted, respectively. The cavity of each protein exhibiting the highest druggability score was subjected to docking analyses. The numbers of predicted cavities with their respective druggability scores are given in Tables 2 & 3.
The identified eight non-host homologous and essential Cd proteins could be novel therapeutic targets for Corynebacterium diphtheriae.
To our knowledge, the glpX, hisE and bioB proteins have been reported as potential drug targets in Mycobacterium tuberculosis (Mtb). The protein nusB is a member of the Nus transcription factor family, which helps bacteria in elongation, transcription-translation coupling and termination. Some members of this family (nusG) have already been reported as drug targets. Furthermore, rpsH and smpB were also reported as potential drug targets by Folador et al., 2016 in their in silico study. Protein DIP1084 is a putative iron transport membrane protein (FecCD-family) and DIP0983 is an uncharacterized hypothetical protein that needs to be characterized experimentally. Hence, these proteins could be good therapeutic targets against Cd.
Virtual screening and molecular docking
For each target protein (glpX, nusB, rpsH, hisE, DIP1084, DIP0983, smpB, and bioB) four different libraries were separately screened. A total of 28 molecules from natural compounds library and 31 compounds from the derivatives of Di-terpenoid Iso-steviol library were docked. Furthermore, top 200 drug-like molecules from virtual screening analyses of two large libraries (ZINC drug-like molecules, ZINC Natural Product) were examined one-by-one for the selection of the final set of promising molecules that showed favorable interactions with the ENH targets. The biological importance and an analysis of the predicted protein-ligand interaction/s for each target are described here. The molecule names, ZINC codes and MolDock scores for the selected ligands, as well as the number of predicted hydrogen bonds with the protein cavity residues involved in these interactions, are shown below (Tables 4–11) for each target protein. The predicted binding modes of selected ligands are also shown for each pathogen target in Figs 5–12.
Validation of docking protocol
To validate the accuracy of the MolDock program (MVD), the co-crystallized ligand of biotin synthase, bioB (PDB ID: 1R30), was extracted and then re-docked into the binding pocket of the receptor protein. The RMSD between the docked and co-crystallized ligand was found to be 1.81 Å, which shows that the adopted docking protocol is valid and can be used to correctly predict the binding pose of the ligands [35, 42]. The superposition of the co-crystallized and docked ligands is shown in Fig 4.
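The re-docking check relies on the root-mean-square deviation between matched atoms of the two poses. A minimal sketch follows, assuming both poses are given as equally ordered lists of (x, y, z) coordinates in Å in the same receptor frame, so no superposition step is applied; the function name and input layout are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matched atom order."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))
```

A value at or below about 2 Å is a commonly used criterion for a successfully reproduced pose, which is consistent with the interpretation of the 1.81 Å result above.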
NP_939302.1 (glpX, Fructose 1,6-bisphosphatase II) is a key enzyme of gluconeogenesis and catalyzes the hydrolysis of fructose 1,6-bisphosphate to form fructose 6-phosphate and orthophosphate. The reverse reaction is catalyzed by phosphofructokinase in glycolysis, and the product, fructose 6-phosphate, is an important precursor in various biosynthetic pathways. In all organisms, gluconeogenesis is an important metabolic pathway that allows the cells to synthesize glucose from non-carbohydrate precursors, such as organic acids, amino acids and glycerol. FBPases are members of the large superfamily of lithium-sensitive phosphatases, which includes three families of inositol phosphatases and FBPases (the phosphoesterase clan CL0171, 3167 sequences, Pfam database). The FBPases have already been reported as targets for the development of drugs for the treatment of non-insulin-dependent diabetes [44, 45]. Based on a comparison with a crystallographic structure of the glpX template (PDB ID: 1NI9, GlpX from Escherichia coli), no active site residues could be identified. The docking analysis was therefore performed using the highest scoring pocket obtained from DoGSiteScorer. Table 4 shows a set of 10 promising ligands from the four aforementioned libraries, selected according to their minimum energy values and the maximum number of hydrogen bond interactions. Compounds ZINC67912153, ZINC13142972, Jacarandic Acid and 16-hydrazonisosteviol are shown in Fig 5.
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_939302.1 (glpX, Fructose 1,6-bisphosphatase II) with Jacarandic Acid (CID 73645). A-II: 3D surface representation of the docking analyses for the structures of Jacarandic Acid with glpX protein. Figs B-I, II, C-I, II & D-I, II represent same information for compounds 16-hydrazonisosteviol, ZINC13142972 and ZINC67912153 respectively, for the same protein cavity.
NP_939692.1 (nusB, Transcription antitermination protein NusB) is a prokaryotic transcription factor involved in antitermination processes, during which it interacts with the boxA portion of the mRNA nut site. The crystal structures of the M. tuberculosis and E. coli NusB proteins suggest that the basic N-terminal region of the molecule associates with the rRNA BoxA. Hypothetically, this is indicative of the so-called arginine-rich RNA-binding motif (ARM) found in the bacteriophage N protein, HIV Tat and HIV Rev. This suggestion is supported by the presence of a phosphate-binding site at the N-terminal end of α-A in each NusB protomer that includes a pair of conserved arginines, Arg10 and Arg14. Bismuth-dithiol solutions have been shown to selectively inhibit the Escherichia coli rho transcription termination factor. A comparison between the crystallographic structure of the NusB template (PDB ID: 1EYV, NusB from M. tuberculosis) and our modeled structure reveals that the conserved arginines are located at positions 12 and 16 (Arg12 and Arg16) and are likely to contribute to the interactions. Although none of these residues are predicted to form hydrogen bonds with the selected docked ligands, these molecules were predicted to interact with other residues in the pocket. Table 5 shows the 8 selected ligands from all four libraries according to their minimum energy values and the number of hydrogen bond interactions. The compounds ZINC15043210, ZINC00053531, Jacarandic Acid and 16-hydrazonisosteviol are shown in Fig 6. A favorable binding mode and good shape complementarity were observed in these complexes.
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_939692.1 (nusB, Transcription antitermination protein NusB) with Jacarandic Acid (CID 73645). A-II: 3D surface representation of the docking analyses for the structures of Jacarandic Acid with nusB protein. Figs B-I, II, C-I, II and D-I, II represent same information for compounds 16-hydrazonisosteviol, ZINC00053531 and ZINC15043210 respectively, for the same protein cavity.
NP_938900.1 (rpsH, 30S ribosomal protein S8) is an important RNA-binding protein that occupies a central position within the small ribosomal subunit. It interacts extensively with 16S rRNA and is vital for the correct folding of the central domain of the rRNA. The S8 protein also controls the synthesis of numerous ribosomal proteins by binding to mRNA. It binds specifically to very similar sites in the two RNA molecules. It is a medium-sized ribosomal protein, and its role as a significant primary RNA-binding protein in the 30S subunit was discovered recently. Mutations within the S8 protein have been shown to result in defective ribosome assembly. In Escherichia coli, the S8-binding site within 16S rRNA has been investigated independently by a number of techniques including nuclease protection, RNA–protein crosslinking, RNA modification, hydroxyl-radical footprinting and chemical probing. The S8 protein is also one of the principal regulatory elements that control ribosomal protein synthesis by the translational feedback inhibition mechanism discovered by Nomura and colleagues. It regulates the expression of the spc operon that encodes, in order, the ten ribosomal proteins L14, L24, L5, S14, S8, L6, L18, S5, L30 and L15. The active site residues of rpsH, based on a comparison with its template structure, were Arg86, Tyr88, Ser107, Ser109, Gly124, Gly125 and Glu126. None of the molecules interacts with these residues (Table 6); nonetheless, they are predicted to interact with other residues of the binding cavity predicted by DoGSiteScorer. The predicted binding modes of the best scoring compounds from each library (ZINC35457686, ZINC15221730, Jacarandic Acid and 17-hydroxyisosteviol) are shown in Fig 7.
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_938900.1 (rpsH, 30S ribosomal protein S8) with Jacarandic Acid (CID 73645). A-II: 3D surface representation of the docking analyses for the structures of Jacarandic Acid with rpsH protein. Figs B-I, II, C-I, II and D-I, II represent same information for compounds 17-hydroxyisosteviol, ZINC15221730 and ZINC35457686 respectively, for the same cavity.
NP_938502.1 (bioB, Biotin synthase) catalyzes the final step in the biotin biosynthetic pathway by converting dethiobiotin (DTB) to biotin. This reaction uses organic radical chemistry to insert a sulfur atom between the non-activated carbons C6 and C9 of DTB. BioB is a member of the “radical SAM” or “AdoMet radical” superfamily, which is characterized by the presence of a conserved CxxxCxxC sequence motif (C, Cys; x, any amino acid) that coordinates an essential Fe4S4 cluster, as well as by the use of S-adenosyl-L-methionine (SAM or AdoMet) for radical generation. AdoMet radical enzymes act on a wide variety of biomolecules. For example, BioB and lipoyl-acyl carrier protein synthase (LipA) are involved in vitamin biosynthesis; lysine 2,3-aminomutase (LAM) facilitates the fermentation of lysine; class III ribonucleotide reductase (RNR) and pyruvate formate lyase (PFL) catalyze the formation of glycyl radicals in their respective target proteins; and spore photoproduct lyase repairs ultraviolet light-induced DNA damage. The protein bioB was reported as a putative drug target in C. diphtheriae by Barh et al., 2011 in their in silico study. A comparison between our modeled protein and the template structures suggests Cys86, Cys90, Cys93 and Arg291 as the active residues. Although only Cys86, Cys90 and Cys93 were found to interact with the compounds from our prepared libraries, the molecules were also predicted to interact with other residues in the pocket. The binding modes of the compounds with active site residues, together with their low scores, suggest a set of 10 molecules (Table 7) as promising leads from our four libraries. The predicted binding modes of Jacarandic Acid, 16-oxime, 17-hydroxyisosteviol, ZINC16952914 and ZINC77269615 are shown in Fig 8.
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_938502.1 (bioB, Biotin synthase) with Rhein (CID 10168). A-II: 3D surface representation of the docking analyses for the structure of Rhein with bioB protein. Figs B-I, II, C-I, II & D-I, II represent same information for compounds 16-oxime, 17-hydroxyisosteviol, ZINC16952914 and ZINC77269615 respectively, for the same protein cavity.
NP_939612.1 (hisE, Phosphoribosyl-ATP pyrophosphatase) is the second enzyme in the histidine-biosynthetic pathway, irreversibly hydrolyzing phosphoribosyl-ATP to phosphoribosyl-AMP and pyrophosphate. It is encoded by the hisE gene, which is present as a separate gene in many bacteria and archaea but is fused to hisI in other bacteria, fungi and plants. Because it is essential for growth in in vitro experiments, HisE is a potential drug target for tuberculosis. A comparison of the template and target protein structures showed that there was no reported information about ligand-residue association in the active site cavity. Hence, the cavity chosen for virtual screening was simply the one that presented the highest DoGSiteScorer druggability score (> 0.80). A list of the best docked molecules is shown below (Table 8). The binding patterns of Jacarandic Acid, 16–17 dihydroxyisosteviol, ZINC05809437 and ZINC67913372 are shown in Fig 9.
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_939612.1 (hisE, Phosphoribosyl-ATP pyrophosphatase) with Jacarandic Acid (CID 73645). A-II: 3D surface representation of the docking analyses for the structure of Jacarandic Acid with hisE protein. Figs B-I, II, C-I, II & D-I, II represent same information for compounds 16–17 dihydroxyisosteviol, ZINC05809437 and ZINC67913372 respectively, for the same protein cavity.
NP_939123.1 (smpB, SsrA-binding protein) is a small protein B (SmpB), which is very useful for biological functions of tmRNA. In bacteria, a hybrid RNA molecule that combines the functions of both messenger and transfer RNAs rescues stalled ribosomes, and targets aberrant, partially synthesized proteins for proteolytic degradation. The flexible RNA molecule adopts an open L-shaped conformation and SmpB binds to its elbow region, stabilizing the single-stranded D-loop in an extended conformation. The most prominent feature of the structure of tmRNAΔ is a 90o rotation of the TѰC-arm around the helical axis. Because of this important conformation, the SmpB–tmRNA D-complex positioned into the A-site of the ribosome orients SmpB towards the small ribosomal subunit, and directs tmRNA towards the elongation-factor binding region of the ribosome. The tmRNA–SmpB rescue system is ubiquitous in bacteria, and is also found in some chloroplasts and mitochondria . In this case the template structure (PDB ID: 1P6V) did not contain any ligand, and no reported information was found about the ligand-residue interaction in their cavities. Therefore, amongst the cavities identified by MVD, the best cavity for docking analysis was chosen in consensus with highest druggability score from the DogSiteScorer. ZINC31168211 was found to form the network of 12 hydrogen bonds with Asn9, Ser16, Val49, Ser50, Thr52, Asp53, Ser54, Thr109. Table 9 lists top compounds from respective libraries selected for this target while the binding modes of Rhein, 16-hydroxyisosteviol, ZINC01414475 and ZINC31168211 are also shown (Fig 10).
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_939123.1 (smpB, SsrA-binding protein) with Rhein (CID 10168). A-II: 3D surface representation of the docking analyses for the structure of Rhein with smpB protein. Figs B-I, II, C-I, II & D-I, II represent the same information for compounds 16-hydroxyisosteviol, ZINC01414475 and ZINC31168211 respectively, for the same protein cavity.
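The consensus cavity selection described for smpB (pick, among the MVD cavities, the one that agrees with the most druggable DoGSiteScorer pocket) could be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say how agreement between the two tools was judged, so residue overlap (Jaccard index) is assumed here, and the `Pocket` class, the `consensus_cavity` function, the `min_overlap` threshold, and all residue numbers and scores in the example are hypothetical placeholders rather than values from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    name: str
    residues: frozenset        # residue numbers lining the pocket
    druggability: float = 0.0  # DoGSiteScorer-style score; unused for MVD cavities


def jaccard(a, b):
    """Residue overlap between two pockets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus_cavity(mvd_cavities, dog_pockets, min_overlap=0.3):
    """Return (MVD cavity, matching pocket) or None if nothing agrees.

    Pockets are tried in order of decreasing druggability; the first pocket
    that overlaps some MVD cavity by at least `min_overlap` decides the choice.
    The overlap criterion is an assumption, not the authors' stated method.
    """
    if not mvd_cavities:
        return None
    for pocket in sorted(dog_pockets, key=lambda p: p.druggability, reverse=True):
        best = max(mvd_cavities, key=lambda c: jaccard(c.residues, pocket.residues))
        if jaccard(best.residues, pocket.residues) >= min_overlap:
            return best, pocket
    return None


# Hypothetical placeholder data, for illustration only.
mvd = [
    Pocket("mvd_1", frozenset({9, 16, 49, 50, 52})),
    Pocket("mvd_2", frozenset({70, 71, 72, 90})),
]
dog = [
    Pocket("P0", frozenset({9, 16, 49, 50, 53, 54}), druggability=0.81),
    Pocket("P1", frozenset({70, 71, 72}), druggability=0.55),
]

chosen, matched = consensus_cavity(mvd, dog)
print(chosen.name, matched.name)
```

Trying pockets in order of druggability means a highly scored pocket that no MVD cavity reproduces is skipped rather than forcing a poor match.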
NP_939445.1 (DIP1084, Putative iron transport membrane protein, FecCD-family): the Pfam search for this protein showed that it has two main components, FecCD and ABC_trans. FecCD is a subfamily of the bacterial binding-protein-dependent transport systems family, comprising transport system permease proteins involved in the transport of numerous compounds through the membrane. These transporters tend to catalyze the thermodynamically unfavorable translocation of substrates against a transmembrane concentration gradient by coupling it to a second, energetically favorable process. ABC systems can be categorized into three functional groups, as follows. Importers mediate the uptake of nutrients in prokaryotes. The range of transported substrates is very wide, including mono- and oligosaccharides, organic and inorganic ions, amino acids, peptides, iron-siderophores, metals, polyamine cations, opines, and vitamins. Exporters are involved in the secretion of various molecules, such as peptides, lipids, hydrophobic drugs, polysaccharides, and proteins, including toxins such as hemolysin. The third category of systems is apparently not involved in transport, with some members being involved in translation of mRNA and in DNA repair. Table 10 shows a set of 11 high-scoring compounds against the proposed target. Compound ZINC70454922 from the ZINC NP library was predicted to form ten hydrogen bonds with a relatively low docking score (Fig 11).
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_939445.1 (DIP1084, Putative iron transport membrane protein, FecCD-family) with Jacarandic Acid (CID 73645). A-II: 3D surface representation of the docking analyses for the structure of Jacarandic Acid with DIP1084, Putative iron transport membrane protein. Figs B-I, II, C-I, II & D-I, II represent the same information for compounds 16-hydrazonisosteviol, ZINC13142972 and ZINC70454922 respectively, for the same protein cavity.
NP_939345.1 (DIP0983, Hypothetical protein DIP0983) is a conserved hypothetical protein. It is annotated as a possible lysine decarboxylase (LDC) in the Pfam database (PF03641) due to the presence of the highly conserved PGGxGTxxE motif. Some enzymes, e.g. “Lonely Guy” (LOG), are often mis-annotated as lysine decarboxylases, the enzymes thought to catalyze L-lysine decarboxylation to produce the polyamine metabolite cadaverine. However, this annotation is not supported by any biochemical or functional data for any of the PGGxGTxxE motif-containing LDCs identified so far. This motif is highly conserved among a vast number of proteins of unknown function predicted from bacteria, yeast, and plants; in Arabidopsis thaliana, all the genome-annotated LOG proteins are identified as LDC-like proteins by protein family. Based on a sequence BLAST against the PDB, LOG from Claviceps purpurea shares more than 30% identical residues with LDC-like proteins of unknown function whose crystal structures have already been determined. Recently, lysine decarboxylase was reported as a therapeutic target for periodontal inflammation by Lohinai et al., 2015. Table 11 lists 12 compounds predicted to show good potency against this target. Four of the compounds with promising docking results are shown in Fig 12.
A-I: 3D cartoon representation of the docking analyses for the most druggable protein cavity of NP_939345.1 (DIP0983, Hypothetical protein DIP0983) with Jacarandic Acid (CID 73645). A-II: 3D surface representation of the docking analyses for the structure of Jacarandic Acid with Hypothetical protein DIP0983. Figs B-I, II, C-I, II & D-I, II represent the same information for compounds 17-hydroxyisosteviol, ZINC00211173 and ZINC67911471 respectively, for the same protein cavity.
Among the drug-like molecules, ZINC13142972 (1-[(2S, 3S, 4S, 5R)-3,4-dihydroxy-5-(hydroxymethyl) oxolan-2-yl]imidazo[1,2-b]pyrazole-7-carbonitrile) was predicted to show good results against two of our targets, NP_939302.1 (glpX, Fructose 1,6-bisphosphatase II) and NP_939445.1 (DIP1084, Putative iron transport membrane protein, FecCD-family). It has been reported that at present 50% of drug molecules are either from natural sources or their derivatives. Interestingly, the compounds from the second ZINC library (Natural Product) showed better energy scores than those from all the other libraries. Furthermore, from the library of natural compounds (28 molecules), Jacarandic Acid and Rhein were identified as the top-ranked molecules, and in silico analysis of the library of diterpenoid isosteviol derivatives suggests that compounds 16-hydroxyisosteviol, 16-hydrazonisosteviol, 17-hydroxyisosteviol, 16–17 dihydroxyisosteviol and 16-oxime, 17-hydroxyisosteviol were the top-ranked molecules, although with much higher (less negative) energy scores than the top compounds from the ZINC libraries (ZINC drug-like molecules, ZINC Natural Product).
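Compounds that rank highly against more than one target, such as ZINC13142972 above, can be picked out with a simple cross-target tally. The following is a minimal sketch: the compound lists are limited to those named in the text and figure captions of this section (the glpX list is therefore partial, and nusB, rpsH and bioB are omitted), and the function name `shared_compounds` and its `min_targets` parameter are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict

# Top compounds per target, as named in this section only (not the full tables).
top_hits = {
    "glpX": ["ZINC13142972"],  # partial: only the hit named in the text
    "hisE": ["Jacarandic Acid", "16-17 dihydroxyisosteviol",
             "ZINC05809437", "ZINC67913372"],
    "smpB": ["Rhein", "16-hydroxyisosteviol",
             "ZINC01414475", "ZINC31168211"],
    "DIP1084": ["Jacarandic Acid", "16-hydrazonisosteviol",
                "ZINC13142972", "ZINC70454922"],
    "DIP0983": ["Jacarandic Acid", "17-hydroxyisosteviol",
                "ZINC00211173", "ZINC67911471"],
}


def shared_compounds(hits, min_targets=2):
    """Map each compound to its targets, keeping those hitting >= min_targets."""
    targets_by_compound = defaultdict(list)
    for target, compounds in hits.items():
        for compound in compounds:
            targets_by_compound[compound].append(target)
    return {
        compound: targets
        for compound, targets in targets_by_compound.items()
        if len(targets) >= min_targets
    }


shared = shared_compounds(top_hits)
for compound, targets in shared.items():
    print(f"{compound}: {', '.join(targets)}")
```

With these partial lists, the tally recovers ZINC13142972 (glpX and DIP1084) and also shows Jacarandic Acid among the displayed compounds for hisE, DIP1084 and DIP0983.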
We utilized a bioinformatics pipeline to determine the conserved proteome of 13 strains of C. diphtheriae and subsequently exploited 3D structural information, resulting in a small set of prioritized putative drug/vaccine targets, of which eight proteins are pathogen-essential non-host homologs and 15 are pathogen-essential host homologs. After a detailed structural comparison between host and pathogen proteins, we suggest that the eight non-host homologs could be considered for antimicrobial chemotherapy in future studies on anti-diphtheria drugs and vaccines. Moreover, the strategy described herein is general in nature and can also be applied to other pathogenic microorganisms.
S1 Table. Structural information of the Di-terpenoid Iso-steviol derivatives.
S2 Table. Information on the templates used for the 8 essential non-host homologous targets.
We would like to acknowledge the generous cooperation and assistance of all the team members and collaborators.
- 1. Funke G, von Graevenitz A, Clarridge JE, 3rd, Bernard KA. Clinical microbiology of coryneform bacteria. Clin Microbiol Rev. 1997;10(1):125–59. pmid:8993861; PubMed Central PMCID: PMCPMC172946.
- 2. Goodfellow M, Kämpfer P, Busse HJ, Trujillo M, Suzuki KI, Ludwig W, Whitman WB. Bergey’s manual of systematic bacteriology: Springer; 2012.
- 3. Hodes HL. Diphtheria. Pediatr Clin North Am. 1979;26(2):445–59. pmid:379784.
- 4. Hart PE, Lee PY, Macallan DC, Wansbrough-Jones MH. Cutaneous and pharyngeal diphtheria imported from the Indian subcontinent. Postgrad Med J. 1996;72(852):619–20. pmid:8977947; PubMed Central PMCID: PMCPMC2398589.
- 5. Wagner KS, White JM, Crowcroft NS, De Martin S, Mann G, Efstratiou A. Diphtheria in the United Kingdom, 1986–2008: the increasing role of Corynebacterium ulcerans. Epidemiol Infect. 2010;138(11):1519–30. pmid:20696088.
- 6. Barh D, Gupta K, Jain N, Khatri G, Leon-Sicairos N, Canizalez-Roman A, et al. Conserved host-pathogen PPIs. Globally conserved inter-species bacterial PPIs based conserved host-pathogen interactome derived novel target in C. pseudotuberculosis, C. diphtheriae, M. tuberculosis, C. ulcerans, Y. pestis, and E. coli targeted by Piper betel compounds. Integr Biol (Camb). 2013;5(3):495–509. pmid:23288366.
- 7. Perumal D, Lim CS, Sakharkar KR, Sakharkar MK. Differential genome analyses of metabolic enzymes in Pseudomonas aeruginosa for drug target identification. In Silico Biol. 2007;7(4–5):453–65. pmid:18391237.
- 8. Pizza M, Scarlato V, Masignani V, Giuliani MM, Arico B, Comanducci M, et al. Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science. 2000;287(5459):1816–20. pmid:10710308.
- 9. Asif SM, Asad A, Faizan A, Anjali MS, Arvind A, Neelesh K, et al. Dataset of potential targets for Mycobacterium tuberculosis H37Rv through comparative genome analysis. Bioinformation. 2009;4(6):245–8. pmid:20975918; PubMed Central PMCID: PMCPMC2951718.
- 10. Chong CE, Lim BS, Nathan S, Mohamed R. In silico analysis of Burkholderia pseudomallei genome sequence for potential drug targets. In Silico Biol. 2006;6(4):341–6. pmid:16922696.
- 11. Dutta A, Singh SK, Ghosh P, Mukherjee R, Mitter S, Bandyopadhyay D. In silico identification of potential therapeutic targets in the human pathogen Helicobacter pylori. In Silico Biol. 2006;6(1–2):43–7. pmid:16789912.
- 12. Sakharkar KR, Sakharkar MK, Chow VT. A novel genomics approach for the identification of drug targets in pathogens, with special reference to Pseudomonas aeruginosa. In Silico Biol. 2004;4(3):355–60. pmid:15724285.
- 13. Barh D, Kumar A. In silico identification of candidate drug and vaccine targets from various pathways in Neisseria gonorrhoeae. In Silico Biol. 2009;9(4):225–31. pmid:20109152.
- 14. Rathi B, Sarangi AN, Trivedi N. Genome subtraction for novel target definition in Salmonella typhi. Bioinformation. 2009;4(4):143–50. pmid:20198190; PubMed Central PMCID: PMCPMC2825597.
- 15. Barh D, Jain N, Tiwari S, Parida BP, D'Afonseca V, Li L, et al. A novel comparative genomics analysis for common drug and vaccine targets in Corynebacterium pseudotuberculosis and other CMN group of human pathogens. Chem Biol Drug Des. 2011;78(1):73–84. pmid:21443692.
- 16. Aronov AM, Verlinde CL, Hol WG, Gelb MH. Selective tight binding inhibitors of trypanosomal glyceraldehyde-3-phosphate dehydrogenase via structure-based drug design. J Med Chem. 1998;41(24):4790–9. pmid:9822549.
- 17. Singh S, Malik BK, Sharma DK. Molecular modeling and docking analysis of Entamoeba histolytica glyceraldehyde-3 phosphate dehydrogenase, a potential target enzyme for anti-protozoal drug development. Chem Biol Drug Des. 2008;71(6):554–62. pmid:18489439.
- 18. Hassan SS, Tiwari S, Guimaraes LC, Jamal SB, Folador E, Sharma NB, et al. Proteome scale comparative modeling for conserved drug and vaccine targets identification in Corynebacterium pseudotuberculosis. BMC Genomics. 2014;15 Suppl 7:S3. pmid:25573232; PubMed Central PMCID: PMCPMC4243142.
- 19. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, et al. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci. 2007;Chapter 2:Unit 2 9. pmid:18429317.
- 20. Mount DW. Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc. 2007;2007:pdb top17. pmid:21357135.
- 21. Tusnady GE, Simon I. The HMMTOP transmembrane topology prediction server. Bioinformatics. 2001;17(9):849–50. pmid:11590105.
- 22. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. 1993;26:283–91.
- 23. Blom J, Albaum SP, Doppmeier D, Puhler A, Vorholter FJ, Zakrzewski M, et al. EDGAR: a software framework for the comparative analysis of prokaryotic genomes. BMC Bioinformatics. 2009;10:154. pmid:19457249; PubMed Central PMCID: PMCPMC2696450.
- 24. Abadio AK, Kioshima ES, Teixeira MM, Martins NF, Maigret B, Felipe MS. Comparative genomics allowed the identification of drug targets against human fungal pathogens. BMC Genomics. 2011;12:75. pmid:21272313; PubMed Central PMCID: PMCPMC3042012.
- 25. Zhang R, Ou HY, Zhang CT. DEG: a database of essential genes. Nucleic Acids Res. 2004;32(Database issue):D271–2. pmid:14681410; PubMed Central PMCID: PMCPMC308758.
- 26. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. pmid:10592173; PubMed Central PMCID: PMCPMC102409.
- 27. Magrane M, Consortium U. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford). 2011;2011:bar009. pmid:21447597; PubMed Central PMCID: PMCPMC3070428.
- 28. Yoon SH, Park YK, Lee S, Choi D, Oh TK, Hur CG, et al. Towards pathogenomics: a web-based resource for pathogenicity islands. Nucleic Acids Res. 2007;35(Database issue):D395–400. pmid:17090594; PubMed Central PMCID: PMCPMC1669727.
- 29. Yu CS, Lin CJ, Hwang JK. Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci. 2004;13(5):1402–6. pmid:15096640; PubMed Central PMCID: PMCPMC2286765.
- 30. Aguero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, et al. Genomic-scale prioritization of drug targets: the TDR Targets database. Nat Rev Drug Discov. 2008;7(11):900–7. pmid:18927591; PubMed Central PMCID: PMCPMC3184002.
- 31. Butt AM, Nasrullah I, Tahir S, Tong Y. Comparative genomics analysis of Mycobacterium ulcerans for the identification of putative essential genes and therapeutic candidates. PLoS One. 2012;7(8):e43080. pmid:22912793; PubMed Central PMCID: PMCPMC3418265.
- 32. Volkamer A, Kuhn D, Rippmann F, Rarey M. DoGSiteScorer: a web server for automatic binding site prediction, analysis and druggability assessment. Bioinformatics. 2012;28(15):2074–5. pmid:22628523.
- 33. Tiwari S, da Costa MP, Almeida S, Hassan SS, Jamal SB, Oliveira A, et al. C. pseudotuberculosis Phop confers virulence and may be targeted by natural compounds. Integr Biol (Camb). 2014;6(11):1088–99. pmid:25212181.
- 34. Voigt JH, Bienfait B, Wang S, Nicklaus MC. Comparison of the NCI open database with seven large chemical structural databases. J Chem Inf Comput Sci. 2001;41(3):702–12. pmid:11410049.
- 35. Wadood A, Jamal SB, Riaz M, Mir A. Computational analysis of benzofuran-2-carboxlic acids as potent Pim-1 kinase inhibitors. Pharm Biol. 2014;52(9):1170–8. pmid:24766364.
- 36. Thomsen R, Christensen MH. MolDock: a new technique for high-accuracy molecular docking. J Med Chem. 2006;49(11):3315–21. pmid:16722650.
- 37. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12. pmid:15264254.
- 38. Caffrey CR, Rohwer A, Oellien F, Marhofer RJ, Braschi S, Oliveira G, et al. A comparative chemogenomics strategy to predict potential drug targets in the metazoan pathogen, Schistosoma mansoni. PLoS One. 2009;4(2):e4413. pmid:19198654; PubMed Central PMCID: PMCPMC2635471.
- 39. Crowther GJ, Shanmugam D, Carmona SJ, Doyle MA, Hertz-Fowler C, Berriman M, et al. Identification of attractive drug targets in neglected-disease pathogens using an in silico approach. PLoS Negl Trop Dis. 2010;4(8):e804. pmid:20808766; PubMed Central PMCID: PMCPMC2927427.
- 40. Shanmugham B, Pan A. Identification and characterization of potential therapeutic candidates in emerging human pathogen Mycobacterium abscessus: a novel hierarchical in silico approach. PLoS One. 2013;8(3):e59126. pmid:23527108; PubMed Central PMCID: PMCPMC3602546.
- 41. Folador EL, de Carvalho PV, Silva WM, Ferreira RS, Silva A, Gromiha M, et al. In silico identification of essential proteins in Corynebacterium pseudotuberculosis based on protein-protein interaction networks. BMC Syst Biol. 2016;10(1):103. pmid:27814699; PubMed Central PMCID: PMCPMC5097352.
- 42. Wadood A, Riaz M, Jamal SB, Shah M. Interactions of ketoamide inhibitors on HCV NS3/4A protease target: molecular docking studies. Mol Biol Rep. 2014;41(1):337–45. pmid:24234753.
- 43. Horecker BL, Melloni E, Pontremoli S. Fructose 1,6-bisphosphatase: properties of the neutral enzyme and its modification by proteolytic enzymes. Adv Enzymol Relat Areas Mol Biol. 1975;42:193–226. pmid:236638.
- 44. Wright SW, Carlo AA, Carty MD, Danley DE, Hageman DL, Karam GA, et al. Anilinoquinazoline inhibitors of fructose 1,6-bisphosphatase bind at a novel allosteric site: synthesis, in vitro characterization, and X-ray crystallography. J Med Chem. 2002;45(18):3865–77. pmid:12190310.
- 45. Sassetti CM, Rubin EJ. Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci U S A. 2003;100(22):12989–94. pmid:14569030; PubMed Central PMCID: PMCPMC240732.
- 46. Gopal B, Haire LF, Cox RA, Jo Colston M, Major S, Brannigan JA, et al. The crystal structure of NusB from Mycobacterium tuberculosis. Nat Struct Biol. 2000;7(6):475–8. pmid:10881194.
- 47. Brogan AP, Verghese J, Widger WR, Kohn H. Bismuth-dithiol inhibition of the Escherichia coli rho transcription termination factor. J Inorg Biochem. 2005;99(3):841–51. pmid:15708806.
- 48. Yates JL, Arfsten AE, Nomura M. In vitro expression of Escherichia coli ribosomal protein genes: autogenous inhibition of translation. Proc Natl Acad Sci U S A. 1980;77(4):1837–41. pmid:6445562; PubMed Central PMCID: PMCPMC348603.
- 49. Davies C, Ramakrishnan V, White SW. Structural evidence for specific S8-RNA and S8-protein interactions within the 30S ribosomal subunit: ribosomal protein S8 from Bacillus stearothermophilus at 1.9 A resolution. Structure. 1996;4(9):1093–104. pmid:8805594.
- 50. Berkovitch F, Nicolet Y, Wan JT, Jarrett JT, Drennan CL. Crystal structure of biotin synthase, an S-adenosylmethionine-dependent radical enzyme. Science. 2004;303(5654):76–9. pmid:14704425; PubMed Central PMCID: PMCPMC1456065.
- 51. Javid-Majd F, Yang D, Ioerger TR, Sacchettini JC. The 1.25 A resolution structure of phosphoribosyl-ATP pyrophosphohydrolase from Mycobacterium tuberculosis. Acta Crystallogr D Biol Crystallogr. 2008;64(Pt 6):627–35. pmid:18560150; PubMed Central PMCID: PMCPMC2631106.
- 52. Gutmann S, Haebel PW, Metzinger L, Sutter M, Felden B, Ban N. Crystal structure of the transfer-RNA domain of transfer-messenger RNA in complex with SmpB. Nature. 2003;424(6949):699–703. pmid:12904796.
- 53. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42(Database issue):D222–30. pmid:24288371; PubMed Central PMCID: PMCPMC3965110.
- 54. Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, et al. The Pfam protein families database. Nucleic Acids Res. 2002;30(1):276–80. pmid:11752314; PubMed Central PMCID: PMCPMC99071.
- 55. Dzurova L, Forneris F, Savino S, Galuszka P, Vrabka J, Frebort I. The three-dimensional structure of "Lonely Guy" from Claviceps purpurea provides insights into the phosphoribohydrolase function of Rossmann fold-containing lysine decarboxylase-like proteins. Proteins. 2015;83(8):1539–46. pmid:26010010.
- 56. Lohinai Z, Keremi B, Szoko E, Tabi T, Szabo C, Tulassay Z, et al. Biofilm lysine Decarboxylase, a New Therapeutic Target for Periodontal Inflammation. J Periodontol. 2015:1–15. pmid:26110450.
- 57. Veeresham C. Natural products derived from plants as a source of drugs. J Adv Pharm Technol Res. 2012;3(4):200–1. pmid:23378939; PubMed Central PMCID: PMCPMC3560124.