Identifying Novel Drug and Vaccine Molecules of Vibrio parahaemolyticus by Subtractive Proteomics and Vaccinomics Approach

Multidrug-resistant Vibrio parahaemolyticus has become a significant threat to human health as well as aquaculture, prioritizing the development of effective drug and vaccine candidates. Hence, the study was designed to identify novel therapeutics using a comprehensive genomebased analysis of V. parahaemolyticus. From V. parahaemolyticus proteome, a total of 4822 proteins were investigated in order to find out effective drug and vaccine targets. A range of diverse subtractive proteomics approaches – namely, identification of human non-homologous and pathogen-specific essential proteins, druggability and ‘anti-target’ analysis, prediction of subcellular localization, human microbiome non-homology screening, analysis of virulence factors, protein-protein interactions studies. Among 16 novel cytoplasmic proteins, ‘VIBPA Type II secretion system protein L’ and ‘VIBPA Putative fimbrial protein Z’ were allowed to molecular docking with 350 human metabolites, which revealed that Eliglustat, Simvastatin and Hydroxocobalamin were the top drug molecules considering free binding energy. On the contrary, ‘Sensor histidine protein kinase UhpB’ and ‘Flagellar hook-associated protein of 25 novel membrane proteins were subjected to T and B cell epitope prediction, antigenicity testing, transmembrane topology screening, allergenicity and toxicity assessment, population coverage analysis and molecular docking were adopted to generate the most immunogenic epitopes. Three subunit vaccines were constructed by the combination of highly antigenic epitopes along with suitable adjuvant, PADRE sequence and linkers. The designed vaccine constructs (V1, V2, V3) were analyzed by their physiochemical properties and molecular docking with MHC molecules that suggested the superiority of construct V1. Besides, the binding affinity of human TLR1/2 heterodimer and construct V1 was also biologically significant. The vaccine-receptor complex exhibited deformability at a minimum level that also strengthened our prediction. The bioRxiv preprint doi: https://doi.org/10.1101/2020.04.17.045849. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. optimized codons of the designed construct was cloned into pET28a(+) vector of E. coli strain K12. However, the predicted drug molecules and vaccine constructs could be further studied to combat V. parahaemolyticus associated infections.


Introduction
Vibrio parahaemolyticus, a highly reported pathogenic bacteria of aquatic environment, belongs to vibrionaceae family has emerged as the leading cause of seafood-associated gastroenteritis and a significant hazard for global aquaculture (Tang et al., 2014, Ghenem et al., 2017, Chen et al., 2017, Letchumanan et al., 2014. The overgrowing population, with increased purchasing power worldwide, has enhanced the demand for and export potential of seafood, resulting in the steady expansion of the aquaculture industry (Rico et al., 2012). However, the sector has continuously been challenged by aquatic animal health problems, which are a significant constraint to the development of this sector (Bondad-Reantaso et al., 2005). Besides, Antimicrobial resistance bioRxiv preprint doi: https://doi.org/10.1101/2020.04.17.045849. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
(MDR) has been recognized as an essential global threat issue to food safety (Food and Agriculture Organization, 2016). The continuous and inappropriate use of antibiotics in the aquaculture industry favors the development of a variety of resistant isolates and the dissemination of resistance genes within the bacterial population (Tendencia and de laPena, 2001). V.
Though V. parahaemolyticus was first isolated in 1952, reports demonstrated the recent outbreaks of V. parahaemolyticus are more severe (Jung, 2018;Liu et al., 2015;Jang et al., 2013). On the recent outbreak in the city of Osaka (Japan), acute gastroenteritis was reported in 272 individuals, 20 of whom died (Daniels et al., 2000). To date, V. parahaemolyticus has been responsible for 20-30% of food poisoning cases in Japan and sea foodborne diseases in many Asian countries (Alam et al., 2002). A total 802 outbreaks of food-borne diseases have been reported in 13 of the coastal provinces of eastern China, causing more than 17,000 individuals to become ill (Wang et al., 2011a;Wu et al., 2014), where V. parahaemolyticus attributed the most significant number (40.1%) of these cases (Liu et al., 2006, Nair et al., 2007, Chao et al., 2011. The leading cause of human gastroenteritis associated with seafood consumption in the United States is V. parahaemolyticus (Kaysner and De Paola, 2001;Newton et al., 2012). Centers for Disease Control and Prevention (CDC) declared it as a significant foodborne bacterium compared to other Vibrio species, which was responsible for approximately 34,664 foodborne cases annually in the USA (Scallan et al., 2011;Huang et al., 2016).
The food poisoning caused by V. parahaemolyticus usually occurs in summer and is predominantly associated with different kinds of seafood, including crab, shrimp, shellfish, lobster, fish and oysters (Cruz et al, 2015;Letchumanan et al., 2015). V. parahaemolyticus is usually found in a free-swimming state, with its motility conferred by a single polar flagellum affixed to inert and animate surfaces including zooplankton, fish, shellfish or any suspended matter underwater (Gode-Potratzetal, 2011). Among the whole range of seafood, shellfish is regarded as a high-risk food because it is infested with large populations of bacteria, including V. parahaemolyticus (Peng et al., 2010;China Statistical Yearbook, 2012). Illness is inevitable, once consumers eat undercooked identified for S. typhi meningitidessero group B (Perumalet al., 2007;Barh and Kumar, 2009) using the mentioned approach.
Moreover, in silico docking studies between the identified drug targets and existing drugs with slight modification may lead to the discovery of novel drugs for the treatment of infections (Pagadala et al., 2017;Wong, 2015;Ferreira et al., 2015;Yuriev et al., 2013). As a result, a wide range of drug targets and lead compounds can be identified before laboratory experimentation, to save time and money. The study was designed to employ a comprehensive genome-based analysis of Vibrio parahaemolyticus for identifying novel therapeutic targets as well as suitable drug and vaccine molecules through subtractive proteomics and vaccinomics approaches.

Results
Various bioinformatics tools and databases were used to analyze the entire proteome of V.
parahemolyticus through subtractive genomics and vaccinomics approach. The step by step results (or workflow in Fig. 1 and Fig. 2) from the complete computational analysis was presented in Table 1.

Retrieval of complete proteome and identification of essential proteins
The whole proteome of V. parahemolyticus strain O3:K6 was extracted from NCBI database bioRxiv preprint doi: https://doi.org/10.1101/2020.04.17.045849. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Analysis of Metabolic pathways
The KEGG server contains 131 metabolic pathways for V. parahaemolyticus (Supplementary file 6) and 325 pathways for humans (Supplementary file 7). Through manual comparison, 40 metabolic pathways were found to be pathogen-specific and are provided in Supplementary file 8.
Proteins involved in these unique pathways can be selected as drug targets. Non-homologous essential proteins subjected to BLASTp in the KAAS server at KEGG revealed that 96 proteins among 1107 assigned both KO (KEGG Orthology) and metabolic pathways that further deputed as Set 5 (Supplementary file 9).

Druggability analysis and identification of novel drug targets
Only 56 proteins showed similarity with the available drug targets, while the remaining 41 showed no hit. These 41 proteins (Set 6) were considered as novel drug targets which include both cytoplasmic and membrane proteins (Supplementary file 10). Besides, the results of druggable proteins are provided in supplementary table 1. Furthermore, the other 41 proteins were considered as novel therapeutic targets and subjected to human 'anti-targets' analysis.

'Anti-target' analysis and prediction of subcellular localization
A total of 210 'anti-targets' reported in the literature were fetched from NCBI (Supplementary file 11). All novel drug target proteins were successfully screened through BLASTp analysis, and no evidence of similarity was seen. Hence, all these novel drug target proteins were listed for human microbiome analysis considering non-homologous to host 'anti-targets' (Supplementary file 10).
Moreover, The results of subcellular localization analysis by four servers are provided in Supplementary file 11. The result revealed that among 41 specific proteins involved in pathogenspecific pathways, 16 were cytoplasmic proteins assigned as Set 8 (Supplementary file 12, Table   2), while the remaining 25 sequences were membrane proteins.

Human microbiome non-homology analysis
Cytoplasmic proteins showing <45% similarity with reported human microbiome proteins were selected for protein-protein interaction analysis, whereas membrane proteins were selected for further vaccine candidacy. However, microbiome analysis revealed that a total of 9 proteins (Set 9) of the pathogen showed <45% similarity with human microflora (Supplementary Table 2, Supplementary file 13).

Analysis of virulence factors (VF's) and protein-protein interactions studies (PPIs)
From 9 cytoplasmic novel proteins, five uncharacterized proteins were removed and the remaining four proteins were considered for VFDV analysis. The VFDB result showed that two proteins (Set 10) i.e., VIBPA Type II secretion system protein L (Q87TC9) and VIBPA Putative fimbrial protein Z (Q87I65) were associated with virulence of V. parahaemolyticus (Table 3). These proteins were subjected to protein-protein interaction study. STRING v10.5 revealed that Type II secretion system protein L confers interactions with nine proteins (Fig. 3A), while putative fimbrial protein Z exhibits interactions with three other proteins (Fig. 3B). These proteins are mainly responsible for protein transport, involved in biofilm formation and bacterial secretion system, or act as regulatory proteins (e.g., transcription regulator, signal transduction response regulator ).

Screening of drug molecules against novel cytoplasmic proteins
A total of 335 unique metabolites were retrieved from Human Metabolites Database for docking analysis against predicted therapeutic drug targets (novel cytoplasmic proteins). Docking scores were analyzed to screen the top drug candidates with the lowest binding energy (Supplementary Table 3). Among top 10 metabolites (Table 4), Eliglustat (DB09039) was found superior in terms of free binding energy for both protein targets, followed by Simvastatin (DB09039) and Hydroxocobalamin (DB00200) for VIBPA Type II secretion system protein L (Q87TC9) and VIBPA Putative fimbrial protein Z (Q87I65) respectively. ADME analysis was performed to get an insight into how the predicted pharmaceuticals will interact with the body as a whole (Table 5).

Screening of novel outer membrane proteins for vaccine construction
From the 25 novel outer membrane proteins (Supplementary file 14) designated as Set 11, two were selected based on the highest antigenicity score and human microbiome analysis to develop a novel chimeric peptide vaccine against V. parahemolyticus (Table 6). The schematic diagram summarizing the protocol over in silico vaccinomics strategy has been elucidated in Fig. 1. Sensor histidine protein kinase (Q87HJ8) and flagellar hook-associated protein (Q87JH9) possessed better antigenicity (0.64 and 0.53 respectively) while showed less percentage similarity (<45% and <41% respectively) when compared with gut microbiome data (Supplementary Table 4).  Table 5). Top 3 epitopes (CTL and HTL) for each protein were considered based on the above-mentioned parameters to design the final vaccine construct (Table 7).

Prediction of 3D structures for superior epitopes and analysis of molecular docking
The epitopes, showing conservancy pattern at a biologically significant level, were only allowed for further docking analysis. 3D structures were predicted for top epitopes (6 from Flagellar hookassociated protein and six from sensor histidine protein kinase) to analyze their interactions with different HLA alleles. The PEP-FOLD3 server modeled five 3D structures for each epitope, and the best one was identified for docking study. The result showed that 'AILLFPFAL' epitope of Flagellar hook-associated protein was superior in terms of free binding energy while interacted with HLA-A* 11:01 (−8.3 kcal/mol). Demonstrated energy was −9.1 kcal/mol for epitope 'GGRHNNLDL' of Sensor histidine protein kinase ( Table 8).

Identification of B-Cell epitope
B-cell epitopes of both proteins were generated using six different algorithms from IEDB (Supplementary Fig. 1 and Supplementary Fig. 2). The epitopes were further investigated to reveal their non-allergenicity pattern, and the top one epitope from each prediction was selected for vaccine construction (Supplementary Table 6).

Epitope cluster analysis and vaccine construction
The construction of vaccine protein was based on identifying larger cassettes containing multiple epitopes. Epitope cluster analysis tool from IEDB predicted 21 epitope clusters among the top epitopes (6 CTL, 6 HTL epitopes, and 12 BCL epitopes) proposed in Allergenicity, antigenicity and solubility prediction of different vaccine constructs bioRxiv preprint doi: https://doi.org/10.1101/2020.04.17.045849. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Results revealed V1 as the most potent vaccine candidate with better antigenic nature (1.18) and non-allergic behavior that can elicit a strong immune response (Supplementary Table 7). All three constructs showed solubility above the threshold value (0.45). Again, construct V1 was superior in terms of solubility potential. The surface distribution of charge, hydrophobicity and stability were calculated at 91 different combinations of pH and ionic strength (Fig. 5).

Physicochemical characterization and secondary structure analysis
The molecular weight of the designed construct V1 was 39.45 kDa.  Fig. 3). Around 34.59% polar, 16.21% hydrophobic and 9.46% aromatic regions were identified in the structure (Fig. 5).
Tertiary structure prediction, refinement, validation and disulfide engineering of vaccine construct I-TASSER predicted five models for each proposed vaccine candidates, which were ranked based on cluster size. Ten best templates (with the highest Z-score) selected from the LOMETS threading programs were used to predict the tertiary structures. Homology modeling was performed by using 1kj6 from RCSB Protein Data Bank) as a best suited template for Vaccine protein V1. Results showed that model 1 had the highest C-Score of -2.11 while the estimated TM-score and RMSD were 0.46±0.15 and 11.6±4.5Å (Fig. 6). After refinement, 88.3% and 98.1% residues were in the favored and allowed region revealed by Ramachandran plot analysis (Fig. 6). The modeled tertiary structure of designed construct V2 and V3 have been shown in Supplementary Fig. 4. A total of 22 pairs of amino acid residues were identified with the potential to form disulfide bonds by DbD2 server. However, only two pairs (i.e., ARG 82-Gly 85, Lys 347-Thr 350) were compatible with disulfide bond formation considering the energy, chi3 and B-factor parameter ( Supplementary Fig.   5). All these residues were replaced with a cysteine residue.

Protein-protein docking and molecular dynamics simulation
Docking study was conducted between three vaccine constructs (i.e., V1, V2, V3) and different HLA alleles. Construct V1 showed biologically significant results and found to be superior in terms of free binding energy (Supplementary Table 8). Besides, the binding affinity of the predicted vaccine and TLR1-2 heterodimer complex was also analyzed. The 3D structure of human TLR-1/2 heterodimer was retrieved from the RCSB protein data bank. ClusPro generated thirty protein-ligand complexes (clusters) as output along with respective free binding energy. The lowest energy was −1257.9 for cluster 1 (Fig. 7). FireDock output refinement of the PatchDock server showed the lowest global energy of −7.08 for solution 5. Normal mode analysis allowed the demonstration of large scale mobility and the stability of proteins. The analysis was performed based on the internal coordinates of the protein-protein complex. In the 3D model, the direction of each residue was given by arrows, and the length of the line represented the extent of mobility ( Fig. 8A). The eigenvalue found for the complex was 2.4784e−05 (Fig. 8B). The vaccine protein V1 and TLR1-2 heterodimers were oriented towards each other. The B-factor values deduced from normal mode analysis was analogous to RMS (Fig. 8C). Hinges in the chain indicated the probable deformability of the complex measured by the contortion of each residue (Fig. 8D). The variance associated with each normal mode was inversely linked to the eigenvalue. Covariance matrix explained the coupling between pairs of residues was correlated, uncorrelated, or anti-correlated motions were represented via red, white and blue colors, respectively (Fig. 8E). The result also generated an elastic network model (Fig. 8F) that identified the pairs of atoms connected via springs. Each dot in the diagram was indicated one spring between the corresponding pair of atoms and colored based on the degree of stiffness.

Codon adaptation, in silico cloning and similarity analysis with human proteins
E. coli strain K12 was selected as the host for the cloning purpose of the vaccine construct V1.
Vaccine protein V1 was transcribed reversely, where the Codon Adaptation Index (CAI) was found 0.994, and the GC content of the optimized codons (50.55%) was also significant. The construct did not hold restriction sites for ApaI and BglI, which ensured its safety for cloning purposes. The optimized codons were incorporated into the pET28a(+) vector along with ApaI and BglI restriction sites. A clone of 5634 base pair was obtained, including the 1118 bp desired sequence and the rest belonging to the vector. The desired region was shown in red color in between the pET28a(+) vector sequence (Fig. 9). Sequence similarity analysis of the proposed vaccine with human proteins revealed that there was no similarity between predicted vaccine constructs and human proteins.

Discussion
Due to severe effects on human health (Velazquez-Roman et al., 2014;Wu et al., 2014), the emergence of rapid antibiotic resistance (Golkar et al., 2014) and also having economic importance for substantially impairing the aquaculture production (Tang et al., 2014), it has become an urgent necessity to develop effective drug targets and vaccine candidates against Vibrio parahaemolyticus. Different computational approaches are now being widely practiced to identify proteins those are essential for the survival of the pathogen and not involved in the metabolic pathways of the host, thereby choosing the proteins associated only in the metabolic pathways of the pathogen is important (Judson and Mekalanos, 2000;Hossain et al., 2013). Essential proteins are most promising for new antibacterial drug targets since most antibiotics are designed to bind essential gene products . Here, subtractive genome approaches (removal of paralogous proteins, identification of non-homologous proteins against the host, identification of essential proteins and metabolic pathways analysis of the pathogen), and vaccinomics strategy were employed for identifying novel drug and vaccine molecules through the comprehensive proteome exploration of Vibrio parahaemolyticus genome.
The complete proteome of V. parahaemolyticus (4822 proteins) was retrieved from the NCBI database, and the homologous proteins were removed based on their identity with human proteins.
Proteins encoded by essential genes and unique to an organism can be considered as speciesspecific drug targets (Mondal et al., 2015), as they play vital roles in its metabolism. The study revealed 96 unique, essential proteins (Set5) of V. parahaemolyticus, which can be considered as suitable drug targets for combating V. parahaemolyticus infections. Among the unique proteins, 55 proteins were druggable and can be targeted using existing drugs (92) that are already approved and available in the market (Supplementary Table 1). In the case of a broad-spectrum drug, for avoiding mutational changes as well as the emergence of resistant bacteria, the DrugBank databases were screened, which contains entries for 2556 approved small molecule drugs, 962 approved biotech drugs, 112 nutraceuticals and over 5125 experimental drugs. A total of 41 proteins of V. parahaemolyticus showed no similarity after passing through the DrugBank database and listed for the prediction of novel drug targets and vaccine candidates (set 6). To avoid severe cross-reaction and toxic effects in human, identification of nonhomologous proteins to essential human proteins (referred to as 'anti-targets') was a crucial step considered in this study. However, the identified novel drug targets (41) showed no evidence of similarity with the 'anti-targets'.
Although both cytoplasmic and membrane proteins serve the purpose as therapeutic targets (Michael et al., 2014), membrane proteins are best suited for vaccine candidates (Baliga et al., 2018;Hasan et al., 2019a). Hence, in this study, membrane proteins (25) were used for vaccine construction, whereas cytoplasmic proteins (16) were proposed as suitable drug targets. Targeting human microbiome non-homology proteins is suitable since drugs or vaccines designed and administered for these targets will be less harmful to other commensal microbial strains. Among the novel cytoplasmic proteins (16), only the proteins (9) conferring <45% similarity with human microbiota were retained. TheVFDB (virulence factor database) analysis studies confirmed that 'VIBPA Type II secretion system protein L (Q87TC9)' and 'VIBPA Putative fimbrial protein Z (Q87I65)' were associated with virulence in the host (Set 10). The protein-protein interaction studies also strengthened the superiority of these two proteins as suitable drug targets (Table 4).
Molecular docking of 350 human metabolites against 'VIBPA Type II secretion system protein L' and 'VIBPA Putative fimbrial protein Z' was conducted to screen superior drug molecules (Supplementary Table 2). The study revealed that 'Eliglustat (DB09039)' was the top drug molecules for both protein targets in terms of free binding energy ( i.e., 'Simvastatin (DB09039)' and 'Hydroxocobalamin (DB00200)' were also performed through ADME analysis (Table 5).
Several advantages help the researchers to select membrane proteins both as drug and vaccine candidates as their functions can be easily studied through computer-based approaches than wetlab process (Hasan et al., 2019a;Hasan et al., 2019b). In this study, two vaccine targets, 'Sensor histidine protein kinase UhpB (Q87HJ8)' and 'Flagellar hook-associated protein (Q87JH9)' were selected after screening the novel outer-membrane proteins (25) based on their antigenicity score and human microbiome non-homology analysis (Table 6). Both the proteins further analyzed to design a potent, highly immunogenic vaccine candidate against V. parahaemolyticus. Numerous antigenic epitopes were generated which were investigated extensively for antigenicity, allergenicity, toxicity, conservancy and other physiochemical properties using a number of bioinformatics tools and software. The final vaccine constructs were designed with the help of different adjuvants and amino acid linkers (Solanki and Tiwari, 2018). It has been reported that the PADRE sequence reduces the polymorphism of HLA molecules in the population (Ghaffari-Nazari et al., 2013;Hasan et al., 2019b). Linkers in vaccines also enhanced the immunogenicity of the vaccines in previous studies Hasan et al., 2019c). Therefore, all the important that could induce the immunogenicity of the designed vaccine constructs were taken.
Also, disulfide engineering was employed to enhance the stability of the designed vaccine. The purpose of the molecular docking analysis was to show the proposed epitopes could interact with at least one MHC molecule at minimum binding energy. Therefore, it was done to explore the heterodimer in THP-1 monocytes (Gulati et al., 2019). So, a docking study was also performed to analyze the affinity between the designed construct and human TLR1/2 heterodimer. The vaccine receptor complex showed deformability at a minimum level, which also strengthened our prediction. Finally, the optimized codons of the designed construct been cloned into the pET28a(+) vector of E. coli strain K12.
The idea of subtractive genomic analysis using various bioinformatics tools has brought a revolution in the drug discovery process. The present study will help to develop novel therapeutics and preventive measures against V. parahaemolyticus, thereby help to reduce the mortality and morbidity caused by it. However, further in vivo trials using model organisms are highly recommended to validate our prediction.

Materials and Methods
The whole proteome of V. parahemolyticus was analyzed according to subtractive genomic approach to recognize novel drug targets as well as vaccine candidates. The overall workflow for subtractive proteomic analysis and vaccinomics approach has been illustrated in Fig .1 and Fig. 2, respectively.

Retrieval of complete proteome and identification of essential proteins
The whole proteome of V. parahemolyticus strain O3:K6 was retrieved from NCBI Genome database. Paralogous sequences were excluded from the proteome of V. parahemolyticus by using CD-HIT (Li and Godzik, 2006). With a cutoff score of 0.6, proteins with more than 60% identity were excluded. Remaining proteins were subjected to BLASTp against H. sapiens human Refseq proteome in 'Ensemble Genome Database 92' using threshold expectation value (E-value) 10 -3 as the parameter. Proteins were assumed as homologous were excluded if any significant hit above the threshold value 10 -4 was found. The remaining non-homologous proteins were subjected to the Database of Essential Genes (DEG) Luo, 2013). Proteins hit with expectation value ≤ 10-100, identity ≥ 25% were listed for metabolic pathway analysis considering as essential non-homologous proteins of V. parahemolyticus.

Analysis of metabolic pathways
Kyoto Encyclopedia of Genes and Genomes which contains complete metabolic pathways present in living organisms (Moriya et al., 2007). Metabolic pathways of V. parahemolyticus were analyzed against the human metabolic pathways through the KEGG server. All metabolic pathways present in the pathogen (V. parahemolyticus) and host (H. sapiens) were collected from the KEGG PATHWAY database using three letters KEGG organism code 'vpa' and 'has' respectively. A comparison was made in order to recognize the unique metabolic pathways only present in the V. parahemolyticus, while the remaining pathways of the pathogen were grouped as a common one. Identified host non-homologous, essential proteins of V. parahemolyticus were subjected to BLASTp through the KAAS server at KEEG. Proteins present only in the unique metabolic pathways of the pathogen were listed for further analysis.

Druggability analysis and identification of novel drug targets
A 'druggable' target needs to have the potentiality to bind to the drugs and drug-like molecules with high affinity. Shortlisted unique proteins were screened through the database of DrugBank 5.1.0 (Wishart et al., 2017) using default parameters to identify both druggable proteins and novel therapeutic targets.

'Anti-target' analysis and prediction of subcellular localization
This analysis was performed to avoid any kind of cross-reactivity, and toxic effects due to docking between the drugs administered for the pathogen and the host 'anti-targets'.'Anti-targets' are gene servers were used to predict subcellular localization of shortlisted pathogen-specific essential proteins.

Human microbiome non-homology analysis
Both membrane and cytoplasmic proteins were subjected to BLASTp through NCBI protein blast server against the dataset present in the Human Microbiome Project server (https://www.hmpdacc.org/hmp/) "(BioProject-43021)" (Turnbaugh et al., 2007) with a cutoff score 0.005. Membrane proteins showing <45% similarity were selected for vaccine candidacy, whereas cytoplasmic proteins showing <45% similarity were selected for protein-protein interaction analysis.

Analysis of virulence factors (VF's) and protein-protein interactions studies (PPIs)
Virulence factors are responsible for modulating or degrading host defense mechanisms by bacteria. Novel cytoplasmic proteins with the least similarity with the human microbiome were subjected to BLASTp search against the database of protein sequences from the VFDB core dataset (Chen et al., 2005) with default hit with cut-off bit score >100, and E-value was 0.0001. The protein-protein interactions studies (PPIs) of selected shortlisted proteins were predicted using STRING v10.5 (Szklarczyk et al., 2018). PPIs with a high confidence score (≥90%) were considered to avoid false-positive results. Only characterized proteins were subjected to BLASTp.

Screening of drug molecules against novel cytoplasmic proteins
All the pharmaco-metabolites reported in the Human Metabolites Database (www.hmdb.ca) were used for the screening of suitable drugs. Molecular docking was performed against predicted drug targets (novel cytoplasmic proteins) by using AutoDock Vina tools (Trott and Olson, 2010). The size of the grid box was set to 54 A° x 74 A° x 126 A° (x, y and z) and 65A° x 85 A° x 65 A° (x, y and z) with 1 A° spacing between the grid points for two cytoplasmic therapeutic target proteins (Q87TC9 and Q87165 respectively).

Screening of novel outer membrane proteins for vaccine construction
The VaxiJen v2.0 (http://www.ddg-pharmfac.net/vaxijen/) was used for the investigation of protein immunogenicity to find the most potent antigenic outer membrane proteins (Doytchinova and Flower, 2007a). Proteins were prioritized based on their antigenic score (threshold value 0.4) and sequence similarity with human microbiota.

Prediction of 3D structures for superior epitopes and analysis of molecular docking
Top-ranked epitopes were subjected to the PEP-FOLD server to predict peptide structures (Wang et al., 2011). Depending on the available structures deposited in Protein Data Bank (PDB) database, HLA-A*11:01 and HLA-DRB1*04:01 were selected for docking analysis with MHC class I and class II binding epitopes respectively. MGLTools were used to visualize and analyze the molecular structures of biological compounds (Sanner, 1999). The grid box was set to 28 A°, 18 A°, 20 A° (x, y and z) with a default value of 1.0 A° spacing by AutodockVina at 1.00-°A spacing. The exhaustiveness parameter was kept at 8.00, and the number of outputs was set at 10 (Morris et al., 2009). Output PDBQT files were converted in PDB format using Open Babel. The docking interaction was visualized with the PyMOL molecular graphics system (https://www.pymol.org/).

Identification of B-Cell epitope
B cell epitopes were predicted for both proteins to find the potential antigens that would interact with B lymphocytes and initiate the immune response. Several tools from IEDB i.e. Kolaskar and Tongaonkar antigenicity scale (Kolaskar and Tongaonkar, 1990), Karplus and Schulz flexibility prediction (Karplus and Schulz, 1985), Bepipred linear epitope prediction analysis (Jespersen et al., 2017), Emini surface accessibility prediction (Emini et al., 1985), Parker hydrophilicity prediction (Parker et al., 1986) and Chou and Fasman beta-turn prediction (Chou and Fasman, 1978) were used to identify the B cell antigenicity depending on six different algorithms.

Epitope cluster analysis and vaccine construction
Epitope cluster analysis tool from IEDB was used to identify the epitope clusters with overlapping peptides for both proteins using the top CTL, HTL and BCL epitopes as input.

Allergenicity, antigenicity and solubility prediction of different vaccine constructs
The AlgPred v.2.0 (Saha and Raghava, 2006) and AllerTOP v.2.0 (Dimitrov et al., 2013) servers were utilized to predict the non-allergic behavior of the vaccine constructs. For proposing the superior vaccine candidate, the VaxiJen v2.0 server (Doytchinova and Flower, 2007b) was utilized. The probable antigenicity of the constructs was determined through an alignmentindependent algorithm. Protein-sol software (Hebditch et al., 2017) predicted the solubility score of the proposed vaccines.

Physicochemical characterization and secondary structure analysis
The ProtParam, a tool from Expasy's server (http://expasy.org/cgi-bin/protpraram) (Gasteiger et al., 2003;Hasan et al., 2015)  The I-TASER server  was employed for determining the 3D structure of designed vaccine constructs based on the degree of similarity between the target protein and available template structure from PDB. Refinement was performed using ModRefiner (Xu and Zhang, 2011). The refined protein structure was validated through the Ramachandran plot assessment by the MolProbity software (Chen, V.B. et al., 2010). Residues in the highly mobile region of the protein exhibit the potential to be mutated with cysteine. Pairs of residues with proper geometry and the ability to form a disulfide bond were detected by the DbD2 server to perform disulfide engineering (Craig and Dombkowski, 2013). The value of chi3 considered for the residue screening was between −87 to +97 while the energy considered was<2.5.

Protein-protein docking and molecular dynamics simulation
The binding affinity of the vaccine constructs with different HLA alleles and human immune receptors, ClusPro 2.0 (Comeau et al., 2004), hdoc (Macalino et al., 2018) and PatchDock server (Schneidman-Duhovny et al., 2005) were applied. Desirable complexes were identified according to better electrostatic interaction and free binding energy following refinement via the FireDock server (Mashiach et al., 2008). The iMODS server was used to explain the collective motion of proteins via analysis of normal modes (NMA) in internal coordinates (Lopez-Blanco et al., 2014).
Essential dynamics is a powerful tool and alternative to the costly atomistic simulation that can be compared to the normal modes of proteins to determine their stability (Aalten et al., 1997;Wuthrich et al., 1980;Cui and Bahar, 2007). The server predicted the direction and magnitude of the immanent motions of the complex in terms of deformability, eigenvalues, B-factors and covariance. The structural dynamics of the protein-protein complex was investigated (Prabhakar et al., 2016).

Codon adaptation, in silico cloning and similarity analysis with human proteins
A codon adaptation tool (JCAT) was used to adapt the codon usage to the well-characterized prokaryotic organisms for accelerating the expression rate in it. Rho-independent transcription termination, prokaryote ribosome binding site and cleavage sites of restriction enzyme ApaI and BglI were avoided while using the server (Grote et al., 2005). The optimized sequence of vaccine protein V1 was reversed, followed by conjugation with ApaI and BglI restriction site at the Nterminal and C-terminal sites, respectively. SnapGene (Solanki and Tiwari, 2018)             For each graph, X-axis and Y-axis represent the position and score. Residues that fall above the threshold value are shown in yellow color while the highest peak in yellow color identifies most favored position.         -(6,7,9,10,17,18,20,21-octahydrodibenzo[b,k][1,4,7,10,13,16]