Comparative proteomic analysis to annotate the structural and functional association of the hypothetical proteins of S. maltophilia k279a and predict potential T and B cell targets for vaccination

Stenotrophomonas maltophilia is a multidrug-resistant bacterium with no precise clinical treatment. This bacterium can be a vital cause for death and different organ failures in immune-compromised, immune-competent, and long-time hospitalized patients. Extensive quorum sensing capability has become a challenge to develop new drugs against this pathogen. Moreover, the organism possesses about 789 proteins which function, structure, and pathogenesis remain obscured. In this piece of work, we tried to enlighten the aforementioned sectors using highly reliable bioinformatics tools validated by the scientific community. At first, the whole proteome sequence of the organism was retrieved and stored. Then we separated the hypothetical proteins and searched for the conserved domain with a high confidence level and multi-server validation, which resulted in 24 such proteins. Furthermore, all of their physical and chemical characterizations were performed, such as theoretical isoelectric point, molecular weight, GRAVY value, and many more. Besides, the subcellular localization, protein-protein interactions, functional motifs, 3D structures, antigenicity, and virulence factors were also evaluated. As an extension of this work, ’RTFAMSSER’ and ’PAAPQPSAS’ were predicted as potential T and B cell epitopes, respectively. We hope our findings will help in better understating the pathogenesis and smoothen the way to the cure.

Introduction other phenomena of the HPs of S. maltophilia. Furthermore, we will also predict the best epitope-based subunit vaccine candidate and different B and T cell epitopes.

Materials and methods
The complete framework and the tools used in this study are depicted in Fig 1 and Table 1, respectively. The whole process is comprised of three phases: Phase-I, Phase-II, and Phase-III. The genome analysis and characterization of the HPs are performed in Phase-I. Phase-II includes annotations of different functional properties using multiple servers and tools. Prioritization of targets to design a vaccine against the pathogen and validation of the findings are illustrated in Phase-III.

Phase-I
Sequence retrieval. The complete genome sequence of S. maltophilia K279a (GeneBank Assembly ID: GCF_000072485.1) (RefSeq: NC_010943.1) was retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/). There were about 789 Hypothetical Proteins (HPs) among the 4332 proteins, which were separated and stored as FASTA files using the Refseq accession number for further analysis. Different computational strategies were followed to predict various essential properties of those proteins.
Conserved domain search. The repetitive sequences or recurring structural units that have notable functional capabilities in many contexts of a protein and thought to modulate or signify different functions in different proteins through their unique re-combinational rearrangements can be called domains [62]. Throughout evolution, these domains work as building blocks that are rigidly conserved among the protein families rather than the whole protein sequences [63].
To classify the protein families and to predict the highly conserved and well-defined domains or folds in the HPs, we used four online bioinformatics tools, namely CDD-BLAST [64][65][66], SMART [67], PFAM [68], ScanProsite [69]. All the HPs were subjected to those web tools mentioned above, which resulted in variable predictions of the conserved domains among the HPs. Thereat, variability was observed in the confidence level of the cumulative predictions. Percentile confidence scores were given to determine a higher or lower confidence level, i.e., a combinatorial score of 100 is given to those proteins which are being predicted to have the same functional domains. After analyzing all the HPs, we have found 24 proteins that have a high confidence level (HCL) of 100 and considered them for further investigations.
To find out highly conserved domains of a query protein, NCBI's online tool CDD-BLAST uses RPBLAST, which is derived from PSI-BLAST, scans the query sequence with the help of Position Specific Scoring Matrices (PSSMs) against the Conserved Domain Database. The SMART stands for Simple Modular Architecture Research Tool, which is a web-based server that predicts domain profiles and architectural similarities of the target protein using stable Ensembl [70], SP-TrEMBL [71], Swiss Prot [72], where direct similarities search among the sequences is avoided. Pfam database has two parts: Pfam-A and the Pfam-B. The Pfam-A is derived from Pfamseq, which is built from the updated release of UniProtKB at a particular time frame. Each family of Pfam-A contains three major elements, namely: A curated seed alignment, a full alignment that is automatically constructed, and Profile Hidden Markov Models (Profile HMMs). On the contrary, the Automatic Domain Decomposition Algorithm (ADDA) generates low-quality un-annotated Pfam-B families using nonredundant clusters [73]. Most of the proteins of a set of an enormous number of proteins that are functionally different can be grouped into a narrow range of families according to their sequence similarities. This principle is the core of the web-based prediction tool ScanProsite.

Phase-II
Physicochemical characterization. Different physicochemical properties of the HCL proteins were measured using Expasy's ProtParam [74] server, i.e., theoretical isoelectric point (pI), molecular weight (MW), total amino acid number, Aliphatic index (AI) [75], extinction coefficient [76], grand average of hydropathy (GRAVY) [77], the total number of positive and negative charged residues and instability index [78]. Subcellular localization. Depending upon different positional orientations, a protein can be targeted for vaccines (structural or extracellular proteins) or drugs (cytoplasmic or

Tools URL Remarks
Physicochemical Characterization intracellular proteins) [79], where UniProtKB can be useful for experimental proteins information [80]. There is an information gap about the HPs as they are not experimentally characterized. Hence, their subcellular localizations are also in concealment. To unveil this characteristic feature, online bioinformatics tools CELLO (v2.5) [81], which uses a system based on two-level SVM (Support Vector Machine) and PSORTb [82], which is the most reliable subcellular localization prediction tool for bacteria, was used. Besides, to predict signal peptide and secretory pathway (non-classical), we used the neural network-based system Sig-nalP [83] and CBS server's tool SecretomeP [84], respectively. Our study also used SOSUI [85], TMHMM [86], and HMMTOP [87] to predict the solubility and transmembrane topology of the proteins. Domain and function assignment.
To predict the precise functions of the proteins, we employed several servers for the accuracy of the work. CDD (Conserved Domain Database), ScanProsite, SMART, and Pfam were used earlier to search the domains. Furthermore, to assign functional motifs, the online tool MTIF (https://www.genome.jp/tools/motif/) was recruited where the output is very large. We also used InterProScan [88], which works in a combination of different signature recognition methods of proteins, utilizing InterPro consortium, where large databases like Pfam, SUPERFAMILY, SMART, PANTHER, ProSite are the integral parts.
Protein structure prediction. Along with the functional motif prediction of the HPs, it is crucial to predict the 3D structures as well [89]. Template-based protein structure prediction online tool PS square version 2, popularly known as PS2-v2 [90], was exploited to predict the tertiary structure of the proteins. Protein FASTA sequence is the input format for the query, which is analyzed using both Pair-wise and multiple sequence alignments in the combination of IMPALA [91], PSI-BLAST [64,92], T-Coffee [93] through both target-template selection and alignment. By default, the best homologous template is selected based on scores to generate a 3D structure using the amino acid sequence of the target protein with the help of an integrated modeling package. However, the server failed to generate a 3D model for some proteins. To overcome this problem, we implemented the manual system to select a template from the suggested list of the server and generated the 3D models of those proteins. Though it was executed successfully still, there was an error in template selection and modeling for two proteins. We overcame this problem using the SWISS-MODEL [ , where only the highest scored protein was taken as a functionally associated partner. Besides, this study also showed the PPI network and gene co-occurrence for the highest antigenic and all the virulent proteins.
Identification of antigenic protein. All the previous analyses helped us to select 11 proteins among the entire set of HCL HPs, which are predicted to be connected to classical or non-classical secretory pathways or localized in the extracellular space/periplasm/plasma membrane by CELLO prediction server or possessed one or more transmembrane topology. These types of proteins are generally targeted for subunit vaccines. To check the probability of these proteins as potential protective antigens, we used the VaxiJen v2.0 [100] server at a threshold of 0.5 for a very high precisions level. Besides, we also checked the antigenicity of the rest of the proteins as they can also induce cell-mediated and humoral immunity [101], and we found about 15 proteins to be antigenic out of the 24. Among them, the most antigenic protein was taken to predict potential B cell and T cell epitopes.

Phase-III
T cell epitope identification. To induce cell-mediated and humoral immunity, identifying potential epitopes for T cell and B cell is essential. A tool from the CBS server, NetCTL 1.2 [102], was used at threshold 0.5 with a sensitivity of 0.89 and specificity of 0.94 to predict probable epitopes. The prediction is based on the peptide to MHC-I (Major Histocompatibility Complex class I) binding, C terminal proteasomal cleavage, and TAP (Transporter associated with Antigen Processing) transport efficiency using 12 prominent supertypes of MHC-I. This server uses ANN (Artificial Neural Network) based method to predict MHC-I binding and C terminal proteasomal cleavage where TAP transport efficiency is calculated using the Weight Matrix method.
For peptide to MHC-I binding prediction, the Stabilize Matrix Method (SMM) [103] was selected in a tool from IEDB (Immune Epitope Database) [104], which was employed to determine the IC 50 (Half Maximal Inhibitory Concentration) value. All the available alleles were selected with the peptide length of 9.0 before the prediction. Finally, selected epitopes were analyzed using the T cell epitopes-processing prediction tool that calculates a combinatorial score for TAP transport, proteasomal cleavage, and MHC-I binding [105]. We used the SMM method in this case as well.
Prediction of population coverage. Among the different ethnicity, the coverage of our proposed epitopes with corresponding HLAs was calculated using the population coverage tool [106] from the IEDB server.
Allergenicity appraisal. Two web-based tools were used to predict the allergenicity of the epitopes with very high specificity, namely: AllerTOP v2.0 [107] with an accuracy of 85.3% and AllerCatPro [108] with 84% accuracy.
Molecular docking simulations. Before docking, the 3D structure of the epitope RTFAMSSER was built using PEP-FOLD3 [109], and the PDB (Protein Data Bank) structure of the HLA-C � 03:03 (PDB ID: 1EFX) was retrieved from the RCSB database [110] where it was complexed with human natural killer cell receptor KIR2DL2. Then the complex was opened using Discovery Studio [111] to remove the receptor and recover the simplified HLA-C � 03:03.
Autodock Vina [112] was used to calculate the binding energy between the target epitope and the corresponding HLA. The docked complex was visualized using PyMol [113] and UCSF Chimera [114].
However, the rest of the epitopes and HLA alleles were also subjected to molecular docking simulation following the similar procedure in order to estimate the relation between the docking score, IC50 value, and combined score of proteasome score, TAP score, MHC-I score, processing score.
Linear and conformational b cell epitope identification. B lymphocytes play a crucial role in the induction of immune response mediated by B cell epitopes [115]. We used IEDB B cell epitope prediction tools to identify the B cell epitopes. Bepipred linear epitope prediction analysis [116], Kolaskar and Tongaonkar antigenicity scale [117], Karplus and Schulz flexibility prediction [118], Emini surface accessibility prediction [119], Parker hydrophilicity prediction [120] were performed to predict and confirm the linear antigenic B cell epitope properties. As beta-turn regions of a protein are found in the antigenic portions [121], we utilized the Chou and Fasman beta-turn prediction tool in this regard [122]. Furthermore, the conformational or discontinuous B cell epitopes were also predicted using the IEDB tool Elipro [123]. For this prediction, the 3D structure of the protein was built using PS2-v2 and validated. Then the valid, optimized structure was submitted to the server, and the scoring criteria were set at 0.5, where less than that value is rejected, and the most stringent score is considered to be at 1.0. To calculate the residue clusters, 6.0 Å (Angstrom) was selected as the maximum distance parameter.

Sequence evaluation
The implementation of advanced technologies in DNA sequencing techniques enables us to reveal the exact sequence of an immense number of bacterial genomes in a short time with a considerably low cost. Many genes are found to be conserved in a broad spectrum of bacterial genomes throughout the evolutionary process. As a result, the precise annotations and functions of these genes are assigned using sequence homology or similarity search against functionally specified genes [124]. Although, one-third of the sequenced genes have no specified functional assignment due to the rapid deviations of functions between the similar gene sequences in the road to evolution [125,126]. Consequently, only sequence homology or similarity search cannot predict or ascertain the proper function of a gene, which ultimately results in faulty functional allocation [127].
To overcome this crux and lessen the proportion of HPs, it is recommended to use multiple bioinformatics tools for discovering appropriate functions of the hypothetical proteins. On account of this, the current study focused on annotating the functions of the hypothetical proteins of Stenotrophomonas maltophilia by recruiting diverse bioinformatics methods and tools. At first, the conserved domains for the 789 hypothetical proteins were searched with the help of four bioinformatics web tools, namely Pfam, CDD-BLAST, ScanProsite, SMART. Based on these results, the proteins were classified into five groups where 24 proteins showed a specific consensus functional domain in all the tools and hence are grouped into high confidence level (HCL) proteins. The tools did not find any domain for 479 proteins, and the combined confidence level was zero. Remaining HPs (286 proteins) showed hit in one, two, or three of the four tools mentioned above, which resulted in different confidence levels (i.e., 25% for 172, 50% for 59, and 75% for 55 proteins). The result is summarized in the S1 Table. However, further analysis is required to reveal the proper functions of these proteins. We considered only the 24 HCL HPs for downstream study because these proteins showed at least one conserved domain in all four servers. To avoid false-positive results and increase the accuracy of the study, we excluded all the other four confidence level proteins.
The theoretical pI, molecular weight, extinction coefficient, total number of negative and positive charged residues, instability index, GRAVY value, and other physiochemical properties of the HCL HPs were measured by the online bioinformatics tool ProtParam, and the result is shown in S2 Table. The cumulative value of hydropathy for all the amino acid residues of a protein chain is divided by the total number of residues of that protein sequence to calculate the GRAVY value [77]. The lower GRAVY value indicates the possibility of a protein being hydrophilic (globular), where the higher value confirms the hydrophobic (membranous) nature of the proteins.
We found the GRAVY values of our concerned proteins ranging from -0.958 to -0.044, which points towards the hydrophilic properties of the proteins and helps in predicting the localization. Functional motifs of these hypothetical proteins were discovered using web-based tools MOTIF and InterProScan for further confirmation about the functions. Using the tertiary structural information, we can validate the predicted biochemical functions of a protein [128]. So, we assigned the PS2-v2 server for the resolution of the 3D structure of HCL HPs, which generates a PDB file in a template-based manner and fold recognition scheme. Then all the sequence evaluation data were collated, and the HCL HPs were sorted into different functional groups, which consist of eleven enzymes, three binding proteins, four regulatory proteins, two inhibitors, two transporters, and two proteins of manifold functions. These groups are described below: Enzymes. Bacterial enzymes are crucial for their pathogenesis in the host. They also provide essential nutrients and control various metabolic pathways, which helps in the growth and survival of the organism [129]. In our study, we found 11 enzymes among the 24 annotated HCL HPs that have different physiological and pathological importance to S. maltophilia. Among them, WP_005408386.1 and WP_012479842.1 are phosphotransferases (catalyze phosphorylation reactions), which play a key role in the bacterial PTS (Phosphotransferase System) in transporting sugar [130]. Besides, WP_012479842.1 is a member of the chloramphenicol phosphotransferase-like protein family. This protein phosphorylates and inactivates the lethal chloramphenicol metabolites in bacteria, which inhibits ribosomal peptidyl transferase and thus shuts protein production down [131,132].
We found WP_005409007.1 protein to be a member of the SmrA superfamily. Member of this family contains the Smr domain, which is thought to participate in crossing over, mismatch repair, or segregation, and it also has nicking endonuclease activity [133,134]. Vicinal Oxygen Chelate (VOC) is a family of proteins that are involved in sequestering and localizing metal ions. This type of domain or fold consists of two β-α-β-β-β units, which are responsible for the formation of the partially closed beta-sheet barrel around the metal ions [135]. The protein WP_005414366.1 was found to be a member of the VOC superfamily. So, we assume this protein may involve in the metal resistance trait in the organism. The protein WP_012478637.1 contains the Haloacid Dehydrogenase or HAD domain superfamily, which participates in various cellular processes, i.e., detoxification, amino acid biosynthesis, and many more [136,137]. X-ray crystallography revealed the conserve hydrolase fold analogous to the Rossmann fold found in the members of this superfamily [138]. This fold contains two subdomains, where the large one remains strictly conserved, and the small domain shows structural variations among the classes [139]. WP_012480920.1 protein belongs to the Isoprenoid Biosynthesis Enzymes Class-I. Protein WP_012478648.1 was found to maintain the protein tyrosine phosphatase superfamily, which is homologous to the dual-specificity protein phosphatase known as Cyclin-Dependent Kinase Inhibitor-3 (CDKN3) [140]. WP_012480806.1 glycosidase enzyme possesses six helical hairpin structures in a closed circular order and hence are included in the sixhairpin glycosidase superfamily [141]. We found the CheB domain in WP_012481043.1 protein, which is a strong indication for this protein of being a member of methylesterase CheB, C-terminal superfamily. The members of this superfamily consist of parallel β sheet with the α-β-α array in seven strands and remove the methyl group from the methyl-accepting chemotaxis proteins (MCP) [142,143]. Among the enzymes, we were able to identify only one protease enzyme (WP_044570756.1) containing DUF2268 (DUF is annotated as Domain of Unknown Function) domain, which is predicted as a Zn-dependent protease.
Binding proteins. We have characterized two (WP_005412620.1 and WP_005413412.1) calcium ion binding proteins containing the EF-hand domain, and the rest is a DNA binding protein. EF-hand Ca 2+ -binding motifs are found in pairs. Each of them comprises a loop that is 12 residues long where a 12 residue α-helix flanks the loop on either side [144]. The conformation of the EF-hand motif changes upon the binding of the Ca 2+ ion. The ion is positioned in the loop in a pentagonal bipyramidal fashion [145,146]. The DNA binding protein WP_012479848.1 belongs to the Bro-N family proteins which function is unknown. But the experimental shreds of evidence of Bro-A and Bro-C suggest its ability to regulate host DNA replication and/or transcription by binding with it directly [147].
Regulatory proteins. In this study, we were successfully able to characterize a novel regulatory protein (WP_012479796.1) of S. maltophilia that is crucial for its extensive multi-drug resistance nature. This protein is a member of the LuxR transcription regulatory protein family, which is one of the most important proteins in Quorum Sensing (QS). It also plays key roles in plasmid transfer, motility, biofilm formation, nodulation, and the expression of many genes that includes the antibiotics and virulence factors encoding genes [148]. This family protein has an autoinducer binding domain at the N-terminal that generally binds to the N-acyl homoserine lactones (AHL). Binding with autoinducer results in the dismantling of the C-terminal DNA-binding domain that promotes it to bind with the DNA and actuate the transcription [149].
WP_012479125.1 and WP_012480949.1 protein contains the structural motif Tetratrico Peptide Repeat (TPR). This protein domain consists of 34 amino acids that are repeated 3-16 fold and occur in a helix-turn-helix array with the nearby TPRs in a parallel manner, which results in anti-parallel α-helices [150,151]. These proteins are engaged in many biological processes, such as the regulation of transcription, cell cycle, protein transport, and folding [152].
The functional analysis disclosed a vital protein (WP_012478875.1) that can act as a regulatory protein and immune protein both at the same time due to the presence of Ankyrin repeat-containing domain and NTF2 fold domain. NTF2 domain-containing proteins are found in the polymorphic toxin system of bacteria [153]. This domain is always fused with ankyrin repeats, which is a multi-repeat β 2 -α 2 motif of 33 amino acid residues [154]. Proteins of these domains can participate in a variety of functions, including the initiation of transcription, ion transportation, cell-cycle regulation, and signal transduction [155].
Inhibitor proteins. Two HCL HPs among the annotated 24 showed similarities with lysozyme inhibitors. Lysozymes are the hydrolase enzymes recruited by the innate immune system of animals for the degradation of bacterial major cell wall component peptidoglycan [156]. WP_005413200.1 is a C-type lysozyme inhibitor superfamily protein, more specifically membrane-bound lysozyme inhibitor of C-type lysozyme (MliC), which are well known for their conferring support in extensive lysozyme tolerance to the gram-negative bacteria [157]. This protein forms ionic and hydrogen bonds with its invariant loop to the lysozyme at the active site cleft [158]. The second inhibitor (WP_044569343.1) is of the IVY (Inhibitor of Vertebrate Lysozyme) superfamily, which is also known as a virulence factor [159,160]. IVY proteins consist of three layers of α 2 -β 5 -α 2 topology and a crucial 5-residue long loop for the inhibitory function [161].
Transporter proteins. Maintenance and assembly of outer membrane (OM) components play a vital role in bacterial survival and pathogenesis. To aid this process, many transport proteins are involved in bacteria. We found two such proteins, namely the LPS-assembly lipoprotein LptE (WP_005410539.1) and Curli production assembly/transport component CsgG (WP_032966398.1). During the assembly through the beta-barrel assembly machine, LptE interacts with LptD and forms a complex [162] that is involved in lipopolysaccharides (LPS) assembly at the outer region of OM [163][164][165]. Along with them, LptA, LptB, and LptC are also involved in the LPS transport machinery. Blocking any of them disrupts the LPS assembly system as a whole and creates the same type of OM biogenesis defects [164]. On the other hand, CsgG is a lipoprotein that works as the stabilizer for CsgA and CsgB during the Curli assembly [166]. The Curli protein is amyloid fiber in nature and promotes cell to cell communication via biofilm formation [167].
We found two proteins showing miscellaneous functions. One of them (WP_024956629.1) contains the DUF2329 domain, which is a domain of unknown functionality. But WP_005410716.1 proteins were found to have a CheW like domain associated with the chemotaxis process of the bacteria [168]. The domain is about 150 residues long and is made up of two β-sheet subdomains. Every beta-sheet is comprised of a five-stranded loose beta-barrel centering a hydrophobic core component [169].
The MOTIF and InterProScan servers were used to validate the predicted functions of the proteins by the blast servers ( Table 2). Web-based tool STRING was employed to predict the possible functional partners of the HCL HPs (S3 Table).

Structure prediction
All of the 24 HCL HPs were subjected to the PS2-v2 server to generate the 3D structure of the proteins. The server effectively produced a PDB file for each of the 22 proteins. In the case of the rest two proteins, it showed an error message, which is due to the inappropriate or unavailability of a suitable template for the prediction. To solve this problem, we used the SWISS--MODEL and generated the 3D structure. The result is depicted in the S3 Table.

Subcellular localization prediction
The subcellular localization of the HCL HPs was predicted using various bioinformatics tools, which predicted not only their cellular locus but also their solubility and secretion or signaling ability along with possible membrane helices. Among the 24 HCL HPs we found about 10 proteins (WP_005412620.1, WP_005413200.1, WP_005413412.1, WP_012479125.1, WP_012480806.1, WP_012480949.1, WP_024956629.1, WP_032966398.1, WP_044569343.1, WP_044570756.1) that are in or near the outer membrane or the periplasmic space of S. maltophilia. All of them have at least one transmembrane helix to anchor the membrane. The remaining 14 proteins were predicted as cytoplasmic soluble proteins with no transmembrane helices. An exception of them is the protein WP_005410539.1. This protein possesses one transmembrane helix, which was further verified by all three tools (HMMTOP, TMHMM, and SOSUI). The result of subcellular localization is shown in the S4 Table. Virulent protein prediction Virulentpred and VICMpred were used to predict the virulence factor of the high confidence level hypothetical proteins. These web tools predicted two HPs among the 24 proteins as virulent, and the other proteins were either non-virulent or predicted virulent by only one server. The result is shown in Table 3. It is thought that the virulence factors can be potentially good candidates and can provide comparatively better therapeutic interposition in case of bacterial infections [170]. Characterized virulent HPs can yield a dynamic target-based therapy against the infections and can be a subsidiary therapy to the antibiotics or can work as effector molecules to the immune response of the host [171].

Antigenic protein prediction
Antigenicity of a protein is the primary requirement of being targeted by the host immune system [172]. Vaxijen server 2.0 predicted about 15 proteins as a probable antigen candidate with a threshold of 0.50 for higher sensitivity and accuracy. The scores of the remaining nine proteins were below the threshold value, and thus, they were excluded. The result is shown in Table 4.

Protein-protein interaction
Interaction between various proteins plays a crucial role in all most all biological processes. One protein mutually interacts with others to perform common cellular functions. For example, the activation of transcription includes multiple transcription factors that work together in gene expression. Moreover, the functions of proteins can be predicted using their PPI information because it is very rare for a protein to interact with different biomolecules. Therefore, PPI databases have become an important resource to study biological networks and pathways [173]. We predicted the PPI and gene co-occurrence for three annotated HCL HPs (highest antigenic protein and two virulent protein), which are thought to be vital players in the pathogenesis of the organism (Fig 2). Gene Co-occurrence network is the graphical visualization of a particular gene network that is possibly present, not necessarily conserved, in a variety of biological organisms. Here in the figure, A1, B1, and C1 are the PPI network of the protein WP_012479796.1, WP_012480949.1, and WP_005413200.1, respectively, while A2, B2, and C2 depicted their corresponding gene co-occurrences. The colored nodes of the PPI network represent functionally associated first shell proteins, and each edge shows the type of interactions. On the other hand, Gene co-occurrence is presented as a phylogenetic tree where the topmost part contains the proteins of the specific network, and the left side contains the organisms. Right-sided color denotes the similarities for a particular gene of interest in a given genome. Higher the color intensity, the higher the sequence similarity or conservancy. For a clade that is collapsed in the tree, the highest and lowest similarities are indicated by two distinct colors.

Epitope prediction for vaccine target
For the prediction of subunit vaccine candidates, the outer membrane proteins of the bacteria are the target of choice. We have selected only the outer membrane/periplasmic/extracellular proteins predicted by the CELLO prediction tool. We have found 11 such proteins (S4 Table). Though each of them can induce the immune response in the host, we selected only the highest antigenic protein (WP_005413200.1 scored 1.1056 in VaxiJen) for this purpose.
T cell epitope prediction. NetCTL server identified potential T cell epitopes with preselected criteria using the selected antigenic protein. Seven epitopes that have a combinatorial score of more than 1.5 were selected, and the data is presented in Table 5.
Using the SMM method, we predicted the MHC-I binding affinity for all of the seven epitopes. A broad range of MHC Class I alleles was screened for interaction with the epitopes. The lower or higher IC50 value measured the affinity. The lower the IC 50 higher the affinity, The cutoff was 0.5, which means less than that value is probable non-antigenic in nature. https://doi.org/10.1371/journal.pone.0252295.t004 and vice versa. We allowed only those MHC-I alleles that interacted with the epitopes with an IC 50 value of less than 250nM ( Table 6).
The IEDB tool predicted MHC-I processing (TAP transport, proteasomal cleavage, and MHC-I combined predictor) with a combined score for individual epitopes from the submitted protein sequence. Peptides are formed due to the cleavage of peptide bonds with the help of the proteasome complex. Then these peptides bind to the MHC Class I molecules and are transported by the TAP proteins to the plasma membrane and presented to the CD4 + T cells or helper T lymphocytes. Higher the combinatorial score higher the processing potency ( Table 6).
The 9 mer peptide RTFAMSSER interacted with the maximum number of alleles among the seven epitopes. The interacted alleles include HLA-A � 31:01, HLA-A � 68:01, HLA-C � 12:03,   Table 6). Allergenicity assessment and population coverage. To avoid cross-reactivity, all the epitopes were subjected to AllerTOP v2.0, and AllerCatPro and six epitopes were predicted as non-allergens by these servers where epitope WTKGSDDGL found to have allergic activity ( Table 6). So, we excluded that epitope for further analysis.
Population coverage is a crucial parameter in vaccine development. Hence, the cumulative population coverage percentage was measured using the IEDB population coverage tool for all Table 6. Promising T cell epitopes with their properties: IC 50 value, docking score (kcal/mol), combinatorial processing score.

Serial
No.

IC50 Value <250nM
The combined score of Proteasome score, TAP score, MHC-I score, processing score the non-allergenic epitopes. We found the maximum coverage in Europe, which was 90.03%, followed by Northeast Asia 85.65%, and South Asia 84.06%. Besides, we also measured the population coverage for North America (82.53%) and Southeast Asia (80.64%). The cumulative World population coverage was 85.30%. The results are summarized in Table 7 and Fig 3 Molecular docking analysis. Molecular docking is the most common method used in reverse vaccinology to analyze the interaction pattern between epitopes and MHC molecules. We performed molecular docking in general for all the epitopes and the respective alleles ( Table 6). The ranges of docking score, i.e., binding affinity was between -6.9 to -10.4 kcal/mol, respectively. The IC 50 values were taken for the study were <250 nM, which is an indication of strong binding affinity between alleles and their respective epitopes. The higher the IC50, the lower the affinity [105]. Along with that, the combined scores of proteasome score, TAP score, MHC-I score, processing score are a quantity-based prediction that is proportional to the total amount of peptides presented by the MHC molecules on the surface of the cells. Higher the value, the higher the amount of presented peptides [105]. The IC50 value, combined scores, and docking scores cumulatively showed a strong interaction pattern between the epitopes and the HLA with an average binding affinity of -8.4 kcal/mol. Though all the non-allergen epitopes can individually induce an immune response, we took only the RTFAMSSER epitope for post docking interaction analysis because it interacted with the maximum number of alleles as compared to others. To check the interaction modes between the predicted T-cell epitope and the HLA-C � 03:03, molecular docking was performed using Autodock Vina. The result comes with a binding affinity of -7.8 kcal/mol. In addition, our study about the HLA-C � 03:03 suggests that the binding cleft of the MHC molecule is located near the α1 helix region between residues 70-77 and α2 helix region between residues 144-152 [174]. Post docking interaction was analyzed (Fig 4), and the nonbonding interaction data are tabulated in Table 8.
Post docking analysis of the docked complex shows that our potent epitope formed 12 hydrophobic, nine electrostatic, and 15 hydrogen bonds with the MHC molecule. Half of the hydrophobic interaction was formed within the binding cleft, which is an indication of the stable binding pattern as combined hydrophobic interaction plays a vital role in protein stability. In hydrophobic interaction, epitope interacted with the MHC molecule only in α2 binding cleft where interestingly hydrogen bond was formed in both α1 and α2 binding cleft. Interestingly, Lys146 showed both alkyl and pi-alkyl type hydrophobic interaction along with conventional hydrogen bond, whereas Glu152 exhibited salt bridge and conventional hydrogen bond along with attractive charge type electrostatic interaction. Experimental evidence shows that the dimorphic amino acid Asn80 generally interacts with both NK cell receptors and the foreign antigens (epitopes) [174]. Interestingly, this docking result shows two conventional hydrogen bonds between Asn80 and the epitope.
B cell epitope prediction. Linear B cell epitope prediction. Several authentic tools were recruited to identify potential linear B-cell epitopes (Fig 5). Kolaskar  antigenicity prediction tool assessed the conserve regions considering the Physico-chemical properties of the protein. The threshold value was set at 1.00, which determines the possibility of a conserved region being a potential antigen scoring more than that. The minimum and maximum antigenic propensity values were 0.920 and 1.240, where the average was 1.058. We were able to identify such regions that can induce a humoral immune response presented in Fig 5A and Table 9.

PLOS ONE
We were ascertained of the region 35-42 amino acid residues as surface accessible by the Emini Surface Accessibility prediction tool that can act as B cell epitope (Fig 5B and  Table 10).
Moreover, it is well known about the surface accessibility or hydrophilicity of the beta-turn regions of a protein, which was predicted by Chou and Fasman Beta-turn prediction tool. The predicted beta-turn regions were 20-57, 87-107, and 171-188 (Fig 5C). The antigenicity of the peptide is strongly correlated with its flexibility [100]. Karplus and Schulz flexibility prediction tool identified 21-51 as the most flexible regions (Fig 5D). In the end, Bepipred linear epitope prediction tool suggested the probable linear B-cell epitopes (Fig 5E and Table 11).
Parker Hydrophilicity prediction tool was recruited for further confirmation about the hydrophilic nature of our predicted B cell epitopes (Fig 5F). Analysis of the data from B cell epitope prediction tools revealed the most potent B cell-mediated immunity inducing conserved epitope 'PAAPQPSAS' in the region of 34-42.

Conformational B cell epitope prediction
Most of the epitopes for B cells are discontinuous or conformational rather than linear [175]. To predict the discontinuous B cell epitopes, the 3D structure of the protein was generated and validated, and submitted to the Ellipro server. This server identified eight different epitopes for the protein WP_005413200.1 ( Table 12).

PLOS ONE
Structural and functional annotation of the hypothetical proteins of S. maltophilia k279a: Computational study The 3D structures of these epitopes were visualized using Jmol (integrated service of the server), which demonstrates their particular positions in the protein. The epitope residues were predicted using the full-length protein, where they were scattered throughout the surface. The scores of the predicted epitopes reside between 0.503 to 0.796, where the cutoff score was previously selected 0.50 by default. The detailed view of these conformational epitopes is illustrated in Fig 6.   Fig 6. Conformational or discontinuous B cell epitopes of WP_005413200.1 predicted from the PDB structure (homology). Here, A-H, the yellow balls represent the residues of the corresponding epitopes, and sticks in white color are the structure of the core residues.
There are still some limitations in epitopes prediction using different bioinformatics tools, and therefore, improvements are required in the prediction methods. Improving the incorrectly delineated epitope databases can result in higher accuracy prediction [176]. It is more suitable to include multiple tools for more accurate and consistent outcomes as the results obtained from different tools and methods may differ [177].

Conclusions
At first, we resolved all the 789 HPs from S. maltophilia K279a and predicted the functions with precision and confidence for 24 proteins. Next, the characterization was carried out, followed by the functional validation with different approaches, including structure-based methods. The physical and chemical parameters and the subcellular localization information helped to distinguish the HPs from the others. The PPI also gave an idea about their corresponding metabolic pathways. Besides, we were able to detect two virulence-associated proteins vital for the pathogenesis and survival of this organism. Among the HPs, we predicted the T cell and B cell epitopes for the highest antigenic protein, which is located in the periplasmic membrane of the pathogen. Pieces of evidence of our study suggest the potency of these epitopes as good targets against the bacteria. Nevertheless, clinical experiments are needed to ensure the efficacy of these candidates as vaccines.
Supporting information S1