A comprehensive in silico analysis for identification of therapeutic epitopes in HPV16, 18, 31 and 45 oncoproteins

Human papillomaviruses (HPVs) are a group of circular double-stranded DNA viruses, showing severe tropism to mucosal tissues. A subset of HPVs, especially HPV16 and 18, are the primary etiological cause for several epithelial cell malignancies, causing about 5.2% of all cancers worldwide. Due to the high prevalence and mortality, HPV-associated cancers have remained as a significant health problem in human society, making an urgent need to develop an effective therapeutic vaccine against them. Achieving this goal is primarily dependent on the identification of efficient tumor-associated epitopes, inducing a robust cell-mediated immune response. Previous information has shown that E5, E6, and E7 early proteins are responsible for the induction and maintenance of HPV-associated cancers. Therefore, the prediction of major histocompatibility complex (MHC) class I T cell epitopes of HPV16, 18, 31 and 45 oncoproteins was targeted in this study. For this purpose, a two-step plan was designed to identify the most probable CD8+ T cell epitopes. In the first step, MHC-I and II binding, MHC-I processing, MHC-I population coverage and MHC-I immunogenicity prediction analyses, and in the second step, MHC-I and II protein-peptide docking, epitope conservation, and cross-reactivity with host antigens’ analyses were carried out successively by different tools. Finally, we introduced five probable CD8+ T cell epitopes for each oncoprotein of the HPV genotypes (60 epitopes in total), which obtained better scores by an integrated approach. These predicted epitopes are valuable candidates for in vitro or in vivo therapeutic vaccine studies against the HPV-associated cancers. Additionally, this two-step plan that each step includes several analyses to find appropriate epitopes provides a rational basis for DNA- or peptide-based vaccine development.


Introduction
HPVs are a large branch of the Papillomaviridae family, grouped in different genera (Alpha-, Nu-/Mu-, Beta-and Gamma-papillomaviruses), with more than 200 genotypes [1][2][3][4]. The classification of Papillomaviruses (PVs) has been based on L1 gene sequence. They are clinically divided into two groups: low-risk HPVs, like HPV 6 and 11, which cause benign lesions (warts and benign papillomas), and high-risk HPVs (hrHPVs), like HPV16 and 18, which are a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 response. However, the addition of CD4+ T cell epitopes can significantly augment its strength and duration [49,52,53]. CD8+ CTLs commonly recognize intracellular-originated peptides presented by MHC-I molecules. They accommodate peptides with 8-11 residues; the ideal length is 9 residues. While CD4+ Helper T Lymphocytes (HTLs) commonly recognize extracellular-originated peptides presented by MHC-II molecules. They accommodate peptides with 10-30 residues or even more; the ideal length is [12][13][14][15][16]. The strength of the interaction between a T cell receptor and a peptide-MHC complex (pMHC), depends on the presented peptide and the MHC structure [49,54]. The binding of a peptide to MHC-I molecule is the most selective stage in the way of peptide presentation [55].
Bioinformatics tools can predict the potential immunogenic epitopes from thousands of epitopes in a short time [56]. Generally, the algorithms of these tools range from ones programmed to determine peptide-MHC molecule binding data to those based on structural similarity, molecular modeling, and molecular docking [57]. Peptides that bind to a specific MHC molecule have sequence similarity. Therefore, peptide sequence patterns have been used to predict their binding to MHC molecules [58]. In recent years, the accuracy of these methods has increased strikingly, and more than 90% of natural epitopes have been recognized at a high specificity of 98% [59]. This improvement in performance was achieved by the expanding experimental binding data, available in the immune epitope database (IEDB) and analysis resource (http://www.iedb.org/), and by the improvement of machine-learning algorithms [60].
Regarding the fundamental importance of epitope prediction in vaccine development, we investigated the best potential CD8+ T cell epitopes from the E5, E6, and E7 oncoproteins of four prevalent hrHPV genotypes (16, 18, 31 and 45) in the world and Iran [61], as shown in

Plan of the study
A two-step plan was designed to identify the most probable CD8+ T cell epitopes (Fig 2). For the first step, MHC-I and II binding, MHC-I processing, MHC-I population coverage and MHC-I immunogenicity prediction analyses, and for the second step, MHC-I and II protein-peptide docking, epitope conservation, and cross-reactivity with host antigens analyses were considered. The second step analyses were performed only for the selected peptides in the first step.

MHC-I binding prediction
Binding of epitopes to MHC-I molecules is an essential step for antigen presentation to CTLs. Herein, it was predicted by four online servers, as illustrated in Table 1. The HLA supertypes and frequently occurring HLA-I alleles provided by the servers were included in the analysis. However, when an allele (e.g., HLA-B � 14:02) was not provided, but its allele group (i.e., HLA-B � 14) was available, we used the allele group instead of the allele. The used human and mouse alleles, or allele groups are provided in supporting information (S1 Table).

IEDB MHC-I binding prediction
Currently, eight prediction methods are available in the IEDB MHC-I binding prediction tool, i.e., IEDB recommended [62], Consensus [69], NetMHCpan3 [59,70], artificial neural network (ANN) [71,72], SMM with a peptide-MHC binding energy covariance matrix (SMMPMBEC) [73], stabilized matrix method (SMM) [74], CombLib_Sidney2008 [75], PickPocket [76], netMHCcons [77] and netMHCstabpan [78]. The IEDB-recommended and consensus are not Independent methods; they use ANN, SMM and CombLib_Sidney2008 methods to generate a representative index for each predicted pMHC; The median of percentile ranks (PRs) or binding scores obtained from the used methods is reported as a representative PR or consensus score in the IEDB-recommended or consensus method respectively. The PR is calculated by comparing the half maximal inhibitory concentration (IC 50 ) of subjected peptide against a group of random peptides from Swiss-Prot database. The IC 50 value, expressed as nanomolar, shows binding affinity. The lower IC 50 or PR means higher binding affinity. As a rough guideline, peptides with IC 50 values <50 nM are considered as high affinity, 50-500 nM intermediate affinity and more than 500-5000 nM low affinity. No known T cell epitope has got an IC 50 value >5000 nM to date [60].
In this study, IEDB recommended method was used. The outputs for each pMHC in this method consisted of a median PR, a method-specific IC 50 , and a method-specific PR. Predictions were made against 76 frequently occurring human MHC-I alleles (including 12 HLA supertypes) and 6 MHC-I mouse alleles. Epitope length was set on 8, 9, 10, and 11mer. Peptides with median PR <2.0 are applied for the analysis.
NetMHCpan4 MHC-I binding prediction. NetMHCpan4 server predicts binding of peptides to the known MHC molecules using ANNs method. It is trained on a combination of naturally eluted ligands (55 human and mouse MHC-I alleles) and binding affinity data (172 MHC molecules from human, mouse, cattle, primates, and swine). Besides, the user can perform a prediction against any custom MHC-I molecule by uploading its full-length sequence [66].
In this study, predictions were performed for 8, 9, 10, and 11mer peptides against 76 frequently occurring human MHC-I alleles and 8 MHC-I mouse alleles. PR thresholds for strong and weak binders were set on 0.5 and 2.0, respectively. Peptides with PR <2.0 were applied for the analysis.
Rankpep MHC-I binding prediction. Rankpep predicts binder peptides of a given protein sequence or sequence alignments to MHC-I and II molecules. The algorithm of Rankpep based on the comparison of sequence similarities, using position-specific scoring matrices (PSSMs) method. It employs profiles of a group of aligned peptides recognized to bind to a specific MHC molecule and creates a consensus sequence by determining the most common residue for each position. Then, it allocates an optimal score to the consensus sequence, compares the score of the subjected peptide with the optimal score, and gives the peptide a percentile optimal value for comparison. Finally, it highlights strong binders in red [67,68].
Herein, the prediction was made against 31 frequently occurring HLA-I and 7 H2-I alleles. The server did not provide all common lengths of epitopes for all the MHC alleles. Thus, the used alleles and their provided epitope lengths are shown together, as given in supporting information (S1 Table).
SYFPEITHI MHC-I binding prediction. SYFPEITHI (http://www.syfpeithi.de/0-Home. htm/) is a database of over 7000 published and verified peptide sequences of human, mouse, and other organisms, known as natural binders of MHC-I and II molecules. When SYF-PEITHI analyzes a peptide for binding prediction against a specific MHC-I allele, its scoring system evaluates every residue of the query and gives it an arbitrary value between 1 and 15, according to whether it is an anchor, auxiliary anchor, or preferred residue. It allocates the value 1 to those residues which slightly preferred in that particular position, 15 to the Ideal anchor residues, and -1 to -3 to those residues which exhibit an adverse effect on the binding ability. The sum of these values is the score of the peptide. The maximal score could vary between different MHC alleles [54,79].
Herein, the prediction was made against 26 frequently occurring HLA-I alleles and 5 H2-I alleles. Epitope length was set on 8, 9, 10, and 11mer. Every predicted pMHC which got a score less than 70% of the reference sequence score was excluded from the analysis. The allele-specific reference sequence was selected from Rankpep's consensus sequence [68], or our SYFPEITHI predicted epitopes, whichever got the highest score in SYFPEIHI server. The reference sequences, their sources, and their scores are given in supporting information (S2 Table).

MHC-II binding prediction
Recognition of high immunogenic CD8+ T cell epitopes was the primary aim of this study. Therefore, all predictions were primarily made against epitopes with 8-11 residue length. However, it was valuable to determine that which 9mer MHC-I epitope is the core peptide of the MHC-II epitope(s) too. The core peptide lies on the MHC-II molecule grooves, and play the central role in constructing pMHC. With this strategy, the short minimal predicted epitopes could be used in designing of synthetic long peptides (SLPs), resulting in peptide loading to both MHC-I and II molecules.
The prediction was made against 35 human alleles (IEDB reference set) and three mouse alleles, given in supporting information (S3 Table). The server has fundamentally set the epitope length on 15mer. Each IEDB-recommended method participated in the prediction process offered a core Peptide (9mer) for each predicted epitope (15mer). We associated the 9mer MHC-II core peptides with the 9mer MHC-I predicted epitopes to determine that which MHC-I epitope is the core peptide of the MHC-II epitope(s) too.

MHC-I processing prediction
MHC-I T cell epitope processing predictions of E5, E6, and E7 oncoproteins are made by the IEDB combined predictor (http://tools.iedb.org/processing/). This tool combines predictors of three main steps of MHC-I antigen presentation pathway (proteasomal processing, transporter associated with antigen processing (TAP) transport, and MHC-I binding) and calculates a total processing score for each predicted epitope. It allows the user to choose a method from ANN, SMM, SMMPMBEC, Comblib_Sidney2008, NetMHCpan, NetMHCcons and PickPocket methods for the binding prediction. In the current update (2018), the IEDB team has changed the choice of the recommended prediction method for the processing tool to be NetMHCpan 3.0 rather than a consensus, since the processing tools requiring an IC 50 value, which the consensus method does not provide. Furthermore, NetMHCpan 3.0 has provided all MHC alleles and has performed the predictions very well in recent comparisons [65].
There are two types of proteasomes, the housekeeping types which are expressed instinctively, and immuno types which are provoked by IFN-γ secretion. The immunoproteasomes are believed to improve the efficiency of antigen presentation [62,65]. In this study, the immunoproteasome option was selected.
The program outputs for every predicted epitope consisted of proteasome score, TAP score, MHC score, processing score (proteasome + TAP score), total score (Proteasome + TAP + MHC score), and MHC-I IC 50 . The TAP scoring system calculates a-log (IC 50 ) value for the binding of a peptide (or N-terminal of its precursors) to the TAP molecules. The higher TAP score, the higher transport rate. [62,65,84].
Herein, the analysis was made against the human and mouse MHC-I alleles used later in the IEDB binding prediction, with the IEDB-recommended method and other default settings of the program. Epitopes with IC 50 <1000 nM for HLA-I alleles and <5000 nM for H2-I alleles were included in the analysis.

MHC-I immunogenicity prediction
Several factors could clarify the difference between epitope and non-epitope peptides; An essential factor is epitope immunogenicity, i.e., it could be recognized by T cells. Some amino acids, particularly those with large and aromatic side chains (especially tryptophan, phenylalanine, and Isoleucine), are associated with immunogenicity. Moreover, the positions P4-6 of a peptide are more critical for immunogenicity [85].
In this study, the MHC-I immunogenicity of all predicted epitopes was determined by the IEDB web server (http://tools.iedb.org/immunogenicity/) [85]. This tool uses the properties of amino acids and their locations to predict the immunogenicity of a pMHC. The default option was selected to specify which positions of the query peptide to be masked from the analysis, because it masked positions which are also suggested for the most frequent human MHC-I allele, HLA-A � 02:01.

Population coverage prediction
IEDB population coverage prediction tool (http://tools.iedb.org/population/) [86] is used to predict the HLA-I population coverage of all 8-11mer predicted epitopes in the first step. This tool can accept a target population by two query levels: 1) area-country-ethnicity and 2) ethnicity alone. It can integrate allele frequency information retrieved from the Allele Frequency Net Database (AFND) (http://www.allelefrequencies.net/default.asp) [87]. IEDB also accepts custom populations with allele frequencies defined by users. Since, HLA-I and HLA-II T cell epitopes elicit immune responses from two different T cell populations (CTL and HTL, respectively), the server provided three different population coverage modes: 1) HLA-I lonely, 2) HLA-II lonely, and 3) HLA-I and HLA-II together.
Herein, the MHC-I promiscuous predicted epitopes and their binding HLA-I alleles (IC 50 <500 nM or PR<2.0) were entered as inputs for the analysis against the world population.

Molecular docking analysis
The primary aim of molecular docking is the prediction of the binding site of a ligand at a protein receptor surface, and then docking and modeling the ligand into the recognized site. In this study, the binding ability of the first step selected peptides to human and mouse MHC molecules, was analyzed by CABS-dock (http://biocomp.chem.uw.edu.pl/CABSdock/) server. The server uses a multistage procedure that involves multiple programs, with the Cα-Cβ-side chain (CABS) model at its heart. The detailed information about these stages is given in supporting information (S2 File) [88,89]. Also, Fig 3 shows the pipeline of CABS-dock protocol [88].
CABS-dock gets the 3D structure of the receptor and the sequence of the peptide as obligatory inputs. Furthermore, there are some non-obligatory inputs as recommendations which could improve outputs. In this study, duplicate dockings for each peptide (6240 dockings in total) were done against the most significant human/mouse MHC-I and II molecules which had at least one well-structured protein data bank (PDB) file in the RCSB Protein Data Bank (https://www.rcsb.org/), as shown in Table 2. These PDB files are in the complex with their peptidic ligand and some X-ray crystallography solution molecules (heteroatoms). Thus, these excess molecules, as well as redundant MHC molecules were removed before executing docking process. Since, the binding site of epitopes on the MHC molecules was well-known previously, the unlikely regions to bind masked before the analysis.
CABS-dock returns ten representative models (medoids) as the best-simulated models and ranks them by cluster density (CD). Cluster density is equal to the number of elements in a cluster divided by their average ligand root mean square deviation (RMSD). The higher CD value implies greater accuracy. Ligand RMSD value shows the differentiation measure between cluster elements. As a guideline; RMSD < 3.0 Å means high accuracy; RMSD � 3.0 and � 5.5 Å means medium accuracy and RMSD > 5.5 Å means low accuracy [88]. Herein, the RMSD and CD of the best-simulated models were selected for the analysis. The best model, which has the highest CD value, is not necessarily the top-ranked model, because, in some cases, peptides were not attached to their binding site properly. Thus, these malformed models were excluded from the analysis. It is important to note that, due to the different frequency of MHC alleles in human populations, the equal CD value of different MHC alleles, don't have equal value regarding population coverage. Thus, to involve the effect of population coverage, the CD value of every model was multiplied by its allele population coverage (divided by hundreds for more facility) to obtain a weighted index. Then, the sum of all HLA-I or II weighted indexes of each peptide was calculated to get a total docking score (TDS), used as a score to compare the candidate peptides. It is the first time that the TDS has been formulated and used for this purpose. This formula is also applicable to the similar docking scores obtained from other servers.

Epitope conservancy analysis
The use of highly conserved epitopes in a vaccine formulation reduces the risk of tumor immune escape and provides broader protection against different virus strains or genotypes. Thus, the conserved areas are preferred to use in therapeutic vaccines, if they are appropriate epitopes. Herein, the epitope conservancy analyses for the first step selected peptides were done in three levels: 1. Inter-isoform conservancy: the percent of conservancy between all isoforms of each E5, E6, or E7 oncoprotein.

Cross-reactivity with host antigens
Cross-reactivity with host antigens can cause adverse immune responses. Therefore, the selected peptides in the first step were checked for similarities with the mouse and human proteomes by the NCBI BLASTp tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi).

Results
Regarding the studies, different peptides usually get different scores/ranks in different analyses. This inconsistency indicates that these results needed to be analyzed with an integrated approach. Indeed, integrated approach is more practical and efficient in such conditions in comparison with analysis by analysis filtering approach, in which those epitopes are chosen for the next analysis that have gotten an acceptable score in the previous analysis. Herein, the integrated approach was applied in both steps of epitope selection.
Since the ultimate goal of the discovery of therapeutic epitopes is to use them in human vaccines, only the scores/ranks of human alleles were used to rank epitopes in some studies. However, investigators usually test therapeutic vaccines on mouse species in preclinical trials, thus in the current study, the binding status of the predicted epitopes to mouse MHC-I alleles was also studied by several binding predictors and molecular docking, as well.
As stated above, CTL-mediated responses play a crucial role in killing the malignant cells. Besides, the binding of epitopes to MHC-I molecules is the most selective step for antigen presentation to CTLs. Therefore, in the first step, the selection was made primarily by the comparison of obtained MHC-I binding, processing and immunogenicity scores/ranks, and population coverage percentages. However, the MHC-II binding ranks were actually of secondary importance to the selection process as an added advantage. Additionally, the population coverage has a dual application. First, it determines the coverage of a given peptide in the target population. Second, it is the best index for summarizing and evaluating of the HLA-I binding predictions too, since it is calculated from the results of HLA-I binding prediction analyses. In the first step, ten peptides (Tables 3-5) from each HPV genotype oncoprotein (120 peptides in total), which got better results in the first step analyses were selected for the second step analyses, including protein-peptide molecular docking, epitope conservation, and crossreactivity with host antigens. The individual detailed results of the MHC-I and II binding (S3 File), MHC-I immunogenicity (S4 File) and MHC-I population coverage (S5 File) predictions, as well as, MHC-I and II molecular docking (S6 File) and epitope conservation (S7 File) analyses are given in supporting information, as 15 Excel files. Indeed, CABS-dock returns ten representative medoids as the best-simulated models and ranks them by cluster density (CD). Cluster density is derived from two factors (the number of elements in a cluster and their average ligand RMSD) that is an advantage for this server. In the second step, five peptides out of ten selected peptides in the first step (Tables 6-8), which got better results in all analyses of both steps, were selected as the final-predicted epitopes. None of the final predicted epitopes showed more than 90% sequence similarity with mouse and human proteomes.

Discussion
High prevalence and mortality of oncogenic infectious pathogens such as HPV and Helicobacter pylori have caused serious problems for humans. Currently, people who are infected with hrHPVs but show normal cytology or precancerous lesions do not have any treatment option, causing the disease progress toward invasive carcinoma in some cases. Unfortunately, no FDA-approved immunotherapy exists for pre-existing HPV infections or their related cancers to date. Immunotherapy of HPV-associated cancers by DNA or peptide-based vaccines, depends on the recognition of highly immunogenic epitopes, inducing robust and specific immune responses, particularly cell-mediated responses against the malignant cells.
The primary aim of this study was the prediction of CD8+ T cell epitope from the E5, E6 and E7 oncoproteins, using a comprehensive two-step selection plan. These proteins chose because they play a pivotal role in the cell transformation, immune evasion, and maintenance of malignancy, as well as, their permanent expression (E6 and E7) by the malignant cells [24][25][26]. Expression of E5 oncoprotein occurs in the early phase of HPV infection. Evidence indicates that E5 play a prominent role in the genesis of HPV-associated cancers, but is not essential for cancer progression [90], since when HPV genome integrates into the host genome, it usually results in the disruption of E1, E2, and E5 genes. Therefore, targeting E5 protein provides an opportunity for treatment of HPV infections and preventing the precancerous lesions from the progression to established carcinomas [20,91]. Some genotypes of hrHPVs are more involved in the genesis of epithelial tissue malignancies [61]. Thus, in this study, hrHPV16, 18, 31 and 45 were targeted due to their high prevalence in the HPV-associated cancers, especially cervical carcinoma. There are several limitations for epitope prediction: 1) The major drawback of peptidebased vaccines is low immunogenicity [92,93]. Many studies have focused on enhancing immunogenicity using immune stimulating agents or adjuvants to avoid this problem. Another solution is the use of agonist epitopes [94]. Epitope immunogenicity is a crucial factor in vaccine development. However, many of known natural epitopes when are analyzed in silico by IEDB MHC-I immunogenicity predictor, do not obtain a high score. Therefore, in this study, epitope selection was based on the integrated approach, in which one analysis does not play an important role alone. 2) There are certain drawbacks associated with the function of each method invented for the MHC-peptide binding prediction [95]. For this reason, several predictors and a molecular docking program were used to augment the prediction accuracy. 3) Some web tools have been developed for MHC-II epitope prediction. Since MHC-II groove can bind to peptides with variable lengths, and different peptides have the different number of residues between their N-terminus and first anchor [54], the exact assignment of MHC-II core peptide would be a difficult problem which reduces the success rate of these prediction tools. Therefore, most MHC-II prediction tools did not usually make epitope predictions as accurately noted for MHC-I molecules [64,96]. In cancer immunotherapy, the CTL-mediated responses play the central role in eradication of malignant cells, and the binding of epitopes to MHC-I molecules is an essential step for antigen presentation to CTLs. Thus, in this study, predicted epitopes were primarily selected by their MHC-I binding and processing scores. However, the MHC-II binding scores were actually of secondary importance to the epitope selection process as an extra advantage. Additionally, there are several other essential determinants which significantly affect the outcomes, such as antigen processing, immunogenicity, population coverage, conservancy and cross-reactivity with host antigens. Vaccine development requires a comprehensive approach to cover all these effectual elements, covered in this study.
The primary aim of molecular docking is the recognition of binding site of a ligand at a protein receptor surface, and docking and modeling the ligand into this recognized site. In this study, CABS-dock server was used for molecular docking analyses. CABS-dock has several main advantages: 1) The method does not require any data about the peptide structure and its binding site. 2) During docking process, peptide conformation is entirely flexible. 3) It is possible to apply dynamic conformational changes in the receptor structure and 4) to exclude some receptor regions from the docking search, leading to the more efficient search in the vicinity of the binding site at a sensible time. [88,89].
In comparison with protein-ligand (small molecules) docking, Protein-peptide docking analysis is more problematic, since significant conformational changes occur during the process. As a general rule, how much the length of the query peptide to be longer, there are more torsions and conformational flexibilities. Additionally, in comparison to Protein-Protein interactions, Protein-peptide dockings are more transient, and their binding affinities are notably weaker [88]. These factors make structural predictions of long peptides very challenging. Therefore, in this study, 9mer peptides were preferred for selection compared to other possible lengths. They are also preferred by all MHC-I molecules as epitope and by MHC-II molecules as the core peptide of epitopes. Moreover, expansion of 9 or 10mer CTL epitopes to longer peptides may create a practical alternative, containing both CD4+ HTL and CD8+ CTL epitopes; Especially, when CD4+ HTL epitopes, covering CTL epitopes, are not recognized [97].  [94,96,[98][99][100][101][102][103]. However, the prediction of T cell epitopes inducing strong responses has remained a big challenge. For therapeutic HPV vaccines, many candidates have been designed to trigger the activation of CTLs or HTLs, mostly by targeting two major HPV oncoproteins, E6 or E7 [104], and in a few studies, E5 oncoprotein [98,99]. As well as, several clinical trials have been launched for immunotherapy of HPV-associated cancers [46], although, they have not been so immunogenic, to induce a sufficient cellular immunity and eradicate malignant cells completely. Some studies have suggested that the use of E6 and E7 SLPs, containing both CD4+ HTL and CD8+ CTL epitopes, led to more potency and durability of CD8+ T cell reactivity in vivo, in comparison with the minimal CTL epitopes [97,105].
In 1993, As pioneers in HPV epitope studies, Feltkamp et al. recognized the HPV16-E7 sequence RAHYNIVTF as an MHC-I epitope that can provoke CTL-mediated responses and eradicates established HPV l6-induced tumor cells in mice [106,107]. This sequence is the first HPV16-E7 predicted epitope in our study as well.
In 2015, Kumar et al. studied HPV16-E5 oncoprotein to predict the candidate T-cell and Bcell epitopes [98]. They have screened 11 potent epitopes for MHC-I molecules according to PR and the immunogenicity score, using IEDB MHC-I binding and immunogenicity predictors. They found a 14mer potent epitope, SAFRCFIVYIIFVY, having the lowest PR and the highest immunogenicity score, i.e., 0.5 and 0.70, respectively. Notably, our second HPV16-E5 predicted epitope, SAFRCFIVY, is the N-terminal part of SAFRCFIVYIIFVY, and our first predicted epitope, FLIHTHARF, is the C-terminal part of the third epitope of their study, VYIPLFLIHTHARF.
In 2017, Tsang et al. scanned the HPV16-E6 and E7 oncoproteins for the match peptides with the consensus motif of HLA-A2 binding peptides [94]. The BIMAS algorithm [108] was employed to rank probable binding peptides according to the predicted one-half-time dissociation of pMHCs. Three potential CTL predicted epitopes of the E6 protein (KLPQLCTEL, KISEYR-HYC, and QQYNKPLCDL) and three of the E7 protein (YMLDLQPET, TLHEYMLDL, and RTLEDLLMGT) were selected. They showed the immunogenicity of these peptides was enhanced when their agonist epitopes were used. The KLPQLCTEL and TLHEYMLDL sequences are the seventh and the fifth predicted epitopes of HPV16-E6 and HPV16-E7 in our study, respectively.
As far as we know, this is the first time that in a laborious in silico study for epitope prediction, E5, E6 and E7 oncoproteins of hrHPV16, 18, 31 and 45 have been investigated altogether. Moreover, in previous studies, usually only one predictor tool was used for making epitope prediction, or if several tools were used, no integrated approach was employed to make the conclusion. We believed that our predicted epitopes are valuable candidates for further in vitro and in vivo therapeutic vaccine studies. Additionally, the introduction of the ten epitopes for each HPV genotype oncoprotein in the first step of the study shows which region of each oncoprotein is rich of the epitope, and thus, is more suitable for use in the design of SLPs. Notably, the previous in vivo studies have been conducted using SLPs of hrHPV-E6 and/or-E7 oncoproteins, in particular HPV16 oncoproteins [92,[129][130][131][132][133]. Furthermore, the two-step plan of this in silico study, which each step includes several analyses to find proper epitopes by an integrated approach, would provide a basis for rational epitope prediction. However, it could be more efficient by adding other useful analyses. Further studies are recommended on the peptide binding assays, the design of polyepitope constructions including E5, E6 and E7 epitopes, the expansion of the minimal CTL epitopes to longer peptides (SLPs), the use of various adjuvants, involvement of delivery routes, mouse immunization with the designed constructs, evaluation of immune responses such as cytokines, antibodies, CTLs and tumor growth for finding the best construct for clinical trials. It is important that improper vaccine design and immunosuppressive microenvironment were known as the main reasons of the failure in cancer immunotherapy by therapeutic cancer vaccines [134].