Immunoinformatics mapping of potential epitopes in SARS-CoV-2 structural proteins

All approved coronavirus disease 2019 (COVID-19) vaccines in current use are safe, effective, and reduce the risk of severe illness. Although data on the immunological presentation of patients with COVID-19 is limited, increasing experimental evidence supports the significant contribution of B and T cells towards the resolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Despite the availability of several COVID-19 vaccines with high efficacy, more effective vaccines are still needed to protect against the new variants of SARS-CoV-2. Employing a comprehensive immunoinformatic prediction algorithm and leveraging the genetic closeness with SARS-CoV, we have predicted potential immune epitopes in the structural proteins of SARS-CoV-2. The S and N proteins of SARS-CoV-2 and SARS-CoVs are main targets of antibody detection and have motivated us to design four multi-epitope vaccines which were based on our predicted B- and T-cell epitopes of SARS-CoV-2 structural proteins. The cardinal epitopes selected for the vaccine constructs are predicted to possess antigenic, non-allergenic, and cytokine-inducing properties. Additionally, some of the predicted epitopes have been experimentally validated in published papers. Furthermore, we used the C-ImmSim server to predict effective immune responses induced by the epitope-based vaccines. Taken together, the immune epitopes predicted in this study provide a platform for future experimental validations which may facilitate the development of effective vaccine candidates and epitope-based serological diagnostic assays.


Introduction
Coronavirus disease 2019 (COVID-19) is a highly transmissible acute respiratory disease caused by a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Phylogenetic analysis of whole-genome sequences of SARS-CoV-2 isolated from infected patients revealed an overall sequence identity of 96.2%, 79.6%, and 50% with the genomes of RaTG13, SARS-CoV BJ01, and MERS-CoV, respectively [1,2]. The genome of SARS-CoV-2 encodes both structural and nonstructural proteins (NSPs). The first ORFs (ORF1a/b) encode 16 NSPs (NSP1-16), except in gammacoronaviruses, which lack NSP1. A -1 frameshift between ORF1a and ORF1b leads to the production of two polypeptides (pp1a and pp1ab). These polyproteins are post-translationally processed into 16 NSPs by the virus-encoded chymotrypsin-like protease (3CLpro), also called the main protease (Mpro), and by one or two papain-like proteases. The structural proteins include the spike glycoprotein (S), envelope protein (E), membrane protein (M), and nucleocapsid protein (N). The S protein has two functional subunits that mediate cell attachment (S1 subunit) and virus-host membrane fusion (S2 subunit). The S proteins of SARS-CoV-2 and SARS-CoV are phylogenetically closely related, with an amino acid sequence identity of approximately 77% [1], and both use the same cellular receptor, angiotensin-converting enzyme 2 (ACE2), for entry into cells [3].
Antibodies that bind to the S protein have been reported to neutralize SARS-CoV-2. The ensuing rapid development of neutralizing antibodies against the S protein is correlated with the immune response to the virus, and individuals who show seroconversion may develop a lasting immune response against SARS-CoV-2 [4][5][6]. In a recent study, a rapid diagnostic test for the serodiagnosis of COVID-19 was developed using the S and N proteins of SARS-CoV-2 [7][8][9]. Many studies have reported the generation of IgM- and IgG-specific neutralizing antibodies against the S protein of SARS-CoV-2 [10][11][12][13][14][15]. Limited information is available on specific SARS-CoV-2 proteins that are recognized by immune cells; however, a few studies have found a higher proportion of T-cell responses specific to structural proteins in convalescent serum from patients with COVID-19 [16][17][18]. As of June 3, 2021, the World Health Organization (WHO) has approved the emergency use of COVID-19 vaccines, namely AstraZeneca/Oxford, Moderna, Johnson and Johnson, Pfizer/BioNTech, Sinopharm, and Sinovac Biotech, after the evaluation of safety and efficacy data from clinical trials. The findings from clinical trials of a chimpanzee adenoviral vector vaccine (ChAdOx1 nCoV-19, AZD1222) expressing the S protein of SARS-CoV-2 demonstrated an acceptable safety profile and good efficacy in symptomatic patients [19]. B.1.1.7, a new variant of SARS-CoV-2, has emerged as the dominant variant in the UK. The clinical efficacy of AZD1222 against symptomatic infection was 70.4% for the B.1.1.7 variant and 81.5% for non-B.1.1.7 lineages; however, the neutralization activity of AZD1222 was lower against the B.1.1.7 variant than against the non-B.1.1.7 variants [20]. The Moderna COVID-19 (mRNA-1273) vaccine is a lipid nanoparticle-encapsulated nucleoside-modified mRNA vaccine, which encodes the stabilized prefusion S protein of SARS-CoV-2.
It showed an overall efficacy of 94.1% (95% confidence interval = 89.3-96.8%) in preventing symptomatic COVID-19, including cases with severe symptoms [21]. The vaccine was also highly effective in clinical trials across individuals with different demographic characteristics, including age, sex, race, and ethnicity, as well as in individuals with pre-existing medical conditions. In a phase 1 trial of the mRNA-1273 vaccine, serum neutralizing antibody responses were detected in all participants (40 adults aged between 56 and 70 years) by multiple methods, and these responses were similar to those previously reported among vaccine recipients [22]. The vaccine elicited an increased CD4+ T-cell response, including type 1 helper T cells [22]. The COVID-19 vaccine developed by Johnson and Johnson/Janssen (Ad26.COV2.S) is a recombinant, replication-incompetent adenovirus serotype 26 (Ad26) vector vaccine, which encodes the stabilized prefusion S protein of SARS-CoV-2. The Johnson and Johnson vaccine showed 66.3% efficacy against symptomatic, laboratory-confirmed COVID-19 in individuals with no prior history of COVID-19 [23]. The efficacy of Ad26.COV2.S, which varied geographically, was the highest in the United States (74.4%; 95% CI = 65.0-81.6%), followed by Latin America (64.7%; 95% CI = 54.1-73.0%) and South Africa (52.0%; 95% CI = 30.3-67.4%) [23]. CD4+ T-cell responses were observed in 76 to 83% of the participants in cohort 1. In addition, 60 to 67% of those in cohort 3 showed a clear skewing toward type 1 helper T-cell responses on day 14 post-vaccination [23]. The Pfizer-BioNTech COVID-19 (BNT162b2) vaccine is a lipid nanoparticle-formulated nucleoside-modified mRNA vaccine, which encodes the prefusion S protein of SARS-CoV-2.
Randomized clinical trials of a two-dose regimen of BNT162b2 revealed an overall efficacy of 95% against COVID-19 in individuals who were 16 years of age or older [24]. The Sinopharm COVID-19 vaccine (BIBP) is an inactivated vaccine developed by Beijing Bio-Institute of Biological Products Co Ltd. This vaccine showed an overall efficacy of 79% in preventing symptomatic disease and hospitalization [25]. Sinovac-CoronaVac, an inactivated vaccine developed by Sinovac Biotech, showed efficacies of 51% against symptomatic SARS-CoV-2 infection, 100% against severe COVID-19, and 100% against hospitalization in the studied population [26]. Although the above-mentioned vaccines are effective against both symptomatic and asymptomatic COVID-19, the recent global rise in the emergence of SARS-CoV-2 variants might compromise the effectiveness of these vaccines. In recent years, in silico vaccine design, which involves the prediction of immunogenic antigenic sequences in viral proteins, has gained increasing attention due to advantages such as the rapid screening and identification of multiple antigen candidates that can generate an immune response in vivo. This approach harnesses high-throughput proteomics data available in public databases to screen for the most effective candidate epitopes based on criteria such as antigenicity and immunogenicity profiles. Therefore, the identification of B-cell and T-cell epitopes of structural proteins is essential for developing effective diagnostic tests and epitope-based vaccine candidates against SARS-CoV-2. Although computational approaches have successfully predicted potential B- and T-cell epitopes, a detailed analysis of epitopes of the four structural proteins of SARS-CoV-2 might have important implications in the design and analysis of COVID-19 vaccines under various stages of clinical development [27][28][29][30][31][32][33][34][35][36][37][38].
Currently, well-developed bioinformatic approaches for epitope analysis have been used to successfully identify epitopes that generate both weak and strong immune responses, which might otherwise have been overlooked experimentally [39]. In our study, we used multiple online bioinformatics resources and stringent selection criteria to identify potent B- and T-cell epitopes of the four structural proteins of SARS-CoV-2. Our in silico prediction method has led to the identification of potent, common, and species-specific B- and T-cell epitopes, which are likely to be recognized in humans. Moreover, we determined the conservancy of the predicted epitopes across different species of coronaviruses (CoVs).

Prediction of B-cell epitopes
Linear B-cell epitopes were predicted using BepiPred-2.0, Bcepred, and ABCpred online servers. The BepiPred-2.0 server predicts linear B-cell epitopes based on a web-based random forest algorithm, which is trained on epitopes annotated from antibody-antigen protein structures [41]. The Bcepred server predicts B-cell epitopes based on physicochemical properties such as hydrophilicity, flexibility, accessibility, polarity, exposed surface, number of turns, and antigenic propensity [42]. The ABCpred server uses an artificial neural network algorithm to predict linear B-cell epitopes in an antigen sequence with 65.93% accuracy. In this study, we applied a window length of 18 amino acids (aa) with a threshold setting of 0.7 and ranked the predicted B-cell epitopes according to their score; a higher score indicates a higher probability of being an epitope [43].
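The score-based selection described above can be illustrated with a minimal sketch. It assumes that server output has already been parsed into records of start position, peptide, and score; the record type, the function name, and the toy values are hypothetical, and treating the 0.7 threshold as inclusive is an assumption rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    start: int      # 1-based start position in the protein
    peptide: str    # predicted epitope sequence
    score: float    # server score; higher means more epitope-like


def rank_epitopes(preds, window=18, threshold=0.7):
    """Keep window-length peptides scoring >= threshold, best score first."""
    kept = [p for p in preds if len(p.peptide) == window and p.score >= threshold]
    return sorted(kept, key=lambda p: p.score, reverse=True)


# Toy records for illustration only; these are not real server output.
toy = [
    Prediction(10, "A" * 18, 0.91),
    Prediction(40, "C" * 18, 0.65),   # below the threshold
    Prediction(75, "D" * 18, 0.78),
    Prediction(90, "E" * 16, 0.95),   # not an 18-aa window
]
ranked = rank_epitopes(toy)
print([(p.start, p.score) for p in ranked])
```

Only the two 18-aa peptides at or above the threshold survive, ordered by descending score.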
Moreover, EPSVR, DiscoTope, CBTOPE, and ElliPro servers were used to predict conformational B-cell epitopes. The EPSVR server uses a support vector regression (SVR) method to predict antigenic B-cell epitopes [44]. The DiscoTope server predicts discontinuous B-cell epitopes from 3D structures of proteins. This method calculates the surface accessibility and a novel epitope propensity amino acid score to predict potential epitopes [45]. The CBTOPE server predicts conformational B-cell epitopes from the primary sequence of the protein with an accuracy of more than 85% [39]. The ElliPro server predicts linear and discontinuous B-cell epitopes based on the protein structure and the homology-based model of the amino acid sequence [46]. The helical behavior of predicted monomeric peptides was computed using the Agadir server (http://agadir.crg.es) [47].

Prediction of T-cell epitopes
TepiTool is an interactive and easy-to-use tool used to predict potential major histocompatibility complex (MHC) class I- and class II-binding peptides based on a panel of the 27 most frequent alleles [48]. The Proteasomal cleavage/TAP transport/MHC class I combined predictor (http://tools.iedb.org/processing) in the Immune Epitope Database and Analysis Resource (IEDB) integrates data on proteasomal processing, transporter associated with antigen processing (TAP) transport, and MHC binding to produce an overall score for the intrinsic potential of each peptide being a T-cell epitope [49]. The MHC-NP predictor predicts T-cell peptides naturally processed by the MHC [50]. The NetMHCIIpan 4.0 server predicts peptide binding to known sequences of any MHC II molecule using artificial neural networks [51]. Three different tools, the IEDB combined server, TepiTool, and MHC-NP, were used to predict 9-mer epitopes of MHC I. First, the 9-mer epitope list was generated using the 27 HLA class I allele reference list. Next, the top 2% were chosen from the high-scoring epitopes with an IC50 cut-off value of less than or equal to 500 nM. In TepiTool, MHC-I epitopes were searched from the query sequence using 'the panel of 27 most frequent A & B alleles' available in the server. In the 'Prediction method', the default IEDB method was selected. For the field specifying the peptides to be included in the prediction, the default setting 'low number of peptides' was applied since the epitope length was only 9-mer. Finally, the peptides were predicted based on the IC50 cut-off value of less than or equal to 500 nM. In MHC-NP, all default parameters were selected, and 9-mer epitopes that bind to 17 class I alleles were selected. The top 2% highest binders (epitopes with the highest scores) were selected from the raw files.
MHC II epitopes of 15-mer length were predicted using three different tools, namely, IEDB combined server, TepiTool, and NetMHCIIpan. In TepiTool, MHC-II epitopes were searched using 'the panel of 27 most frequent A & B alleles', with the inclusion of HLA-DR, HLA-DP, and HLA-DQ alleles. In the 'prediction method', the default IEDB was applied, and a 'moderate number of peptides' was selected for the default setting of 'peptides' since the epitope length was 15-mer. Finally, the peptides were predicted based on the IC50 cut-off value of less than or equal to 1000 nM. In NetMHCIIpan, 15-mer peptides were predicted for binding to the 27 MHC-II alleles from the reference list. The default threshold value for strong binders (% rank) was set as 1 and that for weak binders (% Rank) was set as 5. The results were sorted according to the prediction score; the strong binders were listed first and subjected to further analysis. All 9-mer and 15-mer peptides were predicted for their binding affinity to 27 MHC class I and MHC class II molecules, which accounted for 97% of HLA-A and HLA-B allelic variants in most ethnicities [52].
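The cut-offs quoted above (IC50 of at most 500 nM for MHC I, at most 1000 nM for MHC II, a top 2% selection, and % rank thresholds of 1 and 5) can be sketched as a small filter. This is an illustration under stated assumptions, not the servers' own logic: the row layout, the function names, the order of operations (cut-off first, then top fraction), the rounding up to at least one peptide, and the inclusive rank comparisons are all hypothetical choices.

```python
# Cut-offs quoted in the text: IC50 <= 500 nM (MHC I) and <= 1000 nM (MHC II).
IC50_CUTOFF_NM = {"I": 500.0, "II": 1000.0}


def select_binders(rows, mhc_class, top_percent=2):
    """rows: iterable of (peptide, allele, ic50_nM) tuples.

    Assumption: apply the IC50 cut-off first, then keep the strongest
    top_percent of the passing rows (lowest IC50), rounding up.
    """
    cutoff = IC50_CUTOFF_NM[mhc_class]
    passing = sorted((r for r in rows if r[2] <= cutoff), key=lambda r: r[2])
    if not passing:
        return []
    n_keep = max(1, (len(passing) * top_percent + 99) // 100)  # integer ceiling
    return passing[:n_keep]


def classify_rank(percent_rank, strong=1.0, weak=5.0):
    """Label a % rank value; inclusive comparison is an assumption."""
    if percent_rank <= strong:
        return "strong"
    if percent_rank <= weak:
        return "weak"
    return "non-binder"


# Toy rows with made-up affinities (100, 200, ..., 1000 nM).
toy_rows = [(f"PEPTIDE{i:02d}", "HLA-A*02:01", 100.0 * i) for i in range(1, 11)]
best = select_binders(toy_rows, "I")
print(best)
print(classify_rank(0.4), classify_rank(3.0), classify_rank(12.0))
```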

Population coverage analysis of T-cell epitopes
The IEDB analysis resource tool was used to analyze the population coverage of predicted MHC class I and class II epitopes using the allele frequency database of 115 countries [53]. The population coverage was analyzed by entering the following data: the number of epitopes, the epitope sequences, the MHC-restricted alleles as predicted by the epitope prediction servers, and the populations in which the coverage is to be checked. The MHC-I alleles (n = 27) that bound to the predicted epitopes were checked using IEDB. For MHC-II, 23 alleles were available in the population coverage tool of IEDB (http://tools.iedb.org/population/), and four alleles (HLA-DRB3*01:01, HLA-DRB3*02:02, HLA-DRB4*01:01, and HLA-DRB5*01:01) were excluded from the analysis. The ability of the predicted T-cell epitopes to induce an interferon-gamma (IFN-γ) response was assessed using the IFN epitope server. The IFN epitope tool uses an algorithm based on three models (motif-based, SVM-based, and hybrid approaches) for the prediction of IFN-γ-inducing epitopes [54].
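To make the idea of population coverage concrete, the following sketch estimates the fraction of individuals carrying at least one allele that binds a predicted epitope. It is a simplified model and not the IEDB implementation: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and the allele frequencies shown are invented for illustration.

```python
def locus_coverage(allele_freqs, covered):
    """Probability that a diploid individual carries >= 1 covered allele
    at one locus, assuming Hardy-Weinberg proportions."""
    f = sum(allele_freqs[a] for a in set(covered) if a in allele_freqs)
    return 1.0 - (1.0 - f) ** 2


def population_coverage(loci):
    """loci: list of (allele_freqs, covered_alleles), one entry per locus.
    Loci are treated as independent (an assumption)."""
    uncovered = 1.0
    for freqs, covered in loci:
        uncovered *= 1.0 - locus_coverage(freqs, covered)
    return 1.0 - uncovered


# Hypothetical frequencies; real values come from an allele frequency database.
hla_a = {"HLA-A*02:01": 0.30, "HLA-A*01:01": 0.20}
hla_b = {"HLA-B*07:02": 0.50}
coverage = population_coverage([
    (hla_a, ["HLA-A*02:01"]),
    (hla_b, ["HLA-B*07:02"]),
])
print(f"{coverage:.4f}")
```

With these toy numbers, 51% of individuals are covered at the first locus and 75% at the second, so only 0.49 × 0.25 of the population carries no covered allele.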

Analysis of antigenicity, allergenicity, and conservancy of predicted B-and T-cell epitopes
The antigenicity and allergenicity of epitopes were predicted using VaxiJen v2.0 and AllerTOP online servers, respectively. VaxiJen predicts protective antigens and subunit vaccines based on an alignment-independent method [55]. AllerTOP is a server for in silico prediction of allergens in a given antigen [56]. The epitope conservancy analysis tool from the IEDB analysis resource was used to compute the degree of the conservancy of an epitope within a given protein sequence [57].
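Epitope conservancy can be illustrated as the fraction of protein sequences that contain the epitope at or above a chosen identity level. The sketch below is a simplified, ungapped version of that idea and is not the IEDB tool itself; the function names and toy sequences are hypothetical.

```python
def best_window_identity(epitope, protein):
    """Highest fraction of matching residues between the epitope and any
    same-length, ungapped window of the protein."""
    k = len(epitope)
    if k == 0 or len(protein) < k:
        return 0.0
    best = 0
    for i in range(len(protein) - k + 1):
        matches = sum(a == b for a, b in zip(epitope, protein[i:i + k]))
        best = max(best, matches)
    return best / k


def conservancy(epitope, proteins, min_identity=1.0):
    """Fraction of sequences that contain the epitope at >= min_identity."""
    if not proteins:
        return 0.0
    hits = sum(best_window_identity(epitope, p) >= min_identity for p in proteins)
    return hits / len(proteins)


# Toy sequences for illustration only.
toy_proteins = ["GGACDEFGG", "GGACDEYGG", "KKKKKKK"]
print(conservancy("ACDEF", toy_proteins))
print(conservancy("ACDEF", toy_proteins, min_identity=0.8))
```

Lowering the identity threshold from 100% to 80% admits the second sequence, which differs from the epitope at one of five positions.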

Immune simulation of multi-epitope vaccine
To design the multi-epitope vaccine, the predicted immunogenic epitopes were joined using linkers: GGGGS for B-cell epitopes, KK for HTL epitopes, and AAY for CTL epitopes [58]. To determine the immunogenicity and immune response profile of the multi-epitope vaccine consisting of the top predicted B- and T-cell epitopes, in silico immune simulation was conducted using the C-ImmSim server. The C-ImmSim server uses a position-specific scoring matrix and machine learning techniques to predict epitope and immune interactions [59]. The server simulates three components of the immune system found in mammals: (i) the bone marrow, where hematopoietic stem cells differentiate into cells of lymphoid and myeloid lineages; (ii) the thymus, where naive T-cells are selected to avoid autoimmunity; (iii) a tertiary lymphatic organ such as lymph nodes. All default simulation parameters were selected, with time steps set at 1, 84, and 168 (each time step is 8 hours, and time step 1 is injection at time = 0). Because 84 time steps of 8 hours correspond to 28 days, three injections were administered four weeks apart, without lipopolysaccharide (LPS). The Simpson index D (a measure of diversity) was interpreted from the plot.
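The linker scheme and the injection schedule described above can be sketched as follows. The linker sequences and the 8-hour time step come from the text; the group order (CTL, then HTL, then B-cell), the choice of linker at group boundaries, the function names, and the toy peptides are assumptions made only for illustration.

```python
# Linkers named in the text for each epitope class.
LINKERS = {"CTL": "AAY", "HTL": "KK", "B": "GGGGS"}
HOURS_PER_STEP = 8  # each simulation time step is 8 hours


def build_construct(ctl, htl, bcell):
    """Join epitopes within each group using that group's linker.

    Assumptions: groups are ordered CTL -> HTL -> B-cell, and the linker
    of the following group is used at each group boundary.
    """
    parts = []
    for kind, epitopes in (("CTL", ctl), ("HTL", htl), ("B", bcell)):
        if not epitopes:
            continue
        if parts:
            parts.append(LINKERS[kind])
        parts.append(LINKERS[kind].join(epitopes))
    return "".join(parts)


def interval_days(step_a, step_b):
    """Days elapsed between two simulation time steps."""
    return (step_b - step_a) * HOURS_PER_STEP / 24


# Placeholder peptides, not predicted epitopes.
construct = build_construct(["FFF", "WWW"], ["HHH", "MMM"], ["DDD", "EEE"])
print(construct)
print(interval_days(1, 84), interval_days(84, 168))
```

The second and third injection steps are 84 steps apart, which is exactly 28 days; the first interval (steps 1 to 84) is 83 steps, or about 27.7 days.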

Prediction of B-cell epitopes of S, E, M, and N proteins of SARS-CoV-2
In this study, we predicted B- and T-cell epitopes using multiple prediction servers by leveraging existing immunological knowledge and the genetic closeness of SARS-CoV-2 with SARS-CoV [4][5][6]. Previous studies have identified epitopes mainly on the S protein of SARS-CoV-2 (Table 1) using a limited number of prediction servers [27][28][29][30][31][32][33][34][35][36][37][38]. Using well-established prediction tools, we identified potential B- and T-cell epitopes in the structural proteins (S, E, M, and N) of SARS-CoV-2 (Fig 1a). By selecting the top linear B-cell epitopes predicted using BepiPred, Bcepred, and ABCpred servers, we obtained 20 linear B-cell epitopes (11 for S, 6 for N, 2 for M, and 1 for E proteins) in the structural proteins of SARS-CoV-2 (Table 2). A heat map generated using R software showed the distribution of antigenic B-cell epitopes across the length of SARS-CoV-2 structural proteins (Fig 1b and 1c). Interestingly, some of the antigenic epitopes predicted by at least two servers are clearly delineated in the heat map.
In the absence of bioinformatics tools to analyze the common epitopes, we manually sorted the epitopes and found conserved epitopes of S (aa407-416, aa421-427, aa1028-1049, aa1254-1273), N (aa173-189, aa235-247), M (aa163-182), and E (aa58-68) proteins that are common to both SARS-CoV and SARS-CoV-2 (Table 3). The B-cell epitopes were mapped on the 3D structure of the structural proteins of SARS-CoV and SARS-CoV-2 and visualized using BIOVIA Discovery Studio 2017 R2. The images show the probable localization of epitopes on the surface of the 3D structure of S, E, M, and N proteins (Fig 2a-2e). In addition, the high alpha-helical content of most peptides, as predicted by the high Agadir scores, indicated the stability of peptides in solution [47].
Table 1 note: The findings of the present in silico study have been provided, and the gap in the study has been addressed. https://doi.org/10.1371/journal.pone.0258645.t001
The cryo-EM structure of the S protein of SARS-CoV-2 (PDB: 6VSB) is available from residues 1 to 1208, but it lacks the 65 residues of the C-terminal region of the protein [3]. To overcome this problem, we used I-TASSER-modelled structures of the S protein of SARS-CoV-2 [60]. We compared I-TASSER-modelled structures with cryo-EM solved structures and observed high similarity, with a root mean square deviation (RMSD) of approximately 1.3 Å (SARS-CoV) and 2.0 Å (SARS-CoV-2). Similarly, the structures of E, M, and N proteins were modelled and used as query and input structures to identify potent conformational B-cell epitopes using CBTOPE, DiscoTope, ElliPro, and EPSVR servers. We filtered and selected the top conformational B-cell epitopes that were predicted by at least two prediction servers used in this study. We obtained a total of 21 conformational B-cell epitopes (9 for S, 8 for N, 3 for M, and 1 for E proteins) in the structural proteins of SARS-CoV-2 (Table 4).
We observed conformational B-cell epitopes of S, E, M, and N proteins common to both SARS-CoV-2 and SARS-CoV with a high degree of epitope conservancy (Table 5). Furthermore, we noticed that the predicted conformational epitopes are likely to be localized on the accessible region of the 3D structure of SARS-CoV-2 proteins, as visualized by BIOVIA Discovery Studio 2017 R2 (Fig 3a-3e).
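The "predicted by at least two servers" criterion can be illustrated with a residue-level vote. This is only a sketch of one possible automation: the study sorted common epitopes manually, and the data layout, function name, and toy ranges below are hypothetical.

```python
from collections import Counter


def consensus_regions(server_hits, min_servers=2):
    """server_hits: dict mapping server name -> list of (start, end) ranges,
    1-based and inclusive. Returns merged ranges of residues that were
    predicted by at least min_servers different servers."""
    votes = Counter()
    for ranges in server_hits.values():
        residues = set()                 # count each server once per residue
        for start, end in ranges:
            residues.update(range(start, end + 1))
        votes.update(residues)
    kept = sorted(pos for pos, n in votes.items() if n >= min_servers)
    regions = []
    for pos in kept:
        if regions and pos == regions[-1][1] + 1:
            regions[-1][1] = pos         # extend the current run
        else:
            regions.append([pos, pos])   # start a new run
    return [tuple(r) for r in regions]


# Toy ranges for illustration only.
toy_hits = {
    "BepiPred": [(10, 20)],
    "Bcepred": [(15, 25)],
    "ABCpred": [(40, 50)],
}
shared = consensus_regions(toy_hits)
print(shared)
```

Only residues 15 to 20 are supported by two servers in this toy input; the ABCpred range is dropped because no other server overlaps it.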
We performed sequence alignment of RBDs of the S protein and found that some of the linear and conformational B-cell epitopes are located in the RBD of the S protein of SARS-CoV-2 (Fig 4, inset Table). A previous study identified four linear B-cell immunodominant (ID) sites on the RBD of the S protein, located at aa330-349, aa375-394, aa450-469, and aa480-499, with an average positive rate of ≥ 50% among all 39 patients [61]. Although mice immunized with the entire RBD, aa370-395, and aa435-479 generated high titers of specific antibodies, these antibodies demonstrated weak neutralizing activity [61]. The identification of these epitopes in our present study suggests that the bioinformatically predicted epitopes contribute to the immunogenicity of the S protein (Table 2). Interestingly, we observed that CR3022, a monoclonal antibody targeting a highly conserved cryptic epitope, has 20 out of 28 binding residues located in the RBD of the S protein of SARS-CoV-2 (aa370-395; Table 2 & Fig 4) [10]. B38, a SARS-CoV-2-specific human neutralizing monoclonal antibody, has 9 out of 18 binding residues on the RBD of SARS-CoV-2 [11,12,62]. The residues involved in both the B38 and hACE2 interactions are located in the ID sites of the predicted linear and conformational B-cell epitopes (aa403-427, aa437-461, and aa483-493; Tables 2 and 4). Similarly, another monoclonal antibody (S309), known to neutralize both SARS-CoV-2 and SARS-CoV, has 11 out of 21 binding residues in ID sites [10,62].
Table 2 note: Four structural proteins of SARS-CoV-2, namely spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, were subjected to prediction of B-cell epitopes. List of B-cell epitopes predicted by at least two servers, with the corresponding prediction servers numbered as (1) for BepiPred, (2) for Bcepred, and (3) for ABCpred. https://doi.org/10.1371/journal.pone.0258645.t002
Table 3. Linear B-cell epitopes common to SARS-CoV-2 and SARS-CoV.
A previous computational study predicted aa656-660 of the S protein and aa60-65 of the E protein as dominant conformational B-cell epitopes [37], and it is worth noting that the same epitopes were predicted in our study (Table 4). A recent study predicted that conformational epitopes are mainly localized at the aa405-427 and aa439-505 regions of the tertiary structure of the S protein [38]. The highly conserved linear B-cell epitope (aa1253-1273, Table 2) predicted in the S protein has been shown to elicit IgG antibodies in some patients with COVID-19 when tested using synthetic peptides spanning aa1256-1273 [63]. A previous study using pools of overlapping linear B-cell peptides found IgG immunodominant regions on the S protein of SARS-CoV-2. These regions are recognized by convalescent sera from patients with COVID-19, and the region aa562-580 is localized near the RBD of the S protein [64]. The partial sequence of this immunogenic peptide is computationally predicted as a conformational B-cell epitope in our study (Table 4). The computational predictions reported in this study, together with the published experimental observations, suggest an association of linear ID B-cell epitopes with conformational epitopes. However, further studies are needed to identify the functions of the predicted linear and conformational epitopes for the rational design of de novo peptide-based vaccines.

Prediction of MHC I-binding epitopes of S, E, M, and N proteins of SARS-CoV-2
A few studies on SARS-CoV-2-specific T-cell responses and their role in protective immunity [65] found that recovered patients with COVID-19 show CD4+ and CD8+ memory responses to SARS-CoV-2 [16]. SARS-CoV-2-specific CD4+ T-cell responses were also frequently observed in unexposed healthy participants, suggesting the possibility of pre-existing cross-reactive immune memory to seasonal human coronaviruses [18,66]. Access to information on SARS-CoV-2 proteins and epitopes recognized by human T cells can greatly assist researchers in selecting potential epitopes or target proteins for the future design of candidate vaccines. We selected the high affinity-ranked top 2% peptides and found 55 non-overlapping peptides (26 CD8+ T-cell epitopes for S, 16 for N, 10 for M, and 3 for E proteins) as strong binders of MHC I molecules, based on the prediction of epitopes by at least two servers (Table 6). These strong MHC I-binding peptides were assessed for their predicted capacity to elicit IFN-γ responses and are concurrently predicted to possess antigenic and non-allergenic properties. Notably, we predicted 16 CD8+ T-cell epitopes (5 CD8+ T-cell epitopes each for S and N proteins and 3 each for M and E proteins), which are common to structural proteins of both SARS-CoV-2 and SARS-CoV (Table 7).

Prediction of MHC II-binding epitopes of S, E, M, and N proteins of SARS-CoV-2
Helper T cells, which are required for adaptive immune responses, help activate B cells to secrete antibodies and activate cytotoxic T cells to kill infected target cells. TepiTool [48] and NetMHCIIpan [51] were used to predict and identify high-affinity MHC II-binding epitopes based on 27 HLA class II molecules (HLA-DP, HLA-DQ, and HLA-DR). We found 25 non-overlapping CD4+ T-cell epitopes (15 CD4+ T-cell epitopes for S, 6 for N, 3 for M, and 1 for E proteins), which were predicted to be antigenic, non-allergenic, and IFN-gamma-inducing, with a high degree of conservancy (Table 8). Furthermore, we found 4 CD4+ T-cell epitopes for S, 5 for N, and 2 each for M and E proteins, which are common to both SARS-CoV-2 and SARS-CoV, with epitope conservancy in the range of 80 to 100% (Table 9). The allele-wise distribution of predicted MHC I and MHC II epitopes for the S, E, M, and N proteins of SARS-CoV-2 is presented in a heat map (Fig 5a and 5b).
On careful observation of computationally predicted epitopes, we found that the regions aa403-427 and aa437-461 of the S protein were predicted as both linear B-cell and MHC I-binding epitopes (Table 10). Similarly, the conserved C-terminal epitopes of the S (aa1253-1273) and E (aa57-75) proteins common to SARS-CoV and SARS-CoV-2 were predicted as both linear and conformational B-cell epitopes. Notably, the peptide aa366-394 in the N protein and the peptide aa160-182 in the M protein were predicted to induce both humoral and cell-mediated immune responses (Table 11).
The findings of our computational predictions in conjunction with previous experimental studies suggest that T-cell-based immunity might be generated largely against the S and N proteins of SARS-CoV-2; therefore, the S and N proteins of SARS-CoV-2 could be selected as target candidates for recombinant protein-based vaccines [14][15][16][61][62][63]. CD4+ T-cell responses were observed predominately in the S protein, and the robustness of the T-cell response was correlated with the values of IgG and IgA titers of SARS-CoV-2. Among the structural proteins, the S and M proteins were mainly recognized as possible targets for CD8+ T cells of SARS-CoV-2 [16,17,65,66].

PLOS ONE
Mapping of B- and T-cell epitopes of SARS-CoV-2
List of top-ranked CD8+ T-cell epitopes predicted using at least two servers.
Table 7. List of predicted T-cell epitopes with high-affinity binding to MHC I molecules common to SARS-CoV-2 and SARS-CoV. List of top-ranked CD8+ T-cell epitopes predicted using at least two servers.
Table 8 note: List of top-ranked CD4+ T-cell epitopes predicted using at least two servers, with the corresponding prediction servers numbered as (1) for NetMHCpan4 and (2) for TepiTool. https://doi.org/10.1371/journal.pone.0258645.t008
Table 9. T-cell epitopes predicted with high affinity to MHC II molecules common to SARS-CoV-2 and SARS-CoV.

Cross-validation of predicted epitopes of SARS-CoV-2 in the NIAID Virus Pathogen Database and Analysis Resource and published literature
The Virus Pathogen Database and Analysis Resource (ViPR) database, funded by the National Institute of Allergy and Infectious Diseases (NIAID), USA, provides two types of immune-related information on antigens: the predicted epitopes (NetCTL 1.2 server) and the experimentally determined epitopes derived from the IEDB. The B- and T-cell epitopes predicted in this study were searched on the ViPR database (https://www.viprbrc.org) by selecting different parameters such as family Coronaviridae, human host, and experimentally determined B- and T-cell epitopes. Two different assays, positive T-cell and positive MHC-binding assays, were applied for testing the T-cell response against epitopes and epitope-MHC binding, respectively [67]. We have furnished the unique identification number (IEDB ID) of some predicted B- and T-cell epitopes of structural proteins of SARS-CoV-2, which are identical to experimentally determined epitopes of structural proteins of SARS-CoV (Tables 2-10). It is pertinent to note that the predicted continuous B-cell epitopes reported in this study have been previously identified using similar computational tools, and some of the predicted epitopes have now been experimentally validated (Tables 2-9). Findings similar to those presented in this work have been reported previously [27][28][29][30][31][32][33][34][35][36][37][38].

Identification of potential B-and T-cell epitopes of structural proteins for development of serological assays and multi-epitope-based vaccines
Generally, the S and N proteins are the major targets for the development of vaccines and diagnostic tools against SARS-CoV-2 using recombinant antigens. The development of rapid  antibody tests requires the production of recombinant antigens and their validation in an enzyme-linked immunosorbent assay (ELISA) using the convalescent serum of patients with COVID-19. In addition, ELISA does not require the culture of SARS-CoV-2 in a BSL-3 containment facility [10][11][12][13][14][68][69][70]. In this study, we designed a multi-epitope chimeric construct, using the computationally predicted B-and T-cell epitopes of the S, E, M, and N proteins of SARS-CoV-2 (Table 6). In the multi-epitope vaccine constructs, we included MHC class I (CTL)-and class II (HTL)-binding antigenic, non-allergenic, and conserved epitopes, which were predicted to elicit IFN-γ release. The CTL and HTL epitopes were linked together by AAY and KK cleavable linkers, whereas the B-cell epitopes (linear and conformational) were linked together with the GGGGS flexible linker (Fig 6a-6d). The following four multiepitope chimeric vaccine constructs containing N-CTL, HTL and B-cell epitopes were designed: (i) The RBD of the S protein (B-and T-cell epitopes), (ii) the full-length S protein (B-and T-cell epitopes), (iii) the structural protein construct (B-and T-cell epitopes of S, E, M, and N proteins), and (iv) the chimeric construct of S and N proteins (B-and T-cell epitopes of S and N proteins) (Fig 6). The structure of multi-epitope vaccine constructs was modeled using I-TASSER and was validated by the RAMPAGE server to generate a Ramachandran plot [71] (S1a- S1d Fig in S1  File). Most of the amino acid residues of epitope constructs were found in the favorable region (S3 Fig in S1 File, Inset table). Various physicochemical parameters including the number of residues, theoretical pI, molecular weight, aliphatic index, and grand average of hydrophobicity (GRAVY), were analysed using ProtParam [48]. 
Based on the aliphatic index scores, the multi-epitope constructs might be considered moderately thermostable (S1 Table in S1 File). The GRAVY scores were negative for all the constructs, indicating that the chimeric multi-epitope constructs are likely to be globular and hydrophilic in nature. The secondary structure of the multi-epitope constructs was predicted using the online server PSIPRED [51], and the alpha-helical content of the constructs is provided in S4a-S4d Fig in S1 File.
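The two indices interpreted above can be illustrated with a short calculation. The following is a minimal sketch, not the ProtParam implementation: it assumes the standard definitions, with GRAVY taken as the mean Kyte-Doolittle hydropathy per residue and the aliphatic index taken as X(Ala) + 2.9·X(Val) + 3.9·[X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names and the toy sequence are hypothetical and are not taken from the constructs in this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean hydropathy per residue; negative values suggest a hydrophilic protein."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    seq = seq.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy = "GGGGSKKAAY"  # hypothetical toy sequence built from the linker motifs only
    print(f"GRAVY = {gravy(toy):.3f}, aliphatic index = {aliphatic_index(toy):.1f}")
```

A negative GRAVY value, as reported for all four constructs, corresponds to a sequence in which hydrophilic residues outweigh hydrophobic ones on this scale.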
The cleavable linkers must be accessible to the proteases associated with the MHC I and II antigen processing pathways [49]. We observed that the cleavable linkers (AAY and KK) included in the multi-epitope vaccines were predicted to lie in accessible regions of the vaccine constructs, indicating a high probability of T-cell epitope presentation by MHC molecules (S2a-S2d in S1 File). The results of the C-ImmSim server (http://150.146.2.1/C-IMMSIM/index.php) prediction revealed that the multi-epitope constructs were able to stimulate cytokine production, including IFN-γ, following immunization with the peptide (Fig 7a-7d). In silico immune simulation of the multi-epitope constructs [(a) RBD of the S protein, (b) the S protein, (c) structural protein construct, and (d) chimeric construct of S and N proteins] was consistent with actual immune responses, with a primary response characterized by high levels of IgM. This was followed by a marked increase in B-cell populations and in the levels of IgG1 + IgG2, IgM, and IgG + IgM antibodies as part of the secondary and tertiary responses (S5a-S5d in S1 File [i-iv]). A similarly high response was seen in the TH (helper) and TC (cytotoxic) cell populations with corresponding memory development, especially for the constructs made of structural proteins (S, E, M, and N) and the chimeric construct of S and N proteins (S5c and S5d Fig in S1 File).
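The linker scheme used for the constructs can be sketched in a few lines of code. This is an illustrative sketch only, not the procedure used to build the constructs in Fig 6. It assumes that AAY joins CTL epitopes, KK joins HTL epitopes, and GGGGS joins B-cell epitopes, and that each block is attached to the preceding one with its own linker; the text names the linkers but does not specify this exact ordering. The placeholder peptides and the function name are hypothetical. The function returns the assembled sequence together with the 1-based position of every linker, which is the information needed to check whether the cleavable sites fall in accessible regions of a modeled structure.

```python
from typing import List, Tuple

CTL_LINKER = "AAY"      # cleavable linker, assumed here to join CTL epitopes
HTL_LINKER = "KK"       # cleavable linker, assumed here to join HTL epitopes
BCELL_LINKER = "GGGGS"  # flexible linker named in the text for B-cell epitopes


def build_construct(
    ctl: List[str], htl: List[str], bcell: List[str]
) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Concatenate epitopes with linkers.

    Returns the construct sequence and a list of (start, end, linker)
    tuples giving the 1-based, inclusive position of each linker.
    """
    sequence = ""
    linker_sites: List[Tuple[int, int, str]] = []
    for epitopes, linker in ((ctl, CTL_LINKER), (htl, HTL_LINKER), (bcell, BCELL_LINKER)):
        for epitope in epitopes:
            if sequence:
                # Every epitope after the first is preceded by the linker of its block.
                start = len(sequence) + 1
                linker_sites.append((start, start + len(linker) - 1, linker))
                sequence += linker
            sequence += epitope
    return sequence, linker_sites


if __name__ == "__main__":
    # Hypothetical placeholder peptides, not epitopes predicted in this study.
    construct, sites = build_construct(
        ctl=["MMMM", "WWWW"], htl=["FFFF", "HHHH"], bcell=["PPPP", "QQQQ"]
    )
    print(construct)
    for start, end, linker in sites:
        print(f"{linker}: aa{start}-{end}")
```

Recording the linker coordinates at assembly time avoids searching the final sequence for the motifs, which could produce false matches when an epitope itself contains AAY or KK.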

Discussion
Research communities around the world are in the process of developing safe, effective, and affordable vaccines against COVID-19. As of June 3, 2021, the World Health Organization has evaluated and recommended the emergency use of COVID-19 vaccines developed by AstraZeneca/Oxford, Moderna, Johnson and Johnson, Pfizer/BioNTech, Sinopharm, and Sinovac Biotech based on satisfactory published findings on the safety and efficacy profiles of these vaccines [19-26, 72, 73]. Although data on immunological characteristics of patients with COVID-19 are limited, growing experimental evidence suggests that B and T cells are naturally activated during infection, and they contribute substantially towards the resolution of SARS-CoV-2 infection [16-18, 74, 75]. Importantly, SARS-CoV-2-specific antibodies, which are found in convalescent sera of approximately 95% of patients with COVID-19, can neutralize the virus, and the antibody titers correlate positively with measured levels of immunoglobulins that mainly target S and N proteins [7-9, 14, 76]. There have been interesting observations on virus-specific CD4+ and CD8+ T cells in sera from both patients with acute COVID-19 and convalescent patients. A similar development of T cells in SARS-CoV-2-unexposed healthy individuals might have important implications for the design of new vaccine trials and the analysis of ongoing ones [16-18]. A proportion of the SARS-CoV-2-specific CD8+ T cells isolated from convalescent peripheral blood of patients with COVID-19 exhibited the undesirable "exhausted" phenotype; such perturbations of T-cell subsets may eventually weaken the antiviral immunity of the host [77]. While efforts to develop more effective vaccines are ongoing, there is a need to develop a better understanding of B- and T-cell responses against SARS-CoV-2.
Of the linear B-cell epitopes predicted in the RBD of the S protein (Fig 4, Inset table), synthetic peptides spanning the regions aa331-356, aa370-395, aa437-461, and aa483-493 have been shown to react with convalescent sera from patients with COVID-19 [61,77]. Intriguingly, most of the amino acid residues spanning the predicted B-cell epitopes (aa331-356 and aa437-461, Fig 4, Inset table) of the S protein have been shown to interact with the cross-neutralizing mAb S309 in an ACE2 receptor-independent manner [10,62]. In a related study using the recombinant SARS-CoV-2 RBD antigen, a strong correlation between levels of RBD-binding antibodies and SARS-CoV-2-neutralizing antibodies was observed in patients with COVID-19 [11,78]. Similar investigations performed using samples collected from patients with COVID-19 in the USA, Europe, and Hong Kong have detected specific and sensitive antibodies using the full-length S protein and its RBD [69,78]. Interestingly, the predicted linear B-cell epitope (aa653-666, Table 2) has now been experimentally validated using the synthetic peptide aa655-672 of the S protein, which was abundantly detected in samples derived from patients with COVID-19 [79].
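The comparison between a predicted epitope and an experimentally tested peptide reduces to an interval overlap on the protein sequence. The sketch below is only an illustration of that arithmetic, using the two ranges quoted above (predicted aa653-666 and validated peptide aa655-672); the function names are hypothetical, and residue ranges are treated as 1-based and inclusive.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # (first residue, last residue), 1-based and inclusive


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Return the residue range shared by two regions, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def covered_fraction(predicted: Range, validated: Range) -> float:
    """Fraction of the predicted epitope that lies inside the validated peptide."""
    shared = overlap(predicted, validated)
    if shared is None:
        return 0.0
    return (shared[1] - shared[0] + 1) / (predicted[1] - predicted[0] + 1)


if __name__ == "__main__":
    predicted = (653, 666)  # predicted linear B-cell epitope (Table 2)
    validated = (655, 672)  # synthetic peptide used for validation [79]
    print(overlap(predicted, validated))                     # (655, 666)
    print(round(covered_fraction(predicted, validated), 3))  # 0.857
```

For these two ranges, 12 of the 14 predicted residues (aa655-666) fall within the validated peptide.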
The S and N proteins are considered the main targets for the development of vaccines and immunoassays against COVID-19. Antibodies to the N protein appear earlier than antibodies to the S protein and are more sensitive for the detection of early SARS-CoV-2 infection [14]. Leung et al. (2004) showed that antibody responses specific to the N protein were detected during early infection with SARS-CoV [5]. The B-cell epitope predicted and identified (aa359-403) in the N protein of SARS-CoV was previously shown to react abundantly with the serum of patients with SARS-CoV [80,81]. The C-terminal epitope (aa366-TEPKKDKKKKADETQALPQRQKKQQTVTL-394) predicted in the N protein of SARS-CoV-2 was located in an accessible region of the protein structure; however, the immunogenicity of this predicted epitope requires further experimental validation. A recent study used a peptide-based SARS-CoV-2 proteome microarray and identified several B-cell peptides (approximately 5-mer in length) in the serum of patients with COVID-19 [82]. Many of these peptides partially overlap with the predicted B-cell epitopes of S, N, and M proteins presented in this study (Table 2). Previously, 206 monoclonal antibodies specific to the RBD of the S protein of SARS-CoV-2 were characterized in infected individuals [83]. Wang et al. (2020) identified a human monoclonal antibody (47D11) that neutralizes SARS-CoV-2 and SARS-CoV by binding to their respective RBDs without hampering ACE2 receptor interaction [12]. Amanat et al. identical pathogen [87,88]. Furthermore, recent studies have experimentally demonstrated the presence of T cells that cross-react between SARS-CoV-2 and SARS-CoV, implicating the importance of heterologous immunity in SARS-CoV-2 infection [16,18,66].
Recently, to accelerate the process of vaccine development, researchers have employed in silico methods based on immunoinformatic data to develop multi-epitope vaccines without the cumbersome necessity to cultivate pathogens [27-38]. Experimental epitope identification is an expensive procedure, which presents several challenges, including antibody production to identify antigenic regions in a target protein, unavailability of animal models, and further epitope validation through crystallography. Computational approaches can help to guide experimental assays and improve precision by facilitating the selection of specific regions with a high probability of being effective epitopes [39]. An ideal multi-epitope vaccine should be designed to include epitopes that can elicit CTL, HTL, and B cells as well as induce effective responses against a targeted virus.
In this study, we designed multi-epitope chimeric constructs that included the predicted B- and T-cell epitopes of the S, E, M, and N proteins of SARS-CoV-2. Some of the epitopes predicted in this study have been verified experimentally in peer-reviewed studies. Although we included CTL, HTL, and B-cell epitopes in the chimeric constructs, these multi-epitope vaccine constructs should be subjected to further experimental validation in vitro and in vivo. Designing efficacious multi-epitope vaccines remains a great challenge, as does expressing the recombinant multi-epitope protein from the synthetic gene encoding the construct. Moreover, the ability of the multi-epitope construct to retain the native antigenic structure as a vaccine subunit and to elicit protective immune responses remains to be investigated. In addition, it is unclear whether the linear and conformational B-cell epitopes will fold into appropriate conformations resembling the native S, E, M, and N proteins and elicit a protective immune response. Tembusu virus (TMUV) is a newly emerging flavivirus that causes rapid egg drop and neurological symptoms in ducks. In silico predicted B-cell epitopes of the E protein of TMUV, fused with a glutathione S-transferase tag, have been successfully expressed in Escherichia coli [89]. The findings suggested that two of the four predicted B-cell epitopes could elicit the generation of neutralizing antibodies in ducks and provide protection when challenged with TMUV [89]. A chimeric recombinant antigen comprising predicted CD4+ and CD8+ T cell-specific epitopes derived from Leishmania infantum was successfully expressed in E. coli. This purified antigen showed protective efficacy in mice against Leishmania amazonensis infection [90]. In our earlier study, we successfully synthesized chimeric antigens with predicted B- and T-cell epitopes of rotavirus proteins.
The chimeric antigen was subsequently purified using the E. coli expression system, and the antigen presentation is yet to be investigated [58]. Thus, the present bioinformatic prediction provides a platform for the design of synthetic peptides and their validation using convalescent sera from patients with COVID-19. However, it remains to be investigated whether the pool of B- and T-cell epitopes identified in this study can stimulate B and T cells. The amount of cytokines released in response to these antigenic epitopes needs to be measured both in vitro and in vivo, thus providing a platform for future investigations on SARS-CoV-2-specific immune responses.

Conclusions
In this study, the in silico predicted immune epitopes are limited to the structural proteins of SARS-CoV-2. Notably, many of the predicted B- and T-cell epitopes, derived from multiple computational tools, have been experimentally verified in recent studies using sera from convalescent patients with COVID-19. Further rigorous experimental validation of the predicted epitopes might provide a better understanding and resolution of the immune response to SARS-CoV-2 infection.