Conformational B-Cell Epitope Prediction on Antigen Protein Structures: A Review of Current Algorithms and Comparison with Common Binding Site Prediction Methods

Accurate prediction of B-cell antigenic epitopes is important for immunologic research and medical applications. Compared with other bioinformatic problems, however, antigenic epitope prediction is more challenging because antigenic epitopes are extremely variable, even though the paratope on the antibody binds a given epitope with high specificity. Despite continuing efforts over the past decade, the problem remains unsolved and still attracts considerable attention from bioinformaticians. Several discontinuous epitope prediction servers have recently become available, so it is timely to review all existing methods and evaluate their performance on the same benchmark. In addition, these methods are compared against common binding site prediction algorithms, since the latter have frequently been used as substitutes in the absence of good epitope prediction methods.


Introduction
Antigenic epitopes are regions of the antigen protein surface that are preferentially recognized by antibodies. Prediction of B-cell antigenic epitopes is of direct help to the design of vaccine components and immuno-diagnostic reagents. Usually, B-cell antigenic epitopes are classified as either continuous or discontinuous. The majority of available epitope prediction methods focus on continuous epitopes [1,2,3,4,5,6,7,8,9,10,11,12].

Performance of Structure-based Prediction Methods
In this review, we discuss and evaluate the conformational epitope predictors DiscoTope [15], BEpro(PEPITO) [16], ElliPro [17], SEPPA [18], EPITOPIA [19,20], EPCES [21], EPSVR [22], Bpredictor [23], and EPMeta [22], for all of which web servers or freely downloadable software packages exist. DiscoTope [15] linearly combines two scores, the hydrophilicity scale and the epitope log-odds ratios, the latter being one kind of epitopic residue propensity score. BEpro(PEPITO) [16] also applies a linear combination to two scores: the epitopic residue propensity and the half sphere exposure values at multiple distances. ElliPro [17] uses a single score, the residue protrusion index (PI). SEPPA [18] employs the epitopic residue propensity and the compactness of the neighboring residues around a residue (contact number or flat surface), again using a linear combination. EPITOPIA [19,20] applies a naive Bayesian classifier to forty-four physico-chemical and structural-geometrical attributes, including secondary structure, propensity, conservation, solvent accessible surface, and hydrophilicity. EPCES [21] devises a special linear method, using a voting mechanism for consensus, to integrate six scores, namely propensity, amino acid side-chain energy value, secondary structure composition, contact number, conservation score, and surface planarity score. Going one step further, EPSVR [22] uses the same attributes as EPCES [21] but integrates all scores with Support Vector Regression (SVR). Bpredictor [23] applies a random forest classifier to attributes that include the adjacent residue distance score, accessible surface area, conservation, secondary structure, and propensity. EPMeta is a meta server that combines EPSVR, EPCES, EPITOPIA, SEPPA, PEPITO, and DiscoTope 1.2.
In general, the features used by these predictors include the conservation score, structural features such as secondary structure composition, geometric characteristics such as protrusion index and planarity score, and amino acid features such as hydrophilicity and propensity (odds ratios). These attributes can be integrated by linear combination or by machine-learning algorithms, such as naive Bayesian classifiers, SVR, and random forest classifiers. The number of features used by a given predictor varies from two scores to forty-four attributes. For small numbers of attributes, a simple linear combination can usually work well, whereas large numbers of features often require sophisticated machine-learning algorithms to integrate the scores optimally. Notably, some of these features may be mutually exclusive or overlapping. For example, the antigenic epitope is frequently located at either a protruding region or a flat surface. In such cases, linearly combining two incompatible terms contradicts the physical basis and will only degrade the performance of a predictor.
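The linear combination of two per-residue scores mentioned above can be sketched as follows. This is only an illustrative sketch, not the implementation of any of the reviewed servers: the function names, the z-score normalization, the equal weighting, and the toy score values are all assumptions introduced here.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale scores to zero mean and unit variance so that two
    differently scaled scores can be added (an assumption of this sketch)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_linear(score_a, score_b, weight_a=0.5):
    """Weighted sum of two normalized per-residue scores.
    weight_a is a hypothetical tuning parameter, not a published value."""
    if len(score_a) != len(score_b):
        raise ValueError("both scores must cover the same residues")
    za, zb = zscore(score_a), zscore(score_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(za, zb)]


def predict_top(scores, n_predicted):
    """Return the indices of the n_predicted highest-scoring residues."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_predicted])


# Toy values for four surface residues (hypothetical numbers).
propensity = [1.0, 0.2, 0.8, 0.1]
exposure = [0.9, 0.1, 0.7, 0.3]
combined = combine_linear(propensity, exposure)
print(predict_top(combined, 2))  # [0, 2]
```

With only two terms, a single weight has to be tuned, which is one reason a linear scheme remains practical for small attribute sets.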
The above epitope predictors are trained with most or all of the available antigen-antibody complex structures obtained from X-ray diffraction on crystallized proteins. Therefore, the independent test set compiled by Liang et al. [22], which contains 19 protein monomer structures with epitope information derived from experimental methods other than crystal structures, was applied to all methods as an independent evaluation. Table 1 shows the area under the receiver operating characteristic curve (AUC) values of all methods. A receiver operating characteristic (ROC) curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate) at various threshold settings. To change the threshold setting, the number of predicted residues is increased in steps of 1% of the total surface residues. The mean AUC values are calculated using the method described by Liang et al. [22], except for Bpredictor. For Bpredictor, the AUC value is taken directly from the manuscript, in which the same benchmark by Liang et al. was applied as in the current work. Among single servers, EPSVR and Bpredictor have the best performance according to the AUC values. Although EPSVR has the highest mean AUC value, the differences between EPSVR and the other servers are not statistically significant (p-value > 0.05) according to pairwise Student's t-tests. The meta server, EPMeta, achieves a mean AUC value of 0.638, which is significantly higher than that of all single servers.
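The thresholding scheme described above, in which the number of predicted residues grows in steps of 1% of the surface residues, can be sketched as below. This is a minimal sketch and not the exact procedure of Liang et al. [22]: the function name, the rounding of the residue count, the handling of tied scores, and the trapezoidal integration are assumptions made for illustration.

```python
def roc_auc_by_percent_steps(scores, labels, step_percent=1):
    """Approximate the ROC AUC for one antigen.

    scores: per-surface-residue prediction scores (higher = more epitope-like).
    labels: 1 for experimentally determined epitope residues, 0 otherwise.
    The predicted set is the top k residues, with k raised in steps of
    step_percent of all surface residues.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(labels)
    negatives = n - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both epitope and non-epitope residues")

    # Rank residues from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    points = []
    for pct in range(0, 100 + step_percent, step_percent):
        pct = min(pct, 100)
        k = (n * pct + 50) // 100  # residue count, rounded to nearest
        tp = sum(labels[i] for i in order[:k])
        fp = k - tp
        points.append((fp / negatives, tp / positives))  # (FPR, TPR)

    # Trapezoidal integration of TPR over FPR.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc


# Toy case: four surface residues, two of them epitopic (hypothetical values).
print(roc_auc_by_percent_steps([0.9, 0.1, 0.8, 0.2], [1, 0, 0, 1]))  # 0.75
```

Averaging the value returned for each antigen in a benchmark would give a mean AUC of the kind reported per method.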

Single Chain or Multiple Chains
Antibody recognition of antigenic epitopes is highly specific, yet the epitopic surface is not as conserved as other functional protein binding sites, whose conservation stems from the conserved functions of protein-protein interactions during evolution. The interfaces of regular protein-protein binding are usually more conserved and have more hydrophobic amino acid residues than non-binding protein surfaces. This makes exposed protein-protein interfaces relatively easy to distinguish from both antigenic epitopes and non-binding protein surfaces. In other words, the prediction task for a single chain protein that has both protein-protein binding interfaces and an antigenic epitope is easier than that for a complete protein complex. In the benchmark, six of the proteins (PDB IDs: 1eku, 1av1, 1al2, 1jeq, 2gib, and 1qgt) possess multiple chains. Therefore, in the evaluation all methods are tested under two different scenarios for these six proteins: prediction on the single chain where the experimental antigenic epitope is located, and prediction on the whole protein, including all chains. When multiple chains are used, all chains are considered, and the total number of surface residues is counted for the intact complex structure. As a result, some methods, such as EPSVR, show reduced performance when the whole protein is used for prediction, giving lower mean AUC values for the six proteins than predictions based on the single chain containing the antigenic epitope. Therefore, if sufficient data become available in the future, separate test datasets should be compiled for the different cases, i.e. single chain antigens, single chains from antigen complexes, and antigen complexes. A good antigenic epitope predictor should perform satisfactorily on all types of benchmarks.

Protein Binding Site Prediction Methods
Protein binding site prediction methods are frequently borrowed for conformational epitope prediction [24,25], both because epitopic patches can be considered one kind of protein binding site and because few epitope prediction methods have been available for analysis and comparison. The methodologies used for protein binding site prediction and epitope prediction are similar; both integrate amino acid scoring functions with a machine-learning algorithm or other platform to train a prediction model on known data. The major difference lies in their training sets: protein binding site prediction uses all known protein-protein binding complexes, whereas an epitope prediction method is trained with antibody-antigen complexes only. Therefore, we also applied the independent benchmark of epitopes to some binding site prediction methods. For this we selected binding site prediction methods that have both demonstrated good performance and offer convenient web servers for public use. The AUCs achieved by these methods on the epitope benchmark are shown in Table 2. The performance of the binding site prediction methods in predicting B-cell epitopes is significantly lower than that of all conformational epitope prediction methods. This is not surprising, because all binding site prediction methods are designed around the conservation and hydrophobicity of binding patches, whereas B-cell epitopic patches are neither conserved nor more hydrophobic than other protein-protein binding surfaces. Instead, the residues on antigenic epitopes are more diverse than regular surface residues because of the evolutionary pressure from the host immune system. Therefore, we conclude that general binding site prediction methods are not suitable for antigenic epitope prediction. We recommend that future epitope prediction methods not claim performance improvement on the basis of comparison with binding site prediction methods.

Discussion
Currently, existing epitope prediction algorithms apply various sets of attributes and classifiers, which naturally leads to one question: which combination of attributes is optimal for the prediction? To answer this question, one may systematically evaluate different machine-learning algorithms on all non-redundant attributes and identify the optimal set among them. Also of great importance to epitope prediction research is the growth of the training data, especially antigens that have both bound and unbound structures. In addition, it is important to collect high quality independent testing data, such as the set compiled by Liang et al. [22], which contains experimentally measured epitopic residues but no complex structures. We also recommend that all future researchers implement their algorithms as freely accessible web servers or downloadable software packages, because B-cell epitope prediction algorithms will likely become more and more complicated and meta-methods usually have better prediction accuracy than any of the single algorithms (Table 1).

Conclusions
In recent years, a number of new conformational B-cell epitope prediction algorithms have been developed. While prediction performance has improved somewhat, it is still far from satisfactory. Compared with other bioinformatic problems, antigenic epitope prediction is especially difficult because no properties are universally observed for antigenic epitopes but not for other protein surfaces. Additionally, common binding site prediction methods are not suitable for antigenic epitope prediction because they focus on the conservation of surface residues.

Author Contributions
Conceived and designed the experiments: BY CZ SL. Performed the experiments: BY DZ SL CZ. Analyzed the data: BY DZ. Wrote the paper: DZ SL CZ.