Graph-theoretical formulation of the generalized epitope-based vaccine design problem

doi:10.1371/journal.pcbi.1008237

Fig 1.

Epitope-based vaccine pipeline.

An epitope-based vaccine pipeline comprises three major steps: (1) epitope discovery, (2) selection, and (3) assembly. Different types of vaccine design exist: (3a) epitope mixture, (3b) string-of-beads, and (3c) mosaics.

Fig 2.

The graph encoding of the vaccine design problem.

Vertices represent epitopes and edge weights are design-dependent. By jointly modeling the epitope selection and vaccine assembly problem, we seek n subsets of vertices (n = 3 in the figure), each representing a separate polypeptide (blue, yellow and red in the figure), with the highest immunogenicity, whose simple tours start from and end in a placeholder node s, and are shorter than given limits in terms of the number of vertices k and edge-weight sum h. (a) The edge weights are simply ignored for epitope mixtures. (b) For string-of-beads, the edge weights represent the negative log-likelihood of being cleaved at the junction site of the two connecting epitopes, so that low total edge weight is achieved by selecting epitopes that are likely to be separated correctly upon proteasomal cleavage. (c) The weights in mosaic designs represent the added length to the mosaic vaccine once the two connecting epitopes are joined with maximum overlap, so that low total edge weight results in shorter polypeptides.
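The mosaic edge weight described in (c) can be illustrated with a minimal sketch. The function names `overlap_length`, `mosaic_edge_weight`, and `assemble` are hypothetical and not taken from the paper; the sketch only assumes that the weight of an edge is the number of amino acids added when the second epitope is appended to the first with maximum suffix-prefix overlap.

```python
def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def mosaic_edge_weight(left: str, right: str) -> int:
    """Amino acids added when `right` follows `left` with maximum overlap."""
    return len(right) - overlap_length(left, right)


def assemble(epitopes: list[str]) -> str:
    """Join an ordered list of epitopes, merging each consecutive overlap."""
    polypeptide = epitopes[0]
    for prev, nxt in zip(epitopes, epitopes[1:]):
        polypeptide += nxt[overlap_length(prev, nxt):]
    return polypeptide


# Toy 9-mers (made-up sequences, for illustration only).
tour = ["ABCDEFGHI", "BCDEFGHIJ", "HIJKLMNOP"]
weights = [mosaic_edge_weight(a, b) for a, b in zip(tour, tour[1:])]
mosaic = assemble(tour)
```

In this toy tour, the length of the assembled mosaic equals the length of the first epitope plus the sum of the edge weights, which is why a low total edge weight yields a shorter polypeptide.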

Table 1.

Integer linear program formulation of the generalized epitope vaccine design problem.

It results in a cocktail of |T| polypeptides each composed of at most k epitopes. Together, the polypeptides cover at least Θ_s pathogens and Θ_a MHC alleles, and contain epitopes with an average pathogen conservation of at least Γ.
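The constraints named in this caption can be sketched as a feasibility check on a candidate cocktail. This is not the integer linear program itself: the data layout (a dictionary per epitope with `pathogens`, `alleles`, and `conservation` fields) and the function name `is_feasible` are assumptions made for illustration, and Θ_s and Θ_a are treated here as counts.

```python
def is_feasible(cocktail, epitopes, k, theta_s, theta_a, gamma):
    """Check a candidate cocktail against the constraints in the caption.

    cocktail: list of polypeptides, each a list of epitope identifiers.
    epitopes: hypothetical mapping id -> {"pathogens", "alleles", "conservation"}.
    """
    # Each polypeptide may contain at most k epitopes.
    if any(len(polypeptide) > k for polypeptide in cocktail):
        return False

    chosen = [e for polypeptide in cocktail for e in polypeptide]
    if not chosen:
        return False

    # Coverage is evaluated jointly over all polypeptides in the cocktail.
    pathogens = set().union(*(epitopes[e]["pathogens"] for e in chosen))
    alleles = set().union(*(epitopes[e]["alleles"] for e in chosen))
    mean_conservation = sum(epitopes[e]["conservation"] for e in chosen) / len(chosen)

    return (
        len(pathogens) >= theta_s
        and len(alleles) >= theta_a
        and mean_conservation >= gamma
    )


# Made-up toy data, for illustration only.
toy_epitopes = {
    "e1": {"pathogens": {"p1", "p2"}, "alleles": {"A*02:01"}, "conservation": 0.9},
    "e2": {"pathogens": {"p3"}, "alleles": {"B*07:02"}, "conservation": 0.5},
    "e3": {"pathogens": {"p2"}, "alleles": {"A*02:01"}, "conservation": 0.7},
}
ok = is_feasible([["e1"], ["e2"]], toy_epitopes, k=1, theta_s=3, theta_a=2, gamma=0.6)
too_long = is_feasible([["e1", "e2"]], toy_epitopes, k=1, theta_s=3, theta_a=2, gamma=0.6)
```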

Fig 3.

Novel EV design possibilities obtainable by considering epitope selection and epitope assembly at the same time.

(a) We created five bootstraps of protein sequences and generated string-of-beads vaccines on the Pareto frontier by simultaneously maximizing immunogenicity and cleavage score. The epitopes were joined either directly (yellow) or by optimal spacer sequences (blue). The cleavage scores of the two groups were normalized separately to allow comparisons between spacers and direct links. (b) We selected five percentiles of the cleavage score, from 40 to 80 of the global maximum, and estimated the immunogenicity that can be obtained via linear interpolation between the closest points on the Pareto frontier of each bootstrap. The table above shows the percent increase in immunogenicity relative to the no-spacers group, the effect size d computed as the difference of the means normalized by the standard deviation of the no-spacers group, and the Mann-Whitney U test statistic, which gives the number of pairwise comparisons that favored the designed spacers (* for p < 0.05).
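The three quantities described in (b) can be sketched as follows. All function names are hypothetical. Two choices here are assumptions, not details given in the caption: the sample standard deviation is used for d, and ties in the U statistic count as one half.

```python
from statistics import mean, stdev


def interpolate_frontier(frontier, target_cleavage):
    """Linearly interpolate immunogenicity at a target cleavage score.

    frontier: list of (cleavage_score, immunogenicity) points.
    """
    points = sorted(frontier)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= target_cleavage <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (target_cleavage - x0) / (x1 - x0)
    raise ValueError("target lies outside the frontier")


def effect_size(spacers, no_spacers):
    """Difference of means divided by the std. dev. of the no-spacers group."""
    return (mean(spacers) - mean(no_spacers)) / stdev(no_spacers)


def mann_whitney_u(spacers, no_spacers):
    """Number of pairwise comparisons won by `spacers` (ties count 0.5)."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in spacers
        for b in no_spacers
    )


# Made-up numbers, for illustration only.
estimate = interpolate_frontier([(0.0, 10.0), (1.0, 0.0)], 0.4)
d = effect_size([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
u = mann_whitney_u([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

The p-value is deliberately left out of this sketch; in practice a statistics library would be used to obtain it.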

Fig 4.

Decrease in cleavage score by randomly permuting string-of-beads vaccines.

(a) For each vaccine in the Pareto frontier of each bootstrap, we created 50 new string-of-beads by randomly permuting its epitopes and compared the cleavage scores of the two vaccines. The black dots in the box plots show the median decrease. (b) Cleavage score and immunogenicity of the original vaccine with optimized ordering.
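The permutation experiment in (a) can be sketched as below. The scoring model is an assumption made for illustration: `junction_score` is a hypothetical lookup of per-junction scores, and the cleavage score of a vaccine is taken to be their sum over consecutive epitope pairs. The paper's actual cleavage predictor is not reproduced here.

```python
import random
from statistics import median


def cleavage_score(order, junction_score):
    """Sum of (hypothetical) junction scores over consecutive epitope pairs."""
    return sum(junction_score.get((a, b), 0.0) for a, b in zip(order, order[1:]))


def permutation_decreases(order, junction_score, n_permutations=50, seed=0):
    """Decrease in cleavage score for random reorderings of one vaccine."""
    rng = random.Random(seed)
    baseline = cleavage_score(order, junction_score)
    decreases = []
    for _ in range(n_permutations):
        shuffled = list(order)
        rng.shuffle(shuffled)
        decreases.append(baseline - cleavage_score(shuffled, junction_score))
    return decreases


# Made-up scores in which the order A, B, C is optimal.
toy_scores = {("A", "B"): 1.0, ("B", "C"): 1.0}
decreases = permutation_decreases(["A", "B", "C"], toy_scores)
median_decrease = median(decreases)
```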

Fig 5.

Cocktail of mosaic proteins compared to an equivalent string-of-beads vaccine.

We designed a cocktail (blue) of four polypeptides (cyan) that covers 99% of the pathogens, even though the single fragments only cover between 85 and 95%. The orange and red bars correspond to an epitope mixture designed by OptiTope and a mosaic with the same number of amino acids, respectively; the former was required to reach 99% pathogen coverage, and the latter was unconstrained.

Fig 6.

Comparison of mosaic and epitope mixture vaccines of different sizes.

Mosaic vaccines were much better than epitope mixtures or string-of-beads of the same length (blue), as long as the pathogens offer enough epitope variety. When an overlap of eight amino acids between epitopes was enforced (red), the vaccine did not improve beyond a certain length. This could be prevented by relaxing the requirement to only four amino acids (yellow). The vaccines are compared with respect to four metrics: immunogenicity (a), population coverage (b), pathogen coverage (c), and conservation (d). The experiment was repeated for each of the five bootstraps, and bars represent the resulting standard deviation. Note that these vaccines were not designed with pathogen coverage or epitope conservation in mind; mosaics naturally reached higher values.

Fig 7.

Mosaic vaccines naturally target conserved regions even when this is not required.

(a) shows, for each residue position in aligned sequences where the consensus is not a gap, the smoothed entropy (blue, and residue entropy in lighter color) and the potential immunogenicity (green). (b) shows the number of pathogens covered in each position by a 20-epitope mixture with maximal immunogenicity (yellow), a short mosaic of 28 amino acids (red) and a long mosaic of 90 amino acids (blue). The count is normalized separately for each vaccine to account for their different coverage. (c) shows the pairwise correlations of the variables shown in the left plot, so that every dot in the scatter plots corresponds to a different residue position, and linear fits are shown in red. The lower triangular half shows the Spearman correlation coefficients (above) and the respective p-values (below). Colors range from blue (large negative correlation) to white (no correlation) to red (large positive correlation), and the font is bold if the correlation is significant with a confidence of at least 99.5% at the Bonferroni-corrected significance level of 5%. The diagonal contains histograms showing the distribution of each variable, with a logarithmic y axis.
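The per-position residue entropy in (a) can be sketched as the Shannon entropy of each alignment column, skipping columns whose consensus is a gap. The function names are hypothetical, and two details are assumptions: entropy is measured in bits, and gap characters count as a symbol within retained columns. The smoothing step is not shown because the caption does not specify it.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    total = len(column)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(column).values()
    )


def position_entropies(alignment):
    """Entropy per column, skipping columns whose consensus is a gap."""
    entropies = []
    for column in zip(*alignment):
        consensus = Counter(column).most_common(1)[0][0]
        if consensus == "-":
            continue
        entropies.append(column_entropy(column))
    return entropies


# Made-up alignment of four sequences, for illustration only.
toy_alignment = ["AC-", "AG-", "AC-", "AGT"]
entropies = position_entropies(toy_alignment)
```

A fully conserved column has zero entropy, so low values in this profile mark the conserved regions that the caption refers to.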

Fig 8.

Comparison of mosaic vaccines optimized for different objectives.

Here we designed mosaics of varying amino acid length (on the x axis) while optimizing for conservation (blue), immunogenicity (red), and pathogen coverage (yellow). The plots compare the vaccines in terms of conservation (a), immunogenicity (b), pathogen coverage (c), and population coverage (d). For longer vaccines, optimizing for pathogen coverage gives only modest improvements in coverage over the mosaics optimized for immunogenicity, and does not increase conservation by much. When optimized for, conservation is considerably higher but becomes harder to improve as the vaccine becomes longer, because few epitopes are both well conserved and highly immunogenic. Values are averages of five runs; error bars show the standard deviation.

Fig 9.

Effect of different immunogenicity predictors on optimized vaccines.

Each scatter plot shows the immunogenicity predicted by a certain method (y axis), when the ten-epitope string-of-beads vaccine was designed by optimizing the immunogenicity predicted by a different method (x axis). The diagonal shows the immunogenicity distribution of the optimized vaccines. The color of each point indicates the cleavage score of the vaccine (brighter is larger). Inside each scatter plot, we report the intersection-over-union (IoU) of the epitopes, the number of shared epitopes between vaccines of similar cleavage score (median, with 25th and 75th percentiles in parentheses), and the correlation coefficient (Pearson or Spearman depending on the plot) with its p-value against the null hypothesis of no correlation.
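The intersection-over-union reported in each panel can be sketched for two sets of epitopes as follows. The function name `epitope_iou` is hypothetical, and returning 0.0 when both sets are empty is a convention chosen here, not something stated in the caption.

```python
def epitope_iou(epitopes_a, epitopes_b):
    """Intersection-over-union of two collections of epitopes."""
    set_a, set_b = set(epitopes_a), set(epitopes_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen for this sketch
    return len(set_a & set_b) / len(union)


# Made-up epitope labels, for illustration only.
iou = epitope_iou(["E1", "E2", "E3"], ["E2", "E3", "E4"])
```

An IoU of 1 means the two predictors led to the same epitopes being selected, while 0 means the selections are disjoint.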

Fig 10.

Effect of different immunogenicity predictors on vaccines optimized for different objectives.

We repeated the experiment shown in Fig 8 using different epitope immunogenicity predictors. For each bootstrap, we designed mosaic vaccines optimizing their conservation (blue), immunogenicity (yellow), or pathogen coverage (red) and compared the vaccines in terms of conservation (a), immunogenicity (b), pathogen coverage (c), and population coverage (d). (e) shows the pairwise difference of the four metrics across all vaccine sizes, separated by optimization objective, between all pairs of immunogenicity functions. There is little variation among the results obtained with different immunogenicities.
