Towards the prediction of non-peptidic epitopes

doi:10.1371/journal.pcbi.1009151

Fig 1.

Cluster inertia plotted against the number of clusters (k).

The cluster inertia is computed as the sum of squared distances of samples to their closest cluster center.
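
The inertia definition above can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy 2D samples are hypothetical, and the cluster centers are given by hand rather than fitted by a clustering algorithm.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_inertia(samples: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all samples of the squared distance to the closest center."""
    return sum(min(squared_distance(s, c) for c in centers) for s in samples)


# Toy data (hypothetical): two well-separated groups of two points each.
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_center = [(5.0, 1.0)]
two_centers = [(0.0, 1.0), (10.0, 1.0)]

print(cluster_inertia(samples, one_center))   # inertia for k = 1
print(cluster_inertia(samples, two_centers))  # inertia for k = 2
```

Plotting this value against k gives the kind of curve shown in the figure; the inertia drops sharply as long as additional centers capture real structure in the data.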

Fig 2.

Principal component visualization of the ChEBI dataset.

(a) Principal components of the 8 clusters and their sizes. (b) T cell epitopes. (c) B cell epitopes.

Fig 3.

Example molecules for each cluster generated for the ChEBI dataset.

ChEBI IDs used for the example molecules: (a) Steroid/terpenoid like: CHEBI:776; (b) Betaine/glycerolipid derivatives: CHEBI:17636; (c) Fatty acid derivatives: CHEBI:16196; (d) Acyl-CoA derivatives: CHEBI:11010; (e) Glucoside/oligosaccharide derivatives: CHEBI:16551; (f) Nucleobase-containing molecular entities: CHEBI:15422; (g) Diverse small molecules: CHEBI:55395; (h) Cyclic Halide / Phenols: CHEBI:59246. All examples represent molecules that tested positive in B cell assays, except for the acyl-CoA derivatives, for which no epitope was described.

Table 1.

BiNChE ontology analysis of cluster 4.

The name “glucoside/oligosaccharide derivatives” was chosen for this cluster.

Table 2.

Summary of the compiled molecular clusters.

The mean fold-enrichment can be used as an indicator of the homogeneity of the cluster.

Fig 4.

Cross-validation performance of the RF models for different values of the radius parameter used to generate Morgan fingerprints.

The prediction of epitopes that tested positive in T cell assays (a) and B cell assays (b).

Fig 5.

Model comparison for different feature sets for the epitopes that tested positive in B cell assays.

Cluster 3 was not benchmarked, since there were no epitopes in this structural class.

Fig 6.

Model comparison for different feature sets for the epitopes that tested positive in T cell assays.

Cluster 3 was not benchmarked, since there were no epitopes in this structural class.

Fig 7.

Performance of the epitope classifiers for different feature sets.

Cluster 3 was not benchmarked, since there were no epitopes in this structural class. The RF classifiers are depicted with a continuous line and the similarity classifiers with a dotted line.

Table 3.

Epitope prediction performance of the RF models on the test dataset.

The ROC-AUC values could not be computed for some clusters because of missing positive samples.
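
Why ROC-AUC is undefined without positive samples can be seen in a minimal sketch. It uses the pairwise-ranking form of ROC-AUC (the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as one half). The function name and toy data are hypothetical, and this is not the evaluation code used in the paper.

```python
from typing import Optional, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Pairwise-ranking ROC-AUC; returns None when one class is absent."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        # No positive/negative pairs exist, so the metric is undefined.
        return None
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical): a test set with and without positive samples.
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
print(roc_auc([0, 0, 0], [0.2, 0.5, 0.7]))
```

The second call has no positive/negative pairs to rank, which corresponds to the clusters for which no value is reported.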

Fig 8.

Substructures of most significant fingerprint features for the classification of T cell epitopes of the fatty acid derivatives (cluster 2).

A depiction of each feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere; aliphatic ring atoms are labeled with grey spheres. In the molecule box, all matched feature atoms are labeled with blue spheres. The statistics of the features are shown in Table 4.

Table 4.

Most important fingerprint features for the prediction of T cell epitopes of the fatty acid derivatives (cluster 2).

The fingerprint feature IDs correspond to Fig 8. The corr. p-value is based on the null hypothesis (H₀) that the feature count is equally distributed in the epitopes and the background. For an explanation of the other feature-specific metrics, see Methods. For features with no examples in the background dataset, the fold-enrichment and mean count difference cannot be computed.
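
The sketch below shows one way the two metrics can become undefined. The definitions are assumptions made for illustration only; the Methods section of the paper is authoritative. Here, fold-enrichment is assumed to be the fraction of epitopes containing the feature divided by the fraction of background molecules containing it, and the mean count difference is assumed to compare mean feature counts among the molecules that contain the feature. The function name and toy counts are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def feature_stats(
    epitope_counts: Sequence[int], background_counts: Sequence[int]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (fold_enrichment, mean_count_difference) under assumed definitions.

    Both inputs are per-molecule counts of one fingerprint feature and are
    assumed to be non-empty.
    """
    epi_hits = [c for c in epitope_counts if c > 0]
    bg_hits = [c for c in background_counts if c > 0]
    if not bg_hits:
        # No background molecule contains the feature: the ratio would divide
        # by zero and there is no background mean to compare against.
        return None, None
    epi_fraction = len(epi_hits) / len(epitope_counts)
    bg_fraction = len(bg_hits) / len(background_counts)
    fold_enrichment = epi_fraction / bg_fraction
    if not epi_hits:
        return fold_enrichment, None
    mean_difference = sum(epi_hits) / len(epi_hits) - sum(bg_hits) / len(bg_hits)
    return fold_enrichment, mean_difference


# Toy counts (hypothetical, not taken from the paper).
print(feature_stats([2, 4, 0, 6], [1, 0, 0, 0, 1, 0, 0, 0]))
print(feature_stats([2, 4, 0, 6], [0, 0, 0]))
```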

Table 5.

Most important fingerprint feature for the prediction of T cell epitopes of the glucoside/oligosaccharide derivatives (cluster 4).

The fingerprint feature corresponds to Fig 9.

Fig 9.

Histogram of the fingerprint feature (ID:16163127) count responsible for T cell prediction of the glucoside/oligosaccharide derivatives (cluster 4).

The vast majority of epitopes have a long fatty acid chain attached to the glycoside. (a) Example molecule with 20 fingerprint features; all matched feature atoms are labeled with blue spheres. (b) Depiction of the fingerprint feature; the central atom is labeled with a purple sphere.

Fig 10.

Substructures of most significant fingerprint features for the classification of B cell epitopes of the glucoside/oligosaccharide derivatives (cluster 4).

A depiction of each feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere; aliphatic and aromatic ring atoms are labeled with grey and yellow spheres. In the molecule box, all matched feature atoms are labeled with blue spheres.

Table 6.

Most important fingerprint features for the prediction of B cell epitopes of the glucoside/oligosaccharide derivatives (cluster 4).

The fingerprint feature IDs correspond to Fig 10.

Fig 11.

Substructures of most significant fingerprint features for the classification of B cell epitopes of the nucleobase-containing molecular entities (cluster 5).

A depiction of each feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere; aliphatic and aromatic ring atoms are labeled with grey and yellow spheres. In the molecule box, all matched feature atoms are labeled with blue spheres. The statistics of the features are shown in Table 7.

Table 7.

Most important fingerprint features for the prediction of B cell epitopes of the nucleobase-containing molecular entities (cluster 5).

The fingerprint feature IDs correspond to Fig 11.

Fig 12.

The feature responsible for the prediction of T cell recognition of the nucleobase-containing molecular entities (cluster 5).

A depiction of the feature is shown (smaller box) alongside an example molecule containing it (larger box). In the feature box, the central atom is labeled with a purple sphere. In the molecule box, all matched feature atoms are labeled with blue spheres.

Table 8.

Most important fingerprint feature for the prediction of T cell epitopes of the nucleobase-containing molecular entities (cluster 5).

The fingerprint feature corresponds to Fig 12.
