Predicting recognition between T cell receptors and epitopes with TCRGP

doi:10.1371/journal.pcbi.1008814

Fig 1.

TCRGP pipeline for training a classifier to predict if new unseen TCRs recognize a certain epitope.

(A) Sequence preparation. Training requires TCRs specific to the epitope of interest and control sequences that do not recognize the epitope. TCRGP can utilize CDR3, CDR1, CDR2, and CDR2.5 sequences from both TCR α- and β-chains. A separate alignment is created for each CDR type and the aligned sequences are given numerical representations. We use principal components of a modified BLOSUM62 substitution matrix to encode each amino acid. We utilize all 21 components, but in this illustration we only show the first components. (B) Using the numerical representation, we create separate covariance matrices for each CDR type. During the training of the Gaussian process classifier, an optimal combination of the base kernels and their parameters is learned. (C) When the classifier has been trained, we can make probabilistic predictions for new TCRs.
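
The combination of per-CDR base kernels described in (B) can be illustrated with a minimal sketch. It assumes a squared-exponential base kernel and a weighted sum, which the caption does not specify. The two-dimensional amino-acid features, the CDR names, and the weights are hypothetical placeholders, not the BLOSUM62 components or the learned values used by TCRGP.

```python
import math

# Hypothetical 2-D amino-acid features for illustration only. TCRGP uses
# principal components of a modified BLOSUM62 matrix, not reproduced here.
TOY_FEATURES = {
    "A": (0.1, 0.9),
    "S": (0.2, 0.8),
    "F": (0.9, 0.1),
    "-": (0.0, 0.0),  # alignment gap
}


def encode(aligned_seq):
    """Concatenate per-residue feature vectors into one flat vector."""
    return [v for aa in aligned_seq for v in TOY_FEATURES[aa]]


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def combined_kernel(tcr_a, tcr_b, weights):
    """Weighted sum of per-CDR base kernels.

    tcr_a, tcr_b: dicts mapping a CDR name to its aligned sequence.
    weights: dict mapping a CDR name to a non-negative kernel weight
             (learned in a real model; fixed by hand in this sketch).
    """
    return sum(
        w * rbf(encode(tcr_a[cdr]), encode(tcr_b[cdr]))
        for cdr, w in weights.items()
    )


# Two toy TCRs that share CDR1 and differ at one aligned CDR3 position.
tcr1 = {"cdr3b": "ASF-", "cdr1b": "SA"}
tcr2 = {"cdr3b": "ASFA", "cdr1b": "SA"}
weights = {"cdr3b": 0.7, "cdr1b": 0.3}

k_same = combined_kernel(tcr1, tcr1, weights)
k_diff = combined_kernel(tcr1, tcr2, weights)
```

Because the weights sum to one here, a TCR compared with itself has covariance 1, and any difference in an aligned CDR lowers the value in proportion to that CDR's weight.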

Fig 2.

Epitope-specificity prediction with the Dash data.

(A) The left panel shows the cross-validated ROC curves for each subject in the Dash data for BMLF1_280-288, when TCRGP has been trained using all CDRs from TCRα and TCRβ. The mean AUROC score is 0.905. The right panel shows the cross-validated ROC curve for all subjects together with the corresponding threshold values. From this figure we can determine which threshold values correspond to different true positive rates (TPRs) and false positive rates (FPRs). (B) Violin plots present the distributions of AUROC scores for the different epitopes obtained with models utilizing varying CDRs. The blue parts of the violin plots illustrate the AUROC scores of predictions made by TCRGP for all the epitopes. The orange sides illustrate the AUROC scores obtained with TCRdist. Each point within a violin plot represents the mean AUROC score obtained for one epitope. The chains (α and/or β) and CDRs (three or all) used are indicated below each panel. (C) Comparison of AUROC scores obtained with TCRGP and TCRdist using only CDR3 from TCRαβ, TCRβ, or TCRα for each epitope separately. The epitopes have been arranged in increasing order of AUROC scores obtained by TCRGP using CDR3 from α- and β-chains (blue line). (D) Comparison of AUROC scores obtained with TCRGP and TCRdist using all CDRs from TCRαβ, TCRβ, or TCRα for each epitope separately. The epitopes have been arranged in increasing order of AUROC scores obtained by TCRGP using all CDRs from α- and β-chains (blue line). (E) Fractions of total weight given to kernels corresponding to different CDRs, when TCRGP has been trained to predict which TCRs are specific to the epitopes in the Dash data using all CDRs from both TCR chains.
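
The relationship between threshold values, TPRs, and FPRs mentioned in (A) can be made concrete with a small sketch. The scores and labels below are made-up illustrative values, not data from the paper, and the helper names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for each distinct score, highest first.

    A sample is called positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Trapezoidal area under the ROC curve, starting from (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Made-up predicted probabilities and true labels (1 = epitope-specific).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auroc(points)
```

Each tuple in `points` ties one threshold to the FPR and TPR it produces, which is the lookup the right-hand panel supports. For these toy values the area is 8/9, matching the fraction of positive-negative pairs that are ranked correctly.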

Table 1.

Mean AUROC scores for the Dash data using leave-one-subject-out cross-validation.

TCRGP models and TCRdist models were trained using either only CDR3s or all CDRs from TCRαβ, TCRβ, or TCRα.
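
Leave-one-subject-out cross-validation holds out all TCRs from one subject at a time and trains on the rest. A minimal sketch of the splitting step follows; the subject identifiers are hypothetical, and model training and scoring are left out.

```python
def leave_one_subject_out(subjects):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    subjects: list giving the subject of origin for each TCR.
    """
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


# Hypothetical subject labels for six TCRs.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
```

Each fold keeps every TCR of the held-out subject out of training, so the resulting AUROC reflects performance on a subject the model has not seen.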

Fig 3.

Epitope specificity prediction with the VDJdb data.

(A) The left panel shows the cross-validated ROC curves for each subject in the VDJdb data for the HCV NS3_1436-1444 epitope, when TCRGP has been trained using TCRα and TCRβ with all CDRs. The mean AUROC score is 0.944. The right panel shows the cross-validated ROC curve for all subjects together with the threshold values for classification. From this figure we can determine which threshold values correspond to different true positive rates (TPRs) and false positive rates (FPRs). (B) Each violin plot presents the distribution estimate of the mean AUROC scores obtained with one method for all epitopes in our VDJdb data. The name of the method is given below each violin plot, followed in brackets by the CDRβs used (3 for CDR3; all for CDR1, CDR2, CDR2.5, and CDR3). Each point within a violin plot represents the mean AUROC score obtained for one epitope. RF refers to the Random Forest TCR-classifier of De Neuter et al. [19]. RF using only CDR3β has not been included in this figure as it could not provide predictions for all of the 22 epitopes. (C) Comparison of AUROC scores obtained with the different methods for each epitope separately. The epitopes have been arranged in increasing order of AUROC scores obtained by TCRGP using all CDRβs (orange line). (D) For each epitope from the VDJdb dataset, TCRGP models were trained using different numbers of unique epitope-specific TCRβs, always complemented with the same number of control TCRβs. For each point of the learning curve the model was trained with 100 random samples of the TCRβs, using either CDR1, CDR2, CDR2.5, and CDR3 (blue curves), or only CDR3 (orange curves). The darker lines show the mean of the predictions and the shaded areas show ± the standard deviation for the 100 folds. The points indicate the tested sample sizes. Here learning curves for four peptides are shown. (E) Leave-one-out cross-validated AUROC scores correlate with the diversity and number of samples (Pearson correlation -0.66). The sizes of the circles indicate the number of unique TCRs used for training.

Table 2.

Mean AUROC scores for the VDJdb data using leave-one-subject-out cross-validation.

TCRGP models and TCRdist models were trained using TCRβ with either only CDR3β or all CDRβs. RF models and DeepTCR models were trained using only CDR3β or CDR3β with the Vβ-gene, from which the other CDRβs can be derived.

Table 3.

Datasets.

Dash data: The data set constructed by Dash et al. [10] contains epitope-specific TCRs for Epstein-Barr virus (EBV), human Cytomegalovirus (CMV), Influenza A virus (IAV) and mouse Cytomegalovirus (mCMV). VDJdb data: The data set gathered from VDJdb contains epitope-specific TCRs for Cytomegalovirus (CMV), Epstein-Barr virus (EBV), Influenza A virus (IAV), Hepatitis C virus (HCV), Herpes Simplex virus type 2 (HSV-2), Yellow Fever virus (YFV), Dengue virus type 1 (DENV1), Dengue virus type 3 (DENV3-4), and Human immunodeficiency virus type 1 (HIV-1). For the MHC chains we show here only the allele group. For some epitopes there are TCRs for which more detailed information exists, with some variation; these are shown in S3 File.

Fig 4.

Mean AUROC scores obtained from TCRGP models with different kinds of data.

Blue violin plots show results of models trained with only CDR3β and orange violin plots show results of models trained with all CDRβs. (A) Mean AUROC scores from TCRGP models for the 22 epitopes in VDJdb data, when TCRs specific to one epitope are considered as positive and TCRs specific to the other 21 epitopes are considered as control data. 200-fold stratified cross-validation was used for the evaluation. (B) Same as (A), but only the 885 unique TCRs specific to the seven HLA-A*02 restricted epitopes are used.
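
The setup in (A), where one epitope's TCRs are positives and all other epitopes' TCRs serve as controls, can be sketched together with a simple stratified fold assignment. The epitope names and the round-robin assignment are illustrative assumptions; a real evaluation would typically shuffle the samples within each class first.

```python
def one_vs_rest_labels(epitopes, target):
    """Label TCRs of the target epitope 1 and all other TCRs 0."""
    return [1 if e == target else 0 for e in epitopes]


def stratified_folds(labels, n_folds):
    """Assign each sample to a fold, keeping class proportions similar.

    Samples of each class are dealt to the folds in round-robin order.
    """
    fold_of = [None] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for rank, i in enumerate(members):
            fold_of[i] = rank % n_folds
    return fold_of


# Hypothetical epitope annotation for six TCRs.
epitopes = ["E1", "E2", "E1", "E3", "E2", "E1"]
labels = one_vs_rest_labels(epitopes, target="E1")
fold_of = stratified_folds(labels, n_folds=3)
```

With these toy inputs every fold receives one positive and one control TCR, which is the property stratification is meant to preserve.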

Fig 5.

Analysis of HBV-specific T cells in HCC patients.

(A) Schematics for the analysis of single-cell RNA and TCRαβ sequencing data using TCRGP and multimer-sorted data. (B) Numbers of cells predicted by TCRGP to recognize different epitopes with a probability of at least 85%. HBV-reactivity was assessed by four different TCRGP classifiers trained against four different HBV-epitopes (HBV_core169, HBV_core195, HBV_pol282, HBV_pol387). Other predictions were made using the models trained with the VDJdb data. (C) Dimensionality-reduced representation (UMAP) of the 1189 CD8+ T cells from HBV+ HCC-patients from peripheral blood, normal adjacent tissue and tumour tissue. Encircled dots represent the T cells predicted to be HBV-reactive by TCRGP. (D) The frequencies of T cells predicted to recognize different HBV-epitopes in each cluster. (E) Z-score normalized mean expressions of known canonical markers to assess CD8+ cell phenotypes (naïve, cytotoxic, costimulatory inhibitory, and effector memory markers) in the three different exhausted cell clusters. Exhausted 3 was predicted to be enriched for HBV-targeting T cells (p = 3e-06, p.adj = 0.001).
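
The counting in (B) amounts to keeping predictions whose probability is at least 0.85 and tallying them per epitope. A minimal sketch follows; the cell identifiers and probabilities are made up, and the tuple layout is an assumption for illustration.

```python
from collections import Counter


def count_confident_calls(predictions, threshold=0.85):
    """Count cells per epitope whose predicted probability >= threshold.

    predictions: list of (cell_id, epitope, probability) tuples, one per
    cell and epitope-specific classifier.
    """
    return Counter(ep for _, ep, p in predictions if p >= threshold)


# Made-up predictions from two of the HBV classifiers named in the caption.
predictions = [
    ("cell1", "HBV_core169", 0.91),
    ("cell1", "HBV_pol282", 0.40),
    ("cell2", "HBV_core169", 0.85),
    ("cell3", "HBV_pol282", 0.97),
    ("cell4", "HBV_pol282", 0.84),
]
counts = count_confident_calls(predictions)
```

A cell that passes the threshold for more than one classifier is counted once for each epitope in this sketch.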

Table 4.

HBV-epitopes for which TCRGP classifiers were trained.

The numbers of epitope-specific TCRs and subjects, and mean AUROC scores from leave-one-subject-out cross-validations are shown.
