Detection of disease-specific signatures in B cell repertoires of lymphomas using machine learning

doi:10.1371/journal.pcbi.1011570

Table 1.

Sample characteristics.

Fig 1.

Broad BCR repertoire metrics in lymphoma and control cohorts.

The four panels show the essential repertoire metrics (A) clonality, (B) richness, (C) diversity, and (D) somatic hypermutation rate, with the corresponding quantiles (Q_0.25, median, Q_0.75). Pairwise comparisons used a two-sided Mann-Whitney U test with α = 0.05 (*** p < 0.001).
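
As a rough illustration of how three of these metrics can be derived from clonotype counts, the sketch below computes richness, Shannon diversity, and clonality for a single repertoire. The definitions are assumptions rather than the paper's stated formulas: richness is taken as the number of distinct clonotypes, diversity as the Shannon index, and clonality as 1 minus Pielou's evenness. The function name `repertoire_metrics` and the input format (a mapping from clonotype identifier to read count) are hypothetical.

```python
import math


def repertoire_metrics(clonotype_counts):
    """Return (richness, shannon, clonality) for one repertoire.

    clonotype_counts: dict mapping a clonotype id to its read count
    (hypothetical input format, not taken from the paper).
    """
    counts = [c for c in clonotype_counts.values() if c > 0]
    if not counts:
        raise ValueError("repertoire contains no clonotypes")

    total = sum(counts)
    richness = len(counts)  # number of distinct clonotypes

    # Shannon index over clonotype frequencies (natural log).
    freqs = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in freqs)

    # Assumed definition: clonality = 1 - Pielou's evenness.
    # A repertoire with a single clonotype is treated as fully clonal.
    if richness == 1:
        clonality = 1.0
    else:
        clonality = 1.0 - shannon / math.log(richness)

    return richness, shannon, clonality
```

Under these assumed definitions, a perfectly even repertoire has clonality 0, and a repertoire dominated by one expanded clone approaches 1.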

Table 2.

Numbers of BCR repertoires used for training.

Fig 2.

F1 scores of logistic regression models in training and test sets.

(A) F1 scores averaged over validation folds during training in all three scenarios. (B) F1 scores of the best validated logistic regression model on the test set in all three scenarios. For comparison, the performance of the best random forest is displayed in green.
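
For readers unfamiliar with the metric, the following minimal sketch computes per-class F1 scores and their unweighted (macro) average from true and predicted labels. The caption does not say how F1 was averaged across classes, so macro averaging is an assumption here, and the function names `f1_per_class` and `macro_f1` are hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); set to 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Macro averaging weights every cohort equally regardless of its size, which matters when class sizes differ; a weighted average would be an equally plausible reading of the caption.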

Table 3.

Validation of best models on the independent test set.

Fig 3.

Data separation using n = 1 to n = 100 top repertoire clonotypes.

Principal Component Analysis (PCA) was performed on the feature list while varying the number of top repertoire clonotypes included in the analysis. (A) n = 1 clonotype, (B) n = 4 clonotypes, (C) n = 20 clonotypes, (D) n = 100 clonotypes. We compared sample means using a multivariate analysis of variance (MANOVA).

Fig 4.

The 20 lymphoma subset predictors with the greatest coefficient magnitude.

(A) Predictors were averaged over all classes in the best performing model for discrimination of HD vs. NLPBL vs. DLBCL vs. CLL. (B) Contribution of each predictor to the discrimination between pairs of cohorts.
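
One way to obtain a ranking like the one in panel (A) is sketched below: given the coefficient matrix of a multiclass logistic regression (one row per class, one column per predictor), average the absolute coefficients over classes and keep the k largest. Whether the authors averaged absolute or signed coefficients is not stated in the caption, so the mean of absolute values is an assumption, and `top_predictors` and its arguments are hypothetical names.

```python
def top_predictors(coefs, feature_names, k=20):
    """Rank predictors by mean absolute coefficient across classes.

    coefs: list of rows, one per class, each with one coefficient
           per predictor (hypothetical layout).
    feature_names: list of predictor names, aligned with the columns.
    Returns the k highest-ranked (name, mean_abs_coefficient) pairs.
    """
    n_classes = len(coefs)
    if n_classes == 0:
        raise ValueError("coefficient matrix is empty")
    if any(len(row) != len(feature_names) for row in coefs):
        raise ValueError("each row needs one coefficient per feature")

    # Assumed aggregation: mean of |coefficient| over all classes.
    mean_abs = [
        sum(abs(row[j]) for row in coefs) / n_classes
        for j in range(len(feature_names))
    ]
    ranked = sorted(
        zip(feature_names, mean_abs),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]
```

Coefficient magnitudes are only comparable across predictors when the features share a common scale, so a ranking of this kind presumes standardized inputs.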
