Table 1.
Sample characteristics.
Fig 1.
Broad BCR repertoire metrics in lymphoma and control cohorts.
The four panels show the essential repertoire metrics (A) clonality, (B) richness, (C) diversity, and (D) somatic hypermutation rate, with the corresponding quantiles (Q0.25, median, Q0.75). Cohorts were compared pairwise using a two-sided Mann-Whitney U test with α = 0.05 (*** p < 0.001).
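The pairwise comparison can be illustrated with a minimal, dependency-free sketch of the two-sided Mann-Whitney U test. It assumes the tie-corrected normal approximation without continuity correction; the exact variant used for the figure is not stated, so p-values may differ slightly from a given statistics package. The helper names (`average_ranks`, `mann_whitney_u`, `pairwise_tests`) and the cohort dictionary layout are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation).

    Returns (U, p), where U is the statistic for sample x.
    Assumption: no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    # Tie correction term: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def pairwise_tests(cohorts, alpha=0.05):
    """Test every pair of cohorts; cohorts maps cohort name -> list of metric values."""
    results = {}
    for a, b in combinations(sorted(cohorts), 2):
        u, p = mann_whitney_u(cohorts[a], cohorts[b])
        results[(a, b)] = (u, p, p < alpha)
    return results
```

For example, `mann_whitney_u([1, 2, 3], [4, 5, 6])` gives U = 0 and p ≈ 0.05, the smallest p-value this approximation can reach with three samples per group.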
Table 2.
Numbers of BCR repertoires used for training.
Fig 2.
F1 scores of logistic regression models in training and test sets.
(A) shows the F1 scores averaged over the validation folds during training in all three scenarios, and (B) shows those of the best-validated logistic regression model on the test set in all three scenarios. For comparison, the performance of the best random forest is displayed in green.
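Averaging F1 over validation folds can be sketched as follows. The caption does not state how per-class F1 scores are combined in the multiclass scenarios, so this sketch assumes a macro average (unweighted mean over the labels present in either the true or predicted labels) computed per fold and then averaged across folds. The function names and the `(y_true, y_pred)` fold format are hypothetical.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score for one label treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Assumption: unweighted mean of per-class F1 over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)


def mean_cv_f1(folds):
    """folds: list of (y_true, y_pred) pairs, one per validation fold."""
    scores = [macro_f1(t, p) for t, p in folds]
    return sum(scores) / len(scores)
```

With `y_true = ["HD", "CLL", "HD", "CLL"]` and `y_pred = ["HD", "HD", "HD", "CLL"]`, the per-class F1 scores are 0.8 (HD) and 2/3 (CLL), so the macro F1 is 11/15 ≈ 0.733.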
Table 3.
Validation of best models on the independent test set.
Fig 3.
Data separation using n = 1 to n = 100 top repertoire clonotypes.
Principal component analysis (PCA) was performed on the feature list, varying the number of top repertoire clonotypes included in the analysis: (A) n = 1 clonotype, (B) n = 4 clonotypes, (C) n = 20 clonotypes, (D) n = 100 clonotypes. Cohort means were compared using a multivariate analysis of variance (MANOVA).
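The two steps named here, restricting each repertoire to its top n clonotypes and projecting samples onto the first two principal components, can be sketched without external libraries. How the feature list is derived from the retained clonotypes is not described in the caption, so the feature matrix `rows` (one numeric row per sample) is assumed to be built upstream. PCA is computed here by power iteration with deflation, which is adequate for a small illustration but not a replacement for an SVD-based implementation. The MANOVA step is not shown. All function names are hypothetical.

```python
import math


def top_n_clonotypes(counts, n):
    """Keep the n most abundant clonotypes of one repertoire (ties broken by name)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


def center(rows):
    """Subtract the column mean from every feature."""
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (denominator n - 1)."""
    x = center(rows)
    n, m = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1) for j in range(m)]
            for i in range(m)]


def leading_eigvec(mat, iters=500):
    """Dominant eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    m = len(mat)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            return v, 0.0
        v = [c / norm for c in w]
    eig = sum(v[i] * sum(mat[i][j] * v[j] for j in range(m)) for i in range(m))
    return v, eig


def pca_2d(rows):
    """Return per-sample (PC1, PC2) scores and the two component vectors."""
    cov = covariance(rows)
    pc1, l1 = leading_eigvec(cov)
    # Deflate: remove the PC1 direction so power iteration converges to PC2.
    m = len(cov)
    deflated = [[cov[i][j] - l1 * pc1[i] * pc1[j] for j in range(m)] for i in range(m)]
    pc2, _ = leading_eigvec(deflated)
    scores = [(sum(a * b for a, b in zip(r, pc1)), sum(a * b for a, b in zip(r, pc2)))
              for r in center(rows)]
    return scores, pc1, pc2
```

Repeating `pca_2d` on feature matrices built from n = 1, 4, 20 and 100 clonotypes would produce one scatter plot per panel.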
Fig 4.
The 20 lymphoma subset predictors with the greatest coefficient magnitude.
(A) Predictor coefficients were averaged over all classes in the best-performing model for discriminating HD vs. NLPBL vs. DLBCL vs. CLL. (B) Contribution of each predictor to the discrimination between pairs of cohorts.
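Ranking predictors by coefficient magnitude can be sketched as below, assuming the model stores one coefficient row per class and that panel A ranks features by the mean absolute coefficient across classes. For panel B, the sketch uses the fact that in a multinomial logistic regression the log-odds of class a versus class b are linear in the features with coefficients β_a − β_b; whether the figure was computed this way is an assumption. The function names and data layout are hypothetical.

```python
def top_predictors(coef, feature_names, k=20):
    """Rank features by mean absolute coefficient across classes.

    coef: list with one coefficient row per class (assumed layout).
    """
    n_classes = len(coef)
    mean_abs = [sum(abs(row[j]) for row in coef) / n_classes
                for j in range(len(feature_names))]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda t: t[1], reverse=True)
    return ranked[:k]


def pairwise_contribution(coef, classes, a, b):
    """Per-feature coefficient difference beta_a - beta_b.

    In a multinomial logistic regression this is the weight of each feature
    in the log-odds of class a versus class b.
    """
    ia, ib = classes.index(a), classes.index(b)
    return [ca - cb for ca, cb in zip(coef[ia], coef[ib])]
```

For example, with coefficient rows `[0.5, -2.0, 0.1]` (HD) and `[-0.5, 1.0, 0.3]` (CLL), the feature with coefficients −2.0 and 1.0 ranks first (mean magnitude 1.5), and its HD-vs-CLL contribution is −3.0.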