Inference of B cell clonal families using heavy/light chain pairing information

doi:10.1371/journal.pcbi.1010723


Fig 3

Clustering performance on simulation as a function of SHM (mean fraction of nucleotides mutated) for heavy chain (left) and light chain (right).

Each point is the mean F1 score (± standard error, often smaller than the points) over three samples, each consisting of 10,000 simulated rearrangement events with family sizes drawn from a geometric distribution with mean three. In addition to the inference methods (shown in color), we include two synthetic partition methods (grey), which generate incorrect partitions starting from the true partition. These are included purely as an intuitive comparison: the first splits 20% of sequences, chosen at random, into singleton clusters; the second merges together families whose true naive sequences are closer than 3% in Hamming distance (“synth.”, see text for details). The F1 score’s components, precision and sensitivity, are plotted in S3 Fig. Performance with and without pairing information is compared for partis in S4 and S7 Figs and for SCOPer in S5 Fig. Performance of partis as a function of the number of families (with SHM held constant) is shown in S6 and S7 Figs. Note that enclone by design discards some sequences with higher SHM levels, so we display its performance only for samples in which it retains at least 90% of sequences (it discards ≃9% of sequences at 10% SHM, ≃60% at 20%, and ≃94% at 30%).
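To make the two synthetic baselines and the F1 metric more concrete, here is a minimal Python sketch. It assumes a per-sequence definition of precision and sensitivity: for each sequence, precision is the fraction of its inferred cluster that belongs to its true family, sensitivity is the fraction of its true family found in its inferred cluster, and each is averaged over all sequences. It also assumes that the 3% merge is applied transitively (single linkage) across families. Neither detail is stated in this caption (the main text gives the actual definitions), and the function names `f1_score`, `synth_singletons` and `synth_merge` are hypothetical, not part of partis, SCOPer or enclone.

```python
import random
from collections import defaultdict


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_lookup(partition):
    """Map each sequence id to the frozenset of ids in its cluster."""
    lookup = {}
    for cluster in partition:
        members = frozenset(cluster)
        for seq_id in members:
            lookup[seq_id] = members
    return lookup


def f1_score(true_partition, inferred_partition):
    """F1 from per-sequence precision and sensitivity (ASSUMED definition)."""
    true_of = cluster_lookup(true_partition)
    inf_of = cluster_lookup(inferred_partition)
    ids = list(true_of)
    # Precision: share of each sequence's inferred cluster that is in its true family.
    precision = sum(len(true_of[i] & inf_of[i]) / len(inf_of[i]) for i in ids) / len(ids)
    # Sensitivity: share of each sequence's true family recovered in its inferred cluster.
    sensitivity = sum(len(true_of[i] & inf_of[i]) / len(true_of[i]) for i in ids) / len(ids)
    return 2 * precision * sensitivity / (precision + sensitivity)


def synth_singletons(true_partition, fraction=0.2, seed=0):
    """Baseline 1 (hypothetical helper): split a random `fraction` of sequences into singletons."""
    rng = random.Random(seed)
    ids = [i for cluster in true_partition for i in cluster]
    to_split = set(rng.sample(ids, round(fraction * len(ids))))
    new_partition = []
    for cluster in true_partition:
        kept = [i for i in cluster if i not in to_split]
        if kept:
            new_partition.append(kept)
        new_partition.extend([i] for i in cluster if i in to_split)
    return new_partition


def synth_merge(true_partition, naive_seqs, threshold=0.03):
    """Baseline 2 (hypothetical helper): merge families whose naive sequences
    (naive_seqs[k] belongs to family k) are closer than `threshold`.
    Merging is ASSUMED transitive, implemented with a union-find."""
    parent = list(range(len(true_partition)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(true_partition)):
        for b in range(a + 1, len(true_partition)):
            if hamming_fraction(naive_seqs[a], naive_seqs[b]) < threshold:
                parent[find(a)] = find(b)
    merged = defaultdict(list)
    for idx, cluster in enumerate(true_partition):
        merged[find(idx)].extend(cluster)
    return list(merged.values())


if __name__ == "__main__":
    true = [["a", "b", "c"], ["d", "e"], ["f"]]
    naive = ["A" * 100, "A" * 99 + "T", "C" * 100]  # first two naive seqs differ at 1%
    merged = synth_merge(true, naive)
    print(merged, f1_score(true, merged))
    print(synth_singletons(true), f1_score(true, synth_singletons(true)))
```

In the toy example, the first two families have naive sequences 1% apart, so `synth_merge` joins them. Sensitivity stays at 1 while precision drops to 0.6, giving an F1 of 0.75. Conversely, under these assumed definitions, the singleton baseline leaves precision at 1 and lowers sensitivity, which is why the two grey curves bracket different failure modes.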

doi: https://doi.org/10.1371/journal.pcbi.1010723.g003