Fig 1.

Fraction of true alleles missing (left) and alleles spuriously inferred (right) by partis on simplified “sparse” repertoires as a function of the number of sequences in the sample.

Each point shows the mean performance (± standard error) over 50 independent simulation samples of the indicated sample size. Each row varies one variable. Top: SHM levels (the SHM distributions corresponding to “low”, “typical”, and “high” are shown in S1 Fig). Middle: new-allele prevalence (as a fraction of the existing allele’s prevalence). Bottom: number of SNPs (Nsnp) separating new and existing alleles.
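
As a minimal sketch of how a "mean ± standard error" point can be computed from per-sample results, the following assumes the standard error is the sample standard deviation divided by the square root of the number of samples. The caption does not state which estimator was used, and the input values are hypothetical, not data from the paper.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-sample values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation (n - 1 denominator) over sqrt(n).
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, std_err


# Hypothetical per-sample fractions of missing alleles (illustrative only).
fractions_missing = [0.0, 0.5, 0.0, 1.0, 0.5]
mean, std_err = mean_and_standard_error(fractions_missing)
print(f"{mean:.3f} +/- {std_err:.3f}")
```

With 50 samples per point, as in the figure, the same function would be called once per sample size and per setting of the varied variable.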

Fig 2.

Fraction of true alleles missing (left) and alleles spuriously inferred (right) by partis on simplified “sparse” repertoires as a function of the number of sequences in the sample.

Each point shows the mean performance (± standard error) over 50 independent simulation samples of the indicated sample size. Each row varies one variable. Top: Nsnp with multiple new alleles; for example, “1 + 3” indicates two new alleles separated by 1 and 3 SNPs, respectively, from the same existing allele. Middle: mean number of leaves per clonal family. Bottom: tree balance.

Table 1.

Missing and spurious alleles on full-repertoire simulation for the three germline inference methods plus “full IMGT” annotation.

Fig 3.

Full-repertoire germline set accuracy for the currently widespread method of aligning every sequence to its closest match in the full IMGT V gene set.

The phylogenetic tree is constructed with a leaf for each germline gene in either the true or inferred germline sets (see Methods). Branch lengths connecting different V gene families are set to zero. Leaves are colored according to the similarity of the true and inferred germline sets, with shared genes in green and unshared in red, the latter broken into missing (light red) and spurious (dark red). Novel alleles (not in the IMGT database, whether from the true simulated set or spuriously inferred) are highlighted in gold. Results are shown for the first three replicates (0-2) of both the low-SHM (left) and high-SHM (right) full-repertoire simulation samples (see text).
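
The shared, missing, and spurious categories used for leaf coloring amount to set operations on the true and inferred germline sets. The sketch below assumes genes are compared by name; the paper's Methods may compare by sequence instead, and the function name and gene labels are hypothetical.

```python
def classify_germline_genes(true_genes, inferred_genes):
    """Split genes into shared, missing, and spurious categories."""
    true_set, inferred_set = set(true_genes), set(inferred_genes)
    return {
        "shared": true_set & inferred_set,    # in both sets (green)
        "missing": true_set - inferred_set,   # true but not inferred (light red)
        "spurious": inferred_set - true_set,  # inferred but not true (dark red)
    }


# Hypothetical gene labels, for illustration only.
true_genes = ["geneA*01", "geneB*01", "geneC*02"]
inferred_genes = ["geneA*01", "geneC*02", "geneD*01"]
categories = classify_germline_genes(true_genes, inferred_genes)
print(categories)
```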

Fig 4.

Full-repertoire germline set accuracy for IgDiscover (explanation in Fig 3).

Shown on the first three replicates (0-2) of the low-SHM full-repertoire simulation samples. The high-SHM samples are not shown, since IgDiscover is designed only for low-SHM IgM samples (see text).

Fig 5.

Full-repertoire germline set accuracy for TIgGER (explanation in Fig 3).

Fig 6.

Full-repertoire germline set accuracy for partis (explanation in Fig 3).

Fig 7.

Full-repertoire V naive accuracy (Hamming distance between true and inferred V naive sequences) for the three germline inference methods plus “full IMGT” annotation.

Results are the sum (figures, top) or mean (table, ± standard error) of ten independent 50,000-sequence samples for both low-SHM (left) and high-SHM (right). IgDiscover is shown for only the low-SHM samples, since it is designed only for IgM.
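
The accuracy metric named in the caption, the Hamming distance between true and inferred V naive sequences, can be sketched as below. This assumes the two sequences are already aligned to the same length; the function name and example sequences are hypothetical.

```python
def hamming_distance(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical true and inferred naive V sequences (illustrative only).
true_naive = "ACGTACGT"
inferred_naive = "ACGAACGT"
distance = hamming_distance(true_naive, inferred_naive)
print(distance)
```

A distance of zero means the inferred naive sequence matches the true one at every position.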

Fig 8.

Comparison of all three inference methods on the healthy donor samples from [34] (other subjects shown in S2 and S3 Figs).

The phylogenetic tree is constructed with a leaf for each germline gene that was inferred by any of the methods. Branch lengths connecting different V gene families are set to zero. Leaves are colored according to how many methods inferred the corresponding gene: one (green, red, blue), two (grey), or all three (white). The same trees, but with leaves labeled with gene names, are shown in S4 Fig.
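
The leaf coloring depends only on how many methods inferred each gene. A minimal sketch of that count follows, assuming each method's output is available as a collection of gene names; the method keys and gene labels are hypothetical.

```python
from collections import Counter


def methods_per_gene(inferred_by_method):
    """Map each gene to the number of methods that inferred it."""
    counts = Counter()
    for genes in inferred_by_method.values():
        # Use a set so a gene counts at most once per method.
        counts.update(set(genes))
    return dict(counts)


# Hypothetical inferred gene sets (illustrative only).
inferred_by_method = {
    "method_1": ["geneA*01", "geneB*01"],
    "method_2": ["geneA*01", "geneC*01"],
    "method_3": ["geneA*01", "geneB*01"],
}
gene_counts = methods_per_gene(inferred_by_method)
print(gene_counts)
```

A count of one, two, or three then maps to the method-specific color, grey, or white, respectively.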

Fig 9.

Comparison of germline sets inferred by partis and TIgGER for subjects FV, GMC, and IB from [35], with all ten time points merged for each subject.

The phylogenetic tree is constructed with a leaf for each germline gene that was inferred by either of the two methods. Branch lengths connecting different V gene families are set to zero. Leaves are colored according to how many methods inferred the corresponding gene: either one (red, blue) or both (white). Since this data is not IgM specific, IgDiscover is not shown. The merged data include the three time points in Fig 11, plus seven more, for each subject. The same trees, but with leaves labeled with gene names, are shown in S5 Fig.

Fig 10.

Comparison of the three methods on IgM data from subjects lp08248 (left) and lp23810 (right) from [36].

The phylogenetic tree is constructed with a leaf for each germline gene that was inferred by any of the methods. Branch lengths connecting different V gene families are set to zero. Leaves are colored according to how many methods inferred the corresponding gene: one (green, red, blue), two (grey), or all three (white). See Fig 12 for other results for these subjects. The same trees, but with leaves labeled with gene names, are shown in S6 Fig.

Fig 11.

Comparison of inferred germline sets for samples taken at different time points for subjects FV, GMC, and IB from [35].

Shown for three (of ten total) time points surrounding influenza vaccination: two days before, three days after, and seven days after; for partis (top) and TIgGER (bottom). The phylogenetic tree is constructed with a leaf for each germline gene that was inferred at any of the three time points. Branch lengths connecting different V gene families are set to zero. Leaves are colored according to the number of time points at which the corresponding gene was inferred: one (dark grey), two (light grey), or all three (white). Since this data is not IgM specific, IgDiscover is not shown. See Fig 9 for other results for these subjects. The same trees, but with leaves labeled with gene names, are shown in S7 Fig.

Fig 12.

Comparison of inferred germline sets for IgM vs IgG data from subjects lp08248 and lp23810 from [36] for partis (left) and TIgGER (right).

The phylogenetic tree is constructed with a leaf for each germline gene that was inferred for either of the two isotypes. Branch lengths connecting different V gene families are set to zero. Leaves are colored according to the number of isotype-specific samples for which the corresponding gene was inferred: either one (grey) or both (white). See Fig 10 for other results for these subjects. The same trees, but with leaves labeled with gene names, are shown in S8 Fig.

Fig 13.

Mutation accumulation plots showing the relationship between the mutation probability at position 55 across all sequences aligning closest to IGHV4-39*06 (y-axis), and the number of mutations in the entire observed V sequence (x-axis) for three simple, hypothetical BCR repertoires.

In the top row are two repertoires that consist of a single allele: where this allele is known (left), and where it is unknown, but separated by seven SNPs from a known allele (right). In a more typical case, given the relative completeness of the standard germline sets, we would observe a mixture of sequences from the known and unknown alleles (bottom). This is equivalent to the (shifted) superposition of the two plots in the top row.
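
The points of a mutation accumulation plot can be sketched as follows: for each sequence, count its differences from the assumed germline allele (x-axis), then record the fraction of sequences at each count that differ at the position of interest (y-axis). This assumes sequences are already aligned to the germline at equal length; the function name and toy sequences are hypothetical.

```python
from collections import defaultdict


def mutation_accumulation(germline, sequences, position):
    """Return {n_mutations: fraction of sequences mutated at `position`}."""
    totals = defaultdict(int)
    mutated = defaultdict(int)
    for seq in sequences:
        if len(seq) != len(germline):
            raise ValueError("sequences must be aligned to the germline")
        n_mutations = sum(g != s for g, s in zip(germline, seq))
        totals[n_mutations] += 1
        if seq[position] != germline[position]:
            mutated[n_mutations] += 1
    return {n: mutated[n] / totals[n] for n in sorted(totals)}


# Toy example (illustrative only): position index 2 of a six-base germline.
germline = "AAAAAA"
sequences = ["AAAAAA", "AACAAA", "AAAAAC", "ACCAAA"]
points = mutation_accumulation(germline, sequences, position=2)
print(points)
```

Sequences from an unknown allele that differs from the assumed germline at this position would always count as mutated there, which is the kind of signal the plots in this figure illustrate.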

Fig 14.

Example one-piece (green) and two-piece (red) fits for positions without (top row) and with (bottom row) evidence for new alleles.

The left and right plots in the top row show the difference between positions with low and high mutability (cold and hot spots). The bottom row shows a position with evidence for a new allele with Nsnp equal to two (left) and a similar plot for Nsnp equal to five (right). Note that both one-piece and two-piece models fit well in the top row, whereas in the bottom row only the two-piece model provides an adequate fit.
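
To illustrate why only the two-piece model fits a position with a new allele, the sketch below compares residual sums of squares for a single line against two independent lines split at a fixed breakpoint. It assumes unweighted least squares and a known breakpoint; this is not the fitting procedure used by partis, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Unweighted least-squares line; returns (slope, intercept, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def compare_fits(xs, ys, breakpoint):
    """Return (one-piece rss, two-piece rss) for a split at `breakpoint`."""
    one_piece_rss = fit_line(xs, ys)[2]
    left = [(x, y) for x, y in zip(xs, ys) if x < breakpoint]
    right = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    if not left or not right:
        raise ValueError("breakpoint must leave points on both sides")
    two_piece_rss = sum(fit_line(*zip(*piece))[2] for piece in (left, right))
    return one_piece_rss, two_piece_rss


# Idealized step at two mutations, as for a new allele with Nsnp = 2.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
one_piece_rss, two_piece_rss = compare_fits(xs, ys, breakpoint=2)
print(one_piece_rss, two_piece_rss)
```

For a position without a new allele the points lie close to a single line, so the two residuals are similar; for the step-like data above, only the two-piece fit leaves a negligible residual.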
