Fig 1.
Hypothetical phylogenetic trees used to illustrate tree/trait association statistics.
Trait values at internal nodes of the tree are inferred using maximum parsimony given the trait values at the tips, which are shown using different colors and shapes. Two example state changes are highlighted. (left) No association between tip-trait values and tree: Distribution of traits across this tree is indistinguishable from randomly distributed traits by any statistic used. (middle) Tip-trait values clustered in tree: Cells with the same trait value are more closely related to each other in the tree, which will yield significantly fewer switches than in the same tree with permuted tips, and therefore a significantly low PS statistic. (right) Asymmetric ancestor/descendant relationships among trait values: All switches in this tree are from A to B (SP = 1), significantly higher than expected from the same tree with permuted tips. This indicates an asymmetric relationship between these states along the tree.
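The comparison described for the middle panel, counting parsimony state changes on the observed tree and on the same tree with permuted tips, can be sketched as follows. This is a minimal illustration and not necessarily the exact procedure used for the PS test. It assumes a binary tree written as nested tuples of tip names, uses Fitch's algorithm for the parsimony count, and takes the p value to be the fraction of permutations with no more changes than observed. The names `fitch_score`, `tips`, and `ps_permutation_test` are hypothetical.

```python
import random

def fitch_score(tree, traits):
    """Return (possible_states, n_changes) for a nested-tuple binary tree."""
    if isinstance(tree, str):                  # tip: use its observed trait value
        return {traits[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], traits)
    right_set, right_changes = fitch_score(tree[1], traits)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:                                 # children can share a state: no change
        return common, changes
    return left_set | right_set, changes + 1   # children disagree: one state change

def tips(tree):
    """List tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def ps_permutation_test(tree, traits, n_perm=1000, seed=0):
    """Assumed simplification: p = fraction of permutations whose parsimony
    score is at most the observed score (few changes suggests clustering)."""
    rng = random.Random(seed)
    names = tips(tree)
    observed = fitch_score(tree, traits)[1]
    values = [traits[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)                    # permute trait values among tips
        permuted = dict(zip(names, values))
        if fitch_score(tree, permuted)[1] <= observed:
            hits += 1
    return observed, hits / n_perm

# Clustered example: all A tips in one clade, all B tips in the other.
clustered = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
traits = {name: name[0].upper() for name in tips(clustered)}
observed, p_value = ps_permutation_test(clustered, traits)
print(observed, p_value)                       # observed is 1 state change
```

A tree in which A and B tips alternate, as in the left panel, would instead give a parsimony score close to the permuted scores and therefore a large p value.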
Fig 2.
Distribution of SP test p values from A to B in two-state simulation analyses, in which state changes between A and B were determined by the probability of starting in A (πa), the relative rate of migrating from A to B (rab), and the average rate of state change (r).
To the left of each plot, possible starting states are circled and relative rates are shown by arrowhead size. (a) πa = 0.5, rab = 1 (fully unbiased state change) shows a roughly uniform distribution of p values at all tested rates. (b) πa = 0.5, rab = 10 shows low p values at rates < 50. (c) πa = 1, rab = 1 shows low p values at low rates (10) but not at higher rates. (d) πa = 1, rab = 10 shows low p values at rates < 100. (e) πa = 0.5, rab = 1 shows low p values at rates < 50 if 50% of A sequences are discarded. Compared to (a), this shows that p values are sensitive to biased sampling of sequences. Red lines show the cutoff of p value = 0.05.
Fig 3.
Distribution of SP test p values from four-state simulation analyses under multiple modes of evolution, diagrammed to the left of each plot.
Twenty repetitions were performed in each scenario. In simulations, possible starting states are circled and possible state changes are shown with arrows. All allowed state changes occurred at the same relative rate, and the total rate of state change (r) was 10 changes/mutation/site (see Fig 2). (a) Permuting trait values among trees reveals low p values for all state changes between A and B, and between C and D. (b) Permuting within each tree reveals low p values from C to D, but not between A and B. Neither a nor b imposed constraints on the types of state changes allowed in the maximum parsimony algorithm. (c) Direct switching simulations result in low p values from A to D, but not from other states to D. (d) Sequential switching simulations result in low p values from B to D, but not from other states to D. (e) Irreversible switching simulations result in low p values from B and C to D, but not from A. (f) Unconstrained switching simulations also result in low p values from B and C to D, but not from A. The unexpected results of e and f are likely artefacts of the constrained parsimony algorithm used to count state changes in simulations c-f, which forbids reverse alphabetical state changes (e.g. D to C).
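The constraint mentioned above, a parsimony reconstruction that forbids reverse alphabetical state changes such as D to C, can be illustrated with a Sankoff-style dynamic program in which forbidden changes have infinite cost. This is a sketch under stated assumptions, not the implementation used for these simulations. Trees are nested tuples of tip names, every allowed change costs 1, and ties between equally parsimonious reconstructions are broken by alphabetical order rather than handled explicitly. The names `step_cost`, `sankoff`, `count_switches`, `constrained_switches`, and `sp_statistic` are hypothetical. The SP value is computed as the proportion of all switches that go from one state to another, consistent with SP = 1 in Fig 1 when all switches are from A to B.

```python
import math
from collections import Counter

STATES = ["A", "B", "C", "D"]

def step_cost(parent, child):
    """Cost of a change along one branch; reverse alphabetical is forbidden."""
    if parent == child:
        return 0
    return 1 if parent < child else math.inf

def sankoff(tree, traits):
    """Return (costs, children): costs[s] is the minimum number of changes
    in the subtree if this node is in state s."""
    if isinstance(tree, str):                  # tip: only its observed state is free
        return {s: (0 if s == traits[tree] else math.inf) for s in STATES}, []
    kids = [sankoff(child, traits) for child in tree]
    costs = {
        s: sum(min(step_cost(s, t) + kid_costs[t] for t in STATES)
               for kid_costs, _ in kids)
        for s in STATES
    }
    return costs, kids

def count_switches(node, state, counts):
    """Trace back one optimal reconstruction and tally (from, to) switches."""
    _, kids = node
    for kid in kids:
        kid_costs = kid[0]
        best = min(STATES, key=lambda t: step_cost(state, t) + kid_costs[t])
        if best != state:
            counts[(state, best)] += 1
        count_switches(kid, best, counts)

def constrained_switches(tree, traits):
    root = sankoff(tree, traits)
    root_state = min(STATES, key=lambda s: root[0][s])
    counts = Counter()
    count_switches(root, root_state, counts)
    return root[0][root_state], counts

def sp_statistic(counts, source, target):
    """Proportion of all switches that go from source to target."""
    total = sum(counts.values())
    return counts[(source, target)] / total if total else 0.0

# One C tip nested among D tips: unconstrained parsimony would need a single
# D to C change, but the constraint forces two C to D changes instead.
tree = ((("d1", "d2"), "c1"), "d3")
traits = {"d1": "D", "d2": "D", "d3": "D", "c1": "C"}
cost, counts = constrained_switches(tree, traits)
print(cost, dict(counts), sp_statistic(counts, "C", "D"))
```

In this small example the constraint changes both the number and the direction of inferred switches, which is the kind of effect that could produce artefacts when the simulated process allows changes that the counting algorithm forbids.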
Fig 4.
Controlling false positive rates using down-sampling.
The y axis shows the proportion of 50 simulation replicates in which the SP test from A to B was significant. Simulations were performed on large ladder phylogenies (Methods) at varying rates, biases, and trees per repetition (representing lineages within a simulated repertoire). In each case, simulated lineages were down-sampled to the specified maximum tip-to-state change ratio. (a) Unbiased simulations, in which significant SP test values indicate false positives. These show that the SP test has a high rate of false positives when the rate of state change is low (i.e. 1 change/mutation/site) and the maximum tip-to-state change ratio is high (> 20). Down-sampling lineages to a maximum tip-to-state change ratio of 10–20, however, controls this false positive rate. (b) Simulations with rates biased from A to B, in which significant SP test values indicate true positives. These results show that down-sampling does not simply reduce power. However, power is lowered when the switching rate is high (> 100) or the number of trees/repertoire is low (< 20).
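One way to picture down-sampling to a maximum tip-to-state change ratio is shown below. The caption does not specify the procedure, so this sketch assumes that tips are removed uniformly at random until the number of tips is at most the maximum ratio times the number of state changes counted on the full tree. The function name `downsample_tips` and its parameters are hypothetical.

```python
import random

def downsample_tips(tip_names, n_state_changes, max_ratio=20, seed=0):
    """Randomly keep at most max_ratio * n_state_changes tips.

    Assumption: the state-change count comes from the full tree and is not
    recomputed after tips are removed.
    """
    if n_state_changes <= 0:
        raise ValueError("need at least one state change to define the ratio")
    limit = max_ratio * n_state_changes
    names = list(tip_names)
    if len(names) <= limit:                    # already within the allowed ratio
        return names
    rng = random.Random(seed)
    return rng.sample(names, limit)            # uniform sample without replacement

# A lineage with 100 tips and 2 state changes has a ratio of 50;
# with max_ratio = 20 it is reduced to 40 tips.
lineage = [f"tip{i}" for i in range(100)]
kept = downsample_tips(lineage, n_state_changes=2, max_ratio=20)
print(len(kept))                               # 40
```

Because removing tips can itself change the number of inferred state changes, an actual implementation may need to recount changes on the pruned tree; that step is omitted here.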
Fig 5.
Analysis of B cell subtypes in three HIV+ subjects.
(a) Example tree visualized using ggtree [38,39] showing observed relationship between CD19hi MBCs and GCBCs. (b-d) Direction of significant SP test δ values for subjects 1 (b), 2 (c), and 3 (d). Arrows within each diagram show the direction of significantly high (blue) or significantly low (red) SP statistics between CD19hi MBCs, CD19lo MBCs, unswitched MBCs (Unsw), and GCBCs in each subject.
Fig 6.
Analysis of antibody isotypes from a single subject.
(a) Example trees visualized using ggtree [38,39] showing observed relationships between cells expressing BCRs with IgA1 and IgE, as well as IgG1 and IgG4 isotypes. IgE and IgG4 are indicated on each tree using larger tip circles. (b) Distribution of SP test δ values to IgE from each of the other isotypes (different colors), and distribution of SP test δ values to IgG4 from each of the other isotypes (different colors). State changes in b were calculated using constrained parsimony, which forbids state changes that violate the geometry of the Ig heavy chain locus, and SP tests were performed using permutation among trees.