Fig 1.
Outline of the RecPD methodology: Calculating recombination-adjusted phylogenetic diversities of feature distributions by employing ancestral state reconstructions.
(A) Tip states are assigned to a given species phylogenetic tree based on presence (teal) or absence (red) of the gene family of interest. (B) Ancestral node states are inferred using one of three ancestral state reconstruction methods and assigned to presence (teal), absence (red), or split (blue) states, the last of which indicates potential gain/loss events. (C) Branches are assigned to the presence state (teal) if they join consecutive presence or split nodes/tips; otherwise they are assigned to absence (red). (D) Gene-family lineages are identified, and split-state nodes are assigned to gains (teal) or losses (red). Branches descended from the phylogenetic tree root node where no ancestral presence nodes were identified are assigned to absence (grey). RecPD is then calculated as the sum of gene-family lineage branch lengths normalized by the total branch length of the phylogenetic tree.
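The branch assignment in (C) and the final normalization can be illustrated with a minimal Python sketch. This is not the RecPD implementation: the toy tree, node names, and the function `recpd_sketch` are hypothetical, node states are supplied by hand rather than inferred, and the lineage and gain/loss resolution of step (D) is omitted.

```python
# Hypothetical toy tree: (parent, child, branch_length). Names are illustrative only.
BRANCHES = [
    ("root", "A", 1.0),
    ("root", "t3", 2.0),
    ("A", "t1", 0.5),
    ("A", "t2", 0.5),
]

# Node states as they might look after steps (A) and (B); assigned by hand here.
NODE_STATE = {
    "root": "absence",
    "A": "presence",
    "t1": "presence",
    "t2": "presence",
    "t3": "absence",
}

# States that allow a branch to belong to a gene-family lineage, per step (C).
LINEAGE_STATES = {"presence", "split"}


def recpd_sketch(branches, node_state):
    """Sum the lengths of presence branches and divide by total tree length."""
    total = sum(length for _, _, length in branches)
    if total == 0:
        raise ValueError("tree has zero total branch length")
    present = sum(
        length
        for parent, child, length in branches
        # A branch counts only if both of its end nodes are presence or split.
        if node_state[parent] in LINEAGE_STATES and node_state[child] in LINEAGE_STATES
    )
    return present / total


print(recpd_sketch(BRANCHES, NODE_STATE))  # 0.25
```

Here only the two branches below node A join consecutive presence nodes, so the value is (0.5 + 0.5) / 4.0 = 0.25.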
Table 1.
RecPD and derived measures.
Fig 2.
RecPD results in lower PD estimates compared to recombination-agnostic Faith’s PD and is affected by ancestral state reconstruction approach employed.
(A) RecPD using three ancestral state reconstruction methods (NN, MPR, and ACE) and Faith’s PD distributions binned by gene family prevalence. Results shown correspond to all 1022 possible randomized gene-family distributions mapped onto a randomly generated example tree topology of ten tips. (B) RecPD normalized by Faith’s PD for three ancestral state reconstruction methods binned by gene-family prevalence. Results shown correspond to all 1022 possible randomized gene-family distributions mapped onto a tree of ten tips. (C) Example gene family distribution of prevalence = 5 illustrating differences in inferred evolutionary events using different methods: Faith’s PD = 5 losses, 0 gains (boxed), RecPD(NN) = 2 losses, 4 gains, RecPD(MPR) = 0 losses, 5 gains, and RecPD(ACE) = 5 losses, 0 gains.
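The count of 1022 distributions is consistent with enumerating every presence/absence pattern over ten tips and discarding the all-absent and all-present cases (2^10 - 2 = 1022). That exclusion is an inference from the number, not something the caption states. A minimal Python sketch of the enumeration and the prevalence binning, with hypothetical variable names:

```python
from collections import Counter
from itertools import product

N_TIPS = 10  # tree size used in the figure

# Every 0/1 pattern over the tips, minus the all-absent and all-present cases
# (assumed exclusion; it reproduces the 1022 quoted in the caption).
patterns = [p for p in product((0, 1), repeat=N_TIPS) if 0 < sum(p) < N_TIPS]

# Prevalence = number of tips carrying the gene family; used for binning.
by_prevalence = Counter(sum(p) for p in patterns)

print(len(patterns))     # 1022
print(by_prevalence[5])  # 252 patterns with prevalence 5
```

The bins are unequal in size (binomial counts), which is worth keeping in mind when comparing the spread of values across prevalence classes.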
Table 2.
Gene family evolutionary history simulations: Parameters used.
Fig 3.
RecPD with nearest-neighbours (NN) ancestral state reconstruction accurately identifies simulated gene family evolutionary histories, while MPR and ACE over- and underestimate recombination, respectively.
Summary of simulations of gene family evolution comparing actual phylogenetic diversity to diversity estimated using Faith’s PD and RecPD with three different ancestral state reconstruction methods (NN, MPR, and ACE; see Table 2 for parameters used and number of simulations run). (A) Scatterplots of estimated PD values against actual PD values by method. (B) Corresponding density plot distributions. Results are shown for the recombination-predominant rate regime on randomly generated trees with 10 tips.
Fig 4.
Pairwise gene family evolutionary history correlations using RecPDcor differ in comparison to recombination-agnostic and phylogeny-agnostic approaches.
(A) RecPD correlation (RecPDcor) values for randomized gene family distributions plotted against prevalence reveal that the majority of low-prevalence trait distributions have distinct evolutionary histories. (B) RecPDcor of trait distributions differs substantially from 1) Faith’s PD based branch-length weighted Jaccard similarity (recombination-agnostic) and 2) tip presence and absence Jaccard similarity (phylogeny-agnostic): 1) ignoring recombination results in over-estimation of low-prevalence feature evolutionary history correlations (RecPDcor / Faith’s PD < 1); 2) ignoring evolutionary history results in under-estimation of intermediate-prevalence feature evolutionary history correlations (tip_jacc / RecPDcor > 1).
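For orientation, the two comparison measures named in (B) can be sketched in Python. This is an illustrative sketch only: the function names, tip labels, and branch labels are hypothetical, and the definition of RecPDcor itself is not reproduced here because it is not given in this caption.

```python
def tip_jaccard(tips_a, tips_b):
    """Phylogeny-agnostic Jaccard similarity of two sets of tips."""
    union = tips_a | tips_b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(tips_a & tips_b) / len(union)


def weighted_jaccard(branches_a, branches_b, branch_length):
    """Branch-length weighted Jaccard similarity of two sets of branches."""
    union = branches_a | branches_b
    denom = sum(branch_length[b] for b in union)
    if denom == 0:
        return 0.0  # same convention as above
    shared = sum(branch_length[b] for b in branches_a & branches_b)
    return shared / denom


# Hypothetical example data.
tips_a, tips_b = {"t1", "t2", "t3"}, {"t2", "t3", "t4"}
lengths = {"b1": 1.0, "b2": 2.0, "b3": 2.0}
branches_a, branches_b = {"b1", "b2"}, {"b2", "b3"}

print(tip_jaccard(tips_a, tips_b))                        # 0.5
print(weighted_jaccard(branches_a, branches_b, lengths))  # 0.4
```

The two functions differ only in what is counted: tips carry equal weight in the first, whereas in the second each branch contributes its length, so the result depends on which branch sets are supplied (for example, Faith's PD subtrees versus RecPD-inferred lineages).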
Fig 5.
RecPDcor identifies correlated gene-family distributions missed by tip Jaccard similarity.
Top panels: Distributions of two features (black = present, white = absent) arrayed against a species tree. RecPDcor and tip Jaccard similarities both identify correlated and anti-correlated gene families. Bottom panels: Distributions where RecPDcor reveals correlated gene family distributions that tip Jaccard does not, and where tip Jaccard overestimates the correlation of gene families with distinct evolutionary histories. Tree topologies are represented as cladograms with branches of equal length; actual branch lengths are indicated by branch thickness, as shown in the legend. Branches are assigned to different categories based on the overlap of their RecPD-inferred gene family lineages: 1) Shared: branches where both traits are present (teal); 2) Unique: branches where only a single trait is present (blue); 3) None: branches where both traits are absent (red).
Fig 6.
RecPD reveals significant impact of recombination in the phylogenetic distributions of P. syringae effector families.
(A) Core genome phylogeny of the P. syringae species complex, with internal tree nodes indicating P. syringae phylogroups. The outer ring barplot shows the total number of distinct effector families carried by each strain and is coloured according to strain phylogroup designation. (B) Plot of effector family prevalence against RecPD(NN) and Faith’s PD for all 70 effector families. (C) Effector family RecPD values normalized by Faith’s PD, binned by effector family prevalence. The point size indicates effector family longevity.
Fig 7.
RecPD gene lineage reconstructions reveal significant differences in evolutionary histories between effector families of similar prevalence and phylogenetic diversity.
Example pairs of effector family distributions mapped onto the P. syringae core-genome phylogeny. (A) Effector families HopS and HopAW show similar prevalence (HopS = 114 and HopAW = 116) but different RecPD values (HopS = 0.399 and HopAW = 0.198). (B) Effector families HopH and HopBN show different prevalence (HopH = 206 and HopBN = 78) but similar RecPD values (0.16). Tree topologies are represented in a ‘willow tree’ format, with branches set to equal length and actual branch lengths indicated by branch thickness. Branches are coloured according to overlap between RecPD-inferred gene family lineages.
Table 3.
RecPD and associated measures for selected effector families shown in Fig 7.
Fig 8.
nRecPD indicates largely vertical descent of Pseudomonas spp. growth phenotype distributions, revealing clade-specific loss patterns and between-clade recombination.
A heatmap showing presence/absence profiles of 10 growth phenotypes assayed over 10 representative strains of the genus Pseudomonas. Growth phenotype columns are hierarchically clustered and split into 4 major clusters, while species are arranged according to a 16S rRNA phylogenetic tree. Corresponding nRecPD values calculated for each growth phenotype are indicated by the top horizontal annotation strip, with blue indicating vertical descent (nRecPD = 1) and purple indicating signatures of recombination (nRecPD ~ 0.7).