Fig 1.
Phylogenetic tree showing a reconstruction of fungal GDH and GOx sequences decorated with illustrations of key concepts used within GRASP.
A, Two extant POGs (j indicates extant sequence number) mapped to an ancestral POG. Each extant POG has a single path through strictly ordered sequence positions (i indicates position). Ancestral states are influenced by all sequences, which explains why i = 608 is inferred as glycine, despite glycine not appearing in either sequence j = 359 or 360. B, Three ancestor POGs showing most probable assignments from a joint reconstruction at positions i ∈ {315, …, 327} for nodes N7, N8, and N9. GRASP supports the simultaneous viewing of multiple ancestors from a joint reconstruction, enabling a direct comparison at different time points. C, A single ancestor POG showing inferred marginal distributions at positions i ∈ {243, …, 254} for node N320. For marginal reconstructions, nodes are coloured according to their posterior probabilities and can be queried to view histograms of these underlying distributions, as is done for position i = 244. The marginal reconstruction from (C) was used to reconstruct the inferred ancestor (N320) as well as an alternative ancestor (N320_Y244E) in which a single amino acid was altered based on posterior probabilities from the marginal distribution; this substitution resulted in increased thermal stability (S1 Table).
Fig 2.
Evaluation of each of GRASP’s indel methods on (1) simulated data where the correct alignment and correct tree are supplied and (2) simulated data where a realigned alignment and correct tree are supplied, across four indel rates (0.001, 0.005, 0.01, 0.03) and four taxon sizes (100, 250, 500, 750).
A, Number of correct indels identified by each method for each taxon size for each indel rate (N = 5). B, Number of indels uniformly identified by all methods or uniformly missed by all methods for each taxon size for each indel rate (N = 5). C, Number of indels uniquely identified by each indel method at four taxon sizes at indel rate 0.03, organised by indel type and size (N = 5).
Fig 3.
A, Phylogenetic tree showing positions of CYP2U/CYP2R/CYP2D ancestors chosen for synthesis and evaluation.
Coloured boxes indicate the content that was removed from correspondingly coloured ancestral nodes (N51, N5, and N1) and, in the case of the N51 and N5 insertions, preemptively inserted into N2. B, Amino acid sequences surrounding and including the insertion or deletion of the content at each ancestor. Numbers under sequences indicate the position numbers of the start and end columns represented in the alignment. C, Thermal stability assays for each ancestor with and without inserted content. Data are means ± SEM, N = 2; ns indicates a non-significant result. D, E, Activity assays for the substrates luciferin CEE and luciferin ME-EGE for each ancestor with and without inserted content. Membranes from cells expressing only human CPR are included as a negative control. Different symbols indicate different experimental repeats, each performed in triplicate; lines indicate mean and standard deviation; p-values were determined by a two-tailed Student’s t-test. No data points were excluded.
Fig 4.
A, Phylogenetic trees of the smallest and largest DHAD data sets after producing 14 randomly sampled data sets in 500-sequence increments, added to our base data set of 1,612 sequences and reaching a maximum size of 9,112 sequences.
B, Heat maps of the fractional distances between ancestor sequences generated from different DHAD data set sizes, representing the same (principal) three branch points. C, Phylogenetic trees of the smallest and largest data sets after increasing CYP2U sequences via addition of homologous subfamilies, starting with 165 CYP2U sequences then growing to 359 sequences and reaching a maximum of 595 sequences via addition of sequences from CYP2R and CYP2D, respectively. D, Heat maps of the fractional distances between ancestor sequences resulting from different CYP2U data set sizes representing the same two branch points. Ancestors from the N4/N5 equivalent branch points across the three data set sizes had 98% identity, which cannot be discerned visually. E, Heat map of the average fractional distance of 50 randomly selected ancestors between the KARI I data sets, ranging from 1,176 to 11,756 sequences.
Fig 5.
A, Average fractional distance between tools, calculated as pairwise fractional distances for each ancestral prediction for a given tool against all other ancestral predictions of other tools using 5 groups of 336 or 337 sequences, 10 groups of 168 or 169 sequences, or 20 groups of 84 or 85 sequences. Parameter choices are joint (J) vs. marginal (M) reconstruction and fixed vs. variable evolutionary rates (FastML and PAML only). B, Average fractional distance between a better-sampled ancestor inferred by GRASP using 1,682 sequences and each tool / parameter combination at 5, 10, and 20 groups. C, Run times of GRASP and FastML at 5 and 10 groups; PAML was omitted due to long run times. Run times for all tools at 10 and 20 groups are shown in S16 Fig.
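The averaging described in panel A can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that fractional distance means the fraction of aligned columns at which two equal-length sequences differ, and that gap characters are compared like any other symbol. The function names and the toy sequences are hypothetical.

```python
from __future__ import annotations
from itertools import product


def fractional_distance(a: str, b: str) -> float:
    """Fraction of aligned columns at which two sequences differ.

    Assumption: both sequences come from the same alignment (equal length),
    and a gap character is treated as an ordinary symbol.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distance_to_other_tools(predictions: dict[str, list[str]], tool: str) -> float:
    """Average pairwise distance from one tool's predictions to all others'."""
    own = predictions[tool]
    others = [s for name, seqs in predictions.items() if name != tool for s in seqs]
    pairs = list(product(own, others))
    if not pairs:
        raise ValueError("need at least one prediction from another tool")
    return sum(fractional_distance(a, b) for a, b in pairs) / len(pairs)


# Toy, made-up ancestral predictions for one branch point.
toy = {"GRASP": ["MKTA"], "FastML": ["MKTG"], "PAML": ["MRTG"]}
grasp_mean = mean_distance_to_other_tools(toy, "GRASP")
```

In this toy case the GRASP prediction differs from the other two at one and two of four columns, so `grasp_mean` is the average of 0.25 and 0.5.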
Fig 6.
A tree with five extant sequences and ancestors with bi-directional and uni-directional support for different edges shown.
A, Input POG and the matrix E* (recovering all edges under consideration for ancestors). Red squares refer to all edges involving position 5 (occupied by W in all sequences). Non-zero entries in the red squares identify indices for 2 and 4, as well as the end terminus. B, Ancestor POGs (labelled N0-N3) are shown at branch points as inferred by bi-directional edge parsimony. Solid arrows indicate bi-directional support; dashed arrows indicate uni-directional support. For uni-directional edges, the direction of support is shown by the direction of the arrow. The N1 ancestor is annotated to illustrate the relationship between indices in an ancestral matrix and the edges in the resulting ancestor POGs. The (a, b) and (b, a) indices are highlighted for select positions—for example, (4,5) and (5,4) are the forwards support from index 4—5 and backwards support from index 5—4 (green boxes and green edge in N1) showing that the edge from 4—5 is supported in both directions. The edge from 2—5 is supported going forward but not backward (blue boxes and blue edge in N1). Other edges forward from index 2 are maximally parsimonious in the forward direction at N1, such as 2—3 and 2—4, but only the support for 4 is reciprocated (purple boxes and purple edges in N1) recovering a “preferred” path from 2 to 4 to 5 in N1. C, Extant POGs, with position indices.
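The relationship between matrix indices and edge support in panel B can be illustrated with a short sketch. It assumes, following the caption, that a non-zero entry at (a, b) with a < b records forward support for the edge from a to b, and a non-zero entry at (b, a) records backward support for the same edge. The function name and the set-of-pairs representation are hypothetical and are not GRASP's internal data structure; only the N1 entries named in the caption are used.

```python
def classify_edges(nonzero_entries):
    """Label each edge by the direction(s) in which it is supported.

    nonzero_entries: set of (row, column) index pairs that are non-zero
    in an ancestral matrix (an assumed, simplified representation).
    """
    labels = {}
    for a, b in nonzero_entries:
        lo, hi = min(a, b), max(a, b)
        forward = (lo, hi) in nonzero_entries   # support from lo to hi
        backward = (hi, lo) in nonzero_entries  # support from hi back to lo
        if forward and backward:
            labels[(lo, hi)] = "bi-directional"
        elif forward:
            labels[(lo, hi)] = "forward only"
        else:
            labels[(lo, hi)] = "backward only"
    return labels


# Entries for N1 mentioned in the caption: 4-5 and 2-4 are reciprocated,
# while 2-5 and 2-3 have forward support only.
n1_entries = {(4, 5), (5, 4), (2, 5), (2, 3), (2, 4), (4, 2)}
n1_labels = classify_edges(n1_entries)

# Edges with support in both directions trace the "preferred" path 2, 4, 5.
preferred = sorted(e for e, lab in n1_labels.items() if lab == "bi-directional")
```

Keeping only the bi-directional edges leaves (2, 4) and (4, 5), which matches the preferred path through N1 described in the caption.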