Engineering indel and substitution variants of diverse and ancient enzymes using Graphical Representation of Ancestral Sequence Predictions (GRASP)
Fig 4
A, Phylogenetic trees of the smallest and largest DHAD data sets. Fourteen randomly sampled data sets were produced in 500-sequence increments, added to our base data set of 1,612 sequences and reaching a maximum size of 9,112 sequences.
B, Heat maps of the fractional distances between ancestor sequences generated from different DHAD data set sizes, representing the same three (principal) branch points.
C, Phylogenetic trees of the smallest and largest CYP2U data sets after expansion by addition of homologous subfamilies, starting with 165 CYP2U sequences, then growing to 359 sequences and to a maximum of 595 sequences via addition of sequences from CYP2R and CYP2D, respectively.
D, Heat maps of the fractional distances between ancestor sequences resulting from different CYP2U data set sizes, representing the same two branch points. Ancestors from the N4/N5-equivalent branch points across the three data set sizes had 98% identity, a difference that cannot be discerned visually.
E, Heat map of the average fractional distance of 50 randomly selected ancestors between the KARI I data sets, which range from 1,176 to 11,756 sequences.
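The fractional distances shown in the heat maps can be illustrated with a minimal sketch. The caption does not state the exact formula, so the code below assumes that fractional distance is the proportion of differing positions between two aligned ancestor sequences (so that 98% identity corresponds to a distance of 0.02). The function names, the gap handling, and the toy sequences are hypothetical and are not taken from GRASP.

```python
from typing import Dict, List


def fractional_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Assumption: both sequences come from the same alignment (equal length).
    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a residue is counted as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def distance_matrix(ancestors: Dict[str, str]) -> List[List[float]]:
    """Pairwise fractional distances, in the insertion order of `ancestors`."""
    names = list(ancestors)
    return [
        [fractional_distance(ancestors[r], ancestors[c]) for c in names]
        for r in names
    ]


# Toy example: the "same" ancestral node inferred from three data set sizes.
toy_ancestors = {
    "N1_small": "MKTAYIAKQR",
    "N1_medium": "MKTAYIAKQK",
    "N1_large": "MKTAYLAKQK",
}
matrix = distance_matrix(toy_ancestors)
```

Each row and column of `matrix` corresponds to one data set size, so the matrix can be passed directly to a plotting routine to draw a heat map of the kind shown in panels B and D.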