Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations
Fig 2
A. Flow chart showing the steps required to create and interpret a hiMSA (as described in text).
B. Schematic of a BPPS-generated “contrast alignment” that corresponds to node 8 of the hierarchy in (A). One such contrast alignment is created for each node in the hierarchy. Sequences assigned to node 8’s subtree (blue nodes in (A)) constitute a ‘foreground’ partition, those assigned to the most closely related nodes (red nodes in (A)) constitute a ‘background’ partition, and the remaining sequences constitute a non-participating partition. Horizontal bars represent sequences assigned to the similarly colored nodes in (A). Blue vertical bars represent conserved foreground residue patterns (as shown below each bar); these diverge from (or contrast with) the background compositions at those positions (white vertical bars). Red vertical bars above the alignment quantify the degree of divergence.
C. BPPS sampling explores the space of domain hierarchies by attaching or removing leaf nodes, moving subtrees, inserting or deleting internal nodes, moving sequences between nodes and, for each subtree, adding or deleting residue patterns based on how well they discriminate the foreground from the background (as shown in (B)).
D. Schematic diagram of a hiMSA from the perspective of a leaf node. One such diagram could be created for each node in a hierarchy. (Center) The node 6 lineage of the full hiMSA. Horizontal lines represent aligned sequences and are color-coded by level in the hierarchy. Thin light gray horizontal lines represent non-homologous and deleted regions. Vertical lines represent the contrasting pattern positions upon which the hierarchy is based and are similarly color-coded by level. (Left and right sides) Subtrees corresponding to each level. The colored, gray and white nodes in each tree correspond, respectively, to their alignment foreground, background and non-participating partitions, the sequences of which are colored similarly. The background for the entire superfamily (lower right) consists of random sequences.
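The partitioning in (B) can be illustrated with a minimal Python sketch. It is not the BPPS implementation: the toy hierarchy, the function names, and the sequences are hypothetical, the background is assumed to be the parent's subtree minus the node's own subtree (one reading of "most closely related nodes"), and a pseudocount-smoothed relative entropy stands in for the Bayesian measure of foreground-versus-background divergence.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def subtree(children, root):
    """Return the set of nodes in the subtree rooted at `root`."""
    nodes, stack = set(), [root]
    while stack:
        node = stack.pop()
        nodes.add(node)
        stack.extend(children.get(node, []))
    return nodes


def partition(children, root, node):
    """Split the hierarchy into foreground, background and
    non-participating node sets for the given node.

    Assumption: the background is the parent's subtree minus the
    node's own subtree. For the root this yields an empty set (the
    caption states that random sequences serve as its background).
    """
    parent = {c: p for p, kids in children.items() for c in kids}
    foreground = subtree(children, node)
    if node in parent:
        background = subtree(children, parent[node]) - foreground
    else:
        background = set()
    rest = subtree(children, root) - foreground - background
    return foreground, background, rest


def column_freqs(seqs, col, pseudo=1.0):
    """Pseudocount-smoothed residue frequencies at one alignment column."""
    counts = Counter(s[col] for s in seqs if s[col] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudo * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudo) / total for a in AMINO_ACIDS}


def divergence(fg_seqs, bg_seqs, col):
    """Relative entropy (bits) of foreground vs. background at a column.

    This is a simple stand-in for the red bars in (B), not the BPPS score.
    """
    p = column_freqs(fg_seqs, col)
    q = column_freqs(bg_seqs, col)
    return sum(p[a] * log2(p[a] / q[a]) for a in AMINO_ACIDS)


# Hypothetical toy hierarchy: node -> list of child nodes.
toy_children = {1: [2, 3], 3: [4, 8], 8: [9, 10]}
fg_nodes, bg_nodes, other_nodes = partition(toy_children, root=1, node=8)
print(sorted(fg_nodes), sorted(bg_nodes), sorted(other_nodes))

# Hypothetical aligned sequences: column 0 is conserved (G) only in the
# foreground, column 1 is conserved (A) in both partitions.
toy_fg = ["GA", "GA", "GA"]
toy_bg = ["SA", "TA", "NA"]
print([round(divergence(toy_fg, toy_bg, c), 3) for c in range(2)])
```

In this toy case only column 0 receives a positive score, mirroring the idea that a pattern position is one where a conserved foreground residue contrasts with the background composition.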