Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations

Fig 3

Hierarchy and key features of the acetylase superfamily.

A. The acetylase hierarchy identified by the sampler. For clarity, smaller subtrees not discussed in the text have been omitted; the complete hierarchy is given in S1.3 Fig. Purple nodes are not discussed in the text.
B. Root node “contrast alignment” highlighting conserved patterns most characteristic of acetylases as a whole. Shown are six representative sequences assigned to node 13 of the acetylase hierarchy in (A). These sequences correspond to an uncharacterized prokaryotic acetylase family that conserves all of the root node canonical residues. The sequences are labeled by their bacterial phyla except for the first (proteobacterial) sequence, the structure of which is shown in (C). Below the representative alignment is a summary of the most conserved amino acid residues at each position; the number of sequences (assigned to the foreground) is given in parentheses on the first line. The first to third lines show up to three residues at each position that occur both most frequently and in ≥10% of the sequences. Directly below this, the frequencies of the designated residues are given in integer tenths; for example, an ‘8’ indicates that 80–90% of the sequences in the foreground alignment match the corresponding pattern residue. In column 88, for example, glycine occurs in 60–70% and alanine in 20–30% of the sequences. To highlight larger integers, ‘5’ and ‘6’ are shown in black and ‘7’–‘9’ in red. The first of these lines (labeled “wt_res_freqs” for “weighted residue frequencies”) reports the effective number of aligned sequences. In all of these cases, reported frequencies have been down-weighted for redundancy. The black dots above the alignment indicate the pattern positions that were identified by the sampler. Pattern-matching (correlated) residues are highlighted in color, with biochemically similar residues colored similarly. For example, acidic residues are shown in red, basic residues in cyan and hydrophobic residues in yellow; histidine, glycine and proline are each assigned a unique color. The heights of the red bars above the alignment quantify (using a semi-logarithmic scale) the degree to which residue frequencies in the foreground diverge at each position from those at the corresponding positions in the background. In this case, the foreground corresponds to the root node, that is, to the entire tree and thus to all acetylases, and the background corresponds to all proteins unrelated to acetylases, which is represented by standard amino acid residue frequencies.
C. The acetylase fold with canonical residues most characteristic of the superfamily. The structure shown is that of an E. coli putative N-acetyltransferase assigned to node 13, corresponding to the first sequence aligned in (B) (pdb_id: 2kcw).
D-F. Residue positions likely responsible for acetylase functional specificity. D. Histogram of normalized average ∆-BILD scores over all column positions. Scores were linearly adjusted so that the lowest score is zero and the highest score is 100. Data points with scores greater than 50 are plotted above the histogram and are spread out vertically to avoid overlap. Histogram bars that are more than two standard deviations above the mean are colored red; the corresponding data points are color coded (as explained in the text) and enlarged to enhance visibility. Numbers next to data points correspond to the positions of the corresponding aligned columns within the main alignment (i.e., the root node alignment) shown in S4 Fig. E. Surface representation of the substrate binding pocket showing the locations of six of the residues in (F), which are color coded and numbered as in (D). See text for further details. F. Locations within the crystal structure of Pseudomonas syringae tabtoxin resistance protein complexed with acyl-CoA (pdb_id: 1gheB)[82] of the nine residues corresponding to the rightmost data points in (D); this protein was assigned to node 104. Residue sidechains are colored as are the data points in (D) and labeled by column positions in the core alignment. In addition, four consensus amino acid residues generally conserved in acetylases are shown in yellow; acyl-CoA is shown in cyan.
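
The two numerical conventions used in this caption, the integer-tenths residue summary in (B) and the 0 to 100 score rescaling in (D), can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`column_summary`, `normalize_scores`, `flag_columns`) are hypothetical, the frequencies are unweighted (the figure reports values down-weighted for redundancy), gaps are assumed to be written as `-`, and the population standard deviation is assumed for the two-standard-deviation cutoff.

```python
from collections import Counter
from statistics import mean, pstdev


def column_summary(column, max_residues=3, min_fraction=0.10):
    """Summarize one alignment column as [(residue, tenths), ...].

    Unweighted counts are used; the figure's redundancy down-weighting
    is not reproduced in this sketch.
    """
    residues = [r for r in column if r != "-"]  # skip gaps (assumed to be '-')
    total = len(residues)
    if total == 0:
        return []
    summary = []
    # Keep at most `max_residues` of the most frequent residues ...
    for residue, count in Counter(residues).most_common(max_residues):
        # ... and only those present in at least `min_fraction` of sequences.
        if count / total >= min_fraction:
            # Integer tenths, e.g. 65% -> 6 (the 60-70% bin).
            # Capping 100% at 9 to keep one digit is an assumption.
            summary.append((residue, min((count * 10) // total, 9)))
    return summary


def normalize_scores(scores):
    """Linearly rescale so the lowest score is 0 and the highest is 100."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [100.0 * (s - lo) / (hi - lo) for s in scores]


def flag_columns(scores, label_cutoff=50.0, n_sd=2.0):
    """Mark columns plotted as data points (> 50) and colored red (> mean + 2 SD)."""
    scaled = normalize_scores(scores)
    red_threshold = mean(scaled) + n_sd * pstdev(scaled)
    return [
        {
            "column": i + 1,  # 1-based column position
            "score": s,
            "labeled": s > label_cutoff,
            "red": s > red_threshold,
        }
        for i, s in enumerate(scaled)
    ]


if __name__ == "__main__":
    # A toy column with glycine in 60-70% and alanine in 20-30% of sequences.
    print(column_summary("G" * 13 + "A" * 5 + "ST"))
    # Toy scores with a single strongly divergent column.
    for row in flag_columns([2.0, 4.0, 3.0, 2.0, 3.0, 4.0, 3.0, 2.0, 3.0, 12.0]):
        print(row)
```

The toy column mirrors the column 88 example: glycine falls in the 60–70% bin and alanine in the 20–30% bin, while residues below the 10% threshold are dropped.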

doi: https://doi.org/10.1371/journal.pcbi.1005294.g003