Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

Figure 5

Crotonase superfamily: sequence similarity network from full-length sequences and from just the domain in common.

The displayed networks all describe the pairwise relationships between 1,170 sequences from the crotonase superfamily. A. Network of full-length sequences, colored by family annotation and thresholded at an E-value of 1×10⁻³⁰. The worst edges displayed correspond to a median of 33% identity over alignments of 250 residues. B. The full-length network from A, with nodes colored by sequence length and edges colored by alignment length. The same bifunctional enoyl-CoA hydratases (bECH) are marked with a dashed oval in B and C. C. Network of just the crotonase domain, colored by family annotation and thresholded at an E-value of 1×10⁻²⁹. The worst edges displayed correspond to a median of 38% identity over alignments of 180 residues. D. 17 selected edges from the network in A and B. In the left panel, for each pair of sequences participating in an alignment, the log E-value of each sequence's match to the HMM used to define the crotonase domain is shown (the single-domain ECH (sECH) on the bottom, the second member of the pair on the top), with the log BLAST E-value for the alignment between the two in the middle. Two example sequences (not alignments), one bECH and one sECH, are shown at the bottom of the left and middle panels. In the middle panel, each amino acid in each sequence from the 17 alignments is colored according to whether it was aligned to the crotonase domain defined by the HMM, was paired to the other sequence in the BLAST alignment used to define the edge, or both. The locations of six of these edges are marked in the right panel, which shows an enlarged view of the network in A. The locations of the example bECH and sECH sequences are marked with stars in the right panel. See Tables S1 and S3 for quantitative comparisons.
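The thresholding described for panels A and C can be illustrated with a minimal Python sketch. It is not the procedure used to produce the figure: the hit records, the function names `threshold_edges` and `worst_edge_summary`, and the choice to treat the `n` retained edges with the largest E-values as the "worst" edges are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical pairwise hits (not data from the figure):
# (seq_a, seq_b, e_value, percent_identity, alignment_length)
hits = [
    ("s1", "s2", 1e-45, 52.0, 260),
    ("s1", "s3", 1e-31, 34.0, 250),
    ("s2", "s3", 1e-30, 32.0, 245),
    ("s3", "s4", 1e-12, 25.0, 120),
    ("s4", "s5", 1e-33, 36.0, 255),
]


def threshold_edges(hits, max_e_value):
    """Keep only hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h[2] <= max_e_value]


def worst_edge_summary(edges, n):
    """Summarize the n weakest retained edges (largest E-values).

    Returns (median percent identity, median alignment length).
    Defining "worst" as the n largest E-values is an assumption.
    """
    worst = sorted(edges, key=lambda h: h[2], reverse=True)[:n]
    return (
        median(h[3] for h in worst),
        median(h[4] for h in worst),
    )


edges = threshold_edges(hits, max_e_value=1e-30)
identity, length = worst_edge_summary(edges, n=2)
print(len(edges), identity, length)
```

Tightening the cutoff removes the weakest edges first, so the summary of the worst retained edges gives a rough sense of the minimum similarity that any displayed edge represents.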

