Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies
A. A thresholded sequence similarity network represents sequences as nodes (circles) and every pairwise sequence relationship (alignment) that scores better than a chosen threshold as an edge (line). The same network, depicting three simulated protein classes, is shown here at four different thresholds. At stringent thresholds, the sequences break up into disconnected groups, and the sequences within each group are highly similar. The relative positioning of disconnected groups has no meaning, whereas the lengths of connecting edges tend to correlate with the relative dissimilarity of each pair of sequences. As the threshold is relaxed and edges associated with less significant relationships are added to the network, groups merge and eventually become completely interconnected. B. Simulated dendrogram for a sequence set that might give rise to the network in A.
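The thresholding behavior described in panel A can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sequence names, the pairwise scores (a generic similarity score where higher means more similar), the four threshold values, and the helper names `threshold_network` and `connected_groups`. It is not the procedure used to generate the figure.

```python
# Hypothetical pairwise similarity scores for three simulated classes (a, b, c).
# Higher score = more significant relationship. All values are invented.
scores = {
    ("a1", "a2"): 60, ("a1", "a3"): 55, ("a2", "a3"): 58,
    ("b1", "b2"): 62, ("b1", "b3"): 57, ("b2", "b3"): 59,
    ("c1", "c2"): 61,
    ("a3", "b1"): 30,  # weak link between classes a and b
    ("b3", "c1"): 12,  # weaker link between classes b and c
}
nodes = sorted({name for pair in scores for name in pair})


def threshold_network(scores, threshold):
    """Return the edges whose score is at least `threshold`."""
    return [pair for pair, score in scores.items() if score >= threshold]


def connected_groups(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Relax the threshold step by step and watch the groups merge.
for threshold in (70, 50, 25, 10):
    edges = threshold_network(scores, threshold)
    groups = connected_groups(nodes, edges)
    print(f"threshold {threshold}: {len(edges)} edges, {len(groups)} groups")
```

With these invented scores, the most stringent threshold leaves every sequence isolated, the next one recovers the three classes as separate groups, and the two most relaxed thresholds merge them first into two groups and then into one. The sketch covers only which edges and groups exist at each threshold. The edge lengths discussed in the caption come from the layout algorithm used to draw the network, which is not modeled here.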