Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies
A. 605 human Class A: Rhodopsin-like GPCR domains. This sequence set includes the 42 amine-binding sequences from Table II and Fig. 2. The network was thresholded at a BLAST E-value of 1×10⁻¹¹; the worst edges displayed correspond to a median of 24% identity over an alignment length of 210 amino acids. Sequences colored red (“Class A: Rhodopsin-like”) were not classified to a specific subgroup within the class. B. 766 human GPCR domains. This sequence set includes all 605 Class A sequences from (A), now colored dark grey. The network was thresholded at an E-value of 1×10⁻², and the worst edges displayed correspond to a median of 22% identity over an alignment length of 120 amino acids. Both networks also include a set of 17 sequences, shown in light grey, that were randomly drawn from the human proteome, are not annotated as GPCRs, and serve as negative controls. The red and green clan labels correspond to Pfam clans. Sequences that are associated with, or extremely similar to, high-resolution structures are marked [PDB identifiers 1F88, 2VT4, 3EML, 2RH1, and 2R4R]. See Table S1 for network statistics.
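The two networks differ mainly in the E-value cutoff applied to pairwise BLAST comparisons. The following is a minimal sketch of that thresholding step, assuming the all-by-all hits have already been parsed into (sequence A, sequence B, E-value, percent identity) tuples. An edge is kept only when its E-value is at or below the cutoff, so relaxing the cutoff from 1×10⁻¹¹ to 1×10⁻² adds weaker edges and merges clusters. The function names, the input format, the toy data, and the reading of "worst edges" as the n retained edges with the largest E-values are hypothetical choices for illustration, not the authors' pipeline.

```python
import statistics
from collections import defaultdict


def build_ssn(hits, evalue_cutoff):
    """Threshold pairwise hits into an undirected similarity network.

    hits: iterable of (seq_a, seq_b, evalue, pct_identity) tuples
    (hypothetical format, e.g. parsed from tabular BLAST output).
    Returns {(a, b): (evalue, pct_identity)} with a < b, keeping the
    best (lowest) E-value when both A-vs-B and B-vs-A are reported.
    """
    edges = {}
    for a, b, evalue, pct_id in hits:
        if a == b or evalue > evalue_cutoff:
            continue  # drop self-hits and hits weaker than the cutoff
        key = (a, b) if a < b else (b, a)
        if key not in edges or evalue < edges[key][0]:
            edges[key] = (evalue, pct_id)
    return edges


def connected_components(nodes, edges):
    """Return clusters as sorted lists; singletons stay unconnected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            cur = stack.pop()
            comp.append(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps


def median_identity_of_worst(edges, n):
    """Median % identity of the n kept edges with the largest E-values
    (an assumed definition of the 'worst edges displayed')."""
    worst = sorted(edges.values(), key=lambda v: v[0], reverse=True)[:n]
    return statistics.median([pid for _, pid in worst])


# Toy data for illustration only.
nodes = {"A", "B", "C", "D", "E"}
hits = [
    ("A", "B", 1e-50, 60.0),
    ("B", "A", 1e-48, 60.0),
    ("B", "C", 1e-12, 25.0),
    ("C", "D", 1e-3, 20.0),
    ("E", "E", 0.0, 100.0),
]
strict = build_ssn(hits, 1e-11)
relaxed = build_ssn(hits, 1e-2)
print(connected_components(nodes, strict))   # [['A', 'B', 'C'], ['D'], ['E']]
print(connected_components(nodes, relaxed))  # [['A', 'B', 'C', 'D'], ['E']]
```

In this toy example, D joins the A-B-C cluster only at the more permissive cutoff, while E, which has no hits except to itself, stays a singleton at both cutoffs.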