Figure 1.
Connecting counterpart regions of two protein sequences through shared synonyms.
The words YIAKQRQ in protein S and VKALPDA in protein T share two synonyms, which are extracted from sequences similar to S and T.
Figure 2.
The SymAlign algorithm. We use PSI-BLAST to collect a group of sequences similar to the targets, and we define synonyms from these similar sequences.
Similarity scores are estimated from the shared synonyms. A library of all alignable residue pairs is built and fed into T-Coffee to generate a sequence alignment.
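The scoring and library steps can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: words are fixed-length windows, the synonym table is keyed by the word string (a hypothetical stand-in for whatever is derived from the PSI-BLAST hits), and the raw count of shared synonyms is used as the similarity score. The function names and the toy data are illustrative only, and the step that passes the library to T-Coffee is not shown.

```python
from collections import defaultdict


def words(seq, k):
    """Yield (start, word) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def residue_pair_library(seq_s, seq_t, synonyms, k):
    """Accumulate shared-synonym counts onto residue pairs.

    synonyms maps a word to its set of synonyms (hypothetical input).
    Returns {(i, j): weight} with 0-based positions i in seq_s, j in seq_t.
    """
    library = defaultdict(int)
    for i, word_s in words(seq_s, k):
        syn_s = synonyms.get(word_s, set())
        if not syn_s:
            continue
        for j, word_t in words(seq_t, k):
            # Assumed score: number of synonyms the two words share.
            shared = len(syn_s & synonyms.get(word_t, set()))
            if shared:
                # Every residue pair covered by the word pair gets the score.
                for d in range(k):
                    library[(i + d, j + d)] += shared
    return dict(library)


# Toy data, invented for illustration.
toy_synonyms = {
    "ACD": {"SCD", "ACE"},
    "GCD": {"SCD", "ACE", "GCE"},
    "CDE": {"CDQ"},
    "CDF": {"CDQ"},
}
toy_library = residue_pair_library("ACDE", "GCDF", toy_synonyms, k=3)
```

In this toy case the words ACD and GCD share two synonyms and the words CDE and CDF share one, so the overlapping residue pairs (1, 1) and (2, 2) accumulate a weight of 3.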
Table 1.
Comparison with existing methods on pairwise alignments.
Table 2.
Comparison with existing methods on multiple alignments and outliers.
Table 3.
Comparison results for identifying structural similarity on RV11.
Table 4.
Comparison results for identifying structural similarity on PREFAB.
Table 5.
The proportions of positive cases identified by both TM-align and SymAlign to those identified only by TM-align, at different thresholds.
Figure 3.
The dot matrix generated by SymAlign for proteins BB11002.1bb9 and BB11002.1ov3_A in RV11.
The gray level of a dot represents the number of shared synonyms for the corresponding residue pair. We draw a dot in red scale instead of gray scale if the corresponding residue pair is annotated as an equivalent pair in the reference alignment. The left side of the matrix shows an alternative alignment whose pattern closely resembles the reference alignment.
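As a rough illustration of how such a dot matrix could be assembled, the following Python sketch labels each cell with its shared-synonym count and a color channel. The function name, the input format (a dictionary of residue-pair counts and a set of reference pairs, both with 0-based indices), and the toy values are assumptions made for this example. They do not describe how the figure was actually produced.

```python
def dot_matrix(len_s, len_t, pair_counts, reference_pairs):
    """Build a len_s x len_t matrix of (count, color) cells.

    color is "red" when the residue pair is in the reference alignment,
    "gray" when it is not, and None when the pair has no shared synonyms.
    """
    matrix = []
    for i in range(len_s):
        row = []
        for j in range(len_t):
            count = pair_counts.get((i, j), 0)
            if count == 0:
                color = None  # empty cell, nothing to draw
            elif (i, j) in reference_pairs:
                color = "red"
            else:
                color = "gray"
            row.append((count, color))
        matrix.append(row)
    return matrix


# Toy input, invented for illustration.
toy_counts = {(0, 0): 2, (1, 1): 3, (0, 2): 1}
toy_reference = {(0, 0), (1, 1)}
toy_matrix = dot_matrix(2, 3, toy_counts, toy_reference)
```

A plotting library could then map each count to an intensity within the chosen color channel. Gray cells that form a diagonal away from the red ones would correspond to an alternative alignment of the kind noted in the caption.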