Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

doi:10.1371/journal.pone.0027872

Figure 1.

Connecting two counterpart regions by shared synonyms of two protein sequences.

The words YIAKQRQ in protein S and VKALPDA in protein T share two synonyms, which are extracted from sequences similar to S and T.

Figure 2.

The algorithm of SymAlign. We use PSI-BLAST to collect a group of sequences similar to the targets, from which we define synonyms.

Similarity scores are estimated based on the shared synonyms. A library of all alignable residue pairs is made and fed into T-Coffee for generating a sequence alignment.
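
The scoring step in this caption can be illustrated with a minimal sketch. It is not the SymAlign implementation: it assumes that each fixed-length word of a sequence already has a set of synonyms (in the paper these come from PSI-BLAST hits, which are not reproduced here), and that every residue pair covered by two words sharing synonyms is credited with the number of shared synonyms. The function name `shared_synonym_scores`, the input layout, and the toy synonym labels are all hypothetical, and the output is a plain dictionary rather than a T-Coffee library file.

```python
from collections import defaultdict


def shared_synonym_scores(synonyms_s, synonyms_t, word_len):
    """Count shared synonyms for every residue pair (hypothetical sketch).

    synonyms_s / synonyms_t map the start position of a word in sequence
    S / T to the set of synonyms assumed to be known for that word.
    Returns a dict mapping (i, j) residue pairs to accumulated counts.
    """
    scores = defaultdict(int)
    for i, syn_s in synonyms_s.items():
        for j, syn_t in synonyms_t.items():
            shared = len(syn_s & syn_t)
            if shared == 0:
                continue  # the two words are not connected
            # Assumption: credit each aligned residue pair inside the words.
            for offset in range(word_len):
                scores[(i + offset, j + offset)] += shared
    return dict(scores)


# Toy input: one 7-residue word per sequence, sharing two placeholder synonyms.
toy_s = {0: {"syn_a", "syn_b", "syn_c"}}
toy_t = {0: {"syn_a", "syn_b", "syn_d"}}
toy_scores = shared_synonym_scores(toy_s, toy_t, word_len=7)
```

With this toy input, the seven diagonal residue pairs from (0, 0) to (6, 6) each receive a count of two. A matrix of such counts is the kind of data that a dot matrix of shared synonyms would visualize.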

Table 1.

Comparison with existing methods on pairwise alignments.

Table 2.

Comparison with existing methods on multiple alignments and outliers.

Table 3.

The comparison results of identifying structural similarity on RV11.

Table 4.

The comparison results of identifying structural similarity on PREFAB.

Table 5.

The ratio of positive cases identified by both TM-align and SymAlign to those identified only by TM-align, at different thresholds.

Figure 3.

The dot matrix generated by SymAlign for proteins BB11002.1bb9 and BB11002.1ov3_A in RV11.

A grayscale dot represents the number of shared synonyms for the corresponding residue pair. The dot is shown in red scale instead if the residue pair is annotated as an equivalent pair in the reference alignment. The left side of the matrix shows an alternative alignment whose pattern is very similar to that of the reference alignment.
