Using Structure to Explore the Sequence Alignment Space of Remote Homologs

doi:10.1371/journal.pcbi.1002175

Figure 1

The S4 algorithm.

An alignment matrix is depicted with the template sequence and its secondary structure elements (SSEs) on the horizontal axis and the query sequence on the vertical axis. (1) The algorithm begins by finding high-scoring primary fragments (black; see text for a definition of high-scoring), one primary fragment for each template SSE (not all shown here). (2) To fill in the gaps between primary fragments (such as PF1 and PF2), “secondary” fragments (gray) are identified. Secondary fragments are chosen by one of three criteria: they lie in an SSE that neighbors a primary fragment and on a similar diagonal (Adjacent); they satisfy alignment rules, such as filling a gap in a β-sheet (Core, see Materials and Methods); or they are simply high-scoring (Score). (3) Starting at the N-terminus, the algorithm enumerates all connections to downstream primary and secondary fragments, resulting in a large ensemble of “fragment alignments”. Alignment rules are tested (see Materials and Methods) whenever a fragment is added to an alignment. (4) The number of fragment alignments is reduced by filtering with thresholds based on statistical energies, core contacts and a redundancy measure (see Materials and Methods). (5) To generate a final global alignment from a set of fragments (e.g., the green line), a boundary is defined around each remaining fragment alignment (dashed lines), within which a traditional dynamic programming (DP)-based suboptimal algorithm is used to find an ensemble of full alignments. DFIRE then selects the alignment with the lowest (best) energy to represent the set of fragments. (6) The process continues until it has returned the top N alignments, ranked by their residue similarity score.

doi: https://doi.org/10.1371/journal.pcbi.1002175.g001
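
The enumeration in step (3) can be illustrated with a minimal sketch. It is not the S4 implementation: the `Fragment` type, the `is_downstream` test and the `enumerate_chains` function are hypothetical names introduced here, and the only "alignment rule" applied is the assumption that a fragment may follow another if it starts after the previous one ends in both the query and the template. The actual rules, scores and filters are those described in Materials and Methods.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical ungapped fragment: a diagonal segment of the alignment matrix."""
    name: str
    q_start: int  # index of the first query residue
    t_start: int  # index of the first template residue
    length: int   # number of aligned residue pairs (must be > 0)


def is_downstream(a: Fragment, b: Fragment) -> bool:
    """Assumed rule: b may follow a if it starts after a ends in both sequences."""
    return (b.q_start >= a.q_start + a.length
            and b.t_start >= a.t_start + a.length)


def enumerate_chains(fragments: List[Fragment]) -> List[List[Fragment]]:
    """Enumerate every ordered chain of mutually compatible fragments."""
    # Sorting by template position guarantees that any fragment downstream
    # of another also appears later in the list.
    ordered = sorted(fragments, key=lambda f: (f.t_start, f.q_start))
    chains: List[List[Fragment]] = []

    def extend(chain: List[Fragment], start: int) -> None:
        chains.append(list(chain))  # record the chain built so far
        for i in range(start, len(ordered)):
            # The rule is tested each time a fragment is added.
            if is_downstream(chain[-1], ordered[i]):
                chain.append(ordered[i])
                extend(chain, i + 1)
                chain.pop()

    for i, frag in enumerate(ordered):
        extend([frag], i + 1)
    return chains


# Toy input: two primary fragments, one secondary fragment between them,
# and one fragment ("X") whose query range overlaps PF1.
toy = [
    Fragment("PF1", 0, 0, 5),
    Fragment("SF1", 6, 7, 3),
    Fragment("PF2", 10, 12, 4),
    Fragment("X", 2, 20, 3),
]
chain_names = [[f.name for f in chain] for chain in enumerate_chains(toy)]
```

Even on this toy input the number of chains grows quickly with the number of compatible fragments, which is consistent with the caption's point that the resulting ensemble is large and must be reduced by the filtering in step (4).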