Skip to main content
Advertisement

< Back to Article

Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

Figure 2

Matching stems and matching branches.

We give an explanation based on a suffix trie which is equivalent to the suffix interval tree shown in Fig. 5 (see Methods). The suffix trie for S$ with S: = acttcttcggc (left) holds twelve leaves. Each numbered leaf corresponds to exactly one suffix in S. Nodes with only one child are not explicitly shown. Note, that internal nodes implicitly represent all leafs in their respective subtree. Thus, internal nodes can be regarded as sets of suffixes. The right panel holds the longest matches for different matching paths in the trie. Matching the first three suffixes of the read P: = cgtcggc results in three different paths in the suffix trie. Each path is equivalent to a sequence of suffix intervals, a matching stem, in the enhanced suffix array. Let denote the matching stem for Pi = ith suffix of P. The qth interval in , denoted by , implicitly represents the set of suffixes in S matching P[ii+q−1]. The path for the first suffix P0 is of length two (green solid line). Hence, the equivalent matching stem is a sequence of three intervals: , and . Since only represents the suffix S7, the longest prefix match of P0 is of length 2 occurring at position 7 of the reference sequence (right panel). The matching stem for P1 (red solid line) ends with . Therefore, matches of length one occur at positions 8 and 9 in S. The longest prefix match for P3 occurs at position 6 of S (dashed orange line). Note, that the intervals of equivalently represent S6. An alternative path leads to a match with position 4. The branch denotes the alternative that accepts the mismatch of g and t at position 1 of P0.

Figure 2

doi: https://doi.org/10.1371/journal.pcbi.1000502.g002