Fig. 4 | Genome Biology

Fig. 4

From: A survey of mapping algorithms in the long-reads era

Illustration of strobemers’ capacity to handle indels. As in Fig. 3, two sequences are presented. This time, \(s_2\) has an insertion (pink G). In the left panel, minimizers are selected using \(w=2, k=5\). Blue stars mark the selected minimizers in each blue window. The only safe region from which to generate a minimizer is the CGGTT sequence after the insertion, which is shared and of length \(\ge k\). Put differently, the k-mers in red have no chance of being common to the two sequences. However, in this example, the scheme fails to select a common minimizer in the safe region. Strobemer selection is presented in the right panel, using \(k=2,s=2,w=2\). At each position, the first k-mer is selected as the start site of the strobemer. Then, in the non-overlapping window (of size w) downstream of the first k-mer, a second k-mer is selected according to one of the selection techniques presented in [84] (we illustrate selecting the lexicographical minimizer). We underline the bases that are kept for each strobemer. For instance, in \(s_1\), the first k-mer is CG at positions 0 and 1, so the next window starts at position 2. Two k-mers are computed from this window, AC and CG, and AC is the minimizer. Therefore, the strobemer is (CG,AC). Again, strobemers with no chance of being shared between \(s_1\) and \(s_2\) are colored in red. For strobemers, this is the case when at least one part contains the mutated base. We note that not only does the CGGTT region have a common strobemer (CG,GT) in both sequences, but the scheme was also able to “jump over” the mutated G and select another common strobemer (GA,CG) in a more difficult region. The strobemers in this example consist of two k-mers (\(s=2\)), but they can be constructed for other \(s>2\).
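
The two selection schemes described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [84]: the function names `kmers`, `minimizers` and `strobemers` are hypothetical, ties are broken by leftmost position, positions whose downstream window would run past the end of the sequence are simply skipped, and the toy sequences `s1` and `s2` are assumed for the example (they are chosen to be consistent with the k-mers named in the caption, but are not copied from the figure).

```python
def kmers(seq, k):
    """Return all k-mers of seq, indexed by their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k, w):
    """Set of k-mers selected as lexicographic minimizers.

    Each window holds w consecutive k-mers; the smallest one is kept
    (leftmost on ties, an assumption of this sketch).
    """
    km = kmers(seq, k)
    selected = set()
    for start in range(len(km) - w + 1):
        best = min(range(start, start + w), key=lambda i: (km[i], i))
        selected.add(km[best])
    return selected


def strobemers(seq, k, w):
    """Order-2 strobemers (s = 2) as (first k-mer, second k-mer) pairs.

    The first strobe is the k-mer starting at position i. The second is the
    lexicographic minimizer among the w k-mers starting in the
    non-overlapping window that begins at position i + k.
    """
    km = kmers(seq, k)
    pairs = []
    for i in range(len(km)):
        lo, hi = i + k, i + k + w  # candidate start positions lo .. hi - 1
        if hi > len(km):
            break  # simplification: skip positions with a truncated window
        j = min(range(lo, hi), key=lambda p: (km[p], p))
        pairs.append((km[i], km[j]))
    return pairs


# Hypothetical toy sequences: s2 is s1 with a single G inserted after "CGA".
s1 = "CGACGGTT"
s2 = "CGAGCGGTT"

# Worked example from the caption: first k-mer CG, window k-mers AC and CG.
print(strobemers("CGACG", k=2, w=2))  # [('CG', 'AC')]

shared_minimizers = minimizers(s1, k=5, w=2) & minimizers(s2, k=5, w=2)
shared_strobemers = set(strobemers(s1, k=2, w=2)) & set(strobemers(s2, k=2, w=2))

print(shared_minimizers)          # set()
print(sorted(shared_strobemers))  # [('CG', 'GT'), ('GA', 'CG')]
```

On these toy inputs, the minimizer scheme selects no k-mer common to both sequences even though CGGTT is shared, because ACGGT outranks CGGTT in `s1`. The strobemer scheme recovers (CG,GT) inside the shared region and (GA,CG), whose two parts lie on either side of the inserted G.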
