Parameterized syncmer schemes improve long-read mapping
Fig 2
The positions selected by three different selection schemes S1, S2 and S3 on the same sequence. Selected k-mers are highlighted and underlined. All three schemes select the same number of k-mers, but their metric values differ. S1: ℓ = 0.529, ℓ2 = 2.974. S2: ℓ = 0.529, ℓ2 = 1.81. S3: ℓ = 0.647, ℓ2 = 2.808. Although S1 and S2 have the same ℓ value, the k-mers selected by S2 are more evenly spread, so S2 has a much lower ℓ2. Some of the k-mers selected by S3 overlap, which gives S3 a higher ℓ value than the other two schemes. However, because the gaps between the bases covered by S3 are more evenly spread, its ℓ2 value is lower than that of S1. Intuitively, reads should be easier to map using seeds selected by S3 than by S1 despite the higher ℓ value, suggesting that ℓ2 is a more appropriate metric.