# Item

ITEM ACTIONSEXPORT

Released

Paper

#### Faster Pattern Matching under Edit Distance

##### External Resource

No external resources are shared

##### Fulltext (restricted access)

There are currently no full texts shared for your IP range.

##### Fulltext (public)

arXiv:2204.03087.pdf

(Preprint), 2MB

##### Supplementary Material (public)

There is no public supplementary material available

##### Citation

Charalampopoulos, P., Kociumaka, T., & Wellnitz, P. (2022). Faster Pattern Matching under Edit Distance. Retrieved from https://arxiv.org/abs/2204.03087.

Cite as: https://hdl.handle.net/21.11116/0000-000C-1EC9-1

##### Abstract

We consider the approximate pattern matching problem under the edit distance.

Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold

$k$, the task is to find the starting positions of all substrings of $T$ that

can be transformed to $P$ with at most $k$ edits. More than 20 years ago, Cole

and Hariharan [SODA'98, J. Comput.'02] gave an $\mathcal{O}(n+k^4 \cdot n/

m)$-time algorithm for this classic problem, and this runtime has not been

improved since.

Here, we present an algorithm that runs in time $\mathcal{O}(n+k^{3.5}

\sqrt{\log m \log k} \cdot n/m)$, thus breaking through this long-standing

barrier. In the case where $n^{1/4+\varepsilon} \leq k \leq

n^{2/5-\varepsilon}$ for some arbitrarily small positive constant

$\varepsilon$, our algorithm improves over the state-of-the-art by polynomial

factors: it is polynomially faster than both the algorithm of Cole and

Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and

Vishkin [STOC'86, J. Algorithms'89].

We observe that the bottleneck case of the alternative $\mathcal{O}(n+k^4

\cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz

[FOCS'20] is when the text and the pattern are (almost) periodic. Our new

algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching),

which we solve by building on tools developed by Tiskin [SODA'10,

Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our

algorithm relies only on a small set of primitive operations on strings and

thus also applies to the fully-compressed setting (where text and pattern are

given as straight-line programs) and to the dynamic setting (where we maintain

a collection of strings under creation, splitting, and concatenation),

improving over the state of the art.

Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold

$k$, the task is to find the starting positions of all substrings of $T$ that

can be transformed to $P$ with at most $k$ edits. More than 20 years ago, Cole

and Hariharan [SODA'98, J. Comput.'02] gave an $\mathcal{O}(n+k^4 \cdot n/

m)$-time algorithm for this classic problem, and this runtime has not been

improved since.

Here, we present an algorithm that runs in time $\mathcal{O}(n+k^{3.5}

\sqrt{\log m \log k} \cdot n/m)$, thus breaking through this long-standing

barrier. In the case where $n^{1/4+\varepsilon} \leq k \leq

n^{2/5-\varepsilon}$ for some arbitrarily small positive constant

$\varepsilon$, our algorithm improves over the state-of-the-art by polynomial

factors: it is polynomially faster than both the algorithm of Cole and

Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and

Vishkin [STOC'86, J. Algorithms'89].

We observe that the bottleneck case of the alternative $\mathcal{O}(n+k^4

\cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz

[FOCS'20] is when the text and the pattern are (almost) periodic. Our new

algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching),

which we solve by building on tools developed by Tiskin [SODA'10,

Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our

algorithm relies only on a small set of primitive operations on strings and

thus also applies to the fully-compressed setting (where text and pattern are

given as straight-line programs) and to the dynamic setting (where we maintain

a collection of strings under creation, splitting, and concatenation),

improving over the state of the art.