Faster Pattern Matching under Edit Distance

Charalampopoulos, Panagiotis; Kociumaka, Tomasz; Wellnitz, Philip

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Paper

Faster Pattern Matching under Edit Distance

MPS-Authors

/persons/resource/persons284929

Kociumaka, Tomasz
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons229250

Wellnitz, Philip
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

External Resource

No external resources are shared

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

arXiv:2204.03087.pdf
(Preprint), 2MB

Supplementary Material (public)

There is no public supplementary material available

Citation

Charalampopoulos, P., Kociumaka, T., & Wellnitz, P. (2022). Faster Pattern Matching under Edit Distance. Retrieved from https://arxiv.org/abs/2204.03087.

Cite as: https://hdl.handle.net/21.11116/0000-000C-1EC9-1

Abstract

We consider the approximate pattern matching problem under the edit distance.
Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold
$k$, the task is to find the starting positions of all substrings of $T$ that
can be transformed to $P$ with at most $k$ edits. More than 20 years ago, Cole
and Hariharan [SODA'98, J. Comput.'02] gave an $\mathcal{O}(n+k^4 \cdot n/
m)$-time algorithm for this classic problem, and this runtime has not been
improved since.
Here, we present an algorithm that runs in time $\mathcal{O}(n+k^{3.5}
\sqrt{\log m \log k} \cdot n/m)$, thus breaking through this long-standing
barrier. In the case where $n^{1/4+\varepsilon} \leq k \leq
n^{2/5-\varepsilon}$ for some arbitrarily small positive constant
$\varepsilon$, our algorithm improves over the state-of-the-art by polynomial
factors: it is polynomially faster than both the algorithm of Cole and
Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and
Vishkin [STOC'86, J. Algorithms'89].
We observe that the bottleneck case of the alternative $\mathcal{O}(n+k^4
\cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz
[FOCS'20] is when the text and the pattern are (almost) periodic. Our new
algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching),
which we solve by building on tools developed by Tiskin [SODA'10,
Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our
algorithm relies only on a small set of primitive operations on strings and
thus also applies to the fully-compressed setting (where text and pattern are
given as straight-line programs) and to the dynamic setting (where we maintain
a collection of strings under creation, splitting, and concatenation),
improving over the state of the art.