English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Paper

Faster Approximate Pattern Matching: A Unified Approach

MPS-Authors
/persons/resource/persons229250

Wellnitz,  Philip
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

arXiv:2004.08350.pdf
(Preprint), 2MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Charalampopoulos, P., Kociumaka, T., & Wellnitz, P. (2020). Faster Approximate Pattern Matching: A Unified Approach. Retrieved from https://arxiv.org/abs/2004.08350.


Cite as: https://hdl.handle.net/21.11116/0000-0007-8C63-C
Abstract
Approximate pattern matching is a natural and well-studied problem on
strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the
starting positions of) all substrings of $T$ that are at distance at most $k$
from $P$. We consider the two most fundamental string metrics: the Hamming
distance and the edit distance. Under the Hamming distance, we search for
substrings of $T$ that have at most $k$ mismatches with $P$, while under the
edit distance, we search for substrings of $T$ that can be transformed to $P$
with at most $k$ edits.
Exact occurrences of $P$ in $T$ have a very simple structure: If we assume
for simplicity that $|T| \le 3|P|/2$ and trim $T$ so that $P$ occurs both as a
prefix and as a suffix of $T$, then both $P$ and $T$ are periodic with a common
period. However, an analogous characterization for the structure of occurrences
with up to $k$ mismatches was proved only recently by Bringmann et al.
[SODA'19]: Either there are $O(k^2)$ $k$-mismatch occurrences of $P$ in $T$, or
both $P$ and $T$ are at Hamming distance $O(k)$ from strings with a common
period $O(m/k)$. We tighten this characterization by showing that there are
$O(k)$ $k$-mismatch occurrences in the case when the pattern is not
(approximately) periodic, and we lift it to the edit distance setting, where we
tightly bound the number of $k$-edit occurrences by $O(k^2)$ in the
non-periodic case. Our proofs are constructive and let us obtain a unified
framework for approximate pattern matching for both considered distances. We
showcase the generality of our framework with results for the fully-compressed
setting (where $T$ and $P$ are given as a straight-line program) and for the
dynamic setting (where we extend a data structure of Gawrychowski et al.
[SODA'18]).