  Faster Approximate Pattern Matching: A Unified Approach

Charalampopoulos, P., Kociumaka, T., & Wellnitz, P. (2020). Faster Approximate Pattern Matching: A Unified Approach. Retrieved from https://arxiv.org/abs/2004.08350.

Basic

Genre: Paper
Latex : Faster Approximate Pattern Matching: {A} Unified Approach

Files

arXiv:2004.08350.pdf (Preprint), 2MB
Name:
arXiv:2004.08350.pdf
Description:
File downloaded from arXiv at 2020-12-10 08:25
OA-Status:
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
-

Creators

 Creators:
Charalampopoulos, Panagiotis [1], Author
Kociumaka, Tomasz [1], Author
Wellnitz, Philip [2], Author
Affiliations:
[1] External Organizations, ou_persistent22
[2] Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019

Content

Free keywords: Computer Science, Data Structures and Algorithms, cs.DS
 Abstract: Approximate pattern matching is a natural and well-studied problem on
strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the
starting positions of) all substrings of $T$ that are at distance at most $k$
from $P$. We consider the two most fundamental string metrics: the Hamming
distance and the edit distance. Under the Hamming distance, we search for
substrings of $T$ that have at most $k$ mismatches with $P$, while under the
edit distance, we search for substrings of $T$ that can be transformed to $P$
with at most $k$ edits.
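
To make the two problem variants concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper; it only restates the definitions above. The function names (`hamming_occurrences`, `edit_distance`, `edit_occurrences`) are illustrative assumptions and do not come from the source.

```python
def hamming_occurrences(text, pattern, k):
    """Start positions i where text[i:i+m] has at most k mismatches with pattern."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        if mismatches <= k:
            result.append(i)
    return result


def edit_distance(a, b):
    """Classic dynamic program: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def edit_occurrences(text, pattern, k):
    """Start positions i such that some substring text[i:j] is within k edits of pattern."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n + 1):
        # A substring within k edits of the pattern has length in [m - k, m + k].
        lo = max(i, i + m - k)
        hi = min(n, i + m + k)
        if any(edit_distance(text[i:j], pattern) <= k for j in range(lo, hi + 1)):
            result.append(i)
    return result


print(hamming_occurrences("abcabdabc", "abc", 1))  # [0, 3, 6]
```

The brute-force approach takes time polynomial in $|T| \cdot |P|$ and serves only as a reference for what counts as an occurrence under each metric.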
Exact occurrences of $P$ in $T$ have a very simple structure: If we assume
for simplicity that $|T| \le 3|P|/2$ and trim $T$ so that $P$ occurs both as a
prefix and as a suffix of $T$, then both $P$ and $T$ are periodic with a common
period. However, an analogous characterization for the structure of occurrences
with up to $k$ mismatches was proved only recently by Bringmann et al.
[SODA'19]: Either there are $O(k^2)$ $k$-mismatch occurrences of $P$ in $T$, or
both $P$ and $T$ are at Hamming distance $O(k)$ from strings with a common
period $O(m/k)$, where $m = |P|$. We tighten this characterization by showing that there are
$O(k)$ $k$-mismatch occurrences in the case when the pattern is not
(approximately) periodic, and we lift it to the edit distance setting, where we
tightly bound the number of $k$-edit occurrences by $O(k^2)$ in the
non-periodic case. Our proofs are constructive and let us obtain a unified
framework for approximate pattern matching for both considered distances. We
showcase the generality of our framework with results for the fully-compressed
setting (where $T$ and $P$ are given as a straight-line program) and for the
dynamic setting (where we extend a data structure of Gawrychowski et al.
[SODA'18]).
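
The structural claim about exact occurrences can be checked on a small example. The sketch below is only a sanity check under the abstract's assumptions ($|T| \le 3|P|/2$, with $P$ occurring both as a prefix and as a suffix of $T$), not part of the paper's framework; the helper names (`exact_occurrences`, `has_period`, `check_common_period`) are hypothetical. It uses the shift $|T| - |P|$ between the first and the last occurrence as the candidate common period.

```python
def exact_occurrences(text, pattern):
    """All start positions of exact occurrences of pattern in text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def has_period(s, p):
    """True if s[i] == s[i + p] for every valid index i."""
    return all(s[i] == s[i + p] for i in range(len(s) - p))


def check_common_period(text, pattern):
    """Return (p, ok): p = |T| - |P| and whether p is a period of both strings."""
    n, m = len(text), len(pattern)
    if 2 * n > 3 * m:
        raise ValueError("expected |T| <= 3|P|/2")
    if not (text.startswith(pattern) and text.endswith(pattern)):
        raise ValueError("expected P as both a prefix and a suffix of T")
    p = n - m  # shift between the first and the last occurrence
    return p, has_period(pattern, p) and has_period(text, p)


T, P = "ababababab", "abababab"
print(exact_occurrences(T, P))    # [0, 2]
print(check_common_period(T, P))  # (2, True)
```

In this example the occurrences start at positions 0 and 2, and both strings have period 2, which is at most $|P|/2$.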

Details

Language(s): eng - English
 Dates: 2020-04-17, 2020-11-16, 2020
 Publication Status: Published online
 Pages: 74 p.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 2004.08350
BibTex Citekey: Charalampopoulos_arXiv2004.08350
URI: https://arxiv.org/abs/2004.08350
 Degree: -
