Help Privacy Policy Disclaimer
  Advanced SearchBrowse




Conference Paper

Computing the threshold for q-gram filters


Kärkkäinen,  Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available

Kärkkäinen, J. (2002). Computing the threshold for q-gram filters. In Algorithm theory, SWAT 2002: 8th Scandinavian Workshop on Algorithm Theory (pp. 348-357). Berlin, Germany: Springer.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-2F48-7
A popular and much studied class of filters for approximate string matching is based on finding common $q$-grams, substrings of length $q$, between the pattern and the text. A variation of the basic idea uses \emph{gapped} $q$-grams and has been recently shown to provide significant improvements in practice. A major difficulty with gapped $q$-gram filters is the computation of the so-called \emph{threshold} which defines the filter criterium. We describe the first general method for computing the threshold for $q$-gram filters. The method is based on a carefully chosen precise statement of the problem which is then transformed into a constrained shortest path problem. In its generic form the method leaves certain parts open but is applicable to a large variety of $q$-gram filters and may be extensible even to other classes of filters. We also give a full algorithm for a specific subclass. For this subclass, the algorithm has been implemented and used succesfully in an experimental comparison.