User Manual Privacy Policy Disclaimer Contact us
  Advanced SearchBrowse




Conference Paper

Computing the threshold for q-gram filters


Kärkkäinen,  Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

There are no locators available
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available

Kärkkäinen, J. (2002). Computing the threshold for q-gram filters. In Algorithm theory, SWAT 2002: 8th Scandinavian Workshop on Algorithm Theory (pp. 348-357). Berlin, Germany: Springer.

Cite as: http://hdl.handle.net/11858/00-001M-0000-000F-2F48-7
A popular and much studied class of filters for approximate string matching is based on finding common $q$-grams, substrings of length $q$, between the pattern and the text. A variation of the basic idea uses \emph{gapped} $q$-grams and has been recently shown to provide significant improvements in practice. A major difficulty with gapped $q$-gram filters is the computation of the so-called \emph{threshold} which defines the filter criterium. We describe the first general method for computing the threshold for $q$-gram filters. The method is based on a carefully chosen precise statement of the problem which is then transformed into a constrained shortest path problem. In its generic form the method leaves certain parts open but is applicable to a large variety of $q$-gram filters and may be extensible even to other classes of filters. We also give a full algorithm for a specific subclass. For this subclass, the algorithm has been implemented and used succesfully in an experimental comparison.