非表示:
キーワード:
-
要旨:
A popular and much studied class of filters for approximate string
matching is based on finding common $q$-grams, substrings of length
$q$, between the pattern and the text. A variation of the basic idea uses
\emph{gapped} $q$-grams and has been recently shown to provide
significant improvements in practice. A major difficulty with gapped
$q$-gram filters is the computation of the so-called \emph{threshold}
which defines the filter criterium. We describe the first general
method for computing the threshold for $q$-gram filters. The method is
based on a carefully chosen precise statement of the problem which is
then transformed into a constrained shortest path problem.
In its generic form the method leaves certain parts open
but is applicable to a large variety of $q$-gram filters and may be
extensible even to other classes of filters. We also give a full
algorithm for a specific subclass. For this subclass, the algorithm
has been implemented and used succesfully in an experimental
comparison.