Abstract:
Recent IR extensions to XML query languages, such as XPath 1.0 Full-Text or the
NEXI query language of the INEX benchmark series, reflect the emerging interest
in IR-style ranked retrieval over semistructured data.
TopX is a top-$k$ retrieval engine for text and semistructured data.
It terminates query execution as soon as it can safely determine
the $k$ top-ranked result elements according to a monotonic score aggregation
function with respect to a multidimensional query.
It efficiently supports vague search on both content- and structure-oriented
query conditions for dynamic query relaxation with controllable influence on
the result ranking.
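The early-termination idea behind top-$k$ processing with a monotonic aggregation function can be illustrated with a minimal sketch of the textbook Threshold Algorithm (Fagin et al.), using summation as the aggregation. This is a generic illustration, not necessarily the strategy TopX uses; the function name `threshold_topk` and the in-memory index lists are hypothetical, and random access is assumed to be a cheap dictionary lookup.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical index list: (doc_id, score) pairs sorted by descending score.
IndexList = List[Tuple[str, float]]


def threshold_topk(lists: List[IndexList], k: int) -> List[Tuple[str, float]]:
    """Textbook Threshold Algorithm with summation as the monotonic aggregation.

    Assumption: every list supports random access (modeled as a dict lookup;
    a document missing from a list scores 0.0 there).
    """
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}  # doc_id -> fully aggregated score
    max_len = max((len(lst) for lst in lists), default=0)

    for depth in range(max_len):
        # The threshold is the aggregate of the scores at the current
        # sorted-access depth; an exhausted list contributes 0.0.
        threshold = 0.0
        for lst in lists:
            if depth < len(lst):
                doc, score = lst[depth]
                threshold += score
                if doc not in seen:
                    # Random access to all lists yields the doc's exact score.
                    seen[doc] = sum(table.get(doc, 0.0) for table in lookup)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        # Safe to stop: no unseen doc can score above the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    # Lists exhausted before the stopping test succeeded.
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1])


if __name__ == "__main__":
    lists = [
        [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)],
        [("b", 0.7), ("a", 0.5), ("d", 0.4), ("c", 0.2)],
    ]
    print(threshold_topk(lists, 2))
```

In this example the algorithm returns `b` and `a` after reading only two of the four entries in each list, because the k-th best score found (1.4) already reaches the threshold (0.8 + 0.5 = 1.3) that bounds every unseen document.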
The main contributions of this paper are fourfold:
1) fully implemented models and algorithms for ranked XML retrieval with XPath
Full-Text functionality,
2) efficient and effective top-$k$ query processing for semistructured data,
3) support for integrating thesauri and ontologies with statistically
quantified relationships among concepts, leveraged for word-sense
disambiguation and query expansion, and
4) a comprehensive description of the TopX system, with performance experiments
on large-scale corpora like TREC Terabyte and INEX Wikipedia.
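Point 3 mentions statistically quantified relationships among concepts used for query expansion. One way such weights can enter a monotonic scoring function is sketched below, under stated assumptions: each expansion term carries a similarity in [0, 1] to the original concept (the original term itself having similarity 1.0), and the expanded condition scores an element by the maximum of similarity times term score. The function `expanded_score` and its inputs are hypothetical, and the max-based combination is an illustrative choice, not a description of TopX's exact formula.

```python
from typing import Dict


def expanded_score(
    element_scores: Dict[str, float],
    expansions: Dict[str, float],
) -> float:
    """Score one query concept for one element after weighted expansion.

    element_scores: hypothetical term -> content score of that term in the element.
    expansions: hypothetical term -> similarity to the original concept in [0, 1].
    Using max (not sum) keeps many weak related terms from outweighing one
    strong match, and the result stays monotonic in every term score.
    """
    return max(
        (sim * element_scores.get(term, 0.0) for term, sim in expansions.items()),
        default=0.0,
    )


if __name__ == "__main__":
    expansions = {"car": 1.0, "automobile": 0.9, "vehicle": 0.5}
    element_scores = {"automobile": 0.8, "vehicle": 0.9}
    print(expanded_score(element_scores, expansions))  # about 0.72
```

Because the combined score never decreases when any term score increases, it remains compatible with threshold-based early termination of the kind described in the abstract.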