Top-k Query Evaluation with Probabilistic Guarantees

Theobald, Martin; Weikum, Gerhard; Schenkel, Ralf; Nascimento, Mario A.; Özsu, M. Tamer; Kossmann, Donald; Miller, Renée J.; Blakeley, José A.; Schiefer, K. Bernhard

Conference Paper

MPS-Authors

Theobald, Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Theobald, M., Weikum, G., & Schenkel, R. (2004). Top-k Query Evaluation with Probabilistic Guarantees. In Proceedings 2004 VLDB Conference: The 30th International Conference on Very Large Databases (VLDB) (pp. 648-659). St. Louis, USA: Morgan Kaufmann.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-29E8-B

Abstract

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best-known general-purpose algorithm for evaluating top-k queries is Fagin’s threshold algorithm (TA). Since the user’s goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximative variants of TA to reduce run-time costs. This paper introduces a family of approximative top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.
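The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that scores are aggregated by summation, that a candidate's unseen local scores are independent, and that each is uniformly distributed between 0 and the score at the current scan position of its index list. The names `convolve`, `exceed_probability`, `should_drop`, `min_k`, `epsilon`, and `step` are hypothetical.

```python
def convolve(p, q):
    """Distribution of the sum of two independent discrete variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def exceed_probability(highs, delta, step=0.01):
    """Estimate P(sum of unseen scores > delta).

    Assumption (not from the paper): each unseen score is uniform on the
    grid {0, step, 2*step, ..., high}, where high is the score at the
    current scan position of the corresponding index list.
    """
    dist = [1.0]  # point mass at 0 before any list is added
    for high in highs:
        n = int(round(high / step))
        dist = convolve(dist, [1.0 / (n + 1)] * (n + 1))
    return sum(p for i, p in enumerate(dist) if i * step > delta)


def should_drop(partial_score, highs_unseen, min_k, epsilon=0.05):
    """Drop a candidate if it is unlikely to beat the current k-th score.

    partial_score: sum of the local scores already seen for the candidate
    highs_unseen:  current scan-position scores of the lists where the
                   candidate has not been seen yet
    min_k:         aggregated score of the current rank-k item
    epsilon:       tolerated probability of wrongly dropping a candidate
    """
    delta = min_k - partial_score
    return exceed_probability(highs_unseen, delta) < epsilon


if __name__ == "__main__":
    # The best case 0.1 + 0.5 + 0.5 = 1.1 still exceeds min_k = 1.0, so a
    # conservative bound test would keep this candidate; the probabilistic
    # test drops it because reaching 1.0 is unlikely under the assumptions.
    print(should_drop(0.1, [0.5, 0.5], min_k=1.0))
    # A candidate that is already close to min_k is kept.
    print(should_drop(0.9, [0.5], min_k=1.0))
```

Setting `epsilon` to 0 in this sketch reduces the test to the conservative rule of dropping a candidate only when even its best possible score cannot exceed `min_k`; larger values trade result precision for fewer candidates to track.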