KLEE: A Framework for Distributed Top-k Query Algorithms

Michel, Sebastian; Triantafillou, Peter; Weikum, Gerhard; Böhm, Klemens; Jensen, Christian S.; Haas, Laura M.; Kersten, Martin L.; Larson, Per-Åke; Ooi, Beng Chin

Conference Paper

MPS-Authors

Michel, Sebastian
Databases and Information Systems, MPI for Informatics, Max Planck Society

Triantafillou, Peter
Databases and Information Systems, MPI for Informatics, Max Planck Society

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society

Citation

Michel, S., Triantafillou, P., & Weikum, G. (2005). KLEE: A Framework for Distributed Top-k Query Algorithms. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005) (pp. 637-648). New York, USA: ACM.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-26E4-9

Abstract

This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories, where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queries, designed for high performance and flexibility. KLEE makes a strong case for approximate top-k algorithms over widely distributed data sources: it shows that large gains in efficiency can be obtained at low result-quality penalties. Further, KLEE gives the query-initiating peer the flexibility to trade off result quality against expected performance, and to trade off the number of communication phases engaged during query execution against network bandwidth performance. We have implemented KLEE and related algorithms and conducted a comprehensive performance evaluation. Our evaluation employed large real-world and synthetic web-data collections and query benchmarks. Our experimental results show that KLEE can achieve major performance gains in terms of network bandwidth, query response times, and peer load, all with small errors in result precision and other result-quality measures.
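The problem setting in the abstract can be illustrated with a small sketch. This is not the KLEE algorithm, whose details are not given in this record; it is a naive scheme chosen only to show the trade-off between the amount of data shipped and the quality of the result. The peer names, the integer scores, the `prefix` parameter, and the functions `top_k` and `precision` are all hypothetical, and the aggregate score is assumed to be a plain sum of per-term scores.

```python
from collections import defaultdict
from heapq import nlargest

# Hypothetical toy data: one index list per query term, each held by a
# different peer. Each list maps a document id to its score for that term.
PEER_LISTS = {
    "peer_a": {"d1": 90, "d2": 80, "d3": 30, "d4": 10},
    "peer_b": {"d2": 70, "d3": 60, "d1": 10, "d5": 20},
    "peer_c": {"d3": 90, "d1": 40, "d2": 20, "d6": 10},
}


def top_k(peer_lists, k, prefix=None):
    """Sum per-peer scores; return (top-k document ids, entries shipped).

    prefix=None ships every index entry to the query initiator (exact).
    prefix=m ships only each peer's m highest-scoring entries (approximate).
    """
    totals = defaultdict(int)
    shipped = 0
    for scores in peer_lists.values():
        entries = scores.items()
        if prefix is not None:
            entries = nlargest(prefix, entries, key=lambda e: e[1])
        for doc, score in entries:
            totals[doc] += score
            shipped += 1
    # Rank by total score, breaking ties by document id for determinism.
    ranked = sorted(totals.items(), key=lambda e: (-e[1], e[0]))
    return [doc for doc, _ in ranked[:k]], shipped


def precision(approx, exact):
    """Fraction of the exact top-k that the approximate result recovered."""
    return len(set(approx) & set(exact)) / len(exact)


exact, exact_cost = top_k(PEER_LISTS, k=2)
for m in (2, 1):
    approx, cost = top_k(PEER_LISTS, k=2, prefix=m)
    print(m, approx, cost, precision(approx, exact))
```

On this toy data, shipping two entries per peer halves the traffic while still recovering the exact top-2, whereas shipping one entry per peer misses one of the two results. A real system would need a principled way to choose how much each peer sends, which is the kind of decision the abstract attributes to the query-initiating peer.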