Discovering and Exploiting Keyword and Attribute-Value Co-occurrences to Improve P2P Routing Indices

Michel, Sebastian; Bender, Matthias; Ntarmos, Nikos; Triantafillou, Peter; Weikum, Gerhard; Zimmer, Christian; Yu, Philip S.; Tsotras, Vassilis J.; Fox, Edward A.; Liu, Bing

Conference Paper

MPS-Authors

Michel, Sebastian
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Bender, Matthias
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Triantafillou, Peter
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Zimmer, Christian
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Michel, S., Bender, M., Ntarmos, N., Triantafillou, P., Weikum, G., & Zimmer, C. (2006). Discovering and Exploiting Keyword and Attribute-Value Co-occurrences to Improve P2P Routing Indices. In ACM 15th Conference on Information and Knowledge Management (CIKM2006) (pp. 172-181). New York, USA: ACM.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-2293-1

Abstract

Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality. This paper develops and evaluates two solutions to this problem: sk-STAT, based on single-key statistics only, and mk-STAT, based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.