TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

Theobald, Martin; Bast, Holger; Majumdar, Debapriyo; Schenkel, Ralf; Weikum, Gerhard

doi:10.1007/s00778-007-0072-z

Theobald, M., Bast, H., Majumdar, D., Schenkel, R., & Weikum, G. (2008). TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data. VLDB Journal, 17(1), 81-115. doi:10.1007/s00778-007-0072-z.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-1D3B-3
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0028-EF14-2

Genre: Journal Article

Latex : Top{X}: Efficient and Versatile Top-k Query Processing for Semistructured Data

Creators

Creators:
Theobald, Martin¹, Author
Bast, Holger², Author
Majumdar, Debapriyo², Author
Schenkel, Ralf¹, Author
Weikum, Gerhard¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
² Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019

Content

Free keywords: -

Abstract: Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: 1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, 2) efficient and effective top-k query processing for semistructured data, 3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and 4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
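
Illustration (not part of the original record): the early-termination idea in the abstract can be sketched with a generic threshold-style top-k loop. This is not the TopX algorithm itself, whose details this record does not give. It is a minimal sketch that assumes summation as the monotonic aggregation function, one score-sorted index list per query dimension, and cheap random access to scores. The function name `top_k_threshold` and the data layout are hypothetical.

```python
import heapq


def top_k_threshold(lists, k):
    """Generic threshold-style top-k over score-sorted lists (illustrative only).

    lists: one list per query dimension, each holding (doc_id, score) pairs
           sorted by descending score. Scores are assumed non-negative.
    Returns (top_k, depth): the k best (total_score, doc_id) pairs and the
    number of sorted-access rounds performed before stopping.
    """
    # Random-access tables: score of a document in each dimension.
    lookup = [dict(lst) for lst in lists]
    seen = set()
    heap = []  # min-heap holding the current top-k as (total_score, doc_id)
    depth = 0
    max_depth = max((len(lst) for lst in lists), default=0)

    while depth < max_depth:
        threshold = 0.0  # upper bound on the total score of any unseen document
        for lst in lists:
            if depth >= len(lst):
                continue  # exhausted list contributes 0 to the bound
            doc, score = lst[depth]
            threshold += score
            if doc in seen:
                continue
            seen.add(doc)
            # Monotonic aggregation (assumed here): sum over all dimensions.
            total = sum(table.get(doc, 0.0) for table in lookup)
            if len(heap) < k:
                heapq.heappush(heap, (total, doc))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, doc))
        depth += 1
        # Safe to stop: no unseen document can beat the current k-th score.
        if len(heap) == k and heap[0][0] >= threshold:
            break

    return sorted(heap, key=lambda entry: (-entry[0], entry[1])), depth


if __name__ == "__main__":
    demo = [
        [("a", 1.0), ("b", 0.75), ("c", 0.25), ("d", 0.125)],
        [("b", 0.5), ("a", 0.5), ("d", 0.25), ("c", 0.125)],
    ]
    print(top_k_threshold(demo, 2))
```

On the demo input the loop stops after two of the four possible rounds, because the sum of the scores at the current scan positions already falls to the second-best total found so far. Monotonicity of the aggregation function is what makes that bound valid.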

Details

Language(s): eng - English

Dates: Modified: 2009-03-20; Published Online: 2008; Date issued: 2008

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: Peer

Identifiers: eDoc: 428235
DOI: 10.1007/s00778-007-0072-z
URI: http://dx.doi.org/10.1007/s00778-007-0072-z
Other: Local-ID: C125756E0038A185-796B8D374422B2ACC125730500386AC0-TheobaldBMSW_VLDBJ

Degree: -

Source 1

Title: VLDB Journal

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Berlin : Springer

Pages: -
Volume / Issue: 17 (1)
Sequence Number: -
Start / End Page: 81 - 115
Identifier: ISSN: 1066-8888