
Record


Released

Conference Paper

TopX 2.0 at the INEX 2008 Efficiency Track

MPG Authors
/persons/resource/persons45609

Theobald, Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons43965

Abujarour, Mohammed
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45380

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Theobald, M., Abujarour, M., & Schenkel, R. (2008). TopX 2.0 at the INEX 2008 Efficiency Track. In S. Geva, J. Kamps, & A. Trotman (Eds.), Advances in Focused Retrieval (pp. 224-236). Berlin: Springer. doi:10.1007/978-3-642-03761-0_23.


Citation link: http://hdl.handle.net/11858/00-001M-0000-0019-B639-4
Abstract
For the INEX Efficiency Track 2008, we were just in time to finish and (for the first time) evaluate our brand-new TopX 2.0 prototype. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we now switched to a compressed object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. The core of the new engine is a multiple-nested block-index structure that seamlessly integrates top-k-style sorted access to large blocks stored as inverted files on disk with in-memory merge-joins for efficient score aggregations. The main challenge in designing this new index structure was to marry no fewer than three different paradigms in search engine design: 1) sorting blocks in descending order of the maximum element score they contain for threshold-based candidate pruning and top-k-style early termination; 2) sorting elements within each block by their id to support efficient in-memory merge-joins; and 3) encoding both structural and content-related information into a single, unified index structure. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. TopX 2.0 achieves overall runtimes of less than 51 seconds for the entire batch of 568 Efficiency Track topics in their content-and-structure (CAS) version and less than 29 seconds for the content-only (CO) version, using a top-15, focused (i.e., non-overlapping) retrieval mode, which corresponds to an average of merely 89 ms per CAS query and 49 ms per CO query.
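
The first two design paradigms named in the abstract can be illustrated with a small in-memory sketch. It is not the TopX 2.0 implementation (which is written in C++ and reads compressed inverted files from disk); it is a classic threshold-algorithm-style loop under simplifying assumptions: one block per document and term, summation as the score aggregation, and a hypothetical dictionary lookup standing in for access to a document's blocks in the other lists. All names (`build_index`, `merge_join`, `top_k`) and the sample scores are invented for illustration.

```python
import heapq
from itertools import zip_longest


def build_index(postings):
    """postings: {doc_id: [(element_id, score), ...]} for a single term.

    Returns blocks as (max_score, doc_id, entries). Entries are sorted by
    element id (for merge-joins); blocks are sorted by descending maximum
    element score (for threshold-based pruning).
    """
    blocks = []
    for doc_id, entries in postings.items():
        entries = sorted(entries)
        blocks.append((max(score for _, score in entries), doc_id, entries))
    blocks.sort(key=lambda block: (-block[0], block[1]))
    return blocks


def merge_join(left, right):
    """Merge two id-sorted entry lists, summing scores of matching ids."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        (left_id, left_score), (right_id, right_score) = left[i], right[j]
        if left_id == right_id:
            out.append((left_id, left_score + right_score))
            i += 1
            j += 1
        elif left_id < right_id:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def top_k(term_lists, k):
    """Return (top-k results, number of blocks read via sorted access)."""
    # Assumption: per-document lookup of blocks, one table per term list.
    lookup = [{doc: entries for _, doc, entries in blocks} for blocks in term_lists]
    seen, heap, blocks_read = set(), [], 0  # heap: min-heap of (score, doc, element)
    for row in zip_longest(*term_lists):  # round-robin sorted access
        for block in row:
            if block is None:  # this term list is exhausted
                continue
            blocks_read += 1
            doc_id = block[1]
            if doc_id in seen:
                continue
            seen.add(doc_id)
            merged = []
            for table in lookup:  # aggregate the document's scores over all terms
                merged = merge_join(merged, table.get(doc_id, []))
            for element_id, score in merged:
                heapq.heappush(heap, (score, doc_id, element_id))
                if len(heap) > k:
                    heapq.heappop(heap)
        # No element of an unseen document can score above this bound.
        threshold = sum(block[0] for block in row if block is not None)
        if len(heap) == k and heap[0][0] >= threshold:
            break  # early termination
    return sorted(heap, reverse=True), blocks_read


# Made-up scores for two query terms (powers of two keep float sums exact).
xml_index = build_index({1: [(12, 0.5), (10, 0.75)], 2: [(20, 0.25)], 3: [(30, 0.125)]})
retrieval_index = build_index({1: [(10, 0.5)], 4: [(40, 0.25)], 3: [(30, 0.125)]})

results, blocks_read = top_k([xml_index, retrieval_index], k=1)
print(results, blocks_read)
```

With this sample data, the top-1 query stops after reading 2 of the 6 blocks: element 10 of document 1 already reaches an aggregated score of 1.25, which equals the upper bound for anything not yet seen. The third paradigm from the abstract, encoding structural information into the same index, is not modelled here.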