Index Maintenance for Time-Travel Text Search

Anand, Avishek; Bedathur, Srikanta; Berberich, Klaus; Schenkel, Ralf

doi:10.1145/2348283.2348318

Conference Paper

MPG Authors

/persons/resource/persons44012

Anand, Avishek
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

/persons/resource/persons44104

Bedathur, Srikanta
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons44119

Berberich, Klaus
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45380

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Anand, A., Bedathur, S., Berberich, K., & Schenkel, R. (2012). Index Maintenance for Time-Travel Text Search. In J. Callan, W. Hersh, Y. Maarek, & M. Sanderson (Eds.), SIGIR'12 (pp. 235-244). New York, NY: ACM.

Citation link: https://hdl.handle.net/11858/00-001M-0000-0014-59CF-5

Abstract

Time-travel text search enriches standard text search with temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
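
To make the ideas named in the abstract more concrete (sharding, an in-memory buffer, append-only maintenance), the following is a minimal toy sketch in Python. It is not the authors' index structure. The class name `ShardedTimeTravelIndex`, its methods, and the greedy shard assignment are all hypothetical. The sketch assumes that each document version arrives with a known validity interval [begin, end) and that versions arrive in non-decreasing order of begin time. Under those assumptions every shard stays sorted by both begin and end time, so a point-in-time lookup reads no spurious postings at all; this is a stricter and simpler condition than the bounded number of spurious reads described in the abstract, and it may create more shards.

```python
from bisect import bisect_right
from collections import defaultdict


class ShardedTimeTravelIndex:
    """Toy per-term index; each shard is a set of append-only parallel lists.

    Assumption (not from the paper): versions are added in non-decreasing
    order of begin time, each with a known validity interval [begin, end).
    """

    def __init__(self, buffer_size=2):
        self.buffer_size = buffer_size
        self.buffer = []                  # pending (term, doc_id, begin, end)
        self.shards = defaultdict(list)   # term -> list of shards

    def add_version(self, doc_id, terms, begin, end):
        """Buffer one posting per distinct term; flush when the buffer fills."""
        for term in set(terms):
            self.buffer.append((term, doc_id, begin, end))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Append buffered postings to shards without rewriting old entries."""
        for term, doc_id, begin, end in self.buffer:
            # Greedy choice: first shard whose last end time is <= this end,
            # so end times stay non-decreasing inside the shard.
            for shard in self.shards[term]:
                if shard["ends"][-1] <= end:
                    break
            else:
                shard = {"begins": [], "ends": [], "docs": []}
                self.shards[term].append(shard)
            shard["begins"].append(begin)
            shard["ends"].append(end)
            shard["docs"].append(doc_id)
        self.buffer.clear()

    def query(self, term, t):
        """Return ids of versions containing `term` that are valid at time t."""
        hits = []
        for shard in self.shards.get(term, []):
            lo = bisect_right(shard["ends"], t)    # first posting with end > t
            hi = bisect_right(shard["begins"], t)  # postings with begin <= t
            hits.extend(shard["docs"][lo:hi])      # empty slice when lo >= hi
        # Postings still in the in-memory buffer are checked by a linear scan.
        for b_term, doc_id, begin, end in self.buffer:
            if b_term == term and begin <= t < end:
                hits.append(doc_id)
        return sorted(hits)


idx = ShardedTimeTravelIndex(buffer_size=2)
idx.add_version("d1", ["web", "archive"], 1, 10)
idx.add_version("d2", ["web"], 2, 5)
idx.add_version("d1v2", ["web"], 10, 20)
print(idx.query("web", 3))
print(idx.query("web", 10))
```

In this sketch a posting that ends earlier than the last posting of every existing shard opens a new shard, which is why `d2` above lands in a second shard for the term `web`. A real system would also have to handle versions whose end time is not yet known and would persist shards on disk; both are outside the scope of this illustration.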