Temporal Search in Web Archives


Berberich, Klaus
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

Berberich, K. (2010). Temporal Search in Web Archives. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-25996.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1456-9

Web archives include both archives of content originally published on
the Web (e.g., the Internet Archive) and archives of content published
long ago that is now accessible on the Web (e.g., the archive of The
Times). Thanks to growing awareness that web-born content is worth
preserving, and to improved digitization techniques, web archives have
grown in number and size. To realize their full potential, they need
search techniques that account for their inherent special
characteristics.

This work addresses three important problems toward this objective and
makes the following contributions:

* We present the Time-Travel Inverted indeX (TTIX) as an efficient
solution to time-travel text search in web archives, allowing users to
search only those parts of the web archive that existed at their time
of interest.

* To counter the negative effects of terminology evolution on the
quality of search results in web archives, we propose a novel
query-reformulation technique that retrieves old but highly relevant
documents in response to today's queries.

* For temporal information needs, for which the user is best satisfied
by documents that refer to particular times, we describe a retrieval
model that integrates temporal expressions (e.g., "in the 1990s")
seamlessly into a language modeling approach.
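To make the idea of time-travel text search concrete, here is a minimal sketch, assuming that each indexed document version contributes postings annotated with the validity interval [begin, end) during which that version existed. A query at time t then considers only postings whose interval contains t. The class name `TimeTravelIndex`, its methods, and the additive scoring are hypothetical illustrations, not the TTIX implementation; in particular, the index organization that makes TTIX efficient is not modeled here, and this naive version scans every posting of each query term.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: str
    score: float  # hypothetical per-term score contribution of this version
    begin: float  # version became valid at this time (inclusive)
    end: float    # version stopped being valid (exclusive); inf if current


class TimeTravelIndex:
    """Hypothetical sketch: an inverted index whose postings carry validity intervals."""

    def __init__(self) -> None:
        self.postings: dict[str, list[Posting]] = defaultdict(list)

    def add_version(self, doc_id: str, term_scores: dict[str, float],
                    begin: float, end: float = math.inf) -> None:
        # One posting per term of this document version, tagged with its lifetime.
        for term, score in term_scores.items():
            self.postings[term].append(Posting(doc_id, score, begin, end))

    def query(self, terms: list[str], t: float, k: int = 10) -> list[tuple[str, float]]:
        # Sum scores over postings valid at time t, i.e. begin <= t < end.
        totals: dict[str, float] = defaultdict(float)
        for term in terms:
            for p in self.postings.get(term, []):
                if p.begin <= t < p.end:
                    totals[p.doc_id] += p.score
        # Highest score first; ties broken by document id for determinism.
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Usage with made-up documents and scores: d1 changed in 2005, d2 appeared in 2004.
idx = TimeTravelIndex()
idx.add_version("d1", {"olympics": 0.8, "athens": 0.5}, begin=2003, end=2005)
idx.add_version("d1", {"olympics": 0.6, "beijing": 0.7}, begin=2005)
idx.add_version("d2", {"olympics": 0.4, "athens": 0.7}, begin=2004)
print(idx.query(["olympics", "athens"], t=2004))  # d1 ranks above d2
print(idx.query(["olympics", "athens"], t=2006))  # d2 ranks above d1
```

The same query returns different rankings depending on t, because only the document versions that existed at the chosen time contribute to the scores.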

Experiments for each of the proposed methods demonstrate their
efficiency or effectiveness, as appropriate, as well as the viability
of our approach to search in web archives.