Journal Article

Efficient Index-based Snippet Generation

MPS-Authors
Celikik, Marjan
Algorithms and Complexity, MPI for Informatics, Max Planck Society

Bast, Holger
Algorithms and Complexity, MPI for Informatics, Max Planck Society

Citation

Celikik, M., Bast, H., & Manolache, G. (2009). Efficient Index-based Snippet Generation. ACM Transactions on Information Systems, 32(2): 6, pp. 6:1-6:24. doi:10.1145/2590972.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1837-2
Abstract
Ranked result lists with query-dependent snippets have become state of the art in text search. They are typically implemented by searching, at query time, for occurrences of the query words in the top-ranked documents. This document-based approach has three inherent problems: (i) when a document is indexed by terms that it does not contain literally (e.g., related words or spelling variants), localizing the corresponding snippets becomes problematic; (ii) each query operator (e.g., phrase or proximity search) has to be implemented twice, once on the index side to compute the correct result set and once on the snippet-generation side to generate the appropriate snippets; and (iii) in the worst case, the whole document has to be scanned for occurrences of the query words, which is problematic for very long documents. We present a new index-based method that localizes snippets using information computed solely from the index and that overcomes all three problems. Unlike previous index-based methods, our method achieves this at essentially no extra cost in query processing time, by a technique we call query rotation. We also show how our index-based method allows caching individual segments instead of complete documents, which yields a significantly higher cache hit ratio than the document-based approach. We have fully integrated our implementation with the CompleteSearch engine.
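The abstract contrasts scanning documents at query time with localizing snippets from index information alone, and caching segments instead of whole documents. The Python sketch below illustrates only that general idea under stated assumptions: a hypothetical positional index mapping each term to `(doc_id, position)` postings, documents split into fixed-size segments of a hypothetical `SEGMENT_SIZE` words, and a simple "most distinct query terms per segment" score. It does not implement the paper's query rotation technique, and every name in it (`best_segments`, `SegmentCache`, `load_segment`) is illustrative rather than taken from the paper or from CompleteSearch.

```python
from collections import OrderedDict, defaultdict

# Hypothetical: number of words per document segment.
SEGMENT_SIZE = 50


def best_segments(index, query_terms, top_docs, segment_size=SEGMENT_SIZE):
    """For each top-ranked document, pick the segment that contains the most
    distinct query terms, using only positions stored in the (hypothetical)
    positional index: term -> list of (doc_id, position). No document text
    is read."""
    wanted = set(top_docs)
    # doc_id -> segment_id -> set of query terms occurring in that segment
    hits = defaultdict(lambda: defaultdict(set))
    for term in query_terms:
        for doc_id, pos in index.get(term, []):
            if doc_id in wanted:
                hits[doc_id][pos // segment_size].add(term)
    result = {}
    for doc_id in top_docs:
        segs = hits.get(doc_id)  # .get() does not create a default entry
        if not segs:
            continue
        # Most distinct terms first; ties go to the earlier segment.
        result[doc_id] = max(segs, key=lambda s: (len(segs[s]), -s))
    return result


class SegmentCache:
    """LRU cache keyed by (doc_id, segment_id) instead of by whole document."""

    def __init__(self, capacity, load_segment):
        self.capacity = capacity
        self.load_segment = load_segment  # hypothetical loader: (doc_id, seg_id) -> str
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, doc_id, seg_id):
        key = (doc_id, seg_id)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        text = self.load_segment(doc_id, seg_id)
        self.entries[key] = text
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return text


if __name__ == "__main__":
    # Toy index: doc 1 has "snippet" and "index" close together in segment 2.
    index = {
        "snippet": [(1, 3), (1, 120), (2, 10)],
        "index": [(1, 125), (2, 400)],
    }
    chosen = best_segments(index, ["snippet", "index"], [1, 2])
    print(chosen)  # {1: 2, 2: 0}

    cache = SegmentCache(2, lambda d, s: f"doc{d}-seg{s}")
    for doc_id, seg_id in chosen.items():
        cache.get(doc_id, seg_id)
    print(cache.hits, cache.misses)  # 0 2
```

Because each lookup touches only `(doc_id, segment_id)` keys, a cache of a given size in this sketch holds just the segments that queries actually hit rather than entire documents; that is one plausible intuition for the higher hit ratio the abstract reports for segment caching, not a reproduction of the paper's measurements.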