English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

EverLast: A Distributed Architecture for Preserving the Web

MPS-Authors
/persons/resource/persons44012

Anand,  Avishek
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons44104

Bedathur,  Srikanta
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons44119

Berberich,  Klaus
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45380

Schenkel,  Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45639

Tryfonopoulos,  Christos
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Anand, A., Bedathur, S., Berberich, K., Schenkel, R., & Tryfonopoulos, C. (2009). EverLast: A Distributed Architecture for Preserving the Web. In Proceedings of the Joint Conference on Digital Libraries (pp. 331-340). New York, NY: ACM.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1910-0
Abstract
The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of data on the Web is highly ephemeral in nature, with more than 50-80% of content estimated to be changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have been tirelessly working towards preserving the ever changing Web. However, while these web archiving efforts have paid significant attention towards long term preservation of Web data, they have paid little attention to developing an globalscale infrastructure for collecting, archiving, and performing historical analyzes on the collected data. Based on insights from our recent work on building text analytics for Web Archives, we propose EverLast , a scalable distributed framework for next generation Web archival and temporal text analytics over the archive. Our system is built on a looselycoupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts taken mainly at a national level by national digital libraries. Key features of EverLast include support of time-based text search & analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast, and present some promising preliminary results.