  Mind the Gap: Large-scale Frequent Sequence Mining

Miliaraki, I., Berberich, K., Gemulla, R., & Zoupanos, S. (2013). Mind the Gap: Large-scale Frequent Sequence Mining. In K. Ross, D. Srivastava, D. Papadias, & S. Papadopoulos (Eds.), SIGMOD'13 (pp. 797-808). New York, NY: ACM. doi:10.1145/2463676.2465285.

Creators

 Creators:
Miliaraki, Iris [1], Author
Berberich, Klaus [1], Author
Gemulla, Rainer [1], Author
Zoupanos, Spyros [1], Author
Affiliations:
[1] Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
 Abstract: Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this paper, we propose PFSM, a scalable algorithm for frequent sequence mining on MapReduce. PFSM can handle so-called "gap constraints", which can be used to limit the output to a controlled set of frequent sequences. At its heart, PFSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of w-equivalency, which is a generalization of the notion of a "projected database" used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our extensive experimental study in the context of text mining suggests that PFSM is significantly more efficient and scalable than alternative approaches.
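
To make the problem statement concrete, the following is a minimal single-machine sketch of frequent sequence mining under a gap constraint. It is not PFSM: it does no partitioning, uses no MapReduce, and simply enumerates candidates by brute force. The function names (`gapped_subsequences`, `mine_frequent`) and the parameters `max_gap`, `max_len`, and `min_support` are assumptions chosen for illustration and do not come from the paper. Here a gap of `max_gap` means that at most `max_gap` items may be skipped between two consecutive items of a pattern, and the support of a pattern is the number of input sequences that contain it.

```python
from collections import Counter


def gapped_subsequences(seq, max_gap, max_len):
    """Return the distinct subsequences of `seq` with 2..max_len items in which
    at most `max_gap` items are skipped between consecutive chosen items."""
    found = set()

    def extend(prefix, last_idx):
        if len(prefix) >= 2:
            found.add(prefix)
        if len(prefix) == max_len:
            return
        # The next item may sit at most max_gap + 1 positions after the last one.
        stop = min(last_idx + max_gap + 2, len(seq))
        for nxt in range(last_idx + 1, stop):
            extend(prefix + (seq[nxt],), nxt)

    for start in range(len(seq)):
        extend((seq[start],), start)
    return found


def mine_frequent(database, min_support, max_gap, max_len):
    """Count each pattern once per input sequence and keep those whose
    support reaches `min_support` (brute force, illustrative only)."""
    counts = Counter()
    for seq in database:
        counts.update(gapped_subsequences(seq, max_gap, max_len))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Hypothetical toy database of three short sequences.
database = [("a", "b", "c"), ("a", "c"), ("a", "x", "b", "c")]
contiguous = mine_frequent(database, min_support=2, max_gap=0, max_len=3)
one_gap = mine_frequent(database, min_support=2, max_gap=1, max_len=3)
```

With `max_gap=0` only contiguous patterns are counted, so the toy database yields just `("b", "c")`. Allowing one skipped item additionally makes `("a", "b")`, `("a", "c")`, and `("a", "b", "c")` frequent, which illustrates how the gap parameter controls the size of the output.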

Details

Language(s): eng - English
 Dates: 2013-06-22; 2013
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: Miliaraki2013
Other: Local-ID: 086027E8ABA46DC6C1257B0F003D8C96-Miliaraki2013
DOI: 10.1145/2463676.2465285
 Degree: -

Event

Title: ACM SIGMOD International Conference on Management of Data
Place of Event: New York, NY, USA
Start-/End Date: 2013-06-22 - 2013-06-27

Source 1

Title: SIGMOD'13
  Subtitle : International Conference on Management of Data
  Abbreviation : SIGMOD 2013
Source Genre: Proceedings
 Creator(s):
Ross, Kenneth [1], Editor
Srivastava, Divesh [1], Editor
Papadias, Dimitris [1], Editor
Papadopoulos, Stavros [1], Editor
Affiliations:
[1] External Organizations, ou_persistent22
Publ. Info: New York, NY : ACM
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 797 - 808
Identifier: ISBN: 978-1-4503-2037-5