  Mind the Gap: Large-scale Frequent Sequence Mining

Miliaraki, I., Berberich, K., Gemulla, R., & Zoupanos, S. (2013). Mind the Gap: Large-scale Frequent Sequence Mining. In K. Ross, D. Srivastava, D. Papadias, & S. Papadopoulos (Eds.), SIGMOD'13 (pp. 797-808). New York, NY: ACM. doi:10.1145/2463676.2465285.

Creators

Creators:
Miliaraki, Iris (1), Author
Berberich, Klaus (1), Author
Gemulla, Rainer (1), Author
Zoupanos, Spyros (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: -
Abstract: Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this paper, we propose PFSM, a scalable algorithm for frequent sequence mining on MapReduce. PFSM can handle so-called "gap constraints", which can be used to limit the output to a controlled set of frequent sequences. At its heart, PFSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of w-equivalency, which is a generalization of the notion of a "projected database" used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our extensive experimental study in the context of text mining suggests that PFSM is significantly more efficient and scalable than alternative approaches.
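
To make the problem statement in the abstract concrete, the following is a minimal single-machine sketch of frequent sequence mining under a gap constraint. It is not PFSM and does not use MapReduce, partitioning, or w-equivalency; it only enumerates candidates naively. The function names and the parameters `min_support`, `max_gap`, and `max_len` are assumptions chosen for illustration, and the record itself does not define them. Here the support of a pattern is assumed to be the number of input sequences that contain it, and the gap is assumed to be the number of items skipped between two consecutive pattern items.

```python
from collections import Counter


def gapped_subsequences(seq, max_gap, max_len):
    """Return the set of subsequences of `seq` with 1..max_len items in which
    at most `max_gap` input items are skipped between consecutive items."""
    found = set()

    def extend(prefix, last_idx):
        found.add(prefix)
        if len(prefix) == max_len:
            return
        # The next item may sit at most max_gap + 1 positions after the last one.
        upper = min(last_idx + max_gap + 2, len(seq))
        for nxt in range(last_idx + 1, upper):
            extend(prefix + (seq[nxt],), nxt)

    for start in range(len(seq)):
        extend((seq[start],), start)
    return found


def mine_frequent_sequences(database, min_support, max_gap, max_len):
    """Count, for every gap-constrained subsequence, how many input sequences
    contain it, and keep those that reach `min_support` (hypothetical names)."""
    counts = Counter()
    for seq in database:
        # A set per input sequence, so each sequence contributes at most 1.
        counts.update(gapped_subsequences(seq, max_gap, max_len))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    database = [("a", "b", "c"), ("a", "c"), ("a", "x", "b", "c")]
    for pattern, n in sorted(mine_frequent_sequences(database, 2, 1, 2).items()):
        print(pattern, n)
```

Raising `max_gap` admits more candidate patterns per input sequence, which illustrates why a gap constraint can be used to keep the output set under control. The enumeration cost grows quickly with `max_gap` and `max_len`, so this sketch is suitable only for small inputs.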

Details

Language(s): eng - English
Date: 2013-06-22; 2013
Publication status: Published
Pages: -
Place, Publisher, Edition: -
Table of contents: -
Review method: -
Identifiers: BibTeX Citekey: Miliaraki2013
Other: Local-ID: 086027E8ABA46DC6C1257B0F003D8C96-Miliaraki2013
DOI: 10.1145/2463676.2465285
Degree type: -

Event

Title: ACM SIGMOD International Conference on Management of Data
Place of event: New York, NY, USA
Start/end date: 2013-06-22 - 2013-06-27

Source 1

Title: SIGMOD'13
Subtitle: International Conference on Management of Data
Short title: SIGMOD 2013
Source genre: Proceedings
Creators:
Ross, Kenneth (1), Editor
Srivastava, Divesh (1), Editor
Papadias, Dimitris (1), Editor
Papadopoulos, Stavros (1), Editor
Affiliations:
(1) External Organizations, ou_persistent22
Place, Publisher, Edition: New York, NY : ACM
Pages: -
Volume / Issue: -
Article number: -
Start / end page: 797 - 808
Identifier: ISBN: 978-1-4503-2037-5