Free keywords: -
Abstract:
As Web 2.0 and enterprise-cloud applications have proliferated, data mining
algorithms increasingly need to be (re)designed to handle web-scale
datasets. For this reason, low-rank matrix factorization has received
considerable attention in recent years, since it is fundamental to a variety of mining
tasks, such as topic detection and collaborative filtering, that are
increasingly being applied to massive datasets. We provide a novel algorithm
to approximately factor large matrices with millions of rows, millions of
columns, and billions of nonzero elements. Our approach rests on stochastic
gradient descent (SGD), an iterative stochastic optimization algorithm; the
idea is to exploit the special structure of the matrix factorization problem
to develop a new "stratified" SGD variant that can be fully distributed
and run on web-scale datasets using, e.g., MapReduce. The resulting
distributed SGD factorization algorithm, called DSGD, provides good speed-up
and handles a wide variety of matrix factorizations. We establish
convergence properties of DSGD using results from stochastic approximation
theory and regenerative process theory, and also describe the practical
techniques used to optimize performance in our DSGD
implementation. Experiments suggest that DSGD converges significantly faster
and has better scalability properties than alternative algorithms.
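
To make the stratification idea concrete, the following minimal Python sketch illustrates one way a stratified SGD epoch for matrix factorization can be organized. It is an illustration under assumptions, not the paper's implementation: the sparse input is assumed to be a dict of nonzeros, and the function name dsgd_sketch and parameters (rank, lr, reg, and the grid size d) are hypothetical placeholders. The key property is that the d blocks forming a stratum touch disjoint row and column stripes, so in DSGD they could be processed by separate workers (e.g., MapReduce tasks) without conflicting updates; the sketch simply loops over them sequentially.

    import numpy as np

    def dsgd_sketch(V, m, n, rank=10, epochs=10, lr=0.01, reg=0.05, d=4, seed=0):
        # V: dict mapping (i, j) -> observed value of an m x n sparse matrix.
        # Returns factors W (m x rank) and H (rank x n) with V ~ W @ H.
        # All parameter names and defaults here are illustrative assumptions.
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=0.1, size=(m, rank))
        H = rng.normal(scale=0.1, size=(rank, n))

        # Partition the nonzeros into a d x d grid of blocks by row/column stripe.
        blocks = {(bi, bj): [] for bi in range(d) for bj in range(d)}
        for (i, j), v in V.items():
            blocks[(i * d // m, j * d // n)].append((i, j, v))

        for _ in range(epochs):
            for s in range(d):      # one sub-epoch per stratum
                # The blocks (b, (b + s) % d), b = 0..d-1, share no rows or
                # columns, so their SGD updates never touch the same entries
                # of W or H and could run in parallel.
                for b in range(d):
                    for i, j, v in blocks[(b, (b + s) % d)]:
                        err = v - W[i] @ H[:, j]
                        w_old = W[i].copy()
                        # Regularized squared-loss SGD step on both factors.
                        W[i] += lr * (err * H[:, j] - reg * W[i])
                        H[:, j] += lr * (err * w_old - reg * H[:, j])
        return W, H

For example, dsgd_sketch({(0, 0): 5.0, (1, 2): 3.0}, m=2, n=3, d=1) factors a tiny 2 x 3 matrix sequentially; in the distributed setting, each block of a stratum would instead be shipped to its own worker together with the matching stripes of W and H.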