KDEformer: Accelerating Transformers via Kernel Density Estimation

Zandieh, Amir; Han, Insu; Daliri, Majid; Karbasi, Amin

Zandieh, A., Han, I., Daliri, M., & Karbasi, A. (2023). KDEformer: Accelerating Transformers via Kernel Density Estimation. Retrieved from https://arxiv.org/abs/2302.02451.

Item is released

Basic data

Item permalink: https://hdl.handle.net/21.11116/0000-000C-90F7-A
Version permalink: https://hdl.handle.net/21.11116/0000-000C-90F8-9

Genre: Research paper

Files

arXiv:2302.02451.pdf (Preprint), 5MB

File permalink:
https://hdl.handle.net/21.11116/0000-000C-90F9-8

Name:
arXiv:2302.02451.pdf

Description:
File downloaded from arXiv at 2023-02-10 09:56

OA status:
Not specified

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright Info:
-

License:
http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Creators

Creators:
Zandieh, Amir¹, Author
Han, Insu², Author
Daliri, Majid², Author
Karbasi, Amin², Author

Affiliations:
¹ Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019
² External Organizations, ou_persistent22

Content

Keywords: Computer Science, Learning, cs.LG; Computer Science, Computer Vision and Pattern Recognition, cs.CV; Computer Science, Data Structures and Algorithms, cs.DS

Abstract: The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling; however, naïve exact computation of this model incurs quadratic time and memory complexities in sequence length, hindering the training of long-sequence models. Critical bottlenecks are due to the computation of partition functions in the denominator of the softmax function as well as the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can be further utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in terms of accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than the exact computation with over 4× speedup. For ImageNet classification with T2T-ViT, KDEformer shows over 18× speedup while the accuracy drop is less than 0.5%.
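
The two bottlenecks named in the abstract can be made concrete. Write A = exp(QKᵀ/√d) entrywise and D = diag(A·1). Softmax attention is then D⁻¹AV, and each diagonal entry of D is the sum Σ_j exp(⟨q_i, k_j⟩/√d). Up to a factor of n, this sum is a kernel density estimate at the query q_i with an exponential kernel. The sketch below is a minimal illustration under stated assumptions and is not KDEformer's algorithm. It assumes standard 1/√d scaling. It also replaces the paper's KDE solver and importance sampling with plain uniform sampling of m keys per query, so it carries none of the paper's spectral-norm guarantees. The function names exact_attention, sampled_attention and exp_kernel are hypothetical.

```python
import math
import random


def exp_kernel(q, k, scale):
    """Unnormalized attention weight exp(scale * <q, k>)."""
    return math.exp(scale * sum(qi * ki for qi, ki in zip(q, k)))


def exact_attention(Q, K, V):
    """Exact softmax attention D^-1 A V in O(n^2 d) time."""
    scale = 1.0 / math.sqrt(len(Q[0]))  # assumed standard 1/sqrt(d) scaling
    out = []
    for q in Q:
        # One row of A = exp(Q K^T / sqrt(d)).
        weights = [exp_kernel(q, k, scale) for k in K]
        # Partition function (row sum of A): an unnormalized KDE evaluated at q.
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z
                    for c in range(len(V[0]))])
    return out


def sampled_attention(Q, K, V, m, seed=0):
    """Approximate attention from m keys per query, sampled uniformly without replacement.

    Hypothetical simplified stand-in, NOT KDEformer's KDE-based importance sampling.
    The same sample estimates both the partition function and the row of A V,
    so the n/m rescaling of each estimate cancels in their ratio. O(n m d) time.
    """
    rng = random.Random(seed)
    n = len(K)
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        idx = rng.sample(range(n), m)
        weights = [exp_kernel(q, K[j], scale) for j in idx]
        # (n / m) * z_hat is an unbiased estimate of the exact partition function.
        z_hat = sum(weights)
        out.append([sum(w * V[j][c] for w, j in zip(weights, idx)) / z_hat
                    for c in range(len(V[0]))])
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    n, d = 64, 8
    Q = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    K = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    exact = exact_attention(Q, K, V)
    approx = sampled_attention(Q, K, V, m=16)
    err = max(abs(a - b) for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
    print(f"max abs entry-wise error, m=16 of n=64 keys: {err:.3f}")
```

With m = n every key is used, and the result matches exact_attention up to floating-point rounding. Smaller m trades accuracy for time. The printed error is entry-wise. The abstract's guarantees are instead stated in spectral norm and rely on the paper's KDE-based sampling, which this sketch does not reproduce.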

Details

Language(s): eng - English

Dates: Created: 2023-02-05; Published online: 2023

Publication status: Published online

Pages: 26

Place, publisher, edition: -

Table of contents: -

Review method: -

Identifiers: arXiv:2302.02451
BibTeX citekey: zandieh2302.02451
URI: https://arxiv.org/abs/2302.02451

Degree type: -