Speaking the Same Language: Matching Machine to Human Captions by Adversarial 
Training

Shetty, Rakshith; Rohrbach, Marcus; Hendricks, Lisa Anne; Fritz, Mario; Schiele, Bernt

Please note that a newer version of this record is available:
https://pure.mpg.de/pubman/item/item_2456710_7

Shetty, R., Rohrbach, M., Hendricks, L. A., Fritz, M., & Schiele, B. (2017). Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training. Retrieved from http://arxiv.org/abs/1703.10476.

Item is released

Basic data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-002D-7CB3-3
Version permalink: https://hdl.handle.net/11858/00-001M-0000-002D-7CB4-1

Genre: Research paper

Files

arXiv:1703.10476.pdf (Preprint), 10MB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-002D-7CB5-0

Name:
arXiv:1703.10476.pdf

Description:
File downloaded from arXiv at 2017-06-26 09:35

OA status:

Visibility:
Public

MIME type / Checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright info:
-

License:
http://arxiv.org/help/license

External references

Creators

Creators:
Shetty, Rakshith¹, Author
Rohrbach, Marcus², Author
Hendricks, Lisa Anne², Author
Fritz, Mario¹, Author
Schiele, Bernt¹, Author

Affiliations:
¹ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547
² External Organizations, ou_persistent22

Content

Keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV, Computer Science, Artificial Intelligence, cs.AI, Computer Science, Computation and Language, cs.CL

Abstract: While strong progress has been made in image captioning over the last years, machine and human captions are still quite distinct. A closer look reveals that this is due to the deficiencies in the generated word distribution, vocabulary size, and strong bias in the generators towards frequent captions. Furthermore, humans -- rightfully so -- generate multiple, diverse captions, due to the inherent ambiguity in the captioning task which is not considered in today's systems. To address these challenges, we change the training objective of the caption generator from reproducing groundtruth captions to generating a set of captions that is indistinguishable from human generated captions. Instead of handcrafting such a learning target, we employ adversarial training in combination with an approximate Gumbel sampler to implicitly match the generated distribution to the human one. While our method achieves comparable performance to the state-of-the-art in terms of the correctness of the captions, we generate a set of diverse captions, that are significantly less biased and match the word statistics better in several aspects.
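
Illustrative note (not part of the record): the abstract mentions an "approximate Gumbel sampler" but this record does not describe it in detail. The sketch below shows the generic Gumbel-softmax relaxation, which is one common way to draw approximately discrete, differentiable samples from a categorical word distribution. The function name, the temperature value, and the three-word vocabulary are assumptions made for illustration only and are not taken from the paper.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=0.5, rng=random):
    """Return a relaxed one-hot sample (non-negative weights summing to 1).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u away from 0 and 1
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]  # assumed unnormalized scores for a 3-word vocabulary
sample = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
```

Lower temperatures push each sample closer to a one-hot vector, while the argmax of the perturbed logits is distributed according to the softmax of the original logits. In an adversarial setup, a relaxation of this kind lets gradients from a discriminator flow back through the sampled words to the generator.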

Details

Language(s): eng - English

Date: Created: 2017-03-30; Published online: 2017

Publication status: Published online

Pages: 16 p.

Place, publisher, edition: -

Table of contents: -

Review type: -

Identifiers: arXiv: 1703.10476
URI: http://arxiv.org/abs/1703.10476
BibTex Citekey: Shetty2017

Degree type: -
