Towards Holistic Machines: From Visual Recognition To Question Answering About Real-world Image

Malinowski, Mateusz

Please note that a newer version of this record is available:
https://pure.mpg.de/pubman/item/item_2462743_2

Malinowski, M. (2017). Towards Holistic Machines: From Visual Recognition To Question Answering About Real-world Image. PhD Thesis, Universität des Saarlandes, Saarbrücken.

Item is released

Basic data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-002D-9339-5
Version permalink: https://hdl.handle.net/11858/00-001M-0000-002D-933A-3

Genre: Thesis

External references

External reference:
http://scidok.sulb.uni-saarland.de/volltexte/2017/6897/ (any full text) Open access: green

Description:
-

OA status:
Green

External reference:
http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de (publisher contract) Open access status unknown

Description:
-

OA status:
Not specified

Creators

Creators:
Malinowski, Mateusz [1, 2], Author
Fritz, Mario [1], Advisor
Pinkal, Manfred [3], Referee
Darrell, Trevor [3], Referee

Affiliations:
[1] Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547
[2] International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551
[3] External Organizations, ou_persistent22

Content

Keywords: -

Abstract: Computer Vision has undergone major changes over the last five years. Here, we investigate if the performance of such architectures generalizes to more complex tasks that require a more holistic approach to scene comprehension. The presented work focuses on learning spatial and multi-modal representations, and on the foundations of a Visual Turing Test, where scene understanding is tested by a series of questions about the scene's content. In our studies, we propose DAQUAR, the first ‘question answering about real-world images’ dataset, together with two methods that address the problem: a symbolic-based and a neural-based visual question answering architecture. The symbolic-based method relies on a semantic parser, a database of visual facts, and a Bayesian formulation that accounts for various interpretations of the visual scene. The neural-based method is an end-to-end architecture composed of a question encoder, an image encoder, a multimodal embedding, and an answer decoder. This architecture has proven effective in capturing language-based biases. It has also become a standard component of other visual question answering architectures. Along with the methods, we also investigate various evaluation metrics that embrace uncertainty in word meaning, as well as various interpretations of the scene and the question.
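
The four stages that the abstract names for the neural-based method (question encoder, image encoder, multimodal embedding, answer decoder) can be illustrated with a toy data-flow sketch. This is not the thesis implementation: the vocabulary, the answer set, the function names, the bag-of-words and chunk-mean encoders, the concatenation step, and the random linear decoder are all assumptions chosen only to show how the stages connect. An actual system would learn each stage end to end.

```python
import random

# Hypothetical toy vocabulary and answer set (illustrative only, not from DAQUAR).
VOCAB = ["what", "is", "on", "the", "table", "color", "chair"]
ANSWERS = ["lamp", "red", "chair"]


def encode_question(question):
    """Question encoder stand-in: bag-of-words counts over VOCAB."""
    tokens = question.lower().replace("?", "").split()
    return [float(tokens.count(word)) for word in VOCAB]


def encode_image(pixels, dim=4):
    """Image encoder stand-in: mean intensity of `dim` equal-sized chunks."""
    chunk = len(pixels) // dim
    return [sum(pixels[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]


def embed(question_vec, image_vec):
    """Multimodal embedding stand-in: plain concatenation of both vectors."""
    return question_vec + image_vec


def decode_answer(joint, weights):
    """Answer decoder stand-in: one linear score per answer, then argmax."""
    scores = [sum(w * x for w, x in zip(row, joint)) for row in weights]
    return ANSWERS[scores.index(max(scores))]


rng = random.Random(0)
pixels = [rng.random() for _ in range(16)]          # fake 16-pixel image
question_vec = encode_question("What is on the table?")
image_vec = encode_image(pixels)
joint = embed(question_vec, image_vec)
# Untrained random weights: the predicted answer is arbitrary by design.
weights = [[rng.uniform(-1, 1) for _ in joint] for _ in ANSWERS]
answer = decode_answer(joint, weights)
print(answer)
```

Because the weights are random, the printed answer carries no meaning; the sketch only shows that the question and the image are encoded separately, fused into one representation, and decoded into an answer from a fixed set.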

Details

Language(s): eng - English

Date: Accepted: 2017-06-20; Published online: 2017; Published: 2017

Publication status: Published

Pages: 276 p.

Place, publisher, edition: Saarbrücken : Universität des Saarlandes

Table of contents: -

Review type: -

Identifiers: BibTex Citekey: Malinowskiphd17
URN: urn:nbn:de:bsz:291-scidok-68978

Degree type: Doctoral thesis
