Attentive Explanations: Justifying Decisions and Pointing to the Evidence

Park, Dong Huk; Hendricks, Lisa Anne; Akata, Zeynep; Schiele, Bernt; Darrell, Trevor; Rohrbach, Marcus

A newer version of this item is available:
https://pure.mpg.de/pubman/item/item_2385245_4

Released

Paper

MPS-Authors

Akata, Zeynep
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society

Schiele, Bernt
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society

External Resource

There are no locators available

Fulltext (public)

arXiv:1612.04757.pdf
(Preprint), 10MB

Supplementary Material (public)

There is no public supplementary material available

Citation

Park, D. H., Hendricks, L. A., Akata, Z., Schiele, B., Darrell, T., & Rohrbach, M. (2016). Attentive Explanations: Justifying Decisions and Pointing to the Evidence. Retrieved from http://arxiv.org/abs/1612.04757.

Cite as: https://hdl.handle.net/11858/00-001M-0000-002C-4CE7-B

Abstract

Deep models are the de facto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models can do this as well and propose our Pointing and Justification (PJ-X) model, which can justify its decision with a sentence and point to the evidence by introspecting its decision and explanation process using an attention mechanism. Unfortunately, there is no dataset available with reference explanations for visual decision making. We thus collect two datasets in two domains where it is interesting and challenging to explain decisions. First, we extend the visual question answering task to provide not only an answer but also a natural language explanation for the answer. Second, we focus on explaining human activities, which is traditionally more challenging than object classification. We extensively evaluate our PJ-X model on both the justification and pointing tasks by comparing it to prior models and ablations using both automatic and human evaluations.