
Released

Paper

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

MPS-Authors

Akata, Zeynep
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society

Rohrbach, Anna
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society

Schiele, Bernt
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society

Fulltext (public)

arXiv:1711.07373.pdf (Preprint, 2 MB)

Citation

Park, D. H., Hendricks, L. A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T., & Rohrbach, M. (2017). Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract). Retrieved from http://arxiv.org/abs/1711.07373


Cite as: https://hdl.handle.net/21.11116/0000-0000-3F67-7
Abstract
Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. On the other hand, their opaqueness has led to a surge of interest in explainable systems. In this work, we emphasize the importance of model explanation in various forms, such as visual pointing and textual justification. The lack of data with justification annotations is one of the main bottlenecks in generating multimodal explanations. We therefore propose two large-scale datasets with annotations that visually and textually justify a classification decision: ACT-X for activity recognition and VQA-X for visual question answering. We also introduce a multimodal methodology for generating visual and textual explanations simultaneously. We show quantitatively that training with textual explanations yields not only better textual justification models, but also models that better localize the evidence supporting their decisions.
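As a rough illustration of the kind of architecture the abstract describes (a decision head, an attention map over image regions serving as the visual pointing explanation, and a text decoder conditioned on the decision and the attended evidence), here is a minimal sketch in PyTorch. Every module name, dimension, and piece of wiring below is an assumption made for illustration, not the authors' actual model.

# Hypothetical sketch, not the authors' implementation: a decision head,
# an attention map over image regions (the "visual pointing" explanation),
# and an LSTM that generates the textual justification conditioned on the
# predicted decision and the attended evidence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalExplainer(nn.Module):
    def __init__(self, feat_dim=512, n_classes=10, vocab_size=1000, hidden_dim=256):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, n_classes)        # decision head
        self.att_proj = nn.Linear(feat_dim + n_classes, 1)      # pointing scores
        self.embed = nn.Embedding(vocab_size, hidden_dim)       # word embeddings
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.init_h = nn.Linear(feat_dim + n_classes, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)            # word logits

    def forward(self, regions, tokens):
        # regions: (B, R, D) spatial image features; tokens: (B, T) word ids
        pooled = regions.mean(dim=1)                            # (B, D)
        logits = self.classifier(pooled)                        # decision logits
        cls = F.softmax(logits, dim=-1)                         # (B, C)
        cls_tiled = cls.unsqueeze(1).expand(-1, regions.size(1), -1)
        scores = self.att_proj(torch.cat([regions, cls_tiled], dim=-1))
        att = F.softmax(scores.squeeze(-1), dim=-1)             # (B, R) pointing map
        ctx = (att.unsqueeze(-1) * regions).sum(dim=1)          # attended evidence
        h0 = torch.tanh(self.init_h(torch.cat([ctx, cls], dim=-1))).unsqueeze(0)
        state = (h0, torch.zeros_like(h0))
        dec, _ = self.decoder(self.embed(tokens), state)        # (B, T, H)
        words = self.out(dec)                                   # justification logits
        return logits, att, words

Training such a model with a joint loss (cross-entropy on the decision logits plus cross-entropy on the justification words) lets the textual supervision flow back through the attention map, which is one plausible reading of the abstract's finding that training with textual explanations also improves evidence localization.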