Generating Counterfactual Explanations with Natural Language

Hendricks, L. A., Hu, R., Darrell, T., & Akata, Z. (2018). Generating Counterfactual Explanations with Natural Language. In B. Kim, K. R. Varshney, & A. Weller (Eds.), Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning. Retrieved from http://arxiv.org/abs/1806.09809.

Basic

Genre: Conference Paper

Files

arXiv:1806.09809.pdf (Preprint), 549KB
Name: arXiv:1806.09809.pdf
Description: File downloaded from arXiv at 2018-09-17 14:26; presented at the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Creators:
Hendricks, Lisa Anne (1), Author
Hu, Ronghang (1), Author
Darrell, Trevor (1), Author
Akata, Zeynep (2), Author
Affiliations:
(1) External Organizations, ou_persistent22
(2) Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547

Content

Free keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV
Abstract: Natural language explanations of deep neural network decisions provide an intuitive way for an AI agent to articulate a reasoning process. Current textual explanations learn to discuss class-discriminative features in an image. However, it is also helpful to understand which attributes might change a classification decision if present in an image (e.g., "This is not a Scarlet Tanager because it does not have black wings."). We call such textual explanations counterfactual explanations, and propose an intuitive method to generate them by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image. To demonstrate our method, we consider a fine-grained image classification task in which we take as input an image and a counterfactual class and output text explaining why the image does not belong to the counterfactual class. We then analyze the generated counterfactual explanations both qualitatively and quantitatively using proposed automatic metrics.
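
As a reading aid, the idea in the abstract can be sketched in a few lines of Python: check which attributes expected for the counterfactual class have weak or missing evidence in the image, and verbalize those. This is a hypothetical illustration only; the function and variable names (counterfactual_explanation, image_evidence, class_attributes) and the thresholding rule are invented for this sketch, whereas the paper's actual method learns to generate the explanation text.

# Hypothetical sketch of a counterfactual explanation, assuming per-attribute
# evidence scores in [0, 1] from some attribute classifier (not the paper's model).
def counterfactual_explanation(image_evidence, class_attributes,
                               counterfactual_class, threshold=0.5):
    # Attributes typical of the counterfactual class.
    expected = class_attributes[counterfactual_class]
    # Attributes the class needs but the image shows little evidence for.
    missing = [a for a in expected if image_evidence.get(a, 0.0) < threshold]
    if not missing:
        return "This image is consistent with " + counterfactual_class + "."
    return ("This is not a " + counterfactual_class
            + " because it does not have " + " or ".join(missing) + ".")

# Illustrative input: evidence scores for one image, and one class's attributes.
image_evidence = {"red body": 0.9, "black wings": 0.1}
class_attributes = {"Scarlet Tanager": ["red body", "black wings"]}
print(counterfactual_explanation(image_evidence, class_attributes, "Scarlet Tanager"))
# -> This is not a Scarlet Tanager because it does not have black wings.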

Details

Language(s): eng - English
Dates: 2018-06-26, 2018
Publication Status: Published online
Pages: 4 p.
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: arXiv: 1806.09809
URI: http://arxiv.org/abs/1806.09809
BibTex Citekey: Hendricks_WHI2018
Degree: -

Event

Title: ICML Workshop on Human Interpretability in Machine Learning
Place of Event: Stockholm, Sweden
Start-/End Date: 2018-07-14 - 2018-07-14

Source 1

Title: Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning
Abbreviation: WHI 2018
Source Genre: Proceedings
Creator(s):
Kim, Been (1), Editor
Varshney, Kush R. (1), Editor
Weller, Adrian (1), Editor
Affiliations:
(1) External Organizations, ou_persistent22
Publ. Info: -
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: -
Identifier: -