
Item Details


Released

Conference Paper

Textual Explanations for Self-Driving Vehicles

MPS-Authors
/persons/resource/persons79477

Rohrbach, Anna
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

/persons/resource/persons127761

Akata, Zeynep
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

External Resource
There are no locators available
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available
Citation

Kim, J., Rohrbach, A., Darrell, T., Canny, J., & Akata, Z. (2018). Textual Explanations for Self-Driving Vehicles. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision -- ECCV 2018 (pp. 577-593). Berlin: Springer. doi:10.1007/978-3-030-01216-8_35.


Cite as: https://hdl.handle.net/21.11116/0000-0001-DE86-E
Abstract
Deep neural perception and control networks have become key components of self-driving vehicles. User acceptance is likely to benefit from easy-to-interpret textual explanations which allow end-users to understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. We propose a new approach to introspective explanations which consists of two parts. First, we use a visual (spatial) attention model to train a convolutional network end-to-end from images to the vehicle control commands, i.e., acceleration and change of course. The controller's attention identifies image regions that potentially influence the network's output. Second, we use an attention-based video-to-text model to produce textual explanations of model actions. The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments. We evaluate these models on a novel driving dataset with ground-truth human explanations, the Berkeley DeepDrive eXplanation (BDD-X) dataset. Code is available at https://github.com/JinkyuKimUCB/explainable-deep-driving.
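
The abstract outlines a two-part architecture: a spatial-attention controller that maps image features to acceleration and change of course, and an attention-based explanation decoder whose attention map is aligned with the controller's. The following is a minimal, hypothetical PyTorch sketch of the weak-alignment idea (a soft KL-style penalty between the two attention maps); the module and function names are illustrative assumptions and do not reproduce the authors' implementation or the code in the linked repository.

# Hypothetical sketch (not the authors' code): a spatial-attention controller
# plus an explanation decoder whose attention is softly aligned with the
# controller's attention ("weak alignment" via a KL penalty).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionController(nn.Module):
    """Predicts acceleration and change of course from CNN feature maps,
    using a spatial attention distribution over the feature grid."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)   # per-cell attention score
        self.head = nn.Linear(feat_dim, 2)   # [acceleration, change of course]

    def forward(self, feats):                # feats: (B, grid, feat_dim)
        alpha = F.softmax(self.attn(feats).squeeze(-1), dim=-1)      # (B, grid)
        context = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)    # (B, feat_dim)
        return self.head(context), alpha

class ExplanationDecoder(nn.Module):
    """Single-step caption head with its own attention over the same grid
    (a full model would unroll a recurrent decoder over explanation tokens)."""
    def __init__(self, feat_dim=64, vocab=1000):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)
        self.out = nn.Linear(feat_dim, vocab)

    def forward(self, feats):
        beta = F.softmax(self.attn(feats).squeeze(-1), dim=-1)
        context = torch.bmm(beta.unsqueeze(1), feats).squeeze(1)
        return self.out(context), beta

def weak_alignment_loss(alpha, beta, eps=1e-8):
    """KL(alpha || beta): encourage the explainer to attend where the controller did."""
    return (alpha * (torch.log(alpha + eps) - torch.log(beta + eps))).sum(-1).mean()

# Example forward/backward pass on random features (batch of 4, 10x20 grid).
if __name__ == "__main__":
    feats = torch.randn(4, 200, 64)
    controller, explainer = AttentionController(), ExplanationDecoder()
    controls, alpha = controller(feats)
    word_logits, beta = explainer(feats)
    # Dummy control target; a real setup would regress to recorded driving commands.
    loss = F.mse_loss(controls, torch.zeros_like(controls)) \
        + 0.1 * weak_alignment_loss(alpha, beta)
    loss.backward()
    print(controls.shape, word_logits.shape, loss.item())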