Two steps to risk sensitivity

Gagne, C; Dayan, P

Item

ITEM ACTIONSEXPORT

Add to Basket

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_3353635_5

DetailsSummary

Released

Conference Paper

Two steps to risk sensitivity

MPS-Authors

/persons/resource/persons245568

Gagne, C
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

/persons/resource/persons217460

Dayan, P
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://openreview.net/pdf?id=HnLDt9v6Q-j
(Any fulltext)

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Gagne, C., & Dayan, P. (in press). Two steps to risk sensitivity. In M. Ranzato, A. Beygelzimer, P. Liang, J. Vaughan, & Y. Dauphin (Eds.), Advances in Neural Information Processing Systems 34.

Cite as: https://hdl.handle.net/21.11116/0000-0009-8A1E-B

Abstract

Distributional reinforcement learning (RL) – in which agents learn about all the possible long-term consequences of their actions, and not just the expected value – is of great recent interest. One of the most important affordances of a distributional view is facilitating a modern, measured, approach to risk when outcomes are not completely certain. By contrast, psychological and neuroscientific investigations into decision making under risk have utilized a variety of more venerable theoretical models such as prospect theory that lack axiomatically desirable properties such as coherence. Here, we consider a particularly relevant risk measure for modeling human and animal planning, called conditional value-at-risk (CVaR), which quantifies worst-case outcomes (e.g., vehicle accidents or predation). We first adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers in the well-known two-step task, revealing substantial risk aversion that had been lurking under stickiness and perseveration. We then consider a further critical property of risk sensitivity, namely time consistency, showing alternatives to this form of CVaR that enjoy this desirable characteristic. We use simulations to examine settings in which the various forms differ in ways that have implications for human and animal planning and behavior.