The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons


Dayan, P.
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society


Dubey, R., Griffiths, T., & Dayan, P. (2022). The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons. Talk presented at Annual Meeting of the Society for NeuroEconomics (SNE 2022). Arlington, VA, USA. 2022-09-30 - 2022-10-02.

Cite as: https://hdl.handle.net/21.11116/0000-000B-0CF9-0
Objective: In evaluating our choices, we often suffer from two tragic relativities. First, when our lives change for the better, we rapidly habituate to the higher standard of living. Second, we cannot escape comparing ourselves to various relative standards. Habituation and comparisons can be very disruptive to our happiness and decision-making, and it remains a puzzle why they came to be a part of cognition in the first place. This study aims to characterize precisely how and why these relative aspects might be desirable features of intelligent agents.

Methods: We adopt the computational framework of reinforcement learning (RL). In standard RL theory, the reward function defines optimal behavior, i.e., what the agent ought to accomplish. However, recent work on reward design has embraced the observation that the reward function plays a second, critical role in RL: steering the agent from incompetence to mastery. These steering reward functions, often provided to the agent by the designer, have subjective features detached from the particular task but can nevertheless guide the agent's learning. We use this idea and endow agents with a subjective reward function that depends not only on the reward provided by the underlying task but also on prior expectations and relative comparisons. We then embed these agents in various parameterized environments and compare their performance against standard RL agents whose reward function depends only on the task reward.

Results: Extensive simulations reveal that agents equipped with this reward function learn and explore very efficiently in a wide range of settings. Notably, they significantly outperform standard reward-based agents in sparsely rewarded environments, t(198) = 35.6, p < 0.01, and in non-stationary environments, t(198) = 30.1, p < 0.01. Our simulations also reveal potential drawbacks of this reward function: agents perform sub-optimally when comparisons are left unchecked and when there are too many similar options.

Conclusions: Our results suggest that a subjective reward function based on prior expectations and comparisons might play an important role in promoting adaptive behavior by serving as a powerful learning signal. This provides computational support for a longstanding assumption in the field and explains why the human reward function might be based on these features. Together, our results help explain why we are prone to becoming trapped in a cycle of never-ending wants and desires, and may shed light on psychopathologies such as depression, materialism, and overconsumption.
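
Illustration: The abstract does not state the functional form of the subjective reward, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a weighted sum of three terms: the task reward, the reward relative to a running expectation (habituation), and the reward relative to a fixed comparison standard. The class name, the weights, the update rate, and the `aspiration` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubjectiveReward:
    """Hypothetical subjective reward: task reward plus two relative terms."""

    w_task: float = 1.0      # weight on the raw task reward (assumed)
    w_expect: float = 0.5    # weight on reward minus prior expectation (assumed)
    w_compare: float = 0.5   # weight on reward minus comparison standard (assumed)
    aspiration: float = 1.0  # fixed comparison standard (assumed)
    expectation: float = 0.0 # running estimate of recently received reward
    rate: float = 0.1        # speed at which the expectation habituates (assumed)

    def __call__(self, task_reward: float) -> float:
        value = (
            self.w_task * task_reward
            + self.w_expect * (task_reward - self.expectation)
            + self.w_compare * (task_reward - self.aspiration)
        )
        # Habituation: move the expectation toward the reward just received,
        # so the same task reward feels less rewarding the next time.
        self.expectation += self.rate * (task_reward - self.expectation)
        return value


reward_fn = SubjectiveReward()
first = reward_fn(1.0)   # expectation is still 0.0
second = reward_fn(1.0)  # expectation has risen to 0.1
```

In this sketch, the second identical task reward yields a smaller subjective value than the first because the expectation has risen. An RL agent would learn from this subjective value in place of the task reward, while performance would still be scored on the task reward alone.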