Abstract
Objective: In evaluating our choices, we often suffer from two tragic relativities. First, when our lives change for the better, we rapidly habituate to the higher standard of living. Second, we cannot escape comparing ourselves to various relative standards. Habituation and comparisons can be very disruptive to our happiness and decision-making, and to date, it remains a puzzle why they have come to be a part of cognition in the first place. This study's objective is to provide a precise characterization of how and why these relative aspects might be desirable features of intelligent agents. Methods: Here, we adopt the computational framework of reinforcement learning (RL). In standard RL theory, the reward function defines optimal behavior, i.e., what the agent ought to accomplish. However, recent work on reward design has embraced the observation that the reward function plays a second, critical role in RL: steering the agent from incompetence to mastery. These steering reward functions, often provided by the designer to the agent, have subjective features detached from the particular task but can nevertheless guide the agent's learning. Here, we use this idea and endow agents with a subjective reward function that, in addition to the reward provided by the underlying task, also depends on prior expectations and relative comparisons. We then embed these agents in various parameterized environments and compare their performance against standard RL agents whose reward function depends only on the task reward. Results: Extensive simulations reveal that agents equipped with this reward function learn and explore very efficiently in a wide range of settings. Notably, they significantly outperform standard reward-based agents in sparsely rewarded environments, t(198) = 35.6, p < 0.01, and in non-stationary environments, t(198) = 30.1, p < 0.01. Our simulations also reveal potential drawbacks of this reward function and show that agents perform sub-optimally when comparisons are left unchecked and when there are too many similar options. Conclusions: Our results suggest that a subjective reward function based on prior expectations and comparisons might play an important role in promoting adaptive behavior by serving as a powerful learning signal. This provides computational support for a longstanding assumption in the field and explains why the human reward function might be based on these features. Together, our results help explain why we are prone to becoming trapped in a cycle of never-ending wants and desires, and may shed light on psychopathologies such as depression, materialism, and overconsumption.
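To make the idea in the Methods concrete, the following is a minimal illustrative sketch of a reward signal that combines the task reward with an expectation term and a comparison term. The abstract does not specify the functional form, so everything here is an assumption made for illustration: the linear combination, the weights, the fixed reference standard, the running-average expectation, and all names (`SubjectiveReward`, `w_task`, `w_expect`, `w_compare`, `reference`, `adapt_rate`) are hypothetical and should not be read as the authors' model.

```python
from dataclasses import dataclass


@dataclass
class SubjectiveReward:
    """Hypothetical subjective reward: task reward plus expectation and comparison terms."""

    w_task: float = 1.0       # weight on the objective task reward (assumed)
    w_expect: float = 0.5     # weight on reward minus prior expectation (assumed)
    w_compare: float = 0.5    # weight on reward minus a reference standard (assumed)
    reference: float = 1.0    # fixed comparison standard (assumed)
    adapt_rate: float = 0.1   # speed at which the expectation tracks rewards (assumed)
    expectation: float = 0.0  # running estimate of the expected reward

    def __call__(self, task_reward: float) -> float:
        surprise = task_reward - self.expectation  # deviation from prior expectation
        gap = task_reward - self.reference         # deviation from the comparison standard
        subjective = (
            self.w_task * task_reward
            + self.w_expect * surprise
            + self.w_compare * gap
        )
        # Habituation: shift the expectation toward the reward just received.
        self.expectation += self.adapt_rate * surprise
        return subjective


if __name__ == "__main__":
    reward_fn = SubjectiveReward()
    # The same task reward, received repeatedly, feels less rewarding over time.
    for step in range(5):
        print(step, round(reward_fn(1.0), 4))
```

Under these assumptions, a constant task reward yields a shrinking subjective reward as the expectation catches up, which is one simple way to express habituation; a standard RL agent would correspond to setting `w_expect` and `w_compare` to zero.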