On the forking paths of exactitude


Bányai, M.
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society

Dayan, P.
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society


Bányai, M., & Dayan, P. (2022). On the forking paths of exactitude. Poster presented at 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022), Providence, RI, USA.

Cite as: http://hdl.handle.net/21.11116/0000-000A-8251-7
Abstract:
How inputs are represented is critical for performance in decision-making problems, since the representation determines how superficial distinctions are discarded or parametrically suppressed. It is thus a central facet of RL, and also a focus of human and animal behavioural neuroscience. Superficiality depends on what a decision-maker currently knows and, most critically, what they expect to find out next, since an aggregation at one point in learning can affect potential disaggregations at later points. Thus, the optimal representation at any particular juncture is neither the one that compactly summarises past observations nor the one that supports the ultimately optimal policy. Here, we analyse this problem, showing that decision-makers need to plan in the space of possible future representations in the same careful way that they balance exploration and exploitation of actions, for instance via value estimation in a tree spanning possible future belief states and representations. In a contextual bandit in which states can optimally be aggregated into discrete abstractions, a representational trajectory corresponds to the temporal order in which finer- or coarser-grained distinctions are made between different regions of the state space. We show how the optimal representational trajectory depends on the discount factor in addition to the belief state, predicting that the same series of observations should lead to different representational refinements at different discounting values. We show that representational coarse-graining is similarly beneficial for decision-makers who are only approximately Bayesian, using their representations to encode their beliefs about the reward structure of the environment. In sum, representational planning provides a general and flexible framework for modelling human statistical learning and decision making, for principled evaluation of heuristics, and for making predictions about both behavioural and neural signatures of learning.
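To make the aggregation trade-off concrete, the following is a minimal, hypothetical sketch (not the authors' code or experiment): a two-context, two-arm Bernoulli bandit in which each context can either be given its own belief cell ("fine" representation) or be pooled into a shared cell ("coarse"). A Thompson-sampling agent with Beta beliefs is simulated under each representation; all probabilities, names, and parameters here are illustrative assumptions.

```python
"""Illustrative sketch of state aggregation in a contextual bandit.

A representation is a map from contexts to belief cells. Aggregating
two contexts whose arms have different payoffs discards a distinction
that matters for the policy, so the coarse agent is capped at the
pooled payoff rate while the fine agent can learn both contexts.
"""
import random


def run(p, cells, steps, rng):
    """Average reward of a Thompson-sampling agent.

    p[s][a]  -- true success probability of arm a in context s
    cells[s] -- belief cell that context s is aggregated into
    """
    n_cells = max(cells) + 1
    n_arms = len(p[0])
    # One Beta(1, 1) belief per (cell, arm), stored as [alpha, beta].
    beliefs = [[[1, 1] for _ in range(n_arms)] for _ in range(n_cells)]
    total = 0
    for _ in range(steps):
        s = rng.randrange(len(p))      # context drawn uniformly
        c = cells[s]                   # representation collapses s to a cell
        # Thompson sampling: sample a success rate per arm, act greedily.
        draws = [rng.betavariate(a, b) for a, b in beliefs[c]]
        arm = max(range(n_arms), key=draws.__getitem__)
        r = 1 if rng.random() < p[s][arm] else 0
        if r:
            beliefs[c][arm][0] += 1    # success: alpha += 1
        else:
            beliefs[c][arm][1] += 1    # failure: beta += 1
        total += r
    return total / steps


# The two contexts prefer opposite arms, so pooling them is lossy:
# the coarse agent sees a pooled success rate of about 0.5 either way.
p = [[0.8, 0.2], [0.2, 0.8]]
fine = run(p, cells=[0, 1], steps=2000, rng=random.Random(0))
coarse = run(p, cells=[0, 0], steps=2000, rng=random.Random(1))
print(f"fine: {fine:.2f}  coarse: {coarse:.2f}")
```

This sketch only shows the end-state cost of a lossy aggregation; the abstract's point is stronger, namely that the *order* in which such distinctions are introduced must itself be planned, e.g. by value estimation over a tree of future belief states and representations, and that the best order shifts with the discount factor.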