On the forking paths of exactitude


Bányai, M.
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;


Dayan, P.
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;


Bányai, M., & Dayan, P. (2022). On the forking paths of exactitude. Poster presented at 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022), Providence, RI, USA.

Cite as: https://hdl.handle.net/21.11116/0000-000A-8251-7
How inputs are represented is critical for performance in decision-making problems since it determines how superficial distinctions
are discarded or parametrically suppressed. It is thus a central facet in RL, and also a focus of human and animal
behavioural neuroscience. Superficiality depends on what a decision-maker currently knows and, most critically, what they
expect to find out next - as an aggregation at one point in learning can affect potential disaggregations at later points. Thus,
the optimal representation at any particular juncture is neither that which compactly summarises past observations nor that
which supports the ultimately optimal policy. Here, we analyse this problem, showing that decision-makers need to plan in
the space of possible future representations in the same careful way they balance exploration/exploitation of actions - for
instance, via value estimation in a tree spanning possible future belief states and representations. In a contextual bandit in
which states can optimally be aggregated into discrete abstractions, a representational trajectory corresponds to the temporal
order by which finer or coarser-grained distinctions are made between different state space regions. We show how the
optimal representational trajectory depends on the discount factor in addition to the belief state, predicting that the same
series of observations should lead to different representational refinements at different discounting values. We show that
representational coarse-graining is similarly beneficial for decision-makers who are only approximately Bayesian, using their
representations to encode their beliefs about the reward structure of the environment. In sum, representational planning
provides a general and flexible framework for modelling human statistical learning and decision making, for principled evaluation
of heuristics, and for making predictions about both behavioural and neural signatures of learning.
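The discount dependence described in the abstract can be illustrated with a toy calculation (a sketch; all reward rates, costs, and horizons below are hypothetical and not taken from the poster). A coarse representation aggregates two contexts and earns a mediocre but immediate per-step reward; refining the representation pays an exploration cost up front in exchange for a higher per-step reward later. Which trajectory is optimal flips with the discount factor:

```python
def discounted_return(rates, gamma):
    """Discounted return of a sequence of per-step expected rewards,
    with the last rate continuing forever (geometric tail)."""
    head = sum(r * gamma**t for t, r in enumerate(rates[:-1]))
    tail = rates[-1] * gamma**(len(rates) - 1) / (1 - gamma)
    return head + tail

def coarse_value(gamma):
    # Aggregated representation: one context-blind policy earning an
    # (illustrative) 0.5 per step indefinitely.
    return discounted_return([0.5], gamma)

def fine_value(gamma, explore_steps=10):
    # Refined representation: pay an exploration cost (0.4 per step,
    # illustrative) while disaggregating the contexts, then earn the
    # higher context-specific rate of 0.8 per step.
    return discounted_return([0.4] * explore_steps + [0.8], gamma)

for gamma in (0.5, 0.99):
    better = "refine" if fine_value(gamma) > coarse_value(gamma) else "stay coarse"
    print(f"gamma={gamma}: {better}")
```

With heavy discounting (gamma = 0.5) the up-front exploration cost dominates and the coarse representation wins; with patient discounting (gamma = 0.99) the long-run gain from disaggregation dominates, so the same observation stream warrants a different representational refinement.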