Released

Poster

Hierarchical deconstruction and memoization of goal-directed plans

Citation

Huys, Q., Lally, N., Falkner, P., Gershman, P., Dayan, P., & Roiser, J. (2013). Hierarchical deconstruction and memoization of goal-directed plans. Poster presented at 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2013), Princeton, NJ, USA.


Cite as: https://hdl.handle.net/21.11116/0000-0004-DAF0-7
Abstract
Humans cannot exactly solve most planning problems they face, but must approximate them.
Research has characterized one class of approximations, whereby experience accumulated in habits
substitutes for the computational expense of searching goal-directed decision-trees. Building on
previous work in which we characterised Pavlovian pruning of decision trees, we explore more
efficient approximations to the planning problem. We focus on habit-like caching ("memoization")
of more complex action sequences in dynamic, hierarchical decompositions of complex decision-trees.
Three groups of subjects performed a planning task. They first learned a transition matrix and then
learned about rewards associated with the transitions. They then produced choice sequences of a given
length from a random starting state to maximise their total earnings. Using reinforcement learning
models nested inside a Chinese Restaurant Process, we infer subjects' hierarchical decomposition,
stochastic memoization and pruning strategies. Hierarchical decomposition and stochastic memoization
models give detailed accounts of complex features of choice data. We show how subjects dynamically
establish subgoals; that their decomposition strategy achieves a near-optimal trade-off between
computational costs and gains; that subjects memoized and re-used complex choice sequences; and that
this re-use correlated negatively with their ability to search the tree. We also replicate our previous
finding that subjects disregard (prune) subtrees that lie below large losses. We replicate all findings
in two further datasets.
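
To make the notions of tree search and pruning concrete, the following is a minimal sketch and not the model fitted in the study. It assumes a hypothetical three-state task with deterministic transitions; the tables `NEXT_STATE` and `REWARD`, the function `best_sequence` and the threshold `prune_below` are all invented for illustration. The search enumerates the decision tree to a given depth, and the optional threshold stops the search below any transition whose immediate loss is large.

```python
# Hypothetical toy task (not the task used in the study):
# 3 states, 2 actions, deterministic transitions.
# NEXT_STATE[s][a] is the successor state, REWARD[s][a] the immediate reward.
NEXT_STATE = [[1, 2], [2, 0], [0, 1]]
REWARD = [[-70, -20], [140, 20], [20, -20]]


def best_sequence(state, depth, prune_below=None):
    """Return (estimated_total, actions) for the best branch found.

    If prune_below is set, a branch whose immediate reward is below that
    threshold is not searched any further: the loss itself is counted,
    but everything in the subtree beneath it is ignored. The returned
    action tuple is then shorter than `depth` if a pruned branch wins.
    """
    if depth == 0:
        return 0, ()
    best = None
    for action in range(len(NEXT_STATE[state])):
        r = REWARD[state][action]
        if prune_below is not None and r < prune_below:
            # Pruned branch: keep the loss, discard the subtree below it.
            candidate = (r, (action,))
        else:
            future, tail = best_sequence(
                NEXT_STATE[state][action], depth - 1, prune_below
            )
            candidate = (r + future, (action,) + tail)
        # Strict comparison: the first action wins ties.
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


full = best_sequence(0, 2)                     # searches the whole tree
pruned = best_sequence(0, 2, prune_below=-50)  # ignores subtrees below losses < -50
print(full, pruned)
```

In this toy task the best two-step sequence from state 0 passes through the large loss of -70 to reach the gain of 140, so the full search finds a total of 70, whereas the pruned search never looks beyond the loss and settles for a total of 0. This illustrates how pruning saves computation at the price of missing rewards hidden behind large losses.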
Humans employ multiple approximations when solving complex planning problems. They simplify the
problem by establishing sub-goals and decomposing the decision-tree around these in a manner that
trades computational costs against gains near-optimally. They do not re-compute solutions on every
trial but rather re-use previous solutions.
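
The re-use of previous solutions described above can be sketched as a simple cache wrapped around a planner. This is an illustration under stated assumptions, not the stochastic memoization model inferred in the study: the cache here is deterministic, and the toy task tables, the `search` function and the `MemoizingPlanner` class are hypothetical names introduced only for this example.

```python
from itertools import product

# Hypothetical toy task (not the task used in the study).
NEXT_STATE = [[1, 2], [2, 0], [0, 1]]
REWARD = [[-70, -20], [140, 20], [20, -20]]


def search(state, depth):
    """Exhaustively evaluate every action sequence of the given length."""
    best = None
    for actions in product(range(2), repeat=depth):
        s, total = state, 0
        for a in actions:
            total += REWARD[s][a]
            s = NEXT_STATE[s][a]
        if best is None or total > best[0]:
            best = (total, actions)
    return best


class MemoizingPlanner:
    """Plan by tree search once, then re-use the stored sequence."""

    def __init__(self):
        self.cache = {}    # (state, depth) -> (total, actions)
        self.searches = 0  # number of times the tree was actually searched

    def plan(self, state, depth):
        key = (state, depth)
        if key not in self.cache:
            self.searches += 1
            self.cache[key] = search(state, depth)
        return self.cache[key]


planner = MemoizingPlanner()
first = planner.plan(0, 3)   # computed by searching the tree
second = planner.plan(0, 3)  # retrieved from the cache, no new search
print(first, second, planner.searches)
```

The second call returns the same sequence without touching the tree, so the cost of the search is paid only once per starting state and sequence length. A stochastic variant would re-use a stored sequence only with some probability; that step is omitted here.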