Abstract:
Humans cannot exactly solve most planning problems they face, but must approximate them.
Research has characterised one class of approximations, whereby experience accumulated in habits substitutes for the computational expense of searching goal-directed decision trees. Building on previous work in
which we characterised Pavlovian pruning of decision trees, we explore more efficient approximations to
the planning problem. We focus on habit-like caching ('memoization') of more complex action sequences
in dynamic, hierarchical decompositions of complex decision trees.
Three groups of subjects performed a planning task. They first learned a transition matrix and then learned
about rewards associated with the transitions. They then produced choice sequences of a given length from
a random starting state to maximise their total earnings. Using reinforcement learning models nested inside
a Chinese Restaurant Process, we infer subjects' hierarchical decomposition, stochastic memoization and
pruning strategies. Hierarchical decomposition and stochastic memoization models give detailed accounts
of complex features of choice data. We characterise how subjects dynamically establish subgoals; show that their
decomposition strategy achieves a near-optimal trade-off between computational costs and gains; that subjects
memoized and re-used complex choice sequences; that this re-use correlated negatively with their ability to
search the tree; and we replicate our previous finding that subjects disregard (prune) subtrees that lie
below large losses. We replicate all findings in two further datasets.
Humans employ multiple approximations when solving complex planning problems. They simplify the
problem by establishing subgoals and decomposing the decision tree around them in a manner that trades
computational costs for gains near-optimally. Rather than re-computing solutions on every trial, they re-use previous solutions.
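The stochastic-memoization idea described above can be illustrated with a minimal, CRP-flavoured cache: with probability proportional to how often a stored action sequence has been produced from a state, the agent re-emits it instead of re-searching the tree; otherwise (with probability governed by a concentration parameter alpha) it plans afresh. This is a hypothetical sketch for intuition only, not the paper's fitted model; all names and the `plan` function are assumptions.

```python
import random

def make_stochastic_memoizer(plan, alpha=1.0, rng=None):
    """CRP-style stochastic memoization.

    From a given state with n cached solutions, re-use a stored
    sequence with probability n / (n + alpha); otherwise call the
    costly planner (probability alpha / (n + alpha)) and cache its
    output. Larger alpha means more fresh planning.
    """
    rng = rng or random.Random(0)
    cache = {}  # state -> list of previously produced action sequences

    def act(state):
        seqs = cache.setdefault(state, [])
        n = len(seqs)
        if n and rng.random() < n / (n + alpha):
            return rng.choice(seqs)  # re-use: cheap, habit-like
        seq = plan(state)            # fresh, costly tree search
        seqs.append(seq)
        return seq

    return act
```

Setting `alpha=0` recovers pure memoization (plan once per state, then always re-use); a large `alpha` approaches full goal-directed search on every trial, so the parameter interpolates between the two regimes.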