Abstract
We have developed two multi-step decision tasks involving chains of actions, intended for use in rodent electrophysiology. In both tasks an initial decision between a pair of actions (lever presses) makes available one of two further ‘second link’ actions (retractable levers), which in turn lead to the trial outcome (reward or time-out). The first task aims to dissociate model-based and model-free learning by revaluing the states reached as a consequence of the initial decision through experience with the second link actions under changed reward contingencies. Revaluation occurs on trials in which the animal chooses directly between the second link actions (Fig. 1a). We have shown that mice are able to use experience on these revaluation trials to guide subsequent choices between the first links (Fig. 1b). The second task, adapted from a recent design by Daw et al. (2011), uses probabilistic transitions between the initial decision and the second link states, such that each of the initially available actions has a normal transition, which makes one of the second link actions available, and a rare transition, which makes the other available (Fig. 2a). We have shown in mice that the effect of trial outcome (reward or time-out) on choice probability in the subsequent trial depends on whether the outcome followed a normal or rare transition (Fig. 2b), consistent with the use of a model-based strategy. One concern with this type of task is that sophisticated model-free strategies may exist which can produce behaviour closely resembling model-based control. Specifically, in our tasks, outcomes from second link actions on a given trial could be used as discriminative stimuli to guide subsequent choices between first links. We are working to identify how well we can distinguish between these possibilities by comparing different reinforcement learning models fitted to the behaviour.
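The dependence of next-trial choice on outcome and transition type described for the second task can be illustrated with a simple stay-probability tabulation. The following Python sketch is not the authors' analysis code; the function name `stay_probabilities`, the trial encoding (first-link choice as 0 or 1, transition as 'normal' or 'rare', outcome as a boolean) and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def stay_probabilities(choices, transitions, outcomes):
    """Probability of repeating the first-link choice on the next trial,
    split by the current trial's transition type and outcome.

    choices     : first-link choice per trial, e.g. 0 or 1 (assumed encoding)
    transitions : 'normal' or 'rare' per trial (assumed encoding)
    outcomes    : True if the trial was rewarded, False for a time-out
    """
    counts = defaultdict(lambda: [0, 0])  # (transition, outcome) -> [stays, total]
    for t in range(len(choices) - 1):
        key = (transitions[t], outcomes[t])
        counts[key][0] += int(choices[t + 1] == choices[t])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


# Made-up example session of five trials (not real data).
example = stay_probabilities(
    choices=[0, 0, 1, 1, 0],
    transitions=["normal", "rare", "normal", "rare", "normal"],
    outcomes=[True, True, False, False, True],
)
print(example)
```

Under a purely model-free strategy the stay probability would be expected to depend mainly on outcome, whereas an outcome-by-transition interaction in this table is the pattern usually read as evidence for model-based control. As the abstract notes, that reading is open to alternative model-free explanations, so a table of this kind is a descriptive summary rather than a test.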