Abstract
I will discuss the exploratory and exploitative behavior of human subjects in a four-armed restless bandit task. Despite our best analytical efforts, we found no evidence that subjects awarded exploration bonuses to options they hadn’t tried for a while. Instead, their exploratory behavior was well captured by a form of softmax choice in a conventional reinforcement learning model. Fronto-polar cortex, a large and poorly understood area of the human brain, was specifically activated on trials that this model classified as exploratory, to a degree that depended on how much cognitive control those trials required.