hide
Free keywords:
-
Abstract:
Planning problems are often solved approximately using simulation based methods such as Monte
Carlo Tree Search (MCTS). Indeed, UCT, perhaps the most popular MCTS algorithm, lies at the heart of
many successful applications. However, UCT is fundamentally inefficient as a planning algorithm, since it
is not focused exclusively on the value of the action that is ultimately chosen. Accordingly, even as simple
a modification to UCT as accounting for myopic information values at the root of the search tree can result
in significant performance improvements. Here, we propose a method that extends value of information-
like computations to arbitrarily many nodes of the search tree for simple acyclic MDPs. We demonstrate significant performance improvements over other planning algorithms.