Abstract:
Only rich and sophisticated statistical models are adequate for agents that must learn to navigate complex environments. However, it has not been clear how methods for planning can take advantage of models, such as those incorporating Bayesian non-parametric devices, that are so intricate as to demand approximate sampling schemes. We show that Bayes-Adaptive planning can be combined in a principled way with approximate sampling, and demonstrate the power of the resulting method in a challenging task involving safe exploration, which defeats myopic methods such as Thompson Sampling. This highlights the importance of propagating beliefs in realistic cases involving trade-offs between exploration and exploitation. The next challenge is to employ function approximation to represent the belief-state value, improving search efficiency further and thus enabling longer search horizons.