

Conference Paper

Reinforcement Learning with Bounded Information Loss

MPS-Authors

Peters, J.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

Mülling, K.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

Seldin, Y.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

Altun, Y.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

Citation

Peters, J., Mülling, K., Seldin, Y., & Altun, Y. (2011). Reinforcement Learning with Bounded Information Loss. In A. Mohammad-Djafari, J.-F. Bercher, & P. Bessière (Eds.), AIP Conference Proceedings (pp. 365-372). Woodbury, NY, USA: American Institute of Physics. doi:10.1063/1.3573639.


Cite as: http://hdl.handle.net/21.11116/0000-0002-81C9-9
Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information, and the approach has therefore been marred by premature convergence and implausible solutions. As first suggested in the context of covariant or natural policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this line of reasoning and suggest two reinforcement learning methods, a model-based and a model-free algorithm, that bound the loss in relative entropy while maximizing the expected return. The resulting methods differ significantly from previous policy gradient approaches and yield an exact update step. They work well on typical reinforcement learning benchmark problems as well as on novel evaluations in robotics. We also show a Bayesian bound motivation of this new approach [8].
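
The update described in the abstract, maximizing expected return subject to a bound on the relative entropy (KL divergence) between successive policies, admits a closed-form exponential reweighting whose temperature comes from a one-dimensional convex dual. Below is a minimal, hypothetical Python sketch of such a bounded-information-loss update for a discrete action distribution; the function name reps_update, the bound epsilon, and the dual formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reps_update(q, R, epsilon=0.1):
    """One policy update maximizing expected return subject to
    KL(p || q) <= epsilon. Hypothetical sketch, not the paper's code.

    q: old action probabilities (1-D array summing to 1)
    R: estimated return for each action
    """
    c = R.max()  # shift returns for numerical stability

    def dual(eta):
        # Dual of: max_p sum_i p_i R_i  s.t.  KL(p || q) <= epsilon
        return eta * epsilon + c + eta * np.log(np.sum(q * np.exp((R - c) / eta)))

    # The temperature eta is the minimizer of the convex dual.
    eta = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded").x

    # Exact update step: exponential reweighting of the old policy.
    p = q * np.exp((R - c) / eta)
    return p / p.sum()

# Example: a four-armed bandit starting from the uniform policy.
q = np.full(4, 0.25)
R = np.array([1.0, 0.5, 0.2, 0.0])
print(reps_update(q, R, epsilon=0.1))
```

A smaller epsilon keeps the new policy closer to the old one (less information loss per step); a larger epsilon moves it more greedily toward the highest-return actions.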