Reinforcement Learning with Bounded Information Loss

Peters, J; Mülling, K; Seldin, Y; Altun, Y

doi:10.1063/1.3573639

Peters, J., Mülling, K., Seldin, Y., & Altun, Y. (2011). Reinforcement Learning with Bounded Information Loss. In A. Mohammad-Djafari, J.-F. Bercher, & P. Bessière (Eds.), AIP Conference Proceedings (pp. 365-372). Woodbury, NY, USA: American Institute of Physics. doi:10.1063/1.3573639.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0002-81C9-9

Version Permalink: https://hdl.handle.net/21.11116/0000-0002-81CA-8

Genre: Conference Paper

Locators

Locator:
https://aip.scitation.org/doi/pdf/10.1063/1.3573639?class=pdf (Publisher version) Open Access status unknown

Description:
-

OA-Status:

Creators

Creators:
Peters, J^{1, 2}, Author
Mülling, K^{1, 2}, Author
Seldin, Y^{1, 2}, Author
Altun, Y^{1, 2}, Author

Affiliations:
1 Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Free keywords: -

Abstract: Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, policy search has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant or natural policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest two reinforcement learning methods, i.e., a model-based and a model-free algorithm, that bound the loss in relative entropy while maximizing their return. The resulting methods differ significantly from previous policy gradient approaches and yield an exact update step. They work well on typical reinforcement learning benchmark problems as well as in novel evaluations in robotics. We also show a Bayesian bound motivation of this new approach [8].
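
Illustration (not part of the record): the abstract's central idea, maximizing return while bounding the loss in relative entropy, can be sketched for a single decision with a handful of discrete actions. This is only a minimal sketch under stated assumptions and not the authors' model-based or model-free algorithm, which this record does not reproduce. It relies on the standard result that maximizing expected reward subject to KL(p || q) <= epsilon gives a new distribution p proportional to q * exp(reward / eta). The names `reweight`, `kl`, and `bounded_update`, the bisection search for `eta`, and its search range are all hypothetical choices made for this example.

```python
import math


def reweight(q, rewards, eta):
    """Return p with p_i proportional to q_i * exp(rewards_i / eta)."""
    r_max = max(rewards)  # subtract the maximum for numerical stability
    w = [qi * math.exp((ri - r_max) / eta) for qi, ri in zip(q, rewards)]
    total = sum(w)
    return [wi / total for wi in w]


def kl(p, q):
    """Relative entropy KL(p || q) of two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def bounded_update(q, rewards, epsilon, lo=1e-3, hi=1e3, iters=100):
    """Search eta so that KL(p || q) stays at or just below epsilon.

    Assumes the bound is violated at eta=lo and satisfied at eta=hi
    (hypothetical default range, suitable for rewards of order one).
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if kl(reweight(q, rewards, mid), q) > epsilon:
            lo = mid  # update too greedy: a larger eta is needed
        else:
            hi = mid  # bound satisfied: try a smaller eta
    return reweight(q, rewards, hi), hi


if __name__ == "__main__":
    old_policy = [0.25, 0.25, 0.25, 0.25]
    rewards = [0.0, 1.0, 2.0, 3.0]
    new_policy, eta = bounded_update(old_policy, rewards, epsilon=0.1)
    print("new policy:", new_policy)
    print("eta:", eta, "KL:", kl(new_policy, old_policy))
```

A smaller `epsilon` keeps the new distribution closer to the old one, while a larger `epsilon` lets it concentrate on the highest-reward action more quickly.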

Details

Language(s):

Dates: Published Online: 2011-03

Publication Status: Published online

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1063/1.3573639

Degree: -

Event

Title: 30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2010)

Place of Event: Chamonix, France

Start-/End Date: 2010-07-04 - 2010-07-09

Source 1

Title: AIP Conference Proceedings

Source Genre: Proceedings

Creator(s):
Mohammad-Djafari, A, Editor
Bercher, J-F, Editor
Bessière, P, Editor

Affiliations:
-

Publ. Info: Woodbury, NY, USA : American Institute of Physics

Pages: -

Volume / Issue: 1305 (1)

Sequence Number: -

Start / End Page: 365 - 372

Identifier: ISBN: 978-073540860-9