
Item Details


Released

Conference Paper

Reinforcement learning by reward-weighted regression for operational space control

MPS-Authors

Peters,  J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource
Fulltext (restricted access)
Fulltext (public)

ICML-2007-Peters.pdf
(Full text (general)), 399KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Peters, J., & Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control. In Z. Ghahramani (Ed.), ICML '07: 24th International Conference on Machine Learning (pp. 745-750). New York, NY, USA: ACM Press.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-CD69-F
Abstract
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
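The core reduction described in the abstract — fitting policy parameters by regression in which each sample is weighted by a transformed reward — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a linear policy a = θᵀs and uses an exponential reward transformation u(r) = exp(β·r) as the weighting; the function name and the fixed β are illustrative choices.

```python
import numpy as np

def reward_weighted_regression(states, actions, rewards, beta=1.0):
    """One reward-weighted regression update for a linear policy a = theta^T s.

    Illustrative sketch: samples with higher (transformed) reward pull the
    weighted least-squares fit more strongly toward their state-action pairs.
    """
    # Exponential reward transformation; subtracting the max keeps exp() stable.
    w = np.exp(beta * (rewards - rewards.max()))
    W = np.diag(w)
    # Weighted least squares: theta = (S^T W S)^{-1} S^T W A
    theta = np.linalg.solve(states.T @ W @ states, states.T @ W @ actions)
    return theta

# Usage: with uniform rewards the update reduces to ordinary least squares,
# so it recovers the generating parameters of noiseless linear data exactly.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 3))                  # sampled states
theta_true = np.array([[1.0], [-2.0], [0.5]])  # hypothetical "good" policy
A = S @ theta_true                             # corresponding actions
r = np.zeros(200)                              # uniform rewards
theta = reward_weighted_regression(S, A, r)
```

In an EM-style learning loop, one would alternate between sampling actions from the current stochastic policy and re-fitting θ with this weighted regression, so the policy shifts smoothly toward high-reward actions rather than jumping through solution space.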