
Released

Journal Article

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

MPS-Authors

Peters, J.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning. Neural Computation, 23(11), 2798-2832. doi:10.1162/NECO_a_00199.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0013-B90C-2
Abstract
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is costly. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
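
The abstract describes an EM-style policy update in which previously collected samples are reused. A minimal Python/NumPy sketch of that general idea is given below, assuming a linear-Gaussian policy a ~ N(theta^T phi(s), sigma^2): the reward-weighted regression update is computed as a weighted least-squares fit, and old samples are reweighted by plain importance weights. The function names (rwr_update, gaussian_logpdf), the toy reward, and the unflattened importance weights are illustrative assumptions, not the exact R³ algorithm of the paper.

import numpy as np

def gaussian_logpdf(a, mean, sigma):
    # Log-density of the scalar Gaussian action model N(mean, sigma^2).
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((a - mean) / sigma)**2

def rwr_update(Phi, actions, rewards, theta_current, sigma, theta_behavior):
    # One EM-style reward-weighted regression update with sample reuse.
    # Phi            : (N, d) state features of the stored samples
    # actions        : (N,)  actions taken when the samples were collected
    # rewards        : (N,)  non-negative rewards/returns of the samples
    # theta_current  : (d,)  parameters of the policy being updated
    # theta_behavior : (d,)  parameters of the policy that collected the data

    # Importance weights correct the mismatch between the data-collecting
    # policy and the current policy, so old samples can be reused.
    log_iw = (gaussian_logpdf(actions, Phi @ theta_current, sigma)
              - gaussian_logpdf(actions, Phi @ theta_behavior, sigma))
    iw = np.exp(log_iw)

    # M-step: reward- and importance-weighted least squares.
    w = rewards * iw
    W = np.diag(w)
    theta_new = np.linalg.solve(Phi.T @ W @ Phi + 1e-8 * np.eye(Phi.shape[1]),
                                Phi.T @ W @ actions)
    return theta_new

# Toy usage: reuse one fixed batch of samples over several policy updates.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 3))
theta_behavior = np.array([0.5, -0.2, 0.1])
sigma = 0.3
actions = Phi @ theta_behavior + sigma * rng.normal(size=200)
rewards = np.exp(-(actions - Phi @ np.array([1.0, 0.0, 0.0]))**2)  # hypothetical reward

theta = theta_behavior.copy()
for _ in range(5):
    theta = rwr_update(Phi, actions, rewards, theta, sigma, theta_behavior)
print(theta)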