

Released

Journal Article

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

MPS-Authors

Hachiya, H.
Max Planck Society;


Peters, J.
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

External Resource
No external resources are shared
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning. Neural Computation, 23(11), 2798-2832. doi:10.1162/NECO_a_00199.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0010-4C08-6
Abstract
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is costly. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be reused efficiently. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
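
To give a concrete feel for the approach described in the abstract, below is a minimal sketch of reward-weighted regression (RWR) with importance-weighted sample reuse on a toy one-step problem. It is not the authors' R³ implementation: the linear-Gaussian policy, the toy reward, the temperature beta, the batch sizes, and the helper names (reward, log_pi) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5   # fixed Gaussian exploration noise (assumed)
theta = 0.0   # linear policy: a = theta * s + sigma * noise
beta = 1.0    # temperature of the exponential reward transformation (assumed)

def reward(s, a):
    # Toy one-step task: the best action matches the state, so theta = 1 is optimal.
    return -(a - s) ** 2

def log_pi(th, s, a):
    # Log-density of the Gaussian policy, up to a theta-independent constant.
    return -0.5 * ((a - th * s) / sigma) ** 2

data = []  # reused sample pool: (state, action, reward, behavior-policy theta)
for it in range(20):
    # Collect a small batch with the current policy and keep all past samples.
    for _ in range(20):
        s = rng.uniform(-1.0, 1.0)
        a = theta * s + sigma * rng.standard_normal()
        data.append((s, a, reward(s, a), theta))

    S, A, R, Tb = (np.array(col) for col in zip(*data))

    # Importance weights correct for the mismatch between the current policy
    # and the behavior policies that generated the reused samples.
    iw = np.exp(log_pi(theta, S, A) - log_pi(Tb, S, A))
    w = np.exp(beta * R) * iw  # reward weight times importance weight

    # M-step: weighted least-squares regression of actions on states.
    theta = np.sum(w * S * A) / np.sum(w * S * S)

print(f"learned theta = {theta:.3f} (optimal is 1.0)")
```

The key ingredient is the importance weight, the ratio of the current policy density to the behavior policy density at each reused sample, which lets old data enter the weighted regression without systematically biasing the policy update.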