Released

Journal Article

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

MPS-Authors

Peters,  J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning. Neural Computation, 23(11), 2798-2832. doi:10.1162/NECO_a_00199.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-B90C-2
Abstract
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
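
To illustrate the general idea described in the abstract, the sketch below shows an EM-style reward-weighted regression update for a linear-Gaussian policy, where samples drawn under earlier policies are reused via importance weighting. This is only a minimal illustration of the underlying principle, not the authors' R³ algorithm as published; all function names, the policy parameterization, and the reward transformation (exponentiated rewards with temperature beta) are assumptions made for this example.

```python
# Minimal sketch: reward-weighted regression (RWR) with importance-weighted
# sample reuse for a linear-Gaussian policy a ~ N(x @ theta, sigma^2).
# Hypothetical names and parameterization; not the paper's exact method.

import numpy as np

def gaussian_policy_logpdf(actions, states, theta, sigma):
    """Log-density of actions under the linear-Gaussian policy with mean x @ theta."""
    mean = states @ theta
    return -0.5 * ((actions - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def rwr_update_with_reuse(batches, theta, sigma, beta=1.0):
    """One EM-style RWR update that reuses data from previous sampling policies.

    batches: list of (states, actions, rewards, theta_old) tuples, one per
             earlier policy whose samples are being reused.
    theta:   current policy parameters.
    beta:    temperature of the exponentiated-reward weighting (assumed form).
    """
    X, A, W = [], [], []
    for states, actions, rewards, theta_old in batches:
        # Importance weights correct for the mismatch between the policy that
        # generated the samples (theta_old) and the current policy (theta).
        log_iw = (gaussian_policy_logpdf(actions, states, theta, sigma)
                  - gaussian_policy_logpdf(actions, states, theta_old, sigma))
        # Reward weighting: exponentiated rewards act as pseudo-likelihood weights.
        w = np.exp(beta * rewards) * np.exp(log_iw)
        X.append(states); A.append(actions); W.append(w)
    X, A, W = np.concatenate(X), np.concatenate(A), np.concatenate(W)
    # M-step: weighted least squares, argmin_theta sum_i w_i (a_i - x_i theta)^2.
    XtWX = X.T @ (W[:, None] * X)
    XtWa = X.T @ (W * A)
    return np.linalg.solve(XtWX, XtWa)
```

In this sketch the importance weights let old trajectories contribute to the current update without rerunning the robot, which is the sample-reuse aspect the abstract refers to; how the weights are stabilized (e.g., flattening or truncation) is a key design question that the example deliberately leaves out.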