  Policy gradient methods for machine learning

Peters, J., Theodoru, E., & Schaal, S. (2007). Policy gradient methods for machine learning. Poster presented at 14th INFORMS Applied Probability Conference (INFORMS 2007), Eindhoven, The Netherlands.

Files

INFORMS-2007-Peters.pdf (Abstract), 25KB
Name: INFORMS-2007-Peters.pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Creators

 Creators:
Peters, J (1), Author
Theodoru, E, Author
Schaal, S (1), Author
Affiliations:
(1) External Organizations, ou_persistent22

Content

Free keywords: -
 Abstract: We present an in-depth survey of policy gradient methods as they are used in the machine learning community for optimizing parameterized, stochastic control policies in Markovian systems with respect to the expected reward. Despite having been developed separately in the reinforcement learning literature, policy gradient methods employ likelihood ratio gradient estimators, as also suggested in the stochastic simulation optimization community. It is well known that this approach to policy gradient estimation traditionally suffers from three drawbacks: large variance, a strong dependence on baseline functions, and an inefficient gradient descent. In this talk, we will present a series of recent results that tackle each of these problems. The variance of the gradient estimation can be reduced significantly through recently introduced techniques such as optimal baselines, compatible function approximations, and all-action gradients. However, as even the analytically obtainable policy gradients perform unnaturally slowly, the step from 'vanilla' policy gradient methods to natural policy gradients was required in order to overcome the inefficiency of the gradient descent. This development resulted in the Natural Actor-Critic architecture, which can be shown to be very efficient in application to motor primitive learning for robotics.

Details

Language(s):
 Dates: 2007-07
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: 4726
 Degree: -

Event

Title: 14th INFORMS Applied Probability Conference (INFORMS 2007)
Place of Event: Eindhoven, The Netherlands
Start-/End Date: 2007-07-09 - 2007-07-11
