Policy gradient methods

Peters, J

doi:10.4249/scholarpedia.3698

Journal Article

MPS-Authors

Peters, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

http://www.scholarpedia.org/article/Policy_gradient_methods
(publisher version)

Citation

Peters, J. (2010). Policy gradient methods. Scholarpedia, 5(11), 3698. doi:10.4249/scholarpedia.3698.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-BD68-3

Abstract

Policy gradient methods are a class of reinforcement learning techniques that optimize parametrized policies with respect to the expected return (long-term cumulative reward) by following its gradient (gradient ascent on the return, or equivalently gradient descent on its negative). They avoid many of the problems that have marred traditional reinforcement learning approaches, such as the lack of guarantees of a value function, the intractability resulting from uncertain state information, and the complexity arising from continuous states and actions.
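
As a rough illustration of the idea in the abstract, the following is a minimal sketch of a likelihood-ratio (REINFORCE-style) policy gradient update on a toy multi-armed bandit. It is not taken from the article: the function names `softmax` and `reinforce_bandit`, the softmax parametrization, the Gaussian reward noise, the running-mean baseline, and all numeric settings are assumptions chosen only to keep the example small.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=5000, lr=0.05, seed=0):
    """Hypothetical toy setup: one state, one reward per episode.

    The policy is a softmax over one preference per arm (the policy
    parameters theta). Each step samples an action, observes a noisy
    reward, and moves theta along (reward - baseline) * grad log pi.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    baseline = 0.0  # running mean of past rewards (assumed variance reducer)
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        for a in range(len(theta)):
            # d/d theta_a of log pi(action) for a softmax policy
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log  # ascent on expected return
        baseline += (reward - baseline) / t  # update after the gradient step
    return softmax(theta)


probs = reinforce_bandit([0.0, 1.0])
print(probs)
```

With these assumed settings the learned policy should place most of its probability on the arm with the higher mean reward. No value function is estimated and the policy parameters are updated directly, which is the property the abstract emphasizes.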