



Conference Paper

Solving Deep Memory POMDPs with Recurrent Policy Gradients


Wierstra, D., Förster, A., Peters, J., & Schmidhuber, J. (2007). Solving Deep Memory POMDPs with Recurrent Policy Gradients. In J. Marques de Sá, L. Alexandre, W. Duch, & D. Mandic (Eds.), Artificial Neural Networks – ICANN 2007: 7th International Conference, Porto, Portugal, September 9-13, 2007 (pp. 697-706). Berlin, Germany: Springer.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-CBF9-E
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
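
The gradient approximation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain tanh RNN in place of the LSTM architecture the paper uses, a softmax policy over discrete actions, a constant baseline, and a single-episode estimate. All function and parameter names (`init_params`, `forward`, `returns_to_go`, `policy_gradient`, `W_in`, `W_rec`, `W_out`) are hypothetical.

```python
import numpy as np


def init_params(n_obs, n_hid, n_act, seed=0):
    """Small random weights for a hypothetical tanh RNN policy (assumption: no biases)."""
    rng = np.random.default_rng(seed)
    return {
        "W_in": rng.normal(0.0, 0.1, (n_hid, n_obs)),
        "W_rec": rng.normal(0.0, 0.1, (n_hid, n_hid)),
        "W_out": rng.normal(0.0, 0.1, (n_act, n_hid)),
    }


def forward(params, observations):
    """Run the RNN over one observation sequence.

    Returns the list of hidden states (including the initial zero state)
    and the softmax action probabilities at every step.
    """
    h = np.zeros(params["W_rec"].shape[0])
    hidden, probs = [h], []
    for obs in observations:
        h = np.tanh(params["W_in"] @ obs + params["W_rec"] @ h)
        logits = params["W_out"] @ h
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        hidden.append(h)
        probs.append(p)
    return hidden, probs


def returns_to_go(rewards, gamma=1.0):
    """Discounted sum of rewards from each step to the end of the episode."""
    running, out = 0.0, []
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def policy_gradient(params, observations, actions, rewards, baseline=0.0, gamma=1.0):
    """Single-episode estimate of sum_t grad log pi(a_t | history) * (R_t - baseline).

    The return-weighted eligibilities are pushed back through the recurrent
    connections with backpropagation through time.
    """
    hidden, probs = forward(params, observations)
    returns = returns_to_go(rewards, gamma)
    grads = {name: np.zeros_like(w) for name, w in params.items()}
    dh_next = np.zeros_like(hidden[0])
    for t in reversed(range(len(observations))):
        # Eligibility of the chosen action w.r.t. the logits: onehot(a_t) - probs.
        dlogits = -probs[t]
        dlogits[actions[t]] += 1.0
        dlogits *= returns[t] - baseline  # weight by (baselined) return
        grads["W_out"] += np.outer(dlogits, hidden[t + 1])
        # Error reaching this hidden state: from the output now, and from later steps.
        dh = params["W_out"].T @ dlogits + dh_next
        dpre = dh * (1.0 - hidden[t + 1] ** 2)  # derivative of tanh
        grads["W_in"] += np.outer(dpre, observations[t])
        grads["W_rec"] += np.outer(dpre, hidden[t])
        dh_next = params["W_rec"].T @ dpre
    return grads
```

Under these assumptions, averaging `policy_gradient` over several sampled episodes and taking a small ascent step on each parameter would give one update of a basic recurrent policy-gradient learner.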