Model-Based Reinforcement Learning with Continuous States and Actions

Deisenroth, MP; Rasmussen, CE; Peters, J

Deisenroth, M., Rasmussen, C., & Peters, J. (2008). Model-Based Reinforcement Learning with Continuous States and Actions. In M. Verleysen (Ed.), Advances in computational intelligence and learning: 16th European Symposium on Artificial Neural Networks (pp. 19-24). Evere, Belgium: d-side.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0013-C9E1-0

Version Permalink: https://hdl.handle.net/21.11116/0000-0003-7FD9-B

Genre: Conference Paper

Files

ESANN-2008-Deisenroth.pdf (Any fulltext), 300KB

File Permalink:
https://hdl.handle.net/21.11116/0000-0003-7FDA-A

Name:
ESANN-2008-Deisenroth.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
-

Locators

Locator:
https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2008-8.pdf (Publisher version) Open Access status unknown

Description:
-

OA-Status:

Creators

Creators:
Deisenroth, MP, Author
Rasmussen, CE^{1, 2}, Author
Peters, J^{1, 2}, Author

Affiliations:
1 Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Free keywords: -

Abstract: Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
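The step "building a GP model for the transition dynamics" can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover GPDP itself. Everything specific here is an assumption made for illustration: the pendulum parameters, the Euler integration in `pendulum_step`, the choice to regress on state differences, the squared-exponential kernel, and the fixed hyperparameters (which would normally be learned from data).

```python
import numpy as np


def pendulum_step(theta, omega, u, dt=0.05):
    """One Euler step of a hypothetical pendulum (illustrative parameters)."""
    g, length, mass, friction = 9.81, 1.0, 1.0, 0.05
    accel = (u - friction * omega - mass * g * length * np.sin(theta)) / (mass * length ** 2)
    return theta + dt * omega, omega + dt * accel


def sample_transitions(n, rng):
    """Draw random (state, action) inputs and the resulting state differences."""
    theta = rng.uniform(-np.pi, np.pi, n)
    omega = rng.uniform(-5.0, 5.0, n)
    u = rng.uniform(-5.0, 5.0, n)
    theta_next, omega_next = pendulum_step(theta, omega, u)
    X = np.column_stack([theta, omega, u])
    Y = np.column_stack([theta_next - theta, omega_next - omega])
    return X, Y


def rbf_kernel(A, B, lengthscales, signal_var):
    """Squared-exponential kernel with one length-scale per input dimension."""
    A = A / lengthscales
    B = B / lengthscales
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0))


class GPDynamics:
    """GP regression from (state, action) to state difference, fixed hyperparameters."""

    def __init__(self, X, Y, lengthscales, signal_var=1.0, noise_var=1e-4):
        self.X = X
        self.lengthscales = np.asarray(lengthscales, dtype=float)
        self.signal_var = signal_var
        K = rbf_kernel(X, X, self.lengthscales, signal_var) + noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} Y, computed through the Cholesky factor
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))

    def predict(self, X_new):
        """Return the predictive mean (n, 2) and latent variance (n,)."""
        K_star = rbf_kernel(X_new, self.X, self.lengthscales, self.signal_var)
        mean = K_star @ self.alpha
        v = np.linalg.solve(self.L, K_star.T)
        var = np.maximum(self.signal_var - np.sum(v ** 2, axis=0), 0.0)
        return mean, var


rng = np.random.default_rng(0)
X_train, Y_train = sample_transitions(400, rng)
X_test, Y_test = sample_transitions(100, rng)

# Length-scales chosen by hand for this toy problem (an assumption, not tuned).
model = GPDynamics(X_train, Y_train, lengthscales=[1.0, 3.0, 3.0])
mean, var = model.predict(X_test)
mae = float(np.mean(np.abs(mean - Y_test)))
print("mean absolute error on held-out transitions:", mae)
```

In a setting like the one the abstract describes, a learned model of this kind would stand in for the unknown transition function when the value functions and policy are computed. The predictive variance gives a rough indication of where the model has seen little data.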

Details

Language(s):

Dates: Date issued: 2008-04

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: URI: http://www.dice.ucl.ac.be/esann/index.php?pg=pgm
BibTex Citekey: 4977

Degree: -

Event

Title: 16th European Symposium on Artificial Neural Networks (ESANN 2008)

Place of Event: Bruges, Belgium

Start-/End Date: 2008-04-23 - 2008-04-25

Source 1

Title: Advances in computational intelligence and learning: 16th European Symposium on Artificial Neural Networks

Source Genre: Proceedings

Creator(s):
Verleysen, M, Editor

Affiliations:
-

Publ. Info: Evere, Belgium : d-side

Pages: -

Volume / Issue: -

Sequence Number: -

Start / End Page: 19 - 24

Identifier: -