Reinforcement learning by reward-weighted regression for operational space control

Peters, J; Schaal, S

doi:10.1145/1273496.1273590

Peters, J., & Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control. In Z. Ghahramani (Ed.), ICML '07: 24th International Conference on Machine Learning (pp. 745-750). New York, NY, USA: ACM Press.

Item is Released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0013-CD69-F
Version permalink: https://hdl.handle.net/21.11116/0000-0003-E2A7-1

Genre: Conference paper

Files

ICML-2007-Peters.pdf (any fulltext), 399KB

File permalink:
https://hdl.handle.net/21.11116/0000-0003-E2A8-0

Name:
ICML-2007-Peters.pdf

Description:
-

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Technical metadata:

Copyright date:
-

Copyright info:
-

License:
-

External references

External reference:
https://dl.acm.org/citation.cfm?doid=1273496.1273590 (publisher version) Open access status unknown

Description:
-

OA status:

Creators

Creators:
Peters, J^{1, 2}, Author
Schaal, S, Author

Affiliations:
1 Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497794

Content

Keywords: -

Abstract: Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
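
The core idea named in the abstract, fitting a policy by regression in which each sample is weighted by its reward, can be illustrated with a minimal sketch. This is not the paper's algorithm: the linear policy, the exponential weighting exp(beta * r), the fixed (non-adaptive) beta, and the names reward_weighted_regression, beta, and ridge are all assumptions made for illustration.

```python
import numpy as np

def reward_weighted_regression(states, actions, rewards, beta=1.0, ridge=1e-6):
    """Fit a linear policy a = s @ theta by reward-weighted least squares.

    Assumptions (not taken from the paper): the policy is linear, rewards are
    turned into weights with exp(beta * r), and beta is a fixed constant.
    """
    states = np.asarray(states, dtype=float)    # shape (n, d)
    actions = np.asarray(actions, dtype=float)  # shape (n,) or (n, k)
    rewards = np.asarray(rewards, dtype=float)  # shape (n,)

    # Subtract the maximum reward before exponentiating to avoid overflow;
    # this rescales every weight by the same constant factor.
    weights = np.exp(beta * (rewards - rewards.max()))

    # Weighted normal equations: (S^T W S + ridge * I) theta = S^T W a
    weighted_states = states * weights[:, None]
    gram = states.T @ weighted_states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, weighted_states.T @ actions)
```

With two samples that share a state but disagree on the action, the fit moves toward the action with the higher reward, which is the behaviour a reward-weighted update is meant to produce.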

Details

Language(s):

Date: Published: 2007-06

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: -

Identifiers: DOI: 10.1145/1273496.1273590
BibTex Citekey: 4493

Degree type: -

Event

Title: 24th Annual International Conference on Machine Learning (ICML 2007)

Venue: Corvallis, OR, USA

Start/end date: 2007-06-20 - 2007-06-24

Source 1

Title: ICML '07: 24th International Conference on Machine Learning

Source genre: Proceedings

Creators:
Ghahramani, Z, Editor

Affiliations:
-

Place, publisher, edition: New York, NY, USA : ACM Press

Pages: - Volume / Issue: - Article number: - Start / End page: 745 - 750 Identifier: ISBN: 978-1-59593-793-3