Bootstrapping Apprenticeship Learning

Boularias, A; Chaib-Draa, B



Boularias, A., & Chaib-Draa, B. (2011). Bootstrapping Apprenticeship Learning. Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 289-297.

Item status: Released


Basic data


Item permalink: https://hdl.handle.net/11858/00-001M-0000-0013-BB72-C
Version permalink: https://hdl.handle.net/21.11116/0000-0002-0AAA-4

Genre: Conference paper


External references


External reference:
https://papers.nips.cc/paper/4160-bootstrapping-apprenticeship-learning.pdf (publisher's version); open access status unknown

Description:
-

OA-Status:

Creators


Creators:
Boularias, A (affiliations 1, 2), Author
Chaib-Draa, B, Author

Affiliations:
1 Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content


Keywords: -

Abstract: We consider the problem of apprenticeship learning where the examples demonstrated by an expert cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimate to approximate the expected feature counts under the expert's policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal and (ii) the dynamics of the system are known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations.
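For context, the following is a minimal sketch of the standard empirical (Monte Carlo) estimate of the expert's discounted feature counts that the abstract refers to, i.e. the average over m demonstrations of the sum over t of gamma^t * phi(s_t, a_t). It is not the authors' bootstrapping method. The function names `mc_feature_counts` and `linear_value`, the trajectory format, and the toy one-hot feature map are hypothetical illustrations. With a linear utility R(s, a) = w . phi(s, a), a policy's value is w . mu, so any error in the estimated mu carries directly into the learned reward and policy.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical format: a demonstration is a finite list of (state, action) pairs.
Trajectory = Sequence[Tuple[int, int]]
FeatureMap = Callable[[int, int], List[float]]


def mc_feature_counts(
    demos: Sequence[Trajectory],
    phi: FeatureMap,
    gamma: float,
    num_features: int,
) -> List[float]:
    """Average discounted feature counts over the demonstrated trajectories."""
    if not demos:
        raise ValueError("need at least one demonstration")
    mu = [0.0] * num_features
    for traj in demos:
        for t, (s, a) in enumerate(traj):
            f = phi(s, a)
            if len(f) != num_features:
                raise ValueError("feature vector has the wrong length")
            # Accumulate gamma^t * phi(s_t, a_t) for this time step.
            for i in range(num_features):
                mu[i] += (gamma ** t) * f[i]
    # Divide by the number of demonstrations to get the empirical mean.
    return [m / len(demos) for m in mu]


def linear_value(w: Sequence[float], mu: Sequence[float]) -> float:
    """Value under a linear reward R(s, a) = w . phi(s, a), i.e. w . mu."""
    return sum(wi * mi for wi, mi in zip(w, mu))
```

With only a few short demonstrations, this sample mean can be far from the true expectation, which is the estimation error the abstract says learned policies are sensitive to.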

Details


Language(s):

Date: Published: 2011-06

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review method: -

Identifiers: BibTeX citekey: 6826

Degree type: -

Event


Title: Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS 2010)

Place of event: Vancouver, BC, Canada

Start/end date: 2010-12-06 to 2010-12-11

Source

Title: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010

Source genre: Journal

Creators:
Lafferty, J, Editor
Williams, CKI, Editor
Shawe-Taylor, J, Editor
Zemel, RS, Editor
Culotta, A, Editor

Affiliations:
-

Place, publisher, edition: Red Hook, NY, USA: Curran

Pages: -
Volume / issue: -
Article number: -
Start / end page: 289-297
Identifier: ISBN: 978-1-617-82380-0
