  Teaching Inverse Reinforcement Learners via Features and Demonstrations

Haug, L., Tschiatschek, S., & Singla, A. (2018). Teaching Inverse Reinforcement Learners via Features and Demonstrations. Retrieved from http://arxiv.org/abs/1810.08926.

Basic data

Genre: Research paper

Files

arXiv:1810.08926.pdf (Preprint), 613KB
Name: arXiv:1810.08926.pdf
Description: File downloaded from arXiv at 2019-04-03 13:13
OA status:
Visibility: Public
MIME type / checksum: application/pdf / [MD5]
Technical metadata:
Copyright date: -
Copyright info: -

Creators

Creators:
Haug, Luis [1], Author
Tschiatschek, Sebastian [1], Author
Singla, Adish [2], Author
Affiliations:
[1] External Organizations, ou_persistent22
[2] Group A. Singla, Max Planck Institute for Software Systems, Max Planck Society, ou_2541698

Content

Keywords: Computer Science, Learning, cs.LG; Statistics, Machine Learning, stat.ML
Abstract: Learning near-optimal behaviour from an expert's demonstrations typically
relies on the assumption that the learner knows the features that the true
reward function depends on. In this paper, we study the problem of learning
from demonstrations in the setting where this is not the case, i.e., where
there is a mismatch between the worldviews of the learner and the expert. We
introduce a natural quantity, the teaching risk, which measures the potential
suboptimality of policies that look optimal to the learner in this setting. We
show that bounds on the teaching risk guarantee that the learner is able to
find a near-optimal policy using standard algorithms based on inverse
reinforcement learning. Based on these findings, we suggest a teaching scheme
in which the expert can decrease the teaching risk by updating the learner's
worldview, and thus ultimately enable her to find a near-optimal policy.

Details

Language(s): eng - English
Date: 2018-10-21, 2019-03-27, 2018
Publication status: Published online
Pages: 13 p.
Place, publisher, edition: -
Table of contents: -
Type of review: -
Identifiers: arXiv: 1810.08926
URI: http://arxiv.org/abs/1810.08926
BibTex Citekey: Haug_arXiv1810.08926
Degree type: -
