  Deep Reinforcement Learning of Marked Temporal Point Processes

Upadhyay, U., De, A., & Gomez Rodriguez, M. (2018). Deep Reinforcement Learning of Marked Temporal Point Processes. Retrieved from http://arxiv.org/abs/1805.09360.


Basic data

Genre: Research paper

Files

arXiv:1805.09360.pdf (Preprint), 6 MB
Name: arXiv:1805.09360.pdf
Description: File downloaded from arXiv at 2019-04-03 13:04
OA status: -
Visibility: Public
MIME type / checksum: application/pdf / [MD5]
Technical metadata: -
Copyright date: -
Copyright info: -

External references

Creators

Upadhyay, Utkarsh¹, Author
De, Abir¹, Author
Gomez Rodriguez, Manuel¹, Author
Affiliations:
¹ Group M. Gomez Rodriguez, Max Planck Institute for Software Systems, Max Planck Society, ou_2105290

Content

Keywords: Computer Science, Learning, cs.LG, cs.SI, Statistics, Machine Learning, stat.ML
Abstract: In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this paper, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback, and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives.
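
The abstract describes the core construction: a policy given by the intensity and mark distribution of a point process, with the event history embedded by a recurrent network. Purely as an illustration (this is not the authors' code, and all names are invented), a minimal PyTorch sketch of such a policy might look as follows. It assumes a piecewise-constant intensity between events, so inter-event times are exponential, and it omits the survival term for the censored final interval; the paper itself makes no such parametric assumptions.

import torch
import torch.nn as nn

class MTPPPolicy(nn.Module):
    def __init__(self, n_marks, hidden=32):
        super().__init__()
        self.n_marks = n_marks
        self.embed = nn.Linear(1 + n_marks, hidden)    # input: (inter-event time, one-hot mark)
        self.rnn = nn.GRUCell(hidden, hidden)          # embeds the event history into a vector
        self.log_intensity = nn.Linear(hidden, 1)      # head for the (log) intensity
        self.mark_logits = nn.Linear(hidden, n_marks)  # head for the mark distribution

    def sample_episode(self, horizon=10.0):
        h = torch.zeros(1, self.rnn.hidden_size)
        t, log_prob, events = 0.0, torch.tensor(0.0), []
        while True:
            # Constant intensity until the next event => exponential inter-event time.
            rate = self.log_intensity(h).exp().squeeze()
            dt_dist = torch.distributions.Exponential(rate)
            dt = dt_dist.sample()
            if t + dt.item() > horizon:
                break  # survival term of this censored interval omitted for brevity
            mark_dist = torch.distributions.Categorical(logits=self.mark_logits(h).squeeze(0))
            mark = mark_dist.sample()
            log_prob = log_prob + dt_dist.log_prob(dt) + mark_dist.log_prob(mark)
            t += dt.item()
            events.append((t, mark.item()))
            x = torch.cat([dt.view(1, 1),
                           nn.functional.one_hot(mark, self.n_marks).float().view(1, -1)], dim=1)
            h = self.rnn(torch.relu(self.embed(x)), h)
        return events, log_prob

policy = MTPPPolicy(n_marks=3)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
events, log_prob = policy.sample_episode()
reward = -float(len(events))   # placeholder reward; real rewards are application-specific
if events:                     # REINFORCE: raise log-probability of well-rewarded episodes
    opt.zero_grad()
    loss = -reward * log_prob
    loss.backward()
    opt.step()

The update shown is plain REINFORCE with a toy reward; the paper's policy gradient method is more general, allowing arbitrarily complex reward functions and making no assumptions about the functional form of the feedback process.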

Details

Language(s): eng - English
Date: 2018-05-23, 2018-11-06, 2018
Publication status: Published online
Pages: 20 p.
Place, publisher, edition: -
Table of contents: -
Review type: -
Identifiers: arXiv: 1805.09360
URI: http://arxiv.org/abs/1805.09360
BibTeX citekey: Upadhyay_arXiv1805.09360
Degree: -

Event

Decision

Project information

Source