A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari 
Games

Leibfried, F; Kushman, N; Hofmann, K

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Conference Paper

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

MPS-Authors

/persons/resource/persons192683

Leibfried, F
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Research Group Sensorimotor Learning and Decision-Making, Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://openreview.net/pdf?id=BJxhLAuxg
(Any fulltext)

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Leibfried, F., Kushman, N., & Hofmann, K. (2017). A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games. In 5th International Conference on Learning Representations (ICLR 2017) (pp. 1-17).

Cite as: https://hdl.handle.net/21.11116/0000-0000-C3B7-5

Abstract

Reinforcement learning is concerned with learning to interact with environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as DQN, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient, but need to acquire explicit knowledge about the environment dynamics or the reward structure.
In this paper we take a step towards using model-based techniques in environments with high-dimensional visual state space when system dynamics and the reward structure are both unknown and need to be learned, by demonstrating that it is possible to learn both jointly. Empirical evaluation on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these positive results as opening up important directions for model-based RL in complex, initially unknown environments.