  Q-learning

Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292. doi:10.1007/BF00992698.

Basic
Item Permalink: http://hdl.handle.net/21.11116/0000-0002-D738-D
Version Permalink: http://hdl.handle.net/21.11116/0000-0002-D739-C
Genre: Journal Article

Files

Locators

Description: -

Creators

Creators:
Watkins, CJCH, Author
Dayan, P1, Author
Affiliations:
1 External Organizations, ou_persistent22

Content

Free keywords: -
Abstract: Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
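The incremental update the abstract describes can be sketched in a few lines. The following is a minimal illustration only, not code from the paper: the chain-world environment, the hyperparameters, and the epsilon-greedy exploration scheme are all invented for the example. It does show the characteristic Q-learning step of moving Q(s, a) toward r + gamma * max_a' Q(s', a'), with the exploration scheme ensuring that all actions keep being sampled in all states, the condition the convergence theorem requires.

```python
import random

# Toy MDP (hypothetical, for illustration): states 0..4 on a chain,
# action 0 = left, 1 = right; reaching state 4 ends the episode and pays 1.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed hyperparameters

def step(s, a):
    """One environment transition: returns (next_state, reward, done)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

# Discretely represented action-values, as in the convergence theorem.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: occasionally act at random so every action
        # keeps being sampled in every state.
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        # The Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy recovered from the learned Q: move right from every
# non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

Because the update for the penultimate state bootstraps from a terminal state whose values stay zero, Q(3, right) converges to the true reward of 1, and the earlier states inherit discounted copies of it.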

Details

Language(s): -
Dates: 1992-05
Publication Status: Published in print
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: DOI: 10.1007/BF00992698
Degree: -

Source 1

Title: Machine Learning
Source Genre: Journal
Creator(s): -
Affiliations: -
Publ. Info: Dordrecht : Springer
Pages: -
Volume / Issue: 8 (3-4)
Sequence Number: -
Start / End Page: 279 - 292
Identifier: ISSN: 0885-6125
CoNE: https://pure.mpg.de/cone/journals/resource/08856125