The Convergence of TD(λ) for General λ

Dayan, P

doi:10.1023/A:1022632907294

Dayan, P. (1992). The Convergence of TD(λ) for General λ. Machine Learning, 8(3-4), 341-362. doi:10.1023/A:1022632907294.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0002-D743-0
Version Permalink: https://hdl.handle.net/21.11116/0000-0002-D744-F

Genre: Journal Article

Locators

Locator:
https://link.springer.com/content/pdf/10.1023%2FA%3A1022632907294.pdf (Publisher version) Open Access status unknown

Creators

Creators:
Dayan, P¹, Author

Affiliations:
¹ External Organizations, ou_persistent22

Content

Abstract: The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones.

It also considers how this version of TD behaves in the face of linearly dependent representations for states, demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.

Details

Dates: Date issued: 1992-05

Publication Status: Issued

Identifiers: DOI: 10.1023/A:1022632907294

Source 1

Title: Machine Learning

Source Genre: Journal

Publ. Info: Dordrecht : Springer

Volume / Issue: 8 (3-4)

Start / End Page: 341 - 362

Identifier: ISSN: 0885-6125
CoNE: https://pure.mpg.de/cone/journals/resource/08856125