Journal Article

TD(λ) converges with probability 1


Dayan, P., & Sejnowski, T. (1994). TD(λ) converges with probability 1. Machine Learning, 14(3), 295-301. doi:10.1007/BF00993978.

Cite as: http://hdl.handle.net/21.11116/0000-0002-D6E2-D
The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning amounts to stochastic approximation based on samples drawn from the process generating the agent's future. Sutton (1988) proved that, for a special case of temporal differences, the expected values of the predictions converge to their correct values as large samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.