Conference Paper

Reinforcement comparison


Dayan, P. (1991). Reinforcement comparison. In D. Touretzky, J. Elman, T. Sejnowski, & G. Hinton (Eds.), Connectionist Models: Proceedings of the 1990 Summer School (pp. 45-51). San Mateo, CA, USA: Morgan Kaufmann.

Cite as: https://hdl.handle.net/21.11116/0000-0007-5540-1
Sutton [in his PhD thesis] introduced a reinforcement comparison term into the equations governing certain stochastic learning automata, arguing that it should speed up learning, particularly for unbalanced reinforcement tasks. Williams's subsequent extensions [REINFORCE] to this class of algorithms demonstrated that they all perform approximate stochastic gradient ascent, but that, in expectation, the comparison term has no first-order effect. This paper analyses the second-order contribution, and uses the criterion that its modulus should be minimised to determine an optimal value for the comparison term. This value turns out to differ from the one Sutton used, and simulations attest to its efficacy.
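To make the setting concrete, here is a minimal sketch of a REINFORCE-style update with a reinforcement comparison (baseline) term on a two-armed bandit. The arm payoff probabilities, step sizes, and the running-average baseline (Sutton's choice of comparison term) are all illustrative assumptions, not the paper's setup; the paper's contribution is a different, second-order-optimal value for this baseline, which is not reproduced here.

```python
import math
import random

def reinforce_bandit(steps=2000, alpha=0.1, beta=0.05, seed=0):
    """REINFORCE with a reinforcement-comparison term on a two-armed
    bandit (hypothetical illustration; parameters are assumptions).

    The policy picks arm 1 with probability sigmoid(theta). The update
    theta += alpha * (r - baseline) * d(log pi)/d(theta) is unbiased
    stochastic gradient ascent for any baseline -- the comparison term
    has no first-order effect on the expected update, as the abstract
    notes -- but it changes the variance (second-order) behaviour.
    """
    rng = random.Random(seed)
    p_reward = [0.2, 0.8]   # assumed payoff probability of each arm
    theta = 0.0             # logit preference for arm 1
    baseline = 0.0          # reinforcement-comparison term

    for _ in range(steps):
        p1 = 1.0 / (1.0 + math.exp(-theta))          # P(choose arm 1)
        a = 1 if rng.random() < p1 else 0
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        # d(log pi(a))/d(theta) = a - p1 for a Bernoulli-logit policy
        theta += alpha * (r - baseline) * (a - p1)
        # Sutton-style comparison: exponential moving average of reward
        baseline += beta * (r - baseline)

    return 1.0 / (1.0 + math.exp(-theta))            # final P(arm 1)

final_p1 = reinforce_bandit()
```

After training, `final_p1` should exceed 0.5, i.e. the policy learns to prefer the higher-paying arm; swapping in a different baseline leaves the expected update unchanged but alters how noisy the learning trajectory is.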