Help Privacy Policy Disclaimer
  Advanced SearchBrowse





Reinforcing connectionism: learning the statistical way

There are no MPG-Authors in the publication available
External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

(Publisher version), 9MB

Supplementary Material (public)
There is no public supplementary material available

Dayan, P. (1991). Reinforcing connectionism: learning the statistical way. PhD Thesis, University of Edinburgh, Edinburgh, UK.

Cite as: https://hdl.handle.net/21.11116/0000-0002-D8A5-0
Connectionism's main contribution to cognitive science will prove to be the renewed impetus it has imparted to learning. Learning can be integrated into the existing theoretical foundations of the subject, and the combination, statistical computational theories, provide a framework within which many connectionist mathematical mechanisms naturally fit. Examples from supervised and reinforcement learning demonstrate this. Statistical computational theories already exist for certainn associative matrix memories. This work is extended, allowing real valued synapses and arbitrarily biased inputs. It shows that a covariance learning rule optimises the signal/noise ratio, a measure of the potential quality of the memory, and quantifies the performance penalty incurred by other rules. In particular two that have been suggested as occuring naturally are shown to be asymptotically optimal in the limit of sparse coding. The mathematical model is justified in comparison with other treatments whose results differ. Reinforcement comparison is a way of hastening the learning of reinforcement learning systems in statistical environments. Previous theoretical analysis has not distinguished between different comparison terms, even though empirically, a covariance rule has been shown to be better than just a constant one. The workings of reinforcement comparison are investigated by a second order analysis of the expected statistical performance of learning, and an alternative rule is proposed and empirically justified. The existing proof that temporal difference prediction learning converges in the mean is extended from a special case involving adjacent time steps to the general case involving arbitary ones. The interaction between the statistical mechanism of temporal difference and the linear representation is particularly stark. The performance of the method given a linearly dependent representation is also analysed.