English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Journal Article

Generalized Thompson sampling for sequential decision-making and causal inference

MPS-Authors
/persons/resource/persons83827

Braun,  DA
Research Group Sensorimotor Learning and Decision-Making, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Ortega, P., & Braun, D. (2014). Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling, 2(2), 1-23. doi:10.1186/2194-3206-2-2.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0027-804C-8
Abstract
Purpose Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling. Methods Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal. Results Here we discuss two important features of this approach. First, we show in how far such generalized Thompson sampling can be regarded as an optimal strategy under limited information processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion. Conclusion In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.