
Released

Paper

Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

MPS-Authors
Majumdar, Rupak
Group R. Majumdar, Max Planck Institute for Software Systems, Max Planck Society

Mandal, Debmalya
Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society

Radanovic, Goran
Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society

Fulltext (public)

arXiv:2306.03832.pdf
(Preprint), 698KB

Citation

Gan, J., Majumdar, R., Mandal, D., & Radanovic, G. (2023). Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning. Retrieved from https://arxiv.org/abs/2306.03832.


Cite as: https://hdl.handle.net/21.11116/0000-000D-F9A7-E
Abstract
We study a sequential decision making problem between a principal and an
agent with incomplete information on both sides. In this model, the principal
and the agent interact in a stochastic environment, and each is privy to
observations about the state not available to the other. The principal has the
power of commitment, both to elicit information from the agent and to provide
signals about her own information. The principal and the agent communicate
their signals to each other, and select their actions independently based on
this communication. Each player receives a payoff based on the state and their
joint actions, and the environment moves to a new state. The interaction
continues over a finite time horizon, and both players act to optimize their
own total payoffs over the horizon. Our model encompasses as special cases
stochastic games of incomplete information and POMDPs, as well as sequential
Bayesian persuasion and mechanism design problems. We study both computation of
optimal policies and learning in our setting. While the general problems are
computationally intractable, we study algorithmic solutions under a conditional
independence assumption on the underlying state-observation distributions. We
present a polynomial-time algorithm that computes the principal's optimal policy
up to an additive approximation. Additionally, we give an efficient learning
algorithm for the case where the transition probabilities are not known
beforehand; it guarantees sublinear regret for both players.
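
To make the interaction structure concrete, below is a minimal Python sketch of one episode of the protocol the abstract describes: private observations, an exchange of signals, independent action choices, payoffs, and a state transition, repeated over a finite horizon. Every concrete choice in it (the two-element state space, the observation noise, the signaling and action rules, the shared toy payoff) is a hypothetical illustration, not the paper's formalization; in particular, the actual model allows the two players distinct payoff functions and rests on the principal's commitment power, neither of which this sketch captures.

import random

# Hypothetical toy instantiation of the sequential principal-agent
# protocol sketched in the abstract. Nothing below is from the paper.
STATES = ["s0", "s1"]
HORIZON = 5  # finite time horizon

def observe(state, noise=0.2):
    # Each player is privy to a private, possibly noisy observation of the state.
    return state if random.random() > noise else random.choice(STATES)

def signal(obs):
    # Both players communicate a signal about their observation.
    # (Here both use the same truthful rule; in the model the principal
    # commits to her signaling scheme, which this toy does not enforce.)
    return "high" if obs == "s1" else "low"

def payoff(state, act_p, act_a):
    # Payoff depends on the state and the joint actions (toy values;
    # the paper allows the principal and the agent different payoffs).
    return 1.0 if act_p == act_a and state == "s1" else 0.0

def transition(state, act_p, act_a):
    # The environment moves to a new state; uniformly at random here,
    # though in the model the transition depends on state and actions.
    return random.choice(STATES)

state = random.choice(STATES)
total_principal = total_agent = 0.0
for t in range(HORIZON):
    obs_p, obs_a = observe(state), observe(state)    # private observations
    sig_p, sig_a = signal(obs_p), signal(obs_a)      # communication round
    # Actions are selected independently, based only on the exchanged signals.
    act_p = "a" if sig_a == "high" else "b"
    act_a = "a" if sig_p == "high" else "b"
    r = payoff(state, act_p, act_a)
    total_principal += r
    total_agent += r
    state = transition(state, act_p, act_a)

print(f"principal total payoff: {total_principal}, agent total payoff: {total_agent}")

Each player here follows a fixed myopic rule; computing the principal's optimal committed policy under the paper's conditional independence assumption is the problem its polynomial-time algorithm addresses.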