Released

Poster

Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning

MPS-Authors

Dayan,  P
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Ahilan, S., & Dayan, P. (2019). Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. Poster presented at 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019), Montreal, Canada.


Cite as: https://hdl.handle.net/21.11116/0000-0004-7955-5
Abstract
We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a ‘manager’ agent, which is tasked with maximising the environmentally-determined reward function, learns to communicate subgoals to multiple, simultaneously-operating ‘worker’ agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function.
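
To make the feudal reward decomposition concrete, the following is a minimal, self-contained Python sketch of one step of an FMH-style hierarchy: the manager communicates a subgoal to each concurrently acting worker, the manager is trained on the environmentally-determined reward, and each worker is rewarded for achieving its subgoal. The toy environment, the random subgoal choice, the greedy worker policy, and the negative-distance rewards are all illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
N_WORKERS = 3

class Manager:
    # Stand-in for a learned manager policy: here it picks a random
    # subgoal position for each worker from the unit square.
    def act(self, observations):
        return [rng.uniform(-1, 1, size=2) for _ in range(N_WORKERS)]

class Worker:
    # Stand-in for a learned goal-conditioned worker policy: move
    # greedily toward the communicated subgoal.
    def act(self, pos, subgoal, step=0.1):
        direction = subgoal - pos
        norm = np.linalg.norm(direction)
        return step * direction / norm if norm > 1e-8 else np.zeros(2)

def worker_reward(pos, subgoal):
    # Workers are rewarded for achieving managerial subgoals; we assume
    # a negative-distance shaping toward the communicated subgoal.
    return -float(np.linalg.norm(pos - subgoal))

def environment_reward(positions, target):
    # The environmentally-determined team reward, which only the manager
    # maximises: negative mean distance of all workers to a true target.
    return -float(np.mean([np.linalg.norm(p - target) for p in positions]))

# One timestep of the hierarchy with concurrent worker actions.
positions = [rng.uniform(-1, 1, size=2) for _ in range(N_WORKERS)]
target = np.array([0.5, 0.5])
manager, workers = Manager(), [Worker() for _ in range(N_WORKERS)]

subgoals = manager.act(positions)  # manager communicates one subgoal per worker
positions = [p + w.act(p, g) for p, w, g in zip(positions, workers, subgoals)]
rewards = {"manager": environment_reward(positions, target)}
for i, (p, g) in enumerate(zip(positions, subgoals)):
    rewards[f"worker_{i}"] = worker_reward(p, g)
print(rewards)

Because workers optimise only the subgoals they are given, the manager alone needs to relate subgoals to the environmental reward, which is consistent with the decentralised learning and scaling advantages described in the abstract.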