
Item Details


Released

Conference Paper

Balancing Safety and Exploitability in Opponent Modeling

MPS-Authors

Wang, Z. (/persons/resource/persons76262)
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society

Boularias, A. (/persons/resource/persons83823)
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society

Mülling, K. (/persons/resource/persons84097)
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society

Peters, J. (/persons/resource/persons84135)
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society

Citation

Wang, Z., Boularias, A., Mülling, K., & Peters, J. (2011). Balancing Safety and Exploitability in Opponent Modeling. In Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011) (pp. 1515-1520).


Cite as: https://hdl.handle.net/11858/00-001M-0000-0010-7611-7
Abstract
Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent's strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe, as its expected payoff is above the minimax payoff with high probability, and it can exploit the opponent's preferences once sufficient observations have been obtained. We apply the technique to normal-form games and to stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand, or middle preparation pose before they serve. The learned strategies can exploit the opponent's preferences, leading to a higher rate of successful returns.
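
The abstract compresses the mechanism into two moving parts: a confidence set that contains the opponent's true strategy with high probability, and a decision rule that exploits the empirical model only while the worst case over that set still beats the minimax value. Below is a minimal sketch of that idea for repeated rock-paper-scissors, assuming a conservative Weissman-style L1 confidence set and a fall-back-to-minimax rule; the bound, threshold, and function names are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (columns: opponent R, P, S).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

MINIMAX_VALUE = 0.0  # game value for the row player (uniform play is minimax)

def confidence_radius(n, delta=0.05, k=3):
    """Conservative Weissman-style L1 bound: with probability >= 1 - delta,
    ||p_hat - p||_1 <= sqrt(2 * ln(2**k / delta) / n)."""
    return np.sqrt(2.0 * (k * np.log(2.0) - np.log(delta)) / max(n, 1))

def choose_strategy(counts, delta=0.05):
    """Exploit the empirical opponent model only if its worst-case payoff
    over the L1 confidence set stays above the minimax value."""
    n = counts.sum()
    if n == 0:
        return np.ones(3) / 3.0              # no observations yet: play safe
    p_hat = counts / n                       # empirical opponent model
    eps = confidence_radius(n, delta)
    i = int(np.argmax(A @ p_hat))            # best response to the point estimate
    payoffs = A[i]
    # For any p with ||p - p_hat||_1 <= eps, the payoff drops by at most
    # (eps / 2) * (max payoff - min payoff); ignoring the simplex constraint
    # only makes this lower bound more conservative.
    worst_case = float(payoffs @ p_hat) - (eps / 2.0) * (payoffs.max() - payoffs.min())
    if worst_case >= MINIMAX_VALUE:
        exploit = np.zeros(3)
        exploit[i] = 1.0
        return exploit                       # exploit: safe with prob. >= 1 - delta
    return np.ones(3) / 3.0                  # otherwise fall back to minimax

# Demo: against a rock-heavy opponent the rule starts at minimax and switches
# to exploitation once the confidence set has shrunk enough.
rng = np.random.default_rng(0)
counts, total = np.zeros(3), 0.0
for t in range(500):
    ours = rng.choice(3, p=choose_strategy(counts))
    opp = rng.choice(3, p=[0.6, 0.2, 0.2])
    total += A[ours, opp]
    counts[opp] += 1
print("average payoff:", total / 500)
```

With few observations the radius eps is large, the worst-case test fails, and the sketch plays the uniform minimax strategy; as counts accumulate, the set shrinks and the best response to the empirical model is certified safe, mirroring the adaptive trade-off the abstract describes.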