Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games

Yannik Mahlau; Frederik Schubert; Bodo Rosenhahn

Details

Original language	English
Pages (from-to)	34253-34280
Number of pages	28
Journal	Proceedings of Machine Learning Research
Volume	235
Publication status	Published - 2024
Event	41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration	21 Jul 2024 → 27 Jul 2024

Abstract

The combination of self-play and planning has achieved great success in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor because they may select different Nash equilibria or may not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% over the previous state of the art in the cooperative Overcooked benchmark.

ASJC Scopus subject areas

Computer Science(all): Artificial Intelligence
Computer Science(all): Software
Engineering(all): Control and Systems Engineering
Mathematics(all): Statistics and Probability

Cite this

Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games. / Mahlau, Yannik; Schubert, Frederik; Rosenhahn, Bodo.
In: Proceedings of Machine Learning Research, Vol. 235, 2024, p. 34253-34280.

Research output: Contribution to journal › Conference article › Research › peer review

Mahlau, Y, Schubert, F & Rosenhahn, B 2024, 'Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games', Proceedings of Machine Learning Research, vol. 235, pp. 34253-34280.

Mahlau, Y., Schubert, F., & Rosenhahn, B. (2024). Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games. Proceedings of Machine Learning Research, 235, 34253-34280.

Mahlau Y, Schubert F, Rosenhahn B. Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games. Proceedings of Machine Learning Research. 2024;235:34253-34280.

Mahlau, Yannik ; Schubert, Frederik ; Rosenhahn, Bodo. / Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games. In: Proceedings of Machine Learning Research. 2024 ; Vol. 235. pp. 34253-34280.

@article{889a3cf7bd55421190ce5399688fd133,

title = "Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games",

abstract = "The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or do not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6\% compared to previous state of the art in the cooperative Overcooked benchmark.",

author = "Yannik Mahlau and Frederik Schubert and Bodo Rosenhahn",

year = "2024",

language = "English",

journal = "Proceedings of Machine Learning Research",

volume = "235",

pages = "34253--34280",

note = "41st International Conference on Machine Learning, ICML 2024 ; Conference date: 21-07-2024 Through 27-07-2024",

}

TY - JOUR

T1 - Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games

AU - Mahlau, Yannik

AU - Schubert, Frederik

AU - Rosenhahn, Bodo

PY - 2024

Y1 - 2024

N2 - The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or do not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% compared to previous state of the art in the cooperative Overcooked benchmark.

AB - The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or do not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% compared to previous state of the art in the cooperative Overcooked benchmark.

UR - http://www.scopus.com/inward/record.url?scp=85203797356&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85203797356

VL - 235

SP - 34253

EP - 34280

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - 41st International Conference on Machine Learning, ICML 2024

Y2 - 21 July 2024 through 27 July 2024

ER -

By the same author(s)

CHOTA: A Higher Order Accuracy Metric for Cell Tracking

Safe Resetless Reinforcement Learning: Enhancing Training Autonomy with Risk-Averse Agents

Guest Editorial: Special Issue on Multimodal Learning

Attribute-Centric Compositional Text-to-Image Generation

AutoML for Multi-Class Anomaly Compensation of Sensor Drift