Details
Original language | English |
---|---|
Pages (from - to) | 34253-34280 |
Number of pages | 28 |
Journal | Proceedings of Machine Learning Research |
Volume | 235 |
Publication status | Published - 2024 |
Event | 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria; Duration: 21 July 2024 → 27 July 2024 |
Abstract
The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or may not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% compared to the previous state of the art in the cooperative Overcooked benchmark.
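The temperature-based response mentioned in the abstract corresponds to the standard logit (smooth) best response from behavioral game theory. As a rough illustration only, not the paper's exact SBRLE definition, and with the notation ($q_i$, $A_i$, temperature $\tau$) chosen here for exposition rather than taken from the paper, such a response weights actions by a softmax over their expected payoffs:

```latex
% Illustrative sketch of a logit (smooth) best response with temperature tau;
% notation chosen for exposition, not the paper's exact SBRLE construction.
% q_i(a_i): expected payoff of agent i's action a_i against the modeled
% behavior of the other agents; A_i: agent i's action set.
\begin{equation}
  \sigma_i(a_i) =
    \frac{\exp\bigl(q_i(a_i)/\tau\bigr)}
         {\sum_{a_i' \in A_i} \exp\bigl(q_i(a_i')/\tau\bigr)},
  \qquad \tau > 0 .
\end{equation}
% As tau -> 0 the response approaches a (rational) best response;
% a large tau yields near-uniform, weakly rational play, which is how a
% temperature parameter can model agents of different playing strength.
```

In this reading, fitting a temperature to an observed agent gives a one-parameter model of its playing strength, which matches the bounded-rationality framing in the abstract.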
ASJC Scopus subject areas
- Computer Science (all): Artificial Intelligence
- Computer Science (all): Software
- Engineering (all): Control and Systems Engineering
- Mathematics (all): Statistics and Probability
Cite this
Mahlau, Y., Schubert, F., & Rosenhahn, B.: Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games. In: Proceedings of Machine Learning Research, Vol. 235, 2024, pp. 34253-34280.
Publication: Contribution to journal › Conference article › Research › Peer review
TY - JOUR
T1 - Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games
AU - Mahlau, Yannik
AU - Schubert, Frederik
AU - Rosenhahn, Bodo
PY - 2024
Y1 - 2024
N2 - The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or may not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% compared to the previous state of the art in the cooperative Overcooked benchmark.
UR - http://www.scopus.com/inward/record.url?scp=85203797356&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85203797356
VL - 235
SP - 34253
EP - 34280
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 41st International Conference on Machine Learning, ICML 2024
Y2 - 21 July 2024 through 27 July 2024
ER -