MARL Evaluation 2020


Date and Location


Time | Talk | Speaker
14:00 - 15:00 | Sharing on Multi-player Game AI Evaluation (Slides) | Yushan Zhou
15:00 - 16:00 | An Overview of Evaluation Methods in Games (Slides) | Yaodong Yang
16:00 - 17:00 | A Critical Review of Multi-agent Evaluation (Slides) | Xue Yan

Invited Talks

Talk 1: Sharing on Multi-player Game AI Evaluation



Yushan Zhou
Yushan Zhou is a third-year master's student at the AI Lab, School of Electronics Engineering and Computer Science, Peking University. She focuses on Botzone, a universal online multi-agent game AI platform designed to evaluate different game AI implementations by having them control agents that compete against each other. The platform features an Elo ranking system and a contest system that let users evaluate their AI programs. She has held seven multi-agent game AI competitions on Botzone. This year, as a student chair, she is organizing the IJCAI 2020 Mahjong AI Competition, held as part of IJCAI-PRICAI 2020.


Intelligence exists when we measure it! A game-based AI competition makes our notion of intelligence explicit, which is why such competitions have recently become popular at AI conferences such as AAAI and IJCAI. With clearly and precisely defined problems, a unified platform environment, a fair performance-assessment mechanism, open data sets, and benchmarks, game-based AI competitions have attracted many researchers and thereby accelerated the development of AI technology. A new trend is to host these competitions on online platforms, which encourages researchers and AI enthusiasts to work on a task continuously and share information at any time. This trend raises the problem of evaluating an enormous number of bots quickly and fairly. Having held more than 10 multi-agent game AI competitions on the Botzone platform, we have experience in choosing an appropriate contest system for games such as FightTheLandlord and Chinese Standard Mahjong. In particular, we are holding the IJCAI Mahjong AI Competition this year in two stages, a practice round and a competitive round. Both rounds use a Swiss-round tournament with duplicate format.
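As a rough illustration of the contest system mentioned above, the following Python sketch shows one way a Swiss round for four-player tables could be paired, and how a duplicate format could reuse a single deal across seat rotations. It is not Botzone's actual implementation: the function names, the tie-breaking rule, and the choice of simple seat rotations are all assumptions made for this example.

```python
def swiss_tables(scores, table_size=4):
    """Group bots into tables of neighbours in the current ranking.

    `scores` maps a bot name to its cumulative score so far.
    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    ranked = sorted(scores, key=lambda bot: (-scores[bot], bot))
    return [ranked[i:i + table_size] for i in range(0, len(ranked), table_size)]


def duplicate_seatings(table):
    """Return every rotation of a table.

    In a duplicate format the same deal is replayed for each seating,
    so each bot gets to play every seat on identical tiles.
    """
    return [table[k:] + table[:k] for k in range(len(table))]


# Hypothetical standings after some earlier rounds.
standings = {"a": 3, "b": 1, "c": 2, "d": 0, "e": 5, "f": 4, "g": 7, "h": 6}
tables = swiss_tables(standings)
for table in tables:
    for seating in duplicate_seatings(table):
        print(seating)
```

Pairing bots with similar scores keeps the number of rounds small while still separating strong bots from weak ones, and replaying one deal across seatings reduces the effect of luck in the deal.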

Talk 2: An Overview of Evaluation Methods in Games



Yaodong Yang
I am a machine learning researcher focusing on reinforcement learning, multi-agent learning, and Bayesian inference. Currently, I am the tech lead of multi-agent learning at Huawei Noah's Ark (AI Lab), responsible for delivering research on multi-agent systems and their application to autonomous driving. Before joining Huawei, I was a senior manager in the Science department of American International Group, where I led a machine learning research team developing AI-powered methodological innovations for insurance problems. In 2018, I was awarded UK Exceptional Talent in Machine Learning/AI by the Home Office.


Evaluating policies in game AI design is non-trivial because a game can contain both transitive and non-transitive components. In this talk, I will briefly review recent metrics, including Elo, Nash equilibrium, replicator dynamics, and α-Rank, along with their pros and cons for game AI evaluation.
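To make the point about non-transitive components concrete, here is a minimal Python sketch of the standard Elo update applied to rock-paper-scissors, where each strategy always beats one opponent and always loses to another. The K-factor of 16, the starting rating of 1500, and the fixed match order are assumptions for this example; it is not taken from the talk.

```python
def expected_score(r_a, r_b):
    """Elo-predicted score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=16.0):
    """Return updated ratings after one game; score_a is 1, 0.5, or 0."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


# A purely cyclic (non-transitive) game: each pair is (winner, loser).
beats = [("paper", "rock"), ("scissors", "paper"), ("rock", "scissors")]
ratings = {"rock": 1500.0, "paper": 1500.0, "scissors": 1500.0}

for _ in range(100):
    for winner, loser in beats:
        ratings[winner], ratings[loser] = elo_update(
            ratings[winner], ratings[loser], 1.0
        )

print(ratings)
```

Each strategy wins and loses equally often, so the three ratings stay close together and Elo predicts roughly even odds for every pairing, although every matchup is in fact completely one-sided. A single scalar rating captures only the transitive part of a game.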

Talk 3: A Critical Review of Multi-agent Evaluation



Xue Yan
Xue Yan is a senior at Shandong University, majoring in computer science and technology. She will join the Institute of Automation, Chinese Academy of Sciences, in 2021 to pursue her doctoral degree. Her research interest is multi-agent reinforcement learning.


We investigate some classic evaluation methods in the field of reinforcement learning and analyze their applicable scenarios, advantages, and disadvantages. Evaluations based on Elo or α-Rank typically assume complete game information, even though the data are often collected from noisy simulations, which makes this assumption unrealistic in practice. We review multi-agent evaluation in the incomplete information regime. Our survey focuses on precise sample bounds that describe how many data samples are required to achieve a good approximation. We are also interested in adaptive sampling algorithms that select agent interactions efficiently for accurate evaluation.
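To give a feel for what a sample bound looks like, the Python sketch below uses the generic Hoeffding inequality to count how many noisy game outcomes are needed to estimate one payoff entry, and a union bound to cover a whole payoff table. This is a textbook bound chosen for illustration, not one of the specific bounds covered in the talk; the function names and the assumption that outcomes are independent and lie in [0, 1] are ours.

```python
import math


def hoeffding_samples(epsilon, delta):
    """Samples needed so the empirical mean of i.i.d. outcomes in [0, 1]
    is within epsilon of the true mean with probability at least 1 - delta.

    Follows from Hoeffding's inequality: n >= ln(2 / delta) / (2 * epsilon^2).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def samples_per_entry(num_entries, epsilon, delta):
    """Per-entry sample count so that all entries of a payoff table are
    accurate simultaneously, using a union bound over the entries."""
    return hoeffding_samples(epsilon, delta / num_entries)


# Win rate of one agent pair, within 0.05, with 95% confidence.
print(hoeffding_samples(0.05, 0.05))

# The same accuracy for all 10 pairings among 5 agents at once.
print(samples_per_entry(10, 0.05, 0.05))
```

The count grows with 1/ε², so halving the tolerated error quadruples the number of games. This uniform allocation spends the same budget on every pairing, which is the inefficiency that adaptive sampling aims to avoid.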