🔥 GitHub Roast
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
Shengyuan Ding, Xilin Wei, Xinyu Fang, Haodong Duan et al.
62.00/100
🫥 Mediocre
Incremental, thin
Content 62.0 · Citation bonus +0.0 · no citation data

💡 This paper proposes RNG-Bench, a benchmark to evaluate MLLMs' ability to reconstruct past observations and act accordingly in non-Markov games, finding that the main bottleneck of frontier models is f
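The summary's central distinction is between acting on the current observation and acting on past observations that are no longer visible. A toy game makes this concrete. This is a hypothetical sketch and not RNG-Bench or any of its games: a cue appears only at the first step, and the agent is scored on naming that cue at the last step. The names `play_episode`, `markov_policy`, `history_policy`, and `make_window_policy` are illustrative inventions, not anything from the paper.

```python
import random
from typing import Callable, List, Optional

# Hypothetical toy "cue-then-recall" game (NOT RNG-Bench): the cue is visible
# only at step 0; every later observation is blank (None).
CUES = ["A", "B", "C", "D"]
History = List[Optional[str]]


def play_episode(policy: Callable[[History], str], length: int,
                 rng: random.Random) -> bool:
    if length < 1:
        raise ValueError("episode length must be at least 1")
    cue = rng.choice(CUES)
    history: History = []
    for t in range(length):
        obs = cue if t == 0 else None  # the cue disappears after the first step
        history.append(obs)
    # Only the final action is scored: did the agent name the original cue?
    return policy(history) == cue


def markov_policy(history: History) -> str:
    # Sees only the current (last) observation; falls back to a fixed guess.
    current = history[-1]
    return current if current is not None else CUES[0]


def history_policy(history: History) -> str:
    # Reconstructs the past by scanning back to the latest non-blank observation.
    for obs in reversed(history):
        if obs is not None:
            return obs
    return CUES[0]


def make_window_policy(k: int) -> Callable[[History], str]:
    # Remembers only the last k observations: a crude model of forgetting.
    def policy(history: History) -> str:
        return history_policy(history[-k:])
    return policy


def accuracy(policy: Callable[[History], str], episodes: int = 1000,
             length: int = 5, seed: int = 0) -> float:
    rng = random.Random(seed)
    wins = sum(play_episode(policy, length, rng) for _ in range(episodes))
    return wins / episodes


if __name__ == "__main__":
    print("current observation only:", accuracy(markov_policy))
    print("full history:", accuracy(history_policy))
    print("window of 4 steps:", accuracy(make_window_policy(4)))
```

With five-step episodes, the current-observation policy lands near chance (about 0.25 with four cues), the full-history policy is always correct, and a four-step memory window fails once the cue has scrolled out of it. Episodes of length 1 collapse back to a Markov task, where all three policies succeed.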

#MLLM evaluation #non-Markov decision #memory benchmark #forgetting analysis #game AI testing

Score breakdown

Novelty 7.0 / 10
Rigor 8.0 / 10
Significance 8.0 / 10
Clarity 9.0 / 10
Reproducibility 7.0 / 10
