Item: Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
Rating: 62
Author: GitHub Roast



Shengyuan Ding, Xilin Wei, Xinyu Fang, Haodong Duan et al.

62.00/100

🫥 Mediocre

Incremental, thin

Content 62.0 · Citation bonus +0.0 · no citation data

💡 This paper proposes RNG-Bench, a benchmark that evaluates MLLMs' ability to reconstruct past observations and act accordingly in non-Markov games, finding that the main bottleneck of frontier models is f…

#MLLM evaluation #non-Markov decision #memory benchmark #forgetting analysis #game AI testing


Score breakdown

Novelty: 7.0 / 10

Rigor: 8.0 / 10

Significance: 8.0 / 10

Clarity: 9.0 / 10

Reproducibility: 7.0 / 10
