🔥 GitHub Roast
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Yupu Hao, Zhuoran Jin, Huanxuan Liao, Kang Liu et al.
52.40/100
🫥 Mediocre
Incremental, thin
Content 52.4 · Citation bonus +0.0 · no citation data

💡 This paper reveals that multi-step tool-use RL collapse stems from control token probability spikes disrupting structured execution, and through systematic comparison of supervisory signals and traini

#Tool-use RL #Mode Collapse Attribution #Supervisory Signal Tuning #LLM Agent Training #Generalization Stability

Score breakdown

Novelty 5.0 / 10
Rigor 6.0 / 10
Significance 7.0 / 10
Clarity 8.0 / 10
Reproducibility 8.0 / 10
