Item: Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Rating: 52.4
Author: GitHub Roast

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Yupu Hao, Zhuoran Jin, Huanxuan Liao, Kang Liu et al.

52.40/100

🫥 Mediocre

Incremental, thin

Content 52.4 · Citation bonus +0.0 · no citation data

💡 This paper reveals that multi-step tool-use RL collapse stems from control token probability spikes disrupting structured execution, and through systematic comparison of supervisory signals and traini

#Tool-use RL #Mode Collapse Attribution #Supervisory Signal Tuning #LLM Agent Training #Generalization Stability

Score breakdown

Novelty 5.0 / 10

Rigor 6.0 / 10

Significance 7.0 / 10

Clarity 8.0 / 10

Reproducibility 8.0 / 10
