🔥 GitHub Roast
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han et al.
52.80/100
🫥 Mediocre
Incremental, thin
Content 52.8 · Citation bonus +0.0 · no citation data

💡 This paper identifies that low-probability tokens dominate LLM RL training updates due to large gradient magnitudes, proposes Advantage Reweighting and Lopti to suppress low-probability token gradient
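The gradient-magnitude claim in the summary can be illustrated with a toy calculation. The sketch below is not the paper's code. It assumes a plain softmax over a made-up 4-token vocabulary and measures gradients with respect to the logits rather than the model parameters. It relies only on the standard identity that the gradient of log p(token) with respect to the logits is one_hot(token) minus softmax(logits). The names `softmax` and `logprob_grad_norm` and the logit values are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logprob_grad_norm(logits, token):
    # L2 norm of d log p(token) / d logits = one_hot(token) - softmax(logits).
    probs = softmax(logits)
    grad = [(1.0 if i == token else 0.0) - p for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))

# Hypothetical logits: token 0 is likely, token 3 is unlikely.
logits = [4.0, 1.0, 0.0, -2.0]
probs = softmax(logits)
norms = [logprob_grad_norm(logits, i) for i in range(len(logits))]

for i in range(len(logits)):
    print(f"token {i}: p={probs[i]:.4f}  grad_norm={norms[i]:.4f}")
```

In this toy setting the least likely token has the largest gradient norm and the most likely token has the smallest, which is the imbalance the summary describes. How this carries over to full parameter gradients in an LLM is the paper's argument, not something this sketch establishes.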

#RL for LLM #gradient imbalance #GRPO optimization #low-probability token #logic reasoning

Score breakdown

Novelty: 6.0 / 10
Rigor: 5.0 / 10
Significance: 8.0 / 10
Clarity: 8.0 / 10
Reproducibility: 7.0 / 10
