Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu et al.
73.95/100
📘 Readable
Decent, has merit
Content 66.4 · Citation bonus +7.5 · 258 citations
💡 We propose Absolute Zero, a paradigm where LLMs self-evolve reasoning by self-generating verifiable code tasks and self-playing RL without any external data, achieving SOTA on coding and math reasonin
#zero-data RL #self-play reasoning #LLM self-evolution #code verifier #RLVR paradigm shift
Score breakdown
Novelty: 9.0 / 10
Rigor: 8.0 / 10
Significance: 9.0 / 10
Clarity: 8.0 / 10
Reproducibility: 7.0 / 10