🔥 GitHub Roast
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields
Liya Zhu, Jingzhe Ding, Jian Zhang, Jianbo Xue et al.
47.60/100
🫥 Mediocre
Incremental, thin
Content 47.6 · Citation bonus +0.0 · no citation data

💡 This paper proposes Workflow-GYM, a benchmark for long-horizon GUI tasks in professional domains, and finds that state-of-the-art agents achieve a success rate only slightly above 30%, revealing defects lik

#GUI Agent Benchmark #Professional Long-Workflow Evaluation #Deployment Pain Points Revealed

Score breakdown

Novelty 6.0 / 10
Rigor 5.0 / 10
Significance 7.0 / 10
Clarity 8.0 / 10
Reproducibility 4.0 / 10
