Item: Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields
Rating: 47.6
Author: GitHub Roast


Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Liya Zhu, Jingzhe Ding, Jian Zhang, Jianbo Xue et al.

47.60/100

🫥 Mediocre

Incremental, thin

Content 47.6 · Citation bonus +0.0 · no citation data

💡 This paper proposes Workflow-GYM, a benchmark for long-horizon GUI tasks in professional domains, and finds that state-of-the-art agents achieve a success rate only slightly above 30%, revealing defects like…

#GUI Agent Benchmark #Professional Long-Workflow Evaluation #Deployment Pain Point Revelation


Score breakdown

Novelty: 6.0 / 10

Rigor: 5.0 / 10

Significance: 7.0 / 10

Clarity: 8.0 / 10

Reproducibility: 4.0 / 10
