Item: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Rating: 70
Author: GitHub Roast

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li et al.

70.00/100

📘 Readable

Decent, has merit

Content 70.0 · Citation bonus +0.0 · no citation data

💡 OSWorld is the first open-ended benchmark for multimodal agents that runs across real operating systems (Ubuntu, Windows, macOS), featuring 369 real-world tasks with automated execution-based evaluation, revealing the best SOTA

#GUI Agent Benchmark #Real-OS Testing #Multimodal Agent Evaluation #Execution-based Evaluation #Human-Computer Interaction Research

Score breakdown

Novelty: 8.0 / 10

Rigor: 9.0 / 10

Significance: 9.0 / 10

Clarity: 9.0 / 10

Reproducibility: 9.0 / 10
