OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li et al.
70.00/100
📘 Readable
Decent, has merit
Content 70.0 · Citation bonus +0.0 · no citation data
💡 OSWorld is the first open-ended benchmark for multimodal agents that runs in real operating systems (Ubuntu/Windows/macOS), featuring 369 real-world tasks with automated execution-based evaluation, revealing the best SOTA
#GUI Agent Benchmark #Real-OS Evaluation #Multimodal Agent Evaluation #Execution-based Evaluation #Human-Computer Interaction Research
Score breakdown
Novelty: 8.0 / 10
Rigor: 9.0 / 10
Significance: 9.0 / 10
Clarity: 9.0 / 10
Reproducibility: 9.0 / 10