🏆 Hall of Fame
- 1 🥇 80.00 · Attention Is All You Need · Ashish Vaswani, Noam Shazeer, Niki Parmar · #Transformer seminal work
- 2 🥇 80.00 · Deep Residual Learning for Image Recognition · Kaiming He, Xiangyu Zhang, Shaoqing Ren · #Residual Skip
- 3 📘 78.81 · MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models · Chaoyou Fu, Peixian Chen, Yunhang Shen · #MLLM Exam
- 4 📘 75.60 · Denoising Diffusion Probabilistic Models · Jonathan Ho, Ajay Jain, Pieter Abbeel · #diffusion model origin
- 5 📘 73.95 · Absolute Zero: Reinforced Self-play Reasoning with Zero Data · Andrew Zhao, Yiran Wu, Yang Yue · #zero-data RL
- 6 📘 71.43 · Mean Flows for One-step Generative Modeling · Zhengyang Geng, Mingyang Deng, Xingjian Bai · #one-shot gen
- 7 📘 71.20 · Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore · Junchao Wu, Runzhe Zhan, Derek F. Wong · #LLM-generated text detec
- 8 📘 70.00 · OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · Tianbao Xie, Danyang Zhang, Jixuan Chen · #GUI Agent Benchmark
- 9 📘 69.87 · DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios · Junchao Wu, Runzhe Zhan, Derek F. Wong · #LLM Detection Benchmark
- 10 📘 68.83 · WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation · Shuangrui Ding, Xuanlang Dai, Long Xing · #Sandbox Debunker
- 11 📘 68.64 · StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding · Junming Lin, Zheng Fang, Chi Chen · #streaming video understa
- 12 📘 65.97 · d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching · Yuchu Jiang, Yue Cai, Xiangzhong Luo · #Diffusion LLM Inference
- 13 📘 65.60 · Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval · Pascal Notin, Mafalda Dias, Jonathan Frazer · #protein fitness predicti
- 14 🫥 64.80 · SkillOpt: Executive Strategy for Self-Evolving Agent Skills · Yifan Yang, Ziyang Gong, Weiquan Huang · #agent skill alchemy
- 15 🫥 63.60 · NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents · Jingzhe Ding, Shengda Long, Changxin Pu · #coding agent truth serum
- 16 🫥 63.20 · VideoRoPE: What Makes for Good Video Rotary Position Embedding? · Xilin Wei, Xiaoran Liu, Yuhang Zang · #video positional encodin
- 17 🫥 63.20 · Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO · Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin · #GRPO aggregation mystery
- 18 🫥 62.40 · Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing · Xiangyu Zhao, Peiyuan Zhang, Kexian Tang · #Visual Editing Benchmark
- 19 🫥 62.40 · MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly · Zhaowei Wang, Wenhao Yu, Xiyu Ren · #long-context multimodal
- 20 🫥 62.00 · Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games · Shengyuan Ding, Xilin Wei, Xinyu Fang · #MLLM evaluation
- 21 🫥 62.00 · Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context · Zhaowei Wang, Lishu Luo, Haodong Duan · #long-context LVLM
- 22 🫥 60.80 · SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction · Zhixiong Zhang, Yizhuo Li, Shuangrui Ding · #LVLM finally knows multi
- 23 🫥 60.40 · GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? · Tongxu Luo, Rongsheng Wang, Jiaxi Bi · #game generation benchmar
- 24 🫥 59.60 · DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection · Junchao Wu, Yefeng Liu, Chenyu Zhu · #multilingual text detect
- 25 🫥 58.00 · MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs · Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara · #MLLM eye-brain separatio
- 26 🫥 57.20 · Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training · Wenyu Du, Tongxu Luo, Zihan Qiu · #practical model growth g
- 27 🫥 56.40 · Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances · Yi Yu, Botao Ren, Peiyuan Zhang · #point-supervised OOD
- 28 🫥 55.60 · SeHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing · Yiyu Li, Haoyuan Wang, Ke Xu · #single-exposure HDR
- 29 🫥 55.60 · Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model · Minghao Wu, Yuting Yan, Zhenyang Cai · #sepsis decision making
- 30 🫥 54.40 · ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning · Shengyuan Ding, Xinyu Fang, Ziyu Liu · #multimodal reward model
- 31 🫥 54.40 · Knowledge Index of Noah's Ark · Sheng Jin, Minghao Liu, Yunze Xiao · #LLM Knowledge Evaluation
- 32 🫥 54.40 · GenExam: A Multidisciplinary Text-to-Image Exam · Zhaokai Wang, Penghao Yin, Xiangyu Zhao · #text-to-image evaluation
- 33 🫥 53.60 · OneRec Technical Report · Guorui Zhou, Jiaxin Deng, Jinghao Zhang · #industrial recommender s
- 34 🫥 53.20 · The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning · Qiguang Chen, Yantao Du, Ziniu Li · #Long CoT analysis
- 35 🫥 52.40 · Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It · Yupu Hao, Zhuoran Jin, Huanxuan Liao · #Tool-use RL
- 36 🫥 52.40 · SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification · Junyan Lin, Feng Gao, Xiaocheng Shi · #Remote Sensing Classific
- 37 🫥 52.40 · Fast Large Language Model Collaborative Decoding via Speculation · Jiale Fu, Yuchu Jiang, Junkai Chen · #LLM Acceleration
- 38 🫥 52.40 · Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation · Xiangyu Zhao, Peiyuan Zhang, Junming Lin · #Reward Model Dehallucina
- 39 🫥 52.40 · DebCSE: Rethinking Unsupervised Contrastive Sentence Embedding Learning in the Debiasing Perspective · Pu Miao, Zeyao Du, Junlin Zhang · #Sentence Embedding Debia
- 40 🫥 51.20 · RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns · Xin Chen, Junchao Wu, Shu Yang · #AI-generated text detect
- 41 🫥 50.40 · Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning · Zhaoyang Wang, Canwen Xu, Boyi Liu · #agent-env-savior
- 42 🫥 50.40 · Learning from Peers in Reasoning Models · Tongxu Luo, Wenyu Du, Jiaxi Bi · #prefix trap observation
- 43 🫥 50.40 · Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices · Junyan Lin, Haoran Chen, Yue Fan · #Multimodal LLM
- 44 🫥 50.00 · Qwen-AgentWorld: Language World Models for General Agents · Yuxin Zuo, Zikai Xiao, Li Sheng · #world model training
- 45 🫥 50.00 · Kwai Keye-VL-2.0 Technical Report · Kwai Keye Team, Bin Wen, Changyi Liu · #Long-video Understanding
- 46 🫥 50.00 · Generative Modeling via Drifting · Mingyang Deng, He Li, Tianhong Li · #one-step generation
- 47 🫥 49.60 · DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors · Runqi Wang, Yang Chen, Sijie Xu · #face swapping
- 48 🫥 49.60 · MM-IFEngine: Towards Multimodal Instruction Following · Shengyuan Ding, Shenxi Wu, Xiangyu Zhao · #Multimodal LLM
- 49 🫥 49.44 · MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization · Xiangyu Zhao, Junming Lin, Tianhao Liang · #Multimodal LLMs
- 50 🫥 48.00 · Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment · Zhixue Song, Boyan Han, Yiwei Wang · #multimodal safety