MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski
58.00/100
🫥 Mediocre
Incremental, thin
Content 58.0 · Citation bonus +0.0 · no citation data
💡 This paper identifies a causal "eye-brain separation" flaw in MLLMs where they can locate small visual details but give wrong answers, and proposes a training-free intervention using internal attentio
#MLLM eye-brain separation #training-free intervention #attention reuse #fine-grained perception #VQA improvement
Score breakdown
Novelty 6.0 / 10
Rigor 7.0 / 10
Significance 8.0 / 10
Clarity 8.0 / 10
Reproducibility 8.0 / 10