Tailai Chen
2026
Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling
Yujie Chen | Tailai Chen | Yifeng Gao | Zoe Wanying He | Yijue Xu | Shaobo Wang | Linfeng Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Prefilling computational costs pose a significant bottleneck for Large Language Models (LLMs) and Large Multimodal Models (LMMs) in long-context settings. While token pruning reduces sequence length, prior methods rely on heuristics that break compatibility with hardware-efficient kernels like FlashAttention. In this work, we observe that tokens evolve toward semantic fixed points, after which further processing is redundant. To this end, we introduce Delta Attention Selective Halting (DASH), a training-free policy that monitors the layer-wise update dynamics of the self-attention mechanism to selectively halt stabilized tokens. Extensive evaluation confirms that DASH generalizes across language and vision benchmarks, delivering significant prefill speedups while preserving model accuracy and hardware efficiency. Code will be released at https://github.com/verach3n/DASH.git.
Can MLLMs Reason Beyond Language? VisReason: A Comprehensive Benchmark for Vision-Centric Reasoning
Longteng Guo | Yifan Wang | Pengkang Huo | Tailai Chen | Yuze Wu | Jing Liu | Xinxin Zhu
Findings of the Association for Computational Linguistics: ACL 2026
Recent multimodal large language models (MLLMs) achieve strong performance on visual reasoning benchmarks, yet it remains unclear to what extent such performance reflects reasoning directly grounded in visual evidence. We introduce VisReason, a benchmark for vision-centric reasoning in everyday scenarios where perception and inference are tightly coupled. VisReason contains 1,505 questions across 10 categories spanning perceptual, structural, and conceptual reasoning. Our evaluation shows that VisReason poses a qualitatively different challenge from existing benchmarks, exposing substantial gaps between humans and current MLLMs and revealing limited benefits from test-time reasoning strategies. VisReason offers a focused diagnostic for evaluating vision-centric reasoning beyond language.