Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage

Junhao Hu; Fangze Li; Mingtao Xu; Feifan Meng; Shiju Zhao; Tiancheng Hu; Ting Peng; Anmin Liu; Wenrui Huang; Chenxu Liu; Ziyue Hua; Tao Xie

Abstract

Large language models (LLMs) demonstrate strong capabilities across a wide range of complex tasks and are increasingly deployed at scale, placing significant demands on inference efficiency. Prior work typically decomposes inference into prefill and decode stages, with the decode stage dominating total latency. To reduce time and memory complexity in the decode stage, a line of work introduces sparse-attention algorithms. In this paper, we show, both empirically and theoretically, that sparse attention can paradoxically increase end-to-end complexity: information loss often induces significantly longer sequences, a phenomenon we term “Less is Less” (Lil). To mitigate the Lil problem, we propose an early-stopping algorithm that detects the threshold where information loss exceeds information gain during sparse decoding. Our early-stopping algorithm reduces token consumption by up to 90% with a marginal accuracy degradation of less than 2% across reasoning-intensive benchmarks.

Anthology ID: 2026.findings-acl.91
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1898–1912
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.91/
Cite (ACL): Junhao Hu, Fangze Li, Mingtao Xu, Feifan Meng, Shiju Zhao, Tiancheng Hu, Ting Peng, Anmin Liu, Wenrui Huang, Chenxu Liu, Ziyue Hua, and Tao Xie. 2026. Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1898–1912, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage (Hu et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.91.pdf
Checklist: 2026.findings-acl.91.checklist.pdf
