From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion

Yanli Wang, Yanlin Wang, Bowen Zhang, Yiwei Zhang, Daya Guo, Jiachi Chen, Hongyu Zhang, Zibin Zheng


Abstract
Code Large Language Models face critical Time-To-First-Token (TTFT) latency challenges when handling long code completion due to the quadratic complexity ($O(n^2)$) of attention mechanisms. While existing sparse attention methods attempt to address this issue, they suffer from three key limitations: (1) general sparse patterns cause excessive accuracy degradation without considering code structure, (2) code-specific methods achieve only logical sparsity without actual computational speedup, and (3) limited adaptation to complex scenarios such as repository-level completion. We propose **SabreCoder**, a training-free **S**tructure-**a**ware **b**lock-spa**r**s**e** attention mechanism that bridges the gap between logical and computational sparsity. SabreCoder parses code into semantic chunks, constructs chunk-level sparse patterns through dependency analysis and similarity matching, and maps them to GPU-friendly block-sparse formats. Extensive experiments on LCC and CrossCodeEval benchmarks demonstrate that SabreCoder reduces TTFT by 45-55% while maintaining accuracy within 3% of dense attention.
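To make the last step of the abstract concrete, the following is a minimal Python sketch of how a chunk-level sparse pattern could be projected onto fixed-size attention blocks. It is an illustration under stated assumptions, not SabreCoder's actual implementation: the function names, the `chunk_spans` and `chunk_links` inputs, the rule that every chunk attends to itself, and the causal filter are all hypothetical choices made for this example. The dependency analysis and similarity matching that would produce `chunk_links` are not modeled.

```python
from typing import List, Set, Tuple


def chunk_to_block_mask(
    chunk_spans: List[Tuple[int, int]],  # hypothetical input: [start, end) token span per chunk
    chunk_links: Set[Tuple[int, int]],   # hypothetical input: (query_chunk, key_chunk) pairs to keep
    seq_len: int,
    block_size: int,
) -> List[List[bool]]:
    """Project a chunk-level attention pattern onto a causal block-level mask.

    mask[qb][kb] is True when query block qb must compute attention over
    key block kb. A block is kept if it overlaps any linked chunk pair, so
    the result is a superset of the chunk-level pattern.
    """
    n_blocks = (seq_len + block_size - 1) // block_size

    def blocks_of(span: Tuple[int, int]) -> range:
        # Indices of all blocks that overlap the half-open token span.
        start, end = span
        return range(start // block_size, (end - 1) // block_size + 1)

    mask = [[False] * n_blocks for _ in range(n_blocks)]

    # Assumption: every chunk always attends to itself.
    links = set(chunk_links) | {(c, c) for c in range(len(chunk_spans))}

    for q_chunk, k_chunk in links:
        for qb in blocks_of(chunk_spans[q_chunk]):
            for kb in blocks_of(chunk_spans[k_chunk]):
                if kb <= qb:  # keep only causal (lower-triangular) blocks
                    mask[qb][kb] = True
    return mask


def block_density(mask: List[List[bool]]) -> float:
    """Fraction of causal blocks that remain active after sparsification."""
    n = len(mask)
    active = sum(sum(row) for row in mask)
    return active / (n * (n + 1) // 2)
```

For instance, with four chunks of four tokens each, `block_size=4`, and `chunk_links={(3, 0), (2, 1)}`, six of the ten causal blocks stay active, so `block_density` returns 0.6. When chunk boundaries do not align with block boundaries, a block is kept if it overlaps any linked chunk, which trades some sparsity for a layout that block-sparse GPU kernels can skip at block granularity.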
Anthology ID:
2026.acl-long.1189
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
25927–25942
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1189/
Cite (ACL):
Yanli Wang, Yanlin Wang, Bowen Zhang, Yiwei Zhang, Daya Guo, Jiachi Chen, Hongyu Zhang, and Zibin Zheng. 2026. From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25927–25942, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion (Wang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1189.pdf
Checklist:
2026.acl-long.1189.checklist.pdf