From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion

Yanli Wang; Yanlin Wang; Bowen Zhang; Yiwei Zhang; Daya Guo; Jiachi Chen; Hongyu Zhang; Zibin Zheng

Abstract

Code Large Language Models face critical Time-To-First-Token (TTFT) latency challenges when handling long code completion due to the quadratic complexity (O(n²)) of attention mechanisms. While existing sparse attention methods attempt to address this issue, they suffer from three key limitations: (1) general sparse patterns cause excessive accuracy degradation without considering code structure, (2) code-specific methods achieve only logical sparsity without actual computational speedup, and (3) limited adaptation to complex scenarios such as repository-level completion. We propose **SabreCoder**, a training-free **S**tructure-**a**ware **b**lock-spa**r**s**e** attention mechanism that bridges the gap between logical and computational sparsity. SabreCoder parses code into semantic chunks, constructs chunk-level sparse patterns through dependency analysis and similarity matching, and maps them to GPU-friendly block-sparse formats. Extensive experiments on LCC and CrossCodeEval benchmarks demonstrate that SabreCoder reduces TTFT by 45-55% while maintaining accuracy within 3% of dense attention.
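The last step in the abstract, mapping a chunk-level sparse pattern onto a GPU-friendly block-sparse format, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the dependency analysis and similarity matching have already produced a boolean chunk-to-chunk matrix (`chunk_attend`), and the function name, arguments, and block size are all hypothetical. A block is kept whenever any token pair inside it is allowed by the chunk pattern and by causality, so the block mask is a conservative cover of the logical pattern.

```python
import numpy as np

def chunk_mask_to_block_mask(chunk_ends, chunk_attend, block_size):
    """Coarsen a chunk-level attention pattern into a block-level mask.

    chunk_ends:   exclusive end token index of each chunk, ascending (assumed input).
    chunk_attend: chunk_attend[q][k] is truthy if query chunk q may attend to key chunk k.
    block_size:   tile size of the (hypothetical) block-sparse kernel.
    Returns a boolean array of shape [n_blocks, n_blocks].
    """
    chunk_attend = np.asarray(chunk_attend, dtype=bool)
    n_tokens = chunk_ends[-1]
    n_blocks = -(-n_tokens // block_size)  # ceiling division

    # Chunk id of every token position.
    token_chunk = np.searchsorted(chunk_ends, np.arange(n_tokens), side="right")

    # Token-level logical mask: chunk pattern restricted to causal positions.
    token_mask = chunk_attend[token_chunk[:, None], token_chunk[None, :]]
    token_mask &= np.tril(np.ones((n_tokens, n_tokens), dtype=bool))

    # Pad to a multiple of block_size so the mask tiles evenly.
    padded = n_blocks * block_size
    full = np.zeros((padded, padded), dtype=bool)
    full[:n_tokens, :n_tokens] = token_mask

    # Keep a block if any token pair inside it is active.
    tiles = full.reshape(n_blocks, block_size, n_blocks, block_size)
    return tiles.any(axis=(1, 3))

# Toy example: three 4-token chunks; chunk 2 depends on chunk 0 but not on chunk 1.
block_mask = chunk_mask_to_block_mask(
    chunk_ends=[4, 8, 12],
    chunk_attend=[[1, 0, 0], [0, 1, 0], [1, 0, 1]],
    block_size=4,
)
print(block_mask.astype(int))
print("kept blocks:", int(block_mask.sum()), "of", block_mask.size)
```

Only the kept blocks would need to be computed by a block-sparse kernel. When chunk boundaries do not align with block boundaries, a block that straddles two chunks is kept if either chunk requires it, which trades some sparsity for hardware-friendly tiling.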

Anthology ID: 2026.acl-long.1189
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 25927–25942
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1189/
Cite (ACL): Yanli Wang, Yanlin Wang, Bowen Zhang, Yiwei Zhang, Daya Guo, Jiachi Chen, Hongyu Zhang, and Zibin Zheng. 2026. From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25927–25942, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): From Logical to Computational Sparsity: Structure-Aware Block-Sparse Attention for Long-Code Completion (Wang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1189.pdf
Checklist: 2026.acl-long.1189.checklist.pdf
