Duplicate-Aware Controlled Code Generation: Enhancing Copyright Protection with Targeted Reordering Beam Search in LLMs

Junbo Fu, Guoshuai Zhao, Linkang Yang, Yunqi Mi, Xueming Qian


Abstract
The increasing integration of large language models (LLMs) in code generation has raised critical copyright concerns, particularly regarding the verbatim repetition of copyrighted code. To address this challenge, we propose a novel task: Duplicate-Aware Controlled Code Generation (DACCG), which aims to mitigate verbatim repetition while preserving the quality of generated code. To this end, we introduce Targeted Reordering Beam Search (TRBS), a plug-and-play decoding method that dynamically reorders beam candidates to reduce direct copying. TRBS leverages the FM-index for efficient substring detection and employs a spike-entropy-based protection mechanism to safeguard structural anchors critical to code coherence. Experimental results on a multi-language code generation benchmark demonstrate that TRBS effectively reduces verbatim repetition while maintaining functional adequacy. Our research represents a pioneering effort in code copyright protection from the model user’s perspective, offering novel insights into responsible code generation practices.
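As a rough illustration of the decoding idea summarized in the abstract, the following Python sketch reorders beam candidates so that those extending a long verbatim match against a protected reference are demoted. It is not the authors' implementation, and everything specific in it is an assumption. A naive token-window scan stands in for the FM-index. The spike entropy formula S(p, z) = Σ p_k / (1 + z·p_k) follows its common definition in the text-watermarking literature. The rule "skip reordering when spike entropy falls below a floor" is only one plausible reading of the protection mechanism. The names `reorder_beams`, `max_copy`, and `entropy_floor` are hypothetical, and a single next-token distribution is shared across all candidates for simplicity.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[List[str], float]  # (tokens so far, cumulative log-probability)


def spike_entropy(probs: Sequence[float], z: float = 1.0) -> float:
    """Spike entropy: sum of p_k / (1 + z * p_k). Low values mean a peaked distribution."""
    return sum(p / (1.0 + z * p) for p in probs)


def longest_copied_suffix(tokens: List[str], corpus: List[str]) -> int:
    """Length of the longest suffix of `tokens` found contiguously in `corpus`.

    Naive scan used as a stand-in for an FM-index lookup (assumption).
    """
    best = 0
    for k in range(1, len(tokens) + 1):
        suffix = tokens[-k:]
        found = any(suffix == corpus[i:i + k] for i in range(len(corpus) - k + 1))
        if not found:
            break  # a longer suffix cannot match if this one does not
        best = k
    return best


def reorder_beams(
    candidates: List[Candidate],
    corpus: List[str],
    step_probs: Sequence[float],
    max_copy: int = 3,           # hypothetical copy-length budget
    entropy_floor: float = 0.6,  # hypothetical protection threshold
    z: float = 1.0,
) -> List[Candidate]:
    """Rank candidates by score, demoting those whose copied suffix exceeds `max_copy`."""
    # Assumed protection rule: a peaked distribution suggests a structural
    # anchor, so keep the plain score ordering for this step.
    if spike_entropy(step_probs, z) < entropy_floor:
        return sorted(candidates, key=lambda c: -c[1])

    def rank(c: Candidate) -> Tuple[bool, float]:
        tokens, log_prob = c
        over_budget = longest_copied_suffix(tokens, corpus) > max_copy
        return (over_budget, -log_prob)  # False sorts before True

    return sorted(candidates, key=rank)


protected = ["def", "add", "(", "a", ",", "b", ")", ":"]
copying = (["def", "add", "(", "a"], -0.1)
novel = (["def", "add", "(", "x"], -0.5)

flat = [0.25, 0.25, 0.25, 0.25]    # uncertain step: reordering applies
peaked = [0.97, 0.01, 0.01, 0.01]  # confident step: ordering is protected

print(reorder_beams([copying, novel], protected, flat)[0][0])
print(reorder_beams([copying, novel], protected, peaked)[0][0])
```

In the first call the copying candidate is demoted despite its higher score. In the second, the peaked distribution triggers the assumed protection rule and the original score ordering is kept.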
Anthology ID:
2026.findings-acl.280
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5695–5707
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.280/
Cite (ACL):
Junbo Fu, Guoshuai Zhao, Linkang Yang, Yunqi Mi, and Xueming Qian. 2026. Duplicate-Aware Controlled Code Generation: Enhancing Copyright Protection with Targeted Reordering Beam Search in LLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 5695–5707, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Duplicate-Aware Controlled Code Generation: Enhancing Copyright Protection with Targeted Reordering Beam Search in LLMs (Fu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.280.pdf
Checklist:
 2026.findings-acl.280.checklist.pdf