Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs

Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki M Asano


Abstract
Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing significant improvements in performance on commonsense reasoning, arithmetic, and language understanding tasks. Furthermore, extensive ablation studies validate the role of each component of the method, and demonstrate that Bitune is compatible with various parameter-efficient finetuning techniques and full model finetuning.
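The abstract only states that Bitune "incorporates bidirectional attention into prompt processing." As a rough illustration of that idea (not the paper's actual method, which involves components the abstract does not detail), the sketch below builds an attention mask that is bidirectional over prompt positions and causal over the remaining (generated) positions; all names and the two-argument interface are assumptions for illustration.

```python
import torch

def prompt_bidirectional_mask(prompt_len: int, total_len: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend).

    Illustrative sketch only: prompt tokens attend to all other prompt
    tokens in both directions, while every other position keeps the
    standard causal (lower-triangular) pattern.
    """
    # Standard causal mask: position i attends to positions 0..i.
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    # Allow bidirectional attention within the prompt block.
    mask[:prompt_len, :prompt_len] = True
    return mask

# Example: a 4-token prompt followed by 3 generated tokens.
print(prompt_bidirectional_mask(prompt_len=4, total_len=7).int())
```

Such a mask could be passed to a decoder-only model in place of its default causal mask when encoding the prompt; how Bitune combines the resulting bidirectional features with the causal ones is described in the paper itself, not in this abstract.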
Anthology ID:
2025.emnlp-main.481
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9521–9547
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.481/
Cite (ACL):
Dawid Jan Kopiczko, Tijmen Blankevoort, and Yuki M Asano. 2025. Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 9521–9547, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs (Kopiczko et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.481.pdf
Checklist:
2025.emnlp-main.481.checklist.pdf