Dawid Jan Kopiczko
2025
Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
Dawid Jan Kopiczko
|
Tijmen Blankevoort
|
Yuki M Asano
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing significant improvements in performance on commonsense reasoning, arithmetic, and language understanding tasks. Furthermore, extensive ablation studies validate the role of each component of the method, and demonstrate that Bitune is compatible with various parameter-efficient finetuning techniques and full model finetuning.