Abstract
The retrieval-augmented generation framework addresses the limitations of large language models by enabling real-time knowledge updates for more accurate answers. An efficient approach to training retrieval-augmented models is attention distillation, which uses attention scores as supervision signals instead of manually annotated query-document pairs. Despite its growing popularity, the detailed mechanisms behind the success of attention distillation remain unexplored, particularly the specific patterns it leverages to benefit training. In this paper, we address this gap by conducting a comprehensive investigation of the attention distillation workflow and identifying key factors that influence the learning performance of retrieval-augmented language models. We further propose several insightful indicators for optimizing models’ training methods and avoiding ineffective training.
- Anthology ID:
- 2024.naacl-short.65
- Volume:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 745–754
- URL:
- https://aclanthology.org/2024.naacl-short.65
- Cite (ACL):
- Zizhong Li, Haopeng Zhang, and Jiawei Zhang. 2024. Unveiling the Magic: Investigating Attention Distillation in Retrieval-Augmented Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 745–754, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Unveiling the Magic: Investigating Attention Distillation in Retrieval-Augmented Generation (Li et al., NAACL 2024)
- PDF:
- https://preview.aclanthology.org/ingestion-checklist/2024.naacl-short.65.pdf