CUB: Benchmarking Context Utilisation Techniques for Language Models
Lovisa Hagstr\"om, Youna Kim, Haeun Yu, Sang-goo Lee, Richard Johansson, Hyunsoo Cho, Isabelle Augenstein
Abstract
Incorporating external knowledge is crucial for knowledge-intensive tasks, such as question answering and fact checking. However, language models (LMs) may ignore relevant information that contradicts outdated parametric memory or be distracted by irrelevant contexts. While many context utilisation manipulation techniques (CMTs) have recently been proposed to alleviate these issues, few have seen systematic comparison. In this paper, we develop CUB (Context Utilisation Benchmark) - the first comprehensive benchmark designed to help diagnose CMTs under diverse noisy context conditions within retrieval-augmented generation (RAG). With this benchmark, we conduct the most extensive evaluation to date of seven state-of-the-art methods, representative of the main categories of CMTs, across three diverse datasets and tasks, applied to 11 LMs. Our findings expose critical gaps in current CMT evaluation practices, demonstrating the need for holistic testing. We reveal that most existing CMTs struggle to handle the full spectrum of context types encountered in real-world RAG scenarios. We also find that many CMTs display inflated performance on simple synthesised datasets, compared to more realistic datasets with naturally occurring samples.- Anthology ID:
- 2026.acl-long.1151
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 25101–25133
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1151/
- DOI:
- Cite (ACL):
- Lovisa Hagstr\"om, Youna Kim, Haeun Yu, Sang-goo Lee, Richard Johansson, Hyunsoo Cho, and Isabelle Augenstein. 2026. CUB: Benchmarking Context Utilisation Techniques for Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25101–25133, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- CUB: Benchmarking Context Utilisation Techniques for Language Models (Hagstr"om et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1151.pdf