L-CiteEval: A Suite for Evaluating Fidelity of Long-context Models

Zecheng Tang (汤泽成); Keyan Zhou; Juntao Li; Baibei Ji; Jianye Hou; Min Zhang (张民)

Abstract

Long-context models (LCMs) have witnessed remarkable advancements in recent years, facilitating real-world tasks like long-document QA. The success of LCMs is founded on the hypothesis that the model demonstrates strong fidelity, enabling it to respond based on the provided long context rather than relying solely on the intrinsic knowledge acquired during pre-training. Yet, in this paper, we find that open-source LCMs are not as faithful as expected. We introduce L-CiteEval, an out-of-the-box suite that can assess both generation quality and fidelity in long-context understanding tasks. It covers 11 tasks with context lengths ranging from 8K to 48K and includes a corresponding automatic evaluation pipeline. Evaluation of 11 cutting-edge closed-source and open-source LCMs indicates that, while the models differ only slightly in generation quality, open-source models lag significantly behind their closed-source counterparts in fidelity. Furthermore, we analyze the benefits of citation generation for LCMs from the perspective of both explicit model output and the internal attention mechanism.

Anthology ID: 2025.acl-long.263
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 5254–5277
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.263/
Cite (ACL): Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, and Min Zhang. 2025. L-CiteEval: A Suite for Evaluating Fidelity of Long-context Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5254–5277, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): L-CiteEval: A Suite for Evaluating Fidelity of Long-context Models (Tang et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.263.pdf