FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering

Yitao Long, Tiansheng Hu, Yilun Zhao, Arman Cohan, Chen Zhao


Abstract
Large Language Models (LLMs) frequently hallucinate when answering long-form questions, producing plausible yet factually incorrect answers. A common mitigation strategy is to provide attribution to LLM outputs. However, existing benchmarks primarily focus on simple attribution that retrieves supporting textual evidence as references. We argue that in real-world scenarios such as financial applications, attribution goes beyond reference retrieval. We introduce FinLFQA, a benchmark designed to evaluate the ability of LLMs to generate long-form answers to complex financial questions with reliable and nuanced attributions. FinLFQA evaluates three critical aspects of attribution through human annotations: (1) supporting evidence extracted from financial reports, (2) intermediate numerical reasoning steps, and (3) domain-specific financial knowledge that informs the reasoning process. We further provide an automatic evaluation framework covering both answer quality and attribution quality. Through extensive experiments on eight LLMs across multiple attribution-generation paradigms, we find that fine-grained metrics are important to distinguish model capabilities, that end-to-end generation achieves comparable performance to post-hoc approaches, and that iterative refinement only helps when guided by external feedback.
Anthology ID:
2025.findings-emnlp.908
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
16730–16750
Language:
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.908/
DOI:
10.18653/v1/2025.findings-emnlp.908
Bibkey:
Cite (ACL):
Yitao Long, Tiansheng Hu, Yilun Zhao, Arman Cohan, and Chen Zhao. 2025. FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 16730–16750, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering (Long et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.908.pdf
Checklist:
2025.findings-emnlp.908.checklist.pdf