KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration

Yajing Yang, Tony Deng, Min-Yen Kan


Abstract
We propose KAHAN, a knowledge-augmented hierarchical framework that systematically extracts insights from raw tabular data at entity, pairwise, group, and system levels. KAHAN uniquely leverages LLMs as domain experts to drive the analysis. On the DataTales financial reporting benchmark, KAHAN outperforms existing approaches by over 20% on narrative quality (GPT-4o), maintains 98.2% factuality, and demonstrates practical utility in human evaluation. Our results reveal that knowledge quality drives model performance through distillation, hierarchical analysis benefits vary with market complexity, and the framework transfers effectively to healthcare domains. The data and code are available at https://github.com/yajingyang/kahan.
Anthology ID: 2025.findings-emnlp.1405
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 25761–25785
URL: https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.1405/
DOI: 10.18653/v1/2025.findings-emnlp.1405
Cite (ACL): Yajing Yang, Tony Deng, and Min-Yen Kan. 2025. KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25761–25785, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration (Yang et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.1405.pdf
Checklist: 2025.findings-emnlp.1405.checklist.pdf