Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach
Chetan Harsha, Karmvir Singh Phogat, Sridhar Dasaratha, Shashishekar Ramakrishna
Abstract
Answering complex questions that require numerical reasoning over financial documents is challenging due to the diverse and scattered nature of relevant information. While large language models (LLMs) excel at financial reasoning, their enterprise deployment is often limited by cost and latency. Small language models (SLMs) present a cost-effective alternative but need to be fine-tuned with high-quality, domain-specific question-answer (QA) data. Acquiring such data requires manual expert annotation, which is a bottleneck to the wider application of SLMs. This work introduces a modular, scalable, end-to-end agentic pipeline that extracts and selects relevant content from unstructured financial documents and then generates QA pairs from the selected content for SLM fine-tuning. Compared with the same models trained on previously available, manually created data for this task, one of the models trained on our pipeline-produced synthetic data achieved competitive in-distribution performance, and all tested models demonstrated superior generalization. The framework thus shows considerable potential to accelerate the deployment of smaller, cost-effective models by reducing manual data-creation effort.
- Anthology ID:
- 2026.eacl-industry.51
- Volume:
- Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 669–687
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.51/
- Cite (ACL):
- Chetan Harsha, Karmvir Singh Phogat, Sridhar Dasaratha, and Shashishekar Ramakrishna. 2026. Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 669–687, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach (Harsha et al., EACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.51.pdf