Phong Tuan Trinh


2025

FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis
Hung Quang Nguyen | Anh Phuong Trinh | Hung Phan Quoc Mai | Phong Tuan Trinh
Proceedings of the 18th International Natural Language Generation Conference

Despite recent advances in LLMs, Text2SQL remains challenging for complex queries in specialized domains such as finance, where database designs and reporting standards vary widely. We introduce FinStat2SQL, a lightweight pipeline that enables natural language queries over financial statements and is tailored to local standards such as VAS (Vietnamese Accounting Standards). Our multi-agent setup combines large and small language models for entity extraction, SQL generation, and self-correction, and includes a fully automatic pipeline for synthetic data generation. Using this synthetic data, we fine-tuned a 7B model that achieves 61.33% accuracy with sub-4s latency on consumer hardware, outperforming GPT-4o-mini on SQL generation. FinStat2SQL provides a scalable, cost-efficient solution for financial analysis. Our source code is publicly available at https://github.com/hung20gg/chatbot_financial_statement.
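
The generate, execute, and self-correct loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `income_statement` table, and the stub generator standing in for a language model are all hypothetical, and an in-memory SQLite database is assumed purely for illustration.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical generator signature: (question, previous_error) -> SQL string.
# In a real pipeline this would call a language model; here it is a stub.
SqlGenerator = Callable[[str, Optional[str]], str]


def answer_with_self_correction(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: SqlGenerator,
    max_attempts: int = 3,
) -> list:
    """Generate SQL, run it, and feed any database error back to the generator."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the error message so the next attempt can correct itself.
            error = str(exc)
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


def stub_generator(question: str, previous_error: Optional[str]) -> str:
    """Stand-in for a model: the first attempt misspells a column name."""
    if previous_error is None:
        return "SELECT revenu FROM income_statement WHERE company = 'ABC'"
    return "SELECT revenue FROM income_statement WHERE company = 'ABC'"


# Toy schema and data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income_statement (company TEXT, year INTEGER, revenue REAL)"
)
conn.execute("INSERT INTO income_statement VALUES ('ABC', 2023, 125.0)")

rows = answer_with_self_correction(conn, "What was ABC's revenue?", stub_generator)
print(rows)
```

In this sketch the first query fails on the misspelled column, the error text is passed back to the generator, and the second attempt succeeds. Capping the number of attempts keeps latency bounded, which matters for the sub-4s target the abstract reports.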