Bridging the Data Gap in Financial Sentiment: LLM-Driven Augmentation

Rohit Kumar, Chandan Nolbaria


Abstract
Static and outdated datasets hinder the accuracy of Financial Sentiment Analysis (FSA) in capturing rapidly evolving market sentiment. We tackle this by proposing a novel data augmentation technique using Retrieval Augmented Generation (RAG). Our method leverages a generative LLM to infuse established benchmarks with up-to-date contextual information from contemporary financial news. This RAG-based augmentation significantly modernizes the data’s alignment with current financial language. Furthermore, a robust BERT-BiGRU judge model verifies that the sentiment of the original annotations is faithfully preserved, ensuring the generation of high-quality, temporally relevant, and sentiment-consistent data suitable for advancing FSA model development.
Anthology ID:
2025.acl-srw.98
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Jin Zhao, Mingyang Wang, Zhu Liu
Venues:
ACL | WS
Publisher:
Association for Computational Linguistics
Pages:
1246–1254
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-srw.98/
Cite (ACL):
Rohit Kumar and Chandan Nolbaria. 2025. Bridging the Data Gap in Financial Sentiment: LLM-Driven Augmentation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 1246–1254, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Bridging the Data Gap in Financial Sentiment: LLM-Driven Augmentation (Kumar & Nolbaria, ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-srw.98.pdf