Chandan Nolbaria


2025

pdf bib
Bridging the Data Gap in Financial Sentiment: LLM-Driven Augmentation
Rohit Kumar | Chandan Nolbaria
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Static and outdated datasets hinder the accuracy of Financial Sentiment Analysis (FSA) in capturing rapidly evolving market sentiment. We tackle this by proposing a novel data augmentation technique using Retrieval Augmented Generation (RAG). Our method leverages a generative LLM to infuse established benchmarks with up-to-date contextual information from contemporary financial news. This RAG-based augmentation significantly modernizes the data’s alignment with current financial language. Furthermore, a robust BERT-BiGRU judge model verifies that the sentiment of the original annotations is faithfully preserved, ensuring the generation of high-quality, temporally relevant, and sentiment-consistent data suitable for advancing FSA model development.