Ricardo Shirota Filho
2025
SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
Qilong Wu | Xiaoneng Xiang | Huang Hejia | Xuan Wang | Yeo Wei Jie | Ranjan Satapathy | Ricardo Shirota Filho | Bharadwaj Veeravalli
Findings of the Association for Computational Linguistics: NAACL 2025
The rapid growth of the financial sector and the increasing focus on Environmental, Social, and Governance (ESG) considerations have created a pressing need for advanced natural language processing (NLP) tools. Despite recent advancements, there is still a notable absence of open-source Large Language Models (LLMs) that are proficient across both general finance and ESG domains, such as generating ESG reports. To address this gap, we introduce SusGen-30k, a high-quality, category-balanced dataset comprising seven financial NLP tasks. In addition, we propose TCFD-Bench, a benchmark designed to improve the evaluation of sustainability report generation. Our data-centric approach led to the development of a suite of models, SusGen-GPT, trained on the curated dataset. These models were evaluated across six adapted tasks and two off-the-shelf tasks, showing state-of-the-art performance, surpassing all other models except GPT-4. Remarkably, SusGen-GPT achieved an average score only 0.02 below GPT-4, despite using models with only 7-8B parameters compared to the much larger GPT-4. This demonstrates the efficiency of our approach in delivering high performance with significantly fewer resources, addressing existing challenges and fostering further advancements in the financial and ESG research community.
Co-authors
- Huang Hejia 1
- Ranjan Satapathy 1
- Bharadwaj Veeravalli 1
- Xuan Wang 1
- Yeo Wei Jie 1