Finance Language Model Evaluation (FLaME)
Glenn Matlin, Mika Okamoto, Huzaifa Pardawala, Yang Yang, Sudheer Chava
Abstract
Language Models (LMs) have demonstrated impressive capabilities on core Natural Language Processing (NLP) tasks. The effectiveness of LMs for highly specialized, knowledge-intensive tasks in finance remains difficult to assess due to major gaps in the methodologies of existing evaluation frameworks, which have led to an erroneously low estimate of LMs' performance on common Finance NLP (FinNLP) tasks. To demonstrate the potential of LMs for these FinNLP tasks, we present the first holistic benchmarking suite for Financial Language Model Evaluation (FLaME). Ours is the first work to comprehensively compare standard LMs against 'reasoning-reinforced' LMs, with an empirical study of 23 foundation LMs over 20 core NLP tasks in finance. We open-source our framework software along with all data and results.
- Anthology ID: 2025.gem-1.72
- Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
- Month: July
- Year: 2025
- Address: Vienna, Austria and virtual meeting
- Editors: Ofir Arviv, Miruna Clinciu, Kaustubh Dhole, Rotem Dror, Sebastian Gehrmann, Eliya Habba, Itay Itzhak, Simon Mille, Yotam Perlitz, Enrico Santus, João Sedoc, Michal Shmueli Scheuer, Gabriel Stanovsky, Oyvind Tafjord
- Venues: GEM | WS
- Publisher: Association for Computational Linguistics
- Pages: 880–926
- URL: https://preview.aclanthology.org/nschneid-patch-1/2025.gem-1.72/
- Cite (ACL): Glenn Matlin, Mika Okamoto, Huzaifa Pardawala, Yang Yang, and Sudheer Chava. 2025. Finance Language Model Evaluation (FLaME). In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 880–926, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
- Cite (Informal): Finance Language Model Evaluation (FLaME) (Matlin et al., GEM 2025)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2025.gem-1.72.pdf