SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
Noga BenYoash, Menachem Brief, Oded Ovadia, Gil Shenderovitz, Moshik Mishaeli, Rachel Lemberg, Eitam Sheetrit
Abstract
We introduce SECQUE, a comprehensive benchmark for evaluating large language models (LLMs) on financial analysis tasks. SECQUE comprises 565 expert-written questions covering SEC filings analysis across four key categories: comparison analysis, ratio calculation, risk assessment, and financial insight generation. To assess model performance, we develop SECQUE-Judge, an evaluation mechanism leveraging multiple LLM-based judges, which demonstrates strong alignment with human evaluations. Additionally, we provide an extensive analysis of various models' performance on our benchmark. By making SECQUE publicly available (https://huggingface.co/datasets/nogabenyoash/SecQue), we aim to facilitate further research and advancements in financial AI.
- Anthology ID: 2025.gem-1.16
- Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
- Month: July
- Year: 2025
- Address: Vienna, Austria and virtual meeting
- Editors: Kaustubh Dhole, Miruna Clinciu
- Venues: GEM | WS
- Publisher: Association for Computational Linguistics
- Pages: 212–230
- URL: https://preview.aclanthology.org/transition-to-people-yaml/2025.gem-1.16/
- Cite (ACL): Noga BenYoash, Menachem Brief, Oded Ovadia, Gil Shenderovitz, Moshik Mishaeli, Rachel Lemberg, and Eitam Sheetrit. 2025. SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 212–230, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
- Cite (Informal): SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities (BenYoash et al., GEM 2025)
- PDF: https://preview.aclanthology.org/transition-to-people-yaml/2025.gem-1.16.pdf