SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities

Noga BenYoash, Menachem Brief, Oded Ovadia, Gil Shenderovitz, Moshik Mishaeli, Rachel Lemberg, Eitam Sheetrit
Abstract
We introduce SECQUE, a comprehensive benchmark for evaluating large language models (LLMs) in financial analysis tasks. SECQUE comprises 565 expert-written questions covering SEC filings analysis across four key categories: comparison analysis, ratio calculation, risk assessment, and financial insight generation. To assess model performance, we develop SECQUE-Judge, an evaluation mechanism leveraging multiple LLM-based judges, which demonstrates strong alignment with human evaluations. Additionally, we provide an extensive analysis of various models’ performance on our benchmark. By making SECQUE publicly available (https://huggingface.co/datasets/nogabenyoash/SecQue), we aim to facilitate further research and advancements in financial AI.
Anthology ID: 2025.gem-1.16
Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month: July
Year: 2025
Address: Vienna, Austria and virtual meeting
Editors: Kaustubh Dhole, Miruna Clinciu
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 212–230
URL: https://preview.aclanthology.org/transition-to-people-yaml/2025.gem-1.16/
Cite (ACL):
Noga BenYoash, Menachem Brief, Oded Ovadia, Gil Shenderovitz, Moshik Mishaeli, Rachel Lemberg, and Eitam Sheetrit. 2025. SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 212–230, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities (BenYoash et al., GEM 2025)
PDF: https://preview.aclanthology.org/transition-to-people-yaml/2025.gem-1.16.pdf