SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities

Noga BenYoash | Menachem Brief | Oded Ovadia | Gil Shenderovitz | Moshik Mishaeli | Rachel Lemberg | Eitam Sheetrit

Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), 2025
We introduce SECQUE, a comprehensive benchmark for evaluating large language models (LLMs) in financial analysis tasks. SECQUE comprises 565 expert-written questions covering SEC filings analysis across four key categories: comparison analysis, ratio calculation, risk assessment, and financial insight generation. To assess model performance, we develop SECQUE-Judge, an evaluation mechanism leveraging multiple LLM-based judges, which demonstrates strong alignment with human evaluations. Additionally, we provide an extensive analysis of various models’ performance on our benchmark. By making SECQUE publicly available (https://huggingface.co/datasets/nogabenyoash/SecQue), we aim to facilitate further research and advancements in financial AI.