Eitam Sheetrit
2025

SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
Noga BenYoash | Menachem Brief | Oded Ovadia | Gil Shenderovitz | Moshik Mishaeli | Rachel Lemberg | Eitam Sheetrit
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

We introduce SECQUE, a comprehensive benchmark for evaluating large language models (LLMs) on financial analysis tasks. SECQUE comprises 565 expert-written questions covering the analysis of SEC filings across four key categories: comparison analysis, ratio calculation, risk assessment, and financial insight generation. To assess model performance, we develop SECQUE-Judge, an evaluation mechanism leveraging multiple LLM-based judges, which demonstrates strong alignment with human evaluations. Additionally, we provide an extensive analysis of various models' performance on our benchmark. By making SECQUE publicly available (https://huggingface.co/datasets/nogabenyoash/SecQue), we aim to facilitate further research and advancements in financial AI.
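
Since the dataset is released on the Hugging Face Hub, it can be loaded with the standard datasets library. The sketch below is illustrative only (not from the paper); the split and column names are assumptions and should be checked by inspecting the loaded dataset.

```python
# Minimal sketch: load the publicly released SECQUE benchmark.
# Assumes the `datasets` library is installed (pip install datasets).
from datasets import load_dataset

# Repository ID taken from the paper's release URL.
ds = load_dataset("nogabenyoash/SecQue")

# Inspect the available splits and features before use;
# the exact schema is not specified here.
print(ds)
```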