SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning

Rania Elbadry; Sarfraz Ahmad; Ahmed Heakl; Dani Bouch; Momina Ahsan; Muhra AlMahri; Marwa Elsaid Khalil; Yuxia Wang; Salem Lahlou; Sophia Ananiadou; Veselin Stoyanov; Jimin Huang; Xueqing Peng; Preslav Nakov; Zhuohan Xie

Abstract

English financial NLP has progressed rapidly through benchmarks for sentiment, document understanding, and financial question answering, while Arabic financial NLP remains comparatively under-explored despite strong practical demand for trustworthy finance and Islamic-finance assistants. We introduce SAHM, a document-grounded benchmark and instruction-tuning dataset for Arabic financial NLP and Shari’ah-compliant reasoning. SAHM contains 14,380 expert-verified instances spanning seven tasks: AAOIFI standards QA, fatwa-based QA/MCQ, accounting and business exams, financial sentiment analysis, extractive summarization, and event–cause reasoning, curated from authentic regulatory, juristic, and corporate sources. We evaluate 19 strong open and proprietary LLMs using task-specific metrics and rubric-based scoring for open-ended outputs, and find that Arabic fluency does not reliably translate to evidence-grounded financial reasoning: models are substantially stronger on recognition-style tasks than on generation and causal reasoning, with the largest gaps on event–cause reasoning. We release the benchmark, evaluation framework, and an instruction-tuned model to support future research on trustworthy Arabic financial NLP.

Anthology ID: 2026.acl-long.1593
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 34509–34536
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1593/
Cite (ACL): Rania Elbadry, Sarfraz Ahmad, Ahmed Heakl, Dani Bouch, Momina Ahsan, Muhra AlMahri, Marwa Elsaid Khalil, Yuxia Wang, Salem Lahlou, Sophia Ananiadou, Veselin Stoyanov, Jimin Huang, Xueqing Peng, Preslav Nakov, and Zhuohan Xie. 2026. SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 34509–34536, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SAHM: A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning (Elbadry et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1593.pdf
Checklist: 2026.acl-long.1593.checklist.pdf
