Benchmarking Arabic Authorship Attribution and Style Transfer with Large Language Models

Injy Hamed, Bashar Alhafni, Nizar Habash, Thamar Solorio


Abstract
Writing style is a fundamental component of natural language. However, significant research gaps remain in two key style-centric tasks: authorship attribution (AA) and authorship style transfer, particularly for Arabic. In this work, we revisit both tasks in that context. We introduce a new AA dataset comprising texts in Modern Standard and Dialectal Arabic. We train transformer-based AA models using dual cross-entropy and contrastive learning loss objectives, and validate model performance through human evaluation. We then utilize the trained AA model to benchmark a range of large language models (LLMs) on style recognition and generation tasks, providing new insights into their capabilities in modeling Arabic writing styles. Our work reveals limitations of current models and provides resources to advance research in this direction.
Anthology ID:
2026.lrec-main.576
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resources Association
Pages:
7262–7278
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.576/
Cite (ACL):
Injy Hamed, Bashar Alhafni, Nizar Habash, and Thamar Solorio. 2026. Benchmarking Arabic Authorship Attribution and Style Transfer with Large Language Models. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 7262–7278, Palma de Mallorca, Spain. ELRA Language Resources Association.
Cite (Informal):
Benchmarking Arabic Authorship Attribution and Style Transfer with Large Language Models (Hamed et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.576.pdf