Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study

Hawau Olamide Toyin, Samar Mohamed Magdy, Hanan Aldarmaki


Abstract
We investigate the effectiveness of large language models (LLMs) for text diacritization in two typologically distinct languages: Arabic and Yoruba. To enable a rigorous evaluation, we introduce a novel multilingual dataset, MultiDiac, with diverse samples that capture a range of diacritic ambiguities. We evaluate 12 LLMs varying in size, accessibility, and language coverage, and benchmark them against four specialized diacritization models. Additionally, we fine-tune four small open-source models using LoRA for Yoruba. Our results show that many off-the-shelf LLMs outperform specialized diacritization models for both Arabic and Yoruba, but smaller models suffer from hallucinations. We find that fine-tuning on a small dataset can improve diacritization performance and reduce hallucination rates for Yoruba.
Anthology ID:
2026.lrec-main.40
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resources Association
Note:
Pages:
580–589
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.40/
Cite (ACL):
Hawau Olamide Toyin, Samar Mohamed Magdy, and Hanan Aldarmaki. 2026. Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 580–589, Palma de Mallorca, Spain. ELRA Language Resources Association.
Cite (Informal):
Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study (Toyin et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.40.pdf