AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks
Mohamed Eltahir, Osamah Sarraj, Abdulrahman M. Alfrihidi, Taha Alshatiri, Mohammed Khurd, Mohammed Bremoo, Tanveer Hussain
Abstract
Video-to-text and text-to-video retrieval are dominated by English benchmarks (e.g., DiDeMo, MSR-VTT) and recent multilingual corpora (e.g., RUDDER), yet Arabic remains underserved, lacking localized evaluation metrics. We introduce AutoArabic, a three-stage framework that uses state-of-the-art large language models (LLMs) to translate non-Arabic benchmarks into Modern Standard Arabic, reducing the manual revision required nearly fourfold. The framework incorporates an error detection module that automatically flags potential translation errors with 97% accuracy. Applying the framework to DiDeMo, a video retrieval benchmark, produces DiDeMo-AR, an Arabic variant with 40,144 fluent Arabic descriptions. We analyze the translation errors and organize them into a taxonomy to guide future Arabic localization efforts. We train a CLIP-style baseline with identical hyperparameters on the Arabic and English variants of the benchmark and find a moderate performance gap (Δ ≈ 3 pp at Recall@1), indicating that Arabic localization preserves benchmark difficulty. We evaluate three post-editing budgets (zero, flagged-only, and full) and find that performance improves monotonically with more post-editing, while the raw LLM output (zero budget) remains usable. To make the framework reproducible for other languages, we have made the code available at https://github.com/Tahaalshatiri/AutoArabic.
- Anthology ID:
- 2025.arabicnlp-main.23
- Volume:
- Proceedings of The Third Arabic Natural Language Processing Conference
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
- Venue:
- ArabicNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 288–297
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.arabicnlp-main.23/
- DOI:
- 10.18653/v1/2025.arabicnlp-main.23
- Cite (ACL):
- Mohamed Eltahir, Osamah Sarraj, Abdulrahman M. Alfrihidi, Taha Alshatiri, Mohammed Khurd, Mohammed Bremoo, and Tanveer Hussain. 2025. AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 288–297, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks (Eltahir et al., ArabicNLP 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.arabicnlp-main.23.pdf