Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization

Passant Elchafei; Amany Fashwan

Abstract

Generating summaries from chart metadata in Arabic presents unique challenges at the intersection of cross-lingual transfer and data-to-text generation. Chart-to-text benchmarks have advanced English-language research, yet Arabic still lacks a comparable resource, which reflects its continued underrepresentation in NLP. To address this gap, we construct the first Arabic ChartSumm benchmark by translating chart metadata and reference summaries from English into Modern Standard Arabic (MSA). Two translation systems with contrasting designs are employed: NLLB-200-distilled-600M, a dedicated multilingual machine translation model designed for low-resource coverage, and Qwen2.5-1.5B, an open large language model with general multilingual capabilities. A central contribution of this work is a translation quality evaluation that systematically assesses both systems using BLEU, chrF, and COMET_ref against a Google Translate Arabic pivot, together with the reference-free COMET_QE. Results show that NLLB achieves markedly higher lexical and semantic fidelity than Qwen2.5. Building on this foundation, we fine-tune two models, mT5 (multilingual) and CAMeL-Lab's AraBART (Arabic-specific), to generate Arabic summaries from structured chart metadata. Experimental results show that AraBART trained on NLLB translations outperforms the other configurations, achieving ROUGE-L = 63.8 and BLEU = 33.1. These results highlight the strong dependency of downstream summarization quality on translation accuracy and demonstrate AraBART's superior capacity for Arabic generation.
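The translation evaluation above scores each system's Arabic output against a Google Translate pivot with chrF, among other metrics. As a rough illustration of what chrF measures, the sketch below computes a simplified sentence-level chrF: character n-gram precision and recall (orders 1 to 6, whitespace ignored) averaged across orders and combined with an F-beta score, where beta = 2 weights recall more heavily. This is a hypothetical helper, not the authors' evaluation code, and it omits the smoothing and corpus-level aggregation of standard toolkits such as sacrebleu, so its scores will not exactly match reported chrF values. The example sentences are placeholders, not data from the benchmark.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (hypothetical helper).

    Precision and recall are computed per n-gram order (1..max_order), averaged
    over the orders both strings are long enough for, then combined via F-beta.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Placeholder sentences: a pivot translation and two candidate system outputs.
    pivot = "ارتفعت المبيعات في عام 2020"
    candidates = {
        "system_a": "ارتفعت المبيعات في عام 2020",
        "system_b": "زادت المبيعات خلال 2020",
    }
    for name, hyp in candidates.items():
        print(name, round(chrf(hyp, pivot), 1))
```

Because chrF works on characters rather than whole words, it gives partial credit for shared stems and affixes, which matters for a morphologically rich language such as Arabic.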

Anthology ID:: 2026.lrec-main.819
Volume:: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:: May
Year:: 2026
Address:: Palma de Mallorca, Spain
Editors:: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:: LREC
Publisher:: ELRA Language Resource Association
Pages:: 10447–10456
URL:: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.819/
Cite (ACL):: Passant Elchafei and Amany Fashwan. 2026. Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization. International Conference on Language Resources and Evaluation, main:10447–10456.
Cite (Informal):: Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization (Elchafei & Fashwan, LREC 2026)
PDF:: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.819.pdf
Optional supplementary material:: 2026.lrec-main.819.OptionalSupplementaryMaterial.pdf