Amany Fashwan
2026
Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization
Passant Elchafei | Amany Fashwan
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Generating summaries from chart metadata in Arabic presents unique challenges at the intersection of cross-lingual transfer and data-to-text generation. Chart-to-text benchmarks have advanced English-language research, yet Arabic remains without a comparable resource, underscoring its continued underrepresentation in NLP. To address this gap, we construct the first Arabic ChartSumm benchmark by translating chart metadata and reference summaries from English into Modern Standard Arabic (MSA). Two high-quality translation models with contrasting architectures are employed: NLLB-200-distilled-600M, designed for low-resource coverage, and Qwen2.5-1.5B, an open large language model with general multilingual capabilities. A central contribution of this work is a translation quality evaluation that systematically assesses both systems using BLEU, chrF, COMET_ref, and COMET_QE metrics against a Google Translate Arabic pivot. Results demonstrate that NLLB achieves markedly higher lexical and semantic fidelity. Building on this foundation, we fine-tune two models, mT5 (multilingual) and CAMeL-Lab’s AraBART (Arabic-specific), to generate Arabic summaries from structured chart metadata. Experimental results show that AraBART trained on NLLB translations outperforms the other configurations, achieving ROUGE-L = 63.8 and BLEU = 33.1. This highlights the strong dependence of downstream summarization quality on translation accuracy and demonstrates AraBART’s superior capacity for Arabic generation.
2025
VLCAP at ImageEval 2025 Shared Task: Multimodal Arabic Captioning with Interpretable Visual Concept Integration
Passant Elchafei | Amany Fashwan
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
2022
Developing a Tag-Set and Extracting the Morphological Lexicons to Build a Morphological Analyzer for Egyptian Arabic
Amany Fashwan | Sameh Alansary
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
This paper presents work in progress on building a morphological analyzer for Egyptian Arabic (EGY). To build such a tool, a tag-set schema is developed based on a corpus of 527,000 EGY words covering different sources and genres. This tag-set schema is used to morphologically annotate about 318,940 words according to their contexts. Each annotated word is associated with its suitable prefix(es), original stem, tag, suffix(es), glossary, number, gender, definiteness, and conventional lemma and stem. These morphologically annotated words, in turn, are used to develop the proposed morphological analyzer, for which the morphological lexicons and the compatibility tables are extracted and tested. The system is compared with one of the best EGY morphological analyzers, CALIMA.
2017
SHAKKIL: An Automatic Diacritization System for Modern Standard Arabic Texts
Amany Fashwan | Sameh Alansary
Proceedings of the Third Arabic Natural Language Processing Workshop
This paper presents SHAKKIL, a system for automatically diacritizing Arabic texts. In this system, the diacritization problem is handled at two levels: morphological processing and syntactic processing. The adopted morphological disambiguation algorithm depends on four layers: a uni-morphological form layer, a rule-based morphological disambiguation layer, a statistical disambiguation layer, and an Out Of Vocabulary (OOV) layer. The adopted syntactic disambiguation algorithm detects the case-ending diacritics using a rule-based approach that simulates shallow parsing. An annotated corpus is used to extract the Arabic linguistic rules, build the language models, and test the system output. The system is a good trial of the interaction between the rule-based and statistical approaches, where the rules can help the statistics in detecting the right diacritization and vice versa. At this point, the morphological Word Error Rate (WER) is 4.56%, the morphological Diacritic Error Rate (DER) is 1.88%, and the syntactic WER is 9.36%. The best WER is 14.78%, compared with the best published results of 11.68% (Abandah, 2015), 12.90% (Rashwan, et al., 2015), and 13.70% (Metwally, Rashwan, & Atiya, 2016).