Mariam E. Hassib
2025
Open-domain Arabic Conversational Question Answering with Question Rewriting
Mariam E. Hassib
|
Nagwa El-Makky
|
Marwan Torki
Proceedings of The Third Arabic Natural Language Processing Conference
Conversational question-answering (CQA) plays a crucial role in bridging the gap between human language and machine understanding, enabling more natural and interactive interactions with AI systems. In this work, we present the first results on open-domain Arabic CQA using deep learning. We introduce AraQReCC, a large-scale Arabic CQA dataset containing 9K conversations with 62K question-answer pairs, created by translating a subset of the QReCC dataset. To ensure data quality, we used COMET-based filtering and manual ratings from large language models (LLMs), such as GPT-4 and LLaMA, selecting conversations with COMET scores, along with LLM ratings of 4 or more. AraQReCC facilitates advanced research in Arabic CQA, improving clarity and relevance through question rewriting. We applied AraT5 for question rewriting and used BM25 and Dense Passage Retrieval (DPR) for passage retrieval. AraT5 is also used for question answering, completing the end-to-end system. Our experiments show that the best performance is achieved with DPR, attaining an F1 score of 21.51% on the test set. While this falls short of the human upper bound of 40.22%, it underscores the importance of question rewriting and quality-controlled data in enhancing system performance.