@inproceedings{pavlova-2023-leveraging,
    title = "Leveraging Domain Adaptation and Data Augmentation to Improve Qur{'}anic {IR} in {E}nglish and {A}rabic",
    author = "Pavlova, Vera",
    editor = "Sawaf, Hassan  and
      El-Beltagy, Samhaa  and
      Zaghouani, Wajdi  and
      Magdy, Walid  and
      Abdelali, Ahmed  and
      Tomeh, Nadi  and
      Abu Farha, Ibrahim  and
      Habash, Nizar  and
      Khalifa, Salam  and
      Keleg, Amr  and
      Haddad, Hatem  and
      Zitouni, Imed  and
      Mrini, Khalil  and
      Almatham, Rawan",
    booktitle = "Proceedings of ArabicNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.arabicnlp-1.7/",
    doi = "10.18653/v1/2023.arabicnlp-1.7",
    pages = "76--88",
    abstract = "In this work, we approach the problem of Qur{'}anic information retrieval (IR) in Arabic and English. Using the latest state-of-the-art methods in neural IR, we investigate what helps to tackle this task more effectively. Training retrieval models requires a large amount of data, which is difficult to obtain in-domain. Therefore, we commence with training on a large amount of general-domain data and then continue training on in-domain data. To handle the lack of in-domain data, we employed a data augmentation technique, which considerably improved results on the MRR@10 and NDCG@5 metrics, setting a new state-of-the-art in Qur{'}anic IR for both English and Arabic. The absence of an Islamic corpus and a domain-specific model for the IR task in English motivated us to address this lack of resources and take preliminary steps toward Islamic corpus compilation and domain-specific language model (LM) pre-training, which helped to improve the performance of the retrieval models that use the domain-specific LM as the shared backbone. We examined several language models (LMs) in Arabic to select one that deals efficiently with the Qur{'}anic IR task. Besides transferring successful experiments from English to Arabic, we conducted additional experiments with the retrieval task in Arabic to mitigate the scarcity of general-domain datasets used to train the retrieval models. Handling the Qur{'}anic IR task in both English and Arabic allowed us to enhance the comparison and share valuable insights across models and languages."
}
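For reference, the two ranking metrics named in the abstract, MRR@10 and NDCG@5, have standard definitions. The formulas below are the common textbook forms, not taken from the paper itself:

```latex
% Mean Reciprocal Rank at cutoff 10: rank_q is the position of the first
% relevant passage retrieved for query q, counted only within the top 10.
\[
\mathrm{MRR@10} = \frac{1}{|Q|} \sum_{q \in Q}
  \begin{cases}
    \dfrac{1}{\mathrm{rank}_q} & \text{if } \mathrm{rank}_q \le 10,\\[6pt]
    0 & \text{otherwise.}
  \end{cases}
\]

% Normalized Discounted Cumulative Gain at cutoff 5: rel_i is the graded
% relevance of the passage ranked at position i, and IDCG@5 is the DCG@5
% of the ideal (perfectly ordered) ranking.
\[
\mathrm{NDCG@5} = \frac{\mathrm{DCG@5}}{\mathrm{IDCG@5}},
\qquad
\mathrm{DCG@5} = \sum_{i=1}^{5} \frac{2^{rel_i} - 1}{\log_2(i + 1)}.
\]
```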
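As a rough illustration of the domain-adaptation recipe the abstract describes (train a dense retriever on large general-domain data first, then continue training on in-domain Qur'anic data), here is a minimal sketch using the sentence-transformers library. The backbone model name, the toy training pairs, and the save path are illustrative assumptions; the paper does not prescribe this exact code or library.

```python
# Minimal sketch of two-stage retriever training: general-domain first,
# then continued fine-tuning on in-domain (Qur'anic) query-passage pairs.
# The model name and data below are placeholders, not the paper's setup.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from a general-purpose (or domain-specific) LM backbone.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


def make_loader(pairs, batch_size=16):
    """Wrap (query, relevant_passage) pairs for in-batch-negatives training."""
    examples = [InputExample(texts=[q, p]) for q, p in pairs]
    return DataLoader(examples, shuffle=True, batch_size=batch_size)


loss = losses.MultipleNegativesRankingLoss(model)

# Stage 1: train on a large general-domain dataset (MS MARCO-style pairs).
general_pairs = [
    ("what is information retrieval", "Information retrieval is the task of ..."),
]
model.fit(train_objectives=[(make_loader(general_pairs), loss)],
          epochs=1, warmup_steps=100)

# Stage 2: continue training on the (possibly augmented) in-domain pairs.
quranic_pairs = [
    ("what does the Qur'an say about patience", "…relevant verse or passage text…"),
]
model.fit(train_objectives=[(make_loader(quranic_pairs), loss)],
          epochs=3, warmup_steps=10)

model.save("quranic-retriever-sketch")
```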