Establishing a Baseline for Arabic Patents Classification: A Comparison of Twelve Approaches

Taif Omar Al-Omar; Hend Al-Khalifa; Rawan Al-Matham

doi:10.18653/v1/2022.wanlp-1.26

Abstract

The number of patent applications is constantly growing, and there is an economic interest in developing accurate and fast models to automate their classification. In this paper, we introduce ArPatent, the first public Arabic patent dataset, and experiment with twelve classification approaches to establish a baseline for Arabic patent classification. To find the best baseline, we evaluated machine learning models, pre-trained language models, and ensemble approaches. The best-performing model was ARBERT, with an F1 score of 66.53%, while an ensemble of the three best-performing language models (ARBERT, CAMeL-MSA, and QARiB) achieved the second-best F1 score of 64.52%.

Anthology ID: 2022.wanlp-1.26
Volume: Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue: WANLP
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 287–294
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2022.wanlp-1.26/
DOI: 10.18653/v1/2022.wanlp-1.26
Cite (ACL): Taif Omar Al-Omar, Hend Al-Khalifa, and Rawan Al-Matham. 2022. Establishing a Baseline for Arabic Patents Classification: A Comparison of Twelve Approaches. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 287–294, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Establishing a Baseline for Arabic Patents Classification: A Comparison of Twelve Approaches (Al-Omar et al., WANLP 2022)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2022.wanlp-1.26.pdf
