ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model

Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, Didier Schwab



Abstract
Word Embeddings (WE) are increasingly popular and widely applied in many Natural Language Processing (NLP) applications due to their effectiveness in capturing the semantic properties of words; Machine Translation (MT), Information Retrieval (IR), and Information Extraction (IE) are among these areas. In this paper, we propose ArbEngVec, an open-source project that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset with more than 93 million pairs of Arabic-English parallel sentences. In addition, we perform both extrinsic and intrinsic evaluations for the different word embedding model variants. The extrinsic evaluation assesses the performance of the models on cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.
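The Word Translation task mentioned in the abstract is commonly evaluated by nearest-neighbour retrieval in the shared embedding space. The sketch below illustrates the general idea with hand-made toy vectors; the vocabulary, transliterated words, and vector values are illustrative assumptions, not taken from the ArbEngVec models or the paper's actual evaluation setup.

```python
import numpy as np

# Hypothetical toy shared Arabic-English embedding space.
# Arabic words are transliterated; all vectors are made up for illustration.
embeddings = {
    "kitab": np.array([0.90, 0.10, 0.00]),   # Arabic "book"
    "book":  np.array([0.88, 0.12, 0.05]),
    "bayt":  np.array([0.12, 0.85, 0.25]),   # Arabic "house"
    "house": np.array([0.10, 0.90, 0.20]),
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def translate(word, vocab, k=1):
    """Return the k nearest neighbours of `word` in the shared space,
    which serve as its translation candidates."""
    query = vocab[word]
    candidates = [(w, cosine(query, v)) for w, v in vocab.items() if w != word]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in candidates[:k]]

print(translate("kitab", embeddings))  # → ['book']
```

With real cross-lingual models, the same retrieval would run over the full bilingual vocabulary, and accuracy@k on a gold translation dictionary would serve as the intrinsic WT score.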
Anthology ID:
W19-4605
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
40–48
URL:
https://aclanthology.org/W19-4605
DOI:
10.18653/v1/W19-4605
Cite (ACL):
Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, and Didier Schwab. 2019. ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 40–48, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model (Lachraf et al., WANLP 2019)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/W19-4605.pdf