Abstract
We present an approach for automatic verification and augmentation of multilingual lexica. We exploit existing parallel and monolingual corpora to extract multilingual correspondents via triangulation. We demonstrate the efficacy of our approach on two publicly available resources: Tharwa, a three-way lexicon comprising Dialectal Arabic, Modern Standard Arabic and English lemmas, along with other information (Diab et al., 2014); and BabelNet, a multilingual thesaurus covering over 276 languages, including Arabic variant entries (Navigli and Ponzetto, 2012). Without any human intervention, our automated approach yields an F1-score of 71.71% in generating correct multilingual correspondents against gold Tharwa, and 54.46% against gold BabelNet.
- Anthology ID:
- W16-4810
- Volume:
- Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
- Venue:
- VarDial
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 73–81
- URL:
- https://aclanthology.org/W16-4810
- Cite (ACL):
- Maryam Aminian, Mohamed Al-Badrashiny, and Mona Diab. 2016. Automatic Verification and Augmentation of Multilingual Lexicons. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 73–81, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Automatic Verification and Augmentation of Multilingual Lexicons (Aminian et al., VarDial 2016)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/W16-4810.pdf
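The abstract names triangulation as the mechanism for extracting multilingual correspondents and reports F1 against a gold lexicon. As a rough illustration only, the sketch below shows generic pivot-based triangulation: two translation tables (for example Dialectal Arabic to MSA and MSA to English) are composed through the shared pivot language as p(t | s) = Σ_m p(m | s) p(t | m). It then scores the resulting pairs with set-based precision, recall and F1. This is an assumption about the general technique, not the authors' exact pipeline; the function names `triangulate` and `prf1`, the `min_prob` threshold, and the transliterated toy entries are all hypothetical.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt, min_prob=0.0):
    """Compose two translation tables through a shared pivot language.

    Hypothetical helper: estimates p(t | s) = sum_m p(m | s) * p(t | m)
    and keeps only target candidates whose estimate is >= min_prob.
    """
    src_to_tgt = {}
    for src, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot, p_pivot in pivots.items():
            # Pivots missing from the second table contribute nothing.
            for tgt, p_tgt in pivot_to_tgt.get(pivot, {}).items():
                scores[tgt] += p_pivot * p_tgt
        kept = {t: p for t, p in scores.items() if p >= min_prob}
        if kept:
            src_to_tgt[src] = kept
    return src_to_tgt


def prf1(predicted, gold):
    """Set-based precision, recall and F1 over (source, target) pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy tables: Egyptian "ezzay" ("how") pivoting through MSA.
da_to_msa = {"ezzay": {"kayfa": 0.9, "limatha": 0.1}}
msa_to_en = {"kayfa": {"how": 0.8, "what": 0.2}, "limatha": {"why": 1.0}}

result = triangulate(da_to_msa, msa_to_en, min_prob=0.15)
predicted = {(s, t) for s, tgts in result.items() for t in tgts}
gold = {("ezzay", "how")}
precision, recall, f1 = prf1(predicted, gold)
```

With `min_prob=0.15`, "why" (estimated at 0.1) is filtered out, leaving "how" and "what"; against the one-pair toy gold set, precision is 0.5 and recall is 1.0. Raising the threshold trades recall for precision, which is one plausible knob behind F1 figures like those reported, though the paper's actual filtering may differ.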