Identification of Multiword Expressions for Latvian and Lithuanian: Hybrid Approach

Justina Mandravickaitė, Tomas Krilavičius


Abstract
We discuss an experiment on the automatic identification of bigram multiword expressions in parallel Latvian and Lithuanian corpora. Because lexical resources and tools for these languages (e.g., POS taggers, parsers) are scarce and of limited quality, we use raw corpora, lexical association measures (LAMs), and supervised machine learning (ML). Combining LAMs with ML has proven rather effective for other languages, and it yields promising results for Latvian and Lithuanian as well: we achieved 92.4% precision and 52.2% recall for Latvian, and 95.1% precision and 77.8% recall for Lithuanian.
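The abstract names the ingredients (raw corpora, LAMs, supervised ML) but not the specific association measures or classifier. The code below is a minimal sketch of that kind of hybrid pipeline for bigram candidates. It assumes two common LAMs, pointwise mutual information (PMI) and the Dice coefficient, and a plain logistic-regression classifier as stand-ins. All function names (`bigram_lam_features`, `to_vector`, `train_logreg`, `predict`, `precision_recall`) are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter


def bigram_lam_features(tokens):
    """Compute two lexical association measures for every adjacent bigram.

    Returns {(w1, w2): {"freq": f12, "pmi": ..., "dice": ...}}.
    Counts are over raw surface forms (no lemmatization or POS tags).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    feats = {}
    for (w1, w2), f12 in bigrams.items():
        p12 = f12 / n_bi
        p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        feats[(w1, w2)] = {
            "freq": f12,
            # Pointwise mutual information: log2 P(w1,w2) / (P(w1) P(w2))
            "pmi": math.log2(p12 / (p1 * p2)),
            # Dice coefficient: 2 f(w1,w2) / (f(w1) + f(w2))
            "dice": 2 * f12 / (unigrams[w1] + unigrams[w2]),
        }
    return feats


def to_vector(f):
    """Hypothetical feature choice: the LAM scores become classifier inputs."""
    return [f["pmi"], f["dice"]]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=500, lr=0.1):
    """Fit a logistic-regression classifier with stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the bigram is classified as a multiword expression."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold)


def precision_recall(pred, gold):
    """Precision and recall of binary MWE predictions against gold labels."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In use, a hand-annotated sample of candidate bigrams would be converted with `to_vector` and passed to `train_logreg`. `precision_recall` would then score the predictions on held-out bigrams. Because this sketch counts unlemmatized surface forms, each inflected variant of a Latvian or Lithuanian expression is treated as a separate bigram.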
Anthology ID:
W17-1712
Volume:
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
97–101
URL:
https://aclanthology.org/W17-1712
DOI:
10.18653/v1/W17-1712
Cite (ACL):
Justina Mandravickaitė and Tomas Krilavičius. 2017. Identification of Multiword Expressions for Latvian and Lithuanian: Hybrid Approach. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 97–101, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Identification of Multiword Expressions for Latvian and Lithuanian: Hybrid Approach (Mandravickaitė & Krilavičius, MWE 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/W17-1712.pdf