Abstract
Because multiword expressions (MWEs) exhibit a range of idiosyncrasies, their automatic detection warrants the use of many different features. Tsvetkov and Wintner (2014) proposed a Bayesian network model that combines linguistically motivated features and also models their interactions. In this paper, we extend their model with new features and apply it to Croatian, a morphologically complex language with relatively free word order, achieving a satisfactory F1-score of 0.823. Furthermore, by comparing against (semi)naive Bayes models, we demonstrate that manually modeling feature interactions is indeed important. We make our annotated dataset of Croatian MWEs freely available.
- Anthology ID:
- W17-1727
- Volume:
- Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 194–199
- URL:
- https://aclanthology.org/W17-1727
- DOI:
- 10.18653/v1/W17-1727
- Cite (ACL):
- Maja Buljan and Jan Šnajder. 2017. Combining Linguistic Features for the Detection of Croatian Multiword Expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 194–199, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Combining Linguistic Features for the Detection of Croatian Multiword Expressions (Buljan & Šnajder, MWE 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/W17-1727.pdf