A Multi-word Expression Dataset for Swedish

Murathan Kurfalı, Robert Östling, Johan Sjons, Mats Wirén


Abstract
We present a new set of 96 Swedish multi-word expressions annotated with degree of (non-)compositionality. In contrast to most previous compositionality datasets, we also consider syntactically complex constructions and publish a formal specification of each expression. This allows evaluation of computational models beyond word bigrams, which have so far been the norm. Finally, we use the annotations to evaluate a system for automatic compositionality estimation based on distributional semantics. Our analysis of the disagreements between human annotators and the distributional model reveals interesting questions related to the perception of compositionality, and should be informative to future work in the area.
Anthology ID:
2020.lrec-1.542
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4402–4409
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.542
Cite (ACL):
Murathan Kurfalı, Robert Östling, Johan Sjons, and Mats Wirén. 2020. A Multi-word Expression Dataset for Swedish. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4402–4409, Marseille, France. European Language Resources Association.
Cite (Informal):
A Multi-word Expression Dataset for Swedish (Kurfalı et al., LREC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.542.pdf