A Dataset for Noun Compositionality Detection for a Slavic Language
Dmitry Puzyrev, Artem Shelmanov, Alexander Panchenko, Ekaterina Artemova
Abstract
This paper presents the first gold-standard resource for Russian annotated with compositionality information for noun compounds. The compound phrases are collected from the Universal Dependencies treebanks according to part-of-speech patterns, such as ADJ+NOUN or NOUN+NOUN, using the gold-standard annotations. Each compound phrase is annotated by two experts and a moderator according to the following schema: the phrase can be compositional, non-compositional, or ambiguous (i.e., depending on the context, it can be interpreted as either compositional or non-compositional). We conduct an experimental evaluation of models and methods for predicting the compositionality of noun compounds in unsupervised and supervised setups. We show that methods from previous work, evaluated on the proposed Russian-language resource, achieve performance comparable to results on English corpora.
- Anthology ID:
- W19-3708
- Volume:
- Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
- Venue:
- BSNLP
- SIG:
- SIGSLAV
- Publisher:
- Association for Computational Linguistics
- Pages:
- 56–62
- URL:
- https://aclanthology.org/W19-3708
- DOI:
- 10.18653/v1/W19-3708
- Cite (ACL):
- Dmitry Puzyrev, Artem Shelmanov, Alexander Panchenko, and Ekaterina Artemova. 2019. A Dataset for Noun Compositionality Detection for a Slavic Language. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 56–62, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- A Dataset for Noun Compositionality Detection for a Slavic Language (Puzyrev et al., BSNLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/W19-3708.pdf
- Code:
- slangtech/ru-comps
- Data:
- Universal Dependencies