Clustering of Russian Adjective-Noun Constructions using Word Embeddings

Andrey Kutuzov, Elizaveta Kuzmenko, Lidia Pivovarova


Abstract
This paper presents a method for automatically extracting constructions from a large corpus of Russian. The term ‘construction’ here means a multi-word expression in which a variable slot can be filled by another word from the same semantic class, for example, ‘a glass of [water/juice/milk]’. We deal with constructions that consist of a noun and its adjective modifier. We propose a method of grouping such constructions into semantic classes via two-step clustering of word vectors in distributional models. We compare it with other clustering techniques and evaluate it against A Russian-English Collocational Dictionary of the Human Body, which contains manually annotated groups of constructions with nouns denoting human body parts. The best-performing method is used to cluster all adjective-noun bigrams in the Russian National Corpus. The results of this procedure are publicly available and can be used to build a Russian construction dictionary and to accelerate theoretical studies of constructions.
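The abstract does not spell out what the two clustering steps are, so the following is only a minimal sketch of the general idea under stated assumptions: the adjectives modifying one noun are represented by word vectors, a first step builds tight core clusters, and a second step attaches leftover words to the nearest core. The concrete scheme (cosine-threshold linking, then attaching singletons to centroids), the threshold values, the toy 2-D vectors and the pairwise F1 evaluation are all hypothetical illustrations, not the authors' method or metric.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)


def two_step_cluster(vectors, link_threshold=0.9, attach_threshold=0.5):
    """Hypothetical two-step clustering of word vectors (not the paper's exact method).

    Step 1: link every pair of words whose cosine similarity is at least
    link_threshold; connected components become core clusters.
    Step 2: attach each singleton to the most similar core-cluster centroid,
    provided that similarity is at least attach_threshold.
    """
    x = normalize(np.asarray(vectors, dtype=float))
    n = len(x)
    sims = x @ x.T
    parent = list(range(n))  # union-find parent array

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 1: union all sufficiently similar pairs.
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= link_threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]

    # Step 2: attach singletons to the closest core-cluster centroid.
    sizes = {r: roots.count(r) for r in set(roots)}
    cores = [r for r in sizes if sizes[r] > 1]
    centroids = {
        r: normalize(x[[i for i in range(n) if roots[i] == r]].mean(axis=0, keepdims=True))[0]
        for r in cores
    }
    final = list(roots)
    for i in range(n):
        if sizes[roots[i]] == 1 and cores:
            best = max(cores, key=lambda r: float(x[i] @ centroids[r]))
            if float(x[i] @ centroids[best]) >= attach_threshold:
                final[i] = best

    # Relabel clusters as 0..k-1 in order of first appearance.
    mapping = {}
    return [mapping.setdefault(r, len(mapping)) for r in final]


def pairwise_f1(predicted, gold):
    """F1 over word pairs placed in the same cluster (an assumed evaluation metric)."""
    n = len(predicted)
    pred_pairs = {(i, j) for i in range(n) for j in range(i + 1, n) if predicted[i] == predicted[j]}
    gold_pairs = {(i, j) for i in range(n) for j in range(i + 1, n) if gold[i] == gold[j]}
    tp = len(pred_pairs & gold_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up 2-D vectors standing in for embeddings of adjectives
# modifying the noun 'рука' (hand/arm).
toy_words = ["левая", "правая", "большая", "маленькая", "широкая"]
toy_vectors = [[1.0, 0.1], [0.95, 0.15], [0.1, 1.0], [0.15, 0.9], [0.6, 0.8]]

if __name__ == "__main__":
    labels = two_step_cluster(toy_vectors)
    print(labels)  # [0, 0, 1, 1, 1]
    hypothetical_gold = [0, 0, 1, 1, 2]
    print(round(pairwise_f1(labels, hypothetical_gold), 3))
```

Here 'левая' and 'правая' form one core cluster and 'большая' and 'маленькая' another in step 1; 'широкая' is too far from both to be linked, so step 2 attaches it to the closer (size-related) centroid. Real embeddings would have hundreds of dimensions and the thresholds would need tuning against annotated groups such as those in the dictionary mentioned above.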
Anthology ID:
W17-1402
Volume:
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
3–13
URL:
https://aclanthology.org/W17-1402
DOI:
10.18653/v1/W17-1402
Cite (ACL):
Andrey Kutuzov, Elizaveta Kuzmenko, and Lidia Pivovarova. 2017. Clustering of Russian Adjective-Noun Constructions using Word Embeddings. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 3–13, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Clustering of Russian Adjective-Noun Constructions using Word Embeddings (Kutuzov et al., BSNLP 2017)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/W17-1402.pdf