RuSemShift: a dataset of historical lexical semantic change in Russian

Julia Rodina, Andrey Kutuzov


Abstract
We present RuSemShift, a large-scale manually annotated test set for semantic change modeling in Russian, covering two long-term time period pairs: pre-Soviet to Soviet and Soviet to post-Soviet. Target words were annotated by multiple crowdsourced workers. The annotation process followed the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift; the results are promising but leave room for other researchers to improve on them.
Anthology ID:
2020.coling-main.90
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1037–1047
URL:
https://aclanthology.org/2020.coling-main.90
DOI:
10.18653/v1/2020.coling-main.90
Cite (ACL):
Julia Rodina and Andrey Kutuzov. 2020. RuSemShift: a dataset of historical lexical semantic change in Russian. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1037–1047, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
RuSemShift: a dataset of historical lexical semantic change in Russian (Rodina & Kutuzov, COLING 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.coling-main.90.pdf