Alexey Tikhonov


2021

It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Alexey Tikhonov | Max Ryabinin
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

StoryDB: Broad Multi-language Narrative Dataset
Alexey Tikhonov | Igor Samenko | Ivan Yamshchikov
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

This paper presents StoryDB, a broad multi-language dataset of narratives. StoryDB is a corpus of texts that includes stories in 42 different languages. Every language includes at least 500 stories, and some languages include more than 20,000. Every story is indexed across languages and labeled with tags such as genre or topic. The corpus shows rich topical and language variation and can serve as a resource for studying the role of narrative in natural language processing across many languages, including low-resource ones. We also demonstrate how the dataset can be used to benchmark three modern multilingual models, namely mDistilBERT, mBERT, and XLM-RoBERTa.

2019

Dyr Bul Shchyl. Proxying Sound Symbolism With Word Embeddings
Ivan P. Yamshchikov | Viascheslav Shibaev | Alexey Tikhonov
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

This paper explores modern word embeddings in the context of sound symbolism. Using basic properties of the representation space, one can construct semantic axes. A method is proposed to measure whether the presence of individual sounds in a given word shifts the semantics of that word along a specific axis. It is shown that, in accordance with several experimental and statistical results, word embeddings capture symbolism for certain sounds.
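A minimal sketch of one way to read the semantic-axis idea, under stated assumptions: an axis is taken to be the normalized difference between two pole word vectors (here hypothetical "big" and "small" vectors), a word's position on the axis is its cosine with that axis, and a sound's shift is the difference in mean position between words that contain the sound and words that do not. The helper names, the toy 2-D vectors, and the spelling-based sound matching are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def semantic_axis(pos_pole: Sequence[float], neg_pole: Sequence[float]) -> Vector:
    """Unit vector pointing from the negative pole word to the positive pole word."""
    return normalize([p - n for p, n in zip(pos_pole, neg_pole)])


def position_on_axis(word_vec: Sequence[float], axis: Sequence[float]) -> float:
    """Cosine between a word vector and the (unit-length) axis."""
    unit = normalize(word_vec)
    return sum(u * a for u, a in zip(unit, axis))


def sound_shift(embeddings: Dict[str, Vector], axis: Vector, sound: str) -> float:
    """Mean axis position of words containing `sound` minus that of all other words.

    Simplifying assumption: `sound` is matched as a substring of the spelling,
    not of a phonetic transcription.
    """
    with_sound = [position_on_axis(v, axis) for w, v in embeddings.items() if sound in w]
    without = [position_on_axis(v, axis) for w, v in embeddings.items() if sound not in w]
    if not with_sound or not without:
        raise ValueError("need words both with and without the sound")
    return sum(with_sound) / len(with_sound) - sum(without) / len(without)


# Hypothetical 2-D toy vectors; real embeddings have hundreds of dimensions.
big, small = [1.0, 0.0], [-1.0, 0.0]
toy = {"grom": [0.9, 0.1], "bum": [0.8, 0.3], "tin": [-0.7, 0.2], "lis": [-0.5, 0.5]}
axis = semantic_axis(big, small)
shift = sound_shift(toy, axis, "m")
print(f"shift along big-small axis for 'm': {shift:.2f}")
```

A positive shift means words containing the sound sit, on average, closer to the "big" pole than the remaining words. With real embeddings one would also need a significance test, which this sketch omits.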

Style Transfer for Texts: Retrain, Report Errors, Compare with Rewrites
Alexey Tikhonov | Viacheslav Shibaev | Aleksander Nagaev | Aigul Nugmanova | Ivan P. Yamshchikov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper shows that the standard assessment methodology for style transfer has several significant problems. First, the standard metrics for style accuracy and semantics preservation vary significantly across re-runs, so one has to report error margins for the obtained results. Second, beyond certain values of bilingual evaluation understudy (BLEU) between input and output and of sentiment-transfer accuracy, optimizing these two standard metrics diverges from the intuitive goal of the style transfer task. Finally, due to the nature of the task itself, there is a specific dependence between these two metrics that can easily be manipulated. Under these circumstances, we suggest taking BLEU between the output and human-written reformulations into account for benchmarks. We also propose three new architectures that outperform the state of the art in terms of this metric.
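The first point, reporting error margins over re-runs instead of a single number, can be illustrated with a minimal sketch that summarizes a metric as mean plus or minus sample standard deviation. The scores below are made-up placeholders standing in for, e.g., style accuracy from five training runs with different random seeds; the helper names are hypothetical, and the paper may use a different spread statistic.

```python
import statistics
from typing import Sequence, Tuple


def error_margin(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of a metric over re-runs."""
    if len(scores) < 2:
        raise ValueError("need at least two re-runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


def report(name: str, scores: Sequence[float]) -> str:
    """Format a metric as 'mean +/- std (n=runs)' instead of a single number."""
    mean, std = error_margin(scores)
    return f"{name}: {mean:.2f} +/- {std:.2f} (n={len(scores)})"


# Placeholder values for illustration only, not results from the paper.
style_accuracy_runs = [0.81, 0.74, 0.88, 0.79, 0.77]
print(report("style accuracy", style_accuracy_runs))  # style accuracy: 0.80 +/- 0.05 (n=5)
```

Reporting the spread makes it visible when two systems' differences fall within run-to-run noise.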

Decomposing Textual Information For Style Transfer
Ivan P. Yamshchikov | Viacheslav Shibaev | Aleksander Nagaev | Jürgen Jost | Alexey Tikhonov
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper focuses on latent representations that can effectively decompose different aspects of textual information. Using a framework of style transfer for texts, we propose several empirical methods to assess information decomposition quality. We validate these methods with several state-of-the-art textual style transfer methods. Higher-quality information decomposition corresponds to higher bilingual evaluation understudy (BLEU) between output and human-written reformulations.
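The metric referred to here, BLEU between a model's output and a human-written reformulation, can be sketched as standard corpus-level BLEU: clipped n-gram precisions for n = 1..4, their geometric mean, and a brevity penalty, with a single reference per sentence. Whitespace tokenization, the absence of smoothing, the helper names, and the example sentences are assumptions for illustration; the papers' evaluation may rely on a particular toolkit with its own tokenization, so scores from this sketch are not directly comparable to reported numbers.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(pairs: Sequence[Tuple[str, str]], max_n: int = 4) -> float:
    """Corpus BLEU over (system output, human reformulation) pairs, one reference each."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in pairs:
        h, r = hyp.split(), ref.split()  # assumption: plain whitespace tokenization
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: an output n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize outputs shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


# Hypothetical output / human reformulation pair.
pairs = [(
    "the staff was friendly and the food was great",
    "the staff was friendly and the food was delicious",
)]
score = corpus_bleu(pairs)
print(round(score, 3))  # 0.863
```

Because the reference is a human rewrite rather than the input, copying the input verbatim no longer earns a near-perfect score, which is the motivation for preferring this comparison.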