2019
Char-RNN for Word Stress Detection in East Slavic Languages
Ekaterina Chernyak, Maria Ponomareva, Kirill Milintsevich
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
We explore how well a sequence labeling approach, namely a recurrent neural network, is suited to resource-poor word stress detection without POS tagging in the Russian, Ukrainian, and Belarusian languages. We present new datasets annotated with word stress for the three languages, compare several RNN models trained on these languages, and explore possible applications of transfer learning to the task. We show that it is possible to train a model in a cross-lingual setting and that using additional languages improves the quality of the results.
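Framed as sequence labeling, each word becomes a sequence of characters with one binary label per character, and a character-level RNN learns to predict those labels. The following is a minimal sketch of that data representation. It assumes the annotated words mark stress with a combining acute accent (U+0301) right after the stressed vowel; that is an assumption about the format, not necessarily the one used in these datasets. The function names are hypothetical.

```python
COMBINING_ACUTE = "\u0301"  # assumed stress mark placed after the stressed vowel


def to_char_labels(annotated: str) -> tuple[str, list[int]]:
    """Split a stress-annotated word into plain characters and 0/1 labels."""
    chars: list[str] = []
    labels: list[int] = []
    for ch in annotated:
        if ch == COMBINING_ACUTE:
            if not labels:
                raise ValueError("stress mark with no preceding character")
            labels[-1] = 1  # the preceding character carries the stress
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def from_char_labels(word: str, labels: list[int]) -> str:
    """Rebuild the annotated form from characters and (predicted) labels."""
    if len(word) != len(labels):
        raise ValueError("exactly one label per character is required")
    return "".join(ch + (COMBINING_ACUTE if lab else "") for ch, lab in zip(word, labels))


word, labels = to_char_labels("молоко\u0301")
print(word, labels)
```

The labels are what a model would be trained to output. `from_char_labels` turns a predicted label sequence back into a readable annotated word.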
2017
Comparison of String Similarity Measures for Obscenity Filtering
Ekaterina Chernyak
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list and establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some of the other well-known measures.
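The following sketches the general setup: flag a token when it is sufficiently similar to some stop-list entry. It uses normalized Levenshtein similarity as a generic, swapped-in measure; it is not the annotated suffix tree measure, and the abstract does not say which measures were compared. The 0.8 threshold and the placeholder stop word are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1], where 1 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def flag_words(tokens: list[str], stop_list: list[str], threshold: float = 0.8) -> list[str]:
    """Return tokens whose best match against the stop list reaches the threshold."""
    if not stop_list:
        return []
    flagged = []
    for tok in tokens:
        best = max(similarity(tok.lower(), s) for s in stop_list)
        if best >= threshold:
            flagged.append(tok)
    return flagged


print(flag_words(["Hello", "badw0rd", "fine"], ["badword"]))
```

A similarity threshold rather than exact lookup is what lets such a filter catch deliberately misspelled variants. In this example, "badw0rd" differs from the stop word by a single substitution.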
Automated Word Stress Detection in Russian
Maria Ponomareva, Kirill Milintsevich, Ekaterina Chernyak, Anatoly Starostin
Proceedings of the First Workshop on Subword and Character Level Models in NLP
In this study we address the problem of automated word stress detection in Russian using character-level models and no part-of-speech taggers. We use a simple bidirectional RNN with LSTM nodes and achieve an accuracy of 90% or higher. We experiment with two training datasets and show that using data from an annotated corpus is much more efficient than using only a dictionary, since it allows the model to retain the context of the word and its morphological features.
2016
Visualization of Dynamic Reference Graphs
Ivan Rodin, Ekaterina Chernyak, Mikhail Dubov, Boris Mirkin
Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing
Extracting Social Networks from Literary Text with Word Embedding Tools
Gerhard Wohlgenannt, Ekaterina Chernyak, Dmitry Ilvovsky
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
In this paper a social network is extracted from a literary text. The social network shows how frequently the characters interact and how similar their social behavior is. Two types of similarity measures are used: the first applies co-occurrence statistics, while the second exploits cosine similarity on different types of word embedding vectors. The results are evaluated with a paid micro-task crowdsourcing survey. The experiments suggest that specific types of word embeddings, such as word2vec, are well suited to the task at hand and to the specific circumstances of literary fiction.
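Here is a sketch of the two kinds of signal described, under stated assumptions. Co-occurrence is counted as two characters appearing in the same tokenized sentence. Cosine similarity is computed on plain lists standing in for embedding vectors, such as word2vec vectors for character names. The sentence window, character names and toy data are hypothetical and do not reproduce the paper's exact setup.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences: list[list[str]], characters: list[str]) -> Counter:
    """Count, for each unordered pair of characters, the sentences where both appear."""
    counts: Counter = Counter()
    for sent in sentences:
        present = sorted({c for c in characters if c in sent})
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


sentences = [["Alice", "met", "Bob"], ["Alice", "and", "Carol"], ["Bob", "thanked", "Alice"]]
counts = cooccurrence_counts(sentences, ["Alice", "Bob", "Carol"])
print(counts[("Alice", "Bob")], counts[("Alice", "Carol")])
```

Either score can serve as an edge weight between two characters. Co-occurrence reflects how often they interact, while embedding similarity reflects how similarly they are described.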