Elena Chistova


2019

pdf bib
Towards the Data-driven System for Rhetorical Parsing of Russian Texts
Artem Shelmanov | Dina Pisarevskaya | Elena Chistova | Svetlana Toldova | Maria Kobozeva | Ivan Smirnov
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank – first Russian corpus annotated within RST framework – are presented. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, ensemble of CatBoost model with selected features and a linear SVM model provides the best score (macro F1 = 54.67 ± 0.38). We discover that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.

pdf bib
Semantic Role Labeling with Pretrained Language Models for Known and Unknown Predicates
Daniil Larionov | Artem Shelmanov | Elena Chistova | Ivan Smirnov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We build the first full pipeline for semantic role labelling of Russian texts. The pipeline implements predicate identification, argument extraction, argument classification (labeling), and global scoring via integer linear programming. We train supervised neural network models for argument classification using Russian semantically annotated corpus – FrameBank. However, we note that this resource provides annotations only to a very limited set of predicates. We combat the problem of annotation scarcity by introducing two models that rely on different sets of features: one for “known” predicates that are present in the training set and one for “unknown” predicates that are not. We show that the model for “unknown” predicates can alleviate the lack of annotation by using pretrained embeddings. We perform experiments with various types of embeddings including the ones generated by deep pretrained language models: word2vec, FastText, ELMo, BERT, and show that embeddings generated by deep pretrained language models are superior to classical shallow embeddings for argument classification of both “known” and “unknown” predicates.