Towards the Data-driven System for Rhetorical Parsing of Russian Texts

Elena Chistova, Maria Kobozeva, Dina Pisarevskaya, Artem Shelmanov, Ivan Smirnov, Svetlana Toldova



Abstract
We present the results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank, the first Russian corpus annotated within the RST framework. The models use various lexical, quantitative, morphological, and semantic features. For rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model achieves the best score (macro F1 = 54.67 ± 0.38). We find that most of the features important for rhetorical relation classification are related to discourse connectives, which are derived from the connectives lexicon for Russian and from other sources.
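The abstract reports a macro-averaged F1 score and names discourse connectives as the most informative features. As a hedged illustration only, and not the authors' code, the sketch below shows in plain Python how a connective-based feature, a simple score-averaging ensemble of two classifiers, and macro F1 could be computed. The tiny connective list, the relation labels, and the helper names (`connective_features`, `average_ensemble`, `macro_f1`) are hypothetical. The paper's actual lexicon, feature set, and way of combining CatBoost with the linear SVM may differ.

```python
import re

# Hypothetical, tiny connective list for illustration only; the paper derives
# its connectives from a Russian connectives lexicon and other sources.
CONNECTIVES = ["потому что", "однако", "если", "поэтому", "но"]


def connective_features(text: str) -> dict[str, int]:
    """Binary bag-of-connectives features for one discourse unit.

    Tokens are matched on word boundaries, so "но" does not fire inside "одно".
    """
    tokens = re.findall(r"\w+", text.lower())
    padded = " " + " ".join(tokens) + " "
    return {f"conn={c}": int(f" {c} " in padded) for c in CONNECTIVES}


def average_ensemble(scores_a, scores_b, weight=0.5):
    """Combine two models by a weighted average of per-label scores.

    Assumes both models output comparable (e.g. calibrated probability)
    scores over the same label set; raw SVM margins would need calibration.
    """
    predictions = []
    for a, b in zip(scores_a, scores_b):
        combined = {label: weight * a[label] + (1 - weight) * b[label] for label in a}
        predictions.append(max(combined, key=combined.get))
    return predictions


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    print(connective_features("Он ушёл, потому что устал."))
    print(average_ensemble([{"cause": 0.6, "contrast": 0.4}],
                           [{"cause": 0.2, "contrast": 0.8}]))
    gold = ["cause", "contrast", "cause", "elaboration"]
    pred = ["cause", "cause", "cause", "elaboration"]
    print(round(macro_f1(gold, pred), 4))
```

Macro averaging weights every relation label equally, so rare relations that a model never predicts correctly (here `contrast`) pull the score down as much as frequent ones; this is one reason macro F1 is a demanding metric for rhetorical relation classification.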
Anthology ID: W19-2711
Volume: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month: June
Year: 2019
Address: Minneapolis, MN
Editors: Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 82–87
URL: https://aclanthology.org/W19-2711
DOI: 10.18653/v1/W19-2711
Cite (ACL): Elena Chistova, Maria Kobozeva, Dina Pisarevskaya, Artem Shelmanov, Ivan Smirnov, and Svetlana Toldova. 2019. Towards the Data-driven System for Rhetorical Parsing of Russian Texts. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 82–87, Minneapolis, MN. Association for Computational Linguistics.
Cite (Informal): Towards the Data-driven System for Rhetorical Parsing of Russian Texts (Chistova et al., NAACL 2019)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/W19-2711.pdf
Poster: W19-2711.Poster.pdf