Pavel Braslavski


2021

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events
Natalia Loukachevitch | Ekaterina Artemova | Tatiana Batura | Pavel Braslavski | Ilia Denisov | Vladimir Ivanov | Suresh Manandhar | Alexander Pugachev | Elena Tutubalina
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its important difference from previous datasets is annotation of nested named entities, as well as relations within nested entities and at the discourse level. NEREL can facilitate development of novel models that can extract relations between nested named entities, as well as relations on both sentence and document levels. NEREL also contains the annotation of events involving named entities and their roles in the events. The NEREL collection is available via https://github.com/nerel-ds/NEREL.

2019

Large Dataset and Language Model Fun-Tuning for Humor Recognition
Vladislav Blinov | Valeria Bolotova-Baranova | Pavel Braslavski
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The task of humor recognition has attracted a lot of attention recently due to the urge to process large amounts of user-generated texts and the rise of conversational agents. We collected a dataset of jokes and funny dialogues in Russian from various online resources and complemented them carefully with unfunny texts with similar lexical properties. The dataset comprises more than 300,000 short texts, which is significantly larger than any previous humor-related corpus. Manual annotation of 2,000 items proved the reliability of the corpus construction approach. Further, we applied language model fine-tuning for text classification and obtained an F1 score of 0.91 on a test set, which constitutes a considerable gain over baseline methods. The dataset is freely available to the research community.

2016

YARN: Spinning-in-Progress
Pavel Braslavski | Dmitry Ustalov | Mikhail Mukhin | Yuri Kiselev
Proceedings of the 8th Global WordNet Conference (GWC)

YARN (Yet Another RussNet), a project started in 2013, aims at creating a large open WordNet-like thesaurus for Russian by means of crowdsourcing. The first stage of the project was to create noun synsets. Currently, the resource comprises 48K+ word entries and 44K+ synsets. More than 200 people have taken part in assembling synsets throughout the project. The paper describes the linguistic, technical, and organizational principles of the project, as well as the evaluation results, lessons learned, and future plans.

2014

A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus
Pavel Braslavski | Dmitry Ustalov | Mikhail Mukhin
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

English-to-Russian MT evaluation campaign
Pavel Braslavski | Alexander Beloborodov | Maxim Khalilov | Serge Sharoff
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2008

Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems
Georg Rehm | Marina Santini | Alexander Mehler | Pavel Braslavski | Rüdiger Gleim | Andrea Stubbe | Svetlana Symonenko | Mirko Tavosanis | Vedrana Vidulin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres.