Milena Slavcheva

2022

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. The task continues CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension expands the test data with more data in the previously available languages, namely English, Hindi, Portuguese, and Spanish, and adds new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese, and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop also accepts reports on systems developed for predicting the test data of CASE 2021. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for the new languages in a zero-shot setting. The winning approaches mainly ensemble models and merge data in multiple languages. The best two submissions on CASE 2021 data outperform last year's submissions for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese and Subtask 4 English.

2021

Monitoring Fact Preservation, Grammatical Consistency and Ethical Behavior of Abstractive Summarization Neural Models
Iva Marinova | Yolina Petrova | Milena Slavcheva | Petya Osenova | Ivaylo Radev | Kiril Simov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The paper describes a system for automatic summarization in English of online news data that come from different non-English languages. The system is designed to be used in a production environment for media monitoring. Automatic summarization can be very helpful in this domain when applied as a helper tool for journalists, so that they can review just the important information from the news channels. However, like every software solution, automatic summarization needs performance monitoring and an assured safe environment for the clients. In a media monitoring environment, the most problematic features to be addressed are copyright issues, factual consistency, the style of the text, and the ethical norms of journalism. Thus, the main contribution of our present work is that the above-mentioned characteristics are successfully monitored in neural automatic summarization models and improved with the help of validation, fact-preserving, and fact-checking procedures.

This paper presents a semantic classification of reflexive verbs in Bulgarian, augmenting the morphosyntactic classes of verbs in the large Bulgarian Lexical Data Base, a language resource utilized in a number of Language Engineering (LE) applications. The semantic descriptors conform to the Unified Eventity Representation (UER), developed by Andrea Schalley. The UER is a graphical formalism that introduces object-oriented system design to linguistic semantics. Reflexive/non-reflexive verb pairs are analyzed where the non-reflexive member of the opposition, a two-place predicate, is considered the initial linguistic entity from which the reflexive correlate is derived. The reflexive verbs are distributed into initial syntactic-semantic classes, which serve as the basis for defining the relevant semantic descriptors in the form of EVENTITY FRAME diagrams. The factors that influence the categorization of the reflexives are the lexical paradigmatic approach to the data, the choice of only one reading for each verb, and the top-level generalization of the semantic descriptors. The language models described in this paper provide the possibility for building linguistic components utilizable in knowledge-driven systems.

2004

Verb Valency Descriptors for a Syntactic Treebank
Milena Slavcheva
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Some Aspects of the Morphological Processing of Bulgarian
Milena Slavcheva
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages

2002

1993

The Long Journey from the Core to the Real Size of Large LDBs
Elena Paskaleva | Kiril Simov | Mariana Damova | Milena Slavcheva
Acquisition of Lexical Knowledge from Text