Ivelina Stoyanova


Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset
Svetla Koeva | Ivelina Stoyanova | Jordan Kralev
Proceedings of the Thirteenth Language Resources and Evaluation Conference

One of the processing tasks for large multimodal data streams is automatic image description (image classification, object segmentation and classification). Although the number and the diversity of image datasets are constantly expanding, there is still a huge demand for more datasets in terms of the variety of domains and object classes covered. The goal of the project Multilingual Image Corpus (MIC 21) is to provide a large image dataset with annotated objects and object descriptions in 24 languages. The Multilingual Image Corpus consists of an Ontology of visual objects (based on WordNet) and a collection of thematically related images whose objects are annotated with segmentation masks and labels describing the ontology classes. The dataset is designed both for image classification and object detection and for semantic segmentation. The main contributions of our work are: a) the provision of a large collection of high-quality copyright-free images; b) the formulation of the Ontology of visual objects based on WordNet noun hierarchies; c) the precise manual correction of automatic object segmentation within the images and the annotation of object classes; and d) the association of objects and images with extended multilingual descriptions based on WordNet inner- and interlingual relations. The dataset can also be used for multilingual image caption generation, image-to-text alignment and automatic question answering for images and videos.

WordNet-Based Bulgarian Sign Language Dictionary of Crisis Management Terminology
Slavina Lozanova | Ivelina Stoyanova
Proceedings of the 5th International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

This paper presents an online Bulgarian sign language dictionary covering terminology related to crisis management. The pressing need for such a resource became evident during the COVID pandemic when critical information regarding government measures was delivered on a regular basis to the public including Deaf citizens. The dictionary is freely available on the internet and is aimed at the Deaf, sign language interpreters, learners of sign language, social workers and the general public. Each dictionary entry is supplied with synonyms in spoken Bulgarian, a definition, one or more signs corresponding to the concept in Bulgarian sign language, additional information about derivationally related words and similar signs with different meanings, as well as links to translations in other languages, including American sign language.

Linked Resources towards Enhancing the Conceptual Description of General Lexis Verbs Using Syntactic Information
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the 5th International Conference on Computational Linguistics in Bulgaria (CLIB 2022)



Semantic Analysis of Verb-Noun Derivation in Princeton WordNet
Verginica Mititelu | Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the 11th Global Wordnet Conference

We present here the results of a morphosemantic analysis of the verb-noun pairs in the Princeton WordNet as reflected in the standoff file containing pairs annotated with a set of 14 semantic relations. We have automatically distinguished between zero-derivation and affixal derivation in the data, identified the affixes, and manually checked the results. The data show that for each semantic relation an affix prevails in creating new words, although we cannot talk about their specificity with respect to such a relation. Moreover, certain pairs of verb-noun semantic primes are better represented for each semantic relation, and some semantic clusters (in the form of WordNet subtrees) take shape as a result. We thus employ a large-scale data-driven linguistically motivated analysis afforded by the rich derivational and morphosemantic description in WordNet to the end of capturing finer regularities in the process of derivation as represented in the semantic properties of the words involved and as reflected in the structure of the lexicon.


It Takes Two to Tango – Towards a Multilingual MWE Resource
Svetlozara Leseva | Verginica Barbu Mititelu | Ivelina Stoyanova
Proceedings of the 4th International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

Mature wordnets offer the opportunity of digging out interesting linguistic information otherwise not explicitly marked in the network. The focus in this paper is on the ways the results already obtained at two levels, derivation and multiword expressions, may be further employed. The parallel recent development of the two resources under discussion, the Bulgarian and the Romanian wordnets, has enabled interlingual analyses that reveal similarities and differences between the linguistic knowledge encoded in the two wordnets. In this paper we show how the resources developed and the knowledge gained are put together towards devising a linked MWE resource that is informed by layered dictionary representation and corpus annotation and analysis. This work is a proof of concept for the adopted method of compiling a multilingual MWE resource on the basis of information extracted from the Bulgarian, the Romanian and the Princeton wordnet, as well as additional language resources and automatic procedures.

Consistency Evaluation towards Enhancing the Conceptual Representation of Verbs in WordNet
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the 4th International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

This paper outlines the process of enhancing the conceptual description of verb synsets in WordNet using FrameNet frames. On the one hand, we expand the coverage of the mapping between WordNet and FrameNet; on the other, we improve the quality of the mapping using a set of consistency checks and verification procedures. The procedures include an automatic identification of potential inconsistencies and imbalanced relations, as well as suggestions for a more precise frame assignment followed by manual validation. We perform an evaluation of the procedures in terms of the quality of the suggestions measured as the potential improvement in precision and coverage, the relevance of the result and the efficiency of the procedure.


Structural Approach to Enhancing WordNet with Conceptual Frame Semantics
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper outlines procedures for enhancing WordNet with conceptual information from FrameNet. The mapping of the two resources is non-trivial. We define a number of techniques for the validation of the consistency of the mapping and the extension of its coverage which make use of the structure of both resources and the systematic relations between synsets in WordNet and between frames in FrameNet, as well as between synsets and frames. We present a case study on causativity, a relation which provides enhancement complementary to the one using hierarchical relations, by means of linking in a systematic way large parts of the lexicon. We show how consistency checks and denser relations may be implemented on the basis of this relation. We then propose new frames based on causative-inchoative correspondences and in conclusion touch on the possibilities for defining new frames based on the types of specialisation that take place from parent to child synset.

Enhancing Conceptual Description through Resource Linking and Exploration of Semantic Relations
Ivelina Stoyanova | Svetlozara Leseva
Proceedings of the 10th Global Wordnet Conference

The paper presents current efforts towards linking two large lexical semantic resources – WordNet and FrameNet – to the end of their mutual enrichment and the facilitation of the access, extraction and analysis of various types of semantic and syntactic information. In the second part of the paper, we go on to examine the relation of inheritance and other semantic relations as represented in WordNet and FrameNet and how they correspond to each other when the resources are aligned. We discuss the implications with respect to the enhancement of the two resources through the definition of new relations and the detailisation of conceptual frames.

Hear about Verbal Multiword Expressions in the Bulgarian and the Romanian Wordnets Straight from the Horse’s Mouth
Verginica Barbu Mititelu | Ivelina Stoyanova | Svetlozara Leseva | Maria Mitrofan | Tsvetana Dimitrova | Maria Todorova
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

In this paper we focus on verbal multiword expressions (VMWEs) in Bulgarian and Romanian as reflected in the wordnets of the two languages. The annotation of VMWEs relies on the classification defined within the PARSEME COST Action. After outlining the properties of various types of VMWEs, a cross-language comparison is drawn, aimed at highlighting the similarities and the differences between Bulgarian and Romanian with respect to the lexicalization and distribution of VMWEs. The contribution of this work is in outlining essential features of the description and classification of VMWEs and the cross-language comparison at the lexical level, which is essential for the understanding of the need for uniform annotation guidelines and a viable procedure for validation of the annotation.


Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.


The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.


Automatic Prediction of Morphosemantic Relations
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Tsvetana Dimitrova | Maria Todorova
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents a machine learning method for automatic identification and classification of morphosemantic relations (MSRs) between verb and noun synset pairs in the Bulgarian WordNet (BulNet). The core training data comprise 6,641 morphosemantically related verb–noun literal pairs from BulNet. The core dataset was preprocessed for quality by applying validation and reorganisation procedures. Further, the data were supplemented with negative examples of literal pairs not linked by an MSR. The designed supervised machine learning method uses the RandomTree algorithm and is implemented in Java with the Weka package. A set of experiments was performed to test various approaches to the task. Future work on improving the classifier includes adding more training data, employing more features, and fine-tuning. Apart from the language-specific information about derivational processes, the proposed method is language-independent.


Automatic Classification of WordNet Morphosemantic Relations
Svetlozara Leseva | Ivelina Stoyanova | Maria Todorova | Tsvetana Dimitrova | Borislav Rizov | Svetla Koeva
The 5th Workshop on Balto-Slavic Natural Language Processing


Wordnet-Based Cross-Language Identification of Semantic Relations
Ivelina Stoyanova | Svetla Koeva | Svetlozara Leseva
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

Text Modification for Bulgarian Sign Language Users
Slavina Lozanova | Ivelina Stoyanova | Svetlozara Leseva | Svetla Koeva | Boian Savtchev
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations


Bulgarian X-language Parallel Corpus
Svetla Koeva | Ivelina Stoyanova | Rositsa Dekova | Borislav Rizov | Angel Genov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and preferably automatic compilation (by means of web crawling, both general and focused). Certain levels of annotation applied to Bul-X-Cor are taken as obligatory (sentence segmentation and sentence alignment), while others depend on the availability of tools for a particular language (morpho-syntactic tagging, lemmatisation, syntactic parsing, named entity recognition, word sense disambiguation, etc.) or for a particular task (word and clause alignment). To achieve uniformity of the annotation we have either annotated raw data from scratch or transformed the already existing annotation to follow the conventions accepted for BulNC. Finally, actual uses of the corpora are presented and conclusions are drawn with respect to future work.

Application of Clause Alignment for Statistical Machine Translation
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Rositsa Dekova | Angel Genov | Borislav Rizov | Tsvetana Dimitrova | Ekaterina Tarpomanova | Hristina Kukova
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation