Zara Kancheva

2021

This paper describes Slav-NER: the 3rd Multilingual Named Entity Challenge in Slavic languages. The tasks involve recognizing mentions of named entities in Web documents, normalization of the names, and cross-lingual linking. The Challenge covers six languages and five entity types, and is organized as part of the 8th Balto-Slavic Natural Language Processing Workshop, co-located with the EACL 2021 Conference. Ten teams participated in the competition. Performance for the named entity recognition task reached 90% F-measure, much higher than reported in the first edition of the Challenge. Seven teams covered all six languages, and five teams participated in the cross-lingual entity linking task. Detailed evaluation information is available on the shared task web page.

Handling synset overgeneration: Sense Merging in BTB-WN
Ivaylo Radev | Zara Kancheva
Proceedings of the Student Research Workshop Associated with RANLP 2021

The paper reports on an effort to reconsider the representation of some cases of derivational paradigm patterns in Bulgarian. The new treatment implemented within BulTreeBank-WordNet (BTB-WN), a wordnet for Bulgarian, groups related words that share a common main meaning into the same synset, while the nuances in sense are encoded within the synset as modification functions over the main meaning. In this way, we can address the following challenges: (1) to avoid the influence of English Wordnet (EWN) synset distinctions on Bulgarian, which resulted from the translation of some of the synsets from Core WordNet; (2) to represent the common meaning of such derivation patterns just once and to improve the management of BTB-WN; and (3) to encode idiosyncratic usages locally to the corresponding synsets instead of introducing new semantic relations.

2020

Linguistic vs. encyclopedic knowledge. Classification of MWEs on the base of domain information
Zara Kancheva | Ivaylo Radev
Proceedings of the 4th International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

This paper reports on the first steps in the creation of linked data through the mapping of BTB-WordNet and the Bulgarian Wikipedia. The task of expanding the BTB-WordNet with encyclopedic knowledge is done by mapping its synsets to Wikipedia pages with many MWEs found in the articles and subjected to further analysis. We look for a way to filter the Wikipedia MWEs in the effort of selecting the ones most beneficial to the enrichment of BTB-WN.

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

2019

Cross-Lingual Coreference: The Case of Bulgarian and English
Zara Kancheva
Proceedings of the Student Research Workshop Associated with RANLP 2019

The paper presents several common approaches to cross- and multi-lingual coreference resolution, in search of the most effective practices to be applied within the work on Bulgarian-English manual coreference annotation of a short story. The work aims at outlining the typology of the differences in the annotated parallel texts. The results of the research prove to be comparable with the tendencies observed in similar works on other Slavic languages and show surprising differences between the types of markables and their frequency in Bulgarian and English.

Aligning the Bulgarian BTB WordNet with the Bulgarian Wikipedia
Kiril Simov | Petya Osenova | Laska Laskova | Ivajlo Radev | Zara Kancheva
Proceedings of the 10th Global Wordnet Conference

The paper reports on ongoing work that manually maps the Bulgarian WordNet BTB-WN to the Bulgarian Wikipedia. The preparatory work of extracting the Wikipedia articles and provisionally relating them to the WordNet lemmas was done automatically. The manual work includes checking the corresponding senses in both resources as well as the missing ones. The main cases of mapping are considered. The first experiments, mapping about 1000 synsets, show more than 78% exact correspondences and nearly 15% new synsets.

Modeling MWEs in BTB-WN
Laska Laskova | Petya Osenova | Kiril Simov | Ivajlo Radev | Zara Kancheva
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

The paper presents the characteristics of the predominant types of multiword expressions (MWEs) in the BulTreeBank WordNet (BTB-WN). Their distribution in BTB-WN is discussed with respect to the overall hierarchical organization of the lexical resource. In addition, catena-based modeling is proposed for handling the lexical semantics of MWEs.