This paper describes Slav-NER: the 3rd Multilingual Named Entity Challenge in Slavic languages. The tasks involve recognizing mentions of named entities in Web documents, normalizing the names, and linking them across languages. The Challenge covers six languages and five entity types, and is organized as part of the 8th Balto-Slavic Natural Language Processing Workshop, co-located with the EACL 2021 Conference. Ten teams participated in the competition. Performance on the named entity recognition task reached 90% F-measure, much higher than that reported in the first edition of the Challenge. Seven teams covered all six languages, and five teams participated in the cross-lingual entity linking task. Detailed evaluation information is available on the shared task web page.
The paper reports on an effort to reconsider the representation of some derivational paradigm patterns in Bulgarian. The new treatment, implemented within BulTreeBank-WordNet (BTB-WN), a wordnet for Bulgarian, groups related words that share a common main meaning into the same synset, while the nuances in sense are encoded within the synset as modification functions over the main meaning. This approach addresses the following challenges: (1) it avoids the influence of English WordNet (EWN) synset distinctions on Bulgarian, which resulted from the translation of some of the synsets from Core WordNet; (2) it represents the common meaning of such derivational patterns only once, which improves the management of BTB-WN; and (3) it encodes idiosyncratic usages locally in the corresponding synsets instead of introducing new semantic relations.
This paper reports on the first steps in the creation of linked data through the mapping between BTB-WordNet and the Bulgarian Wikipedia. BTB-WordNet is expanded with encyclopedic knowledge by mapping its synsets to Wikipedia pages; the many multiword expressions (MWEs) found in the articles are then subjected to further analysis. We look for a way to filter the Wikipedia MWEs in order to select those most beneficial for the enrichment of BTB-WN.
Aligning senses across resources and languages is a challenging task with beneficial applications in natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at the sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. Compared with previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in the alignment and evaluation of word senses by enabling new solutions, particularly notoriously data-hungry ones such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.
The paper presents several common approaches to cross- and multilingual coreference resolution in search of the most effective practices to apply in the manual Bulgarian-English coreference annotation of a short story. The work aims to outline a typology of the differences in the annotated parallel texts. The results are comparable with the tendencies observed in similar work on other Slavic languages and show surprising differences between Bulgarian and English in the types of markables and their frequency.
The paper reports on ongoing work on manually mapping the Bulgarian WordNet BTB-WN to the Bulgarian Wikipedia. The preparatory work of extracting the Wikipedia articles and provisionally relating them to the WordNet lemmas was done automatically. The manual work includes checking the corresponding senses in both resources as well as identifying the missing ones. The main cases of mapping are considered. The first experiments, which map about 1000 synsets, yield more than 78% exact correspondences and nearly 15% new synsets.
The paper presents the characteristics of the predominant types of multiword expressions (MWEs) in the BulTreeBank WordNet (BTB-WN). Their distribution in BTB-WN is discussed with respect to the overall hierarchical organization of the lexical resource. A catena-based model is also proposed for handling the lexical semantics of MWEs.