2024
Multilinguality in the VIGILANT project
Brendan Spillane, Carolina Scarton, Robert Moro, Petar Ivanov, Andrey Tagarev, Jakub Simko, Ibrahim Abu Farha, Gary Munnelly, Filip Uhlárik, Freddy Heppell
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)
VIGILANT (Vital IntelliGence to Investigate ILlegAl DisiNformaTion) is a three-year Horizon Europe project that will equip European Law Enforcement Agencies (LEAs) with advanced disinformation detection and analysis tools to investigate and prevent criminal activities linked to disinformation. These include disinformation instigating violence towards minorities, promoting false medical cures, and increasing tensions between groups causing civil unrest and violent acts. VIGILANT’s four LEAs require support for English, Spanish, Catalan, Greek, Estonian, Romanian and Russian. Therefore, multilinguality is a major challenge and we present the current status of our tools and our plans to improve their performance.
2023
Comparison of Multilingual Entity Linking Approaches
Ivelina Bozhinova, Andrey Tagarev
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Despite rapid developments in the field of Natural Language Processing (NLP) in the past few years, the task of Multilingual Entity Linking (MEL) and especially its end-to-end formulation remains challenging. In this paper we aim to evaluate solutions for general end-to-end multilingual entity linking by conducting experiments using both existing complete approaches and novel combinations of pipelines for solving the task. The results identify the best performing current solutions and suggest some directions for further research.
NEXT: An Event Schema Extension Approach for Closed-Domain Event Extraction Models
Elena Tuparova, Petar Ivanov, Andrey Tagarev, Svetla Boytcheva, Ivan Koychev
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Event extraction from textual data is an NLP research task relevant to a plethora of domains. Most approaches aim to recognize events from a predefined event schema, consisting of event types and their corresponding arguments. For domains such as disinformation, where new event types emerge frequently, there is a need to adapt such fixed event schemas to accommodate new event types. We present NEXT (New Event eXTraction), a resource-sparse approach to extending a closed-domain model to novel event types that requires a very small number of annotated samples for fine-tuning performed on a single GPU. Furthermore, our results suggest that this approach is suitable not only for extraction of new event types, but also for recognition of existing event types, as the use of this approach on a new dataset leads to improved recall for all existing events while retaining precision.
2021
Tackling Multilinguality and Internationality in Fake News
Andrey Tagarev, Krasimira Bozhanova, Ivelina Nikolova-Koleva, Ivan Ivanov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
The last several years have seen a massive increase in the quantity and influence of disinformation being spread online. Various approaches have been developed to target the process at different stages, from identifying sources to tracking distribution in social media to providing follow-up debunks to people who have encountered the disinformation. One common conclusion in each of these approaches is that disinformation is too nuanced and subjective a topic for fully automated solutions to work, but the quantity of data to process and cross-reference is too high for humans to handle unassisted. Ultimately, the problem calls for a hybrid approach of human experts with technological assistance. In this paper we will demonstrate the application of certain state-of-the-art NLP techniques in assisting expert debunkers and fact checkers, as well as the role of these NLP algorithms within a more holistic approach to analyzing and countering the spread of disinformation. We will present a multilingual corpus of disinformation and debunks which contains text, concept tags, images and videos, as well as various methods for searching and leveraging the content.
2019
Comparison of Machine Learning Approaches for Industry Classification Based on Textual Descriptions of Companies
Andrey Tagarev, Nikola Tulechki, Svetla Boytcheva
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
This paper addresses the task of categorizing companies within industry classification schemes. The dataset consists of encyclopedic articles about companies and their economic activities. The target classification schema is built by mapping linked open data in a semi-supervised manner. Target classes are built bottom-up from DBpedia. We apply several state-of-the-art text classification techniques, based on both deep learning and classical vector-space models.