Hristo Tanev

Also published as: Hristo Tannev


2024

pdf bib
Event Detection in the Socio Political Domain
Emmanuel Cartier | Hristo Tanev
Proceedings of the Second Workshop on Natural Language Processing for Political Sciences @ LREC-COLING 2024

In this paper we present two approaches for the detection of socio-political events: the first is based on manually crafted keyword combinations and the second on a BERT classifier. We compare the performance of the two systems on a dataset of socio-political events. Interestingly, the systems demonstrate complementary performance: each shows its best accuracy on a non-overlapping set of event types. In the evaluation section we provide insights on the effect of taxonomy mapping on the event detection evaluation. In the related work section we also review the most important resources and approaches for event extraction in recent years.

pdf bib
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Ali Hürriyetoğlu | Hristo Tanev | Surendrabikram Thapa | Gökçe Uludoğan
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

pdf
Leveraging Approximate Pattern Matching with BERT for Event Detection
Hristo Tanev
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

We describe a new weakly supervised method for sentence-level event detection, based exclusively on linear prototype patterns like “people got sick” or “a roadside bomb killed people”. We propose a new BERT-based algorithm for approximate pattern matching to identify event phrases that are semantically similar to these prototypes. To the best of our knowledge, a similar approach has not been used in the context of event detection. We experimented with two event corpora in the areas of disease outbreaks and terrorism, and we achieved promising results in sentence-level event identification: 0.78 F1 score for detecting new disease cases and 0.68 F1 for detecting terrorist attacks. Results were in line with some state-of-the-art systems.

pdf
JRC at ClimateActivism 2024: Lexicon-based Detection of Hate Speech
Hristo Tanev
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

In this paper we describe the participation of the JRC team in Sub-task A, “Hate Speech Detection”, of the Shared Task on Hate Speech and Stance Detection during Climate Activism at the CASE 2024 workshop. Our system is purely lexicon (keyword) based and does not use any statistical classifier. The system ranked 18 out of 22 participants with an F1 of 0.83, only one point below an LLM-based system. Our system also obtained one of the highest precision scores among all participating algorithms.
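A purely lexicon-based detector of the kind described above can be sketched in a few lines. This is a minimal illustration only: the keyword set HATE_LEXICON and the function names are hypothetical, and the actual lexicon and matching rules of the JRC system are not given in the abstract.

```python
import re

# Hypothetical keyword lexicon; the paper's real lexicon is not reproduced here.
HATE_LEXICON = {"idiot", "scum", "vermin"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_hate_speech(text, lexicon=HATE_LEXICON):
    """Flag a message if any token appears in the lexicon (no classifier involved)."""
    return any(token in lexicon for token in tokenize(text))
```

A rule of this kind makes few false alarms when the lexicon is narrow, which fits the high precision reported, but it misses anything phrased with words outside the list.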

pdf
A Concise Report of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Surendrabikram Thapa | Gökçe Uludoğan | Somaiyeh Dehghan | Hristo Tanev
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

In this paper, we provide a brief overview of the 7th workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) co-located with EACL 2024. This workshop consisted of regular papers, system description papers submitted by shared task participants, and overview papers of shared tasks held. This workshop series has been bringing together experts and enthusiasts from technical and social science fields, providing a platform for better understanding event information. This workshop not only advances text-based event extraction but also facilitates research in event extraction in multimodal settings.

2023

pdf bib
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Reyyan Yeniterzi | Erdem Yörük | Milena Slavcheva
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

pdf bib
Where “where” Matters : Event Location Disambiguation with a BERT Language Model
Hristo Tanev | Bertrand De Longueville
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

The method presented in this paper uses a BERT model for classifying location mentions in event-reporting news texts into two classes: the place of an event, called the main location, or another location mention, called here a secondary location. Our evaluation on articles reporting protests shows promising results and demonstrates the feasibility of our approach and of the event geolocation task in general. We evaluate our method against a simple baseline and state-of-the-art ML models, and we achieve a significant improvement in all cases by using the BERT model. In contrast to other location classification approaches, we completely avoid linguistic preprocessing and feature engineering, which is a prerequisite for all multi-domain and multilingual applications.

pdf
On the Road to a Protest Event Ontology for Bulgarian: Conceptual Structures and Representation Design
Milena Slavcheva | Hristo Tanev | Onur Uca
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

The paper presents a semantic model of protest events, called Semantic Interpretations of Protest Events (SemInPE). The analytical framework used for building the semantic representations is inspired by the object-oriented paradigm in computer science and a cognitive approach to the linguistic analysis. The model is a practical application of the Unified Eventity Representation (UER) formalism, which is based on the Unified Modeling Language (UML). The multi-layered architecture of the model provides flexible means for building the semantic representations of the language objects along a scale of generality and specificity. Thus, it is a suitable environment for creating the elements of ontologies on various topics and for different languages.

pdf
Detecting and Geocoding Battle Events from Social Media Messages on the Russo-Ukrainian War: Shared Task 2, CASE 2023
Hristo Tanev | Nicolas Stefanovitch | Andrew Halterman | Onur Uca | Vanni Zavarella | Ali Hurriyetoglu | Bertrand De Longueville | Leonida Della Rocca
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

The purpose of Shared Task 2 at the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) 2023 workshop was to test the abilities of the participating models and systems to detect and geocode armed conflict events in social media messages from Telegram channels reporting on the Russo-Ukrainian war. The evaluation followed an approach which was introduced in CASE 2021 (Giorgi et al., 2021): for each system we consider the correlation between the spatio-temporal distribution of its detected events and the events identified for the same period in the ACLED (Armed Conflict Location and Event Data Project) database (Raleigh et al., 2010). We use ACLED as the ground truth, since it is a well-established standard in the field of event extraction and political trend analysis, which relies on human annotators for the encoding of security events using a fine-grained taxonomy. Two systems participated in this shared task; we report in this paper on both the shared task and the participating systems.

pdf
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Osman Mutlu | Surendrabikram Thapa | Fiona Anting Tan | Erdem Yörük
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

We provide a summary of the sixth edition of the CASE workshop that is held in the scope of RANLP 2023. The workshop consists of regular papers, three keynotes, working papers of shared task participants, and shared task overview papers. This workshop series has been bringing together all aspects of event information collection across technical and social science fields. In addition to contributing to the progress in text-based event extraction, the workshop provides a space for the organization of a multimodal event information collection task.

2022

pdf bib
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

pdf
Tracking COVID-19 protest events in the United States. Shared Task 2: Event Database Replication, CASE 2022
Vanni Zavarella | Hristo Tanev | Ali Hürriyetoğlu | Peratham Wiriyathammabhum | Bertrand De Longueville
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

The goal of Shared Task 2 is to evaluate state-of-the-art event detection systems by comparing the spatio-temporal distribution of the events they detect with existing event databases. The task focuses on some usability requirements of event detection systems in real-world scenarios. Namely, it aims to measure the ability of such a system to: (i) detect socio-political event mentions in news and social media, (ii) properly find their geographical locations, (iii) de-duplicate reports extracted from multiple sources referring to the same actual event. Building an annotated corpus for jointly training and evaluating these sub-tasks is highly time-consuming. One possible way to indirectly evaluate a system’s output without an annotated corpus available is to measure its correlation with human-curated event data sets. In the last three years, the COVID-19 pandemic became a motivation for restrictions and anti-pandemic measures on a world scale. This has triggered a wave of reactions and citizen actions in many countries. Shared Task 2 challenges participants to identify COVID-19 related protest actions from large unstructured data sources, both from mainstream and social media. We assess each system’s ability to model the evolution of protest events both temporally and spatially by using a number of correlation metrics with respect to a comprehensive and validated data set of COVID-related protest events (Raleigh et al., 2010).

pdf
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Reyyan Yeniterzi | Osman Mutlu | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

We provide a summary of the fifth edition of the CASE workshop that is held in the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. This workshop has been bringing together all aspects of event information collection across technical and social science fields. In addition to the progress in depth, the submission and acceptance of multimodal approaches show the widening of this interdisciplinary research topic.

pdf bib
OntoPopulis, a System for Learning Semantic Classes
Hristo Tanev
Proceedings of the 5th International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

OntoPopulis is a multilingual weakly supervised terminology learning algorithm which takes as its input a set of seed terms for a semantic category and an unannotated text corpus. The algorithm learns additional terms which belong to this category. For example, for the category “environmental disasters” the input seed set in English is environmental disaster, water pollution, climate change. Among the highest ranked new terms which the system learns for this semantic class are deforestation, global warming and so on.

2021

pdf bib
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Jakub Piskorski | Reyyan Yeniterzi | Osman Mutlu | Deniz Yuret | Aline Villavicencio
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

This workshop is the fourth issue of a series of workshops on automatic extraction of socio-political events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of socio-political events, such as protests, riots, wars and armed conflicts, in text streams. This year's workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams have registered and 15 teams contributed to three tasks that are i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynote and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.

pdf
Discovering Black Lives Matter Events in the United States: Shared Task 3, CASE 2021
Salvatore Giorgi | Vanni Zavarella | Hristo Tanev | Nicolas Stefanovitch | Sy Hwang | Hansi Hettiarachchi | Tharindu Ranasinghe | Vivek Kalyan | Paul Tan | Shaun Tan | Martin Andrews | Tiancheng Hu | Niklas Stoehr | Francesco Ignazio Re | Daniel Vegh | Dennis Atzenhofer | Brenda Curtis | Ali Hürriyetoğlu
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

State-of-the-art event detection systems are infrequently evaluated on determining the spatio-temporal distribution of events on the ground. However, the ability to both (1) extract events “in the wild” from text and (2) properly evaluate event detection systems has the potential to support a wide variety of tasks such as monitoring the activity of socio-political movements, examining media coverage and public support of these movements, and informing policy decisions. Therefore, we study the performance of the best event detection systems on detecting Black Lives Matter (BLM) events from tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide and the BLM movement, which was once mostly relegated to the United States, was now seeing activity globally. This shared task asks participants to identify BLM related events from large unstructured data sources, using systems pretrained to extract socio-political events from text. We evaluate several metrics, assessing each system’s ability to identify protest events both temporally and spatially. Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall, with a maximum recall of 5.08.
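The temporal part of such an evaluation can be illustrated with a plain Pearson correlation between a system's daily event counts and the counts in a reference database. The sketch below is an assumption-laden illustration, not the shared task's scoring code: the function name pearson_r and the two count series are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily protest counts: system output vs. reference database.
system_counts = [2, 4, 6, 8]
gold_counts = [1, 2, 3, 4]
r = pearson_r(system_counts, gold_counts)
```

Because correlation ignores scale, a system that finds only a fraction of the events can still score well if it tracks the day-to-day trend, which is consistent with high r values alongside low recall.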

2020

pdf bib
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
Ali Hürriyetoğlu | Erdem Yörük | Vanni Zavarella | Hristo Tanev
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020

pdf bib
Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report
Ali Hürriyetoğlu | Vanni Zavarella | Hristo Tanev | Erdem Yörük | Ali Safaya | Osman Mutlu
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020

We describe our effort on automated extraction of socio-political events from news in the scope of a workshop and a shared task we organized at the Language Resources and Evaluation Conference (LREC 2020). We believe the event extraction studies in computational linguistics and social and political sciences should further support each other in order to enable large-scale socio-political event information collection across sources, countries, and languages. The event consists of a regular research paper track and a shared task track, the latter on event sentence coreference identification (ESCI). All submissions were reviewed by five members of the program committee. The workshop attracted research papers related to evaluation of machine learning methodologies, language resources, material conflict forecasting, and a shared task participation report in the scope of socio-political event information collection. It has shown us the volume and variety of both the data sources and event information collection approaches related to socio-political events and the need to fill the gap between automated text processing techniques and requirements of social and political sciences.

2019

pdf
JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task
Guillaume Jacquet | Jakub Piskorski | Hristo Tanev | Ralf Steinberger
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

We report on the participation of the JRC Text Mining and Analysis Competence Centre (TMA-CC) in the BSNLP-2019 Shared Task, which focuses on named-entity recognition, lemmatisation and cross-lingual linking. We propose a hybrid system combining a rule-based approach and light ML techniques. We use multilingual lexical resources such as JRC-NAMES and BABELNET together with a named entity guesser to recognise names. In a second step, we combine known names with wild cards to increase recognition recall by also capturing inflection variants. In a third step, we increase precision by filtering these name candidates with automatically learnt inflection patterns derived from name occurrences in large news article collections. Our major requirement is to achieve high precision. We achieved an average of 65% F-measure with 93% precision on the four languages.
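The second step, combining known names with wild cards to catch inflected forms, can be illustrated with a small regular-expression sketch. The name list, the suffix limit max_suffix and the function names are assumptions made for this example; the abstract does not specify how the JRC system builds its wildcard patterns.

```python
import re

# Hypothetical list of known names (stand-in for a resource such as JRC-NAMES).
KNOWN_NAMES = ["Angela Merkel", "Donald Tusk"]


def wildcard_pattern(name, max_suffix=3):
    """Allow up to max_suffix extra word characters after each name token."""
    parts = [re.escape(token) + r"\w{0,%d}" % max_suffix for token in name.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def find_name_candidates(text, names=KNOWN_NAMES):
    """Return (known name, surface form) pairs for every wildcard match."""
    hits = []
    for name in names:
        for match in wildcard_pattern(name).finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

Suffix wildcards of this kind raise recall but also over-generate, which is why the abstract's third step filters the candidates with learnt inflection patterns; that filter is not sketched here.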

2017

pdf
On the Creation of a Security-Related Event Corpus
Martin Atkinson | Jakub Piskorski | Hristo Tanev | Vanni Zavarella
Proceedings of the Events and Stories in the News Workshop

This paper reports on an effort of creating a corpus of structured information on security-related events automatically extracted from on-line news, part of which has been manually curated. The main motivation behind this effort is to provide material to the NLP community working on event extraction that could be used both for training and evaluation purposes.

pdf
Large-scale news entity sentiment analysis
Ralf Steinberger | Stefanie Hegele | Hristo Tanev | Leonida Della Rocca
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We work on detecting positive or negative sentiment towards named entities in very large volumes of news articles. The aim is to monitor changes over time, as well as to work towards media bias detection by comparing differences across news sources and countries. With a view to applying the same method to dozens of languages, we use linguistically light-weight methods: searching for positive and negative terms in bags of words around entity mentions (also considering negation). Evaluation results are good and better than a third-party baseline system, but precision is not sufficiently high to display the results publicly in our multilingual news analysis system Europe Media Monitor (EMM). In this paper, we focus on describing our effort to improve the English language results by avoiding the biggest sources of errors. We also present new work on using a syntactic parser to identify safe opinion recognition rules, such as predicative structures in which sentiment words directly refer to an entity. The precision of this method is good, but recall is very low.
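The light-weight method mentioned above, counting sentiment terms in a window around an entity mention while handling negation, might look roughly as follows. The word lists, the window size and the single-token entity restriction are simplifying assumptions for illustration and do not come from the paper.

```python
import re

# Hypothetical sentiment and negation word lists.
POSITIVE = {"praised", "success", "good"}
NEGATIVE = {"criticised", "failure", "bad"}
NEGATORS = {"not", "no", "never"}


def entity_sentiment(text, entity, window=4):
    """Sum term polarities within `window` tokens of each single-token entity mention."""
    tokens = re.findall(r"\w+", text.lower())
    target = entity.lower()
    score = 0
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            word = tokens[j]
            polarity = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
            # Flip polarity when the sentiment term directly follows a negator.
            if polarity and j > 0 and tokens[j - 1] in NEGATORS:
                polarity = -polarity
            score += polarity
    return score
```

A positive return value suggests positive sentiment towards the entity and a negative one the opposite. Since the window is a bag of words, a sentiment term aimed at a different entity nearby is counted as well, one plausible source of the precision problems the abstract mentions.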

2016

pdf
Deftor at SemEval-2016 Task 14: Taxonomy enrichment using definition vectors
Hristo Tanev | Agata Rotondi
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
Detecting Implicit Expressions of Affect from Text using Semantic Knowledge on Common Concept Properties
Alexandra Balahur | Hristo Tanev
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Emotions are an important part of the human experience. They are responsible for the adaptation and integration in the environment, offering, most of the time together with the cognitive system, the appropriate responses to stimuli in the environment. As such, they are an important component in decision-making processes. In today’s society, the avalanche of stimuli present in the environment (physical or virtual) makes people more prone to respond to stronger affective stimuli (i.e., those that are related to their basic needs and motivations: survival, food, shelter, etc.). In media reporting, this translates into the use of arguments (factual data) that are known to trigger specific (strong, affective) behavioural reactions from the readers. This paper describes initial efforts to detect such arguments from text, based on the properties of concepts. The final system, able to retrieve and label this type of data from the news in traditional and social platforms, is intended to be integrated into the Europe Media Monitor family of applications to detect texts that trigger certain (especially negative) reactions from the public, with consequences on citizen safety and security.

2015

pdf
Towards Multilingual Event Extraction Evaluation: A Case Study for the Czech Language
Josef Steinberger | Hristo Tanev
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
The 5th Workshop on Balto-Slavic Natural Language Processing
Jakub Piskorski | Lidia Pivovarova | Jan Šnajder | Hristo Tanev | Roman Yangarber
The 5th Workshop on Balto-Slavic Natural Language Processing

2014

pdf
Challenges in Creating a Multilingual Sentiment Analysis Application for Social Media Mining
Alexandra Balahur | Hristo Tanev | Erik van der Goot
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf
Event Extraction for Balkan Languages
Vanni Zavarella | Dilek Küçük | Hristo Tanev | Ali Hürriyetoğlu
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

pdf bib
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing
Jakub Piskorski | Lidia Pivovarova | Hristo Tanev | Roman Yangarber
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

pdf
Semi-automatic Acquisition of Lexical Resources and Grammars for Event Extraction in Bulgarian and Czech
Hristo Tanev | Josef Steinberger
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

pdf
FSS-TimEx for TempEval-3: Extracting Temporal Information from Text
Vanni Zavarella | Hristo Tanev
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf
Acronym recognition and processing in 22 languages
Maud Ehrmann | Leonida Della Rocca | Ralf Steinberger | Hristo Tannev
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf
Detecting Event-Related Links and Sentiments from Social Media Texts
Alexandra Balahur | Hristo Tanev
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2011

pdf
Creating Sentiment Dictionaries via Triangulation
Josef Steinberger | Polina Lenkova | Mohamed Ebrahim | Maud Ehrmann | Ali Hurriyetoglu | Mijail Kabadjov | Ralf Steinberger | Hristo Tanev | Vanni Zavarella | Silvia Vázquez
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf
Pattern Learning for Event Extraction using Monolingual Statistical Machine Translation
Marco Turchi | Vanni Zavarella | Hristo Tanev
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2008

pdf bib
Online-Monitoring of Security-Related Events
Martin Atkinson | Jakub Piskorski | Bruno Pouliquen | Ralf Steinberger | Hristo Tanev | Vanni Zavarella
Coling 2008: Companion volume: Demonstrations

2007

pdf bib
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing
Jakub Piskorski | Hristo Tanev
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing

2006

pdf
Weakly Supervised Approaches for Ontology Population
Hristo Tanev | Bernardo Magnini
11th Conference of the European Chapter of the Association for Computational Linguistics

2004

pdf
Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions
Hristo Tanev | Milen Kouylekov | Matteo Negri | Bonaventura Coppola | Bernardo Magnini
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf
Scaling Web-based Acquisition of Entailment Relations
Idan Szpektor | Hristo Tanev | Ido Dagan | Bonaventura Coppola
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2002

pdf
Towards Automatic Evaluation of Question/Answering Systems
Bernardo Magnini | Matteo Negri | Roberto Prevete | Hristo Tanev
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf
Is It the Right Answer? Exploiting Web Redundancy for Answer Validation
Bernardo Magnini | Matteo Negri | Roberto Prevete | Hristo Tanev
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

pdf
Shallow Language Processing Architecture for Bulgarian
Hristo Tanev | Ruslan Mitkov
COLING 2002: The 19th International Conference on Computational Linguistics

pdf
A WordNet-Based Approach to Named Entites Recognition
Bernardo Magnini | Matteo Negri | Roberto Prevete | Hristo Tanev
COLING-02: SEMANET: Building and Using Semantic Networks

2000

pdf
LINGUA: a robust architecture for text processing and anaphora resolution in Bulgarian
Hristo Tanev | Ruslan Mitkov
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000