2023
TeamEC at SemEval-2023 Task 4: Transformers vs. Low-Resource Dictionaries, Expert Dictionary vs. Learned Dictionary
Nicolas Stefanovitch
|
Bertrand De Longueville
|
Mario Scharfbillig
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes the system we used to participate in the shared task, as well as additional experiments that go beyond the scope of the shared task but use its data. Our primary goal is to compare the effectiveness of transformer models with that of low-resource dictionaries. Secondly, we compare the performance of a learned dictionary with that of a dictionary designed by experts in the field of values. Our findings surprisingly show that transformers perform on par with a dictionary containing fewer than 1k words when evaluated with 19 fine-grained categories, and only outperform a dictionary-based approach in a coarse setting with 10 categories. Interestingly, the expert dictionary has a precision on par with the learned one, while its recall is clearly lower, potentially an indication of overfitting of topics to values in the shared task’s dataset. Our findings should be of interest to both the NLP and Value scientific communities on the use of automated approaches for value classification.
Where “where” Matters: Event Location Disambiguation with a BERT Language Model
Hristo Tanev
|
Bertrand De Longueville
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
The method presented in this paper uses a BERT model to classify location mentions in event-reporting news texts into two classes: the place of an event, called the main location, or another location mention, called here a secondary location. Our evaluation on articles reporting protests shows promising results and demonstrates the feasibility of our approach and of the event geolocation task in general. We evaluate our method against a simple baseline and state-of-the-art ML models, and we achieve a significant improvement in all cases by using the BERT model. In contrast to other location classification approaches, we completely avoid linguistic preprocessing and feature engineering, which is a prerequisite for all multi-domain and multilingual applications.
Detecting and Geocoding Battle Events from Social Media Messages on the Russo-Ukrainian War: Shared Task 2, CASE 2023
Hristo Tanev
|
Nicolas Stefanovitch
|
Andrew Halterman
|
Onur Uca
|
Vanni Zavarella
|
Ali Hurriyetoglu
|
Bertrand De Longueville
|
Leonida Della Rocca
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
The purpose of Shared Task 2 at the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) 2023 workshop was to test the ability of the participating models and systems to detect and geocode armed conflict events in social media messages from Telegram channels reporting on the Russo-Ukrainian war. The evaluation followed an approach introduced in CASE 2021 (Giorgi et al., 2021): for each system, we consider the correlation between the spatio-temporal distribution of its detected events and that of the events identified for the same period in the ACLED (Armed Conflict Location and Event Data Project) database (Raleigh et al., 2010). We use ACLED as the ground truth, since it is a well-established standard in the field of event extraction and political trend analysis, which relies on human annotators to encode security events using a fine-grained taxonomy. Two systems participated in this shared task; in this paper we report on both the shared task and the participating systems.
2022
Tracking COVID-19 protest events in the United States. Shared Task 2: Event Database Replication, CASE 2022
Vanni Zavarella
|
Hristo Tanev
|
Ali Hürriyetoğlu
|
Peratham Wiriyathammabhum
|
Bertrand De Longueville
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
The goal of Shared Task 2 is to evaluate state-of-the-art event detection systems by comparing the spatio-temporal distribution of the events they detect with existing event databases. The task focuses on some usability requirements of event detection systems in real-world scenarios. Namely, it aims to measure the ability of such a system to: (i) detect socio-political event mentions in news and social media, (ii) properly find their geographical locations, (iii) de-duplicate reports extracted from multiple sources referring to the same actual event. Building an annotated corpus for jointly training and evaluating these sub-tasks is highly time-consuming. One possible way to indirectly evaluate a system’s output without an annotated corpus available is to measure its correlation with human-curated event data sets. In the last three years, the COVID-19 pandemic became the motivation for restrictions and anti-pandemic measures on a world scale. This has triggered a wave of reactions and citizen actions in many countries. Shared Task 2 challenges participants to identify COVID-19 related protest actions from large unstructured data sources from both mainstream and social media. We assess each system’s ability to model the evolution of protest events both temporally and spatially by using a number of correlation metrics with respect to a comprehensive and validated data set of COVID-related protest events (Raleigh et al., 2010).