Sylvain Gatepaille


2025

Narratives are a new tool for propagating ideas and are sometimes well hidden in press articles. SemEval-2025 Task 10 focuses on detecting and extracting such narratives in multiple languages. In this paper, we explore the capabilities of encoder-based language models to classify texts according to the narrative they contain. We show that multilingual encoders outperform monolingual models on this dataset, which is challenging due to the small number of samples per class per language. We perform additional experiments to measure how well the features learned by multilingual models generalize to new languages.

2023

Maritime security requires full-time monitoring of the situation, based mainly on technical data (radar, AIS) but also on OSINT-like inputs (e.g., newspapers). Some threats to the operational reliability of this maritime surveillance, such as malicious actors, introduce discrepancies between hard and soft data (sensors and texts), either by tampering with their AIS emitters or by publishing false information in pseudo-newspapers. Many techniques exist to identify these pieces of false information, including knowledge base population techniques that build a structured view of the information. This paper presents a use case for suspect data identification in a maritime setting. The proposed system, UMBAR, ingests data from sensors and texts and processes them through an information extraction step in order to feed a Knowledge Base and finally perform coherence checks between the extracted facts.
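
The coherence check between hard and soft data can be illustrated with a minimal sketch. This is not UMBAR's implementation: the `Fact` record, its fields, the `find_discrepancies` helper, and the sample vessels are all hypothetical, and the check assumed here is a simple value mismatch on the same vessel attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted fact (hypothetical schema, not taken from the paper)."""
    vessel: str     # vessel identifier
    attribute: str  # e.g. "last_port"
    value: str      # extracted value
    source: str     # "sensor" (hard data) or "text" (soft data)


def find_discrepancies(facts):
    """Return (sensor_fact, text_fact) pairs that disagree on the same vessel attribute."""
    # Group facts by the (vessel, attribute) pair they describe.
    by_key = {}
    for fact in facts:
        by_key.setdefault((fact.vessel, fact.attribute), []).append(fact)

    conflicts = []
    for group in by_key.values():
        sensors = [f for f in group if f.source == "sensor"]
        texts = [f for f in group if f.source == "text"]
        # Flag every sensor/text pair whose values differ.
        for s in sensors:
            for t in texts:
                if s.value != t.value:
                    conflicts.append((s, t))
    return conflicts


# Illustrative toy data only.
facts = [
    Fact("VESSEL-A", "last_port", "Marseille", "sensor"),
    Fact("VESSEL-A", "last_port", "Genoa", "text"),
    Fact("VESSEL-B", "last_port", "Brest", "sensor"),
    Fact("VESSEL-B", "last_port", "Brest", "text"),
]
conflicts = find_discrepancies(facts)
```

Under these assumptions, only the facts about `VESSEL-A` are flagged as suspect, since the sensor and the text disagree on its last port. A real system would likely need fuzzier matching (entity resolution, time windows, position tolerances) than strict equality.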