Jean-Philippe Cointet

2026

SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations
Manon Berriche | Célia Nouri | Chloé Clavel | Jean-Philippe Cointet
Proceedings of the Fifteenth Language Resources and Evaluation Conference

We introduce SPOT (Stopping Points in Online Threads), the first annotated corpus translating the sociological concept of stopping point into a reproducible NLP task. Stopping points are ordinary critical interventions that pause or redirect online discussions through a range of forms (irony, subtle doubt, or fragmentary arguments) that frameworks like counterspeech or social correction often overlook. We operationalize this concept as a binary classification task and provide reliable annotation guidelines. The corpus contains 43,305 manually annotated French Facebook comments linked to URLs flagged as false information by social media users, enriched with contextual metadata (article, post, parent comment, page or group, and source). We benchmark fine-tuned encoder models (CamemBERT) and instruction-tuned LLMs under various prompting strategies. Results show that fine-tuned encoders outperform prompted LLMs in F1 score by more than 10 percentage points, confirming the importance of supervised learning for emerging non-English social media tasks. Incorporating contextual metadata further improves the encoder models' F1 scores from 0.75 to 0.78. We release the anonymized dataset, along with the annotation guidelines and code, in our repository to foster transparency and reproducible research.

2025

Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights
Célia Nouri | Jean-Philippe Cointet | Chloé Clavel
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Detecting abusive language in social media conversations poses significant challenges, as identifying abusiveness often depends on the conversational context, characterized by the content and topology of preceding comments. Traditional Abusive Language Detection (ALD) models often overlook this context, which can lead to unreliable performance metrics. Recent Natural Language Processing (NLP) approaches that incorporate conversational context often rely on limited or overly simplified representations of this context, leading to inconsistent and sometimes inconclusive results. In this paper, we propose a novel approach that utilizes graph neural networks (GNNs) to model social media conversations as graphs, where nodes represent comments, and edges capture reply structures. We systematically investigate various graph representations and context windows to identify the optimal configurations for ALD. Our GNN model outperforms both context-agnostic baselines and linear context-aware methods, achieving significant improvements in F1 scores. These findings demonstrate the critical role of structured conversational context and establish GNNs as a robust framework for advancing context-aware ALD.

2014

Argumentative analysis of the ACL Anthology (Analyse argumentative du corpus de l’ACL (ACL Anthology)) [in French]
Elisa Omodei | Yufan Guo | Jean-Philippe Cointet | Thierry Poibeau
Proceedings of TALN 2014 (Volume 2: Short Papers)

Social and Semantic Diversity: Socio-semantic Representation of a Scientific Corpus
Thierry Poibeau | Elisa Omodei | Jean-Philippe Cointet | Yufan Guo
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

Mapping the Natural Language Processing Domain: Experiments using the ACL Anthology
Elisa Omodei | Jean-Philippe Cointet | Thierry Poibeau
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper investigates the evolution of the computational linguistics domain through a quantitative analysis of the ACL Anthology (containing around 12,000 papers published between 1985 and 2008). Our approach combines complex systems methods with natural language processing techniques. We reconstruct the socio-semantic landscape of the domain by inferring a co-authorship network and a semantic network from the analysis of the corpus. First, keywords are extracted using a hybrid approach mixing linguistic patterns with statistical information. Then, the semantic network is built using a co-occurrence analysis of these keywords within the corpus. Combining temporal and network analysis techniques, we are able to examine the main evolutions of the field and the more active subfields over time. Lastly, we propose a model to explore the mutual influence of the social and the semantic network over time, leading to a socio-semantic co-evolutionary system.

Co-authors

Manon Berriche 1
