Vincent Guigue


2021

pdf
Separating Retention from Extraction in the Evaluation of End-to-end Relation Extraction
Bruno Taillé | Vincent Guigue | Geoffrey Scoutheeten | Patrick Gallinari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

State-of-the-art NLP models can adopt shallow heuristics that limit their generalization capability (McCoy et al., 2019). Such heuristics include lexical overlap with the training set in Named-Entity Recognition (Taille et al., 2020) and Event or Type heuristics in Relation Extraction (Rosenman et al., 2020). In the more realistic end-to-end RE setting, we can expect yet another heuristic: the mere retention of training relation triples. In this paper we propose two experiments confirming that retention of known facts is a key factor of performance on standard benchmarks. Furthermore, one experiment suggests that a pipeline model able to use intermediate type representations is less prone to over-rely on retention.

2020

pdf
Let’s Stop Incorrect Comparisons in End-to-end Relation Extraction!
Bruno Taillé | Vincent Guigue | Geoffrey Scoutheeten | Patrick Gallinari
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparison to previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the most common mistake’s impact and evaluate it leads to overestimating the final RE performance by around 5% on ACE05. We also seize this opportunity to study the unexplored ablations of two recent developments: the use of language model pretraining (specifically BERT) and span-level NER. This meta-analysis emphasizes the need for rigor in the report of both the evaluation setting and the dataset statistics. We finally call for unifying the evaluation setting in end-to-end RE.

2019

pdf
Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses
Étienne Simon | Vincent Guigue | Benjamin Piwowarski
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Unsupervised relation extraction aims at extracting relations between entities in text. Previous unsupervised approaches are either generative or discriminative. In a supervised setting, discriminative approaches, such as deep neural network classifiers, have demonstrated substantial improvement. However, these models are hard to train without supervision, and the currently proposed solutions are unstable. To overcome this limitation, we introduce a skewness loss which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss enforcing that all relations are predicted in average. These losses improve the performance of discriminative based models, and enable us to train deep neural networks satisfactorily, surpassing current state of the art on three different datasets.

2018

pdf
DEFT 2018: Attention sélective pour classification de microblogs (DEFT 2018 : Selective Attention for Microblogging Classification )
Charles-Emmanuel Dias | Clara de Forsan de Gainon Gabriac | Patrick Gallinari | Vincent Guigue
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

Dans le cadre de l’atelier DEFT 2018 nous nous sommes intéressés à la classification de microblogs (ici, des tweets) rédigés en français. Ici, nous proposons une méthode se basant sur un réseau hiérarchique de neurones récurrent avec attention. La spécificité de notre architecture est de prendre en compte –via un mechanisme d’attention et de portes– les hashtags et les mentions directes (e.g., @user), spécifiques aux microblogs. Notre modèle a obtenu de très bon résultats sur la première tâche et des résultats compétitifs sur la seconde.