2024
pdf
abs
Information Extraction with Differentiable Beam Search on Graph RNNs
Niama El Khbir
|
Nadi Tomeh
|
Thierry Charnois
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Information extraction (IE) from text documents is an important NLP task that includes entity, relation, and event extraction. These tasks are often addressed jointly as a graph generation problem, where entities and event triggers represent nodes and where relations and event arguments represent edges. Most existing systems use local classifiers for nodes and edges, trained using cross-entropy loss, and employ inference strategies such as beam search to approximate the optimal graph structure. These approaches typically suffer from exposure bias due to the discrepancy between training and decoding. In this paper, we tackle this problem by casting graph generation as auto-regressive sequence labeling and making its training aware of the decoding procedure by using a differentiable version of beam search. We evaluate the effectiveness of our approach through extensive experiments conducted on the ACE05 and ConLL04 datasets across diverse languages. Our experimental findings affirm that our model outperforms its non-decoding-aware version for all datasets employed. Furthermore, we conduct ablation studies that emphasize the effectiveness of aligning training and inference. Additionally, we introduce a novel quantification of exposure bias within this context, providing valuable insights into the functioning of our model.
2023
pdf
abs
Filtered Semi-Markov CRF
Urchade Zaratiana
|
Nadi Tomeh
|
Niama El Khbir
|
Pierre Holat
|
Thierry Charnois
Findings of the Association for Computational Linguistics: EMNLP 2023
Semi-Markov CRF has been proposed as an alternative to the traditional Linear Chain CRF for text segmentation tasks such as Named Entity Recognition (NER). Unlike CRF, which treats text segmentation as token-level prediction, Semi-CRF considers segments as the basic unit, making it more expressive. However, Semi-CRF suffers from two major drawbacks: (1) quadratic complexity over sequence length, as it operates on every span of the input sequence, and (2) inferior performance compared to CRF for sequence labeling tasks like NER. In this paper, we introduce Filtered Semi-Markov CRF, a variant of Semi-CRF that addresses these issues by incorporating a filtering step to eliminate irrelevant segments, reducing complexity and search space. Our approach is evaluated on several NER benchmarks, where it outperforms both CRF and Semi-CRF while being significantly faster. The implementation of our method is available on Github.
pdf
abs
Sélection globale de segments pour la reconnaissance d’entités nommées
Urchade Zaratiana
|
Niama El Khbir
|
Pierre Holat
|
Nadi Tomeh
|
Thierry Charnois
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 4 : articles déjà soumis ou acceptés en conférence internationale
La reconnaissance d’entités nommées est une tâche importante en traitement automatique du langage naturel avec des applications dans de nombreux domaines. Dans cet article, nous décrivons une nouvelle approche pour la reconnaissance d’entités nommées, dans laquelle nous produisons un ensemble de segmentations en maximisant un score global. Pendant l’entraînement, nous optimisons notre modèle en maximisant la probabilité de la segmentation correcte. Pendant l’inférence, nous utilisons la programmation dynamique pour sélectionner la meilleure segmentation avec une complexité linéaire. Nous prouvons que notre approche est supérieure aux modèles champs de Markov conditionnels et semi-CMC pour la reconnaissance d’entités nommées.
pdf
abs
LIPN at WojoodNER shared task: A Span-Based Approach for Flat and Nested Arabic Named Entity Recognition
Niama El Khbir
|
Urchade Zaratiana
|
Nadi Tomeh
|
Thierry Charnois
Proceedings of ArabicNLP 2023
The Wojood Named Entity Recognition (NER) shared task introduces a comprehensive Arabic NER dataset encompassing both flat and nested entity tasks, addressing the challenge of limited Arabic resources. In this paper, we present our team
LIPN approach to addressing the two subtasks of WojoodNER SharedTask. We frame NER as a span classification problem. We employ a pretrained language model for token representations and neural network classifiers. We use global decoding for flat NER and a greedy strategy for nested NER. Our model secured the first position in flat NER and the fourth position in nested NER during the competition, with an F-score of 91.96 and 92.45 respectively. Our code is publicly available (
https://github.com/niamaelkhbir/LIPN-at-WojoodSharedTask).
2022
pdf
bib
abs
Global Span Selection for Named Entity Recognition
Urchade Zaratiana
|
Niama El khbir
|
Pierre Holat
|
Nadi Tomeh
|
Thierry Charnois
Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS)
Named Entity Recognition (NER) is an important task in Natural Language Processing with applications in many domains. In this paper, we describe a novel approach to named entity recognition, in which we output a set of spans (i.e., segmentations) by maximizing a global score. During training, we optimize our model by maximizing the probability of the gold segmentation. During inference, we use dynamic programming to select the best segmentation under a linear time complexity. We prove that our approach outperforms CRF and semi-CRF models for Named Entity Recognition. We will make our code publicly available.
pdf
abs
ArabIE: Joint Entity, Relation and Event Extraction for Arabic
Niama El Khbir
|
Nadi Tomeh
|
Thierry Charnois
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Previous work on Arabic information extraction has mainly focused on named entity recognition and very little work has been done on Arabic relation extraction and event recognition. Moreover, modeling Arabic data for such tasks is not straightforward because of the morphological richness and idiosyncrasies of the Arabic language. We propose in this article the first neural joint information extraction system for the Arabic language.