Mohamed Ettaleb
2026
ReTaT: A Unified Benchmark for Relation Extraction across Text and Table
Mohamed Ettaleb | Thibault Ehrhart | Nathalie Aussenac-Gilles | Yoan Chabot | Mouna Kamel | Véronique MORICEAU | Raphael Troncy | Fanfu Wei
Proceedings of the Fifteenth Language Resources and Evaluation Conference
While prior work in Information Extraction (IE) has focused on extracting information from either textual content or tables in isolation, such approaches miss critical information that emerges only from their interplay. Indeed, tables may summarize facts that are sparse in the text, while text can disambiguate or elaborate on table entries. This complementarity may take the form of relations that are expressed across text and tables. In this context, we are interested in the task of extracting such relations whose expression spans the two modalities. This task is new, and no reference evaluation corpus exists for it. We therefore created ReTaT, a corpus that can be used to train and evaluate systems for extracting such relations. The corpus is composed of (table, surrounding text) pairs extracted from Wikipedia pages and has been manually annotated with relation triples. ReTaT is organized into three datasets with distinct characteristics: domain (business, telecommunication and female celebrities), size (from 50 to 255 pairs), language (English vs French), type of relations (data vs object properties), closed vs open list of relations, and size of the surrounding text (paragraph vs full page). We then assessed its quality and suitability for the joint table-text relation extraction task using Large Language Models (LLMs), at a time when LLMs have demonstrated their ability to extract relations from either text or tables in isolation.
2025
The contribution of LLMs to relation extraction in the economic field
Mohamed Ettaleb | Mouna Kamel | Nathalie Aussenac-Gilles | Véronique Moriceau
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Relation Extraction (RE) is a fundamental task in natural language processing, aimed at deducing semantic relationships between entities in a text. Traditional supervised relation extraction methods involve training models to annotate tokens representing entity mentions, followed by predicting the relationship between these entities. However, recent advancements have transformed this task into a sequence-to-sequence problem. This involves converting relationships between entities into target strings, which are then generated from the input text. Thus, language models now appear as a solution to this task and have already been used in numerous studies, with various levels of refinement, across different domains. The objective of the present study is to evaluate the contribution of large language models (LLMs) to the task of relation extraction in a specific domain (in this case, the economic domain), compared to smaller language models. To do this, we considered as a baseline a model based on the BERT architecture, trained in this domain, and four LLMs, namely FinGPT, which is specific to the financial domain, and XLNet, ChatGLM, and Llama3, which are generalists. All these models were evaluated on the same extraction task, with zero-shot for the general-purpose LLMs, as well as refinements through few-shot learning and fine-tuning. The experiments showed that the best performance in terms of F-score was achieved with fine-tuned LLMs, with Llama3 achieving the highest performance.
2023
Qui de DrBERT, Wikipédia ou Flan-T5 s’y connaît le plus en questions médicales ?
Clément Besnard | Mohamed Ettaleb | Christian Raymond | Nathalie Camelin
Actes de CORIA-TALN 2023. Actes du Défi Fouille de Textes@TALN2023
This paper describes the participation of the LIUM-IRISA team in the DEFT 2023 evaluation campaign. Our team took part in the main task, which this year consists of developing approaches to automatically answer multiple-choice questions. We built several systems: a first one relying on a knowledge base, a second one using a generative model, a similarity-based system, and a final system combining a set of features.