Thibault Ehrhart

Also published as: Pasquale Lisena


2026

Event Relation Extraction (ERE) aims to identify and classify semantic relationships between events expressed in text. While existing work has mainly addressed temporal or simple causal links, fine-grained causal relations such as enable, prevent, and intend remain insufficiently explored, partly due to limited and imbalanced labeled datasets. We present a novel framework that leverages large language models (LLMs) and commonsense knowledge to jointly perform event extraction and relation classification. Our contributions include (1) the creation of CausalSense, a large-scale dataset containing more than 500k sentences drawn from news data and from commonsense knowledge extracted from ATOMIC, and enriched synthetically; and (2) the evaluation of multiple architectures, including transformer-based models and end-to-end multitask systems, for extracting fine-grained causal relationships. Experimental results show that our best-performing model achieves a 32.3% improvement in average F1-score over the current state of the art. The integration of commonsense knowledge substantially enhances fine-grained causal relation detection. The CausalSense dataset, our code, and our models are released as open source to support future research on causal event relationship extraction.
While prior work in Information Extraction (IE) has focused on extracting information from either textual content or tables in isolation, it misses critical information that emerges only from their interplay. Indeed, tables may summarize facts that are sparse in the text, while text can disambiguate or elaborate on table entries. This complementarity may take the form of relations that are expressed across text and tables. In this context, we are interested in the task of extracting such relations whose expression spans the two modalities. This task is an original one, for which no reference evaluation corpus exists. Thus, we created ReTaT, a corpus that can be used to train and evaluate systems for extracting such relations. This corpus is composed of (table, surrounding text) pairs extracted from Wikipedia pages and has been manually annotated with relation triples. ReTaT is organized in three datasets with distinct characteristics: domain (business, telecommunication and female celebrities), size (from 50 to 255 pairs), language (English vs French), type of relations (data vs object properties), closed vs open list of relations, and size of the surrounding text (paragraph vs full page). We then assessed its quality and suitability for the joint table-text relation extraction task using Large Language Models (LLMs), at a time when LLMs have demonstrated their ability to extract relations from either text or tables in isolation.

2023

This paper presents D2KLab’s system for the shared task “Multilingual Complex Named Entity Recognition (MultiCoNER II)”, part of SemEval 2023 Task 2. The system relies on a fine-tuned transformer-based language model for extracting named entities. In addition to the architecture of the system, we discuss our results and observations.

2022

We present a benchmark in six European languages containing manually annotated information about olfactory situations and events following a FrameNet-like approach. The document selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 and 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell descriptions over time.

2021

From statistical to neural models, a wide variety of topic modelling algorithms have been proposed in the literature. However, because of the diversity of datasets and metrics, there have not been many efforts to systematically compare their performance on the same benchmarks and under the same conditions. In this paper, we present a selection of 9 topic modelling techniques from the state of the art reflecting a diversity of approaches to the task, an overview of the different metrics used to compare their performance, and the challenges of conducting such a comparison. We empirically evaluate the performance of these models in different settings reflecting a variety of real-life conditions in terms of dataset size, number of topics, and distribution of topics, following identical preprocessing and evaluation processes. Using both metrics that rely on the intrinsic characteristics of the dataset (different coherence metrics) and metrics that rely on external knowledge (word embeddings and ground-truth topic labels), our experiments reveal several shortcomings in common practices for topic model evaluation.
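To make the notion of an intrinsic coherence metric concrete, here is a minimal sketch of one common variant, NPMI coherence computed from document-level co-occurrence. The abstract does not say which coherence metrics the paper uses, so the choice of NPMI, the function name `npmi_coherence`, and the toy corpus are all assumptions made for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated from document-level co-occurrence:
    p(w) is the fraction of documents containing w. This is an
    illustrative variant, not necessarily the one used in the paper.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
        elif p12 == 1.0:
            scores.append(1.0)   # words co-occur everywhere: avoid dividing by log(1) = 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus of tokenized documents (hypothetical data).
docs = [["cat", "dog"], ["cat", "dog"], ["car", "road"], ["car", "road"]]
coherent = npmi_coherence(["cat", "dog"], docs)
incoherent = npmi_coherence(["cat", "road"], docs)
```

In this toy corpus, words that always appear together score at the top of the NPMI range and words that never co-occur score at the bottom, which is the behaviour a coherence metric is meant to capture.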

2020

From LDA to neural models, different topic modeling approaches have been proposed in the literature. However, their suitability and performance are not easy to compare, particularly when the algorithms are being used in the wild on heterogeneous datasets. In this paper, we introduce ToModAPI (TOpic MOdeling API), a wrapper library to easily train, evaluate and infer using different topic modeling algorithms through a unified interface. The library is extensible and can be used in Python environments or through a Web API.
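The idea of a unified interface over heterogeneous topic models can be sketched as follows. This is not ToModAPI's actual API: the class names `TopicModel` and `FrequencyTopicModel`, the `MODELS` registry, and the `build` helper are hypothetical names chosen for illustration, and the single "model" here is a deliberately trivial stand-in for a real algorithm.

```python
from abc import ABC, abstractmethod
from collections import Counter


class TopicModel(ABC):
    """Hypothetical common interface that every wrapped algorithm implements."""

    @abstractmethod
    def train(self, documents):
        """Fit the model on a list of tokenized documents."""

    @abstractmethod
    def topics(self, top_n=5):
        """Return each topic as a list of its top words."""

    @abstractmethod
    def infer(self, document):
        """Return one weight per topic for a tokenized document."""


class FrequencyTopicModel(TopicModel):
    """Toy stand-in: a single topic made of the most frequent words."""

    def __init__(self):
        self.counts = Counter()

    def train(self, documents):
        self.counts = Counter(tok for doc in documents for tok in doc)
        return self

    def topics(self, top_n=5):
        return [[word for word, _ in self.counts.most_common(top_n)]]

    def infer(self, document, top_n=5):
        if not document:
            return [0.0]
        top = set(self.topics(top_n)[0])
        # Share of the document's tokens that belong to the single topic.
        return [sum(tok in top for tok in document) / len(document)]


# Registry mapping a name to a model class, so callers pick an algorithm by name.
MODELS = {"frequency": FrequencyTopicModel}


def build(name):
    """Instantiate a registered model by name (hypothetical helper)."""
    return MODELS[name]()


model = build("frequency").train([["a", "b", "a"], ["a", "c"]])
top_topic = model.topics(top_n=1)
weights = model.infer(["a", "z"], top_n=1)
```

Because callers only depend on `train`, `topics`, and `infer`, adding another algorithm under this design would amount to writing one more subclass and registering it, which is one plausible reading of what "extensible" means in the abstract.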