Mael Houbre

Also published as: Maël Houbre

2026

Evaluating the Homogeneity of Keyphrase Prediction Models
Mael Houbre | Florian Boudin | Beatrice Daille
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Keyphrases, which are useful in several NLP and IR applications, are either extracted from text or predicted by generative models. Unlike keyphrase extraction approaches, keyphrase generation models can predict keyphrases that do not appear in a document’s text, called ‘absent keyphrases’. This ability means that keyphrase generation models can associate a document with a notion that is not explicitly mentioned in its text. Intuitively, this suggests that for two documents covering the same subjects, a keyphrase generation model is more likely to index them homogeneously, i.e. predict the same keyphrases for both documents, regardless of whether those keyphrases appear in their respective texts; something a keyphrase extraction model would fail to do. Yet the homogeneity of keyphrase prediction models is not covered by current benchmarks. In this work, we introduce a method to evaluate the homogeneity of keyphrase prediction models and study whether absent keyphrase generation capabilities actually help a model to be more homogeneous. To our surprise, we show that keyphrase extraction methods are competitive with generative models, and that, depending on the evaluation scenario, the ability to generate absent keyphrases can actually be detrimental to homogeneity. Our data, code and prompts are available on Hugging Face and GitHub.

2023

Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques" (ARTS)@TALN 2023
Florian Boudin | Béatrice Daille | Richard Dufour | Oumaima El | Maël Houbre | Léane Jourdan | Nihel Kooli
Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques" (ARTS)@TALN 2023

Classification de relation pour la génération de mots-clés absents
Maël Houbre | Florian Boudin | Béatrice Daille
Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques" (ARTS)@TALN 2023

Encoder-decoder models are the state of the art in keyphrase generation. However, despite numerous adaptations of this architecture, generating keyphrases that are absent from the document’s text remains a difficult task. This study shows that first training a model on a relation classification task between a document and a keyphrase improves the generation of absent keyphrases.

2022

A Large-Scale Dataset for Biomedical Keyphrase Generation
Maël Houbre | Florian Boudin | Beatrice Daille
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)

Keyphrase generation is the task of generating a set of words or phrases that highlight the main topics of a document. There are few datasets for keyphrase generation in the biomedical domain, and they are too small for training generative models. In this paper, we introduce kp-biomed, the first large-scale biomedical keyphrase generation dataset, collected from PubMed abstracts. We train and release several generative models and conduct a series of experiments showing that using large-scale datasets significantly improves performance on both present and absent keyphrase generation. The dataset and models are available online.

Co-authors

Nihel Kooli 1
