Léo Hemamou

Also published as: Leo Hemamou


2025

pdf bib
StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos
Valentin Barriere | Nahuel Gomez | Léo Hemamou | Sofia Callejas | Brian Ravenet
Findings of the Association for Computational Linguistics: EMNLP 2025

Aiming towards improving current computational models of humor detection, we propose a new multimodal dataset of stand-up comedies, in seven languages: English, French, Spanish, Italian, Portuguese, Hungarian and Czech. Our dataset of more than 330 hours %, is at the time of writing the biggest available for this type of task, and the most diverse. The whole dataset is automatically annotated in laughter (from the audience), and the subpart left for model validation is manually annotated.% Contrary to contemporary approaches, we do not frame the task of humor detection as a binary sequence classification, but as word-level sequence labeling, in order to take into account all the context of the sequence and to capture the continuous joke tagging mechanism typically occurring in natural conversations. As par with unimodal baselines results, we propose a method for e propose a method to enhance the automatic laughter detection based on Audio Speech Recognition errors. Our code and data are available online: https://tinyurl.com/EMNLPHumourStandUpAnonym

2022

pdf bib
Delivering Fairness in Human Resources AI: Mutual Information to the Rescue
Leo Hemamou | William Coleman
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automatic language processing is used frequently in the Human Resources (HR) sector for automated candidate sourcing and evaluation of resumes. These models often use pre-trained language models where it is difficult to know if possible biases exist. Recently, Mutual Information (MI) methods have demonstrated notable performance in obtaining representations agnostic to sensitive variables such as gender or ethnicity. However, accessing these variables can sometimes be challenging, and their use is prohibited in some jurisdictions. These factors can make detecting and mitigating biases challenging. In this context, we propose to minimize the MI between a candidate’s name and a latent representation of their CV or short biography. This method may mitigate bias from sensitive variables without requiring the collection of these variables. We evaluate this methodology by first projecting the name representation into a smaller space to prevent potential MI minimization problems in high dimensions.

pdf bib
CoFE: A New Dataset of Intra-Multilingual Multi-target Stance Classification from an Online European Participatory Democracy Platform
Valentin Barriere | Guillaume Guillaume Jacquet | Leo Hemamou
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Stance Recognition over proposals is the task of automatically detecting whether a comment on a specific proposal is in favor of this proposal, against this proposal or that neither inference is likely. The dataset that we propose to use is an online debating platform inaugurated in 2021, where users can submit proposals and comment over proposals or over other comments. It contains 4.2k proposals and 20k comments focused on various topics. Every comment and proposal can come written in another language, with more than 40% of the proposal/comment pairs containing at least two languages, creating a unique intra-multilingual setting. A portion of the data (more than 7k comment/proposal pairs, in 26 languages) was annotated by the writers with a self-tag assessing whether they are in favor or against the proposal. Another part of the data (without self-tag) has been manually annotated: 1206 comments in 6 morphologically different languages (fr, de, en, el, it, hu) were tagged, leading to a Krippendorff’s α of 0.69. This setting allows defining an intra-multilingual and multi-target stance classification task over online debates.