Gaël Dias

Also published as: Gäel Dias, Gael Dias


2025

Simplifying complex texts is essential to ensure equitable access to information, particularly for individuals with cognitive impairments. The Easy-to-Read (ETR) initiative provides a framework to make content more accessible for these individuals. However, manually creating such texts remains time-consuming and resource-intensive. In this work, we investigate the potential of large language models (LLMs) to automate the generation of ETR content. To address the scarcity of aligned corpora and the specific constraints of ETR, we propose a multi-task learning (MTL) approach that trains models jointly on text summarization, text simplification, and ETR generation. We explore two complementary strategies: multi-task retrieval-augmented generation (RAG) for in-context learning (ICL), and MTL-LoRA for parameter-efficient fine-tuning (PEFT). Our experiments with Mistral-7B and LLaMA-3-8B, conducted on ETR-fr, a new high-quality dataset, show that MTL-LoRA consistently outperforms all other strategies in in-domain settings, while the MTL-RAG-based approach achieves better generalization in out-of-domain scenarios. Our code is publicly available at https://github.com/FrLdy/ETR-PEFT-Composition.
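The MTL-LoRA strategy mentioned above can be pictured with a small, hedged sketch: a frozen linear layer augmented with a shared low-rank down-projection and one task-specific up-projection per task (summarization, simplification, ETR generation). The shared-A / per-task-B layout, module names, and dimensions below are illustrative assumptions, not the released ETR-PEFT-Composition code or the exact MTL-LoRA formulation.

```python
import torch
import torch.nn as nn

class MTLLoRALinear(nn.Module):
    """Frozen base linear layer plus a shared LoRA down-projection and one
    task-specific up-projection per task (illustrative multi-task LoRA sketch)."""

    def __init__(self, base: nn.Linear, tasks, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep pretrained weights frozen
            p.requires_grad = False
        self.scaling = alpha / r
        self.A = nn.Linear(base.in_features, r, bias=False)              # shared
        self.B = nn.ModuleDict({                                          # per task
            t: nn.Linear(r, base.out_features, bias=False) for t in tasks
        })
        for b in self.B.values():
            nn.init.zeros_(b.weight)              # adapters start as a zero update

    def forward(self, x, task: str):
        return self.base(x) + self.scaling * self.B[task](self.A(x))

# Usage: route each batch through the adapter of its task.
layer = MTLLoRALinear(nn.Linear(768, 768),
                      tasks=["summarization", "simplification", "etr"])
x = torch.randn(2, 10, 768)
print(layer(x, task="etr").shape)  # torch.Size([2, 10, 768])
```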

2024

This paper explores the impact of incorporating sentiment, emotion, and domain-specific lexicons into a transformer-based model for depression symptom estimation. Lexicon information is added by marking the words in the input transcripts of patient-therapist conversations as well as in social media posts. Overall results show that the introduction of external knowledge within pre-trained language models can be beneficial for prediction performance, while different lexicons show distinct behaviours depending on the targeted task. Additionally, new state-of-the-art results are obtained for the estimation of depression level over patient-therapist interviews.
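As a rough illustration of the lexicon-marking step described above, the snippet below wraps lexicon words in the input text with marker tokens before the text is passed to a pre-trained language model. The marker strings and the tiny sentiment lexicon are assumptions for the example, not the resources or the exact injection scheme used in the paper.

```python
# Illustrative only: surround lexicon words with marker tokens before encoding.
SENTIMENT_LEXICON = {"sad", "hopeless", "tired", "happy"}

def mark_lexicon_words(text: str, lexicon: set[str],
                       open_tag: str = "[LEX]", close_tag: str = "[/LEX]") -> str:
    marked = []
    for token in text.split():
        core = token.strip(".,!?").lower()
        marked.append(f"{open_tag} {token} {close_tag}" if core in lexicon else token)
    return " ".join(marked)

print(mark_lexicon_words("I feel hopeless and tired lately.", SENTIMENT_LEXICON))
# I feel [LEX] hopeless [/LEX] and [LEX] tired [/LEX] lately.
```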
Automated depression estimation has received significant research attention in recent years as a result of its growing impact on the global community. Within the context of studies based on patient-therapist interview transcripts, most researchers treat the dyadic discourse as a sequence of unstructured sentences, thus ignoring the discourse structure within the learning process. In this paper, we propose Multi-view architectures that divide the input transcript into patient and therapist views based on sentence type, in an attempt to exploit the symmetric discourse structure for improved model performance. Experiments on the DAIC-WOZ dataset for the binary classification task within depression estimation show the advantages of the Multi-view architecture over sequential input representations. Our model also outperforms the current state of the art, providing new SOTA performance on the DAIC-WOZ test set.
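A minimal sketch of the Multi-view idea, assuming patient and therapist utterances are encoded by separate transformer encoders whose pooled representations are fused for binary classification; the encoder choice, pooling, and dimensions are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiViewClassifier(nn.Module):
    """Two-view sketch: patient and therapist utterances are encoded separately
    and their pooled representations are fused for binary classification."""

    def __init__(self, vocab_size=30000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.patient_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.therapist_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(2 * dim, 2)          # depressed vs. non-depressed

    def forward(self, patient_ids, therapist_ids):
        p = self.patient_enc(self.embed(patient_ids)).mean(dim=1)   # mean pooling
        t = self.therapist_enc(self.embed(therapist_ids)).mean(dim=1)
        return self.head(torch.cat([p, t], dim=-1))

# Usage with dummy token ids for the two views.
model = MultiViewClassifier()
logits = model(torch.randint(0, 30000, (2, 50)), torch.randint(0, 30000, (2, 50)))
print(logits.shape)  # torch.Size([2, 2])
```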
This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts. While previous research relies on social media-based datasets annotated with binary categories, i.e. depressed or non-depressed, recent datasets such as D2S and PRIMATE aim for nuanced annotations using PHQ-9 symptoms. However, most of these datasets rely on crowd workers without the domain knowledge required for annotation. Focusing on the PRIMATE dataset, our study reveals concerns regarding annotation validity, particularly for the lack of interest or pleasure (anhedonia) symptom. Through reannotation by a mental health professional, we introduce finer labels and textual spans as evidence, identifying a notable number of false positives. Our refined annotations, to be released under a Data Use Agreement, offer a higher-quality test set for anhedonia detection. This study underscores the necessity of addressing annotation quality issues in mental health datasets, advocating for improved methodologies to enhance NLP model reliability in mental health assessments.
The ever-growing number of people suffering from mental distress has motivated significant research initiatives towards automated depression estimation. Despite the multidisciplinary nature of the task, very few of these approaches include medical professionals in their research process, thus ignoring a vital source of domain knowledge. In this paper, we propose to bring the domain experts back into the loop and incorporate their knowledge within the gold-standard DAIC-WOZ dataset. In particular, we define a novel transformer-based architecture and analyse its performance in light of our expert annotations. Overall findings demonstrate a strong correlation between the psychological tendencies of medical professionals and the behavior of the proposed model, which additionally provides new state-of-the-art results.

2022

Humorous texts can be of different forms such as punchlines, puns, or funny stories. Existing humor classification systems have dealt with such diverse forms by treating them independently. In this paper, we argue that different forms of humor share a common background, either in terms of vocabulary or constructs. As a consequence, it is likely that classification performance can be improved by jointly tackling different humor types. Hence, we design a shared-private multitask architecture following a transfer learning paradigm and perform experiments over four gold-standard datasets. Empirical results steadily confirm our hypothesis, demonstrating statistically significant improvements over baselines and establishing new state-of-the-art figures on two datasets.
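A hedged sketch of a shared-private multitask layout for humor classification: one encoder shared across all datasets, plus a private encoder and classification head per dataset. The GRU encoders, task names, and dimensions below are placeholders, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class SharedPrivateHumorModel(nn.Module):
    """Shared-private sketch: a shared encoder plus one private encoder and
    one classification head per humor dataset."""

    def __init__(self, tasks, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.shared = nn.GRU(dim, dim, batch_first=True)
        self.private = nn.ModuleDict({t: nn.GRU(dim, dim, batch_first=True) for t in tasks})
        self.heads = nn.ModuleDict({t: nn.Linear(2 * dim, 2) for t in tasks})

    def forward(self, ids, task: str):
        x = self.embed(ids)
        _, h_shared = self.shared(x)              # final hidden state, shared view
        _, h_private = self.private[task](x)      # final hidden state, private view
        feats = torch.cat([h_shared[-1], h_private[-1]], dim=-1)
        return self.heads[task](feats)

# Usage: each batch carries the name of the dataset it comes from.
model = SharedPrivateHumorModel(tasks=["puns", "punchlines", "stories", "oneliners"])
logits = model(torch.randint(0, 30000, (4, 20)), task="puns")
print(logits.shape)  # torch.Size([4, 2])
```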

2021

We propose an original idea for exploiting the relations between classes in multiclass problems. We define two one-vs-rest multitask architectures that combine sets of classifiers learned in a multitask configuration using neural networks. Experiments conducted on six datasets for sentiment, emotion, topic, and lexico-semantic relation classification show that our architectures consistently improve performance over state-of-the-art one-vs-rest strategies and strongly compete with other multiclass strategies.
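One way to picture the one-vs-rest multitask idea is a shared encoder feeding one jointly trained binary head per class, with the final label taken from the head that scores highest. The sketch below is an illustrative simplification under that assumption, not the exact architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class MultiTaskOneVsRest(nn.Module):
    """Shared encoder with one binary one-vs-rest head per class, trained jointly;
    prediction picks the class whose head scores highest."""

    def __init__(self, num_classes, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)    # averaged bag-of-words encoder
        self.shared = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_classes)])

    def forward(self, ids):
        h = self.shared(self.embed(ids))
        return torch.cat([head(h) for head in self.heads], dim=-1)  # (batch, num_classes)

model = MultiTaskOneVsRest(num_classes=6)
scores = model(torch.randint(0, 30000, (3, 15)))
prediction = scores.argmax(dim=-1)   # class of the highest-scoring one-vs-rest head
print(prediction.shape)  # torch.Size([3])
```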

2017

Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is one such example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (age, gender, and location) information. Our methodology achieves reasonable classification accuracy over a gold-standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models.
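A minimal sketch of the general recipe, assuming a tweet is represented by its averaged word vectors concatenated with numeric demographic features (age, gender, location) and fed to a linear classifier; the toy vectors, feature encoding, and classifier choice are assumptions, not the paper's embeddings or setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy word vectors standing in for pre-trained embeddings.
rng = np.random.RandomState(0)
WORD_VECTORS = {w: rng.randn(50) for w in ["this", "is", "a", "tweet", "another", "example"]}

def featurize(tweet: str, age: float, gender: int, location: int) -> np.ndarray:
    """Averaged word vectors concatenated with encoded demographic features."""
    vecs = [WORD_VECTORS[w] for w in tweet.lower().split() if w in WORD_VECTORS]
    text_vec = np.mean(vecs, axis=0) if vecs else np.zeros(50)
    return np.concatenate([text_vec, [age / 100.0, gender, location]])

X = np.stack([featurize("this is a tweet", 25, 0, 3),
              featurize("another example tweet", 40, 1, 7)])
y = np.array([0, 1])                      # toy labels: 1 = racist content
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```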

2014

In this paper we focus on a particular case of entailment, namely entailment by generality. We argue that there exist various types of implication and a range of different levels of entailment reasoning, based on lexical, syntactic, logical and common-sense clues, at different levels of difficulty. We introduce the paradigm of Textual Entailment (TE) by Generality, which can be defined as the entailment from a specific statement towards a relatively more general statement. In this context, the Text T entails the Hypothesis H, and at the same time H is more general than T. We propose an unsupervised and language-independent method to recognize TE by Generality, given a Text−Hypothesis (T−H) pair where the entailment relation holds.
In this work we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). In text, there are different kinds of entailment yielded by different types of implicative reasoning (lexical, syntactic, common-sense based), but here we focus only on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T →G H whenever the premise T entails the hypothesis H and the hypothesis is more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs, given a pair T, H in an entailment relation. We have evaluated our proposal through two experiments: (a) a test on T →G H English pairs where we know that TEG holds; (b) a test on T → H Portuguese pairs, randomly selected with 60% of TEGs and 40% of TE without generality dependency (TEnG).
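One plausible unsupervised, language-independent proxy for the T →G H direction is to compare how specific T and H are, for instance via average inverse document frequency over a background corpus: if T is markedly more specific than H, entailment by generality is more plausible. This particular heuristic, the toy corpus, and the threshold-free comparison below are illustrative assumptions, not necessarily the measure proposed in the papers above.

```python
import math
from collections import Counter

# Toy background corpus used to estimate how specific each word is.
CORPUS = [
    "a dog barked at the mailman in lisbon",
    "an animal was seen near the river",
    "the animal ran through the park",
    "a dog chased a cat in the park",
]

doc_freq = Counter(w for doc in CORPUS for w in set(doc.split()))
N = len(CORPUS)

def specificity(sentence: str) -> float:
    """Average smoothed IDF of the words in a sentence: higher means more specific."""
    words = sentence.lower().split()
    return sum(math.log((N + 1) / (doc_freq[w] + 1)) for w in words) / max(len(words), 1)

def entails_by_generality(t: str, h: str) -> bool:
    # T ->G H is plausible when the premise is more specific than the hypothesis.
    return specificity(t) > specificity(h)

print(entails_by_generality("a dog barked at the mailman in lisbon",
                            "an animal was in the park"))   # True
```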

2001

Some authors (Simard et al.; Melamed; Danielsson & Mühlenbock) have suggested measures of similarity between words in different languages so as to find extra clues for the alignment of parallel texts. Cognate words, like ‘Parliament’ and ‘Parlement’ in English and French respectively, provide extra anchors that help to improve the quality of the alignment. In this paper, we extend an alignment algorithm proposed by Ribeiro et al. with typical contiguous and non-contiguous sequences of characters extracted by a statistically sound method (Dias et al.). With these typical sequences, we are able to find more reliable correspondence points and improve the alignment quality without resorting to heuristics to identify cognates.
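A hedged sketch of how shared character sequences can supply extra correspondence points: score candidate word pairs across languages by the overlap of their character n-grams and keep high-scoring pairs as anchors. The Dice-over-trigrams measure below is a crude stand-in for the statistically extracted typical contiguous and non-contiguous sequences used in the paper.

```python
def char_ngrams(word: str, n: int = 3) -> set[str]:
    """Character n-grams of a word, with boundary markers."""
    word = f"#{word.lower()}#"
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def ngram_similarity(w1: str, w2: str) -> float:
    """Dice coefficient over character trigrams."""
    a, b = char_ngrams(w1), char_ngrams(w2)
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0

# A high score suggests the pair can serve as an extra correspondence point.
for en, fr in [("Parliament", "Parlement"), ("Parliament", "fromage")]:
    print(en, fr, round(ngram_similarity(en, fr), 2))
```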