Galo Castillo-López
Also published as: Galo Castillo-lópez
2026
How DDAIR you? Disambiguated Data Augmentation for Intent Recognition
Galo Castillo-López | Alexis Lombard | Nasredine Semmar | Gaël de Chalendar
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Large Language Models (LLMs) are effective for data augmentation in classification tasks like intent detection. In some cases, they inadvertently produce examples that are ambiguous with regard to untargeted classes. We present DDAIR (Disambiguated Data Augmentation for Intent Recognition) to mitigate this problem. We use Sentence Transformers to detect ambiguous class-guided augmented examples generated by LLMs for intent recognition in low-resource scenarios. We identify synthetic examples that are semantically more similar to another intent than to their target one. We also provide an iterative re-generation method to mitigate such ambiguities. Our findings show that sentence embeddings effectively help to (re)generate less ambiguous examples, and suggest promising potential to improve classification performance in scenarios where intents are loosely or broadly defined.
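The ambiguity check described in this abstract can be sketched in a minimal form: embed each synthetic example, compare it against a centroid embedding per intent, and flag examples whose nearest intent is not the one they were generated for. The vectors below are toy 2-d placeholders for Sentence Transformer embeddings, and the intent names and helper functions are illustrative assumptions, not the paper's actual interface.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def flag_ambiguous(example_vec, target_intent, intent_centroids):
    """Flag a synthetic example whose embedding is closer to another
    intent's centroid than to its target intent's centroid."""
    nearest = max(intent_centroids,
                  key=lambda name: cosine(example_vec, intent_centroids[name]))
    return nearest != target_intent, nearest

# Toy stand-ins for Sentence Transformer embeddings (hypothetical intents).
centroids = {
    "book_flight":   [1.0, 0.1],
    "cancel_flight": [0.1, 1.0],
}
example = [0.2, 0.9]  # generated for "book_flight" but drifts toward cancellation
ambiguous, nearest = flag_ambiguous(example, "book_flight", centroids)
# An example flagged as ambiguous would be sent back to the LLM for re-generation.
```

In this toy case the example is flagged because it sits closer to `cancel_flight` than to its target intent; the iterative re-generation loop mentioned in the abstract would then prompt the LLM again until the example lands nearest its target.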
2025
A Survey of Recent Advances on Turn-taking Modeling in Spoken Dialogue Systems
Galo Castillo-López | Gaël de Chalendar | Nasredine Semmar
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology
The rapid growth in the adoption of dialogue systems for everyday human tasks has raised the realism expected of these systems. One trait of realism is the way speaking agents take their turns. We provide here a review of recent methods for turn-taking modeling and thoroughly describe the corpora used in these studies. We observe that 72% of the works reviewed in this survey do not compare their methods with previous efforts. We argue that one of the challenges in the field is the lack of well-established benchmarks for monitoring progress. This work aims to give the community a better understanding of the current state of research on turn-taking modeling and of future directions for building more realistic spoken conversational agents.
Intent Recognition and Out-of-Scope Detection using LLMs in Multi-party Conversations
Galo Castillo-López | Gaël de Chalendar | Nasredine Semmar
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Intent recognition is a fundamental component of task-oriented dialogue systems (TODS). Determining user intents and detecting whether an intent is Out-of-Scope (OOS) is crucial for TODS to provide reliable responses. However, traditional TODS require large amounts of annotated data. In this work we propose a hybrid approach that combines BERT and LLMs in zero- and few-shot scenarios to recognize intents and detect OOS utterances. Our approach leverages the generalization power of LLMs and the computational efficiency of BERT in such scenarios. We evaluate our method on multi-party conversation corpora and observe that sharing information from BERT outputs with LLMs leads to improved system performance.
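One way to picture the hybrid BERT/LLM idea is as a confidence-based router: trust the cheap BERT classifier when it is confident, and otherwise defer to an LLM that also sees BERT's top candidate intents. The sketch below is an assumption about how such routing could look; the classifier callables, threshold, and intent names are all illustrative, not the paper's actual interface.

```python
def hybrid_intent(utterance, bert_classify, llm_classify, threshold=0.7, top_k=3):
    """Route an utterance: keep BERT's prediction when confident,
    otherwise defer to an LLM primed with BERT's top-k candidate intents.
    (Illustrative sketch; names and threshold are assumptions.)"""
    probs = bert_classify(utterance)            # dict: intent -> probability
    best, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return best
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    return llm_classify(utterance, candidates)  # may return "out_of_scope"

# Toy stand-ins for the two models.
def bert(u):
    return {"greet": 0.9, "book": 0.05} if "hello" in u else {"greet": 0.4, "book": 0.35}

def llm(u, candidates):
    return "out_of_scope"  # pretend the LLM rejects the utterance

confident = hybrid_intent("hello there", bert, llm)   # BERT is confident
deferred = hybrid_intent("quantum llamas", bert, llm) # falls back to the LLM
```

The design point of such a router is that most utterances never reach the LLM, so the expensive model is only paid for on the uncertain or potentially out-of-scope cases.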
2023
Analyzing Zero-Shot transfer Scenarios across Spanish variants for Hate Speech Detection
Galo Castillo-lópez | Arij Riabi | Djamé Seddah
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
Hate speech detection in online platforms has been widely studied in the past. Most of these works were conducted in English and a few resource-rich languages. Recent approaches tailored for low-resource languages have explored the benefits of zero-shot cross-lingual transfer learning in resource-scarce scenarios. However, language variation between geolects, such as American English and British English, or Latin-American Spanish and European Spanish, remains a problem for NLP models, which often rely on (latent) lexical information for their classification tasks. More importantly, the cultural aspect, crucial for hate speech detection, is often overlooked. In this work, we present the results of a thorough analysis of the performance of hate speech detection models on different variants of Spanish, including a new Twitter dataset of hate speech toward immigrants that we built to cover these variants. Using mBERT and BETO, a monolingual Spanish BERT-based language model, as the basis of our transfer learning architecture, our results indicate that hate speech detection models for a given Spanish variant are affected when other variations of the language are not considered. Hate speech expressions can vary from region to region where the same language is spoken. Our new dataset, models and guidelines are freely available.