New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark
Nadège Alavoine, Gaëlle Laperrière, Christophe Servan, Sahar Ghannay, Sophie Rosset
Abstract
Intent classification and slot-filling are essential tasks of Spoken Language Understanding (SLU). In most SLU systems, those tasks are realized by independent modules, but for about fifteen years, models achieving both of them jointly and exploiting their mutual enhancement have been proposed. A multilingual module using a joint model was envisioned to create a touristic dialogue system for a European project, HumanE-AI-Net. A combination of multiple datasets, including the MEDIA dataset, was suggested for training this joint model. The MEDIA SLU dataset is a French dataset distributed since 2005 by ELRA, mainly used by the French research community and free for academic research since 2020. Unfortunately, it is annotated only in slots but not intents. An enhanced version of MEDIA annotated with intents has been built to extend its use to more tasks and use cases. This paper presents the semi-automatic methodology used to obtain this enhanced version. In addition, we present the first results of SLU experiments on this enhanced dataset using joint models for intent classification and slot-filling.- Anthology ID:
- 2024.lrec-main.1070
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- SIG:
- Publisher:
- ELRA and ICCL
- Note:
- Pages:
- 12227–12246
- Language:
- URL:
- https://aclanthology.org/2024.lrec-main.1070
- DOI:
- Cite (ACL):
- Nadège Alavoine, Gaëlle Laperrière, Christophe Servan, Sahar Ghannay, and Sophie Rosset. 2024. New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12227–12246, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark (Alavoine et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1070.pdf