SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding

Haroun Elleuch, Salima Mdhaffar, Yannick Estève, Fethi Bougares


Abstract
Spoken Language Understanding (SLU) aims to extract semantic information from spoken user queries and is a core component of task-oriented dialog systems. With the rapid progress of deep neural network models and the evolution of pre-trained language models, SLU has seen significant breakthroughs. However, only a few high-resource languages have benefited from this progress, because SLU resources are absent for most other languages. In this paper, we seek to mitigate this obstacle by introducing SLURP-TN. This dataset was created by recording 55 native speakers uttering sentences in Tunisian dialect, manually translated from six SLURP domains. The result is a Tunisian dialect SLU dataset comprising 4165 sentences, amounting to around 5 hours of recorded speech. We also develop a number of Automatic Speech Recognition and SLU models exploiting SLURP-TN. The dataset and baseline models are available at: https://huggingface.co/datasets/Elyadata/SLURP-TN.
Anthology ID:
2026.lrec-main.119
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
1544–1552
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.119/
Cite (ACL):
Haroun Elleuch, Salima Mdhaffar, Yannick Estève, and Fethi Bougares. 2026. SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding. International Conference on Language Resources and Evaluation, main:1544–1552.
Cite (Informal):
SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding (Elleuch et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.119.pdf