Voices of Luxembourg: Tackling Dialect Diversity in a Low-Resource Setting

Nina Hosseini-Kivanani; Christoph Schommer; Peter Gilles

Abstract

Dialect classification is essential for preserving linguistic diversity, particularly in low-resource languages such as Luxembourgish. This study introduces one of the first systematic approaches to classifying Luxembourgish dialects, addressing phonetic, prosodic, and lexical variations across four major regions. We benchmarked multiple models, including state-of-the-art pre-trained speech models like Wav2Vec2, XLSR-Wav2Vec2, and Whisper, alongside traditional approaches such as Random Forest and CNN-LSTM. To overcome data limitations, we applied targeted data augmentation strategies and analyzed their impact on model performance. Our findings highlight the superior performance of CNN-Spectrogram and CNN-LSTM models while identifying the strengths and limitations of data augmentation. This work establishes foundational benchmarks and provides actionable insights for advancing dialectal NLP in Luxembourgish and other low-resource languages.

Anthology ID: 2025.resourceful-1.29
Volume: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month: March
Year: 2025
Address: Tallinn, Estonia
Editors: Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues: RESOURCEFUL | WS
Publisher: University of Tartu Library, Estonia
Pages: 143–152
URL: https://preview.aclanthology.org/corrections-2025-06/2025.resourceful-1.29/
Cite (ACL): Nina Hosseini-Kivanani, Christoph Schommer, and Peter Gilles. 2025. Voices of Luxembourg: Tackling Dialect Diversity in a Low-Resource Setting. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 143–152, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal): Voices of Luxembourg: Tackling Dialect Diversity in a Low-Resource Setting (Hosseini-Kivanani et al., RESOURCEFUL 2025)
PDF: https://preview.aclanthology.org/corrections-2025-06/2025.resourceful-1.29.pdf
