Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study
Eeham Khan, Firas Saidani, Owen Van Esbroeck, Richard Khoury, Leila Kosseim
Abstract
Despite the widespread adoption of Large Language Models (LLMs), their strongest capabilities remain largely confined to a small number of high-resource languages for which there is abundant training data. Recently, continual pre-training (CPT) has emerged as a means to adapt these models to low-resource regional dialects. In this paper, we study the use of CPT for dialect learning under tight data and compute budgets. Using low-rank adaptation (LoRA) and compute-efficient continual pre-training, we adapt three LLMs to the Québec French dialect using a very small dataset and benchmark them on the COLE suite. Our experiments demonstrate an improvement on the minority dialect benchmarks and only minimal regression on the prestige language benchmarks, while updating around 1% of model parameters. Analysis of the results demonstrates that gains are highly contingent on corpus composition. These findings indicate that CPT with parameter-efficient fine-tuning (PEFT) can narrow the dialect gap by providing cost-effective and sustainable language resource creation, expanding high-quality LLM access to minority linguistic communities. To support reproducibility and broaden access, we release the first Québec French LLMs on Hugging Face.
- Anthology ID:
- 2026.lrec-main.840
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 10723–10734
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.840/
- Cite (ACL):
- Eeham Khan, Firas Saidani, Owen Van Esbroeck, Richard Khoury, and Leila Kosseim. 2026. Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study. International Conference on Language Resources and Evaluation, main:10723–10734.
- Cite (Informal):
- Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study (Khan et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.840.pdf