Resource-Lean Lexicon Induction for German Dialects

Robert Litschko; Barbara Plank; Diego Frassinelli

Abstract

Automatic induction of high-quality dictionaries is essential for building lexical resources, yet low-resource languages and dialects pose several challenges: limited access to annotators, a high degree of spelling variation, and the poor performance of large language models (LLMs). We empirically show that statistical models (random forests) trained on string similarity features are surprisingly effective for inducing German dialect lexicons. They outperform LLMs, enable cross-dialect transfer, and offer a lightweight data-driven alternative. We evaluate our models intrinsically on bilingual lexicon induction (BLI) and extrinsically on dialect information retrieval (IR). On BLI, random forests outperform Mistral-123b while being more resource-lean. On dialect IR with BM25, using our dialect dictionaries for query expansion yields relative improvements of up to 28.9% in nDCG@10 and 50.7% in Recall@100. Motivated by resource scarcity in dialects, we further investigate the extent to which models transfer across different German dialects, and how they perform under varying amounts of training data.
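The abstract does not list which string similarity features the random forests use. As a hedged illustration only, the sketch below computes a few common ones (normalized Levenshtein similarity, the `difflib` match ratio, character-bigram Jaccard overlap, shared-prefix length, and length ratio) for a dialect word paired with a Standard German candidate. The feature set, the helper names `similarity_features` and `rank_candidates`, and the example words are assumptions, not the authors' implementation. A plain mean of the features stands in for the trained random forest so that the example runs with the standard library alone.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Character n-grams with '#' boundary padding, so the set is never empty."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity_features(dialect: str, standard: str) -> list[float]:
    """Hypothetical feature vector for one (dialect, standard) candidate pair."""
    a, b = dialect.lower(), standard.lower()
    longest = max(len(a), len(b), 1)
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    prefix = 0
    for ca, cb in zip(a, b):  # length of the shared prefix
        if ca != cb:
            break
        prefix += 1
    return [
        1 - levenshtein(a, b) / longest,              # normalized edit similarity
        SequenceMatcher(None, a, b).ratio(),          # difflib match ratio
        len(grams_a & grams_b) / len(grams_a | grams_b),  # bigram Jaccard
        prefix / longest,                             # shared-prefix ratio
        min(len(a), len(b)) / longest,                # length ratio
    ]


def rank_candidates(dialect, candidates, score=None):
    """Rank Standard German candidates for a dialect word, best first."""
    # Stand-in scorer (assumption): unweighted mean of the features.
    # A trained random forest's probability would replace this.
    score = score or (lambda feats: sum(feats) / len(feats))
    return sorted(
        candidates,
        key=lambda c: score(similarity_features(dialect, c)),
        reverse=True,
    )


print(rank_candidates("gehd", ["gelb", "geht", "Haus"]))
```

With the mean scorer, `geht` ranks above `gelb` and `Haus` for the dialect spelling `gehd`. To use a learned scorer instead, one could train scikit-learn's `RandomForestClassifier` on feature vectors of labelled pairs (assuming binary labels 0/1) and pass `lambda f: clf.predict_proba([f])[0][1]` as `score`.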

Anthology ID: 2026.lrec-main.711
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 9044–9050
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.711/
Cite (ACL): Robert Litschko, Barbara Plank, and Diego Frassinelli. 2026. Resource-Lean Lexicon Induction for German Dialects. International Conference on Language Resources and Evaluation, main:9044–9050.
Cite (Informal): Resource-Lean Lexicon Induction for German Dialects (Litschko et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.711.pdf