Anton Ehrmanntraut
2026
New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models
Julia Wunderle | Anton Ehrmanntraut | Jan Pfister | Fotis Jannidis | Andreas Hotho
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Despite the rise of decoder-only LLMs, encoders remain essential for efficient German NLP and NLU. This work studies two routes to high-quality German encoders under identical data and training constraints: (a) training from scratch and (b) converting decoders via LLM2Vec. We introduce two resources: ModernGBERT (134M, 1B), fully transparent German encoders in the ModernBERT style, and LLäMmlein2Vec (120M, 1B, 7B), decoder-to-encoder conversions trained with masked next-token prediction, both with context extended to 8,192 tokens. Across SuperGLEBer, ModernGBERT 1B sets a new state of the art (avg. 0.808), surpassing GBERT-large (+4%) and the seven-times-larger converted 7B model (0.787). On German MTEB after supervised fine-tuning, ModernGBERT 1B (0.551) approaches the converted 7B model (0.557). We release all models, checkpoints, datasets, and full training records, and introduce an encoder-adapted QA-NIAH evaluation. Overall, our results provide actionable guidance: when parameter efficiency and latency matter, from-scratch encoders dominate; when a pre-trained decoder exists and compute is limited, conversion offers an effective alternative.
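To make "masked next-token prediction" concrete, here is a minimal conceptual sketch of that objective, not the paper's exact recipe: it assumes `model` is a causal LM whose attention has already been patched to be bidirectional (as LLM2Vec does), and the mask rate and function name are illustrative.

```python
import torch
import torch.nn.functional as F

def mntp_loss(model, input_ids, mask_token_id, mask_prob=0.15):
    """Masked next-token prediction: mask tokens, score each masked
    position with the logits of the *previous* position, since a
    decoder's LM head predicts the next token."""
    labels = input_ids.clone()

    # Randomly choose positions to mask (never position 0, whose
    # prediction would have to come from a non-existent position -1).
    mask = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    mask[:, 0] = False
    corrupted = input_ids.masked_fill(mask, mask_token_id)

    logits = model(input_ids=corrupted).logits  # (batch, seq, vocab)

    # Shift so that logits at position i-1 are paired with the label at i.
    shifted_logits = logits[:, :-1, :]
    shifted_labels = labels[:, 1:]
    shifted_mask = mask[:, 1:]

    return F.cross_entropy(
        shifted_logits[shifted_mask],  # predictions preceding masked slots
        shifted_labels[shifted_mask],  # original (unmasked) token ids
    )
```

The shift is the only difference from standard masked language modeling: the loss reuses the decoder's next-token head rather than adding a new prediction head.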