Arturs Znotins
2026
Pretraining and Benchmarking Modern Encoders for Latvian
Arturs Znotins
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Encoder-only transformers remain essential for practical NLP tasks. While recent advances in multilingual models have improved cross-lingual capabilities, low-resource languages such as Latvian remain underrepresented in pretraining corpora, and few monolingual Latvian encoders currently exist. We address this gap by pretraining a suite of Latvian-specific encoders based on RoBERTa, DeBERTaV3, and ModernBERT architectures, including long-context variants, and evaluating them on a comprehensive Latvian benchmark suite. Our models are competitive with existing monolingual and multilingual encoders while benefiting from recent architectural and efficiency advances. Our best model, lv-deberta-base (111M parameters), achieves the strongest overall performance, outperforming larger multilingual baselines and prior Latvian-specific encoders. We release all pretrained models and evaluation resources to support further research and practical applications in Latvian NLP.
Improving Latvian Morphosyntactic Parsing with Pretrained Encoders and Analyzer-Constrained Decoding
Arturs Znotins
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present a systematic evaluation of Latvian morphosyntactic parsing with pretrained transformer encoders in a unified joint architecture for tagging, lemmatization, and dependency parsing. We benchmark multilingual and Latvian-specific models and show that language-specific adaptation, even with modest in-language data, substantially improves performance. We further demonstrate that factored morphological modeling improves robustness and that integrating a Latvian morphological analyzer through constrained decoding yields consistent gains in XPOS tagging and lemmatization. The best system achieves new state-of-the-art results, reaching 95.22% XPOS accuracy, 98.72% lemma accuracy, and 93.19% LAS.
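The analyzer-constrained decoding described above can be illustrated with a minimal sketch: at each token, the tagger's candidate set is restricted to the tags licensed by the morphological analyzer before the best-scoring tag is chosen. The function name, score format, and the toy tags below are illustrative assumptions, not the paper's actual implementation.

```python
def constrained_argmax(tag_scores, allowed_tags):
    """Pick the highest-scoring tag, restricted to analyzer-licensed tags.

    tag_scores: dict mapping tag -> model score (hypothetical format).
    allowed_tags: set of tags the morphological analyzer licenses for the
    token; if empty (e.g. the token is unknown to the analyzer), fall back
    to the unconstrained tagset.
    """
    candidates = allowed_tags & tag_scores.keys() or tag_scores.keys()
    return max(candidates, key=lambda t: tag_scores[t])


# Toy example with made-up tags and scores: the analyzer rules out the
# verb reading, so decoding picks the best noun tag instead.
scores = {"noun-A": 0.2, "noun-B": 0.3, "verb-C": 0.5}
print(constrained_argmax(scores, {"noun-A", "noun-B"}))  # noun-B
```

A practical system would apply the same masking to lemmatization candidates, which is one way the consistent lemma-accuracy gains reported above could arise.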