Large-Scale Contextualised Language Modelling for Norwegian

Andrey Kutuzov; Jeremy Barnes; Erik Velldal; Lilja Øvrelid; Stephan Oepen

Abstract

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report on data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu

Anthology ID: 2021.nodalida-main.4
Volume: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month: 31 May–2 June
Year: 2021
Address: Reykjavik, Iceland (Online)
Editors: Simon Dobnik, Lilja Øvrelid
Venue: NoDaLiDa
Publisher: Linköping University Electronic Press, Sweden
Pages: 30–40
URL: https://aclanthology.org/2021.nodalida-main.4
Cite (ACL): Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, and Stephan Oepen. 2021. Large-Scale Contextualised Language Modelling for Norwegian. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 30–40, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal): Large-Scale Contextualised Language Modelling for Norwegian (Kutuzov et al., NoDaLiDa 2021)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/2021.nodalida-main.4.pdf
Code: ltgoslo/NorBERT
Data: NorNE