Lars Bungum
2026
Reformulate and Create, Don’t Translate: Creating Natural Prompts for Underserved Languages
Annika Simonsen | Mathias Stenlund | Lars Bungum | Marc Daníel Skipstað Volhardt | Hafsteinn Einarsson
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present a methodology for creating high-quality instruction prompts for low-resource Germanic languages that addresses a critical challenge: small annotator pools risk producing datasets reflecting narrow individual interests rather than diverse user needs. In this work, native speakers reformulate existing English prompts from OpenAssistant or create entirely original prompts, adapting them to reflect local contexts and natural language patterns while preserving broad task and topic diversity. This approach produced high-quality prompt datasets totaling 6,950 prompts across seven Germanic languages (German, Dutch, Swedish, Norwegian Bokmål/Nynorsk, Danish, Icelandic and Faroese) with validated coverage of diverse tasks and topics. Blind evaluation demonstrates that human-reformulated prompts significantly outperform synthetically generated prompts in naturalness and comprehensibility, particularly for low-resource languages like Icelandic and Faroese. For the bigger Scandinavian language, Danish, the difference was less pronounced. The prompt dataset is released under an open-source license at https://huggingface.co/datasets/AnnikaSimonsen/TrustLLM-reformulation-prompts.
Synthetic Instruction Generation for Low-Resource Nordic Languages: Viability and Limitations in LLM Instruction-Tuning
Mathias Stenlund | Annika Simonsen | Lars Bungum | Jan Ebert | Jiangtao Wang | Oleg Filatov | Hemanadhan Myneni | Morris Riedel | Hafsteinn Einarsson
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Pretrained large language models (LLMs) gain instruction-following abilities through instruction-tuning, a method which relies on datasets of instruction–response pairs. However, for low-resource languages, collecting human-authored instructions is costly, raising the question of whether synthetic instructions can substitute human-authored instructions for non-English languages. We compare instruction-tuning of a smaller pretrained LLM in four Nordic languages using (a) human-authored instructions paired with synthetic responses and (b) fully synthetic instruction–response pairs generated with a minimal-effort pipeline. Native-speaker evaluations show that models instruction-tuned on synthetic instructions perform on par with those trained on human-authored instructions for the largest Nordic languages, suggesting that minimal-effort synthetic instructions can serve as a practical alternative. In contrast, response quality deteriorates sharply for Icelandic, underscoring the limitations of current synthetic data generation pipelines when the LLM competence in the target language is weak. Overall, our results highlight that while synthetic instructions can enable cost-efficient instruction-tuning for the largest Nordic languages, they remain insufficient for Icelandic, clarifying when minimal-effort synthetic approaches suffice and when they fall short.
2023
NTNU-TRH system at the MultiGED-2023 Shared on Multilingual Grammatical Error Detection
Lars Bungum | Björn Gambäck | Arild Brandrud Næss
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning
2016
NTNUSentEval at SemEval-2016 Task 4: Combining General Classifiers for Fast Twitter Sentiment Analysis
Brage Ekroll Jahren | Valerij Fredriksen | Björn Gambäck | Lars Bungum
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
2015
Self-Organizing Maps for Classification of a Multi-Labeled Corpus
Lars Bungum | Björn Gambäck
Proceedings of the 12th International Conference on Natural Language Processing
Negation Scope Detection for Twitter Sentiment Analysis
Johan Reitan | Jørgen Faret | Björn Gambäck | Lars Bungum
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
2014
Agent-based modeling of language evolution
Torvald Lekvam | Björn Gambäck | Lars Bungum
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)
Extracting and Selecting Relevant Corpora for Domain Adaptation in MT
Lars Bungum
Proceedings of the 11th International Conference on Natural Language Processing
2013
Improving Word Translation Disambiguation by Capturing Multiword Expressions with Dictionaries
Lars Bungum | Björn Gambäck | André Lynum | Erwin Marsi
Proceedings of the 9th Workshop on Multiword Expressions
NTNU-CORE: Combining strong features for semantic similarity
Erwin Marsi | Hans Moen | Lars Bungum | Gleb Sizov | Björn Gambäck | André Lynum
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
NTNU: Domain Semi-Independent Short Message Sentiment Classification
Øyvind Selmer | Mikael Brevik | Björn Gambäck | Lars Bungum
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)