Dag Haug


2023

pdf
Rules and neural nets for morphological tagging of Norwegian - Results and challenges
Dag Haug | Ahmet Yildirim | Kristin Hagen | Anders Nøklestad
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

This paper reports on efforts to improve the Oslo-Bergen Tagger for Norwegian morphological tagging. We train two deep neural network-based taggers using the recently introduced Norwegian pre-trained encoder (a BERT model for Norwegian). The first network is a sequence-to-sequence encoder-decoder and the second is a sequence classifier. We test both these configurations in a hybrid system where they combine with the existing rule-based system, and on their own. The sequence-to-sequence system performs better in the hybrid configuration, but the classifier system performs so well that combining it with the rules is actually slightly detrimental to performance.

2022

pdf
NARCNorwegian Anaphora Resolution Corpus
Petter Mæhlum | Dag Haug | Tollef Jørgensen | Andre Kåsen | Anders Nøklestad | Egil Rønningstad | Per Erik Solberg | Erik Velldal | Lilja Øvrelid
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

We present the Norwegian Anaphora Resolution Corpus (NARC), the first publicly available corpus annotated with anaphoric relations between noun phrases for Norwegian. The paper describes the annotated data for 326 documents in Norwegian Bokmål, together with inter-annotator agreement and discussions of relevant statistics. We also present preliminary modelling results which are comparable to existing corpora for other languages, and discuss relevant problems in relation to both modelling and the annotations themselves.

2018

pdf
Expletives in Universal Dependency Treebanks
Gosse Bouma | Jan Hajic | Dag Haug | Joakim Nivre | Per Erik Solberg | Lilja Øvrelid
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

Although treebanks annotated according to the guidelines of Universal Dependencies (UD) now exist for many languages, the goal of annotating the same phenomena in a cross-linguistically consistent fashion is not always met. In this paper, we investigate one phenomenon where we believe such consistency is lacking, namely expletive elements. Such elements occupy a position that is structurally associated with a core argument (or sometimes an oblique dependent), yet are non-referential and semantically void. Many UD treebanks identify at least some elements as expletive, but the range of phenomena differs between treebanks, even for closely related languages, and sometimes even for different treebanks for the same language. In this paper, we present criteria for identifying expletives that are applicable across languages and compatible with the goals of UD, give an overview of expletives as found in current UD treebanks, and present recommendations for the annotation of expletives so that more consistent annotation can be achieved in future releases.

2010

pdf
Porting an Ancient Greek and Latin Treebank
John Lee | Dag Haug
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We have recently converted a dependency treebank, consisting of ancient Greek and Latin texts, from one annotation scheme to another that was independently designed. This paper makes two observations about this conversion process. First, we show that, despite significant surface differences between the two treebanks, a number of straightforward transformation rules yield a substantial level of compatibility between them, giving evidence for their sound design and high quality of annotation. Second, we analyze some linguistic annotations that require further disambiguation, proposing some simple yet effective machine learning methods.