The Norwegian Dependency Treebank

Per Erik Solberg, Arne Skjærholt, Lilja Øvrelid, Kristin Hagen, Janne Bondi Johannessen


Abstract
The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmål and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publicly available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles were applied in specific cases. We then present the selection of texts and their distribution across genres, as well as the annotation process and an evaluation of the inter-annotator agreement. Finally, we present the first results of data-driven dependency parsing of Norwegian, contrasting four state-of-the-art dependency parsers trained on the treebank. The consistency and the parsability of this treebank are shown to be comparable to those of other large treebank initiatives.
Anthology ID:
L14-1273
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
789–795
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/303_Paper.pdf
Cite (ACL):
Per Erik Solberg, Arne Skjærholt, Lilja Øvrelid, Kristin Hagen, and Janne Bondi Johannessen. 2014. The Norwegian Dependency Treebank. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 789–795, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
The Norwegian Dependency Treebank (Solberg et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/303_Paper.pdf