2016
NorGramBank: A ‘Deep’ Treebank for Norwegian
Helge Dyvik | Paul Meurer | Victoria Rosén | Koenraad De Smedt | Petter Haugereid | Gyri Smørdal Losnegaard | Gunn Inger Lyse | Martha Thunes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus, using the wide-coverage grammar NorGram. One part consisting of 350,000 words has been manually disambiguated using computer-generated discriminants. A larger part of 50 million words has been stochastically disambiguated. The treebank is dynamic: by global reparsing at certain intervals it is kept compatible with the latest versions of the grammar and the lexicon, which are continually developed further in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.
2013
The INESS Treebanking Infrastructure
Paul Meurer | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Gunn Inger Lyse | Gyri Smørdal Losnegaard | Martha Thunes
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)
2012
Creation of an Open Shared Language Resource Repository in the Nordic and Baltic Countries
Andrejs Vasiļjevs | Markus Forsberg | Tatiana Gornostay | Dorte Haltrup Hansen | Kristín Jóhannsdóttir | Gunn Lyse | Krister Lindén | Lene Offersgaard | Sussi Olsen | Bolette Pedersen | Eiríkur Rögnvaldsson | Inguna Skadiņa | Koenraad De Smedt | Ville Oksanen | Roberts Rozis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The META-NORD project has contributed to an open infrastructure for language resources (data and tools) under the META-NET umbrella. This paper presents the key objectives of META-NORD and reports on the results achieved in the first year of the project. META-NORD has mapped and described the national language technology landscape in the Nordic and Baltic countries in terms of language use, language technology and resources, main actors in the academy, industry, government and society; identified and collected the first batch of language resources in the Nordic and Baltic countries; documented, processed, linked, and upgraded the identified language resources to agreed standards and guidelines. The three horizontal multilingual actions in META-NORD are overviewed in this paper: linking and validating Nordic and Baltic wordnets, the harmonisation of multilingual Nordic and Baltic treebanks, and consolidating multilingual terminology resources across European countries. This paper also touches upon intellectual property rights for the sharing of language resources.