Helge Dyvik


Exploring Treebanks with INESS Search
Victoria Rosén | Helge Dyvik | Paul Meurer | Koenraad De Smedt
Proceedings of the 21st Nordic Conference on Computational Linguistics


NorGramBank: A ‘Deep’ Treebank for Norwegian
Helge Dyvik | Paul Meurer | Victoria Rosén | Koenraad De Smedt | Petter Haugereid | Gyri Smørdal Losnegaard | Gunn Inger Lyse | Martha Thunes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus using the wide-coverage grammar NorGram. One part, consisting of 350,000 words, has been manually disambiguated using computer-generated discriminants. A larger part of 50 million words has been stochastically disambiguated. The treebank is dynamic: global reparsing at certain intervals keeps it compatible with the latest versions of the grammar and the lexicon, which are continually developed in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.


The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking
Victoria Rosén | Petter Haugereid | Martha Thunes | Gyri S. Losnegaard | Helge Dyvik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. In building the INESS Norwegian treebank, necessary lexical information is often missing from the morphology or lexicon. The approach used to build the treebank is incremental parsebanking: a corpus is parsed with an existing grammar, and the analyses are efficiently disambiguated by annotators. When the intended analysis is unavailable after parsing, the reason is often that necessary information is not available in the lexicon. INESS has therefore implemented a text preprocessing interface where annotators can enter unrecognized words before parsing. This may concern words that are unknown to the morphology and/or lexicon, and also words that are known but for which important information is missing. When this information is added, either during text preprocessing or during disambiguation, the intended analysis can be chosen after reparsing and stored in the treebank. The lexical information added to the lexicon in this way may be of great interest both to lexicographers and to other language technology efforts, and the enriched lexical resource being developed will be made available at the end of the project.


The INESS Treebanking Infrastructure
Paul Meurer | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Gunn Inger Lyse | Gyri Smørdal Losnegaard | Martha Thunes
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

ParGramBank: The ParGram Parallel Treebank
Sebastian Sulger | Miriam Butt | Tracy Holloway King | Paul Meurer | Tibor Laczkó | György Rákosi | Cheikh Bamba Dione | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Agnieszka Patejuk | Özlem Çetinoğlu | I Wayan Arka | Meladel Mistica
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Holistic regression testing for high-quality MT: some methodological and technological reflections
Stephan Oepen | Helge Dyvik | Dan Flickinger | Jan Tore Lønning | Paul Meurer | Victoria Rosén
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

SEM-I Rational MT: Enriching Deep Grammars with a Semantic Interface for Scalable Machine Translation
Dan Flickinger | Jan Tore Lønning | Helge Dyvik | Stephan Oepen | Francis Bond
Proceedings of Machine Translation Summit X: Papers

In the LOGON machine translation system, where semantic transfer using Minimal Recursion Semantics is being developed in conjunction with two existing broad-coverage grammars of Norwegian and English, we motivate the use of a grammar-specific semantic interface (SEM-I) to facilitate the construction and maintenance of a scalable translation engine. The SEM-I is a theoretically grounded component of each grammar, capturing several classes of lexical regularities while also serving the crucial engineering function of supplying a reliable and complete specification of the elementary predications the grammar can realize. We make extensive use of underspecification and type hierarchies to maximize generality and precision.


Som å kapp-ete med trollet? – Towards MRS-based Norwegian-English machine translation
Stephan Oepen | Helge Dyvik | Jan Tore Lønning | Erik Velldal | Dorothee Beerman | John Carroll | Dan Flickinger | Lars Hellan | Janne Bondi Johannessen | Paul Meurer | Torbjørn Nordgård | Victoria Rosén
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages


The Parallel Grammar Project
Miriam Butt | Helge Dyvik | Tracy Holloway King | Hiroshi Masuichi | Christian Rohrer
COLING-02: Grammar Engineering and Evaluation


Linguistics and Machine Translation
Helge Dyvik
Proceedings of the 8th Nordic Conference of Computational Linguistics (NODALIDA 1991)


Parsing basert på LFG: Et MIT/Xerox-system applisert på norsk (Parsing based on LFG: An MIT/Xerox system applied to Norwegian) [In Norwegian]
Helge Dyvik | Knut Hofland
Proceedings of the 4th Nordic Conference of Computational Linguistics (NODALIDA 1983)