Paul Meurer

2018

pdf
The Abkhaz National Corpus
Paul Meurer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
Quote Extraction and Attribution from Norwegian Newspapers
Andrew Salway | Paul Meurer | Knut Hofland | Øystein Reigem
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf
Exploring Treebanks with INESS Search
Victoria Rosén | Helge Dyvik | Paul Meurer | Koenraad De Smedt
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus, using the wide coverage grammar NorGram. One part consisting of 350,000 words has been manually disambiguated using computer-generated discriminants. A larger part of 50 M words has been stochastically disambiguated. The treebank is dynamic: by global reparsing at certain intervals it is kept compatible with the latest versions of the grammar and the lexicon, which are continually further developed in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.

2013

In our paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language which contains essays collected from language tests on two different proficiency levels as well as personal data from the test takers. In addition, the corpus also contains texts and relevant personal data from native Norwegians as control data. The texts as well as the personal data are marked up in XML according to the TEI Guidelines. In order to be able to classify errors in the texts, we have introduced new attributes to the TEI corr and sic tags. For each error tag, a correct form is also in the text annotation. Finally, we employ an automatic tagger developed for standard Norwegian, the Oslo-Bergen Tagger, together with a facility for manual tag correction. As corpus query system, we are using the Corpus Workbench developed at the University of Stuttgart together with a web search interface developed at Aksis, University of Bergen. The system allows for searching for combinations of words, error types, grammatical annotation and personal data.

2005

pdf
Holistic regression testing for high-quality MT: some methodological and technological reflections
Stephan Oepen | Helge Dyvik | Dan Flickinger | Jan Tore Lønning | Paul Meurer | Victoria Rosén
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

Venues

lrec3
nodalida3
tmi2
geaf1
acl1
show all...

eamt1

Paul Meurer

2018

2017

2016

2013

2008

2007

2006

2005

2004

Co-authors

Venues