Claus Povlsen

2016

Language resources (LR) are indispensable for the development of tools for machine translation (MT) or various kinds of computer-assisted translation (CAT). In particular language corpora, both parallel and monolingual are considered most important for instance for MT, not only SMT but also hybrid MT. The Language Technology Observatory will provide easy access to information about LRs deemed to be useful for MT and other translation tools through its LR Catalogue. In order to determine what aspects of an LR are useful for MT practitioners, a user study was made, providing a guide to the most relevant metadata and the most relevant quality criteria. We have seen that many resources exist which are useful for MT and similar work, but the majority are for (academic) research or educational use only, and as such not available for commercial use. Our work has revealed a list of gaps: coverage gap, awareness gap, quality gap, quantity gap. The paper ends with recommendations for a forward-looking strategy.

2014

pdf abs
Encompassing a spectrum of LT users in the CLARIN-DK Infrastructure
Lina Henriksen | Dorte Haltrup Hansen | Bente Maegaard | Bolette Sandford Pedersen | Claus Povlsen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

CLARIN-DK is a platform with language resources constituting the Danish part of the European infrastructure CLARIN ERIC. Unlike some other language based infrastructures CLARIN-DK is not solely a repository for upload and storage of data, but also a platform of web services permitting the user to process data in various ways. This involves considerable complications in relation to workflow requirements. The CLARIN-DK interface must guide the user to perform the necessary steps of a workflow; even when the user is inexperienced and perhaps has an unclear conception of the requested results. This paper describes a user driven approach to creating a user interface specification for CLARIN-DK. We indicate how different user profiles determined different crucial interface design options. We also describe some use cases established in order to give illustrative examples of how the platform may facilitate research.

2013

pdf
Let’sMT! as a Learning Platform for SMT
Hanne Fersøe | Dorte Haltrup Hansen | Lene Offersgaard | Susi Olsen | Claus Povlsen
Proceedings of Machine Translation Summit XIV: User track

2010

pdf abs
Incorporating Speech Synthesis in the Development of a Mobile Platform for e-learning.
Justus Roux | Pieter Scholtz | Daleen Klop | Claus Povlsen | Bart Jongejan | Asta Magnusdottir
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This presentation and accompanying demonstration focuses on the development of a mobile platform for e-learning purposes with enhanced text-to-speech capabilities. It reports on an international consortium project entitled Mobile E-learning for Africa (MELFA), which includes a reading and literacy training component, particularly focusing on an African language, isiXhosa. The high penetration rate of mobile phones within the African continent has created new opportunities for delivering various kinds of information, including e-learning material to communities that have not had appropriate infrastructures. Aspects of the mobile platform development are described paying attention to basic functionalities of the user interface, as well as to the underlying web technologies involved. Some of the main features of the literacy training module are described, such as grapheme-sound correspondence, syllabification-sound relationships, varying tempo of presentation. A particular point is made for using HMM (HTS) synthesis in this case, as it seems to be very appropriate for less resourced languages.

2008

pdf abs
Merging a Syntactic Resource with a WordNet: a Feasibility Study of a Merge between STO and DanNet
Bolette Sandford Pedersen | Anna Braasch | Lina Henriksen | Sussi Olsen | Claus Povlsen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a feasibility study of a merge between SprogTeknologisk Ordbase (STO), which contains morphological and syntactic information, and DanNet, which is a Danish WordNet containing semantic information in terms of synonym sets and semantic relations. The aim of the merge is to develop a richer, composite resource which we believe will have a broader usage perspective than the two seen in isolation. In STO, the organizing principle is based on the observable syntactic features of a lemmas near context (labeled syntactic units or SynUs). In contrast, the basic unit in DanNet is constituted by semantic senses or - in wordnet terminology - synonym sets (synsets). The merge of the two resources is thus basically to be understood as a linking between SynUs and synsets. In the paper we discuss which parts of the merge can be performed semi-automatically and which parts require manual linguistic matching procedures. We estimate that this manual work will amount to approx. 39% of the lexicon material.

pdf
Domain specific MT in use
Lene Offersgaard | Claus Povlsen | Lisbeth Almsten | Bente Maegaard
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

2007

pdf
Patent documentation – comparison of two MT strategies
Lene Offersgaard | Claus Povlsen
Proceedings of the Workshop on Patent translation

2006

pdf abs
EuroTermBank - a Terminology Resource based on Best Practice
Lina Henriksen | Claus Povlsen | Andrejs Vasiljevs
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The new EU member countries face the problems of terminology resource fragmentation and lack of coordination in terminology development in general. The EuroTermBank project aims at contributing to improve the terminology infrastructure of the new EU countries and the project will result in a centralized online terminology bank - interlinked to other terminology banks and resources - for languages of the new EU member countries. The main focus of this paper is on a description of how to identify best practice within terminology work seen from a broad perspective. Surveys of real life terminology work have been conducted and these surveys have resulted in identification of scenario specific best practice descriptions of terminology work. Furthermore, this paper will present an outline of the specific criteria that have been used for selection of existing term resources to be included in the EuroTermBank database.

The MULINCO project (MUltiLINgual Corpus of the University of Copenhagen) started early 2005. The purpose of this cross-disciplinary project is to create a corpus platform for education and research in monolingual and translation studies. The project covers two main types of corpus texts: literary and non-literary. The platform is being developed using available tools as far as possible, and integrating them in a very open architecture. In this paper we describe the current status and future developments of both the text and tool side of the corpus platform, and we show some examples of student exercises taking advantage of tagged and aligned texts.

2001

pdf abs
Ape: reducing the monkey business in post-editing by automating the task intelligently
Claus Povlsen | Annelise Bech
Proceedings of Machine Translation Summit VIII

For a professional user of MT, quality, performance and cost efficiency are critical. It is therefore surprising that only little attention – both in theory and in practice - has been given to the task of post-editing machine translated texts. This paper will focus on this important user aspect and demonstrate that substantial savings in time and effort can be achieved by implementing intelligent automatic tools. Our point of departure is the PaTrans MT-system, developed by CST and used by the Danish translation company Lingtech. An intelligent post-editing facility, Ape, has been developed and added to the system. We will outline and discuss this mechanism and its positive effects on the output. The underlying idea of the intelligent post-editing facility is to exploit the lexical and grammatical knowledge already present in the MT-system’s linguistic components. Conceptually, our approach is general, although its implementation remains system specific. Surveys of post-editor satisfaction and cost-efficiency improvements, as well as a quantitative, benchmark-based evaluation of the effect of Ape demonstrate the success of the approach and encourage further development.

1994

pdf
Natural language processing in dialogue systems with spoken input
Claus Povlsen
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)