2015
pdf
Evaluation of the domain adaptation of MT systems in ACCURAT
Gregor Thurmair
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
2014
pdf
abs
Conceptual transfer: Using local classifiers for transfer selection
Gregor Thurmair
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
A key challenge for Machine Translation is transfer selection, i.e. finding the right translation for a given word from a set of alternatives (1:n). This problem becomes more important as the dictionary grows, since the number of alternatives increases. The contribution presents a novel approach to transfer selection, called conceptual transfer, in which selection is done using classifiers based on the conceptual context of a translation candidate on the source language side. Such classifiers are built automatically by parallel corpus analysis: subcorpora are created for each translation of a 1:n package, and correlating concepts in these subcorpora are identified as features of the classifier. The resulting resource can easily be linked to transfer components of MT systems as it does not depend on internal analysis structures. Tests show that conceptual transfer outperforms the selection techniques currently used in operational MT systems.
2013
pdf
A modular open-source focused crawler for mining monolingual and bilingual corpora from the web
Vassilis Papavassiliou
|
Prokopis Prokopidis
|
Gregor Thurmair
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
2012
pdf
abs
Large Scale Lexical Analysis
Gregor Thurmair
|
Vera Aleksić
|
Christoph Schwarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents a lexical analysis component as implemented in the PANACEA project. The goal is to automatically extract lexicon entries from crawled corpora, in an attempt to use corpus-based methods for high-quality linguistic text processing, and to focus on the quality of data without neglecting quantitative aspects. The task of lexical analysis is to assign linguistic information (such as part of speech, inflectional class, gender, subcategorisation frame, and semantic properties) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotations for later (syntactic) disambiguation, be it tagging or full parsing. The paper presents an approach for assigning part-of-speech tags for German and English to large input corpora (> 50 million tokens), providing a workflow which takes crawled corpora as input and produces POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.
pdf
Efficiency-based evaluation of aligners for industrial applications
Antonio Toral
|
Marc Poch
|
Pavel Pecina
|
Gregor Thurmair
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
pdf
EASTIN-CL: A multilingual front-end to a database of Assistive Technology products
Gregor Thurmair
|
Andrea Agnoletto
|
Valerio Gower
|
Roberts Rozis
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
pdf
Creating Term and Lexicon Entries from Phrase Tables
Gregor Thurmair
|
Vera Aleksić
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
2011
pdf
Personal Translator at WMT2011
Vera Aleksić
|
Gregor Thurmair
Proceedings of the Sixth Workshop on Statistical Machine Translation
2009
pdf
Comparing different architectures of hybrid Machine Translation systems
Gregor Thurmair
Proceedings of Machine Translation Summit XII: Posters
2007
pdf
Generation issues in machine translation
Gregor Thurmair
Proceedings of the Workshop on Using corpora for natural language generation
pdf
bib
Proceedings of the Workshop on Automatic procedures in MT evaluation
Gregor Thurmair
|
Khalid Choukri
|
Bente Maegaard
Proceedings of the Workshop on Automatic procedures in MT evaluation
Automatic evaluation in MT system production
Gregor Thurmair
Proceedings of the Workshop on Automatic procedures in MT evaluation
2005
pdf
abs
Improving Machine Translation Quality
Gregor Thurmair
Proceedings of Machine Translation Summit X: Invited papers
This paper reports on measures to improve the quality of MT systems, by using a hybrid system architecture which adds corpus-based and statistical components to an existing rule-based system backbone. The focus is on improving the accuracy of the dictionary resources.
2004
pdf
Multilingual Content Processing
Gregor Thurmair
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2003
pdf
Making term extraction tools usable
Gregor Thurmair
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT
bib
The Comprendium Translator system
Juan A. Alonso
|
Gregor Thurmair
Proceedings of Machine Translation Summit IX: System Presentations
2002
pdf
From Resources to Applications. Designing the Multilingual ISLE Lexical Entry
Sue Atkins
|
Nuria Bel
|
Francesca Bertagna
|
Pierrette Bouillon
|
Nicoletta Calzolari
|
Christiane Fellbaum
|
Ralph Grishman
|
Alessandro Lenci
|
Catherine MacLeod
|
Martha Palmer
|
Gregor Thurmair
|
Marta Villegas
|
Antonio Zampolli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2001
pdf
abs
The ISLE in the ocean. Transatlantic standards for multilingual lexicons (with an eye to machine translation)
Nicoletta Calzolari
|
Alessandro Lenci
|
Antonio Zampolli
|
Nuria Bel
|
Marta Villegas
|
Gregor Thurmair
Proceedings of Machine Translation Summit VIII
The ISLE project is a continuation of the long-standing EAGLES initiative, carried out under the Human Language Technology (HLT) programme in collaboration between American and European groups in the framework of the EU-US International Research Co-operation, supported by NSF and EC. In this paper we concentrate on the current position of the ISLE Computational Lexicon Working Group (CLWG), whose activities aim at defining a general schema for a multilingual lexical entry (MILE), as the basis for a standard framework for multilingual computational lexicons. The needs and features of existing Machine Translation systems provide the main reference points for the process of consensual definition of the MILE. The overall structure of the MILE will be illustrated with particular attention to some of the issues raised for multilingual lexicons by the need to express complex transfer conditions among translation equivalents.
pdf
abs
The Open Lexicon Interchange Format (OLIF) comes of age
Christian Lieske
|
Susan McCormick
|
Gregor Thurmair
Proceedings of Machine Translation Summit VIII
This paper summarizes the current status of version 2 of the Open Lexicon Interchange Format (OLIF). As a natural extension of the OLIF prototype (OLIF version 1), version 2 has been modified with respect to content and formalization (e.g., it is now XML-compliant). These enhancements now make it possible to use OLIF in a variety of Natural Language Processing applications and general language technology environments (e.g., terminology management systems). At the time of writing, several industrial partners of the OLIF Consortium had already started work on implementing OLIF support. Details on OLIF can be found on www.olif.net.
2000
pdf
TQPro: Quality Tools for the Translation Process
Gregor Thurmair
Proceedings of Translating and the Computer 22
1999
The L&H approach to development of tools for new languages
Gregor Thurmair
|
Johannes Ritzke
EAMT Workshop: EU and the new languages
1997
pdf
abs
Exchange Interfaces for Translation Tools
Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers
The following paper presents an overview of current discussions of exchange interfaces in the area of multilingual processing. It first discusses the principles which are relevant for the definition of such interfaces; it then presents a state of the art and a proposal in the area of text interfaces, translation memory interfaces, and terminology exchange. The approach is bottom-up, i.e. it starts from existing interfaces and existing requirements, and intends to be of practical use. It reflects the discussions in current multilingual research projects of the EC, like OTELO and AVENTINUS.
pdf
abs
From METAL to T1: Systems and Components for Machine Translation Applications
Ulrike Schwall
|
Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers
This paper describes the progress which has been made to make MT systems usable in professional environments. After many years of significant investment, it was decided that the time was ripe for the METAL machine translation system to be better positioned in the marketplace. Two lines of action were followed: introducing the system onto the PC market, using the GMS-T1 as a concrete example; and reusing system components in customized solutions, using the AVENTINUS project, a multilingual information processing application, as an example. Both lines of action have far-reaching consequences for system development. But they also create new opportunities to improve the system's capabilities and flexibility.
1995
pdf
Multilingual information processing
Gregor Thurmair
Proceedings of Machine Translation Summit V
1991
pdf
bib
abs
An Architecture Sketch of Eurotra-II
Jörg Schütz
|
Gregor Thurmair
|
Roberto Cencioni
Proceedings of Machine Translation Summit III: Papers
This paper outlines a new architecture for a NLP/MT development environment for the EUROTRA project, which will be fully operational in the 1993-94 time frame. The proposed architecture provides a powerful and flexible platform for extensions and enhancements to the existing EUROTRA translation philosophy and the linguistic work done so far, thus allowing the reusability of existing grammatical and lexical resources, while ensuring the suitability of EUROTRA methods and tools for other NLP/MT system developers and researchers.
1990
pdf
Parsing for Grammar and Style Checking
Gregor Thurmair
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics