Gregor Thurmair

2015

Evaluation of the domain adaptation of MT systems in ACCURAT
Gregor Thurmair
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Conceptual transfer: Using local classifiers for transfer selection
Gregor Thurmair
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

A key challenge for Machine Translation is transfer selection, i.e. to find the right translation for a given word from a set of alternatives (1:n). This problem becomes the more important the larger the dictionary is, as the number of alternatives increases. The contribution presents a novel approach for transfer selection, called conceptual transfer, where selection is done using classifiers based on the conceptual context of a translation candidate on the source language side. Such classifiers are built automatically by parallel corpus analysis: Creating subcorpora for each translation of a 1:n package, and identifying correlating concepts in these subcorpora as features of the classifier. The resulting resource can easily be linked to transfer components of MT systems as it does not depend on internal analysis structures. Tests show that conceptual transfer outperforms the selection techniques currently used in operational MT systems.

2013

A modular open-source focused crawler for mining monolingual and bilingual corpora from the web
Vassilis Papavassiliou | Prokopis Prokopidis | Gregor Thurmair
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

2012

Large Scale Lexical Analysis
Gregor Thurmair | Vera Aleksić | Christoph Schwarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The following paper presents a lexical analysis component as implemented in the PANACEA project. The goal is to automatically extract lexicon entries from crawled corpora, in an attempt to use corpus-based methods for high-quality linguistic text processing, and to focus on the quality of data without neglecting quantitative aspects. Lexical analysis has the task to assign linguistic information (like: part of speech, inflectional class, gender, subcategorisation frame, semantic properties etc.) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotation for later (syntactic) disambiguation, be it tagging, or full parsing. The paper presents an approach for assigning part-of-speech tags for German and English to large input corpora (> 50 mio tokens), providing a workflow which takes as input crawled corpora and provides POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.

Efficiency-based evaluation of aligners for industrial applications
Antonio Toral | Marc Poch | Pavel Pecina | Gregor Thurmair
Proceedings of the 16th Annual conference of the European Association for Machine Translation

EASTIN-CL: A multilingual front-end to a database of Assistive Technology products
Gregor Thurmair | Andrea Agnoletto | Valerio Gower | Roberts Rozis
Proceedings of the 16th Annual conference of the European Association for Machine Translation

Creating Term and Lexicon Entries from Phrase Tables
Gregor Thurmair | Vera Aleksić
Proceedings of the 16th Annual conference of the European Association for Machine Translation

This paper reports on measures to improve the quality of MT systems, by using a hybrid system architecture which adds corpus-based and statistical components to an existing rule-based system backbone. The focus is on improving the accuracy of the dictionary resources.

2004

Multilingual Content Processing
Gregor Thurmair
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Making term extraction tools usable
Gregor Thurmair
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT

The Comprendium Translator system
Juan A. Alonso | Gregor Thurmair
Proceedings of Machine Translation Summit IX: System Presentations

2002

2001

The ISLE in the ocean. Transatlantic standards for multilingual lexicons (with an eye to machine translation)
Nicoletta Calzolari | Alessandro Lenci | Antonio Zampolli | Nuria Bel | Marta Villegas | Gregor Thurmair
Proceedings of Machine Translation Summit VIII

The ISLE project is a continuation of the long standing EAGLES initiative, carried out under the Human Language Technology (HLT) programme in collaboration between American and European groups in the framework of the EU-US International Research Co-operation, supported by NSF and EC. In this paper we concentrate on the current position of the ISLE Computational Lexicon Working Group (CLWG), whose activities aim at defining a general schema for a multilingual lexical entry (MILE), as the basis for a standard framework for multilingual computational lexicons. The needs and features of existing Machine Translation systems provide the main reference points for the process of consensual definition of the MILE. The overall structure of the MILE will be illustrated with particular attention to some of the issues raised for multilingual lexicons by the need of expressing complex transfer conditions among translation equivalents.

The Open Lexicon Interchange Format (OLIF) comes of age
Christian Lieske | Susan McCormick | Gregor Thurmair
Proceedings of Machine Translation Summit VIII

This paper summarizes the current status of version 2 of the Open Lexicon Interchange Format (OLIF). As a natural extension of the OLIF prototype (OLIF version 1), version 2 has been modified with respect to content and formalization (e.g., it is now XML-compliant). These enhancements now make it possible to use OLIF in a variety of Natural Language Processing applications and general language technology environments (e.g., terminology management systems). At the time of writing, several industrial partners of the OLIF Consortium had already started work on implementing OLIF support. Details on OLIF can be found on www.olif.net.

2000

TQPro: Quality Tools for the Translation Process
Gregor Thurmair
Proceedings of Translating and the Computer 22

1999

The L&H approach to development of tools for new languages
Gregor Thurmair | Johannes Ritzke
EAMT Workshop: EU and the new languages

1997

Exchange Interfaces for Translation Tools
Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers

The following paper presents an overview of current discussions of exchange interfaces in the area of multilingual processing. It first discusses the principles which are relevant for the definition of such interfaces; it then presents a state of the art and a proposal in the area of text interfaces, translation memory interfaces, and terminology exchange. The approach is bottom-up, i.e. it starts from existing interfaces and existing requirements, and intends to be of practical use. It reflects the discussions in current multilingual research projects of the EC, like OTELO and AVENTINUS.

From METAL to T1: Systems and Components for Machine Translation Applications
Ulrike Schwall | Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers

This paper describes the progress which has been made to make MT systems usable in professional environments. After many years of significant investment, it was decided that the time was ripe for the METAL machine translation system to be better positioned in the market place. Two lines of action were followed: Introducing the system onto the PC market, using the GMS-T1 as a concrete example; Reusing system components in customized solutions, using the AVENTINUS project as an example, which is a multilingual information processing application. Both lines of action have far-reaching consequences for system development. But they also create new opportunities to improve the system's capabilities and flexibility.

1995

Multilingual information processing
Gregor Thurmair
Proceedings of Machine Translation Summit V

1991

An Architecture Sketch of Eurotra-II
Jörg Schütz | Gregor Thurmair | Roberto Cencioni
Proceedings of Machine Translation Summit III: Papers

This paper outlines a new architecture for a NLP/MT development environment for the EUROTRA project, which will be fully operational in the 1993-94 time frame. The proposed architecture provides a powerful and flexible platform for extensions and enhancements to the existing EUROTRA translation philosophy and the linguistic work done so far, thus allowing the reusability of existing grammatical and lexical resources, while ensuring the suitability of EUROTRA methods and tools for other NLP/MT system developers and researchers.

1990

Parsing for Grammar and Style Checking
Gregor Thurmair
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics
