2021
eTranslation’s Submissions to the WMT 2021 News Translation Task
Csaba Oravecz | Katina Bontcheva | David Kolovratník | Bhavani Bhaskar | Michael Jellinghaus | Andreas Eisele
Proceedings of the Sixth Conference on Machine Translation
The paper describes the three NMT models submitted by the eTranslation team to the WMT 2021 news translation shared task. We developed systems for language pairs that are actively used in the European Commission’s eTranslation service. In the WMT news task, recent years have seen a steady increase in the computational resources needed to train deep and complex architectures that yield competitive systems. We took a different approach and explored alternative strategies focusing on data selection and filtering to improve the performance of baseline systems. In the domain-constrained task for the French–German language pair, our approach resulted in the best system by a significant margin in BLEU. For the other two systems (English–German and English–Czech), we tried to build competitive models using standard best practices.
2020
eTranslation’s Submissions to the WMT 2020 News Translation Task
Csaba Oravecz | Katina Bontcheva | László Tihanyi | David Kolovratnik | Bhavani Bhaskar | Adrien Lardilleux | Szymon Klocek | Andreas Eisele
Proceedings of the Fifth Conference on Machine Translation
The paper describes the submissions of the eTranslation team to the WMT 2020 news translation shared task. Leveraging the experience from the team’s participation last year, we developed systems for 5 language pairs with various strategies. Compared to last year, for some language pairs we dedicated far more resources to training and tried to follow standard best practices to build competitive systems that can achieve good results in the rankings. By using deep and complex architectures, we sacrificed direct re-usability of our systems in production environments, but evaluation showed that this approach could result in better models that significantly outperform baseline architectures. We also submitted two systems to the zero-shot robustness task; these submissions are described briefly in this paper as well.
2019
eTranslation’s Submissions to the WMT 2019 News Translation Task
Csaba Oravecz | Katina Bontcheva | Adrien Lardilleux | László Tihanyi | Andreas Eisele
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
This paper describes the submissions of the eTranslation team to the WMT 2019 news translation shared task. The systems have been developed with the aim of identifying and following rather than establishing best practices, under the constraints imposed by the low-resource training and decoding environment normally used for our production systems. Thus, most of the findings and results are transferable to systems used in the eTranslation service. Evaluations suggest that this approach is able to produce decent models with good performance and speed without the overhead of using prohibitively deep and complex architectures.
2012
MultiUN v2: UN Documents with Multilingual Alignments
Yu Chen | Andreas Eisele
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
MultiUN is a multilingual parallel corpus extracted from the official documents of the United Nations. It is available in the six official languages of the UN, and a small portion of it is also available in German. This paper presents a major update on the first public version of the corpus released in 2010. This version 2 consists of over 513,091 documents, including more than 9% of new documents retrieved from the United Nations official document system. We applied several modifications to the corpus preparation method. In this paper, we describe the methods we used for processing the UN documents and aligning the sentences. The most significant improvement compared to the previous release is the newly added multilingual sentence alignment information. The alignment information is encoded together with the text in XML instead of in additional files. Our representation of the sentence alignment allows quick construction of aligned texts that are parallel in an arbitrary number of languages, which is essential for building machine translation systems.
DGT-TM: A freely available Translation Memory in 22 languages
Ralf Steinberger | Andreas Eisele | Szymon Klocek | Spyridon Pilos | Patrick Schlüter
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The European Commission's (EC) Directorate General for Translation, together with the EC's Joint Research Centre, is making available a large translation memory (TM; i.e. sentences and their professionally produced translations) covering twenty-two official European Union (EU) languages and their 231 language pairs. Such a resource is typically used by translation professionals in combination with TM software to improve the speed and consistency of their translations. However, this resource also has many uses for translation studies and for language technology applications, including Statistical Machine Translation (SMT), terminology extraction, Named Entity Recognition (NER), multilingual classification and clustering, and many more. In this reference paper for DGT-TM, we introduce this new resource, provide statistics regarding its size, and explain how it was produced and how to use it.
2010
English — Oromo Machine Translation: An Experiment Using a Statistical Approach
Sisay Adugna | Andreas Eisele
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper deals with the translation of English documents to Oromo using statistical methods. Whereas English is the lingua franca of online information, Oromo, despite its relatively wide distribution within Ethiopia and neighbouring countries like Kenya and Somalia, is one of the most resource-scarce languages. The paper has two main goals. The first is to test how far we can go with the limited parallel corpus available for the English–Oromo language pair, and how applicable existing Statistical Machine Translation (SMT) systems are to this language pair. The second is to analyze the output of the system with the objective of identifying the challenges that need to be tackled. Since the language is resource-scarce, we could not obtain as many parallel documents as we wanted for the experiment. However, using a limited corpus of 20,000 bilingual sentences and 163,000 monolingual sentences, a translation accuracy in terms of BLEU score of 17.74% was achieved.
MultiUN: A Multilingual Corpus from United Nation Documents
Andreas Eisele | Yu Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). This corpus is available in all 6 official languages of the UN, consisting of around 300 million words per language. We describe the methods we used for crawling, document formatting, and sentence alignment. This corpus also includes a common test set for machine translation. We present the results of a French-Chinese machine translation experiment performed on this corpus.
Integrating a Rule-based with a Hierarchical Translation System
Yu Chen | Andreas Eisele
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Recent developments in hybrid systems that combine rule-based machine translation (RBMT) systems with statistical machine translation (SMT) generally neglect the fact that RBMT systems tend to produce more syntactically well-formed translations than data-driven systems. This paper proposes a method that alleviates this issue by preserving more of the useful structures produced by RBMT systems and utilizing them in an SMT system that operates on hierarchical structures instead of flat phrases alone. For our experiments, we use Joshua as the decoder. It is the first attempt towards a tighter integration of MT systems from different paradigms that both support hierarchical analysis. Preliminary results show consistent improvements over the previous approach.
English to Bangla Phrase-Based Machine Translation
Zahurul Islam | Jörg Tiedemann | Andreas Eisele
Proceedings of the 14th Annual conference of the European Association for Machine Translation
Hierarchical Hybrid Translation between English and German
Yu Chen | Andreas Eisele
Proceedings of the 14th Annual conference of the European Association for Machine Translation
Further Experiments with Shallow Hybrid MT Systems
Christian Federmann | Andreas Eisele | Yu Chen | Sabine Hunsicker | Jia Xu | Hans Uszkoreit
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
2009
Intersecting Multilingual Data for Faster and Better Statistical Translations
Yu Chen | Martin Kay | Andreas Eisele
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Towards an effective toolkit for translators
Andreas Eisele
Proceedings of Translating and the Computer 31
Combining Multi-Engine Translations with Moses
Yu Chen | Michael Jellinghaus | Andreas Eisele | Yi Zhang | Sabine Hunsicker | Silke Theison | Christian Federmann | Hans Uszkoreit
Proceedings of the Fourth Workshop on Statistical Machine Translation
Translation Combination using Factored Word Substitution
Christian Federmann | Silke Theison | Andreas Eisele | Hans Uszkoreit | Yu Chen | Michael Jellinghaus | Sabine Hunsicker
Proceedings of the Fourth Workshop on Statistical Machine Translation
2008
Improving Statistical Machine Translation Efficiency by Triangulation
Yu Chen | Andreas Eisele | Martin Kay
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In current phrase-based Statistical Machine Translation systems, more training data is generally better than less. However, a larger data set eventually produces a larger model that enlarges the search space for the decoder, and consequently requires more time and more resources to translate. This paper describes an attempt to reduce the model size by filtering out the less probable entries based on testing correlation using additional training data in an intermediate third language. The central idea behind the approach is triangulation, the process of incorporating multilingual knowledge in a single system, which eventually utilizes parallel corpora available in more than two languages. We conducted experiments using the Europarl corpus to evaluate our approach. The model size can be reduced by up to 70% while the translation quality is preserved.
Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System
Andreas Eisele | Christian Federmann | Hervé Saint-Amand | Michael Jellinghaus | Teresa Herrmann | Yu Chen
Proceedings of the Third Workshop on Statistical Machine Translation
Hybrid machine translation architectures within and beyond the EuroMatrix project
Andreas Eisele | Christian Federmann | Hans Uszkoreit | Hervé Saint-Amand | Martin Kay | Michael Jellinghaus | Sabine Hunsicker | Teresa Herrmann | Yu Chen
Proceedings of the 12th Annual conference of the European Association for Machine Translation
Hybrid Architectures for Multi-Engine Machine Translation
Andreas Eisele
Proceedings of Translating and the Computer 30
2007
Multi-Engine Machine Translation with an Open-Source SMT Decoder
Yu Chen | Andreas Eisele | Christian Federmann | Eva Hasler | Michael Jellinghaus | Silke Theison
Proceedings of the Second Workshop on Statistical Machine Translation
2006
Parallel Corpora and Phrase-Based Statistical Machine Translation for New Language Pairs via Multiple Intermediaries
Andreas Eisele
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
We present a large parallel corpus of texts published by the United Nations Organization, which we exploit for the creation of phrase-based statistical machine translation (SMT) systems for new language pairs. We present a setup where phrase tables for these language pairs are used for translation between languages for which parallel corpora of sufficient size are so far not available. We give some preliminary results for this novel application of SMT and discuss further refinements.
2005
First Steps towards Multi-Engine Machine Translation
Andreas Eisele
Proceedings of the ACL Workshop on Building and Using Parallel Texts
2004
Generating an Arabic Full-form Lexicon for Bidirectional Morphology Lookup
Abdelhadi Soudi | Andreas Eisele
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
The DeepThought Core Architecture Framework
Ulrich Callmeier | Andreas Eisele | Ulrich Schäfer | Melanie Siegel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2002
Towards a Road Map on Human Language Technology: Natural Language Processing
Andreas Eisele | Dorothea Ziegler-Eisele
COLING-02: A Roadmap for Computational Linguistics
Towards a road map on human language technology: Natural language processing
Andreas Eisele | Dorothea Ziegler-Eisele
Workshop on machine translation roadmap
1993
Recent Advances in Janus: A Speech Translation System
M. Woszczyna | N. Coccaro | A. Eisele | A. Lavie | A. McNair | T. Polzin | I. Rogina | C. P. Rose | T. Sloboda | M. Tomita | J. Tsutsumi | N. Aoki-Waibel | A. Waibel | W. Ward
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993
1990
Feature Logic with Disjunctive Unification
Jochen Dorre | Andreas Eisele
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics
1988
Unification of Disjunctive Feature Descriptions
Andreas Eisele | Jochen Dorre
26th Annual Meeting of the Association for Computational Linguistics
1986
A Lexical Functional Grammar System in Prolog
Andreas Eisele | Jochen Dorre
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics