Ulrich Germann
2024
Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)
Antonio Valerio Miceli-Barone | Fazl Barez | Shay Cohen | Elena Voita | Ulrich Germann | Michal Lukasik
Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)
2021
European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm | Stelios Piperidis | Kalina Bontcheva | Jan Hajic | Victoria Arranz | Andrejs Vasiļjevs | Gerhard Backfried | Jose Manuel Gomez-Perez | Ulrich Germann | Rémi Calizzano | Nils Feldhus | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Julian Moreno-Schneider | Dimitris Galanis | Penny Labropoulou | Miltos Deligiannis | Katerina Gkirtzou | Athanasia Kolovou | Dimitris Gkoumas | Leon Voukoutis | Ian Roberts | Jana Hamrlova | Dusan Varis | Lukas Kacena | Khalid Choukri | Valérie Mapelli | Mickaël Rigault | Julija Melnika | Miro Janosik | Katja Prinz | Andres Garcia-Silva | Cristian Berrio | Ondrej Klejch | Steve Renals
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.
The University of Edinburgh’s Submission to the IWSLT21 Simultaneous Translation Task
Sukanta Sen | Ulrich Germann | Barry Haddow
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
We describe our submission to the IWSLT 2021 shared task on simultaneous text-to-text English-German translation. Our system is based on the re-translation approach, where the agent re-translates the whole source prefix each time it receives a new source token. This approach has the advantage of being able to use a standard neural machine translation (NMT) inference engine with beam search; however, there is a risk that incompatibility between successive re-translations will degrade the output. To improve the quality of the translations, we experiment with various approaches: we use a fixed-size wait at the beginning of the sentence, we use a language model score to detect translatable units, and we apply dynamic masking to determine when the translation is unstable. We find that a combination of dynamic masking and language model score obtains the best latency-quality trade-off.
The University of Edinburgh’s English-German and English-Hausa Submissions to the WMT21 News Translation Task
Pinzhen Chen | Jindřich Helcl | Ulrich Germann | Laurie Burchell | Nikolay Bogoychev | Antonio Valerio Miceli Barone | Jonas Waldendorf | Alexandra Birch | Kenneth Heafield
Proceedings of the Sixth Conference on Machine Translation
This paper presents the University of Edinburgh’s constrained submissions of English-German and English-Hausa systems to the WMT 2021 shared task on news translation. We build En-De systems in three stages: corpus filtering, back-translation, and fine-tuning. For En-Ha we use an iterative back-translation approach on top of pre-trained En-De models and investigate vocabulary embedding mapping.
2020
Character Mapping and Ad-hoc Adaptation: Edinburgh’s IWSLT 2020 Open Domain Translation System
Pinzhen Chen | Nikolay Bogoychev | Ulrich Germann
Proceedings of the 17th International Conference on Spoken Language Translation
This paper describes the University of Edinburgh’s neural machine translation systems submitted to the IWSLT 2020 open domain Japanese↔Chinese translation task. On top of commonplace techniques like tokenisation and corpus cleaning, we explore character mapping and unsupervised decoding-time adaptation. Our techniques focus on leveraging the provided data, and we show the positive impact of each technique through the gradual improvement of BLEU.
European Language Grid: An Overview
Georg Rehm | Maria Berger | Ela Elsholz | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Stelios Piperidis | Miltos Deligiannis | Dimitris Galanis | Katerina Gkirtzou | Penny Labropoulou | Kalina Bontcheva | David Jones | Ian Roberts | Jan Hajič | Jana Hamrlová | Lukáš Kačena | Khalid Choukri | Victoria Arranz | Andrejs Vasiļjevs | Orians Anvari | Andis Lagzdiņš | Jūlija Meļņika | Gerhard Backfried | Erinç Dikici | Miroslav Janosik | Katja Prinz | Christoph Prinz | Severin Stampler | Dorothea Thomas-Aniola | José Manuel Gómez-Pérez | Andres Garcia Silva | Christian Berrío | Ulrich Germann | Steve Renals | Ondrej Klejch
Proceedings of the Twelfth Language Resources and Evaluation Conference
With 24 official EU languages and many additional ones, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented (by nation states, languages, verticals and sectors), which significantly holds back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 national competence centres and the European LT Council for outreach and coordination purposes.
Speed-optimized, Compact Student Models that Distill Knowledge from a Larger Teacher Model: the UEDIN-CUNI Submission to the WMT 2020 News Translation Task
Ulrich Germann | Roman Grundkiewicz | Martin Popel | Radina Dobreva | Nikolay Bogoychev | Kenneth Heafield
Proceedings of the Fifth Conference on Machine Translation
We describe the joint submission of the University of Edinburgh and Charles University, Prague, to the Czech/English track in the WMT 2020 Shared Task on News Translation. Our fast and compact student models distill knowledge from a larger, slower teacher. They are designed to offer a good trade-off between translation quality and inference efficiency. On the WMT 2020 Czech ↔ English test sets, they achieve translation speeds of over 700 whitespace-delimited source words per second on a single CPU thread, thus making neural translation feasible on consumer hardware without a GPU.
The University of Edinburgh’s submission to the German-to-English and English-to-German Tracks in the WMT 2020 News Translation and Zero-shot Translation Robustness Tasks
Ulrich Germann
Proceedings of the Fifth Conference on Machine Translation
This paper describes the University of Edinburgh’s submission of German↔English systems to the WMT 2020 Shared Tasks on News Translation and Zero-shot Robustness.
2019
The University of Edinburgh’s Submissions to the WMT19 News Translation Task
Rachel Bawden | Nikolay Bogoychev | Ulrich Germann | Roman Grundkiewicz | Faheem Kirefu | Antonio Valerio Miceli Barone | Alexandra Birch
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English↔Gujarati, English↔Chinese, German→English, and English→Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English↔Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German→English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English→Czech, we compared different preprocessing and tokenisation regimes.
2018
The SUMMA Platform: Scalable Understanding of Multilingual Media
Ulrich Germann | Peggy van der Kreeft | Guntis Barzdins | Alexandra Birch
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
We present the latest version of the SUMMA platform, an open-source software platform for monitoring and interpreting multi-lingual media, from written news published on the internet to live media broadcasts via satellite or internet streaming.
The SUMMA Platform: A Scalable Infrastructure for Multi-lingual Multi-media Monitoring
Ulrich Germann | Renārs Liepins | Guntis Barzdins | Didzis Gosko | Sebastião Miranda | David Nogueira
Proceedings of ACL 2018, System Demonstrations
The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. The Platform offers a fully automated media ingestion pipeline capable of recording live broadcasts, detecting and transcribing spoken content, translating all text (original or transcribed) into English, recognizing and linking Named Entities, detecting topics, performing clustering and cross-lingual multi-document summarization of related media items, and, last but not least, extracting and storing factual claims in these news items. Browser-based graphical user interfaces provide humans with aggregated information as well as structured access to individual news items stored in the Platform’s database. This paper describes the intended use cases and provides an overview of the system’s implementation.
Marian: Fast Neural Machine Translation in C++
Marcin Junczys-Dowmunt | Roman Grundkiewicz | Tomasz Dwojak | Hieu Hoang | Kenneth Heafield | Tom Neckermann | Frank Seide | Ulrich Germann | Alham Fikri Aji | Nikolay Bogoychev | André F. T. Martins | Alexandra Birch
Proceedings of ACL 2018, System Demonstrations
We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.
Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring
Ulrich Germann | Renārs Liepins | Didzis Gosko | Guntis Barzdins
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)
The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. It assembles numerous state-of-the-art NLP technologies into a fully automated media ingestion pipeline that can record live broadcasts, detect and transcribe spoken content, translate from several languages (original text or transcribed speech) into English, recognize Named Entities, detect topics, cluster and summarize documents across language barriers, and extract and store factual claims in these news items. This paper describes the intended use cases and discusses the system design decisions that allowed us to integrate state-of-the-art NLP modules into an effective workflow with comparatively little effort.
The University of Edinburgh’s Submissions to the WMT18 News Translation Task
Barry Haddow | Nikolay Bogoychev | Denis Emelin | Ulrich Germann | Roman Grundkiewicz | Kenneth Heafield | Antonio Valerio Miceli Barone | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce a new RNN variant, mixed RNN/Transformer ensembles, data selection and weighting, and extensions to back-translation.
2017
Regularization techniques for fine-tuning in neural machine translation
Antonio Valerio Miceli Barone | Barry Haddow | Ulrich Germann | Rico Sennrich
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
We investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset. In this scenario, overfitting is a major challenge. We investigate a number of techniques to reduce overfitting and improve transfer learning, including regularization techniques such as dropout and L2-regularization towards an out-of-domain prior. In addition, we introduce tuneout, a novel regularization technique inspired by dropout. We apply these techniques, alone and in combination, to neural machine translation, obtaining improvements on IWSLT datasets for English→German and English→Russian. We also investigate the amounts of in-domain training data needed for domain adaptation in NMT, and find a logarithmic relationship between the amount of training data and gain in BLEU score.
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.
The University of Edinburgh’s Neural MT Systems for WMT17
Rico Sennrich | Alexandra Birch | Anna Currey | Ulrich Germann | Barry Haddow | Kenneth Heafield | Antonio Valerio Miceli Barone | Philip Williams
Proceedings of the Second Conference on Machine Translation
2016
Bilingual Document Alignment with Latent Semantic Indexing
Ulrich Germann
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
2014
Dynamic phrase tables for machine translation in an interactive post-editing scenario
Ulrich Germann
Workshop on interactive and adaptive machine translation
This paper presents a phrase table implementation for the Moses system that computes phrase table entries for phrase-based statistical machine translation (PBSMT) on demand by sampling an indexed bitext. While this approach has been used for years in hierarchical phrase-based translation, the PBSMT community has been slow to adopt this paradigm, due to concerns that this would be slow and lead to lower translation quality. The experiments conducted in the course of this work provide evidence to the contrary: without loss in translation quality, the sampling phrase table ranks second out of four in terms of speed, being slightly slower than hash table look-up (Junczys-Dowmunt, 2012) and considerably faster than current implementations of the approach suggested by Zens and Ney (2007). In addition, the underlying parallel corpus can be updated in real time, so that professionally produced translations can be used to improve the quality of the machine translation engine immediately.
The MateCat Tool
Marcello Federico | Nicola Bertoldi | Mauro Cettolo | Matteo Negri | Marco Turchi | Marco Trombetti | Alessandro Cattelan | Antonio Farina | Domenico Lupinetti | Andrea Martines | Alberto Massidda | Holger Schwenk | Loïc Barrault | Frederic Blain | Philipp Koehn | Christian Buck | Ulrich Germann
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations
CASMACAT: A Computer-assisted Translation Workbench
Vicent Alabau | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Ulrich Germann | Jesús González-Rubio | Robin Hill | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Ortiz-Martínez | Herve Saint-Amand | Germán Sanchis Trilles | Chara Tsoukala
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
Ulrich Germann | Michael Carl | Philipp Koehn | Germán Sanchis-Trilles | Francisco Casacuberta | Robin Hill | Sharon O’Brien
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
The Impact of Machine Translation Quality on Human Post-Editing
Philipp Koehn | Ulrich Germann
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
2013
The Feasibility of HMEANT as a Human MT Evaluation Metric
Alexandra Birch | Barry Haddow | Ulrich Germann | Maria Nadejde | Christian Buck | Philipp Koehn
Proceedings of the Eighth Workshop on Statistical Machine Translation
Two Approaches to Correcting Homophone Confusions in a Hybrid Machine Translation System
Pierrette Bouillon | Johanna Gerlach | Ulrich Germann | Barry Haddow | Manny Rayner
Proceedings of the Second Workshop on Hybrid Approaches to Translation
2012
Syntax-aware Phrase-based Statistical Machine Translation: System Description
Ulrich Germann
Proceedings of the Seventh Workshop on Statistical Machine Translation
2010
Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin | Boxing Chen | George Foster | Ulrich Germann | Eric Joanis | Howard Johnson | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
2009
PortageLive: delivering machine translation technology via virtualization
Patrick Paul | Samuel Larkin | Ulrich Germann | Eric Joanis | Roland Kuhn
Proceedings of Machine Translation Summit XII: Plenaries
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
Ulrich Germann | Chirag Shah | Svetlana Stoyanchev | Carolyn Penstein Rosé | Anoop Sarkar
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too
Ulrich Germann | Eric Joanis | Samuel Larkin
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)
2008
2007
Two Tools for Creating and Visualizing Sub-sentential Alignments of Parallel Text
Ulrich Germann
Proceedings of the Linguistic Annotation Workshop
2003
Greedy Decoding for Statistical Machine Translation in Almost Linear Time
Ulrich Germann
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
2001
Fast Decoding and Optimal Decoding for Machine Translation
Ulrich Germann | Michael Jahr | Kevin Knight | Daniel Marcu | Kenji Yamada
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics
Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?
Ulrich Germann
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation
1999
A deterministic dependency parser for Japanese
Ulrich Germann
Proceedings of Machine Translation Summit VII
We present a rule-based, deterministic dependency parser for Japanese. It was implemented in C++, using object classes that reflect linguistic concepts and thus facilitate the transfer of linguistic intuitions into code. The parser first chunks morphemes into one-word phrases and then parses from right to left. The average parsing accuracy is 83.6%.
1998
Making semantic interpretation parser-independent
Ulrich Germann
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers
We present an approach to semantic interpretation of syntactically parsed Japanese sentences that is largely parser-independent. The approach relies on a standardized parse tree format that restricts the number of syntactic configurations that the semantic interpretation rules have to anticipate. All parse trees are converted to this format prior to semantic interpretation. This setup allows us not only to apply the same set of semantic interpretation rules to output from different parsers, but also to develop parsers and semantic interpretation rules independently.
Co-authors
- Alexandra Birch 7
- Nikolay Bogoychev 6
- Barry Haddow 6
- Antonio Valerio Miceli-Barone 6
- Kenneth Heafield 5
- Philipp Koehn 5
- Guntis Barzdins 4
- Roman Grundkiewicz 4
- Rico Sennrich 4
- Christian Buck 3
- Eric Joanis 3
- Ondřej Klejch 3
- Samuel Larkin 3
- Renārs Liepins 3
- Steve Renals 3
- Victoria Arranz 2
- Gerhard Backfried 2
- Kalina Bontcheva 2
- Michael Carl 2
- Francisco Casacuberta 2
- Pinzhen Chen 2
- Khalid Choukri 2
- Shay B. Cohen 2
- Miltos Deligiannis 2
- Tomasz Dwojak 2
- Dimitrios Galanis 2
- Andres Garcia-Silva 2
- Katerina Gkirtzou 2
- Didzis Gosko 2
- José Manuel Gómez-Pérez 2
- Jan Hajic 2
- Jana Hamrlová 2
- Stefanie Hegele 2
- Robin L. Hill 2
- Marcin Junczys-Dowmunt 2
- Lukáš Kačena 2
- Florian Kintzel 2
- Roland Kuhn 2
- Penny Labropoulou 2
- Katrin Marheinecke 2
- Jūlija Meļņika 2
- Sebastião Miranda 2
- David Nogueira 2
- Stelios Piperidis 2
- Katja Prinz 2
- Georg Rehm 2
- Ian Roberts 2
- Germán Sanchis-Trilles 2
- Andrejs Vasiļjevs 2
- Peggy van der Kreeft 2
- Ahmed Abdelali 1
- Alham Fikri Aji 1
- Vicent Alabau 1
- Ahmed Ali 1
- Mariana S. C. Almeida 1
- Orians Anvari 1
- Pedro Balage Filho 1
- Fazl Barez 1
- Loic Barrault 1
- Rachel Bawden 1
- Peter Bell 1
- Maria Berger 1
- Cristian Berrio 1
- Christian Berrío 1
- Nicola Bertoldi 1
- Frédéric Blain 1
- Pierrette Bouillon 1
- Hervé Bourlard 1
- Laurie Burchell 1
- Rémi Calizzano 1
- Alessandro Cattelan 1
- Mauro Cettolo 1
- Boxing Chen 1
- Anna Currey 1
- Fahim Dalvi 1
- Marco Damonte 1
- Erinç Dikici 1
- Radina Dobreva 1
- Nadir Durrani 1
- Ela Elsholz 1
- Denis Emelin 1
- Antonio Farina 1
- Marcello Federico 1
- Nils Feldhus 1
- George Foster 1
- Mercedes García-Martínez 1
- Philip N. Garner 1
- Johanna Gerlach 1
- Andreas Giefer 1
- Dimitris Gkoumas 1
- Jesús González-Rubio 1
- Jindřich Helcl 1
- Chris Hernon 1
- Hieu Hoang 1
- Hina Imran 1
- Michael E. Jahr 1
- Miroslav Janosik 1
- Miro Janosik 1
- Howard Johnson 1
- David Jones 1
- Clive Jones 1
- Sameer Khurana 1
- Faheem Kirefu 1
- Kevin Knight 1
- Athanasia Kolovou 1
- Andis Lagzdiņš 1
- Alexandros Lazaridis 1
- Luis A. Leiva 1
- Michal Lukasik 1
- Domenico Lupinetti 1
- Valérie Mapelli 1
- Daniel Marcu 1
- Andrea Martines 1
- André F. T. Martins 1
- Alberto Massidda 1
- Alfonso Mendes 1
- Bartolomé Mesa-Lao 1
- Lesly Miculicich Werlen 1
- Jeff Mitchell 1
- Julian Moreno Schneider 1
- Maria Nadejde 1
- Shashi Narayan 1
- Tom Neckermann 1
- Matteo Negri 1
- Abiola Obamuyide 1
- Daniel Ortiz-Martínez 1
- Sharon O’Brien 1
- Nikos Papasarantopoulos 1
- Nikolaos Pappas 1
- Patrick Paul 1
- Martin Popel 1
- Andrei Popescu-Belis 1
- João Prieto 1
- Christoph Prinz 1
- Manny Rayner 1
- Sebastian Riedel 1
- Mickaël Rigault 1
- Carolyn Rose 1
- Herve Saint-Amand 1
- Hassan Sajjad 1
- Anoop Sarkar 1
- Holger Schwenk 1
- Frank Seide 1
- Sukanta Sen 1
- Chirag Shah 1
- David Sheppey 1
- Severin Stampler 1
- Svetlana Stoyanchev 1
- Dorothea Thomas-Aniola 1
- Sibo Tong 1
- Marco Trombetti 1
- Chara Tsoukala 1
- Marco Turchi 1
- Dusan Varis 1
- Andreas Vlachos 1
- Stephan Vogel 1
- Elena Voita 1
- Leon Voukoutis 1
- Jonas Waldendorf 1
- Yang Wang 1
- Susanne Weber 1
- Philip Williams 1
- Kenji Yamada 1