Nicola Ueffing

2024

At eBay, we automatically generate a large number of natural-language titles for eCommerce browse pages using machine translation (MT) technology. While automatic approaches can generate millions of titles very quickly, they are prone to errors. We therefore develop quality estimation (QE) methods that automatically detect low-quality titles in order to prevent them from going live. In this paper, we present two approaches. The first is a Random Forest (RF) model that uses hand-crafted, robust features, which are a mix of established features commonly used in Machine Translation Quality Estimation (MTQE) and new features developed specifically for our task. The second is based on Siamese Networks (SNs), which embed the metadata input sequence and the generated title in the same space and do not require hand-crafted features at all. We thoroughly evaluate and compare these approaches on in-house data. While the RF models are competitive in scenarios with smaller amounts of training data and are somewhat more robust, they are clearly outperformed by the SN models when the amount of training data is larger.

Multi-lingual neural title generation for e-Commerce browse pages
Prashant Mathur | Nicola Ueffing | Gregor Leusch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

To give buyers better access to the inventory and to improve search engine optimization, e-Commerce websites automatically generate millions of browse pages. A browse page consists of a set of slot name/value pairs within a given category, grouping multiple items which share some characteristics. These browse pages require a title describing the content of the page. Since the number of browse pages is huge, manual creation of these titles is infeasible. Previous statistical and neural approaches depend heavily on the availability of large amounts of data in a language. In this research, we apply sequence-to-sequence models to generate titles for high-resource as well as low-resource languages by leveraging transfer learning. We train these models on multi-lingual data, thereby creating one joint model which can generate titles in various languages. Performance of the title generation system is evaluated on three languages (English, German, and French), with a particular focus on French as the low-resource language.

Tutorial: Corpora Quality Management for MT - Practices and Roles
Silvio Picinini | Pete Smith | Nicola Ueffing
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

Automatic Post-Editing and Machine Translation Quality Estimation at eBay
Nicola Ueffing
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

2017

A detailed investigation of Bias Errors in Post-editing of MT output
Silvio Picinini | Nicola Ueffing
Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track

Generating titles for millions of browse pages on an e-Commerce site
Prashant Mathur | Nicola Ueffing | Gregor Leusch
Proceedings of the 10th International Conference on Natural Language Generation

We present two approaches to generating titles for browse pages in five languages: English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generating these browse page titles. We also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For English and German, we have access to a large number of existing titles that were generated by rules and then curated. For these two languages, we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.

2008

2007

Word-Level Confidence Estimation for Machine Translation
Nicola Ueffing | Hermann Ney
Computational Linguistics, Volume 33, Number 1, March 2007

Transductive learning for statistical machine translation
Nicola Ueffing | Gholamreza Haffari | Anoop Sarkar
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

NRC’s PORTAGE System for WMT 2007
Nicola Ueffing | Michel Simard | Samuel Larkin | Howard Johnson
Proceedings of the Second Workshop on Statistical Machine Translation

Rule-Based Translation with Statistical Phrase-Based Post-Editing
Michel Simard | Nicola Ueffing | Pierre Isabelle | Roland Kuhn
Proceedings of the Second Workshop on Statistical Machine Translation

2006

Using monolingual source-language data to improve MT performance
Nicola Ueffing
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

Computing Consensus Translation for Multiple Machine Translation Systems Using Enhanced Hypothesis Alignment
Evgeny Matusov | Nicola Ueffing | Hermann Ney
11th Conference of the European Chapter of the Association for Computational Linguistics

CDER: Efficient MT Evaluation Using Block Movements
Gregor Leusch | Nicola Ueffing | Hermann Ney
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

Application of word-level confidence measures in interactive statistical machine translation
Nicola Ueffing | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

Word-Level Confidence Estimation for Machine Translation using Phrase-Based Translation Models
Nicola Ueffing | Hermann Ney
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Preprocessing and Normalization for Automatic Evaluation of Machine Translation
Gregor Leusch | Nicola Ueffing | David Vilar | Hermann Ney
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

2004

2003

A novel string-to-string distance measure with applications to machine translation evaluation
Gregor Leusch | Nicola Ueffing | Hermann Ney
Proceedings of Machine Translation Summit IX: Papers

We introduce a string-to-string distance measure which extends the edit distance with block transpositions as a constant-cost edit operation. An algorithm for calculating this distance measure in polynomial time is presented. We then demonstrate how this distance measure can be used as an evaluation criterion in machine translation. The correlation between this evaluation criterion and human judgment is systematically compared with that of other automatic evaluation measures on two translation tasks. In general, like other automatic evaluation measures, the criterion shows low correlation at the sentence level, but good correlation at the system level.

Confidence measures for statistical machine translation
Nicola Ueffing | Klaus Macherey | Hermann Ney
Proceedings of Machine Translation Summit IX: Papers

In this paper, we present several confidence measures for (statistical) machine translation. We introduce word posterior probabilities for words in the target sentence that can be determined either on a word graph or on an N-best list. Two alternative confidence measures that can be calculated on N-best lists are proposed. The performance of the measures is evaluated on two different translation tasks: spontaneously spoken dialogues from the domain of appointment scheduling, and a collection of technical manuals.

Using POS Information for SMT into Morphologically Rich Languages
Nicola Ueffing | Hermann Ney
10th Conference of the European Chapter of the Association for Computational Linguistics