Miriam Exel


2020

Incorporating External Annotation to Improve Named Entity Translation in NMT
Maciej Modrzejewski | Miriam Exel | Bianka Buschbeck | Thanh-Le Ha | Alexander Waibel
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The correct translation of named entities (NEs) still poses a challenge for conventional neural machine translation (NMT) systems. This study explores methods of incorporating named entity recognition (NER) into NMT with the aim of improving named entity translation. It proposes an annotation method that integrates named entities and inside–outside–beginning (IOB) tagging into the neural network input with the use of source factors. Our experiments on English→German and English→Chinese show that just by including different NE classes and IOB tagging, we can increase the BLEU score by around 1 point using the standard test set from WMT2019 and achieve up to a 12% increase in NE translation rates over a strong baseline.
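
To give a concrete picture of the input format, the sketch below pairs a tokenized source sentence with a parallel stream of NE-class and IOB source factors, as a factored-input NMT toolkit would consume them. It is a minimal illustration only; the tag set, the tokenization and the make_source_factors helper are assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): build a parallel factor stream of
# NE classes with IOB tagging for a tokenized source sentence.
def make_source_factors(tokens, spans):
    """spans: list of (start, end, ne_class) over token indices, end exclusive."""
    factors = ["O"] * len(tokens)            # O = outside any named entity
    for start, end, ne_class in spans:
        factors[start] = f"B-{ne_class}"     # B = beginning of the entity
        for i in range(start + 1, end):
            factors[i] = f"I-{ne_class}"     # I = inside the entity
    return factors

tokens = "Angela Merkel visited Berlin .".split()
spans = [(0, 2, "PER"), (3, 4, "LOC")]       # hypothetical NER output
print(list(zip(tokens, make_source_factors(tokens, spans))))
# [('Angela', 'B-PER'), ('Merkel', 'I-PER'), ('visited', 'O'),
#  ('Berlin', 'B-LOC'), ('.', 'O')]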

Terminology-Constrained Neural Machine Translation at SAP
Miriam Exel | Bianka Buschbeck | Lauritz Brandt | Simona Doneva
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper examines approaches to bias a neural machine translation model to adhere to terminology constraints in an industrial setup. In particular, we investigate variations of the approach by Dinu et al. (2019), which uses inline annotation of the target terms in the source segment plus source factor embeddings during training and inference, and compare them to constrained decoding. We describe the challenges with respect to terminology in our usage scenario at SAP and show how far the investigated methods can help to overcome them. We extend the original study to a new language pair and provide an in-depth evaluation including an error classification and a human evaluation.
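
As a rough illustration of the inline-annotation idea from Dinu et al. (2019), the sketch below appends the desired target term after the matched source term and emits a parallel factor stream distinguishing ordinary tokens (0), source-term tokens (1) and injected target-term tokens (2). The term dictionary, the tokenization and the annotate_terminology helper are illustrative assumptions and do not reproduce the exact setup investigated at SAP.

# Minimal sketch (assumptions, not SAP's implementation): inline annotation of
# target terms in the source segment, plus a parallel source-factor stream.
def annotate_terminology(tokens, term_dict):
    out_tokens, factors = [], []
    for tok in tokens:
        if tok.lower() in term_dict:
            out_tokens.append(tok)
            factors.append("1")                       # source term
            out_tokens.append(term_dict[tok.lower()])
            factors.append("2")                       # injected target term
        else:
            out_tokens.append(tok)
            factors.append("0")                       # ordinary token
    return out_tokens, factors

tokens = "Create a new invoice in the system .".split()
term_dict = {"invoice": "Rechnung"}                   # illustrative entry
print(annotate_terminology(tokens, term_dict))
# (['Create', 'a', 'new', 'invoice', 'Rechnung', 'in', 'the', 'system', '.'],
#  ['0', '0', '0', '1', '2', '0', '0', '0', '0'])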

A Parallel Evaluation Data Set of Software Documentation with Document Structure Annotation
Bianka Buschbeck | Miriam Exel
Proceedings of the 7th Workshop on Asian Translation

This paper accompanies the software documentation data set for machine translation, a parallel evaluation data set of data originating from the SAP Help Portal, which we released to the machine translation community for research purposes. It offers the possibility to tune and evaluate machine translation systems in the domain of corporate software documentation and contributes to the availability of a wider range of evaluation scenarios. The data set comprises the language pairs English to Hindi, Indonesian, Malay and Thai, and thus also increases the test coverage for these low-resource language pairs. Unlike most evaluation data sets that consist of plain parallel text, the segments in this data set come with additional metadata that describes structural information of the document context. We provide insights into the origin and creation of the data set, its particularities and characteristics, as well as machine translation results.
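
The record below is a hypothetical sketch of what one evaluation segment with document-structure metadata might look like; the field names and values are purely illustrative and do not reflect the released schema.

# Hypothetical shape of one evaluation segment with structural metadata;
# every field name here is an illustrative assumption, not the actual schema.
segment = {
    "source": "Choose Save to confirm your entries.",
    "target": "<reference translation in Hindi, Indonesian, Malay or Thai>",
    "document_id": "doc-0042",          # which help-portal document the segment comes from
    "segment_index": 17,                # position of the segment within that document
    "element_type": "list_item",        # e.g. heading, paragraph, table cell
}
print(segment["source"], "->", segment["element_type"])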