Kristýna Neumannová

Also published as: Kristyna Neumannova


2023

The Role of Compounds in Human vs. Machine Translation Quality
Kristyna Neumannova | Ondřej Bojar
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

We focus on the production of German compounds in English-to-German manual and automatic translation. Using the WMT21 news translation test set as an example, we observe that even the best MT systems produce far fewer compounds than three independent manual translations. Despite this striking difference, the insufficiency is not apparent in manual evaluation methods that target overall translation quality (DA and MQM). Somewhat surprisingly, simple automatic metrics such as BLEU provide a better indication of this quality aspect. Our manual analysis of system outputs, including our freshly trained Transformer models, confirms that current deep neural systems operating at the level of subword units are capable of constructing novel words, including novel compounds. This effect, however, cannot be measured using static dictionaries of compounds such as GermaNet. German compounds thus pose an interesting challenge for the future development of MT systems.

2022

CUNI Submission to the BUCC 2022 Shared Task on Bilingual Term Alignment
Borek Požár | Klára Tauchmanová | Kristýna Neumannová | Ivana Kvapilíková | Ondřej Bojar
Proceedings of the BUCC Workshop within LREC 2022

We present our submission to the BUCC Shared Task on bilingual term alignment in comparable specialized corpora. We devised three approaches: static embeddings with post-hoc alignment, the Monoses pipeline for unsupervised phrase-based machine translation, and contextualized multilingual embeddings. We show that contextualized embeddings from pretrained multilingual models lead to results similar to those of static embeddings, but further improvement can be achieved by task-specific fine-tuning. Retrieving term pairs from the running phrase tables of the Monoses systems can match this enhanced performance and leads to an average precision of 0.88 on the training set.