Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing

Anastasia Shimorina; Elena Khasanova; Claire Gardent

doi:10.18653/v1/W19-3706

Abstract

In this paper, we propose an approach for semi-automatically creating a data-to-text (D2T) corpus for Russian that can be used to learn a D2T natural language generation model. An error analysis of the output of an English-to-Russian neural machine translation system shows that 80% of the automatically translated sentences contain an error and that 53% of all translation errors bear on named entities (NE). We therefore focus on named entities and introduce two post-editing techniques for correcting wrongly translated NEs.

Anthology ID: W19-3706
Volume: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month: August
Year: 2019
Address: Florence, Italy
Editors: Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 44–49
URL: https://aclanthology.org/W19-3706
DOI: 10.18653/v1/W19-3706
Cite (ACL): Anastasia Shimorina, Elena Khasanova, and Claire Gardent. 2019. Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 44–49, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing (Shimorina et al., BSNLP 2019)
PDF: https://preview.aclanthology.org/fix-volume-bibkeys/W19-3706.pdf
Code: shimorina/bsnlp-2019
Data: WebNLG
