Abstract
In this paper, we propose an approach for semi-automatically creating a data-to-text (D2T) corpus for Russian that can be used to learn a D2T natural language generation model. An error analysis of the output of an English-to-Russian neural machine translation system shows that 80% of the automatically translated sentences contain an error and that 53% of all translation errors bear on named entities (NE). We therefore focus on named entities and introduce two post-editing techniques for correcting wrongly translated NEs.
- Anthology ID:
- W19-3706
- Volume:
- Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
- Venue:
- BSNLP
- SIG:
- SIGSLAV
- Publisher:
- Association for Computational Linguistics
- Pages:
- 44–49
- URL:
- https://aclanthology.org/W19-3706
- DOI:
- 10.18653/v1/W19-3706
- Cite (ACL):
- Anastasia Shimorina, Elena Khasanova, and Claire Gardent. 2019. Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 44–49, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing (Shimorina et al., BSNLP 2019)
- PDF:
- https://preview.aclanthology.org/fix-volume-bibkeys/W19-3706.pdf
- Code:
- shimorina/bsnlp-2019
- Data:
- WebNLG