Abstract
Many contemporary NLP systems rely on neural decoders for text generation, which demonstrate an impressive ability to generate text approaching human fluency levels. However, neural machine translation networks often grapple with the production of repetitive content, also known as repetitive diction or word repetition, an aspect they were not explicitly trained to address. While not inherently negative, this repetition can make writing seem monotonous or awkward if not used intentionally for emphasis or stylistic purposes. This paper presents our submission to the WMT 2024 Non-Repetitive Translation Task, for which we adopt a repetition penalty method applied during training, inspired by the principles of label smoothing. No additional work is needed at inference time. We modify the ground-truth distribution to steer the model towards discouraging repetitions. Experiments show the ability of the proposed method to reduce repetitions within neural machine translation engines, without compromising efficiency or translation quality.
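To make the training-time idea concrete, the sketch below builds label-smoothed target distributions and then shifts extra probability mass away from vocabulary tokens that already appeared earlier in the same target sequence. This is a minimal sketch only: the function names, the `penalty` hyperparameter, and the exact redistribution rule are illustrative assumptions, not the formulation published in the paper.

```python
import torch
import torch.nn.functional as F


def repetition_aware_targets(gold, vocab_size, eps=0.1, penalty=0.05):
    """Label-smoothed soft targets with extra mass removed from tokens that
    already occurred earlier in the same target sequence (hypothetical rule)."""
    B, T = gold.shape
    # Standard label smoothing: (1 - eps) on the gold token, eps spread uniformly.
    targets = torch.full((B, T, vocab_size), eps / (vocab_size - 1))
    targets.scatter_(2, gold.unsqueeze(-1), 1.0 - eps)
    # seen[b, v] is True once vocab id v has appeared in gold[b, :t].
    seen = torch.zeros(B, vocab_size, dtype=torch.bool)
    for t in range(1, T):
        seen.scatter_(1, gold[:, t - 1 : t], True)
        repeat_mask = seen.clone()
        # Never penalize the gold token at the current position.
        repeat_mask.scatter_(1, gold[:, t : t + 1], False)
        # Down-weight would-be repetitions in the soft target distribution.
        targets[:, t][repeat_mask] = (
            targets[:, t][repeat_mask] - penalty
        ).clamp(min=0.0)
    # Renormalize each position back to a proper distribution.
    return targets / targets.sum(-1, keepdim=True)


def smoothed_repetition_loss(logits, gold, pad_id=0):
    """Cross-entropy against the modified soft targets."""
    targets = repetition_aware_targets(gold, logits.size(-1)).to(logits.device)
    nll = -(targets * F.log_softmax(logits, dim=-1)).sum(-1)
    mask = gold.ne(pad_id)
    return (nll * mask).sum() / mask.sum()
```

Because the adjustment lives entirely in the training targets, decoding is untouched, which matches the abstract's point that no additional work is needed at inference time.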
- Anthology ID: 2024.wmt-1.108
- Volume: Proceedings of the Ninth Conference on Machine Translation
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue: WMT
- Publisher: Association for Computational Linguistics
- Pages: 1056–1062
- URL: https://aclanthology.org/2024.wmt-1.108
- DOI: 10.18653/v1/2024.wmt-1.108
- Cite (ACL): Marko Avila and Josep Crego. 2024. SYSTRAN @ WMT24 Non-Repetitive Translation Task. In Proceedings of the Ninth Conference on Machine Translation, pages 1056–1062, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): SYSTRAN @ WMT24 Non-Repetitive Translation Task (Avila & Crego, WMT 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.wmt-1.108.pdf