Abstract
Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. This not only improves the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% in absolute accuracy, but also holds potential benefits for non-idiomatic sentences.
- Anthology ID:
- 2023.emnlp-main.933
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15095–15111
- URL:
- https://aclanthology.org/2023.emnlp-main.933
- DOI:
- 10.18653/v1/2023.emnlp-main.933
- Cite (ACL):
- Emmy Liu, Aditi Chaudhary, and Graham Neubig. 2023. Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15095–15111, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting (Liu et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.emnlp-main.933.pdf
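The abstract names loss upweighting on potentially idiomatic sentences as one of the two techniques. The following is a minimal sketch of that general idea only, not the authors' implementation: the function name `weighted_nll`, the default `idiom_weight`, the boolean flagging of sentences, and the normalization by total weight are all assumptions made for illustration.

```python
import math


def weighted_nll(token_log_probs, is_potentially_idiomatic, idiom_weight=2.0):
    """Weighted mean of sentence-level negative log-likelihoods.

    token_log_probs: one list per sentence, holding the model's log-probability
        of each reference token (hypothetical input format).
    is_potentially_idiomatic: one bool per sentence; True means the sentence
        was flagged as possibly containing an idiom (the flagging method is
        not specified here and is an assumption).
    idiom_weight: loss multiplier for flagged sentences (assumed value).
    """
    if len(token_log_probs) != len(is_potentially_idiomatic):
        raise ValueError("need exactly one flag per sentence")
    if not token_log_probs:
        raise ValueError("batch must not be empty")

    weighted_sum = 0.0
    total_weight = 0.0
    for log_probs, flagged in zip(token_log_probs, is_potentially_idiomatic):
        weight = idiom_weight if flagged else 1.0
        sentence_nll = -sum(log_probs)  # NLL of the reference translation
        weighted_sum += weight * sentence_nll
        total_weight += weight

    # Normalizing by total weight keeps the loss scale comparable across
    # batches with different numbers of flagged sentences (a design choice
    # made for this sketch).
    return weighted_sum / total_weight


if __name__ == "__main__":
    batch = [[math.log(0.5)], [math.log(0.25)]]
    flags = [False, True]
    print(weighted_nll(batch, flags, idiom_weight=1.0))  # unweighted baseline
    print(weighted_nll(batch, flags, idiom_weight=3.0))  # flagged sentence counts 3x
```

With `idiom_weight=1.0` the function reduces to a plain mean over sentences, so the weight acts as a single knob controlling how strongly flagged sentences dominate the training signal.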