Abstract
Data augmentation (DA) is a popular strategy to boost performance on neural machine translation tasks. However, the impact of data augmentation in low-resource environments, particularly for diverse and scarce languages, is understudied. In this paper, we introduce a simple yet novel metric to measure the impact of several different data augmentation strategies. This metric, which we call Data Augmentation Advantage (DAA), quantifies how many true data pairs a synthetic data pair is worth in a particular experimental context. We demonstrate the utility of this metric by training models for several linguistically varied datasets using three data augmentation methods: back-translation, SwitchOut, and sentence concatenation. In lower-resource tasks, DAA is an especially valuable metric for comparing DA performance because it provides a more effective way to quantify gains when BLEU scores are very low and results across diverse languages are more divergent and difficult to assess.
- Anthology ID: 2023.loresmt-1.8
- Volume: Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
- Month: May
- Year: 2023
- Address: Dubrovnik, Croatia
- Editors: Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jade Abbott, Jonathan Washington, Nathaniel Oco, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
- Venue: LoResMT
- Publisher: Association for Computational Linguistics
- Pages: 101–109
- URL: https://aclanthology.org/2023.loresmt-1.8
- DOI: 10.18653/v1/2023.loresmt-1.8
- Cite (ACL): Annie Lamar and Zeyneb Kaya. 2023. Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT. In Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023), pages 101–109, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal): Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT (Lamar & Kaya, LoResMT 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2023.loresmt-1.8.pdf