2022
A Systematic Study Reveals Unexpected Interactions in Pre-Trained Neural Machine Translation
Ashleigh Richardson | Janet Wiles
Proceedings of the Thirteenth Language Resources and Evaluation Conference
A significant challenge in developing translation systems for the world’s ∼7,000 languages is that very few have sufficient data for state-of-the-art techniques. Transfer learning is a promising direction for low-resource neural machine translation (NMT), but it introduces many new variables, which are often selected through ablation studies, costly trial-and-error, or niche expertise. When pre-training an NMT system for low-resource translation, the pre-training task is often chosen based on data abundance and similarity to the main task. Factors such as dataset size and similarity have typically been analysed independently in previous studies, owing to the computational cost of systematic experiments; however, these factors are not independent. We conducted a three-factor experiment to examine how language similarity, pre-training dataset size and main dataset size interacted in their effect on the performance of pre-trained transformer-based low-resource NMT. We replicated the common finding that more data was beneficial in bilingual systems, but we also found a statistically significant interaction between the three factors, which reduced the effectiveness of large pre-training datasets for some main-task dataset sizes (p < 0.0018). The surprising trends identified in these interactions indicate that systematic studies of interacting factors may be a promising long-term direction for guiding research in low-resource neural methods.
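
The kind of three-way interaction reported above can be tested with a standard factorial analysis. The snippet below is a minimal sketch (not the authors' code): it fits a full-factorial linear model over the three factors and reads the three-way interaction term from an ANOVA table. The factor names, replicate counts and BLEU values are hypothetical placeholders.

```python
import itertools

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)

# Hypothetical long-format results: a few replicate runs per combination of
# language similarity, pre-training dataset size and main dataset size.
rows = []
for sim, pre, main in itertools.product(["low", "high"],
                                         ["small", "large"],
                                         ["small", "large"]):
    for _ in range(3):
        # Placeholder BLEU scores; real values would come from trained systems.
        rows.append({"similarity": sim,
                     "pretrain_size": pre,
                     "main_size": main,
                     "bleu": 20 + rng.normal(0, 2)})
results = pd.DataFrame(rows)

# Full-factorial model: '*' expands to all main effects and interactions,
# and C(...) treats each factor as categorical.
model = smf.ols("bleu ~ C(similarity) * C(pretrain_size) * C(main_size)",
                data=results).fit()

# The C(similarity):C(pretrain_size):C(main_size) row of this table is the
# three-way interaction test analogous to the effect reported in the abstract.
print(anova_lm(model, typ=2))
```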