Abstract
Transformers are the predominant model for machine translation. Recent work has also shown that a single Transformer model can be trained to translate multiple language pairs, achieving promising results. In this work, we investigate how a multilingual Transformer model distributes attention when translating different language pairs. We first perform automatic pruning to eliminate a large number of noisy heads and then analyze the functions and behaviors of the remaining heads in both self-attention and cross-attention. We find that different language pairs, despite differing in syntax and word order, tend to share the same heads for the same functions, such as syntax heads and reordering heads. However, the differing characteristics of language pairs clearly cause interference in these function heads and affect head accuracy. Additionally, we reveal an interesting behavior of Transformer cross-attention: the deep-layer cross-attention heads work in a clearly cooperative way to learn different options for word reordering, which may stem from the nature of translation tasks, where the same source sentence admits multiple valid gold translations in the target language.
- Anthology ID:
- 2023.wmt-1.45
- Volume:
- Proceedings of the Eighth Conference on Machine Translation
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 496–506
- URL:
- https://aclanthology.org/2023.wmt-1.45
- DOI:
- 10.18653/v1/2023.wmt-1.45
- Cite (ACL):
- Jingyi Zhang, Gerard de Melo, Hongfei Xu, and Kehai Chen. 2023. A Closer Look at Transformer Attention for Multilingual Translation. In Proceedings of the Eighth Conference on Machine Translation, pages 496–506, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- A Closer Look at Transformer Attention for Multilingual Translation (Zhang et al., WMT 2023)
- PDF:
- https://preview.aclanthology.org/vimeo_vids_to_local/2023.wmt-1.45.pdf