Abstract
Transformer models are used for general tasks, such as pre-trained language models, and for specific tasks, including machine translation. Such models rely mainly on positional encodings (PEs) to handle the sequential order of input vectors. PEs come in several variants, such as absolute and relative, and several studies have reported on the superiority of relative PEs. In this paper, we analyze in which parts of a transformer model PEs work and how absolute and relative PEs differ in their characteristics through a series of experiments. Experimental results indicate that PEs work in both the self- and cross-attention blocks of a transformer model, and that PEs should be added only to the query and key of an attention mechanism, not to the value. We also found that applying two PEs in combination, a relative PE in the self-attention block and an absolute PE in the cross-attention block, can improve translation quality.
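To make the main finding concrete, the following is a minimal NumPy sketch, not the authors' implementation: scaled dot-product attention in which a standard sinusoidal absolute PE is added only to the query and key inputs, leaving the value position-free. The function names and the choice of sinusoidal PEs are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): attention with a PE on queries and keys only.
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Standard sinusoidal absolute positional encoding (assumed here for illustration)."""
    pos = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    i = np.arange(d_model)[None, :]                         # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))  # (seq_len, d_model)

def attention_pe_on_qk(x_q, x_kv, w_q, w_k, w_v):
    """Scaled dot-product attention; PE is injected into Q and K but not V."""
    d_k = w_k.shape[1]
    pe_q = sinusoidal_pe(x_q.shape[0], x_q.shape[1])
    pe_kv = sinusoidal_pe(x_kv.shape[0], x_kv.shape[1])
    q = (x_q + pe_q) @ w_q        # PE added before the query projection
    k = (x_kv + pe_kv) @ w_k      # PE added before the key projection
    v = x_kv @ w_v                # value carries no positional information
    scores = q @ k.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over keys
    return weights @ v
```

In a self-attention block, `x_q` and `x_kv` are the same sequence; in a cross-attention block they are the decoder and encoder states, respectively, which is where the paper combines a relative PE (self-attention) with an absolute PE (cross-attention).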
- Anthology ID: 2024.lrec-main.1478
- Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues: LREC | COLING
- Publisher: ELRA and ICCL
- Pages: 17011–17018
- URL: https://aclanthology.org/2024.lrec-main.1478
- Cite (ACL): Taro Miyazaki, Hideya Mino, and Hiroyuki Kaneko. 2024. Understanding How Positional Encodings Work in Transformer Model. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17011–17018, Torino, Italia. ELRA and ICCL.
- Cite (Informal): Understanding How Positional Encodings Work in Transformer Model (Miyazaki et al., LREC-COLING 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.1478.pdf