Locality-Sensitive Hashing for Long Context Neural Machine Translation
Frithjof Petrick, Jan Rosendahl, Christian Herold, Hermann Ney
Abstract
After its introduction, the Transformer architecture quickly became the gold standard for the task of neural machine translation. A major advantage of the Transformer compared to previous architectures is its faster training speed, achieved by complete parallelization across timesteps due to the use of attention instead of recurrent layers. However, this also leads to one of the biggest problems of the Transformer, namely the quadratic time and memory complexity with respect to the input length. In this work we adapt the locality-sensitive hashing approach of Kitaev et al. (2020) to self-attention in the Transformer, extend it to cross-attention, and apply this memory-efficient framework to sentence- and document-level machine translation. Our experiments show that the LSH attention scheme for sentence-level translation comes at the cost of slightly reduced translation quality. For document-level NMT we are able to include much bigger context sizes than what is possible with the baseline Transformer. However, more context improves neither translation quality nor scores on targeted test suites.
- Anthology ID:
- 2022.iwslt-1.4
- Volume:
- Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland (in-person and online)
- Editors:
- Elizabeth Salesky, Marcello Federico, Marta Costa-jussà
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 32–42
- URL:
- https://aclanthology.org/2022.iwslt-1.4
- DOI:
- 10.18653/v1/2022.iwslt-1.4
- Cite (ACL):
- Frithjof Petrick, Jan Rosendahl, Christian Herold, and Hermann Ney. 2022. Locality-Sensitive Hashing for Long Context Neural Machine Translation. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), pages 32–42, Dublin, Ireland (in-person and online). Association for Computational Linguistics.
- Cite (Informal):
- Locality-Sensitive Hashing for Long Context Neural Machine Translation (Petrick et al., IWSLT 2022)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2022.iwslt-1.4.pdf
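The abstract refers to the locality-sensitive hashing (LSH) attention scheme of Kitaev et al. (2020), in which tokens only attend to other tokens that fall into the same hash bucket, so that attention cost no longer grows quadratically with the full sequence length. Below is a minimal NumPy sketch of that idea, not the authors' implementation: it omits chunking, multi-round hashing, causal masking, and the tied query/key projections used in the Reformer, and the function names and parameters (`lsh_attention`, `bucket_ids`, `num_buckets`) are illustrative assumptions.

```python
import numpy as np


def bucket_ids(vectors, rotations):
    """Angular LSH: project onto shared random directions and take the
    argmax over [+proj, -proj] to assign each vector a bucket id."""
    proj = vectors @ rotations                              # (seq_len, num_buckets // 2)
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)


def lsh_attention(queries, keys, values, num_buckets=8, seed=0):
    """Toy LSH attention: queries only attend to keys in the same bucket,
    so the cost scales with bucket size rather than sequence length."""
    rng = np.random.default_rng(seed)
    d = queries.shape[-1]
    # Shared random rotations so that nearby queries and keys hash to the same bucket.
    rotations = rng.standard_normal((d, num_buckets // 2))
    q_buckets = bucket_ids(queries, rotations)
    k_buckets = bucket_ids(keys, rotations)

    out = np.zeros_like(queries)
    for b in range(num_buckets):
        q_idx = np.where(q_buckets == b)[0]
        k_idx = np.where(k_buckets == b)[0]
        if len(q_idx) == 0 or len(k_idx) == 0:
            continue
        # Ordinary scaled dot-product attention restricted to one bucket.
        scores = queries[q_idx] @ keys[k_idx].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[q_idx] = weights @ values[k_idx]
    return out


# Toy usage: a long "document-level" sequence of 512 tokens with dimension 64.
rng = np.random.default_rng(1)
x = rng.standard_normal((512, 64))
y = lsh_attention(x, x, x, num_buckets=16)
print(y.shape)  # (512, 64)
```

In this sketch the same call also covers a cross-attention-style setting (as explored in the paper) simply by passing queries from one sequence and keys/values from another, since both are hashed with the same random rotations.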