Non-autoregressive Streaming Transformer for Simultaneous Translation
Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng
Abstract
Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency for aggressive anticipation. We argue that this issue stems from the autoregressive architecture upon which most existing SiMT models are built. To address it, we propose the non-autoregressive streaming Transformer (NAST), which comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism. We enable NAST to generate the blank token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and train it to maximize the non-monotonic latent alignment with an alignment-based latency loss. Experiments on various SiMT benchmarks demonstrate that NAST outperforms previous strong autoregressive SiMT baselines.
- Anthology ID:
- 2023.emnlp-main.314
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5177–5190
- URL:
- https://aclanthology.org/2023.emnlp-main.314
- DOI:
- 10.18653/v1/2023.emnlp-main.314
- Cite (ACL):
- Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, and Yang Feng. 2023. Non-autoregressive Streaming Transformer for Simultaneous Translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5177–5190, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Non-autoregressive Streaming Transformer for Simultaneous Translation (Ma et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2023.emnlp-main.314.pdf
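As a rough illustration of the blank-token mechanism described in the abstract, the sketch below (an assumption-based toy, not the authors' code) shows how CTC-style collapsing of chunk-wise non-autoregressive predictions can realize a READ/WRITE policy: blanks within a chunk defer writing (the model effectively keeps READing), and repeated tokens are merged so output is not duplicated across positions. All names (`collapse_chunk`, `simulate_stream`, `<blank>`) are hypothetical.

```python
# Toy sketch of CTC-style collapsing for chunk-wise streaming output.
# Not the NAST implementation; illustrative only.
from typing import List

BLANK = "<blank>"

def collapse_chunk(prev_last: str, chunk: List[str]) -> List[str]:
    """Collapse one chunk of raw predictions: drop blanks and merge
    consecutive repeats (also across the chunk boundary via prev_last)."""
    written = []
    last = prev_last
    for tok in chunk:
        if tok == BLANK:
            last = BLANK          # a blank separates repeats and writes nothing
            continue
        if tok != last:
            written.append(tok)   # WRITE a new target token
        last = tok
    return written

def simulate_stream(chunks: List[List[str]]) -> List[str]:
    """Simulate streaming: after READing each source chunk, WRITE the
    collapsed tokens predicted for that chunk."""
    output, last = [], BLANK
    for i, chunk in enumerate(chunks):
        written = collapse_chunk(last, chunk)
        if chunk:
            last = chunk[-1]
        print(f"READ chunk {i}: WRITE {written}")
        output.extend(written)
    return output

if __name__ == "__main__":
    # Two source chunks; the model delays output in chunk 0 via blanks,
    # and the repeated "wir" at the chunk boundary is merged.
    raw = [["<blank>", "<blank>", "wir"], ["wir", "gehen", "<blank>", "jetzt"]]
    print("final hypothesis:", simulate_stream(raw))
```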