Abstract
Transformer-based autoregressive and non-autoregressive models have played an essential role in sequence generation tasks. Autoregressive models achieve excellent performance, while non-autoregressive models offer fast decoding at inference. In this paper, we propose JANUS, a Joint Autoregressive and Non-autoregressive training method using aUxiliary losS, to enhance model performance in both the AR and NAR manners simultaneously and effectively alleviate the problem of distribution discrepancy. Further, we pre-train BART with JANUS on a large corpus at minimal cost (16 GPU days), making BART-JANUS capable of non-autoregressive generation and demonstrating that our approach can transfer AR knowledge to NAR. Empirically, we show that our approach and BART-JANUS achieve significant improvements on multiple generation tasks, including machine translation and the GLGE benchmarks. Our code is available on GitHub.
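The abstract describes joint AR/NAR training with an auxiliary loss only at a high level. Below is a minimal PyTorch sketch of what one training step of such a joint objective could look like; it is not the authors' code, and the model interface, full-mask NAR decoding, KL-based auxiliary term, and loss weight are illustrative assumptions rather than the exact JANUS formulation.

```python
# Hedged sketch of a joint AR/NAR training step with an auxiliary consistency loss.
# The decoder interface, masking scheme, and loss weights are assumptions for illustration.
import torch
import torch.nn.functional as F

def joint_training_step(model, src, tgt, pad_id, mask_id, aux_weight=1.0):
    # AR pass: teacher forcing with the shifted target as decoder input.
    ar_logits = model(src, decoder_input=tgt[:, :-1])            # (B, T-1, V)
    ar_loss = F.cross_entropy(
        ar_logits.reshape(-1, ar_logits.size(-1)),
        tgt[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )

    # NAR pass: decoder sees fully masked targets and predicts all tokens in parallel.
    nar_input = torch.full_like(tgt, mask_id)
    nar_logits = model(src, decoder_input=nar_input)              # (B, T, V)
    nar_loss = F.cross_entropy(
        nar_logits.reshape(-1, nar_logits.size(-1)),
        tgt.reshape(-1),
        ignore_index=pad_id,
    )

    # Auxiliary term: one simple choice is to pull the NAR distribution toward the
    # AR distribution on the overlapping positions (AR treated as a fixed teacher).
    ar_probs = F.softmax(ar_logits.detach(), dim=-1)
    nar_log_probs = F.log_softmax(nar_logits[:, 1:], dim=-1)
    aux_loss = F.kl_div(nar_log_probs, ar_probs, reduction="batchmean")

    return ar_loss + nar_loss + aux_weight * aux_loss
```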
- Anthology ID:
- 2022.emnlp-main.550
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 8050–8060
- URL:
- https://aclanthology.org/2022.emnlp-main.550
- DOI:
- 10.18653/v1/2022.emnlp-main.550
- Cite (ACL):
- Xiaobo Liang, Lijun Wu, Juntao Li, and Min Zhang. 2022. JANUS: Joint Autoregressive and Non-autoregressive Training with Auxiliary Loss for Sequence Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8050–8060, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- JANUS: Joint Autoregressive and Non-autoregressive Training with Auxiliary Loss for Sequence Generation (Liang et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.550.pdf