Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation

Fei Huang, Pei Ke, Minlie Huang


Abstract
Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still lag far behind pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines on n-gram-based metrics, along with a 17× speedup in throughput. Further analysis shows that PreDAT benefits from an unbiased prediction order that alleviates the error accumulation problem of autoregressive generation, which provides new insights into the advantages of NAR generation.
Anthology ID: 2023.tacl-1.53
Volume: Transactions of the Association for Computational Linguistics, Volume 11
Year: 2023
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 941–959
URL: https://aclanthology.org/2023.tacl-1.53
DOI: 10.1162/tacl_a_00582
Cite (ACL): Fei Huang, Pei Ke, and Minlie Huang. 2023. Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation. Transactions of the Association for Computational Linguistics, 11:941–959.
Cite (Informal): Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation (Huang et al., TACL 2023)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.tacl-1.53.pdf