EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start
Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
Abstract
We present EdiT5 - a novel semi-autoregressive text-editing approach designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster at inference times than conventional sequence-to-sequence (seq2seq) models, while being capable of modeling flexible input-output transformations.This is achieved by decomposing the generation process into three sub-tasks: (1) tagging to decide on the subset of input tokens to be preserved in the output, (2) re-ordering to define their order in the output text, and (3) insertion to infill the missing tokens that are not present in the input. The tagging and re-ordering steps, which are responsible for generating the largest portion of the output, are non-autoregressive, while the insertion uses an autoregressive decoder.Depending on the task, EdiT5 requires significantly fewer autoregressive steps demonstrating speedups of up to 25x when compared to classic seq2seq models. Quality-wise, EdiT5 is initialized with a pre-trained T5 checkpoint yielding comparable performance to T5 in high-resource settings and clearly outperforms it on low-resource settings when evaluated on three NLG tasks: Sentence Fusion, Grammatical Error Correction, and Decontextualization.- Anthology ID:
- 2022.findings-emnlp.156
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 2126–2138
- Language:
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.156/
- DOI:
- 10.18653/v1/2022.findings-emnlp.156
- Cite (ACL):
- Jonathan Mallinson, Jakub Adamek, Eric Malmi, and Aliaksei Severyn. 2022. EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2126–2138, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start (Mallinson et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.156.pdf