Improving Text Auto-Completion with Next Phrase Prediction

Dong-Ho Lee, Zhiqiang Hu, Roy Ka-Wei Lee


Abstract
Language models such as GPT-2 have performed well at constructing syntactically sound sentences for text auto-completion tasks. However, such models often require considerable training effort to adapt to specific writing domains (e.g., medical). In this paper, we propose an intermediate training strategy to enhance pre-trained language models’ performance in the text auto-completion task and quickly adapt them to specific domains. Our strategy includes a novel self-supervised training objective called Next Phrase Prediction (NPP), which encourages a language model to complete the partial query with enriched phrases and eventually improve the model’s text auto-completion performance. Preliminary experiments show that our approach outperforms the baselines in auto-completion for the email and academic-writing domains.
Anthology ID:
2021.findings-emnlp.378
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4434–4438
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.findings-emnlp.378/
DOI:
10.18653/v1/2021.findings-emnlp.378
Cite (ACL):
Dong-Ho Lee, Zhiqiang Hu, and Roy Ka-Wei Lee. 2021. Improving Text Auto-Completion with Next Phrase Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4434–4438, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Improving Text Auto-Completion with Next Phrase Prediction (Lee et al., Findings 2021)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.findings-emnlp.378.pdf
Video:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.findings-emnlp.378.mp4