Abstract
Paraphrasing is an important task demonstrating the ability to abstract semantic content from its surface form. Recent literature on automatic paraphrasing is dominated by methods leveraging machine translation as an intermediate step. This contrasts with humans, who can paraphrase without necessarily being bilingual. This work proposes to learn paraphrasing models only from a monolingual corpus. To that end, we propose a residual variant of the vector-quantized variational auto-encoder. Our experiments consider paraphrase identification, and paraphrasing for training set augmentation, comparing to supervised and unsupervised translation-based approaches. Monolingual paraphrasing is shown to outperform unsupervised translation in all contexts. The comparison with supervised MT is more mixed: monolingual paraphrasing is advantageous for identification and augmentation, but supervised MT is superior for generation.
- Anthology ID: P19-1605
- Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month: July
- Year: 2019
- Address: Florence, Italy
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 6033–6039
- URL: https://aclanthology.org/P19-1605
- DOI: 10.18653/v1/P19-1605
- Cite (ACL): Aurko Roy and David Grangier. 2019. Unsupervised Paraphrasing without Translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6033–6039, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Unsupervised Paraphrasing without Translation (Roy & Grangier, ACL 2019)
- PDF: https://preview.aclanthology.org/nodalida-main-page/P19-1605.pdf
- Data: MRPC, SST
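The abstract names a residual variant of vector quantization as the core modeling idea. As an illustration only (not the paper's actual model, which is a full sequence auto-encoder), a minimal NumPy sketch of residual vector quantization might look as follows: each stage looks up the nearest codebook entry, then the next stage quantizes the residual error left over, and the decoder sums the selected codes. All function names and codebook sizes here are hypothetical choices for the sketch.

```python
import numpy as np

def quantize(x, codebook):
    """Return the codebook row nearest to vector x (Euclidean distance)."""
    dists = np.linalg.norm(codebook - x, axis=1)
    return codebook[np.argmin(dists)]

def residual_vq(x, codebooks):
    """Encode x with a cascade of codebooks; reconstruct by summing the codes."""
    residual = x
    codes = []
    for cb in codebooks:
        q = quantize(residual, cb)
        codes.append(q)
        residual = residual - q  # each later stage models what is left over
    return np.sum(codes, axis=0)

# Toy usage with random data and three stages of 16 codes each.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
codebooks = [rng.normal(size=(16, 4)) for _ in range(3)]
x_hat = residual_vq(x, codebooks)
```

In the paper's setting the codebooks are learned jointly with an encoder and decoder; the sketch above only shows the lookup-and-residual mechanics that distinguish the residual variant from plain vector quantization.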