Abstract
In this work, we examine whether it is possible to achieve state-of-the-art performance in paraphrase generation with a reduced vocabulary. Our approach consists of building a convolution-to-sequence model (Conv2Seq), partially guided by reinforcement learning, and training it on a subword representation of the input. Experiments on the Quora dataset, which contains over 140,000 pairs of sentences and corresponding paraphrases, found that with fewer than 1,000 token types we were able to exceed the performance of the current state of the art.
- Anthology ID:
- W19-8655
- Volume:
- Proceedings of the 12th International Conference on Natural Language Generation
- Month:
- October–November
- Year:
- 2019
- Address:
- Tokyo, Japan
- Editors:
- Kees van Deemter, Chenghua Lin, Hiroya Takamura
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 438–442
- URL:
- https://aclanthology.org/W19-8655
- DOI:
- 10.18653/v1/W19-8655
- Cite (ACL):
- Tadashi Nomoto. 2019. Generating Paraphrases with Lean Vocabulary. In Proceedings of the 12th International Conference on Natural Language Generation, pages 438–442, Tokyo, Japan. Association for Computational Linguistics.
- Cite (Informal):
- Generating Paraphrases with Lean Vocabulary (Nomoto, INLG 2019)
- PDF:
- https://aclanthology.org/W19-8655.pdf
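The abstract reports strong results with fewer than 1,000 token types by operating on a subword representation of the input. The abstract does not name the subword scheme; a common choice for shrinking a vocabulary this way is byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent symbol pair. The sketch below illustrates that general idea on a toy word-frequency table; it is an assumption-laden illustration, not the authors' actual pipeline.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Toy BPE-style vocabulary learner (illustrative sketch only).

    word_freqs: dict mapping a word to its corpus frequency.
    Returns the learned merge list and the resulting set of token types.
    End-of-word markers are omitted for brevity.
    """
    # Each word starts as a tuple of single-character symbols.
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    token_types = {s for symbols in vocab for s in symbols}
    return merges, token_types

# Hypothetical toy corpus: frequent substrings get merged into subword units.
word_freqs = {"paraphrase": 5, "phrase": 4, "parse": 3}
merges, types = learn_bpe(word_freqs, 6)
```

Capping the number of merges bounds the token-type inventory, which is how a BPE-style scheme could hold a working vocabulary to under 1,000 types regardless of corpus size.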