Neural Paraphrase Identification of Questions with Noisy Pretraining
Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das
Abstract
We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (replacing the word embeddings of the decomposable attention model of Parikh et al. 2016 with character n-gram representations) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.
- Anthology ID: W17-4121
- Volume: Proceedings of the First Workshop on Subword and Character Level Models in NLP
- Month: September
- Year: 2017
- Address: Copenhagen, Denmark
- Editors: Manaal Faruqui, Hinrich Schuetze, Isabel Trancoso, Yadollah Yaghoobzadeh
- Venue: SCLeM
- Publisher: Association for Computational Linguistics
- Pages: 142–147
- URL: https://aclanthology.org/W17-4121
- DOI: 10.18653/v1/W17-4121
- Cite (ACL): Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, and Dipanjan Das. 2017. Neural Paraphrase Identification of Questions with Noisy Pretraining. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 142–147, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal): Neural Paraphrase Identification of Questions with Noisy Pretraining (Tomar et al., SCLeM 2017)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/W17-4121.pdf
- Data: Paralex, Quora Question Pairs