Predicate-Argument Based Bi-Encoder for Paraphrase Identification

Qiwei Peng, David Weir, Julie Weeds, Yekun Chai


Abstract
Paraphrase identification involves determining whether two sentences express the same or similar meanings. While cross-encoders have achieved high performance across several benchmarks, bi-encoders such as SBERT have been widely applied to sentence pair tasks. They exhibit substantially lower computational complexity and are better suited to symmetric tasks. In this work, we adopt a bi-encoder approach to the paraphrase identification task and investigate the impact of explicitly incorporating predicate-argument information into SBERT through weighted aggregation. Experiments on six paraphrase identification datasets demonstrate that, with a minimal increase in parameters, the proposed model significantly outperforms SBERT/SRoBERTa. Further, ablation studies reveal that the predicate-argument based component plays a significant role in the performance gain.
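The abstract contrasts cross-encoders, which read both sentences of a pair jointly, with bi-encoders, which embed each sentence on its own. The following is a minimal Python sketch of that distinction under stated assumptions. Toy 2-d lists stand in for transformer token outputs. `pa_spans` are hypothetical predicate-argument token spans, assumed to come from an external semantic role labeller. `alpha` is a hypothetical fixed mixing weight. The convex mix in `encode` is only one illustrative reading of "weighted aggregation", not the architecture described in the paper, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def encode(token_vecs: Sequence[Vector],
           pa_spans: Sequence[Tuple[int, int]],
           alpha: float) -> Vector:
    """Encode ONE sentence without looking at its partner (bi-encoder property).

    token_vecs: per-token vectors (toy stand-ins for transformer outputs).
    pa_spans:   HYPOTHETICAL non-empty (start, end) token spans of
                predicate-argument structures, assumed to come from an SRL tool.
    alpha:      HYPOTHETICAL weight on the predicate-argument component.
    """
    sentence_vec = mean_pool(token_vecs)  # mean pooling over all tokens
    if not pa_spans:
        return sentence_vec
    # Pool each predicate-argument span, then pool the span vectors together.
    span_vecs = [mean_pool(token_vecs[start:end]) for start, end in pa_spans]
    pa_vec = mean_pool(span_vecs)
    # Illustrative weighted aggregation: a convex mix of the two views.
    return [(1.0 - alpha) * s + alpha * p for s, p in zip(sentence_vec, pa_vec)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; swapping u and v gives the same score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_encoder_passes(n: int) -> int:
    """Joint forward passes needed to score every unordered pair: n(n-1)/2."""
    return n * (n - 1) // 2


def bi_encoder_passes(n: int) -> int:
    """Forward passes when each sentence is encoded once and cached: n."""
    return n


if __name__ == "__main__":
    sent_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    sent_b = [[0.5, 0.5], [1.0, 0.0]]
    u = encode(sent_a, pa_spans=[(0, 2)], alpha=0.5)
    v = encode(sent_b, pa_spans=[(0, 1)], alpha=0.5)
    print(cosine(u, v), cosine(v, u))
    print(cross_encoder_passes(10_000), bi_encoder_passes(10_000))
```

Because each sentence is encoded without seeing its partner, its vector can be cached. Scoring all pairs among 10,000 sentences then needs 10,000 encoder passes rather than 49,995,000 joint passes. The cosine score is also unchanged when the two sentences are swapped, which is one sense in which bi-encoders suit symmetric tasks.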
Anthology ID: 2022.acl-long.382
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 5579–5589
URL: https://aclanthology.org/2022.acl-long.382
DOI: 10.18653/v1/2022.acl-long.382
Cite (ACL): Qiwei Peng, David Weir, Julie Weeds, and Yekun Chai. 2022. Predicate-Argument Based Bi-Encoder for Paraphrase Identification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5579–5589, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Predicate-Argument Based Bi-Encoder for Paraphrase Identification (Peng et al., ACL 2022)
PDF: https://preview.aclanthology.org/naacl24-info/2022.acl-long.382.pdf
Video: https://preview.aclanthology.org/naacl24-info/2022.acl-long.382.mp4
Data: GLUE, PIT