CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction

Xinnan Guo, Wentao Deng, Yongrui Chen, Yang Li, Mengdi Zhou, Guilin Qi, Tianxing Wu, Dong Yang, Liubin Wang, Yong Pan


Abstract
Attribute Value Extraction (AVE) aims to automatically obtain attribute–value pairs from product descriptions to aid e-commerce. Despite the steady progress of existing approaches on e-commerce platforms, they still suffer from two challenges: 1) they struggle to identify values at different scales simultaneously; 2) they are easily confused by highly similar fine-grained attributes. This paper proposes a pre-training technique for AVE to address these issues. In particular, we first improve the conventional token-level masking strategy, guiding the language model to understand multi-scale values by recovering spans at the phrase and sentence level. Second, we apply clustering to build a challenging negative set for each example and design a pre-training objective based on contrastive learning to force the model to discriminate between similar attributes. Comprehensive experiments show that our solution provides a significant improvement over traditional pre-trained models in the AVE task, and achieves state-of-the-art results on four benchmarks.
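For a concrete picture of the two pre-training objectives the abstract describes, the sketch below is a minimal, hypothetical PyTorch illustration (not the authors' released code): multi_scale_mask corrupts the input at token, phrase, and sentence granularity so the model must recover multi-scale values, and contrastive_loss is an InfoNCE-style objective that separates an example's representation from cluster-mined hard negatives. The masking probabilities, the MASK_ID value, and the sentence-boundary input format are all illustrative assumptions.

import random
import torch
import torch.nn.functional as F

MASK_ID = 103  # [MASK] id in a BERT-style vocabulary (assumption)

def multi_scale_mask(token_ids, sent_bounds, p_tok=0.10, p_phr=0.05,
                     p_sent=0.05, phrase_len=(2, 5)):
    """Mask spans at three scales. sent_bounds is a list of (start, end)
    sentence offsets into token_ids (assumed input format)."""
    ids, labels = list(token_ids), [-100] * len(token_ids)  # -100 = ignored by the MLM loss
    # Sentence-level: occasionally mask a whole sentence.
    for s, e in sent_bounds:
        if random.random() < p_sent:
            for i in range(s, e):
                labels[i], ids[i] = ids[i], MASK_ID
    # Phrase-level: mask short contiguous spans.
    i = 0
    while i < len(ids):
        if labels[i] == -100 and random.random() < p_phr:
            span = random.randint(*phrase_len)
            for j in range(i, min(i + span, len(ids))):
                labels[j], ids[j] = ids[j], MASK_ID
            i += span
        else:
            i += 1
    # Token-level: vanilla MLM masking on whatever remains unmasked.
    for i in range(len(ids)):
        if labels[i] == -100 and random.random() < p_tok:
            labels[i], ids[i] = ids[i], MASK_ID
    return ids, labels

def contrastive_loss(anchor, positive, hard_negatives, tau=0.05):
    """InfoNCE: pull anchor ([d]) toward its positive ([d]) and push it
    away from k clustered hard negatives ([k, d])."""
    anchor = F.normalize(anchor, dim=-1)
    cands = F.normalize(torch.cat([positive.unsqueeze(0), hard_negatives], dim=0), dim=-1)
    logits = (cands @ anchor).unsqueeze(0) / tau       # [1, 1 + k]
    return F.cross_entropy(logits, torch.tensor([0]))  # the positive sits at index 0

In the paper, the hard negatives for each example come from clustering (i.e., grouping highly similar fine-grained attributes); the sketch simply takes them as precomputed vectors.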
Anthology ID: 2023.findings-acl.373
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6007–6018
URL: https://aclanthology.org/2023.findings-acl.373
DOI: 10.18653/v1/2023.findings-acl.373
Cite (ACL):
Xinnan Guo, Wentao Deng, Yongrui Chen, Yang Li, Mengdi Zhou, Guilin Qi, Tianxing Wu, Dong Yang, Liubin Wang, and Yong Pan. 2023. CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6007–6018, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction (Guo et al., Findings 2023)
PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.findings-acl.373.pdf