CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction
Xinnan Guo, Wentao Deng, Yongrui Chen, Yang Li, Mengdi Zhou, Guilin Qi, Tianxing Wu, Dong Yang, Liubin Wang, Yong Pan
Abstract
Attribute Value Extraction (AVE) aims to automatically obtain attribute-value pairs from product descriptions to aid e-commerce. Despite the strong performance of existing approaches on e-commerce platforms, they still suffer from two challenges: 1) difficulty in identifying values at different scales simultaneously; 2) susceptibility to confusion among highly similar fine-grained attributes. This paper proposes a pre-training technique for AVE to address these issues. In particular, we first improve the conventional token-level masking strategy, guiding the language model to understand multi-scale values by recovering spans at the phrase and sentence levels. Second, we apply clustering to build a challenging negative set for each example and design a pre-training objective based on contrastive learning to force the model to discriminate between similar attributes. Comprehensive experiments show that our solution provides a significant improvement over traditional pre-trained models on the AVE task and achieves state-of-the-art results on four benchmarks.
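The first idea in the abstract, multi-scale masking, can be made concrete with a toy snippet. The following is a minimal sketch, not the authors' released code: it assumes pre-tokenized sentences and a hypothetical `multi_scale_mask` helper that randomly picks one of three scales per example and masks a token, a fixed-length phrase span, or an entire sentence, which the model would then be trained to recover.

```python
# Minimal sketch of multi-scale masking (token / phrase / sentence level).
# Hypothetical helper, not the paper's implementation: one scale is chosen
# at random and the selected span is replaced with [MASK] placeholders.
import random

MASK = "[MASK]"

def multi_scale_mask(sentences, phrase_len=3, seed=None):
    """sentences: list of tokenized sentences (each a list of str)."""
    rng = random.Random(seed)
    scale = rng.choice(["token", "phrase", "sentence"])
    masked = [list(toks) for toks in sentences]  # copy, keep input intact
    s = rng.randrange(len(masked))               # sentence to corrupt
    toks = masked[s]
    if scale == "token":
        toks[rng.randrange(len(toks))] = MASK    # single-token masking
    elif scale == "phrase":
        span = min(phrase_len, len(toks))        # contiguous phrase span
        start = rng.randrange(len(toks) - span + 1)
        toks[start:start + span] = [MASK] * span
    else:                                        # sentence-level masking
        masked[s] = [MASK] * len(toks)
    return masked, scale

# The pre-training target is the original span at the chosen scale.
doc = [["waterproof", "hiking", "boots"], ["size", "10", "US"]]
print(multi_scale_mask(doc, seed=0))
```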
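The second idea, clustering attributes to build challenging negative sets for contrastive learning, can be sketched similarly. This is only an illustration under stated assumptions: attribute embeddings are assumed to be precomputed vectors, scikit-learn's KMeans stands in for whatever clustering the paper uses, and the `hard_negatives` helper is hypothetical. The point is that negatives drawn from an attribute's own cluster are the easily confused, fine-grained ones.

```python
# Minimal sketch of clustering-based hard negatives for contrastive
# pre-training. Assumes numpy and scikit-learn; not the authors' code.
import random
import numpy as np
from sklearn.cluster import KMeans

def hard_negatives(attr_names, attr_embs, n_clusters=2, k=2, seed=0):
    """For each attribute, sample up to k negatives from its own cluster,
    i.e. the attributes it is most easily confused with."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(attr_embs)
    rng = random.Random(seed)
    negatives = {}
    for i, name in enumerate(attr_names):
        pool = [attr_names[j] for j in range(len(attr_names))
                if j != i and labels[j] == labels[i]]
        rng.shuffle(pool)
        negatives[name] = pool[:k]
    return negatives

# Toy 2-D embeddings: the three "length" attributes cluster together,
# so they become each other's hard negatives.
attrs = ["sleeve length", "skirt length", "pants length", "color", "brand"]
embs = np.array([[0.9, 0.1], [0.85, 0.15], [0.8, 0.2],
                 [0.1, 0.9], [0.05, 0.95]])
print(hard_negatives(attrs, embs))
```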
- Anthology ID:
- 2023.findings-acl.373
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6007–6018
- URL:
- https://aclanthology.org/2023.findings-acl.373
- DOI:
- 10.18653/v1/2023.findings-acl.373
- Cite (ACL):
- Xinnan Guo, Wentao Deng, Yongrui Chen, Yang Li, Mengdi Zhou, Guilin Qi, Tianxing Wu, Dong Yang, Liubin Wang, and Yong Pan. 2023. CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6007–6018, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction (Guo et al., Findings 2023)
- PDF:
- https://aclanthology.org/2023.findings-acl.373.pdf