Weighted Contrastive Learning With False Negative Control to Help Long-tailed Product Classification

Tianqi Wang, Lei Chen, Xiaodan Zhu, Younghun Lee, Jing Gao


Abstract
Item categorization (IC) aims to classify product descriptions into leaf nodes of a categorical taxonomy and is a key technology in a wide range of applications. Because most datasets have a long-tailed (LT) distribution, classification performance on tail labels tends to be poor due to scarce supervision, which causes many issues in real-life applications. The K-positive contrastive loss (KCL), originally proposed for image classification, can address this long-tail issue in IC when combined with text-based contrastive learning, e.g., SimCSE. However, one shortcoming of KCL has been neglected in previous research: false negative (FN) instances may harm KCL’s representation learning. To address the FN issue, we propose to re-weight the positive pairs in the KCL loss, with a regularization term that keeps the sum of the weights as close to K+1 as possible. After controlling FN instances with the proposed method, IC performance improves further and is superior to that of other LT-addressing methods.
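The abstract does not state the loss in closed form, so the following is only a minimal NumPy sketch of the idea it describes: each anchor has K+1 positives, each positive pair carries a weight, and a penalty pulls the sum of the weights toward K+1. The function name `weighted_kcl_loss`, the temperature `tau`, the coefficient `lam`, the squared form of the penalty, and the way the weights are supplied are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def weighted_kcl_loss(z, pos_idx, weights, tau=0.1, lam=1.0):
    """Illustrative weighted K-positive contrastive loss (hypothetical form).

    z:       (N, d) embeddings, one row per instance in the batch.
    pos_idx: (N, K+1) integer indices of each anchor's positives
             (must not contain the anchor's own index).
    weights: (N, K+1) non-negative weights, one per positive pair.
    """
    # Cosine similarity scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # an anchor is never compared with itself

    # Numerically stable log-softmax over all other instances in the batch.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))

    # Log-probabilities of the positives, weighted and averaged over K+1.
    pos_log_prob = np.take_along_axis(log_prob, pos_idx, axis=1)
    k_plus_1 = pos_idx.shape[1]
    contrastive = -(weights * pos_log_prob).sum(axis=1) / k_plus_1

    # Assumed regularizer: squared gap between the weight sum and K+1.
    reg = lam * (weights.sum(axis=1) - k_plus_1) ** 2
    return float((contrastive + reg).mean())


# Toy batch: 6 instances, each with K+1 = 2 positives (its next two neighbours).
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
pos_idx = np.array([[(i + 1) % 6, (i + 2) % 6] for i in range(6)])
uniform = np.ones((6, 2))
loss = weighted_kcl_loss(z, pos_idx, uniform)
```

With all weights equal to 1 the penalty vanishes and the sketch reduces to an unweighted K-positive loss; lowering the weight of a pair shrinks its contribution, while the penalty discourages all weights from collapsing to zero.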
Anthology ID:
2023.acl-industry.55
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
574–580
URL:
https://aclanthology.org/2023.acl-industry.55
DOI:
10.18653/v1/2023.acl-industry.55
Cite (ACL):
Tianqi Wang, Lei Chen, Xiaodan Zhu, Younghun Lee, and Jing Gao. 2023. Weighted Contrastive Learning With False Negative Control to Help Long-tailed Product Classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 574–580, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Weighted Contrastive Learning With False Negative Control to Help Long-tailed Product Classification (Wang et al., ACL 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-industry.55.pdf