Abstract
Text classification is an important problem with a wide range of applications in NLP. However, naturally occurring data is often imbalanced, which can induce biases when training classification models. In this work, we introduce a novel contrastive learning (CL) approach for the imbalanced text classification task. CL has an inherent structure that pulls similar data closer together in the embedding space and pushes dissimilar data apart, using data samples as anchors. Traditional CL methods, however, use text embeddings as anchors, which are scattered over the embedding space. We propose a CL approach that instead learns label embeddings and uses them as anchors. This allows our approach to bring text embeddings closer to their labels and to divide the embedding space between labels more fairly. We also introduce a novel method to improve the interpretability of our approach in a multi-class classification scenario: it learns inter-class relationships during training, which provide insight into the model's decisions. Since our approach focuses on dividing the embedding space between different labels, we also experiment with hyperbolic embeddings, which have proven successful at embedding hierarchical information. Our proposed method outperforms several state-of-the-art baselines by an average of 11% F1. Our interpretable approach highlights key data relationships, and our experiments with hyperbolic embeddings yield important insights for future investigations. We will release the implementation of our approach with the publication.
- Anthology ID:
- 2024.wnut-1.6
- Volume:
- Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024)
- Month:
- March
- Year:
- 2024
- Address:
- San Ġiljan, Malta
- Editors:
- Rob van der Goot, JinYeong Bak, Max Müller-Eberstein, Wei Xu, Alan Ritter, Tim Baldwin
- Venues:
- WNUT | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 58–67
- URL:
- https://aclanthology.org/2024.wnut-1.6
- Cite (ACL):
- Baber Khalid, Shuyang Dai, Tara Taghavi, and Sungjin Lee. 2024. Label Supervised Contrastive Learning for Imbalanced Text Classification in Euclidean and Hyperbolic Embedding Spaces. In Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024), pages 58–67, San Ġiljan, Malta. Association for Computational Linguistics.
- Cite (Informal):
- Label Supervised Contrastive Learning for Imbalanced Text Classification in Euclidean and Hyperbolic Embedding Spaces (Khalid et al., WNUT-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2024.wnut-1.6.pdf