Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation
Dan Li, Zi Long Zhu, Janneke van de Loo, Agnes Masip Gomez, Vikrant Yadav, Georgios Tsatsaronis, Zubair Afzal
Abstract
Extreme multi-label text classification is a prevalent task in industry, but it frequently encounters challenges in terms of machine learning perspectives, including model limitations, data scarcity, and time-consuming evaluation. This paper aims to mitigate these issues by introducing novel approaches. Firstly, we propose a label ranking model as an alternative to the conventional SciBERT-based classification model, enabling efficient handling of large-scale labels and accommodating new labels. Secondly, we present an active learning-based pipeline that addresses the data scarcity of new labels during the update of a classification system. Finally, we introduce ChatGPT to assist with model evaluation. Our experiments demonstrate the effectiveness of these techniques in enhancing the extreme multi-label text classification task.
- Anthology ID:
- 2023.emnlp-industry.30
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Mingxuan Wang, Imed Zitouni
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 313–321
- URL:
- https://aclanthology.org/2023.emnlp-industry.30
- DOI:
- 10.18653/v1/2023.emnlp-industry.30
- Cite (ACL):
- Dan Li, Zi Long Zhu, Janneke van de Loo, Agnes Masip Gomez, Vikrant Yadav, Georgios Tsatsaronis, and Zubair Afzal. 2023. Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 313–321, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation (Li et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-industry.30.pdf