Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation

Dan Li; Zi Long Zhu; Janneke van de Loo; Agnes Masip Gomez; Vikrant Yadav; Georgios Tsatsaronis; Zubair Afzal

doi:10.18653/v1/2023.emnlp-industry.30

Abstract

Extreme multi-label text classification is a prevalent task in industry, but from a machine learning perspective it frequently runs into three challenges: model limitations, data scarcity, and time-consuming evaluation. This paper aims to mitigate these issues by introducing three novel approaches. First, we propose a label ranking model as an alternative to the conventional SciBERT-based classification model, enabling efficient handling of large-scale label sets and accommodating new labels. Second, we present an active learning-based pipeline that addresses the scarcity of data for new labels when a classification system is updated. Finally, we use ChatGPT to assist with model evaluation. Our experiments demonstrate the effectiveness of these techniques for extreme multi-label text classification.

Anthology ID: 2023.emnlp-industry.30
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: December
Year: 2023
Address: Singapore
Editors: Mingxuan Wang, Imed Zitouni
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 313–321
URL: https://aclanthology.org/2023.emnlp-industry.30
DOI: 10.18653/v1/2023.emnlp-industry.30
Cite (ACL): Dan Li, Zi Long Zhu, Janneke van de Loo, Agnes Masip Gomez, Vikrant Yadav, Georgios Tsatsaronis, and Zubair Afzal. 2023. Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 313–321, Singapore. Association for Computational Linguistics.
Cite (Informal): Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation (Li et al., EMNLP 2023)
PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-industry.30.pdf
Video: https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-industry.30.mp4
