SEP-MLDC: A Simple and Effective Paradigm for Multi-Label Document Classification

Han Liu, Shuqin Li, Xiaotong Zhang, Yuanyuan Wang, Feng Zhang, Hongyang Chen, Hong Yu


Abstract
Multi-label document classification (MLDC) aims to assign more than one label to each document and is attracting increasing attention in many practical applications. However, previous studies have paid insufficient attention to two issues: the lack of semantic information in labels and the long-tail problem prevalent in the datasets. Additionally, most existing methods focus on optimizing document features and overlook the potential of high-quality label features to enhance classification performance. In this paper, we propose a simple and effective paradigm for MLDC. To address insufficient label information and the imbalance in sample size across categories, we use large language models (LLMs) to semantically expand the label content and to generate pseudo-samples for the tail categories. To optimize the features of both documents and labels, we design a contrastive-learning-boosted feature optimization module facilitated by similarity matrices. Finally, we construct a label-guided feature selection module that incorporates the optimized label features into the input features, providing richer semantic information to the classifier. Extensive experiments demonstrate that our proposed method significantly outperforms state-of-the-art baselines.
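As a rough illustration of how a similarity matrix between document and label features can drive a contrastive objective, the sketch below computes a multi-positive, InfoNCE-style loss in NumPy. It is not the authors' implementation: the abstract does not specify the loss, and the function names, the cosine similarity, the temperature value, and the averaging over positive labels are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def doc_label_contrastive_loss(doc_feats, label_feats, targets, temperature=0.1):
    """Hypothetical document-label contrastive loss (illustrative assumption).

    doc_feats:   (n_docs, dim) document features
    label_feats: (n_labels, dim) label features
    targets:     (n_docs, n_labels) binary matrix, 1 where a label applies
    """
    docs = l2_normalize(np.asarray(doc_feats, dtype=float))
    labels = l2_normalize(np.asarray(label_feats, dtype=float))
    targets = np.asarray(targets, dtype=float)

    # Document-label similarity matrix, sharpened by the temperature.
    sim = docs @ labels.T / temperature

    # Numerically stable log-softmax over the label axis.
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Average the negative log-probability over each document's positive labels.
    pos_count = np.maximum(targets.sum(axis=1), 1.0)
    per_doc = -(log_prob * targets).sum(axis=1) / pos_count
    return float(per_doc.mean())


# Toy data: three labels with orthogonal features, two documents.
label_feats = np.eye(3)
doc_feats = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
matching = np.array([[1, 0, 0], [0, 1, 1]])
mismatched = np.array([[0, 1, 1], [1, 0, 0]])

loss_matching = doc_label_contrastive_loss(doc_feats, label_feats, matching)
loss_mismatched = doc_label_contrastive_loss(doc_feats, label_feats, mismatched)
print(loss_matching < loss_mismatched)
```

In this toy setup the loss is lower when each document's features point toward its own labels, which is the behavior a contrastive objective of this kind is meant to reward.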
Anthology ID:
2025.findings-naacl.212
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3849–3859
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.212/
Cite (ACL):
Han Liu, Shuqin Li, Xiaotong Zhang, Yuanyuan Wang, Feng Zhang, Hongyang Chen, and Hong Yu. 2025. SEP-MLDC: A Simple and Effective Paradigm for Multi-Label Document Classification. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3849–3859, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
SEP-MLDC: A Simple and Effective Paradigm for Multi-Label Document Classification (Liu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.212.pdf