Yuanyuan Wang
2025
SEP-MLDC: A Simple and Effective Paradigm for Multi-Label Document Classification
Han Liu | Shuqin Li | Xiaotong Zhang | Yuanyuan Wang | Feng Zhang | Hongyang Chen | Hong Yu
Findings of the Association for Computational Linguistics: NAACL 2025
Multi-label document classification (MLDC) aims to assign more than one label to each document and is attracting increasing attention in many practical applications. However, previous studies have paid insufficient attention to the lack of semantic information in labels and to the long-tail problem prevalent in the datasets. Additionally, most existing methods focus on optimizing document features, overlooking the potential of high-quality label features to enhance classification performance. In this paper, we propose a simple and effective paradigm for MLDC. To address insufficient label information and the imbalance in sample size across categories, we utilize large language models (LLMs) to semantically expand the label content and generate pseudo-samples for the tail categories. To optimize the features of both documents and labels, we design a contrastive-learning-boosted feature optimization module facilitated by similarity matrices. Finally, we construct a label-guided feature selection module that incorporates the optimized label features into the input features, providing richer semantic information for the classifier. Extensive experiments demonstrate that our proposed method significantly outperforms state-of-the-art baselines.