Abstract
Multi-label text classification (MLTC) is an important task in natural language processing. Most existing models rely on high-quality text representations provided by pre-trained language models (PLMs), and therefore face the input length limitation of PLMs when dealing with long texts. In light of this, we introduce a comprehensive approach to multi-label long text classification. To address the input length limitation, we propose a text segmentation algorithm that is guaranteed to produce the optimal segmentation. We incorporate external knowledge, label co-occurrence relations, and attention mechanisms in representation learning to enhance both text and label representations. Extensive experiments on various MLTC datasets validate our method’s effectiveness and unravel the intricate correlations between texts and labels.
- Anthology ID:
- 2024.findings-emnlp.402
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6864–6873
- URL:
- https://aclanthology.org/2024.findings-emnlp.402
- DOI:
- 10.18653/v1/2024.findings-emnlp.402
- Cite (ACL):
- Wang Zhang, Xin Wang, Qian Wang, Tao Deng, and Xiaoru Wu. 2024. From Text Segmentation to Enhanced Representation Learning: A Novel Approach to Multi-Label Classification for Long Texts. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6864–6873, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- From Text Segmentation to Enhanced Representation Learning: A Novel Approach to Multi-Label Classification for Long Texts (Zhang et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-emnlp.402.pdf