Random Label Forests: An Ensemble Method with Label Subsampling For Extreme Multi-Label Problems

Sheng-Wei Chen, Chih-Jen Lin


Abstract
Text classification is an essential topic in natural language processing, and each text is often associated with multiple labels. The number of labels has grown steadily, especially in e-commerce applications, so many existing multi-label learning methods need a large amount of memory to handle text-related e-commerce problems. One possible way to address this space concern is to use a distributed system to share the memory requirement. We propose “random label forests,” a distributed ensemble method with label subsampling, for handling an extremely large number of labels. Random label forests can reduce memory usage per computer while keeping competitive performance on real-world data sets.
Anthology ID:
2024.findings-emnlp.825
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14107–14119
URL:
https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.825/
DOI:
10.18653/v1/2024.findings-emnlp.825
Cite (ACL):
Sheng-Wei Chen and Chih-Jen Lin. 2024. Random Label Forests: An Ensemble Method with Label Subsampling For Extreme Multi-Label Problems. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14107–14119, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Random Label Forests: An Ensemble Method with Label Subsampling For Extreme Multi-Label Problems (Chen & Lin, Findings 2024)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.825.pdf