Evaluating Hierarchical Document Categorisation
Qian Sun, Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Daniel Beck, Tomoya Iwakura, Timothy Baldwin
Abstract
Hierarchical document categorisation is a special case of multi-label document categorisation, where there is a taxonomic hierarchy among the labels. While various approaches have been proposed for hierarchical document categorisation, there is no standard benchmark dataset, resulting in different methods being evaluated independently and there being no empirical consensus on what methods perform best. In this work, we examine different combinations of neural text encoders and hierarchical methods in an end-to-end framework, and evaluate over three datasets. We find that the performance of hierarchical document categorisation is determined not only by how the hierarchical information is modelled, but also the structure of the label hierarchy and class distribution.
- Anthology ID:
- 2021.alta-1.20
- Volume:
- Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
- Month:
- December
- Year:
- 2021
- Address:
- Online
- Editors:
- Afshin Rahimi, William Lane, Guido Zuccon
- Venue:
- ALTA
- Publisher:
- Australasian Language Technology Association
- Pages:
- 179–184
- URL:
- https://aclanthology.org/2021.alta-1.20
- Cite (ACL):
- Qian Sun, Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Daniel Beck, Tomoya Iwakura, and Timothy Baldwin. 2021. Evaluating Hierarchical Document Categorisation. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, pages 179–184, Online. Australasian Language Technology Association.
- Cite (Informal):
- Evaluating Hierarchical Document Categorisation (Sun et al., ALTA 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2021.alta-1.20.pdf
- Data
- RCV1, WOS