Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation
Ilias Chalkidis, Emmanouil Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
Abstract
We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union’s public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and suitable for XMTC, few-shot and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with self-attention outperform the current multi-label state-of-the-art methods, which employ label-wise attention. Replacing CNNs with BIGRUs in label-wise attention networks leads to the best overall performance.
- Anthology ID:
- W19-2209
- Volume:
- Proceedings of the Natural Legal Language Processing Workshop 2019
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Nikolaos Aletras, Elliott Ash, Leslie Barrett, Daniel Chen, Adam Meyers, Daniel Preotiuc-Pietro, David Rosenberg, Amanda Stent
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 78–87
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/W19-2209/
- DOI:
- 10.18653/v1/W19-2209
- Cite (ACL):
- Ilias Chalkidis, Emmanouil Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2019. Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation. In Proceedings of the Natural Legal Language Processing Workshop 2019, pages 78–87, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation (Chalkidis et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/W19-2209.pdf