Abstract
We present GerDaLIR, a German Dataset for Legal Information Retrieval based on case documents from the open legal information platform Open Legal Data. The dataset consists of 123K queries, each labelled with at least one relevant document in a collection of 131K case documents. We conduct several baseline experiments including BM25 and a state-of-the-art neural re-ranker. With our dataset, we aim to provide a standardized benchmark for German LIR and promote open research in this area. Beyond that, our dataset comprises sufficient training data to be used as a downstream task for German or multilingual language models.
- Anthology ID: 2021.nllp-1.13
- Volume: Proceedings of the Natural Legal Language Processing Workshop 2021
- Month: November
- Year: 2021
- Address: Punta Cana, Dominican Republic
- Editors: Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro
- Venue: NLLP
- Publisher: Association for Computational Linguistics
- Pages: 123–128
- URL: https://aclanthology.org/2021.nllp-1.13
- DOI: 10.18653/v1/2021.nllp-1.13
- Cite (ACL): Marco Wrzalik and Dirk Krechel. 2021. GerDaLIR: A German Dataset for Legal Information Retrieval. In Proceedings of the Natural Legal Language Processing Workshop 2021, pages 123–128, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): GerDaLIR: A German Dataset for Legal Information Retrieval (Wrzalik & Krechel, NLLP 2021)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/2021.nllp-1.13.pdf