Elena Leitner
2026
Common European Language Data Space: Development, Current Status, and Future Perspectives
Stelios Piperidis | Penny Labropoulou | Dimitrios Galanis | Khalid Choukri | Andrejs Vasiļjevs | Mitos Deligiannis | Katerina Gkirtzou | Dimitris Gkoumas | Athanasia Kolovou | Leon Voukoutis | Kanella Pouli | Maria Giagkou | Maria Gavriilidou | Katrin Marheinecke | Elena Leitner | Simon Ostermann | Stefania Raccioppa | Kossay Talmoudi | Victoria Arranz | Valérie Mapelli | Helene Mazo | Fernanda González Campo | Shi Yu | Aivars Bērziņš | Andis Lagzdiņš | Georg Rehm
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Common European Data Spaces (CEDS) aim to create a single market for data across the EU that will power AI innovation. CEDS cover 14 sectors/domains and will allow the secure, trustworthy exchange of data and AI models between companies, public administrations, and other organisations. The Common European Language Data Space (LDS) is part of CEDS and is already available in a beta phase. The paper presents its technical design and implementation, its governance framework, and use cases that demonstrate its value. LDS aspires to become part of the future European Language Technology ecosystem.
A Novel Synthetic Dataset for Few-Shot Legal Relation Extraction in German
Shiva Banasaz Nouri | Elena Leitner | Julian Moreno-Schneider | Georg Rehm
Proceedings of the Fifteenth Language Resources and Evaluation Conference
The legal domain is particularly challenging for natural language processing due to the personal and confidential information it contains. Despite the significant advances of large language models (LLMs), applying them to relation extraction (RE) in legal texts remains challenging, not only because of the task’s linguistic and semantic complexity, but also due to privacy, compliance, and infrastructure constraints under regulations such as the EU AI Act. To address these challenges, we propose a novel synthetic dataset for German legal relation extraction, created using LLMs through a controlled, privacy-preserving, template-based pipeline. The dataset allows for reproducible and legally compliant experimentation. We benchmark it using two few-shot learning paradigms, a description-enhanced Model-Agnostic Meta-Learning (MAML) framework and Prototypical Networks with supervised contrastive loss and curriculum-aware prototype enrichment. Our results demonstrate that combining few-shot learning with structured semantic knowledge achieves robust and interpretable results, with the curriculum-aware Proto-Contrastive model reaching an F1-score of 99.83%.
2025
Automated Speech Act Classification in Offensive German Language Tweets
Melina Plakidis | Elena Leitner | Georg Rehm
Traitement Automatique des Langues, Volume 65, Numéro 3 : Discours de haine : ressources linguistiques, méthodes et applications [Abusive Language: Linguistic Resources, Methods and Applications]
2024
Common European Language Data Space
Georg Rehm | Stelios Piperidis | Khalid Choukri | Andrejs Vasiļjevs | Katrin Marheinecke | Victoria Arranz | Aivars Bērziņš | Miltos Deligiannis | Dimitris Galanis | Maria Giagkou | Katerina Gkirtzou | Dimitris Gkoumas | Annika Grützner-Zahn | Athanasia Kolovou | Penny Labropoulou | Andis Lagzdiņš | Elena Leitner | Valérie Mapelli | Hélène Mazo | Simon Ostermann | Stefania Racioppa | Mickaël Rigault | Leon Voukoutis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The Common European Language Data Space (LDS) is an integral part of the EU data strategy, which aims at developing a single market for data. Its decentralised technical infrastructure and governance scheme are currently being developed by the LDS project, which also has dedicated tasks for proof-of-concept prototypes, handling legal aspects, raising awareness and promoting the LDS through events and social media channels. The LDS is part of a broader vision for establishing all necessary components to develop European large language models.
2020
A Dataset of German Legal Documents for Named Entity Recognition
Elena Leitner | Georg Rehm | Julian Moreno-Schneider
Proceedings of the Twelfth Language Resources and Evaluation Conference
We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
2019
Developing and Orchestrating a Portfolio of Natural Legal Language Processing and Document Curation Services
Georg Rehm | Julián Moreno-Schneider | Jorge Gracia | Artem Revenko | Victor Mireles | Maria Khvalchik | Ilan Kernerman | Andis Lagzdins | Marcis Pinnis | Artus Vasilevskis | Elena Leitner | Jan Milde | Pia Weißenhorn
Proceedings of the Natural Legal Language Processing Workshop 2019
We present a portfolio of natural legal language processing and document curation services currently under development in a collaborative European project. First, we give an overview of the project and the different use cases, while, in the main part of the article, we focus upon the 13 different processing services that are being deployed in different prototype applications using a flexible and scalable microservices architecture. Their orchestration is operationalised using a content and document curation workflow manager.
Co-authors
- Georg Rehm 6
- Andis Lagzdiņš 3
- Julian Moreno Schneider 3
- Victoria Arranz 2
- Aivars Bērziņš 2
- Dimitrios Galanis 2
- Maria Giagkou 2
- Katerina Gkirtzou 2
- Dimitris Gkoumas 2
- Athanasia Kolovou 2
- Valérie Mapelli 2
- Katrin Marheinecke 2
- Hélène Mazo 2
- Simon Ostermann 2
- Andrejs Vasiļjevs 2
- Leon Voukoutis 2
- Shiva Banasaz Nouri 1
- Khalid Choukri 2
- Mitos Deligiannis 1
- Miltos Deligiannis 1
- Maria Gavriilidou 1
- Fernanda González Campo 1
- Jorge Gracia 1
- Annika Grützner-Zahn 1
- Ilan Kernerman 1
- Maria Khvalchik 1
- Penny Labropoulou 2
- Jan Milde 1
- Victor Mireles 1
- Mārcis Pinnis 1
- Stelios Piperidis 2
- Melina Plakidis 1
- Kanella Pouli 1
- Stefania Raccioppa 1
- Stefania Racioppa 1
- Artem Revenko 1
- Mickaël Rigault 1
- Kossay Talmoudi 1
- Artus Vasilevskis 1
- Pia Weißenhorn 1
- Shi Yu (于是) 1