2024
Common European Language Data Space
Georg Rehm | Stelios Piperidis | Khalid Choukri | Andrejs Vasiļjevs | Katrin Marheinecke | Victoria Arranz | Aivars Bērziņš | Miltos Deligiannis | Dimitris Galanis | Maria Giagkou | Katerina Gkirtzou | Dimitris Gkoumas | Annika Grützner-Zahn | Athanasia Kolovou | Penny Labropoulou | Andis Lagzdiņš | Elena Leitner | Valérie Mapelli | Hélène Mazo | Simon Ostermann | Stefania Racioppa | Mickaël Rigault | Leon Voukoutis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The Common European Language Data Space (LDS) is an integral part of the EU data strategy, which aims at developing a single market for data. Its decentralised technical infrastructure and governance scheme are currently being developed by the LDS project, which also has dedicated tasks for proof-of-concept prototypes, handling legal aspects, raising awareness and promoting the LDS through events and social media channels. The LDS is part of a broader vision for establishing all necessary components to develop European large language models.
European Language Grid: One Year after
Georg Rehm | Stelios Piperidis | Dimitris Galanis | Penny Labropoulou | Maria Giagkou | Miltos Deligiannis | Leon Voukoutis | Martin Courtois | Julian Moreno-Schneider | Katrin Marheinecke
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The European Language Grid (ELG) is a cloud platform for the whole European Language Technology community. While the EU project that developed the platform successfully concluded in June 2022, the ELG initiative has continued. This article provides a description of the current state of ELG in terms of user adoption and number of language resources and technologies available in early 2024. It also provides an overview of the various activities with regard to ELG since the end of the project and since the publication of the ELG book, especially the co-authors’ attempt to integrate the ELG platform into various data space initiatives. The article also provides an overview of the Digital Language Equality (DLE) dashboard and the current state of DLE in Europe.
Surveying the Technology Support of Languages
Annika Grützner-Zahn | Federico Gaspari | Maria Giagkou | Stefanie Hegele | Andy Way | Georg Rehm
Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024
Many of the world’s languages are left behind when it comes to Language Technology applications, since most of these are available only in a limited number of languages, creating a digital divide that affects millions of users worldwide. It is crucial, therefore, to monitor and quantify the progress of technology support for individual languages, which also enables comparisons across language communities. In this way, efforts can be directed towards reducing language barriers, promoting economic and social inclusion, and ensuring that all citizens can use their preferred language in the digital age. This paper critically reviews and compares recent quantitative approaches to measuring technology support for languages. Despite using different approaches and methodologies, the findings of all analysed papers demonstrate the unequal distribution of technology support and emphasise the existence of a digital divide among languages.
2022
Overview of the ELE Project
Itziar Aldabe | Jane Dunne | Aritz Farwell | Owen Gallagher | Federico Gaspari | Maria Giagkou | Jan Hajic | Jens Peter Kückens | Teresa Lynn | Georg Rehm | German Rigau | Katrin Marheinecke | Stelios Piperidis | Natalia Resende | Tea Vojtěchová | Andy Way
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
This paper provides an overview of the ongoing European Language Equality (ELE) project, an 18-month action funded by the European Commission which involves 52 partners. The primary goal of ELE is to prepare the European Language Equality Programme, in the form of a strategic research, innovation and implementation agenda and a roadmap for achieving full digital language equality (DLE) in Europe by 2030.
Introducing the Digital Language Equality Metric: Technological Factors
Federico Gaspari | Owen Gallagher | Georg Rehm | Maria Giagkou | Stelios Piperidis | Jane Dunne | Andy Way
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference
This paper introduces the concept of Digital Language Equality (DLE) developed by the EU-funded European Language Equality (ELE) project, and describes the associated DLE Metric with a focus on its technological factors (TFs), which are complemented by situational contextual factors. This work aims at objectively describing the level of technological support of all European languages and lays the foundation for implementing a large-scale EU-wide programme to ensure that these languages can continue to exist and prosper in the digital age, to serve the present and future needs of their speakers. The paper situates this ongoing work with a strong European focus in the broader context of related efforts, and explains how the DLE Metric can help track the progress towards DLE for all languages of Europe, focusing in particular on the role played by the TFs. These are derived from the European Language Grid (ELG) Catalogue, which provides the empirical basis to measure the level of digital readiness of all European languages. The DLE Metric scores can be consulted through an online interactive dashboard to show the level of technological support of each European language and track the overall progress towards DLE.
Collaborative Metadata Aggregation and Curation in Support of Digital Language Equality Monitoring
Maria Giagkou | Stelios Piperidis | Penny Labropoulou | Miltos Deligiannis | Athanasia Kolovou | Leon Voukoutis
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference
The European Language Equality (ELE) project develops a strategic research, innovation and implementation agenda (SRIA) and a roadmap for achieving full digital language equality in Europe by 2030. A key component of the SRIA development is an accurate estimation of the current standing of languages with respect to their technological readiness. In this paper we present the empirical basis on which this estimation is grounded, its starting point and, in particular, the automatic and collaborative methods used for extending it. We focus on the collaborative expert activities, the challenges posed, and the solutions adopted. We also briefly present the dashboard application developed for querying and visualising the empirical data as well as monitoring and comparing the evolution of technological support within and across languages.
2021
Broad Linguistic Complexity Analysis for Greek Readability Classification
Savvas Chatzipanagiotidis | Maria Giagkou | Detmar Meurers
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
This paper explores the linguistic complexity of Greek textbooks as a readability classification task. We analyze textbook corpora for different school subjects and textbooks for Greek as a Second Language, covering a very wide spectrum of school age groups and proficiency levels. A broad range of quantifiable linguistic complexity features (lexical, morphological and syntactic) are extracted and calculated. Conducting experiments with different feature subsets, we show that the different linguistic dimensions contribute orthogonal information, each contributing towards the highest result achieved using all linguistic feature subsets. A readability classifier trained on this basis reaches a classification accuracy of 88.16% for the Greek as a Second Language corpus. To investigate the generalizability of the classification models, we also perform cross-corpus evaluations. We show that the model trained on the most varied text collection (for Greek as a school subject) generalizes best. In addition to advancing the state of the art for Greek readability analysis, the paper also contributes insights on the role of different feature sets and training setups for generalizable readability classification.
2020
Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures
Lilli Smal | Andrea Lösch | Josef van Genabith | Maria Giagkou | Thierry Declerck | Stephan Busemann
Proceedings of the Twelfth Language Resources and Evaluation Conference
Data is key in training modern language technologies. In this paper, we summarise the findings of the first pan-European study on obstacles to sharing language data across 29 EU Member States and CEF-affiliated countries, carried out under the ELRC White Paper action on "Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe: Why Language Data Matters". We present the methodology of the study and the obstacles identified, and report recommendations on how to overcome them. The obstacles are classified into (1) lack of appreciation of the value of language data, (2) structural challenges, (3) disposition towards CAT tools and lack of digital skills, (4) inadequate language data management practices, (5) limited access to outsourced translations, and (6) legal concerns. Recommendations are grouped into those addressing the European/national policy level and those addressing the organisational/institutional level.
2018
Managing Public Sector Data for Multilingual Applications Development
Stelios Piperidis | Penny Labropoulou | Miltos Deligiannis | Maria Giagkou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2011
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation
Pavel Pecina | Antonio Toral | Andy Way | Vassilis Papavassiliou | Prokopis Prokopidis | Maria Giagkou
Proceedings of the 15th Annual Conference of the European Association for Machine Translation