Maria Giagkou


Overview of the ELE Project
Itziar Aldabe | Jane Dunne | Aritz Farwell | Owen Gallagher | Federico Gaspari | Maria Giagkou | Jan Hajic | Jens Peter Kückens | Teresa Lynn | Georg Rehm | German Rigau | Katrin Marheinecke | Stelios Piperidis | Natalia Resende | Tea Vojtěchová | Andy Way
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This paper provides an overview of the ongoing European Language Equality (ELE) project, an 18-month action funded by the European Commission which involves 52 partners. The primary goal of ELE is to prepare the European Language Equality Programme, in the form of a strategic research, innovation and implementation agenda and a roadmap for achieving full digital language equality (DLE) in Europe by 2030.

Introducing the Digital Language Equality Metric: Technological Factors
Federico Gaspari | Owen Gallagher | Georg Rehm | Maria Giagkou | Stelios Piperidis | Jane Dunne | Andy Way
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference

This paper introduces the concept of Digital Language Equality (DLE) developed by the EU-funded European Language Equality (ELE) project, and describes the associated DLE Metric with a focus on its technological factors (TFs), which are complemented by situational contextual factors. This work aims to describe objectively the level of technological support of all European languages and lays the foundation for implementing a large-scale EU-wide programme to ensure that these languages can continue to exist and prosper in the digital age, serving the present and future needs of their speakers. The paper situates this ongoing work, which has a strong European focus, in the broader context of related efforts, and explains how the DLE Metric can help track progress towards DLE for all languages of Europe, focusing in particular on the role played by the TFs. These are derived from the European Language Grid (ELG) Catalogue, which provides the empirical basis for measuring the level of digital readiness of all European languages. The DLE Metric scores can be consulted through an online interactive dashboard that shows the level of technological support of each European language and tracks the overall progress towards DLE.

Collaborative Metadata Aggregation and Curation in Support of Digital Language Equality Monitoring
Maria Giagkou | Stelios Piperidis | Penny Labropoulou | Miltos Deligiannis | Athanasia Kolovou | Leon Voukoutis
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference

The European Language Equality (ELE) project develops a strategic research, innovation and implementation agenda (SRIA) and a roadmap for achieving full digital language equality in Europe by 2030. A key component of the SRIA development is an accurate estimation of the current standing of languages with respect to their technological readiness. In this paper we present the empirical basis on which this estimation is grounded, its starting point and, in particular, the automatic and collaborative methods used for extending it. We focus on the collaborative expert activities, the challenges posed, and the solutions adopted. We also briefly present the dashboard application developed for querying and visualising the empirical data as well as for monitoring and comparing the evolution of technological support within and across languages.


Broad Linguistic Complexity Analysis for Greek Readability Classification
Savvas Chatzipanagiotidis | Maria Giagkou | Detmar Meurers
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

This paper explores the linguistic complexity of Greek textbooks as a readability classification task. We analyze textbook corpora for different school subjects and textbooks for Greek as a Second Language, covering a very wide spectrum of school age groups and proficiency levels. A broad range of quantifiable linguistic complexity features (lexical, morphological and syntactic) are extracted and calculated. Conducting experiments with different feature subsets, we show that the different linguistic dimensions contribute orthogonal information, each contributing towards the highest result achieved using all linguistic feature subsets. A readability classifier trained on this basis reaches a classification accuracy of 88.16% for the Greek as a Second Language corpus. To investigate the generalizability of the classification models, we also perform cross-corpus evaluations. We show that the model trained on the most varied text collection (for Greek as a school subject) generalizes best. In addition to advancing the state of the art for Greek readability analysis, the paper also contributes insights on the role of different feature sets and training setups for generalizable readability classification.


Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures
Lilli Smal | Andrea Lösch | Josef van Genabith | Maria Giagkou | Thierry Declerck | Stephan Busemann
Proceedings of the Twelfth Language Resources and Evaluation Conference

Data is key in training modern language technologies. In this paper, we summarise the findings of the first pan-European study on obstacles to sharing language data across 29 EU Member States and CEF-affiliated countries, carried out under the ELRC White Paper action on Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe: Why Language Data Matters. We present the methodology of the study and the obstacles identified, and report recommendations on how to overcome them. The obstacles are classified into (1) lack of appreciation of the value of language data, (2) structural challenges, (3) disposition towards CAT tools and lack of digital skills, (4) inadequate language data management practices, (5) limited access to outsourced translations, and (6) legal concerns. Recommendations are grouped into those addressing the European/national policy level and those addressing the organisational/institutional level.


Managing Public Sector Data for Multilingual Applications Development
Stelios Piperidis | Penny Labropoulou | Miltos Deligiannis | Maria Giagkou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation
Pavel Pecina | Antonio Toral | Andy Way | Vassilis Papavassiliou | Prokopis Prokopidis | Maria Giagkou
Proceedings of the 15th Annual Conference of the European Association for Machine Translation