Valérie Mapelli

Also published as: Valerie Mapelli


2024

Common European Language Data Space
Georg Rehm | Stelios Piperidis | Khalid Choukri | Andrejs Vasiļjevs | Katrin Marheinecke | Victoria Arranz | Aivars Bērziņš | Miltos Deligiannis | Dimitris Galanis | Maria Giagkou | Katerina Gkirtzou | Dimitris Gkoumas | Annika Grützner-Zahn | Athanasia Kolovou | Penny Labropoulou | Andis Lagzdiņš | Elena Leitner | Valérie Mapelli | Hélène Mazo | Simon Ostermann | Stefania Racioppa | Mickaël Rigault | Leon Voukoutis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The Common European Language Data Space (LDS) is an integral part of the EU data strategy, which aims at developing a single market for data. Its decentralised technical infrastructure and governance scheme are currently being developed by the LDS project, which also has dedicated tasks for proof-of-concept prototypes, handling legal aspects, raising awareness and promoting the LDS through events and social media channels. The LDS is part of a broader vision for establishing all necessary components to develop European large language models.

2022

Language Resources to Support Language Diversity – the ELRA Achievements
Valérie Mapelli | Victoria Arranz | Khalid Choukri | Hélène Mazo
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This article highlights ELRA’s latest achievements in the field of Language Resources (LRs) identification, sharing and production. It also reports on ELRA’s involvement in several national and international projects, as well as in the organization of events supporting LRs and related Language Technologies, including for under-resourced languages. Over the past few years, ELRA, together with its operational agency ELDA, has continued to expand its catalogue of LRs, establishing worldwide partnerships for the production of various types of LRs (SMS, tweets, crawled data, MT aligned data, speech LRs, sentiment-based data, etc.). Through their consistent involvement in EU-funded projects, ELRA and ELDA have contributed to improving access to multilingual information in the context of the pandemic, developing tools for the de-identification of texts in the legal and medical domains, supporting the EU eTranslation Machine Translation system, and setting up a European platform providing access to both resources and services. In December 2019, ELRA co-organized the LT4All conference, whose main topics were Language Technologies for enabling linguistic diversity and multilingualism worldwide. Moreover, although LREC was cancelled in 2020, ELRA published the LREC 2020 proceedings for the main conference and workshop papers, and carried on its dissemination activities while targeting the new LREC edition in 2022.

Categorizing legal features in a metadata-oriented task: defining the conditions of use
Mickaël Rigault | Victoria Arranz | Valérie Mapelli | Penny Labropoulou | Stelios Piperidis
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference

2021

European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm | Stelios Piperidis | Kalina Bontcheva | Jan Hajic | Victoria Arranz | Andrejs Vasiļjevs | Gerhard Backfried | Jose Manuel Gomez-Perez | Ulrich Germann | Rémi Calizzano | Nils Feldhus | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Julian Moreno-Schneider | Dimitris Galanis | Penny Labropoulou | Miltos Deligiannis | Katerina Gkirtzou | Athanasia Kolovou | Dimitris Gkoumas | Leon Voukoutis | Ian Roberts | Jana Hamrlova | Dusan Varis | Lukas Kacena | Khalid Choukri | Valérie Mapelli | Mickaël Rigault | Julija Melnika | Miro Janosik | Katja Prinz | Andres Garcia-Silva | Cristian Berrio | Ondrej Klejch | Steve Renals
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.

2020

Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid
Penny Labropoulou | Katerina Gkirtzou | Maria Gavriilidou | Miltos Deligiannis | Dimitris Galanis | Stelios Piperidis | Georg Rehm | Maria Berger | Valérie Mapelli | Michael Rigault | Victoria Arranz | Khalid Choukri | Gerhard Backfried | José Manuel Gómez-Pérez | Andres Garcia-Silva
Proceedings of the Twelfth Language Resources and Evaluation Conference

The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies (processing and generation services and tools, models, corpora, term lists, etc.), as well as related entities (e.g., organizations, projects, supporting documents, etc.). The schema powers the European Language Grid platform that aims to be the primary hub and marketplace for industry-relevant Language Technology in Europe. ELG-SHARE has been based on various metadata schemas, vocabularies, and ontologies, as well as related recommendations and guidelines.

2018

Data Management Plan (DMP) for Language Data under the New General Data Protection Regulation (GDPR)
Pawel Kamocki | Valérie Mapelli | Khalid Choukri
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
Andrea Lösch | Valérie Mapelli | Stelios Piperidis | Andrejs Vasiļjevs | Lilli Smal | Thierry Declerck | Eileen Schnur | Khalid Choukri | Josef van Genabith
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

New directions in ELRA activities
Valérie Mapelli | Victoria Arranz | Hélène Mazo | Pawel Kamocki | Vladimir Popescu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

ELRA Activities and Services
Khalid Choukri | Valérie Mapelli | Hélène Mazo | Vladimir Popescu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

After celebrating its 20th anniversary in 2015, ELRA is continuing its strong involvement in the HLT field. To share the expertise ELRA has gained over the past 21 years, this article begins with a presentation of ELRA’s strategic Data and LR Management Plan for wide use by the language communities. We then report on ELRA’s activities and services since LREC 2014. Regarding cataloguing and licensing, ELRA has been active in moving the META-SHARE repository toward new development steps, supporting Europe in obtaining accurate LRs within the Connecting Europe Facility programme, promoting the use of LR citation, and creating the ELRA License Wizard web portal. The article further elaborates on recent production activities for various written, speech and video resources, commissioned by public and private customers. In parallel, ELDA has also worked on several EU-funded projects centred on strategic issues related to the European Digital Single Market. The last part gives an overview of the latest dissemination activities, with a special focus on the celebration of ELRA’s 20th anniversary in Dubrovnik (Croatia), the follow-up of LREC, and the launch of the new ELRA portal.

Language Resource Citation: the ISLRN Dissemination and Further Developments
Valérie Mapelli | Vladimir Popescu | Lin Liu | Khalid Choukri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article presents the latest dissemination activities and technical developments carried out for the International Standard Language Resource Number (ISLRN) service. It also recalls the main principle and the submission process by which providers obtain their 13-digit ISLRN identifier. By March 2016, 2,100 Language Resources had been allocated an ISLRN, including not only ELRA’s and LDC’s catalogued Language Resources, but also those of other important organisations such as the Joint Research Centre (JRC) and the Resource Management Agency (RMA), which expressed their strong support for this initiative. In the research field, assigning a unique identification number is important, but referring to a Language Resource as an object per se (like publications) has also become an obvious requirement. The ISLRN could also become an important parameter for computing a Language Resource Impact Factor (LRIF) that recognizes the merits of Language Resource producers. Integrating the ISLRN into an LR-oriented bibliographical reference is thus part of the objective. The idea is to use a BibTeX entry that takes Language Resource items, including the ISLRN, into account. As the ISLRN is a requested field in the LREC 2016 submission, we expect several other LRs to be allocated an ISLRN by the conference date. With this expansion, the ISLRN aims to become a widely used LR citation instrument in works referring to LRs.

The ELRA License Wizard
Valérie Mapelli | Vladimir Popescu | Lin Liu | Meritxell Fernández Barrera | Khalid Choukri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

To allow an easy understanding of the various licenses that exist for the use of Language Resources (ELRA’s, META-SHARE’s, Creative Commons’, etc.), ELRA has developed a License Wizard to help right-holders share and distribute their resources under the appropriate license. It is also intended to help users better understand the legal obligations that apply in various licensing situations. The present paper elaborates on the functionalities of this web configurator, which enables users to select a number of legal features and obtain a user license adapted to their selection, to define which user licenses they would like to use to distribute their Language Resources, and to integrate the user license terms into a Distribution Agreement that could be proposed to ELRA or META-SHARE for further distribution through the ELRA Catalogue of Language Resources. Thanks to a flexible back office, the structure of the legal feature selection can easily be revised to include other features that may be relevant for other licenses. Integrating contributions from other initiatives is thus one of the obvious next steps, with a special focus on CLARIN and Linked Data experiences.

Review on the Existing Language Resources for Languages of France
Thibault Grouas | Valérie Mapelli | Quentin Samier
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

With the support of the DGLFLF, ELDA conducted an inventory of existing language resources for the regional languages of France. The main aim of this inventory was to assess the exploitability of the identified resources within language technologies. A total of 2,299 Language Resources were identified. As a second step, a deeper analysis of three language groups (Breton, Occitan, overseas languages) was carried out, focusing on their exploitability within three technologies: automatic translation, voice recognition/synthesis and spell checkers. The survey was followed by the organisation of the TLRF2015 Conference, which aimed to present the state of the art in the field of Technologies for the Regional Languages of France. The next step will be to activate the network of specialists built up during the TLRF conference and to begin organising a second TLRF conference. Meanwhile, the French Ministry of Culture continues its actions related to linguistic diversity and technology, in particular through a project with Wikimedia France on contributions to Wikipedia in regional languages, the upcoming new version of the “Corpus de la Parole” and the reinforcement of the DGLFLF’s Observatory of Linguistic Practices.

2014

ELRA’s Consolidated Services for the HLT Community
Victoria Arranz | Khalid Choukri | Valérie Mapelli | Hélène Mazo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper emphasises ELRA’s contribution to the HLT field through the consolidation of its services since LREC 2012. Among the most recent contributions is the establishment of the International Standard Language Resource Number (ISLRN), with the creation and exploitation of an associated web portal to enable the procurement of unique identifiers for Language Resources. Interoperability, consolidation and synchronization also remain a strong focus in ELRA’s cataloguing work, in particular through ELRA’s involvement in the META-SHARE project, whose platform is to become ELRA’s next instrument for sharing LRs. Since the last LREC, ELRA has continued its action to offer free LRs to the research community. Cooperation is another watchword within ELRA’s activities on multiple levels: 1) at the legal level, ELRA is supporting the EC in identifying the gaps to be filled to reach harmonized copyright regulations for the HLT community in Europe; 2) at the production level, ELRA is participating in several international projects in the field of LR production and evaluation of technologies; 3) at the communication level, ELRA has organised the NLP12 meeting with the aim of boosting co-operation and strengthening the bridges between various communities.

2012

ELRA in the heart of a cooperative HLT world
Valérie Mapelli | Victoria Arranz | Matthieu Carré | Hélène Mazo | Djamel Mostefa | Khalid Choukri
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper aims at giving an overview of ELRA’s recent activities. The first part elaborates on ELRA’s means of boosting the sharing of Language Resources (LRs) within the HLT community through its catalogues and the LRE-Map initiative, as well as its work towards the integration of its LRs within the META-SHARE open infrastructure. The second part shows how ELRA helps in the development and evaluation of HLT, in particular through its numerous participations in collaborative projects for the production of resources and of platforms that facilitate their production and exploitation. A third part focuses on ELRA’s work on clearing IPR issues in an HLT-oriented context, one of its latest initiatives being its involvement in a Fair Research Act proposal to promote easy access to LRs for the widest community. Finally, the last part elaborates on recent actions for disseminating information and promoting cooperation in the field, e.g. the Language Library being launched at LREC2012 and the creation of an International Standard LR Number, a unique LR identifier to enable the accurate identification of LRs. Among the other messages ELRA will be conveying to the attendees are the announcement of a set of freely available resources, the establishment of an LR and Evaluation forum, etc.

The REPERE Corpus : a multimodal corpus for person recognition
Aude Giraudel | Matthieu Carré | Valérie Mapelli | Juliette Kahn | Olivier Galibert | Ludovic Quintard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The REPERE Challenge aims to support research on people recognition in multimodal conditions. To assess the progress of the technology, annual evaluation campaigns will be organized from 2012 to 2014. In this context, the REPERE corpus, a French video corpus with multimodal annotation, has been developed. This paper presents the datasets collected for the dry run test that took place at the beginning of 2012, and mainly describes the specific annotation tools and guidelines. At the time of writing, 6 hours of data have been collected and annotated. The last section presents analyses of the annotation distribution and of the interaction between modalities in the corpus.

The META-SHARE Metadata Schema for the Description of Language Resources
Maria Gavrilidou | Penny Labropoulou | Elina Desipri | Stelios Piperidis | Haris Papageorgiou | Monica Monachini | Francesca Frontini | Thierry Declerck | Gil Francopoulo | Victoria Arranz | Valerie Mapelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing. It places the model in the overall framework of metadata models, describes the basic principles and features of the model, elaborates on the distinction between minimal and maximal versions thereof, briefly presents the integrated environment supporting the LRs description and search and retrieval processes and concludes with work to be done in the future for the improvement of the model.

2011

A Metadata Schema for the Description of Language Resources (LRs)
Maria Gavrilidou | Penny Labropoulou | Stelios Piperidis | Monica Monachini | Francesca Frontini | Gil Francopoulo | Victoria Arranz | Valérie Mapelli
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

2008

Latest Developments in ELRA’s Services
Valérie Mapelli | Victoria Arranz | Hélène Mazo | Khalid Choukri
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the latest developments in ELRA’s services within the field of Language Resources (LR). These developments focus on 4 main groups of activities: the identification and distribution of Language Resources; the production of LRs; the evaluation of Human Language Technology (HLT); and the dissemination of information in the field. ELRA’s initial work on the distribution of language resources has evolved throughout the years and currently covers a much wider range of activities that have been considered crucial for the current needs of the R&D community and the “good health” of the LR world. Regarding distribution, considerable work has been done on a broader identification, which considers not only resources to be immediately negotiated for distribution but aims to inform on all available resources. This has been the seed for the Universal Catalogue. Furthermore, a Catalogue of LRs with favourable conditions for R&D has also been created. Moreover, the different activities regarding identification on demand, production within different frameworks, evaluation of language technologies and participation in evaluation campaigns, as well as our very specific focus on information dissemination, are described in detail in this paper.

A Guide for the Production of Reusable Language Resources
Victoria Arranz | Franck Gandcher | Valérie Mapelli | Khalid Choukri
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The project described in this paper is funded by the French Ministry of Research. It aims at providing producers of Language Resources, and HLT players in general, with a guide which offers technical, legal and strategic recommendations/guidelines for the reuse of their Language Resources. The guide is dedicated in particular to academic laboratories which produce Language Resources and may benefit from further advice to start development, but also to any HLT player who wishes to follow the best practices in this field. The guidelines focus on different steps of a Language Resource’s life, i.e. specifications, production, validation, distribution, and maintenance. This paper gives a brief overview of the guide, and describes a) technical formats, standards and best practices which correspond to the current state of the art, for different types of resources, whether written or spoken, at different steps of the production line, b) legal issues and models/templates which can be used for the dissemination of Language Resources as widely as possible, c) strategic issues, by offering a dissemination plan which takes into account all types of constraints faced by HLT community players.

2004

ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs
Nicoletta Calzolari | Khalid Choukri | Maria Gavrilidou | Bente Maegaard | Paola Baroni | Hanne Fersøe | Alessandro Lenci | Valérie Mapelli | Monica Monachini | Stelios Piperidis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Technolangue: A Permanent Evaluation and Information Infrastructure
Valérie Mapelli | Maria Nava | Sylvain Surcin | Djamel Mostefa | Khalid Choukri
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus
Emanuela Cresti | Massimo Moneglia | Fernanda Bacelar do Nascimento | Antonio Moreno Sandoval | Jean Veronis | Philippe Martin | Kalid Choukri | Valerie Mapelli | Daniele Falavigna | Antonio Cid | Claude Blum
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

For a Repository of NLP Tools
Stéphane Chaudiron | Khalid Choukri | Audrey Mance | Valérie Mapelli
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Recent Developments within the European Language Resources Association (ELRA)
Khalid Choukri | Audrey Mance | Valérie Mapelli
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)