Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures
Lilli Smal, Andrea Lösch, Josef van Genabith, Maria Giagkou, Thierry Declerck, Stephan Busemann
Abstract
Data is key in training modern language technologies. In this paper, we summarise the findings of the first pan-European study on obstacles to sharing language data across 29 EU Member States and CEF-affiliated countries carried out under the ELRC White Paper action on Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe. Why Language Data Matters. We present the methodology of the study, the obstacles identified and report on recommendations on how to overcome those. The obstacles are classified into (1) lack of appreciation of the value of language data, (2) structural challenges, (3) disposition towards CAT tools and lack of digital skills, (4) inadequate language data management practices, (5) limited access to outsourced translations, and (6) legal concerns. Recommendations are grouped into addressing the European/national policy level, and the organisational/institutional level.- Anthology ID:
- 2020.lrec-1.422
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 3443–3448
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.422
- DOI:
- Cite (ACL):
- Lilli Smal, Andrea Lösch, Josef van Genabith, Maria Giagkou, Thierry Declerck, and Stephan Busemann. 2020. Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3443–3448, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures (Smal et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2020.lrec-1.422.pdf