Daniele Melaccio

2026

Automating FAIRness: A FAIRification Tool within the Language Resources Infrastructure
Daniele Melaccio | Monica Monachini
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Beyond technical interoperability, FAIRness encompasses governance, policy, and ethical aspects, reflecting how language data are produced, represented, and managed within research infrastructures. Ensuring the FAIR compliance of language resources is essential for transparent and sustainable research in the social sciences and humanities, as it supports data accessibility, quality, and long-term reuse by the community. The FAIRification Tool, developed by CLARIN IT as part of the Humanities and Heritage Italian Open Science Cloud (H2IOSC), is a modular system that automates and enhances FAIR compliance for language resources. It builds on and extends existing FAIR data assessment frameworks by combining automatic and human validation, a feedback dashboard, certification thresholds, and domain-specific extensions aligned with linguistic metadata standards. By operationalizing FAIR concepts and embedding them into repository workflows, the tool supports FAIR-by-design practices and promotes interoperability across CLARIN, H2IOSC, and EOSC. Its effectiveness was demonstrated in an initial evaluation on a representative set of linguistic datasets, which showed notable improvements (30–40%) in FAIR scores, particularly in the Findable and Reusable dimensions. The tool thus contributes to responsible, policy-aware, and transparent language data management within the European Open Science landscape.

In the context of evolving European and national policies for research infrastructure governance, this paper presents the contribution of a national consortium for language resources and technology to building a national infrastructure for FAIR and interoperable language and cultural data within a broader Humanities and Heritage Open Science initiative. As the national node of a European research infrastructure for language resources, the consortium helps translate FAIR and Open Science principles into practice by integrating technical, methodological, and training dimensions. Its activities combine several coordinated components: FAIRification workflows and ontology-based metadata mediation to enhance semantic interoperability across infrastructures; the refactoring and exposure of services through a federated API gateway; and a Linguistic Linked Open Data (LLOD) pilot for the validation, transformation, and publication of interoperable RDF datasets. A national training ecosystem, comprising a training platform and a FAIR learning library, supports capacity building and the creation of FAIR-by-design learning materials. Finally, a permanent research observatory monitors community practices and needs, providing evidence-based insights for the continuous improvement of services and training. Together, these components demonstrate a coherent strategy for implementing FAIR and Open Science at the national level while ensuring alignment with major European and national initiatives in the SSH data ecosystem.