Claus Zinn

2022

Increasing CMDI’s Semantic Interoperability with schema.org
Nino Meisinger | Thorsten Trippel | Claus Zinn
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The CLARIN Concept Registry (CCR) is the common semantic ground for most CMDI-based profiles that describe language-related resources in the CLARIN universe. While the CCR supports semantic interoperability within this universe, it does not extend beyond it. The flexibility of CMDI, however, allows users to draw on other term or concept registries when defining their metadata components. In this paper, we describe our use of schema.org, a lightweight ontology used by many parties across disciplines.

Adapting GermaNet for the Semantic Web
Claus Zinn | Marie Hinrichs | Erhard Hinrichs
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

2018

Squib: The Language Resource Switchboard
Claus Zinn
Computational Linguistics, Volume 44, Issue 4 - December 2018

The CLARIN research infrastructure gives users access to an increasingly rich and diverse set of language-related resources and tools. Whereas there is ample support for searching resources using metadata-based or full-text search, and for aggregating resources into virtual collections, there is little support to help users process resources in one way or another. Despite the large number of tools that process texts in many different languages, there is no single point of access where users can find tools that fit their needs and the resources they have. In this squib, we present the Language Resource Switchboard (LRS), which helps users discover tools that can process their resources. To this end, the LRS identifies all applicable tools for a given resource, lists the tasks the tools can achieve, and invokes the selected tool in such a way that processing can start immediately with little or no prior tool parameterization.

Lessons Learned: On the Challenges of Migrating a Research Data Repository from a Research Institution to a University Library
Thorsten Trippel | Claus Zinn
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine
Claus Zinn | Wei Qui | Marie Hinrichs | Emanuel Dima | Alexandr Chernov
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Crosswalking from CMDI to Dublin Core and MARC 21
Claus Zinn | Thorsten Trippel | Steve Kaminski | Emanuel Dima
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.

2012

A Metadata Editor to Support the Description of Linguistic Resources
Emanuel Dima | Christina Hoppermann | Erhard Hinrichs | Thorsten Trippel | Claus Zinn
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Creating and maintaining metadata for various kinds of resources requires appropriate tools to assist the user. The paper presents the metadata editor ProFormA for the creation and editing of CMDI (Component Metadata Infrastructure) metadata in web forms. This editor supports a number of CMDI profiles currently being provided for different types of resources. Since the editor is based on XForms and server-side processing, users can create and modify CMDI files in their standard browser without the need for further processing. Large parts of ProFormA are implemented as web services in order to reuse them in other contexts and programs.

This paper presents the system architecture as well as the underlying workflow of the Extensible Repository System of Digital Objects (ERDO), which has been developed for the sustainable archiving of language resources within the Tübingen CLARIN-D project. In contrast to other approaches, which target archiving experts, the described workflow can be used by researchers without expertise in long-term storage to transfer data from their local file systems into a persistent repository.

2010

An Evolving eScience Environment for Research Data in Linguistics
Claus Zinn | Peter Wittenburg | Jacquelijn Ringersma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The amount of research data in the Humanities is increasing rapidly. Metadata helps describe this data and make it accessible to interested researchers within and across institutions. While metadata interoperability is an issue that is being recognised and addressed, the systematic and user-driven provision of annotations and the linking together of resources into new organisational layers have received much less attention. This paper gives an overview of our evolving technological eScience environment to support such functionality. It describes two tools, ADDIT and ViCoS, which enable researchers, rather than archive managers, to organise and reorganise research data to fit their particular needs. The two tools, which are embedded into our institute's existing software landscape, are an initial step towards an eScience environment that gives our scientists easy access to (multimodal) research data of interest to them, and empowers them to structure, enrich, link together, and share such data as they wish.

We describe our computer-supported framework to overcome the rule of metadata schism. It combines the use of controlled vocabularies, managed by a data category registry, with a component-based approach, where the categories can be combined to yield complex metadata structures. A metadata scheme devised in this way will thus be grounded in its use of categories. Schema designers will profit from existing prefabricated larger building blocks, motivating re-use at a larger scale. The common base of any two metadata schemes within this framework will solve, at least to a good extent, the semantic interoperability problem, and consequently further promote the systematic use of metadata so that existing resources and tools can be shared.

Virtual Language Observatory: The Portal to the Language Resources and Technology Universe
Dieter Van Uytvanck | Claus Zinn | Daan Broeder | Peter Wittenburg | Mariano Gardellini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Over the years, the field of Language Resources and Technology (LRT) has developed a tremendous amount of resources and tools. However, there is no ready-to-use map that researchers could use to gain a good overview and steadfast orientation when searching for, say, corpora or software tools to support their studies. Rather, information is scattered across project- or organisation-specific sites, which makes it hard, if not impossible, for less-experienced researchers to gather all relevant material. Clearly, the provision of metadata is central to resource and software exploration. However, in the LRT field, metadata comes in many forms, tastes and qualities, and therefore substantial harmonization and curation efforts are required to provide researchers with metadata-based guidance. To address this issue, a broad alliance of LRT providers (CLARIN, the Linguist List, DOBES, DELAMAN, DFKI, ELRA) has initiated the Virtual Language Observatory portal to provide a low-barrier, easy-to-follow entry point to language resources and tools; it can be accessed via http://www.clarin.eu/vlo

2008

Exploring and Enriching a Language Resource Archive via the Web
Marc Kemps-Snijders | Alex Klassmann | Claus Zinn | Peter Berck | Albert Russel | Peter Wittenburg
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The "download first, then process" paradigm is still the predominant working method in the research community. The web-based paradigm, however, offers many advantages from a tool development and data management perspective, as it allows quick adaptation to changing research environments. Moreover, new ways of combining tools and data are increasingly becoming available and will eventually enable a true web-based workflow approach, thus challenging the "download first, then process" paradigm. The necessary infrastructure for managing, exploring and enriching language resources via the Web will need to be delivered by projects like CLARIN and DARIAH.

Ensuring Semantic Interoperability on Lexical Resources
Marc Kemps-Snijders | Claus Zinn | Jacquelijn Ringersma | Menzo Windhouwer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe a unifying approach to tackle data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Markup Framework (LMF) to uniformly describe and manage lexica of different structures. LEXUS also makes use of a central Data Category Registry (DCR) to address terminological issues with regard to linguistic concepts as well as the handling of working and object languages. Finally, we report on ViCoS, a LEXUS extension that supports the definition of arbitrary semantic relations between lexical entries or parts thereof.

2003

The Role of Initiative in Tutorial Dialogue
Mark G. Core | Johanna D. Moore | Claus Zinn
10th Conference of the European Chapter of the Association for Computational Linguistics