Language Resources in the SSH Cloud: Bringing Language Technologies for Social Sciences and Humanities (in)to the European Open Science Cloud (2020)

Volumes

Proceedings of the Workshop about Language Resources for the SSH Cloud 10 papers

Proceedings of the Workshop about Language Resources for the SSH Cloud
Daan Broeder | Maria Eskevich | Monica Monachini

Store Scientific Workflows Data in SSHOC Repository
Cesare Concordia | Carlo Meghini | Filippo Benedetti

Today scientific workflows are used by scientists as a way to define automated, scalable, and portable in-silico experiments. Having a formal description of an experiment can improve its replicability and reproducibility. However, simply storing and publishing the workflow may not be enough: accurate management of the provenance data generated during the workflow life cycle is crucial to achieving reproducibility. This document presents the work being carried out by CNR-ISTI in task 5.2 of the SSHOC project to extend the repository service developed in that task with functionality to store, access and manage ‘workflow data’, in order to improve the replicability and reproducibility of e-science experiments.

The paper presents a journey that starts from various social sciences and humanities (SSH) Research Infrastructures in Europe and arrives at the comprehensive “ecosystem of infrastructures”, namely the European Open Science Cloud (EOSC). We highlight how the SSH Open Science infrastructures contribute to the goal of establishing the EOSC. First, we use the example of OPERAS, the European Research Infrastructure for Open Scholarly Communication in the SSH, to show how its services are conceived to be part of the EOSC and to address the communities’ needs. The next two sections highlight collaboration practices between partners in Europe to build the SSH component of the EOSC and an SSH discovery platform, as a service of OPERAS and the EOSC. The last two sections focus on an implementation network dedicated to SSH data FAIRification.

From the attic to the cloud: mobilization of endangered language resources with linked data
Sebastian Nordhoff

This paper describes a collection of 20k ELAN annotation files harvested from five different endangered language archives. The ELAN files form a very heterogeneous set, but the hierarchical configuration of their tiers allows, in conjunction with the tier content, the identification of transcriptions, translations, and glosses. These transcriptions, translations, and glosses are queryable across archives. Small analyses of graphemes (transcription tier), grammatical and lexical glosses (gloss tier), and semantic concepts (translation tier) show the viability of the approach. The use of identifiers from OLAC, Wikidata and Glottolog allows for a better integration of the data from these archives into the Linguistic Linked Open Data Cloud.
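
As a rough illustration of what "hierarchical configuration of tiers" means in practice, the sketch below reads the parent/child links between tiers from an ELAN (.eaf) document. It is not the method used in the paper: it only relies on the `TIER_ID` and `PARENT_REF` attributes of `TIER` elements in the EAF XML format, and the sample document and its tier names (`utterance`, `words`, `gloss`, `translation`) are invented for this example.

```python
import xml.etree.ElementTree as ET

# Invented, minimal EAF-like document; real files carry many more elements.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterance" LINGUISTIC_TYPE_REF="default-lt"/>
  <TIER TIER_ID="words" PARENT_REF="utterance" LINGUISTIC_TYPE_REF="word-lt"/>
  <TIER TIER_ID="gloss" PARENT_REF="words" LINGUISTIC_TYPE_REF="gloss-lt"/>
  <TIER TIER_ID="translation" PARENT_REF="utterance" LINGUISTIC_TYPE_REF="ft-lt"/>
</ANNOTATION_DOCUMENT>"""


def tier_parents(eaf_xml):
    """Map each tier ID to its parent tier ID (None for top-level tiers)."""
    root = ET.fromstring(eaf_xml)
    return {t.get("TIER_ID"): t.get("PARENT_REF") for t in root.iter("TIER")}


def tier_depth(parents, tier_id):
    """Count how many ancestors a tier has (0 for a top-level tier)."""
    depth = 0
    seen = {tier_id}
    while parents.get(tier_id) is not None:
        tier_id = parents[tier_id]
        if tier_id in seen:  # guard against malformed, cyclic references
            raise ValueError("cyclic PARENT_REF chain")
        seen.add(tier_id)
        depth += 1
    return depth


def children_of(parents, tier_id):
    """List the tiers that directly depend on the given tier, sorted by ID."""
    return sorted(child for child, parent in parents.items() if parent == tier_id)
```

A classifier along the lines described in the abstract would presumably combine this structural information (depth, number of children) with the tier content; that second step is not shown here.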

Verbal Aggression as an Indicator of Xenophobic Attitudes in Greek Twitter during and after the Financial Crisis
Maria Pontiki | Maria Gavriilidou | Dimitris Gkoumas | Stelios Piperidis

We present a replication of a data-driven and linguistically inspired Verbal Aggression analysis framework that was designed to examine Twitter verbal attacks against predefined target groups of interest as an indicator of xenophobic attitudes during the financial crisis in Greece, in particular during the period 2013-2016. The research goal of this paper is to re-examine Verbal Aggression as an indicator of xenophobic attitudes in Greek Twitter three years later, in order to trace possible changes in the main targets, the types and the content of the verbal attacks against the same targets in the post-crisis era, given also the ongoing refugee crisis and the political landscape in Greece as it was shaped after the elections in 2019. The results indicate an interesting rearrangement of the main targets of the verbal attacks, while the content and the types of the attacks provide valuable insights into the way these targets are being framed, compared to the dominant perceptions and stereotypes about them during the period 2013-2016.

Mining Wages in Nineteenth-Century Job Advertisements. The Application of Language Resources and Language Technology to study Economic and Social Inequality
Ruben Ros | Marieke van Erp | Auke Rijpma | Richard Zijdeman

For the analysis of historical wage development, no structured data is available. Job advertisements, as found in newspapers, can provide insights into what different types of jobs paid, but language technology is required to structure them in a format conducive to quantitative analysis. In this paper, we report on our experiments to mine wages from 19th century newspaper advertisements and detail the challenges that need to be overcome to perform a socio-economic analysis of textual data sources.
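
To make the idea of structuring wage mentions concrete, here is a minimal pattern-matching sketch. It is not the approach reported in the paper: the currency words, the pay-period vocabulary, and the sample advertisement are all invented for illustration, and a real pipeline would also have to deal with OCR noise, spelling variation, and amounts written out in words.

```python
import re

# Hypothetical pattern: a number, a currency word, and an optional pay period.
WAGE_PATTERN = re.compile(
    r"(?P<amount>\d+(?:[.,]\d+)?)\s*"
    r"(?P<currency>guilders|shillings|pounds)"
    r"(?:\s+(?:per|a)\s+(?P<period>day|week|month|year))?",
    re.IGNORECASE,
)

# Invented advertisement text, used only to exercise the pattern.
SAMPLE_AD = (
    "Wanted: a housemaid, wages 120 guilders per year. "
    "Also a coachman at 4,50 guilders a week."
)


def extract_wages(text):
    """Return one record per wage mention found in the text."""
    records = []
    for match in WAGE_PATTERN.finditer(text):
        period = match.group("period")
        records.append(
            {
                # Treat a comma as a decimal separator (an assumption;
                # thousands separators would be misread by this sketch).
                "amount": float(match.group("amount").replace(",", ".")),
                "currency": match.group("currency").lower(),
                "period": period.lower() if period else None,
            }
        )
    return records
```

Calling `extract_wages(SAMPLE_AD)` yields two records, one yearly and one weekly, which could then be normalised to a common period before any quantitative comparison.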

LR4SSHOC: The Future of Language Resources in the Context of the Social Sciences and Humanities Open Cloud
Daan Broeder | Maria Eskevich | Monica Monachini

This paper outlines the future of language resources and identifies their potential contribution for creating and sustaining the social sciences and humanities (SSH) component of the European Open Science Cloud (EOSC).

EOSC as a game-changer in the Social Sciences and Humanities research activities
Donatella Castelli

This paper aims to give some insights into how the European Open Science Cloud (EOSC) will be able to influence the Social Sciences and Humanities (SSH) sector, thus paving the way towards innovation. It provides points of discussion on how the language resources (LRs) and research infrastructures (RIs) community can contribute to the revolution in research practice.

Stretching Disciplinary Boundaries in Language Resource Development and Use: a Linguistic Data Consortium Position Paper
Christopher Cieri

Given the persistent gap between demand and supply, the impetus to reuse language resources is great. Researchers benefit from building upon the work of others, including reusing data, tools and methodology. Such reuse should always consider the original intent of the language resource and how that impacts potential reanalysis. When the reuse crosses disciplinary boundaries, the re-user also needs to consider how research standards that differ between social science and humanities on the one hand and human language technologies on the other might lead to differences in unspoken assumptions. Data centers that aim to support multiple research communities have a responsibility to build bridges across disciplinary divides by sharing data in all directions, encouraging reuse and re-sharing, and engaging directly in research that improves methodologies.

Crossing the SSH Bridge with Interview Data
Henk van den Heuvel

Spoken audio data, such as interview data, is a scientific instrument used by researchers in various disciplines crossing the boundaries of social sciences and humanities. In this paper, we take a closer look at a portal designed to perform speech-to-text conversion on audio recordings through Automatic Speech Recognition (ASR) in the CLARIN infrastructure. Within the cross-domain EU cluster project SSHOC, the potential value of such a linguistic toolkit for processing spoken language recordings has found uptake in a webinar on the topic and in a task addressing audio analysis of panel survey data. The objective of this contribution is to show that the processing of interviews as a research instrument has opened up a fascinating and fruitful area of collaboration between Social Sciences and Humanities (SSH).