Ingo Siegert


2022

Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference
Ingo Siegert | Mickael Rigault | Victoria Arranz

Pseudonymisation of Speech Data as an Alternative Approach to GDPR Compliance
Pawel Kamocki | Ingo Siegert
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference

Public Interactions with Voice Assistant – Discussion of Different One-Shot Solutions to Preserve Speaker Privacy
Ingo Siegert | Yamini Sinha | Gino Winkelmann | Oliver Jokisch | Andreas Wendemuth
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference

2020

“Alexa in the wild” – Collecting Unconstrained Conversations with a Modern Voice Assistant in a Public Environment
Ingo Siegert
Proceedings of the Twelfth Language Resources and Evaluation Conference

Datasets featuring modern voice assistants such as Alexa, Siri, Cortana, and others allow easy study of human-machine interactions, but data collections offering unconstrained, unscripted public interaction are quite rare. Many studies so far have focused on private usage, short pre-defined tasks, or specific domains. This contribution presents a dataset providing a large amount of unconstrained public interactions with a voice assistant. To date, around 40 hours of device-directed utterances have been collected during a science exhibition touring through Germany. The data recording was part of an exhibit that engaged visitors to interact with a commercial voice assistant system (Amazon’s ALEXA) without restricting them to a specific topic. A specifically developed quiz served as the starting point of the conversation, with the voice assistant presented to the visitors as a possible joker for the quiz. However, visitors were not forced to solve the quiz with the help of the voice assistant, and thus many of them had an open conversation. The provided dataset – Voice Assistant Conversations in the wild (VACW) – includes the transcripts of both visitor requests and Alexa answers, identified topics and sessions, as well as acoustic characteristics automatically extractable from the visitors’ audio files.

2019

Cross-Corpus Data Augmentation for Acoustic Addressee Detection
Oleg Akhtiamov | Ingo Siegert | Alexey Karpov | Wolfgang Minker
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Acoustic addressee detection (AD) is a modern paralinguistic and dialogue challenge that arises especially in voice assistants. In the present study, we distinguish addressees in two settings (a conversation between several people and a spoken dialogue system, and a conversation between several adults and a child) and introduce the first competitive baseline (unweighted average recall of 0.891) for the Voice Assistant Conversation Corpus, which models the first setting. We jointly solve both classification problems using three models: a linear support vector machine dealing with acoustic functionals and two neural networks utilising raw waveforms alongside acoustic low-level descriptors. We investigate how different corpora influence each other, applying the mixup approach to data augmentation. We also study the influence of various acoustic context lengths on AD; two-second speech fragments turn out to be sufficient for reliable AD. Mixup is shown to be beneficial for merging acoustic data (extracted features, but not raw waveforms) from different domains, which allows us to reach higher classification performance on human-machine AD, and also for training a multipurpose neural network capable of solving both human-machine and adult-child AD problems.

2012

Towards Emotion and Affect Detection in the Multimodal LAST MINUTE Corpus
Jörg Frommer | Bernd Michaelis | Dietmar Rösner | Andreas Wendemuth | Rafael Friesen | Matthias Haase | Manuela Kunze | Rico Andrich | Julia Lange | Axel Panning | Ingo Siegert
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The LAST MINUTE corpus comprises multimodal recordings (e.g. video, audio, transcripts) from WOZ interactions in a mundane planning task (Rösner et al., 2011). It is one of the largest corpora with naturalistic data currently available. In this paper we report first results from attempts to automatically and manually analyze the different modes with respect to the emotions and affects exhibited by the subjects. We describe and discuss the difficulties encountered due to the strong contrast between the naturalistic recordings and traditional databases with acted emotions.

2010

Developing an Expressive Speech Labeling Tool Incorporating the Temporal Characteristics of Emotion
Stefan Scherer | Ingo Siegert | Lutz Bigalke | Sascha Meudt
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A lot of research effort has been spent on the development of emotion theories and models; however, their suitability and applicability to expressions in human-computer interaction have not been exhaustively evaluated, and investigations of annotators’ ability to map certain expressions onto the developed emotion models lack proof. The proposed annotation tool, which incorporates the standard Geneva Emotion Wheel developed by Klaus Scherer and a novel temporal characteristic description feature, aims to enable the annotator to label expressions recorded in human-computer interaction scenarios on an utterance level. Further, it respects key features of realistic and natural emotional expressions, such as their sequentiality, temporal characteristics, mixed occurrences, and expressivity or clarity of perception. Additionally, first steps towards evaluating the proposed tool, by analyzing utterance annotations taken from two expressive speech corpora, are undertaken, and future goals, including the open-source accessibility of the tool, are given.