Andreas Liesenfeld
2023
The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems
Andreas Liesenfeld | Alianda Lopez | Mark Dingemanse
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This impacts especially the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies.
2022
From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology
Mark Dingemanse | Andreas Liesenfeld
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future.
Evaluation of Automatic Speech Recognition for Conversational Speech in Dutch, English and German: What Goes Missing?
Alianda Lopez | Andreas Liesenfeld | Mark Dingemanse
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
Building and curating conversational corpora for diversity-aware language science and technology
Andreas Liesenfeld | Mark Dingemanse
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.
2021
Animosity and suffering: Metaphors of BITTERNESS in English and Chinese
Gabor Parti | Andreas Liesenfeld | Chu-Ren Huang
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation
Scikit-talk: A toolkit for processing real-world conversational speech data
Andreas Liesenfeld | Gabor Parti | Chu-Ren Huang
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
We present Scikit-talk, an open-source toolkit for processing collections of real-world conversational speech in Python. First of its kind, the toolkit equips those interested in studying or modeling conversations with an easy-to-use interface to build and explore large collections of transcriptions and annotations of talk-in-interaction. Designed for applications in speech processing and Conversational AI, Scikit-talk provides tools to custom-build datasets for tasks such as intent prototyping, dialog flow testing, and conversation design. Its preprocessor module comes with several pre-built interfaces for common transcription formats, which aim to make working across multiple data sources more accessible. The explorer module provides a collection of tools to explore and analyse this data type via string matching and unsupervised machine learning techniques. Scikit-talk serves as a platform to collect and connect different transcription formats and representations of talk, enabling the user to quickly build multilingual datasets of varying detail and granularity. Thus, the toolkit aims to make working with authentic conversational speech data in Python more accessible and to provide the user with comprehensive options to work with representations of talk in appropriate detail for any downstream task. For the latest updates and information on currently supported languages and language resources, please refer to: https://pypi.org/project/scikit-talk/
2020
Predicting gender and age categories in English conversations using lexical, non-lexical, and turn-taking features
Andreas Liesenfeld | Gábor Parti | Yuyin Hsu | Chu-Ren Huang
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
2018
MYCanCor: A Video Corpus of spoken Malaysian Cantonese
Andreas Liesenfeld
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)