Róbert Sabo


2026

The tendency of speakers to align or accommodate their verbal and non-verbal behaviour to their interlocutors is a fundamental mechanism of spoken interaction, strongly associated with successful communication and social bonding. Alignment is ubiquitous and has been documented across various modalities and linguistic levels (e.g., lexical, prosodic). Even so, the lack of comparable, multi-layered linguistic resources and of methodological agreement prevents a deeper understanding of its cognitive mechanisms. A multidimensional view of speech alignment might also enhance its application in areas such as language training or human-machine interaction. This paper addresses these gaps by presenting the development of a multilingual corpus of L1 Slovak and L2 English speech that extends a comparable corpus in L1 English. The corpus uses a modified version of the cooperative board game Forbidden Island to elicit semi-spontaneous, multi-party conversation. It also introduces a complementary pair game designed specifically to target and prime syntactic alignment. The resource includes psychological metadata (e.g., personality, anxiety, perceived dominance) and supports a reproducible methodology for investigating the relationship between entrainment patterns and individual characteristics. This corpus offers a non-Germanic language perspective and a framework for direct L1–L2 comparison at the prosodic, lexical, pragmatic and syntactic levels. It is therefore a rich resource for advancing the theoretical understanding, replication, and practical application of speech alignment.

2024

This paper presents the Slovak Autistic and Non-Autistic Child Speech Corpus. It consists of audio recordings and transcripts of collaborative, task-oriented conversations between children (with or without autism spectrum disorder, ASD) and a non-autistic adult experimenter. The conversations were elicited using the Maps task. The corpus was recorded primarily to investigate lexical alignment, but it can also be used to study other conversational coordination strategies and behaviours. Each participant's scores on various standardised psychometric tests are included, such as tests measuring IQ, executive functioning, and theory of mind. In total, the corpus contains over 15 hours of speech. It is a relatively large database of a non-Germanic language and can be shared with any qualified researcher. This makes it a valuable resource for replicating existing findings on communication and ASD, and for future research into communication between individuals with and without ASD.
We introduce the Alternating Reading Task (ART) Corpus, a collection of dyadic sentence readings for studying entrainment and imitation behaviour in speech communication. The ART corpus features three experimental conditions: solo reading, alternating reading, and deliberate imitation. It also comprises three subcorpora of French-, Italian-, and Slovak-accented English. This design allows systematic investigation of speech entrainment in a controlled, less spontaneous setting. Alongside detailed transcriptions, the corpus includes English proficiency scores, demographic data, and in-experiment questionnaires for probing linguistic, personal and interpersonal influences on entrainment. We describe its design, collection and annotation processes, an initial analysis, and prospects for future research.

2014

The presence of appropriate acoustic cues to affective features in synthesized speech can be a prerequisite for the recipient's proper interpretation of the message's semantic content. In recent work, the authors have focused on research into expressive speech synthesis capable of generating natural-sounding synthetic speech at various levels of arousal. The synthesizer should be able to produce Slovak speech in a range of styles. These run from extremely urgent warnings, insistent messages and alerts, through comments and neutral-style speech, to soothing messages and very calm speech. A three-step method was used to record both the high-activation and the low-activation expressive speech databases. The acoustic properties of the resulting databases are discussed. Several synthesizers with different levels of arousal were built from these databases, and their outputs are compared with the original voice of the voice talent. Possible ambiguity of the acoustic cues is pointed out. The paper also discusses the relevance of the sentences' semantic meaning in two sets: the sentence set used to record the speech databases and the set used for subjective testing of the synthesizers.