Ambika Kirkland


2026

Speech disfluencies have been shown to affect both judgments of a speaker's competence and decisions about which source of information to rely on. However, fluency effects more broadly are highly sensitive to context: they are strongest when little other information is available to inform judgments and decisions, and they can be attenuated or even reversed by metacognitive processes. Speech is generally experienced in the context of interactions, where listeners have access to a wealth of information about the speaker and other parameters relevant to decision-making. It is therefore crucial to consider how the outcomes of studies on speech disfluencies might be shaped by the framing of experimental tasks and the information available to participants. We carried out a decision-making task in which participants had to choose which of two speakers, one fluent and one disfluent, had answered a trivia question correctly. The task was presented in the context of three scenarios that provided different information about the speakers. Previous findings that listeners prefer fluent answers were replicated in only one of these three contexts, demonstrating the importance of task framing.

2022

As part of the PSST challenge, we explore how data augmentation, data sources, and model size affect phoneme transcription accuracy for speech produced by individuals with aphasia. We evaluate model performance in terms of feature error rate (FER) and phoneme error rate (PER). We find that data augmentation techniques, such as pitch shifting, improve model performance. Additionally, increasing model size decreases both FER and PER. Our experiments also show that adding manually transcribed speech from non-aphasic speakers (TIMIT) improves performance when room impulse response augmentation is used. The best-performing model combines aphasic and non-aphasic data and achieves a 21.0% PER and a 9.2% FER, a relative improvement of 9.8% over the baseline model on the primary outcome measure. We show that data augmentation, larger model size, and additional non-aphasic data sources can help improve automatic phoneme recognition models for people with aphasia.
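To make two of the abstract's moving parts concrete, here is a minimal sketch in Python, not the authors' actual pipeline: a pitch-shift augmentation step using librosa, and PER computed as the Levenshtein edit distance between phoneme sequences, normalized by the reference length. The file path, sample rate, and phoneme strings are illustrative placeholders.

# Sketch of two ideas from the abstract (assumptions, not the paper's code):
# pitch-shift data augmentation and phoneme error rate (PER).

import librosa


def pitch_shift_augment(wav_path, n_steps=2.0):
    """Load a waveform and shift its pitch by n_steps semitones."""
    y, sr = librosa.load(wav_path, sr=16000)  # sample rate is an assumption
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps), sr


def phoneme_error_rate(ref, hyp):
    """PER: edit distance between phoneme sequences / reference length."""
    # Standard dynamic-programming Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)


# Example with made-up phoneme sequences: one substitution (AH -> AA)
# and one deletion (Z) against a 5-phoneme reference gives PER = 2/5.
ref = ["HH", "AH", "L", "OW", "Z"]
hyp = ["HH", "AA", "L", "OW"]
print(f"PER = {phoneme_error_rate(ref, hyp):.2f}")  # PER = 0.40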