Silvia Ballarè


2026

This paper analyses the integration of Automatic Speech Recognition (ASR) into the transcription workflow of the KIParla corpus, a resource of spoken Italian. In a two-phase experiment, 11 expert and novice transcribers produced both manual and ASR-assisted transcriptions of identical audio segments across three different types of conversation; the transcriptions were subsequently analyzed through a combination of statistical modeling, word-level alignment and a series of annotation-based metrics. Results show that ASR-assisted workflows can increase transcription speed but do not systematically improve accuracy or prosodic annotation quality. Improvements appear to depend on multiple factors, including workflow configuration, conversation type and annotator experience. These findings are therefore not yet generalizable and highlight the complex interplay between transcription expertise, data type and workflow design. Despite current limitations, ASR-assisted transcription, potentially when supported by task-specific fine-tuning, could be integrated into the KIParla transcription workflow to accelerate corpus creation without compromising linguistic and annotation quality. More broadly, this work underscores the potential of semi-automatic transcription for corpus building, especially in complex settings involving multiple speakers and spontaneous, conversational data.

2025

The present project endeavors to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus, an existing and well-known resource for spoken Italian. This article contextualizes the project, describes the treebank creation process and design choices, and outlines plans for future improvements.

2019