Eleonora Zucchini
2026
Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
Martina Simonotti | Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Caterina Mauri
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This paper analyses the integration of Automatic Speech Recognition (ASR) into the transcription workflow of the KIParla corpus, a resource of spoken Italian. In a two-phase experiment, 11 expert and novice transcribers produced both manual and ASR-assisted transcriptions of identical audio segments across three types of conversation. These transcriptions were then analysed through a combination of statistical modeling, word-level alignment and a series of annotation-based metrics. Results show that ASR-assisted workflows can increase transcription speed but do not systematically improve accuracy or the quality of prosodic annotation. Improvements appear to depend on multiple factors, including workflow configuration, conversation type and annotator experience. These findings are therefore not yet generalizable, and they highlight the complex interplay between transcription expertise, data type and workflow design. Despite current limitations, ASR-assisted transcription, potentially supported by task-specific fine-tuning, could be integrated into the KIParla transcription workflow to accelerate corpus creation without compromising linguistic and annotation quality. More broadly, this work underscores the potential of semi-automatic transcription for corpus building, especially in complex settings involving multiple speakers and spontaneous, conversational data.
2025
Introducing KIParla Forest: seeds for a UD annotation of interactional syntax
Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Cristina Bosco | Caterina Mauri | Manuela Sanguinetti
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)
The present project aims to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus, an existing and well-known resource for spoken Italian. This article contextualizes the project, describes the treebank creation process and its design choices, and outlines plans for future improvements.