Falavigna Daniele


2026

The rise of large language models has boosted speech and language technologies; however, where transcripts of audio data are limited, the performance of current technology is not yet satisfactory. One common strategy to tackle data scarcity is to leverage pseudo-labels, for example by automatically transcribing data with a pre-trained ASR model. A critical issue with this approach is assessing the quality of the automatic transcriptions, which may be poor for low-resource languages. While several filtering approaches exist in the literature, they typically work when the pre-trained ASR model is reasonably accurate but may fail otherwise. In this work we propose a phonetic-based ranking that enables effective selection with controllable computational resources; the resulting subset of pseudo-labels serves as additional material for fine-tuning the source ASR models. Experiments on common benchmarks in three low-resource languages demonstrate the effectiveness of the proposed approach, yielding up to a 3-point reduction in WER.
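For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is conventionally computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring code used in the paper, which may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are obtained by splitting on whitespace, with no
    case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Assuming "point" refers to an absolute percentage point, a 3-point reduction would correspond, for example, to moving from a WER of 0.30 to 0.27 under this definition.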

2021

We present a system to support simultaneous interpreting in specific domains. The system will be developed through close collaboration between technical staff, mostly experts in both speech and text processing, and end users, i.e., professional interpreters, who define the requirements and will test the final product. Encouraging preliminary results have been achieved on benchmark tests collected to measure the performance of individual components of the system, namely automatic speech recognition (ASR) and named entity recognition.

2020

This paper describes “TLT-school”, a corpus of speech utterances collected in schools in northern Italy for assessing the performance of students learning both English and German. The corpus was recorded in 2017 and 2018 from students aged between nine and sixteen, attending primary, middle and high school. All utterances have been scored by human experts in terms of predefined proficiency indicators. In addition, most of the utterances recorded in 2017 have been carefully transcribed by hand. The guidelines and procedures used for the manual transcriptions are described in detail, together with the results achieved by an automatic speech recognition system that we developed. Part of the corpus is going to be freely distributed to the scientific community, in particular to researchers interested in both non-native speech recognition and automatic assessment of second language proficiency.