Roman Derunets

2025

Pisets: A Robust Speech Recognition System for Lectures and Interviews
Ivan Bondarenko | Daniil Grebenkin | Oleg Sedukhin | Mikhail Klementev | Roman Derunets | Lyudmila Budneva
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

This work presents “Pisets”, a speech-to-text system for scientists and journalists, based on a three-component architecture that aims to improve speech recognition accuracy while minimizing the errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. Curriculum learning methods and diverse Russian-language speech corpora significantly enhanced the system’s effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches yield more robust transcription of long audio data across various acoustic conditions than WhisperX and the standard Whisper model. The source code of the “Pisets” system is publicly available on GitHub: https://github.com/bond005/pisets.
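The three-stage flow described in this abstract can be pictured with a minimal sketch. It is not the Pisets implementation: the `Segment` type, the `transcribe` function, and the three callables are hypothetical placeholders, and the sketch assumes the primary recognizer yields candidate time segments that the filter then accepts or rejects before the final recognizer runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Segment:
    """A candidate stretch of audio, in seconds (hypothetical type)."""
    start: float
    end: float
    text: str = ""


def transcribe(
    audio: Sequence[float],
    find_segments: Callable[[Sequence[float]], List[Segment]],
    is_speech: Callable[[Sequence[float], Segment], bool],
    recognize: Callable[[Sequence[float], Segment], str],
) -> List[Segment]:
    """Chain three stages: propose segments, drop false positives, transcribe.

    In the abstract's terms the callables would wrap Wav2Vec2, AST, and
    Whisper respectively; here they are injected so the sketch stays
    independent of any model library.
    """
    # Stage 1: primary recognition proposes candidate segments (assumed role).
    candidates = find_segments(audio)
    # Stage 2: discard segments the filter judges to contain no speech.
    kept = [seg for seg in candidates if is_speech(audio, seg)]
    # Stage 3: run the final recognizer only on the surviving segments.
    return [Segment(seg.start, seg.end, recognize(audio, seg)) for seg in kept]
```

Passing the stages in as callables keeps the filtering step explicit: the final recognizer never sees a segment the filter rejected, which is the property the abstract credits with reducing hallucinations.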


The DataBench shared task in the SemEval-2025 competition aims to tackle the problem of QA from data in tables. Given the diversity of table structures, there are different approaches to retrieving the answer. Although Retrieval-Augmented Generation (RAG) is a viable solution, extracting relevant information from tables remains challenging. In addition, the table can be prohibitively large for direct integration into the LLM context. In this paper, we address QA over tabular data in three steps: first, we identify the relevant columns that might contain the answer; then, the LLM generates an answer given the context of those columns; and finally, the LLM refines its answer. This approach secured us 7th place in the DataBench lite category.
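The select, answer, refine sequence in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: `answer_question`, the prompt wording, the column-major `Table` type, and the `max_rows` truncation are all hypothetical, and `llm` stands for any function that maps a prompt string to a reply string.

```python
from typing import Callable, Dict, List

# Hypothetical column-major table: column name -> list of cell values.
Table = Dict[str, List[object]]


def answer_question(
    question: str,
    table: Table,
    llm: Callable[[str], str],
    max_rows: int = 5,
) -> str:
    """Answer a question over a table in three LLM calls."""
    columns = list(table)

    # Step 1: ask the model which columns might contain the answer.
    reply = llm(
        f"Question: {question}\n"
        f"Columns: {', '.join(columns)}\n"
        "List the relevant columns, comma-separated."
    )
    picked = {name.strip() for name in reply.split(",")}
    # Keep only names that really exist; fall back to all columns if none match.
    relevant = [c for c in columns if c in picked] or columns

    # Step 2: answer using only the selected columns as context.
    # Truncating to max_rows is an assumption made here to bound prompt size.
    context = "\n".join(f"{c}: {table[c][:max_rows]}" for c in relevant)
    draft = llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")

    # Step 3: let the model review and refine its own draft.
    return llm(
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        f"Context:\n{context}\n"
        "Return a corrected final answer."
    )
```

Restricting the context to the selected columns is what keeps a large table within the model's context window; the fallback to all columns only guards against a reply that names no valid column.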

Co-authors

Vasily Konovalov 1

Egor Kratkov 1

Alexander Panchenko 1

Maksim Savkin 1

Oleg Sedukhin 1

Margarita Trofimova 1

