From Conversational Speech to Readable Text: Post-Processing Noisy Transcripts in a Low-Resource Setting

Arturs Znotins, Normunds Gruzitis, Roberts Dargis


Abstract
We present ongoing research on automatic post-processing approaches to enhance the readability of noisy speech transcripts in low-resource languages, with a focus on conversational speech in Latvian. We compare transformer-based sequence-labeling models and large language models (LLMs) on the standard punctuation and capitalization restoration task, while also considering automatic correction of mispronounced words and disfluencies, and partial inverse text normalization. Our results show that very small LLMs (approx. 2B parameters), fine-tuned on a modest text corpus, can achieve near state-of-the-art performance, rivaling LLMs that are orders of magnitude larger. Additionally, we demonstrate that a fine-tuned Whisper model, leveraging acoustic cues, outperforms text-only systems on challenging conversational data, even for a low-resource language. Error analysis reveals recurring pitfalls in sentence boundary determination and disfluency handling, emphasizing the importance of consistent annotation and domain adaptation for robust post-processing. Our findings highlight the feasibility of developing efficient post-processing solutions that significantly refine ASR output in low-resource settings, while opening new possibilities for editing and formatting speech transcripts beyond mere restoration of punctuation and capitalization.
Anthology ID:
2025.wnut-1.15
Volume:
Proceedings of the Tenth Workshop on Noisy and User-generated Text
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
JinYeong Bak, Rob van der Goot, Hyeju Jang, Weerayut Buaphet, Alan Ramponi, Wei Xu, Alan Ritter
Venues:
WNUT | WS
Publisher:
Association for Computational Linguistics
Pages:
143–148
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.wnut-1.15/
Cite (ACL):
Arturs Znotins, Normunds Gruzitis, and Roberts Dargis. 2025. From Conversational Speech to Readable Text: Post-Processing Noisy Transcripts in a Low-Resource Setting. In Proceedings of the Tenth Workshop on Noisy and User-generated Text, pages 143–148, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
From Conversational Speech to Readable Text: Post-Processing Noisy Transcripts in a Low-Resource Setting (Znotins et al., WNUT 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.wnut-1.15.pdf