A Semi-Automatic Workflow for Transcribing and Annotating Broadcast News
Christoph Draxler, Sven Grawunder, Jürgen Trouvain, Felicitas Kleber
Abstract
Audio data archived at radio broadcast stations are a rich resource for research, ranging from phonetic questions to training and test data for speech modelling. We present an efficient semi-automatic workflow for pre-processing, transcribing and analysing large linguistic-phonetic audio corpora. As a pilot study, we process broadcast news from a German public radio station, with recordings from 1956 to 2017. The workflow consists of basic pre-processing, automatic speech recognition, manual word correction, automatic generation of pairs of audio chunks and transcripts, and automatic word-, syllable- and phoneme-level segmentation of these chunks. The workflow is organised using the Octra Backend management tool; manual validation and correction of transcripts and chunking are performed in the Octra editor, and the BAS web services perform the segmentation. In an example analysis, we show how this radio corpus can be used for comparative longitudinal analyses of the structure of broadcast news, and for text- and signal-based studies of changes in speech and articulation rate.
- Anthology ID:
- 2026.lrec-main.508
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 6408–6417
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.508/
- Cite (ACL):
- Christoph Draxler, Sven Grawunder, Jürgen Trouvain, and Felicitas Kleber. 2026. A Semi-Automatic Workflow for Transcribing and Annotating Broadcast News. International Conference on Language Resources and Evaluation, main:6408–6417.
- Cite (Informal):
- A Semi-Automatic Workflow for Transcribing and Annotating Broadcast News (Draxler et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.508.pdf