Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts

ChaeHun Park, Hojun Cho, Jaegul Choo


Abstract
Automatic speech recognition systems often fail on specialized vocabulary in tasks such as weather forecasting. To address this, we introduce an evaluation dataset of Korean weather queries. The dataset was recorded by diverse native speakers following pronunciation guidelines from domain experts and underwent rigorous verification. Benchmarking both open-source models and a commercial API reveals high error rates on meteorological terms. We also explore a lightweight text-to-speech-based data augmentation strategy, yielding substantial error reduction for domain-specific vocabulary and notable improvement in overall recognition accuracy. Our dataset is available at https://huggingface.co/datasets/ddehun/korean-weather-asr.
Anthology ID:
2025.findings-emnlp.561
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10619–10627
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.561/
DOI:
10.18653/v1/2025.findings-emnlp.561
Cite (ACL):
ChaeHun Park, Hojun Cho, and Jaegul Choo. 2025. Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10619–10627, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts (Park et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.561.pdf
Checklist:
 2025.findings-emnlp.561.checklist.pdf