Word Error Rate Estimation for Speech Recognition: e-WER

Ahmed Ali, Steve Renals


Abstract
Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: the ASR recognised text, character recognition results to complement the recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, on 24 unseen Arabic broadcast programs. Our system achieves a 16.9% WER root mean squared error (RMSE) across 1,400 sentences. The estimated overall WER (e-WER) was 25.3% on the three-hour test set, while the actual WER was 28.5%.
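The abstract rests on three standard quantities: WER, computed by aligning a recognised word sequence against a reference transcription; the RMSE between estimated and actual sentence-level WER; and an overall WER for the whole test set. The Python sketch below illustrates these generic measures only. It is not code from the qcri/e-wer repository, and it does not implement the e-WER estimator itself. The function names are hypothetical. Computing the overall figure by pooling word errors over total reference words is an assumption about how the corpus-level WER is aggregated.

```python
import math
from typing import Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # cost of aligning an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,              # deletion of reference word r
                cur[j - 1] + 1,           # insertion of hypothesis word h
                prev[j - 1] + (r != h),   # substitution (or match, cost 0)
            )
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Sentence WER = word errors / number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("WER is undefined for an empty reference")
    return word_errors(ref_words, hyp.split()) / len(ref_words)


def rmse(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between estimated and actual per-sentence WERs."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


def corpus_wer(errors: Sequence[int], ref_lengths: Sequence[int]) -> float:
    """Overall WER pooled over sentences: total errors / total reference words."""
    return sum(errors) / sum(ref_lengths)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. Pooling errors in `corpus_wer` weights long sentences more heavily than averaging per-sentence WERs would, which is why an overall WER is usually reported separately from sentence-level error statistics such as RMSE.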
Anthology ID:
P18-2004
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
20–24
URL:
https://aclanthology.org/P18-2004
DOI:
10.18653/v1/P18-2004
Cite (ACL):
Ahmed Ali and Steve Renals. 2018. Word Error Rate Estimation for Speech Recognition: e-WER. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 20–24, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Word Error Rate Estimation for Speech Recognition: e-WER (Ali & Renals, ACL 2018)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/P18-2004.pdf
Code:
qcri/e-wer