Word Error Rate Estimation for Speech Recognition: e-WER

Ahmed Ali, Steve Renals

[How to correct problems with metadata yourself]


Abstract
Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive. In this paper, we propose a novel approach to estimate WER, or e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: ASR recognised text, character recognition results to complement recognition output, and internal decoder features. We report results for the two features; black-box and glass-box using unseen 24 Arabic broadcast programs. Our system achieves 16.9% WER root mean squared error (RMSE) across 1,400 sentences. The estimated overall WER e-WER was 25.3% for the three hours test set, while the actual WER was 28.5%.
Anthology ID:
P18-2004
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
20–24
Language:
URL:
https://aclanthology.org/P18-2004
DOI:
10.18653/v1/P18-2004
Bibkey:
Cite (ACL):
Ahmed Ali and Steve Renals. 2018. Word Error Rate Estimation for Speech Recognition: e-WER. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 20–24, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Word Error Rate Estimation for Speech Recognition: e-WER (Ali & Renals, ACL 2018)
Copy Citation:
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/P18-2004.pdf
Code
 qcri/e-wer