How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems

Archiki Prasad, Preethi Jyothi


Abstract
In this work, we present a detailed analysis of how accent information is reflected in the internal representation of speech in an end-to-end automatic speech recognition (ASR) system. We use a state-of-the-art end-to-end ASR system, comprising convolutional and recurrent layers, that is trained on a large amount of US-accented English speech and evaluate the model on speech samples from seven different English accents. We examine the effects of accent on the internal representation using three main probing techniques: a) Gradient-based explanation methods, b) Information-theoretic measures, and c) Outputs of accent and phone classifiers. We find different accents exhibiting similar trends irrespective of the probing technique used. We also find that most accent information is encoded within the first recurrent layer, which is suggestive of how one could adapt such an end-to-end model to learn representations that are invariant to accents.
Anthology ID:
2020.acl-main.345
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3739–3753
URL:
https://aclanthology.org/2020.acl-main.345
DOI:
10.18653/v1/2020.acl-main.345
Cite (ACL):
Archiki Prasad and Preethi Jyothi. 2020. How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3739–3753, Online. Association for Computational Linguistics.
Cite (Informal):
How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems (Prasad & Jyothi, ACL 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.acl-main.345.pdf
Video:
http://slideslive.com/38929438