A Theory of Unsupervised Speech Recognition

Liming Wang; Mark Hasegawa-Johnson; Chang Yoo

Abstract

Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. Various algorithms exist to solve this problem, but there is no theoretical framework for studying their properties or for addressing issues such as sensitivity to hyperparameters and training instability. In this paper, we propose a general theoretical framework, based on random matrix theory and the theory of neural tangent kernels, for studying the properties of ASR-U systems. This framework allows us to prove various learnability conditions and sample complexity bounds for ASR-U. Extensive ASR-U experiments on synthetic languages with three classes of transition graphs provide strong empirical evidence for our theory (code available at https://github.com/cactuswiththoughts/UnsupASRTheory.git).
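The abstract mentions synthetic languages defined by transition graphs. As a rough illustration only, and not the authors' generator, the sketch below produces a toy text corpus by taking uniform random walks on a directed graph. The circulant graph, the helper names `circulant_graph` and `sample_sentence`, and all parameter values are hypothetical choices made for this example. The actual experimental setup is in the linked repository.

```python
import random

# Hypothetical helper: directed circulant graph on n nodes,
# with an edge i -> (i + k) mod n for each offset k.
def circulant_graph(n, offsets):
    return {i: [(i + k) % n for k in offsets] for i in range(n)}

# Hypothetical helper: a "sentence" is a random walk of `length` tokens.
# The start node is uniform, and each next token is a uniformly chosen successor.
def sample_sentence(graph, length, rng):
    state = rng.randrange(len(graph))
    tokens = [state]
    for _ in range(length - 1):
        state = rng.choice(graph[state])
        tokens.append(state)
    return tokens

rng = random.Random(0)  # fixed seed so the toy corpus is reproducible
graph = circulant_graph(n=10, offsets=[1, 2])
corpus = [sample_sentence(graph, length=8, rng=rng) for _ in range(3)]
for sentence in corpus:
    print(sentence)
```

Each consecutive token pair in a sampled sentence is an edge of the graph, so the graph's structure alone determines which token sequences the toy language allows.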

Anthology ID: 2023.acl-long.67
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1192–1215
URL: https://aclanthology.org/2023.acl-long.67
Cite (ACL): Liming Wang, Mark Hasegawa-Johnson, and Chang Yoo. 2023. A Theory of Unsupervised Speech Recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1192–1215, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): A Theory of Unsupervised Speech Recognition (Wang et al., ACL 2023)
PDF: https://preview.aclanthology.org/nodalida-main-page/2023.acl-long.67.pdf
