Abstract
Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. While various algorithms exist to solve this problem, a theoretical framework is missing to study their properties and address such issues as sensitivity to hyperparameters and training instability. In this paper, we propose a general theoretical framework to study the properties of ASR-U systems based on random matrix theory and the theory of neural tangent kernels. Such a framework allows us to prove various learnability conditions and sample complexity bounds of ASR-U. Extensive ASR-U experiments on synthetic languages with three classes of transition graphs provide strong empirical evidence for our theory (code available at https://github.com/cactuswiththoughts/UnsupASRTheory.git).
- Anthology ID:
- 2023.acl-long.67
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1192–1215
- URL:
- https://aclanthology.org/2023.acl-long.67
- Cite (ACL):
- Liming Wang, Mark Hasegawa-Johnson, and Chang Yoo. 2023. A Theory of Unsupervised Speech Recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1192–1215, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- A Theory of Unsupervised Speech Recognition (Wang et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2023.acl-long.67.pdf