Improving End-to-End Bangla Speech Recognition with Semi-supervised Training
Nafis Sadeq, Nafis Tahmid Chowdhury, Farhan Tanvir Utshaw, Shafayat Ahmed, Muhammad Abdullah Adnan
Abstract
Automatic speech recognition systems usually require a large annotated speech corpus for training. The manual annotation of a large corpus is very difficult. It can be very helpful to use unsupervised and semi-supervised learning methods in addition to supervised learning. In this work, we focus on using a semi-supervised training approach for Bangla Speech Recognition that can exploit large unpaired audio and text data. We encode speech and text data in an intermediate domain and propose a novel loss function based on the global encoding distance between encoded data to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%.
- Anthology ID:
- 2020.findings-emnlp.169
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1875–1883
- URL:
- https://aclanthology.org/2020.findings-emnlp.169
- DOI:
- 10.18653/v1/2020.findings-emnlp.169
- Cite (ACL):
- Nafis Sadeq, Nafis Tahmid Chowdhury, Farhan Tanvir Utshaw, Shafayat Ahmed, and Muhammad Abdullah Adnan. 2020. Improving End-to-End Bangla Speech Recognition with Semi-supervised Training. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1875–1883, Online. Association for Computational Linguistics.
- Cite (Informal):
- Improving End-to-End Bangla Speech Recognition with Semi-supervised Training (Sadeq et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2020.findings-emnlp.169.pdf