Improving End-to-End Bangla Speech Recognition with Semi-supervised Training

Nafis Sadeq, Nafis Tahmid Chowdhury, Farhan Tanvir Utshaw, Shafayat Ahmed, Muhammad Abdullah Adnan


Abstract
Automatic speech recognition systems usually require a large annotated speech corpus for training, and manually annotating such a corpus is very difficult. Unsupervised and semi-supervised learning methods can therefore be very helpful in addition to supervised learning. In this work, we focus on a semi-supervised training approach for Bangla Speech Recognition that can exploit large amounts of unpaired audio and text data. We encode speech and text data in an intermediate domain and propose a novel loss function, based on the global encoding distance between the encoded data, to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%.
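The abstract does not spell out the loss, so the following is only a minimal sketch of how a "global encoding distance" term on unpaired batches could look, assuming mean-pooled speech and text encoder outputs compared with a squared L2 (MSE) distance; the function and variable names are hypothetical and the paper's actual formulation may differ.

```python
# Hypothetical sketch (not the paper's exact formulation): a "global encoding
# distance" loss that pulls mean-pooled speech and text encodings toward a
# shared intermediate representation, so unpaired audio and text batches can
# still provide a training signal.
import torch
import torch.nn.functional as F

def global_encoding_distance_loss(speech_enc: torch.Tensor,
                                  text_enc: torch.Tensor) -> torch.Tensor:
    """speech_enc: (batch_s, T_s, d) encoder outputs for unpaired audio.
    text_enc:   (batch_t, T_t, d) encoder outputs for unpaired text.
    Each modality is mean-pooled over time and batch, and the squared L2
    distance between the resulting global vectors is returned."""
    speech_global = speech_enc.mean(dim=(0, 1))  # (d,)
    text_global = text_enc.mean(dim=(0, 1))      # (d,)
    return F.mse_loss(speech_global, text_global)

# Usage sketch: add to a supervised loss on paired data with a weighting factor.
# total_loss = supervised_loss + lambda_semi * global_encoding_distance_loss(s, t)
```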
Anthology ID:
2020.findings-emnlp.169
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1875–1883
URL:
https://aclanthology.org/2020.findings-emnlp.169
DOI:
10.18653/v1/2020.findings-emnlp.169
Cite (ACL):
Nafis Sadeq, Nafis Tahmid Chowdhury, Farhan Tanvir Utshaw, Shafayat Ahmed, and Muhammad Abdullah Adnan. 2020. Improving End-to-End Bangla Speech Recognition with Semi-supervised Training. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1875–1883, Online. Association for Computational Linguistics.
Cite (Informal):
Improving End-to-End Bangla Speech Recognition with Semi-supervised Training (Sadeq et al., Findings 2020)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2020.findings-emnlp.169.pdf