myMediCon: End-to-End Burmese Automatic Speech Recognition for Medical Conversations
Hay Man Htun, Ye Kyaw Thu, Hutchatai Chanlekha, Kotaro Funakoshi, Thepchai Supnithi
Abstract
End-to-End Automatic Speech Recognition (ASR) models have significantly advanced the field of speech processing by streamlining traditionally complex ASR system pipelines, promising enhanced accuracy and efficiency. Despite these advancements, there is a notable absence of freely available medical conversation speech corpora for Burmese, which is one of the low-resource languages. Addressing this gap, we present a manually curated Burmese Medical Speech Conversations (myMediCon) corpus, encapsulating conversations among medical doctors, nurses, and patients. Utilizing the ESPnet speech processing toolkit, we explore End-to-End ASR models for the Burmese language, focus on Transformer and Recurrent Neural Network (RNN) architectures. Our corpus comprises 12 speakers, including three males and nine females, with a total speech duration of nearly 11 hours within the medical domain. To assess the ASR performance, we applied word and syllable segmentation to the text corpus. ASR models were evaluated using Character Error Rate (CER), Word Error Rate (WER), and Translation Error Rate (TER). The experimental results indicate that the RNN-based Burmese speech recognition with syllable-level segmentation achieved the best performance, yielding a CER of 9.7%. Moreover, the RNN approach significantly outperformed the Transformer model.- Anthology ID:
- 2024.lrec-main.1051
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- SIG:
- Publisher:
- ELRA and ICCL
- Note:
- Pages:
- 12032–12039
- Language:
- URL:
- https://aclanthology.org/2024.lrec-main.1051
- DOI:
- Cite (ACL):
- Hay Man Htun, Ye Kyaw Thu, Hutchatai Chanlekha, Kotaro Funakoshi, and Thepchai Supnithi. 2024. myMediCon: End-to-End Burmese Automatic Speech Recognition for Medical Conversations. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12032–12039, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- myMediCon: End-to-End Burmese Automatic Speech Recognition for Medical Conversations (Htun et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1051.pdf