Bi-dialectal ASR of Armenian from Naturalistic and Read Speech
Malajyan Arthur, Victoria Khurshudyan, Karen Avetisyan, Hossep Dolatian, Damien Nouvel
Abstract
The paper explores the development of Automatic Speech Recognition (ASR) models for Armenian, by using data from two standard dialects (Eastern Armenian and Western Armenian). The goal is to develop a joint bi-variational model. We achieve state-of-the-art results. Results from our ASR experiments demonstrate the impact of dataset selection and data volume on model performance. The study reveals limited transferability between dialects, although integrating datasets from both dialects enhances overall performance. The paper underscores the importance of dataset diversity and volume in ASR model training for under-resourced languages like Armenian.
- Anthology ID:
- 2024.sigul-1.27
- Volume:
- Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Maite Melero, Sakriani Sakti, Claudia Soria
- Venues:
- SIGUL | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 227–236
- URL:
- https://aclanthology.org/2024.sigul-1.27
- Cite (ACL):
- Malajyan Arthur, Victoria Khurshudyan, Karen Avetisyan, Hossep Dolatian, and Damien Nouvel. 2024. Bi-dialectal ASR of Armenian from Naturalistic and Read Speech. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 227–236, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Bi-dialectal ASR of Armenian from Naturalistic and Read Speech (Arthur et al., SIGUL-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.sigul-1.27.pdf