Malajyan Arthur

2024

pdf abs
Bi-dialectal ASR of Armenian from Naturalistic and Read Speech
Malajyan Arthur | Victoria Khurshudyan | Karen Avetisyan | Hossep Dolatian | Damien Nouvel
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

The paper explores the development of Automatic Speech Recognition (ASR) models for Armenian, by using data from two standard dialects (Eastern Armenian and Western Armenian). The goal is to develop a joint bi-variational model. We achieve state-of-the-art results. Results from our ASR experiments demonstrate the impact of dataset selection and data volume on model performance. The study reveals limited transferability between dialects, although integrating datasets from both dialects enhances overall performance. The paper underscores the importance of dataset diversity and volume in ASR model training for under-resourced languages like Armenian.

Co-authors

Victoria Khurshudyan 1
Karen Avetisyan 1
Hossep Dolatian 1
Damien Nouvel 1

Venues

sigul1
ws1