An Effective Pronunciation Assessment Approach Leveraging Hierarchical Transformers and Pre-training Strategies
Bi-Cheng Yan, Jiun-Ting Li, Yi-Cheng Wang, Hsin Wei Wang, Tien-Hong Lo, Yung-Chang Hsu, Wei-Cheng Chao, Berlin Chen
Abstract
Automatic pronunciation assessment (APA) manages to quantify a second language (L2) learner’s pronunciation proficiency in a target language by providing fine-grained feedback with multiple pronunciation aspect scores at various linguistic levels. Most existing efforts on APA typically parallelize the modeling process, namely predicting multiple aspect scores across various linguistic levels simultaneously. This inevitably makes both the hierarchy of linguistic units and the relatedness among the pronunciation aspects sidelined. Recognizing such a limitation, we in this paper first introduce HierTFR, a hierarchal APA method that jointly models the intrinsic structures of an utterance while considering the relatedness among the pronunciation aspects. We also propose a correlation-aware regularizer to strengthen the connection between the estimated scores and the human annotations. Furthermore, novel pre-training strategies tailored for different linguistic levels are put forward so as to facilitate better model initialization. An extensive set of empirical experiments conducted on the speechocean762 benchmark dataset suggest the feasibility and effectiveness of our approach in relation to several competitive baselines.- Anthology ID:
- 2024.acl-long.95
- Volume:
- Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1737–1747
- Language:
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2024.acl-long.95/
- DOI:
- 10.18653/v1/2024.acl-long.95
- Cite (ACL):
- Bi-Cheng Yan, Jiun-Ting Li, Yi-Cheng Wang, Hsin Wei Wang, Tien-Hong Lo, Yung-Chang Hsu, Wei-Cheng Chao, and Berlin Chen. 2024. An Effective Pronunciation Assessment Approach Leveraging Hierarchical Transformers and Pre-training Strategies. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1737–1747, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- An Effective Pronunciation Assessment Approach Leveraging Hierarchical Transformers and Pre-training Strategies (Yan et al., ACL 2024)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2024.acl-long.95.pdf