INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition
Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang Yoo
Abstract
Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained through self-supervised speech representation learning. However, these pre-trained models suffer from representational bias: they represent the prominent accents in the pre-training corpus (i.e., native (L1) English) better than under-represented accents, resulting in degraded performance on non-native (L2) English speech. Although several approaches have been proposed to mitigate this issue, all of them require updating the pre-trained model weights. In this paper, we propose Information Theoretic Adversarial Prompt Tuning (INTapt), which introduces prompts concatenated to the original input that re-modulate the attention of the pre-trained model so that the prompted input resembles native (L1) English speech, without updating the backbone weights. INTapt is trained with two simultaneous objectives: (1) adversarial training to reduce the accent-feature dependence between the original input and the prompt-concatenated input, and (2) CTC-loss minimization to improve ASR performance on the prompt-concatenated input. Experimental results show that INTapt improves ASR performance on L2 English and increases the feature similarity between L2 and L1 accents.
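To make the two-part objective concrete, below is a minimal PyTorch sketch of one training step. It is an illustration, not the authors' implementation: a tiny Transformer stands in for the large frozen backbone, a plain adversarial accent classifier stands in for the paper's information-theoretic dependence estimator, and `prompt_len`, the critic architecture, and the weighting `lam` are assumed values.

```python
# Illustrative sketch of INTapt-style prompt tuning (not the authors' code).
# A frozen "pre-trained" encoder + CTC head; only the prompt is ASR-trainable.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, vocab, n_accents, prompt_len = 256, 32, 6, 8  # assumed sizes

# Tiny stand-in for a large self-supervised speech backbone (frozen).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
ctc_head = nn.Linear(dim, vocab)
for p in list(encoder.parameters()) + list(ctc_head.parameters()):
    p.requires_grad = False  # backbone weights are never updated

# Prompt vectors concatenated in front of every input's feature frames.
prompt = nn.Parameter(0.02 * torch.randn(prompt_len, dim))

# Accent critic: a standard adversarial classifier used here as a simple
# proxy for the paper's information-theoretic objective (an assumption).
critic = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, n_accents))

opt_prompt = torch.optim.Adam([prompt], lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def prompted_forward(feats):
    """Prepend the prompt to each utterance's frames, run the frozen model."""
    batch_prompt = prompt.unsqueeze(0).expand(feats.size(0), -1, -1)
    hidden = encoder(torch.cat([batch_prompt, feats], dim=1))
    return hidden, ctc_head(hidden)

# Dummy L2-accented batch: (B, T, dim) features, transcripts, accent labels.
feats = torch.randn(4, 100, dim)
labels = torch.randint(1, vocab, (4, 20))        # 0 is reserved for blank
accents = torch.randint(0, n_accents, (4,))
in_lens = torch.full((4,), 100 + prompt_len, dtype=torch.long)
tgt_lens = torch.full((4,), 20, dtype=torch.long)

# (1) Critic step: learn to recognize the accent from the prompted features.
opt_critic.zero_grad()
hidden, _ = prompted_forward(feats)
critic_loss = F.cross_entropy(critic(hidden.mean(dim=1).detach()), accents)
critic_loss.backward()
opt_critic.step()

# (2) Prompt step: minimize CTC loss while fooling the accent critic, so the
# prompted input carries less accent information than the original one.
opt_prompt.zero_grad()
hidden, logits = prompted_forward(feats)
log_probs = F.log_softmax(logits, dim=-1).transpose(0, 1)  # (T, B, vocab) for CTC
ctc = F.ctc_loss(log_probs, labels, in_lens, tgt_lens, blank=0)
adv = -F.cross_entropy(critic(hidden.mean(dim=1)), accents)  # adversarial term
lam = 0.1  # illustrative trade-off between the two objectives
(ctc + lam * adv).backward()  # stray critic grads are cleared at the next critic step
opt_prompt.step()
```

Because the backbone and CTC head stay frozen, the prompt vectors are the only ASR-side parameters that receive gradients, which is what lets this scheme adapt to L2 accents without touching the pre-trained weights.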
- Anthology ID: 2023.findings-acl.627
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 9893–9902
- URL: https://aclanthology.org/2023.findings-acl.627
- DOI: 10.18653/v1/2023.findings-acl.627
- Cite (ACL): Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, and Chang Yoo. 2023. INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9893–9902, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition (Yoon et al., Findings 2023)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.findings-acl.627.pdf