Abstract
Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and non-verbal vocalizations. Modeling directly from speech opens the path to more natural and expressive systems. However, speech-only systems require up to three orders of magnitude more data to match the semantic abilities of their text-based counterparts. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and that language models trained on these units achieve lexical comprehension comparable to that of models trained on a hundred times more data.
- Anthology ID:
- 2024.emnlp-main.302
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5284–5292
- URL:
- https://aclanthology.org/2024.emnlp-main.302
- DOI:
- 10.18653/v1/2024.emnlp-main.302
- Cite (ACL):
- Maxime Poli, Emmanuel Chemla, and Emmanuel Dupoux. 2024. Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5284–5292, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach (Poli et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.302.pdf