Abstract
Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of performing various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and challenging. This paper presents BLOOMZMMS, a novel model that integrates a multilingual LLM with a multilingual speech encoder, aiming to harness the capabilities of LLMs for speech recognition and beyond. Utilizing a multi-instructional training approach, we demonstrate the transferability of linguistic knowledge from the text to the speech modality. Our experiments, conducted on 1900 hours of transcribed data from 139 languages, establish that a multilingual speech representation can be effectively learned and aligned with a multilingual LLM. While this learned representation initially shows limitations in task generalization, we address this issue by generating synthetic targets in a multi-instructional style. Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.
- Anthology ID:
- 2024.findings-naacl.52
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2024
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 814–834
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-naacl.52/
- DOI:
- 10.18653/v1/2024.findings-naacl.52
- Cite (ACL):
- Pavel Denisov and Thang Vu. 2024. Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 814–834, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training (Denisov & Vu, Findings 2024)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-naacl.52.pdf