SparQLe: Speech Queries to Text Translation Through LLMs

Amirbek Djanibekov, Hanan Aldarmaki


Abstract
With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding. This study introduces a novel approach that combines self-supervised speech representations with instruction-tuned LLMs for speech-to-text translation. The proposed approach leverages a modality adapter to align extracted speech features with instruction-tuned LLMs using English speech data. Our experiments demonstrate that this method preserves the semantic content of the input speech and serves as an effective bridge between self-supervised speech models and instruction-tuned LLMs, offering a promising direction for various speech understanding applications.
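The abstract does not specify the adapter's architecture, so the following is only a minimal illustrative sketch in PyTorch of the general idea: features from a frozen self-supervised speech encoder are downsampled and projected into the embedding space of an instruction-tuned LLM. The module names, dimensions, and downsampling strategy are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Hypothetical adapter: maps self-supervised speech features into an
    LLM's embedding space. A strided 1-D convolution shortens the frame
    sequence, then a small MLP projects each frame to the LLM hidden size."""

    def __init__(self, speech_dim=1024, llm_dim=4096, stride=4):
        super().__init__()
        # Downsample the ~50 Hz frame sequence to keep the LLM prompt short.
        self.downsample = nn.Conv1d(speech_dim, speech_dim,
                                    kernel_size=stride, stride=stride)
        self.proj = nn.Sequential(
            nn.Linear(speech_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech_feats):
        # speech_feats: (batch, frames, speech_dim) from a frozen encoder
        x = self.downsample(speech_feats.transpose(1, 2)).transpose(1, 2)
        return self.proj(x)  # (batch, frames // stride, llm_dim)

if __name__ == "__main__":
    adapter = ModalityAdapter()
    feats = torch.randn(2, 200, 1024)   # e.g. HuBERT-large-style features
    speech_embeds = adapter(feats)      # (2, 50, 4096)
    # These embeddings would be prepended to the embedded instruction tokens
    # before being passed to a (frozen) instruction-tuned LLM for translation.
    print(speech_embeds.shape)

In such a setup, only the adapter is typically trained while the speech encoder and the LLM remain frozen; whether SparQLe follows exactly this recipe is not stated in the abstract.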
Anthology ID:
2025.iwslt-1.6
Volume:
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues:
IWSLT | WS
Publisher:
Association for Computational Linguistics
Pages:
76–83
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.6/
Cite (ACL):
Amirbek Djanibekov and Hanan Aldarmaki. 2025. SparQLe: Speech Queries to Text Translation Through LLMs. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 76–83, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal):
SparQLe: Speech Queries to Text Translation Through LLMs (Djanibekov & Aldarmaki, IWSLT 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.6.pdf