Yanqiao Zhu
2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
Wenxi Chen | Ziyang Ma | Ruiqi Yan | Yuzhe Liang | Xiquan Li | Ruiyang Xu | Zhikang Niu | Yanqiao Zhu | Yifan Yang | Zhanxun Liu | Kai Yu | Yuxuan Hu | Jinyu Li | Yan Lu | Shujie Liu | Xie Chen
Findings of the Association for Computational Linguistics: ACL 2025
Recent advancements highlight the potential of end-to-end real-time spoken dialogue systems, showcasing their low latency and high quality. In this paper, we introduce SLAM-Omni, a timbre-controllable, end-to-end voice interaction system with single-stage training. SLAM-Omni achieves zero-shot timbre control by modeling spoken language with semantic tokens and decoupling speaker information to a vocoder. By predicting grouped speech semantic tokens at each step, our method significantly reduces the sequence length of audio tokens, accelerating both training and inference. Additionally, we propose historical text prompting to compress dialogue history, facilitating efficient multi-round interactions. Comprehensive evaluations reveal that SLAM-Omni outperforms prior models of similar scale, requiring only 15 hours of training on 4 GPUs with limited data. Notably, it is the first spoken dialogue system to achieve competitive performance with a single-stage training approach, eliminating the need for pre-training on TTS or ASR tasks. Further experiments validate its multilingual and multi-turn dialogue capabilities on larger datasets.
2024
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
Ran Xu | Wenqi Shi | Yue Yu | Yuchen Zhuang | Yanqiao Zhu | May Dongmei Wang | Joyce C. Ho | Chao Zhang | Carl Yang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the lack of sufficient publicly annotated biomedical data and computational resources. We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by instruction fine-tuning on a combination of labeled datasets and synthetic pairs. Experiments on 5 biomedical tasks across 11 datasets verify BMRetriever’s efficacy on various biomedical applications. BMRetriever also exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger, and the 2B variant matching the performance of models with over 5B parameters. The training data and model checkpoints are released at https://huggingface.co/BMRetriever to ensure transparency, reproducibility, and application to new domains.