Octopus: Towards Building the Arabic Speech LLM Suite
Sara Althubaiti, Vasista Sai Lodagala, Tjad Clark, Yousseif Ahmed Elshahawy, Daniel Izham, Abdullah Alrajeh, Aljawahrah Bin Tamran, Ahmed Ali
Abstract
We present Octopus, a first family of modular speech-language models designed for Arabic-English ASR, dialect identification, and speech translation. Built on Whisper-V3 and enhanced with large language models like ALLaM, LLaMA, and DeepSeek, Octopus bridges speech and text through a lightweight projection layer and Q-Former. To broaden its scope beyond speech, Octopus integrates BEATs, a general-purpose audio encoder allowing it to understand both linguistic and acoustic events. Despite its simplicity, this dual-encoder design supports robust performance across multilingual and code-switched scenarios. We also introduce TinyOctopus, a distilled variant using smaller models (Distil-Whisper + LLaMA3-1B / DeepSeek-1.5B), achieving competitive results with just a fraction of the parameters. Fine-tuning on synthetic code-switched data further boosts its performance. Octopus demonstrates the power of compact, extensible architectures in Arabic-centric speech modeling and sets the stage for unified multilingual audio-language understanding.- Anthology ID:
- 2025.arabicnlp-main.35
- Volume:
- Proceedings of The Third Arabic Natural Language Processing Conference
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
- Venue:
- ArabicNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 425–435
- Language:
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.35/
- DOI:
- Cite (ACL):
- Sara Althubaiti, Vasista Sai Lodagala, Tjad Clark, Yousseif Ahmed Elshahawy, Daniel Izham, Abdullah Alrajeh, Aljawahrah Bin Tamran, and Ahmed Ali. 2025. Octopus: Towards Building the Arabic Speech LLM Suite. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 425–435, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Octopus: Towards Building the Arabic Speech LLM Suite (Althubaiti et al., ArabicNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.35.pdf