iKnow-audio: Integrating Knowledge Graphs with Audio-Language Models
Michel Olvera, Changhong Wang, Paraskevas Stamatiadis, Gaël Richard, Slim Essid
Abstract
Contrastive Language–Audio Pretraining (CLAP) models learn by aligning audio and text in a shared embedding space, enabling powerful zero-shot recognition. However, their performance is highly sensitive to prompt formulation and language nuances, and they often inherit semantic ambiguities and spurious correlations from noisy pretraining data. While prior work has explored prompt engineering, adapters, and prefix tuning to address these limitations, the use of structured prior knowledge remains largely unexplored. We present iKnow-audio, a framework that integrates knowledge graphs with audio-language models to provide robust semantic grounding. iKnow-audio builds on the Audio-centric Knowledge Graph (AKG), which encodes ontological relations comprising semantic, causal, and taxonomic connections reflective of everyday sound scenes and events. By training knowledge graph embedding models on the AKG and refining CLAP predictions through this structured knowledge, iKnow-audio improves disambiguation of acoustically similar sounds and reduces reliance on prompt engineering. Comprehensive zero-shot evaluations across six benchmark datasets demonstrate consistent gains over baseline CLAP, supported by embedding-space analyses that highlight improved relational grounding. Resources are publicly available at https://github.com/michelolzam/iknow-audio
- Anthology ID:
- 2025.emnlp-main.1759
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 34671–34688
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1759/
- Cite (ACL):
- Michel Olvera, Changhong Wang, Paraskevas Stamatiadis, Gaël Richard, and Slim Essid. 2025. iKnow-audio: Integrating Knowledge Graphs with Audio-Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34671–34688, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- iKnow-audio: Integrating Knowledge Graphs with Audio-Language Models (Olvera et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1759.pdf