Junyan Jiang
2025
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
Shangda Wu | Guo Zhancheng | Ruibin Yuan | Junyan Jiang | SeungHeon Doh | Gus Xia | Juhan Nam | Xiaobing Li | Feng Yu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025
CLaMP 3 is a unified framework developed to address challenges of cross-modal and cross-lingual generalization in music information retrieval. Using contrastive learning, it aligns all major music modalities (sheet music, performance signals, and audio recordings) with multilingual text in a shared representation space, enabling retrieval across unaligned modalities with text as a bridge. It features a multilingual text encoder adaptable to unseen languages, exhibiting strong cross-lingual generalization. Leveraging retrieval-augmented generation, we curated M4-RAG, a web-scale dataset consisting of 2.31 million music-text pairs. This dataset is enriched with detailed metadata that represents a wide array of global musical traditions. To advance future research, we release WikiMT-X, a benchmark comprising 1,000 triplets of sheet music, audio, and richly varied text descriptions. Experiments show that CLaMP 3 achieves state-of-the-art performance on multiple MIR tasks, significantly surpassing previous strong baselines and demonstrating excellent generalization in multimodal and multilingual music contexts.
2020
Discovering Music Relations with Sequential Attention
Junyan Jiang | Gus Xia | Taylor Berg-Kirkpatrick
Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA)
Co-authors
- Gus Xia (2)
- Taylor Berg-Kirkpatrick (1)
- Seungheon Doh (1)
- Xiaobing Li (1)
- Juhan Nam (1)