SeungHeon Doh
Also published as: Seungheon Doh
2026
Proceedings of the 4th Workshop on NLP for Music and Audio (NLP4MusA 2026)
Elena V. Epure | Sergio Oramas | SeungHeon Doh | Pedro Ramoneda | Anna Kruspe | Mohamed Sordo
Proceedings of the 4th Workshop on NLP for Music and Audio (NLP4MusA 2026)
ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering
Daeyong Kwon | SeungHeon Doh | Juhan Nam
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Recent advances in Large Language Models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy: open-source models gain up to +56.8 percentage points (pp; Qwen3 8B: 35.0→91.8), approaching proprietary performance. RAG-style fine-tuning further boosts both factual recall and contextual reasoning, yielding strong improvements on both in-domain and out-of-domain benchmarks. MusWikiDB also yields +6 pp higher accuracy and 67% faster retrieval than the general Wikipedia corpus. We release MusWikiDB and ArtistMus to advance research in music information retrieval and domain-specific QA, establishing a foundation for retrieval-augmented reasoning in culturally rich domains such as music.
2025
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
Shangda Wu | Guo Zhancheng | Ruibin Yuan | Junyan Jiang | SeungHeon Doh | Gus Xia | Juhan Nam | Xiaobing Li | Feng Yu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025
CLaMP 3 is a unified framework developed to address challenges of cross-modal and cross-lingual generalization in music information retrieval. Using contrastive learning, it aligns all major music modalities, including sheet music, performance signals, and audio recordings, with multilingual text in a shared representation space, enabling retrieval across unaligned modalities with text as a bridge. It features a multilingual text encoder adaptable to unseen languages, exhibiting strong cross-lingual generalization. Leveraging retrieval-augmented generation, we curated M4-RAG, a web-scale dataset consisting of 2.31 million music-text pairs. This dataset is enriched with detailed metadata that represents a wide array of global musical traditions. To advance future research, we release WikiMT-X, a benchmark comprising 1,000 triplets of sheet music, audio, and richly varied text descriptions. Experiments show that CLaMP 3 achieves state-of-the-art performance on multiple MIR tasks, significantly surpassing previous strong baselines and demonstrating excellent generalization in multimodal and multilingual music contexts.
2024
PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text
Hayeon Bang | Eunjin Choi | Megan Finch | Seungheon Doh | Seolhee Lee | Gyeong-Hoon Lee | Juhan Nam
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
While piano music has become a significant area of study in Music Information Retrieval (MIR), there is a notable lack of datasets for piano solo music with text labels. To address this gap, we present PIAST (PIano dataset with Audio, Symbolic, and Text), a piano music dataset. Utilizing a piano-specific taxonomy of semantic tags, we collected 9,673 tracks from YouTube and added human annotations for 2,023 tracks by music experts, resulting in two subsets: PIAST-YT and PIAST-AT. Both include audio, text, tag annotations, and MIDI transcribed using state-of-the-art piano transcription and beat tracking models. Among many possible tasks with the multimodal dataset, we conduct music tagging and retrieval using both audio and MIDI data and report baseline performances to demonstrate its potential as a valuable resource for MIR research.
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
Anna Kruspe | Sergio Oramas | Elena V. Epure | Mohamed Sordo | Benno Weck | SeungHeon Doh | Minz Won | Ilaria Manco | Gabriel Meseguer-Brocal
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)