2024
pdf
abs
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Zihao Deng
|
Yinghao Ma
|
Yudong Liu
|
Rongchen Guo
|
Ge Zhang
|
Wenhu Chen
|
Wenhao Huang
|
Emmanouil Benetos
Findings of the Association for Computational Linguistics: NAACL 2024
Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT (CITATION) with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps datasets, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
pdf
abs
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Ruibin Yuan
|
Hanfeng Lin
|
Yi Wang
|
Zeyue Tian
|
Shangda Wu
|
Tianhao Shen
|
Ge Zhang
|
Yuhang Wu
|
Cong Liu
|
Ziya Zhou
|
Liumeng Xue
|
Ziyang Ma
|
Qin Liu
|
Tianyu Zheng
|
Yizhi Li
|
Yinghao Ma
|
Yiming Liang
|
Xiaowei Chi
|
Ruibo Liu
|
Zili Wang
|
Chenghua Lin
|
Qifeng Liu
|
Tao Jiang
|
Wenhao Huang
|
Wenhu Chen
|
Jie Fu
|
Emmanouil Benetos
|
Gus Xia
|
Roger Dannenberg
|
Wei Xue
|
Shiyin Kang
|
Yike Guo
Findings of the Association for Computational Linguistics ACL 2024
While LLMs demonstrate impressive capabilities in musical knowledge, we find that music reasoning is still an unsolved task.We introduce ChatMusician, an open-source large language model (LLM) that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language.ChatMusician can understand and generate music with a pure text tokenizer without external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score.ChatMusician is capable of composing well-structured, full-length music, condition on texts, chords, melodies, motifs, musical forms, etc.On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 by a noticeable margin. We show that ChatMusician preserves or even surpasses the original LLaMA2 7B’s language abilities by evaluating on MMLU benchmark.Our work reveals that LLMs can be an excellent compressor for music, which can be seen as humanity’s creative language, but there remains significant territory to be conquered.We release our 5B token music-language corpora MusicPiles, the collected MusicTheoryBench, code, model and demo.
pdf
abs
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
Yizhi Li
|
Ge Zhang
|
Xingwei Qu
|
Jiali Li
|
Zhaoqun Li
|
Noah Wang
|
Hao Li
|
Ruibin Yuan
|
Yinghao Ma
|
Kai Zhang
|
Wangchunshu Zhou
|
Yiming Liang
|
Lei Zhang
|
Lei Ma
|
Jiajun Zhang
|
Zuowen Li
|
Wenhao Huang
|
Chenghua Lin
|
Jie Fu
Findings of the Association for Computational Linguistics ACL 2024
The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.Yet, their effectiveness often diminishes in low-resource languages like Chinese, exacerbated by biased evaluations from data leakage, casting doubt on their true generalizability to new linguistic territories. In response, we introduce the Chinese Instruction-Following Benchmark (**CIF-Bench**), designed to evaluate the zero-shot generalizability of LLMs to the Chinese language. CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances across 20 categories. To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance, totaling 45,000 data instances.Our evaluation of 28 selected LLMs reveals a noticeable performance gap, with the best model scoring only 52.9%, highlighting the limitations of LLMs in less familiar language and task contexts.This work not only uncovers the current limitations of LLMs in handling Chinese language tasks but also sets a new standard for future LLM generalizability research, pushing towards the development of more adaptable, culturally informed, and linguistically diverse models.