BoYaEval: Evaluating Multimodal Large Language Models on Understanding Ancient Chinese Musical Scores
Jiajia Li, Weizhi Xue, Yao Yao, Qiwei Li, Chenchong, Zuchao Li, Ping Wang, Hai Zhao
Abstract
Multimodal Large Language Models (MLLMs) excel at general tasks but struggle with specialized, structured cultural symbols. We introduce BoYaEval, the first comprehensive benchmark dedicated to deciphering ancient Chinese musical notation, covering five distinct notation systems. These systems use unique spatial layouts and specialized ideograms to encode pitch and intricate playing techniques. BoYaEval comprises 3,175 high-quality images across these notation styles and establishes a three-tier evaluation: Structural Parsing (symbol recognition), Instructional Translation (technique mapping), and Musical Reasoning (melody derivation). We evaluate 21 leading MLLMs. Results indicate that while models perform adequately at basic recognition, they fail at cross-system compositional logic, scoring only around 27% on reasoning tasks. BoYaEval highlights the limitations of current MLLMs in processing diverse spatial-symbolic dependencies and helps bridge the gap between ancient wisdom and modern AI for digitizing intangible cultural heritage. The BoYaEval benchmark is publicly available at https://huggingface.co/datasets/MYTH-Lab/BoYaEval.
- Anthology ID:
- 2026.acl-long.997
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 21858–21873
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.997/
- Cite (ACL):
- Jiajia Li, Weizhi Xue, Yao Yao, Qiwei Li, Chenchong, Zuchao Li, Ping Wang, and Hai Zhao. 2026. BoYaEval: Evaluating Multimodal Large Language Models on Understanding Ancient Chinese Musical Scores. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21858–21873, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- BoYaEval: Evaluating Multimodal Large Language Models on Understanding Ancient Chinese Musical Scores (Li et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.997.pdf