New Compendium of a Myriad of Plants: A New Dataset Describing Ancient Chinese Plants
Xiaobin Shen, Zhongqing Wang, Shichen Li, Chu-Ren Huang, Guodong Zhou
Abstract
In ancient China, a variety of datasets depicted humanistic scenes, geographical features, and plants. However, these datasets, compiled long ago, often contain errors, lack comprehensiveness, and are inconsistent with modern realities. To meet current demands, we aim to expand and improve ancient datasets using large language models. Focusing on the Great Compendium of Myriad Flowers, an invaluable ancient plant dataset, we gather information on numerous previously excluded plants, carefully select and organize classical Chinese poetry and prose, and construct a comprehensive botanical encyclopedia knowledge system. Additionally, we collect ancient paintings and modern photographs of plants to enrich the dataset. Furthermore, we propose a novel multi-modal plant classification model designed to integrate multi-modal information from both classical and contemporary sources, enabling the extraction of plant-related information from classical Chinese poetry and prose. Extensive experiments demonstrate the importance of the proposed ancient plant dataset and indicate the effectiveness of the proposed multi-modal plant classification model.
- Anthology ID:
- 2026.findings-acl.73
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1483–1498
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.73/
- Cite (ACL):
- Xiaobin Shen, Zhongqing Wang, Shichen Li, Chu-Ren Huang, and Guodong Zhou. 2026. New Compendium of a Myriad of Plants: A New Dataset Describing Ancient Chinese Plants. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1483–1498, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- New Compendium of a Myriad of Plants: A New Dataset Describing Ancient Chinese Plants (Shen et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.73.pdf