New Compendium of a Myriad of Plants: A New Dataset Describing Ancient Chinese Plants

Xiaobin Shen, Zhongqing Wang, Shichen Li, Chu-Ren Huang, Guodong Zhou


Abstract
In ancient China, a variety of datasets depicted humanistic scenes, geographical features, and plants. However, these datasets, compiled long ago, often contain errors, lack comprehensiveness, and are inconsistent with modern realities. To meet current demands, we aim to expand and improve ancient datasets using large language models. Focusing on the Great Compendium of Myriad Flowers, an invaluable ancient plant dataset, we gather information on numerous previously excluded plants, carefully select and organize classical Chinese poetry and prose, and construct a comprehensive botanical encyclopedia knowledge system. Additionally, we collect ancient paintings and modern photographs of plants to enrich the dataset. Furthermore, we propose a novel multi-modal plant classification model that integrates information from both classical and contemporary sources, enabling the extraction of plant-related information from classical Chinese poetry and prose. Extensive experiments demonstrate the importance of the new ancient plant dataset and the effectiveness of the proposed multi-modal plant classification model.
Anthology ID:
2026.findings-acl.73
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1483–1498
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.73/
Cite (ACL):
Xiaobin Shen, Zhongqing Wang, Shichen Li, Chu-Ren Huang, and Guodong Zhou. 2026. New Compendium of a Myriad of Plants: A New Dataset Describing Ancient Chinese Plants. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1483–1498, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
New Compendium of a Myriad of Plants: A New Dataset Describing Ancient Chinese Plants (Shen et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.73.pdf
Checklist:
2026.findings-acl.73.checklist.pdf