Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records

Ting-Lin Wu, Zih-Ching Chen, Chen-Yuan Chen, Pi-Jhong Chen, Li-Chiao Wang


Abstract
Suzhou numerals, a specialized numerical no-tation system historically used in Chinese com-merce and accounting, played a pivotal role in financial transactions from the Song Dynasty to the early 20th century. Despite their his-torical significance, they remain largely absent from modern OCR benchmarks, limiting com-putational access to archival trade documents. This paper presents a curated dataset of 773 expert-annotated Suzhou numeral samples ex-tracted from late Qing-era trade ledgers. We provide a statistical analysis of character distri-butions, offering insights into their real-world usage in historical bookkeeping. Additionally, we evaluate baseline performance with hand-written text recognition (HTR) model, high-lighting the challenges of recognizing low-resource brush-written numerals. By introduc-ing this dataset and initial benchmark results, we aim to facilitate research in historical doc-umentation in ancient Chinese characters, ad-vancing the digitization of early Chinese finan-cial records. The dataset is publicly available at our huggingface hub, and our codebase can be accessed at our github repository.
Anthology ID:
2025.alp-1.13
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
105–111
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.13/
DOI:
Bibkey:
Cite (ACL):
Ting-Lin Wu, Zih-Ching Chen, Chen-Yuan Chen, Pi-Jhong Chen, and Li-Chiao Wang. 2025. Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records. In Proceedings of the Second Workshop on Ancient Language Processing, pages 105–111, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records (Wu et al., ALP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.13.pdf