Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records
Ting-Lin Wu, Zih-Ching Chen, Chen-Yuan Chen, Pi-Jhong Chen, Li-Chiao Wang
Abstract
Suzhou numerals, a specialized numerical notation system historically used in Chinese commerce and accounting, played a pivotal role in financial transactions from the Song Dynasty to the early 20th century. Despite their historical significance, they remain largely absent from modern OCR benchmarks, limiting computational access to archival trade documents. This paper presents a curated dataset of 773 expert-annotated Suzhou numeral samples extracted from late Qing-era trade ledgers. We provide a statistical analysis of character distributions, offering insights into their real-world usage in historical bookkeeping. Additionally, we evaluate the baseline performance of a handwritten text recognition (HTR) model, highlighting the challenges of recognizing low-resource brush-written numerals. By introducing this dataset and initial benchmark results, we aim to facilitate research on historical documents written in ancient Chinese characters, advancing the digitization of early Chinese financial records. The dataset is publicly available on our Hugging Face Hub, and our codebase can be accessed at our GitHub repository.
- Anthology ID:
- 2025.alp-1.13
- Volume:
- Proceedings of the Second Workshop on Ancient Language Processing
- Month:
- May
- Year:
- 2025
- Address:
- The Albuquerque Convention Center, Laguna
- Editors:
- Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
- Venues:
- ALP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 105–111
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.13/
- Cite (ACL):
- Ting-Lin Wu, Zih-Ching Chen, Chen-Yuan Chen, Pi-Jhong Chen, and Li-Chiao Wang. 2025. Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records. In Proceedings of the Second Workshop on Ancient Language Processing, pages 105–111, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
- Cite (Informal):
- Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records (Wu et al., ALP 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.13.pdf