A Training-free LLM-based Approach to General Chinese Character Error Correction

Houquan Zhou, Bo Zhang, Zhenghua Li, Ming Yan, Min Zhang


Abstract
Chinese spelling correction (CSC) is a crucial task that aims to correct character errors in Chinese text. While conventional CSC focuses on character substitution errors caused by mistyping, two other common types of character errors, missing and redundant characters, have received less attention. These errors are often excluded from CSC datasets during the annotation process or ignored during evaluation, even when they have been annotated. This limits the practicality of the CSC task. To address this gap, we introduce the task of General Chinese Character Error Correction (C2EC), which covers all three types of character errors. We construct a high-quality C2EC benchmark by combining and manually verifying data from the CCTC and Lemon datasets. We extend the training-free, prompt-free CSC method to C2EC by using Levenshtein distance to handle length changes and by leveraging an additional prompt-based large language model (LLM) to improve performance. Experiments show that our method enables a 14B-parameter LLM to be on par with models nearly 50 times larger on both the conventional CSC and C2EC tasks, without any fine-tuning.
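
To illustrate the role Levenshtein distance plays in handling length changes, the sketch below aligns a source sentence with a candidate correction via edit-distance dynamic programming and reads the backtrace to classify each edit as a substitution (mistyped character), an insertion (missing character), or a deletion (redundant character), i.e., the three error types covered by C2EC. This is a minimal, assumption-laden illustration, not the authors' implementation; the function name and example sentences are hypothetical.

def levenshtein_edits(src: str, tgt: str):
    """Return a list of (operation, src_position, src_char, tgt_char) edits
    that turn the source string into the target string."""
    m, n = len(src), len(tgt)
    # dp[i][j] = minimal number of edits between src[:i] and tgt[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # drop src[i-1]: redundant character
                dp[i][j - 1] + 1,         # add tgt[j-1]: missing character
                dp[i - 1][j - 1] + cost,  # keep or substitute
            )
    # Backtrace to recover the edit operations.
    edits, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1                            # characters match, no edit
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            edits.append(("substitute", i - 1, src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            edits.append(("delete", i - 1, src[i - 1], ""))   # redundant character
            i -= 1
        else:
            edits.append(("insert", i, "", tgt[j - 1]))       # missing character
            j -= 1
    return list(reversed(edits))

# Hypothetical usage: one example per C2EC error type.
print(levenshtein_edits("帐户", "账户"))            # substitution (mistyped character)
print(levenshtein_edits("我喜欢苹", "我喜欢苹果"))  # insertion (missing character)
print(levenshtein_edits("我们们走吧", "我们走吧"))  # deletion (redundant character)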
Anthology ID: 2025.acl-long.678
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13827–13852
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.678/
Cite (ACL): Houquan Zhou, Bo Zhang, Zhenghua Li, Ming Yan, and Min Zhang. 2025. A Training-free LLM-based Approach to General Chinese Character Error Correction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13827–13852, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): A Training-free LLM-based Approach to General Chinese Character Error Correction (Zhou et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.678.pdf