ACE: Automatic Colloquialism, Typographical and Orthographic Errors Detection for Chinese Language

Shichao Dong, Gabriel Pui Cheong Fung, Binyang Li, Baolin Peng, Ming Liao, Jia Zhu, Kam-fai Wong


Abstract
We present ACE, a system for Automatic Colloquialism and Errors detection in written Chinese. ACE combines an N-gram model with a rule-based model. Although it currently focuses on detecting colloquial Cantonese (a dialect of Chinese), it can be extended to detect other dialects. We chose Cantonese because it has many interesting properties, such as a unique grammar system and a huge number of colloquial terms, that make the detection task extremely challenging. We conducted experiments using real and synthetic data. The results indicate that ACE is highly reliable and effective.
Anthology ID:
C16-2041
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
Month:
December
Year:
2016
Address:
Osaka, Japan
Editor:
Hideo Watanabe
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
194–197
URL:
https://aclanthology.org/C16-2041
Cite (ACL):
Shichao Dong, Gabriel Pui Cheong Fung, Binyang Li, Baolin Peng, Ming Liao, Jia Zhu, and Kam-fai Wong. 2016. ACE: Automatic Colloquialism, Typographical and Orthographic Errors Detection for Chinese Language. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pages 194–197, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
ACE: Automatic Colloquialism, Typographical and Orthographic Errors Detection for Chinese Language (Dong et al., COLING 2016)
PDF:
https://preview.aclanthology.org/add_acl24_videos/C16-2041.pdf