An Alignment-Agnostic Model for Chinese Text Error Correction

Liying Zheng, Yue Deng, Weishun Song, Liang Xu, Jing Xiao


Abstract
This paper investigates how to correct Chinese text errors involving mistaken, missing, and redundant characters, which are common among native Chinese speakers. Most existing models based on the detect-correct framework can correct mistaken characters but cannot handle missing or redundant characters, because such errors leave the model inputs and outputs misaligned. Although Seq2Seq-based and sequence tagging methods address all three error types and have achieved relatively good results on English, our experiments show that they do not perform well on Chinese. In this work, we propose a novel alignment-agnostic detect-correct framework that handles both aligned and non-aligned text and can serve as a cold-start model when no annotated data are available. Experimental results on three datasets demonstrate that our method is effective and outperforms most recently published models.
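To make the aligned vs. non-aligned distinction concrete, here is a minimal illustrative sketch. It is not the authors' model. It assumes that an error's type can be read off a character-level alignment between an erroneous sentence and its correction, and it uses Python's standard difflib.SequenceMatcher for that alignment. The function names classify_edits and fits_one_to_one are hypothetical. A substitution corresponds to a mistaken character, an insertion to a missing one, and a deletion to a redundant one. A detect-correct model that emits exactly one output character per input position can only produce a target of the same length as its input, so only the first case stays aligned.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Edit = Tuple[str, str, str]  # (error type, source span, target span)


def classify_edits(source: str, target: str) -> List[Edit]:
    """Illustrative only: label character-level differences between an
    erroneous sentence and its correction using a difflib alignment."""
    edits: List[Edit] = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Equal-length replacement = mistaken characters; otherwise the
            # block mixes substitution with insertion or deletion.
            kind = "mistaken" if (i2 - i1) == (j2 - j1) else "mixed"
            edits.append((kind, source[i1:i2], target[j1:j2]))
        elif tag == "delete":
            # Characters present in the source but absent in the correction.
            edits.append(("redundant", source[i1:i2], ""))
        elif tag == "insert":
            # Characters the correction adds that the source lacks.
            edits.append(("missing", "", target[j1:j2]))
    return edits


def fits_one_to_one(source: str, target: str) -> bool:
    """A model that outputs one character per input position can only
    produce targets with the same length as the source."""
    return len(source) == len(target)


if __name__ == "__main__":
    pairs = [
        ("我今天很高性", "我今天很高兴"),  # mistaken: 性 should be 兴
        ("我今天高兴", "我今天很高兴"),    # missing: 很
        ("我今天很很高兴", "我今天很高兴"),  # redundant: extra 很
    ]
    for src, tgt in pairs:
        print(src, "->", tgt, classify_edits(src, tgt), fits_one_to_one(src, tgt))
```

Only the first pair is length-preserving, which is why a strictly aligned detect-correct model can fix it but not the other two.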
Anthology ID:
2021.findings-emnlp.30
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
321–326
URL:
https://aclanthology.org/2021.findings-emnlp.30
DOI:
10.18653/v1/2021.findings-emnlp.30
Cite (ACL):
Liying Zheng, Yue Deng, Weishun Song, Liang Xu, and Jing Xiao. 2021. An Alignment-Agnostic Model for Chinese Text Error Correction. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 321–326, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
An Alignment-Agnostic Model for Chinese Text Error Correction (Zheng et al., Findings 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.findings-emnlp.30.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2021.findings-emnlp.30.mp4