Abstract
This paper explains our GEC system submitted by THU KELab (sz) to the CCL2023-Eval Task 7 CLTC (Chinese Learner Text Correction) Track 1: Multidimensional Chinese Learner Text Correction. Recent studies have demonstrated that GEC performance can be improved by increasing the amount of training data. However, high-quality public GEC data is far less abundant. To address this issue, we propose two data-driven techniques, data augmentation and data denoising, to improve GEC performance. Data augmentation creates pseudo data to enhance generalization, while data denoising removes noise from the real training data. The results on the official evaluation dataset YACLC demonstrate the effectiveness of our approach. Finally, our GEC system ranked second in both the closed and open tasks. All of our datasets and code are available at https://github.com/THUKElab/CCL2023-CLTC-THU_KELab.
- Anthology ID:
- 2023.ccl-3.29
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editors:
- Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
- Venue:
- CCL
- SIG:
- Publisher:
- Chinese Information Processing Society of China
- Note:
- Pages:
- 262–270
- Language:
- English
- URL:
- https://aclanthology.org/2023.ccl-3.29
- DOI:
- Cite (ACL):
- Jingheng Ye, Yinghui Li, and Haitao Zheng. 2023. System Report for CCL23-Eval Task 7: THU KELab (sz) - Exploring Data Augmentation and Denoising for Chinese Grammatical Error Correction. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations), pages 262–270, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- System Report for CCL23-Eval Task 7: THU KELab (sz) - Exploring Data Augmentation and Denoising for Chinese Grammatical Error Correction (Ye et al., CCL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.ccl-3.29.pdf
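As a rough illustration of the data augmentation idea described in the abstract (creating pseudo training pairs by corrupting clean sentences), the sketch below injects character-level noise into clean Chinese text to produce (noisy source, clean target) pairs. The specific operations, error rate, and helper names (`inject_noise`, `build_pseudo_pairs`) are illustrative assumptions, not the augmentation procedure actually used by the THU KELab (sz) system; see the paper and repository above for the real pipeline.

```python
import random

# Illustrative sketch only: rule-based noise injection for pseudo GEC data.
# The operations and rates below are assumptions, not the paper's method.

def inject_noise(sentence, error_rate=0.1, seed=None):
    """Corrupt a clean sentence with simple character-level edits."""
    rng = random.Random(seed)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if rng.random() < error_rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                pass  # drop the character (simulates a missing-character error)
            elif op == "duplicate":
                out.extend([c, c])  # repeat the character (redundancy error)
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])  # swap adjacent characters (word-order error)
                i += 1  # the next character was already consumed
            else:
                out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)

def build_pseudo_pairs(clean_sentences, error_rate=0.1, seed=42):
    """Return (noisy source, clean target) pseudo pairs for GEC training."""
    rng = random.Random(seed)
    return [
        (inject_noise(s, error_rate, rng.randint(0, 2**31 - 1)), s)
        for s in clean_sentences
    ]

if __name__ == "__main__":
    corpus = ["我今天去图书馆看书。", "他昨天买了一本新书。"]
    for src, tgt in build_pseudo_pairs(corpus):
        print(src, "->", tgt)
```

A denoising step would work in the opposite direction, filtering or down-weighting noisy pairs in the real annotated corpus rather than generating new ones; the paper itself should be consulted for how both steps are actually implemented.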