@inproceedings{ye-etal-2023-system,
    title = "System Report for {CCL}23-Eval Task 7: {THU} {KEL}ab (sz) - Exploring Data Augmentation and Denoising for {C}hinese Grammatical Error Correction",
    author = "Ye, Jingheng  and
      Li, Yinghui  and
      Zheng, Haitao",
    editor = "Sun, Maosong  and
      Qin, Bing  and
      Qiu, Xipeng  and
      Jiang, Jing  and
      Han, Xianpei",
    booktitle = "Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)",
    month = aug,
    year = "2023",
    address = "Harbin, China",
    publisher = "Chinese Information Processing Society of China",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.ccl-3.29/",
    pages = "262--270",
    language = "eng",
    abstract = "This paper describes the GEC system submitted by THU KELab (sz) to CCL2023-Eval Task 7, CLTC (Chinese Learner Text Correction) Track 1: Multidimensional Chinese Learner Text Correction. Recent studies have demonstrated that GEC performance can be improved by increasing the amount of training data. However, high-quality public GEC data is far less abundant. To address this issue, we propose two data-driven techniques, data augmentation and data denoising, to improve GEC performance. Data augmentation creates pseudo data to enhance generalization, while data denoising removes noise from the realistic training data. Results on the official evaluation dataset YACLC demonstrate the effectiveness of our approach. Finally, our GEC system ranked second in both the closed and open tasks. All of our datasets and code are available at \url{https://github.com/THUKElab/CCL2023-CLTC-THU_KELab}."
}