Hannan Cao


2023

Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting
Hannan Cao | Wenmian Yang | Hwee Tou Ng
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The most popular approach in grammatical error correction (GEC) is based on sequence-to-sequence (seq2seq) models. Similar to other autoregressive generation tasks, seq2seq GEC also faces the exposure bias problem, i.e., the context tokens are drawn from different distributions during training and testing, caused by the teacher forcing mechanism. In this paper, we propose a novel data manipulation approach to overcome this problem, which includes a data augmentation method during training to mimic the decoder input at inference time, and a data reweighting method to automatically balance the importance of each kind of augmented sample. Experimental results on benchmark GEC datasets show that our method achieves significant improvements compared to prior approaches.

2021

Grammatical Error Correction with Contrastive Learning in Low Error Density Domains
Hannan Cao | Wenmian Yang | Hwee Tou Ng
Findings of the Association for Computational Linguistics: EMNLP 2021

Although grammatical error correction (GEC) has achieved good performance on texts written by learners of English as a second language, performance on low error density domains where texts are written by English speakers of varying levels of proficiency can still be improved. In this paper, we propose a contrastive learning approach to encourage the GEC model to assign a higher probability to a correct sentence while reducing the probability of incorrect sentences that the model tends to generate, so as to improve the accuracy of the model. Experimental results show that our approach significantly improves the performance of GEC models in low error density domains, when evaluated on the benchmark CWEB dataset.