Xiaoqing Lyu
2022
String Editing Based Chinese Grammatical Error Diagnosis
Haihua Xie | Xiaoqing Lyu | Xuefei Chen
Proceedings of the 29th International Conference on Computational Linguistics
Chinese Grammatical Error Diagnosis (CGED) suffers from two problems: the large number of grammatical error types and insufficient training data. In this paper, we propose a string editing based CGED model that requires less training data by using a unified workflow to handle various types of grammatical errors. Two measures are proposed to enhance the performance of CGED. First, the detection and correction of grammatical errors are divided into separate stages. In the error detection stage, the model outputs only the types of grammatical errors, so the tag vocabulary is significantly smaller than in other string editing based models. Second, the correction of some grammatical errors is converted to the task of masked character inference, which has plenty of training data and mature solutions. Experiments on the NLPTEA-CGED datasets demonstrate that our model outperforms other CGED models in many aspects.
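The two-stage workflow described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the tag letters (O, R, S, M), the function names, and the `fill_mask` callback are assumptions made for illustration. Stage one is assumed to have already produced one error-type tag per character; stage two deletes redundant characters and turns selection and missing errors into mask slots that a masked character inference model would fill.

```python
from typing import Callable, List

MASK = "[MASK]"


def build_masked_template(chars: List[str], tags: List[str]) -> List[str]:
    """Turn per-character error-type tags into a masked template.

    Tag meanings are assumptions for this sketch:
      "O" keep the character
      "R" redundant character: delete it
      "S" wrong selection: replace it with a mask slot
      "M" something is missing before this character: insert a mask slot
    Any other tag (for example a word-order tag) is left untouched here.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    template: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "R":
            continue
        if tag == "S":
            template.append(MASK)
        elif tag == "M":
            template.extend([MASK, ch])
        else:
            template.append(ch)
    return template


def correct(
    chars: List[str],
    tags: List[str],
    fill_mask: Callable[[List[str], int], str],
) -> str:
    """Fill each mask slot left to right using a caller-supplied predictor.

    `fill_mask` is a hypothetical stand-in for a masked language model: it
    receives the current template and the index of the slot to fill.
    """
    template = build_masked_template(chars, tags)
    for i, token in enumerate(template):
        if token == MASK:
            template[i] = fill_mask(template, i)
    return "".join(template)
```

Because the detector in this sketch emits only a handful of error-type tags rather than one tag per possible replacement character, its output vocabulary stays small; the choice of the actual character is deferred to the mask-filling step.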
2020
基于数据增强和多任务特征学习的中文语法错误检测方法(Chinese Grammar Error Detection based on Data Enhancement and Multi-task Feature Learning)
Haihua Xie (谢海华) | Zhiyou Chen (陈志优) | Jing Cheng (程静) | Xiaoqing Lyu (吕肖庆) | Zhi Tang (汤帜)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
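Owing to the complexity of Chinese grammar, Chinese grammatical error detection (CGED) is difficult, and the scarcity of training corpora and related research leaves CGED performance far short of practical use. This paper proposes a CGED model that compensates for the scarcity of training data through data augmentation, a pre-trained language model, and multi-task learning based on linguistic features. Data augmentation effectively expands the training set; the rich semantic information in the pre-trained language model aids grammatical analysis; and fine-tuning the language model with multi-task learning based on linguistic features lets it learn linguistic features relevant to grammatical error detection. The proposed method is evaluated on the NLPTEA CGED datasets and achieves better results than other models.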