基于数据增强和多任务特征学习的中文语法错误检测方法(Chinese Grammar Error Detection based on Data Enhancement and Multi-task Feature Learning)
Haihua Xie (谢海华), Zhiyou Chen (陈志优), Jing Cheng (程静), Xiaoqing Lyu (吕肖庆), Zhi Tang (汤帜)
Abstract
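Because Chinese grammar is complex, Chinese grammatical error detection (CGED) is a difficult task, and the scarcity of training corpora and related research means CGED performance is still far from practical use. This paper proposes a CGED model that compensates for the shortage of training data by combining data augmentation, a pre-trained language model, and multi-task learning based on linguistic features. Data augmentation effectively expands the training set; the pre-trained language model encodes rich semantic information that helps grammatical analysis; and fine-tuning the language model through multi-task learning on linguistic features lets it learn the linguistic features relevant to grammatical error detection. The proposed method was evaluated on the NLPTEA CGED dataset and achieved better results than the other models.
- Anthology ID: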
- 2020.ccl-1.71
- Volume:
- Proceedings of the 19th Chinese National Conference on Computational Linguistics
- Month:
- October
- Year:
- 2020
- Address:
- Haikou, China
- Editors:
- Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 761–770
- Language:
- Chinese
- URL:
- https://aclanthology.org/2020.ccl-1.71
- Cite (ACL):
- Haihua Xie, Zhiyou Chen, Jing Cheng, Xiaoqing Lyu, and Zhi Tang. 2020. 基于数据增强和多任务特征学习的中文语法错误检测方法(Chinese Grammar Error Detection based on Data Enhancement and Multi-task Feature Learning). In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 761–770, Haikou, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于数据增强和多任务特征学习的中文语法错误检测方法(Chinese Grammar Error Detection based on Data Enhancement and Multi-task Feature Learning) (Xie et al., CCL 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.ccl-1.71.pdf
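The abstract does not say how its data augmentation is carried out. One widely used way to augment CGED-style training data is to inject synthetic errors into correct sentences and label the corrupted positions. The sketch below illustrates that general idea only and is not the authors' procedure. The error-injection rules, the character-level tagging scheme, and the names `inject_error` and `augment` are all hypothetical. The four tags follow the NLPTEA CGED error types: R (redundant), M (missing), S (word selection) and W (word order).

```python
import random
from typing import List, Sequence, Tuple

# CGED error types: Redundant, Missing, Selection, Word order.
ERROR_TYPES = ("R", "M", "S", "W")


def inject_error(sentence: str, kind: str, rng: random.Random,
                 vocab: Sequence[str]) -> Tuple[str, List[str]]:
    """Hypothetical rule: corrupt one spot in a correct sentence.

    Returns the corrupted text and one tag per character ("O" = correct).
    These rules are illustrative assumptions, not the paper's method.
    """
    chars = list(sentence)
    labels = ["O"] * len(chars)
    if len(chars) < 2:
        return sentence, labels  # too short to corrupt safely
    if kind == "R":
        # Redundant: insert an extra character and tag it R.
        i = rng.randrange(len(chars) + 1)
        chars.insert(i, rng.choice(vocab))
        labels.insert(i, "R")
    elif kind == "M":
        # Missing: delete a character and tag the character at the gap.
        i = rng.randrange(len(chars))
        del chars[i]
        del labels[i]
        labels[min(i, len(labels) - 1)] = "M"
    elif kind == "S":
        # Selection: replace a character with a different one from vocab.
        i = rng.randrange(len(chars))
        candidates = [c for c in vocab if c != chars[i]]
        if candidates:
            chars[i] = rng.choice(candidates)
            labels[i] = "S"
    elif kind == "W":
        # Word order: swap two adjacent, distinct characters.
        i = rng.randrange(len(chars) - 1)
        if chars[i] != chars[i + 1]:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            labels[i] = labels[i + 1] = "W"
    else:
        raise ValueError(f"unknown error type: {kind}")
    return "".join(chars), labels


def augment(corpus: Sequence[str], n_per_sentence: int,
            vocab: Sequence[str], seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Keep each clean sentence and add n synthetic erroneous copies."""
    rng = random.Random(seed)
    out: List[Tuple[str, List[str]]] = []
    for sent in corpus:
        out.append((sent, ["O"] * len(sent)))  # clean original, all "O"
        for _ in range(n_per_sentence):
            kind = rng.choice(ERROR_TYPES)
            out.append(inject_error(sent, kind, rng, vocab))
    return out


if __name__ == "__main__":
    for text, tags in augment(["我喜欢学习中文"], 3, vocab=list("的了是在")):
        print(text, tags)
```

Keeping the clean sentence alongside its corrupted copies gives a sequence labeler negative examples as well as positive ones. Real augmentation pipelines typically draw substitutions from confusion sets rather than a small fixed `vocab` as assumed here.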