Jin Chen
2020
基于抽象语义表示的汉语疑问句的标注与分析(Chinese Interrogative Sentences Annotation and Analysis Based on the Abstract Meaning Representation)
Peiyi Yan (闫培艺)
|
Bin Li (李斌)
|
Tong Huang (黄彤)
|
Kairui Huo (霍凯蕊)
|
Jin Chen (陈瑾)
|
Weiguang Qu (曲维光)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
疑问句的句法语义分析在搜索引擎、信息抽取和问答系统等领域有着广泛的应用。计算语言学多采取问句分类和句法分析相结合的方式来处理疑问句,精度和效率还不理想。而疑问句的语言学研究成果丰富,比如疑问句的结构类型、疑问焦点和疑问代词的非疑问用法等,但缺乏系统的形式化表示。本文致力于解决这一难题,采用基于图结构的汉语句子语义的整体表示方法—中文抽象语义表示(CAMR)来标注疑问句的语义结构,将疑问焦点和整句语义一体化表示出来。然后选取了宾州中文树库CTB8.0网络媒体语料、小学语文教材以及《小王子》中文译本的2万句语料中共计2071句疑问句,统计了疑问句的主要特点。统计表明,各种疑问代词都可以通过疑问概念amr-unknown和语义关系的组合来表示,能够完整地表示出疑问句的关键信息、疑问焦点和语义结构。最后,根据疑问代词所关联的语义关系,统计了疑问焦点的概率分布,其中原因、修饰语和受事的占比最高,分别占26.53%、16.73%以及16.44%。基于抽象语义表示的疑问句标注与分析可以为汉语疑问句研究提供基础理论与资源。
2019
Erroneous data generation for Grammatical Error Correction
Shuyao Xu
|
Jiehao Zhang
|
Jin Chen
|
Long Qin
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
It has been demonstrated that the utilization of a monolingual corpus in neural Grammatical Error Correction (GEC) systems can significantly improve the system performance. The previous state-of-the-art neural GEC system is an ensemble of four Transformer models pretrained on a large amount of Wikipedia Edits. The Singsound GEC system follows a similar approach but is equipped with a sophisticated erroneous data generating component. Our system achieved an F0:5 of 66.61 in the BEA 2019 Shared Task: Grammatical Error Correction. With our novel erroneous data generating component, the Singsound neural GEC system yielded an M2 of 63.2 on the CoNLL-2014 benchmark (8.4% relative improvement over the previous state-of-the-art system).
2018
CLUF: a Neural Model for Second Language Acquisition Modeling
Shuyao Xu
|
Jin Chen
|
Long Qin
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Second Language Acquisition Modeling is the task to predict whether a second language learner would respond correctly in future exercises based on their learning history. In this paper, we propose a neural network based system to utilize rich contextual, linguistic and user information. Our neural model consists of a Context encoder, a Linguistic feature encoder, a User information encoder and a Format information encoder (CLUF). Furthermore, a decoder is introduced to combine such encoded features and make final predictions. Our system ranked in first place in the English track and second place in the Spanish and French track with an AUROC score of 0.861, 0.835 and 0.854 respectively.