Weiguang Qu

2021

pdf bib abs
中文连动句语义关系识别研究(Research on Semantic Relation Recognition of Chinese Serial-verb Sentences)
Chao Sun (孙超) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Yanhui Gu (顾彦慧) | Bin Li (李斌) | Junsheng Zhou (周俊生)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

“连动句是形如“NP+VP1+VP2”的句子,句中含有两个或两个以上的动词(或动词结构)且动词的施事为同一对象。相同结构的连动句可以表示多种不同的语义关系。本文基于前人对连动句中VP1和VP2之间的语义关系分类,标注了连动句语义关系数据集,基于神经网络完成了对连动句语义关系的识别。该方法将连动句语义识别任务进行分解,基于BERT进行编码,利用BiLSTM-CRF先识别出连动句中连动词(VP)及其主语(NP),再基于融合连动词信息的编码,利用BiLSTM-Attention对连动词进行关系判别,实验结果验证了所提方法的有效性。”

pdf bib abs
中文词语离合现象识别研究(Research on Recognition of the Separation and Reunion Phenomena of Words in Chinese)
Lou Zhou (周露) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

“汉语词语的离合现象是汉语中一种词语可分可合的特殊现象。本文采用字符级序列标注方法解决二字动词离合现象的自动识别问题,以避免中文分词及词性标注的错误传递,节省制定匹配规则与特征模板的人工开支。在训练过程中微调BERT中文预训练模型,获取面向目标任务的字符向量表示,并引入掩码机制对模型隐藏离用法中分离的词语,减轻词语本身对识别结果的影响,强化中间插入成分的学习,并对前后语素采用不同的掩码以强调其出现顺序,进而使模型具备了识别复杂及偶发性离用法的能力。为获得含有上下文信息的句子表达,将原始的句子表达与采用掩码的句子表达分别输入两个不同参数的BiLSTM层进行训练,最后采用CRF算法捕捉句子标签序列的依赖关系。本文提出的BERT MASK + 2BiLSTMs + CRF模型比现有最优的离合词识别模型提高了2.85%的F1值。”

pdf bib
Event Detection as Graph Parsing
Jianye Xie | Haotong Sun | Junsheng Zhou | Weiguang Qu | Xinyu Dai
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib abs
多轮对话的篇章级抽象语义表示标注体系研究(Research on Discourse-level Abstract Meaning Representation Annotation framework in Multi-round Dialogue)
Tong Huang (黄彤) | Bin Li (李斌) | Peiyi Yan (闫培艺) | Tingting Ji (计婷婷) | Weiguang Qu (曲维光)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

对话分析是智能客服、聊天机器人等自然语言对话应用的基础课题,而对话语料与常规书面语料有较大差异,存在大量的称谓、情感短语、省略、语序颠倒、冗余等复杂现象,对句法和语义分析器的影响较大,对话自动分析的准确率相对书面语料一直不高。其主要原因在于对多轮对话缺乏严整的形式化描写方式,不利于后续的分析计算。因此,本文在梳理国内外针对对话的标注体系和语料库的基础上,提出了基于抽象语义表示的篇章级多轮对话标注体系。具体探讨了了篇章级别的语义结构标注方法,给出了词语和概念关系的对齐方案,针对称谓语和情感短语增加了相应的语义关系和概念,调整了表示主观情感词语的论元结构,并对对话中一些特殊现象进行了规定,设计了人工标注平台,为大规模的多轮对话语料库标注与计算研究奠定基础。

pdf bib abs
基于抽象语义表示的汉语疑问句的标注与分析(Chinese Interrogative Sentences Annotation and Analysis Based on the Abstract Meaning Representation)
Peiyi Yan (闫培艺) | Bin Li (李斌) | Tong Huang (黄彤) | Kairui Huo (霍凯蕊) | Jin Chen (陈瑾) | Weiguang Qu (曲维光)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

疑问句的句法语义分析在搜索引擎、信息抽取和问答系统等领域有着广泛的应用。计算语言学多采取问句分类和句法分析相结合的方式来处理疑问句,精度和效率还不理想。而疑问句的语言学研究成果丰富,比如疑问句的结构类型、疑问焦点和疑问代词的非疑问用法等,但缺乏系统的形式化表示。本文致力于解决这一难题,采用基于图结构的汉语句子语义的整体表示方法—中文抽象语义表示(CAMR)来标注疑问句的语义结构,将疑问焦点和整句语义一体化表示出来。然后选取了宾州中文树库CTB8.0网络媒体语料、小学语文教材以及《小王子》中文译本的2万句语料中共计2071句疑问句,统计了疑问句的主要特点。统计表明,各种疑问代词都可以通过疑问概念amr-unknown和语义关系的组合来表示,能够完整地表示出疑问句的关键信息、疑问焦点和语义结构。最后,根据疑问代词所关联的语义关系,统计了疑问焦点的概率分布,其中原因、修饰语和受事的占比最高,分别占26.53%、16.73%以及16.44%。基于抽象语义表示的疑问句标注与分析可以为汉语疑问句研究提供基础理论与资源。

pdf bib abs
基于神经网络的连动句识别(Recognition of serial-verb sentences based on Neural Network)
Chao Sun (孙超) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Yanhui Gu (顾彦慧) | Bin Li (李斌) | Junsheng Zhou (周俊生)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

连动句是具有连动结构的句子,是汉语中的特殊句法结构,在现代汉语中十分常见且使用频繁。连动句语法结构和语义关系都很复杂,在识别中存在许多问题,对此本文针对连动句的识别问题进行了研究,提出了一种基于神经网络的连动句识别方法。本方法分两步:第一步,运用简单的规则对语料进行预处理;第二步,用文本分类的思想,使用BERT编码,利用多层CNN与BiLSTM模型联合提取特征进行分类,进而完成连动句识别任务。在人工标注的语料上进行实验,实验结果达到92.71%的准确率,F1值为87.41%。

pdf bib abs
基于深度学习的实体关系抽取研究综述(Review of Entity Relation Extraction based on deep learning)
Zhentao Xia (夏振涛) | Weiguang Qu (曲维光) | Yanhui Gu (顾彦慧) | Junsheng Zhou (周俊生) | Bin Li (李斌)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

作为信息抽取的一项核心子任务,实体关系抽取对于知识图谱、智能问答、语义搜索等自然语言处理应用都十分重要。关系抽取在于从非结构化文本中自动地识别实体之间具有的某种语义关系。该文聚焦句子级别的关系抽取研究,介绍用于关系抽取的主要数据集并对现有的技术作了阐述,主要分为:有监督的关系抽取、远程监督的关系抽取和实体关系联合抽取。我们对比用于该任务的各种模型,分析它们的贡献与缺陷。最后介绍中文实体关系抽取的研究现状和方法。

pdf bib abs
面向中文AMR标注体系的兼语语料库构建及识别研究(Research on the Construction and Recognition of Concurrent corpus for Chinese AMR Annotation System)
Wenhui Hou (侯文惠) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Bin Li (李斌) | Yanhui Gu (顾彦慧) | Junsheng Zhou (周俊生)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

兼语结构是汉语中常见的一种动词结构,由述宾短语与主谓短语共享兼语,结构复杂,给句法分析造成困难,因此兼语语料库构建及识别工作对于语义解析及下游任务都具有重要意义。但现存兼语语料库较少,面向中文AMR标注体系的兼语语料库构建仍处于空白阶段。针对这一现状,本文总结了一套兼语语料库标注规范,并构建了一定数量面向中文AMR标注体系的兼语语料库。基于构建的语料库,采用基于字符的神经网络模型识别兼语结构,并对识别结果以及未来的改进方向进行分析总结。

pdf bib abs
An Element-aware Multi-representation Model for Law Article Prediction
Huilin Zhong | Junsheng Zhou | Weiguang Qu | Yunfei Long | Yanhui Gu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing works have proved that using law articles as external knowledge can improve the performance of the Legal Judgment Prediction. However, they do not fully use law article information and most of the current work is only for single label samples. In this paper, we propose a Law Article Element-aware Multi-representation Model (LEMM), which can make full use of law article information and can be used for multi-label samples. The model uses the labeled elements of law articles to extract fact description features from multiple angles. It generates multiple representations of a fact for classification. Every label has a law-aware fact representation to encode more information. To capture the dependencies between law articles, the model also introduces a self-attention mechanism between multiple representations. Compared with baseline models like TopJudge, this model improves the accuracy of 5.84%, the macro F1 of 6.42%, and the micro F1 of 4.28%.

pdf bib abs
Construct a Sense-Frame Aligned Predicate Lexicon for Chinese AMR Corpus
Li Song | Yuling Dai | Yihuan Liu | Bin Li | Weiguang Qu
Proceedings of the 12th Language Resources and Evaluation Conference

The study of predicate frame is an important topic for semantic analysis. Abstract Meaning Representation (AMR) is an emerging graph based semantic representation of a sentence. Since core semantic roles defined in the predicate lexicon compose the backbone in an AMR graph, the construction of the lexicon becomes the key issue. The existing lexicons blur senses and frames of predicates, which needs to be refined to meet the tasks like word sense disambiguation and event extraction. This paper introduces the on-going project on constructing a novel predicate lexicon for Chinese AMR corpus. The new lexicon includes 14,389 senses and 10,800 frames of 8,470 words. As some senses can be aligned to more than one frame, and vice versa, we found the alignment between senses is not just one frame per sense. Explicit analysis is given for multiple aligned relations, which proves the necessity of the proposed lexicon for AMR corpus, and supplies real data for linguistic theoretical studies.

2019

pdf bib abs
Ellipsis in Chinese AMR Corpus
Yihuan Liu | Bin Li | Peiyi Yan | Li Song | Weiguang Qu
Proceedings of the First International Workshop on Designing Meaning Representations

Ellipsis is very common in language. It’s necessary for natural language processing to restore the elided elements in a sentence. However, there’s only a few corpora annotating the ellipsis, which draws back the automatic detection and recovery of the ellipsis. This paper introduces the annotation of ellipsis in Chinese sentences, using a novel graph-based representation Abstract Meaning Representation (AMR), which has a good mechanism to restore the elided elements manually. We annotate 5,000 sentences selected from Chinese TreeBank (CTB). We find that 54.98% of sentences have ellipses. 92% of the ellipses are restored by copying the antecedents’ concepts. and 12.9% of them are the new added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or parts of a phrase, which is rather hard for the automatic recovery of ellipsis.

pdf bib abs
Building a Chinese AMR Bank with Concept and Relation Alignments
Bin Li | Yuan Wen | Li Song | Weiguang Qu | Nianwen Xue
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an on-going project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation to make it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-agreement as measured by the Smatch score between the two annotators is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus. 46.71% of the AMRs of the sentences are non-tree graphs. Moreover, the AMR of 88.95% of the sentences has concepts inferred from the context of the sentence but do not correspond to a specific word.

Weiguang Qu

2021

2020

2019

2016

2015

2012

2010

Co-authors

Venues