Junsheng Zhou


2023

差比句结构及其缺省现象的识别补全研究(A Study on Identification and Completion of Comparative Sentence Structures with Ellipsis Phenomenon)
Pengfei Zhou (周鹏飞) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

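A comparative sentence is a sentence structure used to express the similarities or differences between two or more things; its common pattern is "X 比 Y + comparison result". Comparative sentences have many structural variants and frequently involve ellipsis, which creates difficulties for Chinese grammar research and natural language processing tasks, so identifying comparative sentence structures and completing their elided components is highly worthwhile. This paper builds a comparative sentence corpus using a sequence labeling approach and proposes a LatticeBERT-BILSTM-CRF model that fuses character and word information to automatically identify comparative sentence structures and automatically complete the elided units. Experimental results verify the effectiveness of the method.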

汉语被动结构解析及其在CAMR中的应用研究(Parsing of Passive Structure in Chinese and Its Application in CAMR)
Kang Hu (胡康) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

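Passive sentences are an important linguistic phenomenon in Chinese. This paper uses a BIO tagging scheme combined with indices to annotate the passive structures in passive sentences at a fine-grained level, and proposes a CRF sequence labeling model based on the BERT-wwm-ext pre-trained model and a biaffine attention mechanism to automatically parse the internal structure of Chinese passive sentences, reaching an F1 score of 97.31%. The proposed model generalizes well: experiments show that using its passive structure parsing results to post-process CAMR graphs effectively improves performance on the CAMR passive sentence parsing task.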
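
As a rough illustration of the BIO-plus-index annotation mentioned in this abstract, the sketch below decodes a character-level BIO tag sequence into labeled spans. The label names (`PAT`, `MARK`, `AGT`, `PRED`), the `-1` index suffix, and the helper `bio_to_spans` are hypothetical and chosen only for this example; the paper's actual tag inventory is not given in the abstract.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # A span continues only on an I- tag that matches the open label.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the currently open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # B- opens a new span; a stray I- is treated as opening one too.
        if tag.startswith(("B-", "I-")):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tagging of the 7 characters of "他被老师批评了".
# The "-1" suffix stands for the index that groups the elements
# belonging to the same passive structure (an assumed convention).
chars = list("他被老师批评了")
tags = ["B-PAT-1", "B-MARK-1", "B-AGT-1", "I-AGT-1",
        "B-PRED-1", "I-PRED-1", "O"]

for label, start, end in bio_to_spans(tags):
    role, index = label.rsplit("-", 1)
    print(role, index, "".join(chars[start:end]))
```

Keeping the index inside the label lets a plain sequence labeler separate several passive structures in one sentence without changing the decoding step.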

2022

基于特征融合的汉语被动句自动识别研究(Automatic Recognition of Chinese Passive Sentences Based on Feature Fusion)
Kang Hu (胡康) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Yanhui Gu (顾彦慧) | Bin Li (李斌)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

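Chinese passive sentences can be divided into marked and unmarked passive sentences according to whether a passive marker word is present. Their complex and varied forms make natural language understanding difficult, so the automatic recognition of Chinese passive sentences is important for downstream natural language processing tasks. This paper builds a passive sentence corpus and proposes a PC-BERT-CNN model that fuses part-of-speech and verb argument frame information to automatically recognize Chinese passive sentences. Experimental results show that the proposed model recognizes Chinese passive sentences accurately, with an F1 score of 98.77% for marked passive sentences and 96.72% for unmarked passive sentences.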

Automated Essay Scoring via Pairwise Contrastive Regression
Jiayi Xie | Kaiwei Cai | Li Kong | Junsheng Zhou | Weiguang Qu
Proceedings of the 29th International Conference on Computational Linguistics

Automated essay scoring (AES) involves the prediction of a score relating to the writing quality of an essay. Most existing work in AES uses either regression objectives or ranking objectives. However, the two types of methods are highly complementary. To this end, in this paper we take inspiration from contrastive learning and propose a novel unified Neural Pairwise Contrastive Regression (NPCR) model in which both objectives are optimized simultaneously as a single loss. Specifically, we first design a neural pairwise ranking model to guarantee the global ranking order in a large list of essays, and then we further extend this pairwise ranking model to predict the relative scores between an input essay and several reference essays. Additionally, a multi-sample voting strategy is employed for inference. We use Quadratic Weighted Kappa to evaluate our model on the public Automated Student Assessment Prize (ASAP) dataset, and the experimental results demonstrate that NPCR outperforms previous methods by a large margin, achieving the state-of-the-art average performance for the AES task.
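
For readers unfamiliar with the evaluation metric named in this abstract, here is a minimal sketch of Quadratic Weighted Kappa in plain Python. It follows the standard textbook definition and is not taken from the paper's code; the function name and the toy ratings are assumptions for illustration.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Agreement between two integer ratings of the same items."""
    n = max_rating - min_rating + 1
    total = len(rater_a)
    if n == 1 or total == 0:
        return 1.0  # degenerate case: nothing to disagree on

    # Observed confusion matrix between the two raters.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2      # quadratic penalty
            expected = hist_a[i] * hist_b[j] / total  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        return 1.0  # both raters used one identical rating throughout
    return 1.0 - numerator / denominator


# Toy ratings on a 1-4 scale (made up for this example).
human = [1, 2, 3, 4]
print(quadratic_weighted_kappa(human, [1, 2, 3, 4], 1, 4))  # identical ratings
print(quadratic_weighted_kappa(human, [4, 3, 2, 1], 1, 4))  # reversed ratings
```

The quadratic weight penalizes a prediction that is off by three score points nine times as much as one that is off by a single point, which is why the metric suits ordinal essay scores.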

Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation
Liming Xiao | Bin Li | Zhixing Xu | Kairui Huo | Minxuan Feng | Junsheng Zhou | Weiguang Qu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Abstract Meaning Representation (AMR) is a sentence-level meaning representation that abstracts the meaning of a sentence into a rooted directed acyclic graph. With the continuous expansion of the Chinese AMR corpus, more and more scholars have developed parsing systems that automatically parse sentences into Chinese AMR. However, current parsers cannot handle concept alignment and relation alignment, and neither can the existing evaluation methods for AMR parsing. Therefore, to fill this gap in Chinese AMR parsing evaluation, we build on the AMR evaluation metric smatch and improve its triple-generation algorithm to make it compatible with concept alignment and relation alignment. The result is a new integrated metric, align-smatch, for parsing evaluation. A comparative study was then conducted on 20 manually annotated AMRs and gold AMRs, and the results show that align-smatch works well on alignments and is more robust in evaluating arcs. We also put forward some fine-grained metrics for evaluating concept alignment, relation alignment, and implicit concepts, in order to further measure parsers' performance on subtasks.
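
The triple-based scoring that smatch-style metrics rest on can be sketched as follows. This is a simplified illustration only: it assumes the variable names of the two graphs are already matched, so it skips the variable-mapping search that smatch performs, and the `anchor` triples used to represent concept alignment are a hypothetical encoding, not the format defined in the paper.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over exactly matching triples."""
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative graph for "The boy wants to go".
gold = {
    ("instance", "x0", "want-01"),
    ("instance", "x1", "boy"),
    ("instance", "x2", "go-02"),
    ("ARG0", "x0", "x1"),
    ("ARG1", "x0", "x2"),
    ("ARG0", "x2", "x1"),
    ("anchor", "x1", "2"),  # hypothetical: concept x1 aligned to token 2
}
predicted = {
    ("instance", "x0", "want-01"),
    ("instance", "x1", "boy"),
    ("instance", "x2", "go-02"),
    ("ARG0", "x0", "x1"),
    ("ARG1", "x0", "x2"),
    ("anchor", "x1", "3"),  # wrong alignment, so this triple does not match
}

precision, recall, f1 = triple_f1(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Encoding alignments as extra triples means an alignment error lowers the score in the same way as a missing relation, which is the general idea behind making the triple set alignment-aware.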

2021

中文连动句语义关系识别研究(Research on Semantic Relation Recognition of Chinese Serial-verb Sentences)
Chao Sun (孙超) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Yanhui Gu (顾彦慧) | Bin Li (李斌) | Junsheng Zhou (周俊生)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

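A serial-verb sentence has the form "NP+VP1+VP2": it contains two or more verbs (or verb structures) whose agent is the same entity. Serial-verb sentences with the same structure can express several different semantic relations. Building on previous classifications of the semantic relations between VP1 and VP2, this paper annotates a dataset of semantic relations in serial-verb sentences and recognizes these relations with neural networks. The method decomposes the task: using BERT as the encoder, it first applies BiLSTM-CRF to identify the serial verbs (VP) and their subject (NP), and then, based on an encoding that incorporates the serial-verb information, applies BiLSTM-Attention to classify the relation between the serial verbs. Experimental results verify the effectiveness of the proposed method.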

中文词语离合现象识别研究(Research on Recognition of the Separation and Reunion Phenomena of Words in Chinese)
Lou Zhou (周露) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

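The separation and reunion phenomenon of Chinese words is a special phenomenon in which a word can be used either as a whole or split apart. This paper uses character-level sequence labeling to automatically recognize separated usages of two-character verbs, which avoids error propagation from Chinese word segmentation and part-of-speech tagging and saves the manual cost of writing matching rules and feature templates. During training, the BERT Chinese pre-trained model is fine-tuned to obtain task-oriented character vector representations, and a masking mechanism hides the separated word from the model, which reduces the influence of the word itself on the recognition result and strengthens learning of the inserted components. The preceding and following morphemes are given different masks to emphasize their order, which enables the model to recognize complex and occasional separated usages. To obtain sentence representations that carry contextual information, the original sentence representation and the masked sentence representation are fed into two BiLSTM layers with separate parameters, and a CRF then captures the dependencies in the sentence's label sequence. The proposed BERT MASK + 2BiLSTMs + CRF model improves the F1 score by 2.85% over the best existing model for recognizing separable words.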

Event Detection as Graph Parsing
Jianye Xie | Haotong Sun | Junsheng Zhou | Weiguang Qu | Xinyu Dai
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

An Element-aware Multi-representation Model for Law Article Prediction
Huilin Zhong | Junsheng Zhou | Weiguang Qu | Yunfei Long | Yanhui Gu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing work has shown that using law articles as external knowledge can improve the performance of Legal Judgment Prediction. However, these methods do not fully use law article information, and most current work handles only single-label samples. In this paper, we propose a Law Article Element-aware Multi-representation Model (LEMM), which can make full use of law article information and can be used for multi-label samples. The model uses the labeled elements of law articles to extract fact description features from multiple angles. It generates multiple representations of a fact for classification. Every label has a law-aware fact representation to encode more information. To capture the dependencies between law articles, the model also introduces a self-attention mechanism between multiple representations. Compared with baseline models like TopJudge, this model improves accuracy by 5.84%, macro F1 by 6.42%, and micro F1 by 4.28%.
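
Because this abstract reports both macro F1 and micro F1 for multi-label prediction, a small sketch of how the two averages differ may help. The label names `A`, `B`, `C` and the toy predictions are made up for illustration and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multi-label predictions."""
    labels = sorted(set().union(*gold, *pred))
    if not labels:
        return 0.0, 0.0

    # Per-label [tp, fp, fn] counts accumulated over all samples.
    counts: Dict[str, List[int]] = {label: [0, 0, 0] for label in labels}
    for gold_set, pred_set in zip(gold, pred):
        for label in labels:
            if label in pred_set and label in gold_set:
                counts[label][0] += 1
            elif label in pred_set:
                counts[label][1] += 1
            elif label in gold_set:
                counts[label][2] += 1

    # Macro: average the per-label F1 scores, so rare labels count equally.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return micro, macro


# Three fact descriptions, each with a set of applicable (hypothetical) labels.
gold = [{"A"}, {"A", "B"}, {"C"}]
pred = [{"A"}, {"A"}, {"B"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the frequent label `A` is always right while the rarer labels are always wrong, so micro F1 (4/7) is noticeably higher than macro F1 (1/3).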

基于神经网络的连动句识别(Recognition of serial-verb sentences based on Neural Network)
Chao Sun (孙超) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Yanhui Gu (顾彦慧) | Bin Li (李斌) | Junsheng Zhou (周俊生)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

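Serial-verb sentences are sentences with a serial-verb construction, a special syntactic structure of Chinese that is very common and frequently used in modern Chinese. Their grammatical structure and semantic relations are both complex, which causes many problems for recognition. This paper studies the recognition of serial-verb sentences and proposes a neural-network-based method. The method has two steps: first, simple rules are used to preprocess the corpus; second, following a text classification approach, sentences are encoded with BERT, and a multi-layer CNN and a BiLSTM model jointly extract features for classification, which completes the recognition task. In experiments on a manually annotated corpus, the method reaches an accuracy of 92.71% and an F1 score of 87.41%.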

基于深度学习的实体关系抽取研究综述(Review of Entity Relation Extraction based on deep learning)
Zhentao Xia (夏振涛) | Weiguang Qu (曲维光) | Yanhui Gu (顾彦慧) | Junsheng Zhou (周俊生) | Bin Li (李斌)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

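As a core subtask of information extraction, entity relation extraction is important for natural language processing applications such as knowledge graphs, intelligent question answering, and semantic search. Relation extraction aims to automatically identify the semantic relations that hold between entities in unstructured text. This paper focuses on sentence-level relation extraction, introduces the main datasets used for the task, and describes existing techniques, grouped into supervised relation extraction, distantly supervised relation extraction, and joint entity and relation extraction. We compare the models used for this task and analyze their contributions and shortcomings. Finally, we introduce the current state and methods of research on Chinese entity relation extraction.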

面向中文AMR标注体系的兼语语料库构建及识别研究(Research on the Construction and Recognition of Concurrent corpus for Chinese AMR Annotation System)
Wenhui Hou (侯文惠) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Bin Li (李斌) | Yanhui Gu (顾彦慧) | Junsheng Zhou (周俊生)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

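The pivotal (兼语) construction is a common verb structure in Chinese in which a verb-object phrase and a subject-predicate phrase share a pivot. Its complex structure creates difficulties for syntactic analysis, so building a corpus of pivotal constructions and recognizing them matters for semantic parsing and downstream tasks. However, few such corpora exist, and none has yet been built for the Chinese AMR annotation scheme. To address this, this paper summarizes a set of annotation guidelines for pivotal-construction corpora and builds a corpus of a certain size for the Chinese AMR annotation scheme. On this corpus, a character-based neural network model is used to recognize pivotal constructions, and the recognition results and directions for future improvement are analyzed and summarized.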

2016

A Search-Based Dynamic Reranking Model for Dependency Parsing
Hao Zhou | Yue Zhang | Shujian Huang | Junsheng Zhou | Xin-Yu Dai | Jiajun Chen
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Fast Approach for Semantic Similar Short Texts Retrieval
Yanhui Gu | Zhenglu Yang | Junsheng Zhou | Weiguang Qu | Jinmao Wei | Xingtian Shi
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

AMR Parsing with an Incremental Joint Model
Junsheng Zhou | Feiyu Xu | Hans Uszkoreit | Weiguang Qu | Ran Li | Yanhui Gu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2012

Exploiting Chunk-level Features to Improve Phrase Chunking
Junsheng Zhou | Weiguang Qu | Fen Zhang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2006

Chinese Named Entity Recognition with a Multi-Phase Model
Junsheng Zhou | Liang He | Xinyu Dai | Jiajun Chen
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing