Siyu Chen


2022

pdf
DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction
MeiHan Tong | Bin Xu | Shuai Wang | Meihuan Han | Yixin Cao | Jiangqi Zhu | Siyu Chen | Lei Hou | Juanzi Li
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Event extraction aims to identify an event and then extract the arguments participating in the event. Despite the great success in sentence-level event extraction, events are more naturally presented in the form of documents, with event arguments scattered in multiple sentences. However, a major barrier to promote document-level event extraction has been the lack of large-scale and practical training and evaluation datasets. In this paper, we present DocEE, a new document-level event extraction dataset including 27,000+ events, 180,000+ arguments. We highlight three features: large-scale manual annotations, fine-grained argument types and application-oriented settings. Experiments show that there is still a big gap between state-of-the-art models and human beings (41% Vs 85% in F1 score), indicating that DocEE is an open issue. DocEE is now available at https://github.com/tongmeihan1995/DocEE.git.

2021

pdf
先秦词网构建及梵汉对比研究(The Construction of Pre-Qin Ancient Chinese WordNet and Cross Language Comparative Study between Ancient Sanskrit WordNet and Pre-Qin Ancient Chinese WordNet)
Xuehui Lu (卢雪晖) | Huidan Xu (徐会丹) | Siyu Chen (陈思瑜) | Bin Li (李斌)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

先秦汉语在汉语史研究上具有重要地位,然而以往的研究始终没有形成结构化的先秦词汇资源,难以满足古汉语信息处理和跨语言对比的研究需要。国际上以英文词网(WordNet)的义类架构为基础,已经建立了数十种语言的词网,已经成为多语言自然语言处理和跨语言对比的基础资源。本文综述了国内外各种词网的构建情况,特别是古代语言的词网和汉语词网,然后详细介绍了先秦词网的构建和校正过程,构建起了涵盖43591个词语、61227个义项、17975个义类的先秦汉语词网。本文还通过与古梵语词网的跨语言对比,尝试分析这两种古老语言在词汇上的共性和差异,初步验证先秦词网的有效性。