DocEE-zh: A Fine-grained Benchmark for Chinese Document-level Event Extraction
Minghui Liu, MeiHan Tong, Yangda Peng, Lei Hou, Juanzi Li, Bin Xu
Abstract
Event extraction aims to identify events and then extract the arguments involved in those events. In recent years, there has been a gradual shift from sentence-level event extraction to document-level event extraction research. Despite the significant success achieved in English domain event extraction research, event extraction in Chinese still remains largely unexplored. However, a major obstacle to promoting Chinese document-level event extraction is the lack of fine-grained, wide domain coverage datasets for model training and evaluation. In this paper, we propose DocEE-zh, a new Chinese document-level event extraction dataset comprising over 36,000 events and more than 210,000 arguments. DocEE-zh is an extension of the DocEE dataset, utilizing the same event schema, and all data has been meticulously annotated by human experts. We highlight two features: focus on high-interest event types and fine-grained argument types. Experimental results indicate that state-of-the-art models still fail to achieve satisfactory performance, with an F1 score of 45.88% on the event argument extraction task, revealing that Chinese document-level event extraction (DocEE) remains an unresolved challenge. DocEE-zh is now available at https://github.com/tongmeihan1995/DocEE.git.
- Anthology ID:
- 2024.findings-emnlp.35
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 637–649
- URL:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.findings-emnlp.35/
- DOI:
- 10.18653/v1/2024.findings-emnlp.35
- Cite (ACL):
- Minghui Liu, MeiHan Tong, Yangda Peng, Lei Hou, Juanzi Li, and Bin Xu. 2024. DocEE-zh: A Fine-grained Benchmark for Chinese Document-level Event Extraction. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 637–649, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- DocEE-zh: A Fine-grained Benchmark for Chinese Document-level Event Extraction (Liu et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.findings-emnlp.35.pdf