A Comprehensive Survey on Document-Level Information Extraction

Hanwen Zheng, Sijia Wang, Lifu Huang


Abstract
Document-level information extraction (doc-IE) plays a pivotal role in natural language processing (NLP). This paper reviews and discusses contemporary literature on doc-IE. In addition, we conduct a thorough error analysis of state-of-the-art algorithms, shedding light on their limitations and the remaining challenges for doc-IE. Our findings demonstrate that issues such as entity coreference resolution and the lack of robust reasoning significantly hinder the effectiveness of doc-IE. Additionally, we uncover new challenges, including labeling noise and relation transitivity. The overarching objective of this survey is to provide insights that can help NLP researchers further advance the performance of doc-IE.
Anthology ID:
2024.futured-1.6
Volume:
Proceedings of the Workshop on the Future of Event Detection (FuturED)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Joel Tetreault, Thien Huu Nguyen, Hemank Lamba, Amanda Hughes
Venues:
FuturED | WS
Publisher:
Association for Computational Linguistics
Pages:
58–72
URL:
https://preview.aclanthology.org/Author-Pages-WenzhengZhang-ZhengyanShi-ShuYang/2024.futured-1.6/
DOI:
10.18653/v1/2024.futured-1.6
Cite (ACL):
Hanwen Zheng, Sijia Wang, and Lifu Huang. 2024. A Comprehensive Survey on Document-Level Information Extraction. In Proceedings of the Workshop on the Future of Event Detection (FuturED), pages 58–72, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
A Comprehensive Survey on Document-Level Information Extraction (Zheng et al., FuturED 2024)
PDF:
https://preview.aclanthology.org/Author-Pages-WenzhengZhang-ZhengyanShi-ShuYang/2024.futured-1.6.pdf