Abstract
Document-level entity-based extraction (EE), aiming at extracting entity-centric information such as entity roles and entity relations, is key to automatic knowledge acquisition from text corpora for various domains. Most document-level EE systems build extractive models, which struggle to model long-term dependencies among entities at the document level. To address this issue, we propose a generative framework for two document-level EE tasks: role-filler entity extraction (REE) and relation extraction (RE). We first formulate them as a template generation problem, allowing models to efficiently capture cross-entity dependencies, exploit label semantics, and avoid the exponential computational complexity of identifying N-ary relations. A novel cross-attention guided copy mechanism, TopK Copy, is incorporated into a pre-trained sequence-to-sequence model to enhance its ability to identify key information in the input document. Experiments on the MUC-4 and SciREX datasets show new state-of-the-art results on REE (+3.26%), binary RE (+4.8%), and 4-ary RE (+2.7%) in F1 score.
- Anthology ID:
- 2021.emnlp-main.426
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5257–5269
- URL:
- https://aclanthology.org/2021.emnlp-main.426
- DOI:
- 10.18653/v1/2021.emnlp-main.426
- Cite (ACL):
- Kung-Hsiang Huang, Sam Tang, and Nanyun Peng. 2021. Document-level Entity-based Extraction as Template Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5257–5269, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Document-level Entity-based Extraction as Template Generation (Huang et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2021.emnlp-main.426.pdf
- Code:
- PlusLabNLP/TempGen
- Data:
- MUC-4, SciREX
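
The abstract frames extraction as template generation: the model emits one target sequence that encodes all roles and their fillers, rather than classifying spans independently. The following is a minimal sketch of that framing only, not the TempGen implementation. The separator strings, the `[None]` placeholder, the role names, and the helpers `linearize_template` and `parse_template` are all hypothetical choices made for illustration; the paper's actual template format may differ.

```python
from typing import Dict, List

# Hypothetical separators and placeholder; not taken from the paper or its code.
SLOT_SEP = " ; "
FILLER_SEP = " | "
EMPTY = "[None]"


def linearize_template(roles: List[str], fillers: Dict[str, List[str]]) -> str:
    """Flatten a role -> fillers mapping into one target string.

    Roles are emitted in a fixed order so that a sequence-to-sequence
    model always generates the same template skeleton.
    """
    parts = []
    for role in roles:
        values = fillers.get(role) or [EMPTY]
        parts.append(f"{role}: {FILLER_SEP.join(values)}")
    return SLOT_SEP.join(parts)


def parse_template(text: str) -> Dict[str, List[str]]:
    """Invert linearize_template on a generated string."""
    result: Dict[str, List[str]] = {}
    for part in text.split(SLOT_SEP):
        role, _, values = part.partition(": ")
        result[role] = [v for v in values.split(FILLER_SEP) if v != EMPTY]
    return result


# Illustrative role names and fillers, invented for this example.
roles = ["Perpetrator", "Victim", "Weapon"]
fillers = {"Perpetrator": ["armed men"], "Victim": ["the mayor", "two guards"]}

target = linearize_template(roles, fillers)
print(target)
print(parse_template(target))
```

Because every role appears in the target string, including empty ones, the decoder can condition each filler on the fillers already generated, which is one way to read the abstract's claim about capturing cross-entity dependencies. This sketch assumes fillers never contain the separator strings; a real system would need escaping or dedicated special tokens.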