UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction
Hang Yan, Yu Sun, Xiaonan Li, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu
Abstract
Information Extraction (IE) spans several tasks with different output structures, such as named entity recognition, relation extraction and event extraction. Previously, those tasks were solved with different models because of diverse task output structures. Through re-examining IE tasks, we find that all of them can be interpreted as extracting spans and span relations. They can further be decomposed into token-pair classification tasks by using the start and end token of a span to pinpoint the span, and using the start-to-start and end-to-end token pairs of two spans to determine the relation. Based on the reformulation, we propose a Unified Token-pair Classification architecture for Information Extraction (UTC-IE), where we introduce Plusformer on top of the token-pair feature matrix. Specifically, it models axis-aware interaction with plus-shaped self-attention and local interaction with Convolutional Neural Network over token pairs. Experiments show that our approach outperforms task-specific and unified models on all tasks in 10 datasets, and achieves better or comparable results on 2 joint IE datasets. Moreover, UTC-IE speeds up over state-of-the-art models on IE tasks significantly in most datasets, which verifies the effectiveness of our architecture.
- Anthology ID:
- 2023.acl-long.226
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4096–4122
- URL:
- https://aclanthology.org/2023.acl-long.226
- DOI:
- 10.18653/v1/2023.acl-long.226
- Cite (ACL):
- Hang Yan, Yu Sun, Xiaonan Li, Yunhua Zhou, Xuanjing Huang, and Xipeng Qiu. 2023. UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4096–4122, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction (Yan et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.226.pdf
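The token-pair reformulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `encode` and `decode`, the label prefixes `span:`, `ss:` and `ee:`, and the example sentence are all hypothetical, chosen only to show how a span maps to its (start, end) cell and how a relation maps to the start-to-start and end-to-end cells of its two spans. The Plusformer model itself, which would predict these cell labels, is not shown.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]            # (start, end) token indices, inclusive
Relation = Tuple[Span, Span, str]  # (head span, tail span, relation label)
Cells = Dict[Tuple[int, int], Set[str]]


def encode(spans: Dict[Span, str], relations: List[Relation]) -> Cells:
    """Turn spans and span relations into token-pair labels (hypothetical scheme)."""
    cells: Cells = {}
    # A span is pinpointed by the pair (start token, end token).
    for (start, end), label in spans.items():
        cells.setdefault((start, end), set()).add(f"span:{label}")
    # A relation is marked on the start-to-start and end-to-end token pairs.
    for (s1, e1), (s2, e2), rel in relations:
        cells.setdefault((s1, s2), set()).add(f"ss:{rel}")
        cells.setdefault((e1, e2), set()).add(f"ee:{rel}")
    return cells


def decode(cells: Cells) -> Tuple[Dict[Span, str], List[Relation]]:
    """Recover spans and relations from token-pair labels."""
    spans: Dict[Span, str] = {}
    for (i, j), labels in cells.items():
        for lab in labels:
            if lab.startswith("span:"):
                spans[(i, j)] = lab[len("span:"):]
    relations: List[Relation] = []
    for a in spans:
        for b in spans:
            if a == b:
                continue
            ss = {l[3:] for l in cells.get((a[0], b[0]), ()) if l.startswith("ss:")}
            ee = {l[3:] for l in cells.get((a[1], b[1]), ()) if l.startswith("ee:")}
            # Keep a relation only if both token pairs agree on it.
            for rel in sorted(ss & ee):
                relations.append((a, b, rel))
    return spans, relations


# Illustrative input (made up for this sketch).
tokens = ["Steve", "Jobs", "founded", "Apple"]
example_spans = {(0, 1): "PER", (3, 3): "ORG"}
example_relations = [((0, 1), (3, 3), "founder_of")]

example_cells = encode(example_spans, example_relations)
decoded_spans, decoded_relations = decode(example_cells)
```

In this sketch each cell holds a set of labels, so a span label and a relation label can share the same token pair without overwriting each other.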