@inproceedings{yi-etal-2024-integrating,
title = "Integrating Structural Semantic Knowledge for Enhanced Information Extraction Pre-training",
author = "Yi, Xiaoyang and
Bao, Yuru and
Zhang, Jian and
Qin, Yifang and
Lin, Faxin",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.129/",
doi = "10.18653/v1/2024.emnlp-main.129",
pages = "2156--2171",
abstract = "Information Extraction (IE), aiming to extract structured information from unstructured natural language texts, can significantly benefit from pre-trained language models. However, existing pre-training methods solely focus on exploiting the textual knowledge, relying extensively on annotated large-scale datasets, which is labor-intensive and thus limits the scalability and versatility of the resulting models. To address these issues, we propose SKIE, a novel pre-training framework tailored for IE that integrates structural semantic knowledge via contrastive learning, effectively alleviating the annotation burden. Specifically, SKIE utilizes Abstract Meaning Representation (AMR) as a low-cost supervision source to boost model performance without human intervention. By enhancing the topology of AMR graphs, SKIE derives high-quality cohesive subgraphs as additional training samples, providing diverse multi-level structural semantic knowledge. Furthermore, SKIE refines the graph encoder to better capture cohesive information and edge relation information, thereby improving the pre-training efficacy. Extensive experimental results demonstrate that SKIE outperforms state-of-the-art baselines across multiple IE tasks and showcases exceptional performance in few-shot and zero-shot settings."
}
Markdown (Informal)
[Integrating Structural Semantic Knowledge for Enhanced Information Extraction Pre-training](https://aclanthology.org/2024.emnlp-main.129/) (Yi et al., EMNLP 2024)
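
The abstract describes contrastive pre-training in which sentences are paired with cohesive subgraphs derived from their AMR parses. As an illustration only (not the authors' released code), the sketch below shows how a symmetric InfoNCE-style loss could align text embeddings with AMR-subgraph embeddings; the encoders are stand-in linear projections and all names (`ContrastivePretrainer`, `temperature`, etc.) are hypothetical placeholders.

```python
# Illustrative sketch only: a generic InfoNCE contrastive loss pairing text
# embeddings with embeddings of their AMR subgraphs. This is NOT the SKIE
# implementation; the encoders are stand-in linear layers.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastivePretrainer(nn.Module):
    def __init__(self, text_dim=768, graph_dim=256, proj_dim=128, temperature=0.07):
        super().__init__()
        # Stand-ins for a pre-trained text encoder head and an AMR graph encoder head.
        self.text_proj = nn.Linear(text_dim, proj_dim)
        self.graph_proj = nn.Linear(graph_dim, proj_dim)
        self.temperature = temperature

    def forward(self, text_emb, graph_emb):
        # Project both views into a shared space and L2-normalize.
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        g = F.normalize(self.graph_proj(graph_emb), dim=-1)
        # Cosine-similarity logits: each sentence is matched to the subgraph
        # derived from it; other in-batch subgraphs serve as negatives.
        logits = t @ g.t() / self.temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric InfoNCE loss (text-to-graph and graph-to-text).
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    model = ContrastivePretrainer()
    text_emb = torch.randn(8, 768)   # e.g., sentence embeddings for a batch of 8
    graph_emb = torch.randn(8, 256)  # embeddings of their AMR subgraphs
    print(model(text_emb, graph_emb).item())
```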