Syntactically Robust Training on Partially-Observed Data for Open Information Extraction

Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu


Abstract
Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge: the syntactic distribution of the training data is only partially observed compared to the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation introduced by paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be generally applied to other domains where syntax is only partially observed. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting, for validating the robustness of the models. Experiments, including a thorough analysis, show that model performance degrades as the difference in syntactic distribution increases, while our framework provides a robust boundary.
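
To make the idea of restoring knowledge after paraphrasing concrete, below is a minimal illustrative sketch (not the authors' implementation). It assumes that a gold extraction's arguments may no longer appear verbatim in a paraphrased sentence, and realigns each argument to the most semantically similar span of the paraphrase. Similarity here is a simple cosine over word counts; the paper's semantic similarity matching presumably uses stronger phrase or sentence representations, and the helper names (best_matching_span, cosine) are hypothetical.

from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_matching_span(argument: str, paraphrase: str, max_len: int = 6) -> str:
    """Return the paraphrase span most similar to a gold argument (hypothetical helper)."""
    arg_vec = Counter(argument.lower().split())
    tokens = paraphrase.split()
    best, best_score = "", -1.0
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1)):
            span = " ".join(tokens[i:j])
            score = cosine(arg_vec, Counter(span.lower().split()))
            if score > best_score:
                best, best_score = span, score
    return best

if __name__ == "__main__":
    # A gold (subject, relation, object) triple annotated on the source sentence.
    gold = ("the company", "acquired", "the startup in 2020")
    paraphrase = "In 2020 , the startup was acquired by the company ."
    # Re-anchor each argument to the paraphrase to recover a training label.
    restored = tuple(best_matching_span(part, paraphrase) for part in gold)
    print(restored)  # ('the company', 'acquired', 'In 2020 , the startup')

Bag-of-words matching is only a stand-in: it cannot recover relations that are lexically rewritten (e.g. "was born in" rendered as "is the birthplace of"), which is where the paper's syntactic tree walking and embedding-based matching would be needed.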
Anthology ID:
2022.findings-emnlp.465
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6245–6257
URL:
https://aclanthology.org/2022.findings-emnlp.465
DOI:
10.18653/v1/2022.findings-emnlp.465
Cite (ACL):
Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, and Bin Xu. 2022. Syntactically Robust Training on Partially-Observed Data for Open Information Extraction. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6245–6257, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Syntactically Robust Training on Partially-Observed Data for Open Information Extraction (Qi et al., Findings 2022)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2022.findings-emnlp.465.pdf