Consistent Document-level Relation Extraction via Counterfactuals

Ali Modarressi, Abdullatif Köksal, Hinrich Schuetze


Abstract
Many datasets have been developed to train and evaluate document-level relation extraction (RE) models. Most of these are constructed using real-world data. It has been shown that RE models trained on real-world data suffer from factual biases. To evaluate and address this issue, we present CovEReD, a counterfactual data generation approach for document-level relation extraction datasets using entity replacement. We first demonstrate that models trained on factual data exhibit inconsistent behavior: while they accurately extract triples from factual data, they fail to extract the same triples after counterfactual modification. This inconsistency suggests that models trained on factual data rely on spurious signals such as specific entities and external knowledge – rather than on context – to extract triples. We show that by generating document-level counterfactual data with CovEReD and training models on them, consistency is maintained with minimal impact on RE performance. We release our CovEReD pipeline as well as Re-DocRED-CF, a dataset of counterfactual RE documents, to assist in evaluating and addressing inconsistency in document-level RE.
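The idea in the abstract (replace an entity, then check whether the same triples are still extracted) can be illustrated with a toy sketch. This is not the CovEReD pipeline, whose details are not given here. The helper names `replace_entity` and `consistency`, the example sentence, and the exact metric definition are all assumptions made for illustration. Consistency is taken here to be the fraction of correctly extracted factual triples whose counterfactual counterparts are also extracted.

```python
import re


def replace_entity(text, triples, old, new):
    """Swap whole-word mentions of `old` for `new` in the text and the triples.

    Hypothetical helper: a real pipeline would work on annotated mention
    spans and constrain replacements, e.g. by entity type.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    # A function replacement avoids backslash escapes in `new` being interpreted.
    new_text = pattern.sub(lambda _m: new, text)
    new_triples = [
        (new if h == old else h, r, new if t == old else t)
        for (h, r, t) in triples
    ]
    return new_text, new_triples


def consistency(aligned_pairs, factual_pred, cf_pred):
    """Assumed metric: among factual triples the model extracted correctly,
    the share whose counterfactual counterpart was also extracted."""
    hits = [(f, c) for f, c in aligned_pairs if f in factual_pred]
    if not hits:
        return 0.0
    kept = sum(1 for _f, c in hits if c in cf_pred)
    return kept / len(hits)


# Toy factual document with gold triples (invented for illustration).
text = "Marie Curie was born in Warsaw. Warsaw is the capital of Poland."
triples = [
    ("Marie Curie", "place of birth", "Warsaw"),
    ("Warsaw", "capital of", "Poland"),
]

# Counterfactual version: the context now contradicts world knowledge.
cf_text, cf_triples = replace_entity(text, triples, "Warsaw", "Lima")
aligned = list(zip(triples, cf_triples))

# Pretend model outputs: perfect on the factual document, but the
# counterfactual "capital of" triple is missed.
factual_pred = set(triples)
cf_pred = {("Marie Curie", "place of birth", "Lima")}

score = consistency(aligned, factual_pred, cf_pred)
print(cf_text)
print(score)  # 0.5
```

A model that reads the context rather than recalling facts about specific entities should score close to 1.0 under this definition, since the replacement leaves the supporting sentences unchanged.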
Anthology ID:
2024.findings-emnlp.672
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11501–11507
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.672/
DOI:
10.18653/v1/2024.findings-emnlp.672
Cite (ACL):
Ali Modarressi, Abdullatif Köksal, and Hinrich Schuetze. 2024. Consistent Document-level Relation Extraction via Counterfactuals. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11501–11507, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Consistent Document-level Relation Extraction via Counterfactuals (Modarressi et al., Findings 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.672.pdf
Software:
 2024.findings-emnlp.672.software.zip
Data:
 2024.findings-emnlp.672.data.zip