VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction

Khai Phan Tran; Wen Hua; Xue Li


Abstract

Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach using generative models to enhance data from the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE’s latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE. Our code is released at: https://github.com/khaitran22/VaeDiff-DocRE

Anthology ID: 2025.coling-main.488
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 7307–7320
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.488/
Cite (ACL): Khai Phan Tran, Wen Hua, and Xue Li. 2025. VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7307–7320, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction (Tran et al., COLING 2025)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.488.pdf
