Constructing a Silver Corpus for Weakly Supervised Vietnamese Event Extraction using Cross-Document N-ary Relation Filtering

Phạm Xuân Hiệu, Tuan Vu Minh, Mai-Vu Tran, Hoang-Quynh Le


Abstract
Event extraction for low-resource languages such as Vietnamese is limited by the lack of large-scale annotated data. To address this, we propose a weakly supervised framework that constructs a silver corpus via pseudo-labeling. We introduce a cross-document n-ary relation filtering strategy to reduce noise by leveraging consistency across multiple articles describing the same event, and further enhance data diversity with schema-based augmentation. Experiments on the BKEE benchmark show consistent improvements, demonstrating the effectiveness of our approach. Data is available at: https://github.com/Larken1612/VietEE2.
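As a rough illustration of the cross-document consistency idea mentioned in the abstract, the sketch below keeps a pseudo-labeled event only when the same n-ary tuple (event type plus all role-argument pairs) is extracted from at least `min_support` distinct articles in the same event cluster. It is an assumption-laden toy example, not the authors' implementation: the record layout (`cluster_id`, `doc_id`, `event_type`, `arguments`), the exact-match normalization, the `min_support` threshold, and the example event type and roles are all hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    # Hypothetical normalization: NFC-compose Vietnamese diacritics,
    # trim whitespace, and casefold so surface variants compare equal.
    return unicodedata.normalize("NFC", text).strip().casefold()


def relation_key(extraction):
    # The n-ary relation is the event type plus every (role, argument) pair,
    # scoped to the cluster of articles assumed to describe the same event.
    args = tuple(sorted(
        (role, normalize(value))
        for role, value in extraction["arguments"].items()
    ))
    return (extraction["cluster_id"], extraction["event_type"], args)


def filter_by_cross_document_support(extractions, min_support=2):
    # Count the distinct documents that support each n-ary relation.
    support = defaultdict(set)
    for ex in extractions:
        support[relation_key(ex)].add(ex["doc_id"])
    # Keep only extractions whose relation is confirmed by enough documents.
    return [
        ex for ex in extractions
        if len(support[relation_key(ex)]) >= min_support
    ]


# Hypothetical pseudo-labels from three articles in one cluster.
extractions = [
    {"cluster_id": "c1", "doc_id": "d1", "event_type": "Attack",
     "arguments": {"Attacker": "Nhóm A", "Place": "Hà Nội"}},
    {"cluster_id": "c1", "doc_id": "d2", "event_type": "Attack",
     "arguments": {"Attacker": "nhóm a", "Place": "Hà Nội "}},
    {"cluster_id": "c1", "doc_id": "d3", "event_type": "Attack",
     "arguments": {"Attacker": "Nhóm B", "Place": "Huế"}},
]

kept = filter_by_cross_document_support(extractions, min_support=2)
print([ex["doc_id"] for ex in kept])
```

Exact matching on normalized strings is the simplest possible agreement test; a real pipeline would likely need softer argument matching (for example, entity linking or span overlap), which this sketch does not attempt.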
Anthology ID:
2026.eeuca-1.4
Volume:
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ali Hürriyetoğlu, Surendrabikram Thapa, Hristo Tanev
Venues:
EEUCA | WS
Publisher:
Association for Computational Linguistics
Pages:
26–37
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.eeuca-1.4/
Cite (ACL):
Phạm Xuân Hiệu, Tuan Vu Minh, Mai-Vu Tran, and Hoang-Quynh Le. 2026. Constructing a Silver Corpus for Weakly Supervised Vietnamese Event Extraction using Cross-Document N-ary Relation Filtering. In Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026), pages 26–37, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Constructing a Silver Corpus for Weakly Supervised Vietnamese Event Extraction using Cross-Document N-ary Relation Filtering (Hiệu et al., EEUCA 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.eeuca-1.4.pdf