CVRH: Cross-modal Variational Role Hypergraph Network via Semantic Enhancement for Multi-modal Event Argument Extraction

Bangze Pan, Yang Li, Ruili Pu, Suge Wang, Jian Liao, JianXing Zheng, Xiaoli Li, Deyu Li


Abstract
The Multi-modal Event Argument Extraction (MEAE) task aims to extract all arguments related to a specific event from multiple modalities and identify their corresponding roles. Existing methods focus on weak alignment of uni-modal representations and on generative data augmentation techniques. However, these methods ignore the potential impact of event role information on MEAE. To address this problem, we propose a Cross-modal Variational Role Hypergraph Network via Semantic Enhancement (CVRH). Unlike previous approaches, CVRH centers on event role information and designs a variational role hyperedge via semantic enhancement, which constructs a role hypergraph over the event arguments in a multi-modal document, thereby explicitly modeling the high-order role correlations among cross-modal arguments in that document. Furthermore, CVRH introduces a modality-shared encoder based on the differential transformer, which effectively learns shared semantic representations across modalities and enhances the independence of argument representations. On the M2E2 benchmark, experimental results show that CVRH achieves a 6.9% improvement in F1-score on MEAE over current state-of-the-art methods.
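The abstract does not say how role hyperedges are built or weighted. As a rough illustration only, the sketch below treats each event role as one hyperedge over cross-modal argument nodes (text mentions and image objects) and propagates features with a standard HGNN-style hypergraph convolution (Feng et al., 2019). The similarity-based soft incidence matrix, the function names `soft_role_incidence` and `hypergraph_conv`, the dimensions, and the random inputs are all hypothetical; the variational and semantic-enhancement components of CVRH are not modeled.

```python
import numpy as np


def soft_role_incidence(args: np.ndarray, roles: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical soft incidence matrix H of shape (n_args, n_roles).

    Each role is one hyperedge; an argument's membership in each role
    hyperedge is a softmax over its similarity to the role embeddings.
    """
    scores = args @ roles.T / temperature            # (n_args, n_roles)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def hypergraph_conv(x: np.ndarray, h: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """One HGNN-style layer with unit hyperedge weights:
    ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta)."""
    dv = h.sum(axis=1)                               # node (argument) degrees
    de = h.sum(axis=0)                               # hyperedge (role) degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    de_inv = np.diag(1.0 / de)
    propagation = dv_inv_sqrt @ h @ de_inv @ h.T @ dv_inv_sqrt
    return np.maximum(propagation @ x @ theta, 0.0)  # ReLU


# Usage with random placeholder features (hypothetical sizes).
rng = np.random.default_rng(0)
text_args = rng.normal(size=(3, 8))     # e.g. 3 textual argument mentions
image_args = rng.normal(size=(2, 8))    # e.g. 2 detected image objects
x = np.vstack([text_args, image_args])  # cross-modal argument nodes
roles = rng.normal(size=(4, 8))         # 4 hypothetical role embeddings
h = soft_role_incidence(x, roles)
out = hypergraph_conv(x, h, rng.normal(size=(8, 8)))
print(h.shape, out.shape)               # (5, 4) (5, 8)
```

Because every softmax row sums to one, each argument node has degree 1 here and the node-degree normalization reduces to the identity; with a hard 0/1 incidence matrix (an argument assigned to several roles) that normalization would matter.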
Anthology ID:
2026.findings-acl.978
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19565–19575
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.978/
Cite (ACL):
Bangze Pan, Yang Li, Ruili Pu, Suge Wang, Jian Liao, JianXing Zheng, Xiaoli Li, and Deyu Li. 2026. CVRH: Cross-modal Variational Role Hypergraph Network via Semantic Enhancement for Multi-modal Event Argument Extraction. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19565–19575, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CVRH: Cross-modal Variational Role Hypergraph Network via Semantic Enhancement for Multi-modal Event Argument Extraction (Pan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.978.pdf
Checklist:
 2026.findings-acl.978.checklist.pdf