Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization
Nayu Liu, Fanglong Yao, Haoran Luo, Yong Yang, Chen Tang, Bo Lv
Abstract
Multimodal summarization (MS) combines text and visuals to generate summaries. Recently, many-to-many multimodal summarization (M3S) has garnered interest as it enables a unified model for multilingual and cross-lingual MS. Existing methods have made progress by facilitating the transfer of common multimodal summarization knowledge. However, prior M3S models that fully share parameters neglect language-specific knowledge learning: potential interference between languages may limit the flexible adaptation of MS modes across different language combinations and hinder further collaborative improvements in joint M3S training. Based on this observation, we propose the Language Constrained Multimodal Hyper Adapter (LCMHA) for M3S. LCMHA integrates language-specific multimodal adapters into multilingual pre-trained backbones via a language-constrained hypernetwork, enabling relaxed parameter sharing that enhances language-specific learning while preserving shared MS knowledge. In addition, a language-regularized hypernetwork is designed to balance intra- and inter-language learning, generating language-specific adaptation weights and enhancing the retention of distinct language features through regularization of the generated parameters. Experimental results on the M3Sum benchmark show LCMHA's effectiveness and scalability across multiple multilingual pre-trained backbones.
- Anthology ID:
- 2025.acl-long.1229
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 25285–25298
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1229/
- Cite (ACL):
- Nayu Liu, Fanglong Yao, Haoran Luo, Yong Yang, Chen Tang, and Bo Lv. 2025. Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25285–25298, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization (Liu et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1229.pdf
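The abstract describes a hypernetwork that generates language-specific adapter weights for a shared backbone. As a rough illustration of that general idea only (not the paper's actual architecture; the shapes, names, bottleneck form, and ReLU nonlinearity here are all assumptions), a hypernetwork-generated bottleneck adapter conditioned on a language embedding can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): hidden size, adapter bottleneck, language-embedding size.
D, R, E = 16, 4, 8

# Hypernetwork parameters: linear maps from a language embedding
# to the flattened down- and up-projection weights of an adapter.
W_down_gen = rng.normal(0.0, 0.02, (E, D * R))
W_up_gen = rng.normal(0.0, 0.02, (E, R * D))

# One learned embedding per language (hypothetical two-language setup).
lang_emb = {"en": rng.normal(size=E), "zh": rng.normal(size=E)}

def adapter(h, lang):
    """Residual bottleneck adapter whose weights are generated per language."""
    e = lang_emb[lang]
    W_down = (e @ W_down_gen).reshape(D, R)  # language-specific down-projection
    W_up = (e @ W_up_gen).reshape(R, D)      # language-specific up-projection
    return h + np.maximum(h @ W_down, 0.0) @ W_up

h = rng.normal(size=(2, D))   # a toy batch of backbone hidden states
out_en = adapter(h, "en")
out_zh = adapter(h, "zh")
print(out_en.shape)  # (2, 16)
```

Because every language shares the hypernetwork but receives its own generated adapter weights, parameter sharing is relaxed rather than total; the paper's language regularization on the generated parameters is omitted here for brevity.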