Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning
Jun Cheng Yang, Zuchao Li, Shuai Xie, Wei Yu, Shijun Li, Bo Du
Abstract
The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear: they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. Therefore, we propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning in multi-modal representation learning. The proposed AGoT models the human thought process not only as a chain but also treats each step as a reasoning aggregation graph, capturing the multiple aspects of thinking that single-step reasoning overlooks. This turns the entire reasoning process into prompt aggregation and prompt flow operations. Experiments show that our multi-modal model enhanced with AGoT soft-prompting achieves good results in several tasks such as text-image retrieval, visual question answering, and image recognition. In addition, we demonstrate that it has good domain generalization performance due to better reasoning.
- Anthology ID:
- 2024.lrec-main.1306
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 15024–15036
- URL:
- https://aclanthology.org/2024.lrec-main.1306
- Cite (ACL):
- Jun Cheng Yang, Zuchao Li, Shuai Xie, Wei Yu, Shijun Li, and Bo Du. 2024. Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15024–15036, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning (Yang et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.1306.pdf