VIT-Pro: Visual Instruction Tuning for Product Images
Vishnu Prabhakaran, Purav Aggarwal, Vishruit Kulshreshtha, Arunita Das, Sahini Venkata Sitaram Sruti, Anoop Saladi
Abstract
General vision-language models (VLMs) trained on web data struggle to understand and converse about real-world e-commerce product images. We propose a cost-efficient approach for collecting training data to train a generative VLM for e-commerce product images. The key idea is to leverage large-scale, loosely coupled image-text pairs from e-commerce stores, use a pretrained LLM to generate multimodal instruction-following data, and fine-tune a general vision-language model using LoRA. Our instruction-finetuned model, VIT-Pro, can understand and respond to queries about product images, covering diverse concepts and tasks. VIT-Pro outperforms several general-purpose VLMs on multiple vision tasks in the e-commerce domain.
- Anthology ID:
- 2025.naacl-industry.57
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 695–707
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.naacl-industry.57/
- Cite (ACL):
- Vishnu Prabhakaran, Purav Aggarwal, Vishruit Kulshreshtha, Arunita Das, Sahini Venkata Sitaram Sruti, and Anoop Saladi. 2025. VIT-Pro: Visual Instruction Tuning for Product Images. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 695–707, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- VIT-Pro: Visual Instruction Tuning for Product Images (Prabhakaran et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.naacl-industry.57.pdf