MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction

Jiaying Gong, Hongda Shen, Janet Jenq


Abstract
Attribute value extraction plays a crucial role in enhancing e-commerce search, filtering, and recommendation systems. However, prior visual attribute value extraction methods typically rely on both product images and textual information such as product descriptions and titles. In practice, text can be ambiguous, inaccurate, or unavailable, which can degrade model performance. We propose Mixture of Image Captioning Experts (MICE), a novel augmentation framework for product attribute value extraction. MICE leverages a curated pool of image captioning models to generate accurate captions from product images, enabling robust attribute extraction from a product image alone. Extensive experiments on the public ImplicitAVE dataset and a proprietary women’s tops dataset demonstrate that MICE significantly improves the performance of state-of-the-art large multimodal models (LMMs) in both zero-shot and fine-tuning settings. An ablation study validates the contribution of each component in the framework. MICE’s modular design offers scalability and adaptability, making it well-suited for diverse industrial applications with varying computational and latency requirements.
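The abstract describes the pipeline only at a high level: a pool of captioning models turns the product image into text, and that text augments an LMM's attribute value prediction. The Python sketch below illustrates that flow under stated assumptions. `CaptionExpert`, `select_experts`, and `build_prompt` are hypothetical names, the stub lambdas stand in for real captioning models, the greedy latency-budget selection is only one possible way to use the modular design, and the actual LMM call is omitted. It is not the authors' implementation, and the paper may select and combine captions differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CaptionExpert:
    """One captioning model in the pool (hypothetical wrapper, not from the paper)."""
    name: str
    caption: Callable[[str], str]  # image reference -> caption text
    est_latency_ms: float          # assumed per-image cost, used only for selection


def select_experts(pool: Sequence[CaptionExpert], latency_budget_ms: float) -> List[CaptionExpert]:
    """Greedily add experts in order of increasing cost while the total stays within budget.

    Illustrative policy only; the abstract does not describe how MICE picks experts.
    """
    chosen: List[CaptionExpert] = []
    spent = 0.0
    for expert in sorted(pool, key=lambda e: e.est_latency_ms):
        if spent + expert.est_latency_ms <= latency_budget_ms:
            chosen.append(expert)
            spent += expert.est_latency_ms
    return chosen


def build_prompt(image_ref: str, attribute: str, candidates: Sequence[str],
                 experts: Sequence[CaptionExpert]) -> str:
    """Turn expert captions into text context for an LMM (assumed prompt template)."""
    caption_lines = [f"- {e.name}: {e.caption(image_ref)}" for e in experts]
    return (
        "Captions generated from the product image:\n"
        + "\n".join(caption_lines)
        + f"\nQuestion: What is the {attribute} of this product? "
        + f"Answer with one of: {', '.join(candidates)}."
    )


# Usage with stub experts (fixed strings stand in for real captioning models).
pool = [
    CaptionExpert("expert_a", lambda img: "a red sleeveless top with a v-neck", 40.0),
    CaptionExpert("expert_b", lambda img: "women's tank top, v-neckline", 120.0),
    CaptionExpert("expert_c", lambda img: "a red top on a white background", 300.0),
]
experts = select_experts(pool, latency_budget_ms=200.0)
prompt = build_prompt("img_001.jpg", "neckline", ["V-neck", "Crew neck", "Scoop neck"], experts)
print(prompt)  # this text, plus the image, would be sent to the LMM
```

Changing the latency budget changes which experts run without touching the prompt logic, which is one reading of the "varying computational and latency requirements" mentioned above.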
Anthology ID: 2025.acl-industry.80
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1151–1160
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.acl-industry.80/
Cite (ACL): Jiaying Gong, Hongda Shen, and Janet Jenq. 2025. MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 1151–1160, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction (Gong et al., ACL 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.acl-industry.80.pdf