MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting

Rishav Sahay, Lavanya Sita Tekumalla, Anoop Saladi


Abstract
Existing multimodal systems typically associate text with available images based on embedding similarity or simple co-location, but such approaches often fail to ensure that the linked image accurately depicts the specific product or component mentioned in a troubleshooting instruction. We introduce MIRAGE, a metadata-first paradigm that treats structured metadata (not raw pixels) as a first-class modality for multimodal grounding. In MIRAGE, both text and images are projected into a shared semantic schema capturing product attributes, context, and visual aspects, so that troubleshooting relies on reasoning over interpretable attributes rather than unstructured embeddings. MIRAGE comprises three complementary modules: M-Link for schema-guided image–text linking, M-Gen for metadata-conditioned multimodal generation, and M-Eval for consistency evaluation in the same structured space. Experiments on large-scale enterprise e-commerce troubleshooting data spanning 10 product types, 100K text chunks, and 35K images show that metadata-centric grounding achieves over 40% higher linking coverage of high-quality visual content and gains of over 45% in linking and response quality compared with embedding-based baselines. MIRAGE demonstrates the potential of structured metadata to enable scalable, fine-grained grounding in multimodal troubleshooting systems.
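The abstract contrasts embedding similarity with linking over interpretable attributes in a shared schema. The Python sketch below illustrates that idea only under stated assumptions: the three schema fields (`product_type`, `component`, `visual_aspect`), the field weights, the rule that product and component must match exactly, the threshold, and all function names are hypothetical choices for illustration. They are not the actual design of M-Link or M-Eval, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical schema fields for illustration only; the actual MIRAGE schema
# (product attributes, context, visual aspects) is not specified here.
SCHEMA_FIELDS = ("product_type", "component", "visual_aspect")


@dataclass(frozen=True)
class Metadata:
    """Structured metadata extracted from either a text chunk or an image."""
    product_type: Optional[str] = None
    component: Optional[str] = None
    visual_aspect: Optional[str] = None

    def get(self, name: str) -> Optional[str]:
        return getattr(self, name)


def attribute_agreement(text_md: Metadata, image_md: Metadata,
                        weights: Dict[str, float]) -> float:
    """Weighted fraction of the text's filled-in fields that the image matches."""
    total = matched = 0.0
    for name in SCHEMA_FIELDS:
        value = text_md.get(name)
        if value is None:  # the text does not constrain this field
            continue
        weight = weights.get(name, 1.0)
        total += weight
        if image_md.get(name) == value:
            matched += weight
    return matched / total if total else 0.0


def link_images(text_md: Metadata, images: Dict[str, Metadata],
                weights: Dict[str, float], threshold: float = 0.5,
                required: Tuple[str, ...] = ("product_type", "component"),
                ) -> List[Tuple[str, float]]:
    """Return (image_id, score) pairs, best first, for images that pass the
    hard constraints on `required` fields and reach `threshold`."""
    links = []
    for image_id, image_md in images.items():
        # Hypothetical hard constraint: each required field must be set in the
        # text and match exactly, so a similar-looking but different part is
        # rejected regardless of how close the other fields are.
        if any(text_md.get(f) is None or image_md.get(f) != text_md.get(f)
               for f in required):
            continue
        score = attribute_agreement(text_md, image_md, weights)
        if score >= threshold:
            links.append((image_id, score))
    return sorted(links, key=lambda pair: (-pair[1], pair[0]))


def inconsistent_fields(a: Metadata, b: Metadata) -> List[str]:
    """Fields where both sides are set but disagree (an interpretable check)."""
    return [f for f in SCHEMA_FIELDS
            if a.get(f) is not None and b.get(f) is not None
            and a.get(f) != b.get(f)]


# Illustrative data (made up; not from the paper's dataset).
chunk = Metadata("dishwasher", "filter", "close-up")
images = {
    "img_filter_closeup": Metadata("dishwasher", "filter", "close-up"),
    "img_filter_wide": Metadata("dishwasher", "filter", "full-view"),
    "img_spray_arm": Metadata("dishwasher", "spray arm", "close-up"),
    "img_washer_filter": Metadata("washing machine", "filter", "close-up"),
}
weights = {"product_type": 2.0, "component": 2.0, "visual_aspect": 1.0}

print(link_images(chunk, images, weights))
# → [('img_filter_closeup', 1.0), ('img_filter_wide', 0.8)]
print(inconsistent_fields(chunk, images["img_spray_arm"]))
# → ['component']
```

In this sketch the spray-arm image and the washing-machine filter image are rejected outright even though most of their fields match, which is the kind of component-level mistake a pure similarity score can make. The mismatch list returned by `inconsistent_fields` also names the disagreeing attribute, which is one way a check performed "in the same structured space" can stay interpretable.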
Anthology ID: 2026.eacl-industry.56
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 764–776
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.56/
Cite (ACL): Rishav Sahay, Lavanya Sita Tekumalla, and Anoop Saladi. 2026. MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 764–776, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting (Sahay et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.56.pdf