Towards Multi-Modal Co-Reference Resolution in Conversational Shopping Agents
Samuel Osebe, Prashan Wanigasekara, Thomas Gueudre, Thanh Tran, Rahul Sharma, Fan Yang, Qian Hu, Weitong Ruan, Emre Barut, Chengwei Su
Abstract
The context of modern smart voice assistants is often multi-modal: users consume images, audio, and video content simultaneously. In such a setup, co-reference resolution is especially challenging, because references span both modalities and dialogue turns. We explore the problem of multi-modal co-reference resolution in multi-turn dialogues and quantify the performance of multi-modal LLMs on a specially curated dataset of long, image-interleaved conversations between a voice assistant and a human in a shopping use case. We propose a custom architecture for multi-modal embedding alignment using a novel parameter augmentation technique. Our proposed Parameter Augmented LLM approach shows a 4.9% absolute F1 improvement over a cross-attention baseline while training 4x fewer parameters.

- Anthology ID:
- 2024.ecnlp-1.2
- Volume:
- Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Shervin Malmasi, Besnik Fetahu, Nicola Ueffing, Oleg Rokhlenko, Eugene Agichtein, Ido Guy
- Venues:
- ECNLP | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 8–18
- URL:
- https://aclanthology.org/2024.ecnlp-1.2
- Cite (ACL):
- Samuel Osebe, Prashan Wanigasekara, Thomas Gueudre, Thanh Tran, Rahul Sharma, Fan Yang, Qian Hu, Weitong Ruan, Emre Barut, and Chengwei Su. 2024. Towards Multi-Modal Co-Reference Resolution in Conversational Shopping Agents. In Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024, pages 8–18, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Towards Multi-Modal Co-Reference Resolution in Conversational Shopping Agents (Osebe et al., ECNLP-WS 2024)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2024.ecnlp-1.2.pdf