Divide and Conquer: Rethinking Ambiguous Candidate Identification in Multimodal Dialogues with Pseudo-Labelling
Bhathiya Hemanthage, Christian Dondrup, Hakan Bilen, Oliver Lemon
Abstract
Ambiguous Candidate Identification (ACI) in multimodal dialogue is the task of identifying all potential objects that a user’s utterance could be referring to in a visual scene, in cases where the reference cannot be uniquely determined. End-to-end models are the dominant approach for this task, but have limited real-world applicability due to unrealistic inference-time assumptions such as requiring predefined catalogues of items. Focusing on a more generalized and realistic ACI setup, we demonstrate that a modular approach, which first emphasizes language-only reasoning over dialogue context before performing vision-language fusion, significantly outperforms end-to-end trained baselines. To mitigate the lack of annotations for training the language-only module (student), we propose a pseudo-labelling strategy with a prompted Large Language Model (LLM) as the teacher.
- Anthology ID:
- 2024.sigdial-1.20
- Volume:
- Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
- Month:
- September
- Year:
- 2024
- Address:
- Kyoto, Japan
- Editors:
- Tatsuya Kawahara, Vera Demberg, Stefan Ultes, Koji Inoue, Shikib Mehri, David Howcroft, Kazunori Komatani
- Venue:
- SIGDIAL
- SIG:
- SIGDIAL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 222–227
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.sigdial-1.20/
- DOI:
- 10.18653/v1/2024.sigdial-1.20
- Cite (ACL):
- Bhathiya Hemanthage, Christian Dondrup, Hakan Bilen, and Oliver Lemon. 2024. Divide and Conquer: Rethinking Ambiguous Candidate Identification in Multimodal Dialogues with Pseudo-Labelling. In Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 222–227, Kyoto, Japan. Association for Computational Linguistics.
- Cite (Informal):
- Divide and Conquer: Rethinking Ambiguous Candidate Identification in Multimodal Dialogues with Pseudo-Labelling (Hemanthage et al., SIGDIAL 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.sigdial-1.20.pdf