Game-Based Video-Context Dialogue

Ramakanth Pasunuru, Mohit Bansal


Abstract
Current dialogue systems focus more on textual and speech context knowledge and are usually based on two speakers. Some recent work has investigated static image-based dialogue. However, several real-world human interactions also involve dynamic visual context (similar to videos) as well as dialogue exchanges among multiple speakers. To move closer towards such multimodal conversational skills and visually-situated applications, we introduce a new video-context, many-speaker dialogue dataset based on live-broadcast soccer game videos and chats from Twitch.tv. This challenging testbed allows us to develop visually-grounded dialogue models that should generate relevant temporal and spatial event language from the live video, while also being relevant to the chat history. For strong baselines, we also present several discriminative and generative models, e.g., based on tridirectional attention flow (TriDAF). We evaluate these models via retrieval ranking-recall, automatic phrase-matching metrics, as well as human evaluation studies. We also present dataset analyses, model ablations, and visualizations to understand the contribution of different modalities and model components.
Anthology ID:
D18-1012
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
125–136
URL:
https://aclanthology.org/D18-1012
DOI:
10.18653/v1/D18-1012
Cite (ACL):
Ramakanth Pasunuru and Mohit Bansal. 2018. Game-Based Video-Context Dialogue. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 125–136, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Game-Based Video-Context Dialogue (Pasunuru & Bansal, EMNLP 2018)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/D18-1012.pdf
Attachment:
D18-1012.Attachment.pdf
Video:
https://preview.aclanthology.org/ml4al-ingestion/D18-1012.mp4
Code:
ramakanth-pasunuru/video-dialogue
Data:
Twitch-FIFA