Chien-Wei Lin


2022

pdf
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution
Danfeng Guo | Arpit Gupta | Sanchit Agarwal | Jiun-Yu Kao | Shuyang Gao | Arijit Biswas | Chien-Wei Lin | Tagyoung Chung | Mohit Bansal
Proceedings of the 29th International Conference on Computational Linguistics

Learning from multimodal data has become a popular research topic in recent years. Multimodal coreference resolution (MCR) is an important task in this area. MCR involves resolving the references across different modalities, e.g., text and images, which is a crucial capability for building next-generation conversational agents. MCR is challenging as it requires encoding information from different modalities and modeling associations between them. Although significant progress has been made for visual-linguistic tasks such as visual grounding, most of the current works involve single turn utterances and focus on simple coreference resolutions. In this work, we propose an MCR model that resolves coreferences made in multi-turn dialogues with scene images. We present GRAVL-BERT, a unified MCR framework which combines visual relationships between objects, background scenes, dialogue, and metadata by integrating Graph Neural Networks with VL-BERT. We present results on the SIMMC 2.0 multimodal conversational dataset, achieving the rank-1 on the DSTC-10 SIMMC 2.0 MCR challenge with F1 score 0.783. Our code is available at https://github.com/alexa/gravl-bert.

2021

pdf
Few Shot Dialogue State Tracking using Meta-learning
Saket Dingliwal | Shuyang Gao | Sanchit Agarwal | Chien-Wei Lin | Tagyoung Chung | Dilek Hakkani-Tur
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Dialogue State Tracking (DST) forms a core component of automated chatbot based systems designed for specific goals like hotel, taxi reservation, tourist information etc. With the increasing need to deploy such systems in new domains, solving the problem of zero/few-shot DST has become necessary. There has been a rising trend for learning to transfer knowledge from resource-rich domains to unknown domains with minimal need for additional data. In this work, we explore the merits of meta-learning algorithms for this transfer and hence, propose a meta-learner D-REPTILE specific to the DST problem. With extensive experimentation, we provide clear evidence of benefits over conventional approaches across different domains, methods, base models and datasets with significant (5-25%) improvement over the baseline in a low-data setting. Our proposed meta-learner is agnostic of the underlying model and hence any existing state-of-the-art DST system can improve its performance on unknown domains using our training strategy.

pdf
Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems
Anish Acharya | Suranjit Adhikari | Sanchit Agarwal | Vincent Auvray | Nehal Belgamwar | Arijit Biswas | Shubhra Chandra | Tagyoung Chung | Maryam Fazel-Zarandi | Raefer Gabriel | Shuyang Gao | Rahul Goel | Dilek Hakkani-Tur | Jan Jezabek | Abhay Jha | Jiun-Yu Kao | Prakash Krishnan | Peter Ku | Anuj Goyal | Chien-Wei Lin | Qing Liu | Arindam Mandal | Angeliki Metallinou | Vishal Naik | Yi Pan | Shachi Paul | Vittorio Perera | Abhishek Sethi | Minmin Shen | Nikko Strom | Eddie Wang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomenon like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task integrated with live APIs and show that the dialogue simulator is an essential component of the system that leads to over 50% improvement in turn-level action signature prediction accuracy.