Aishwarya Padmakumar


On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
Hyounghun Kim | Aishwarya Padmakumar | Di Jin | Mohit Bansal | Dilek Hakkani-Tur
Proceedings of the Third Workshop on Insights from Negative Results in NLP

Natural language guided embodied task completion is a challenging problem since it requires understanding natural language instructions, aligning them with egocentric visual observations, and choosing appropriate actions to execute in the environment to produce desired changes. We experiment with augmenting a transformer model for this task with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action. We observed that the proposed modules resulted in improved, and in fact state-of-the-art performance on an unseen validation set of a popular benchmark dataset, ALFRED. However, our best model selected using the unseen validation set underperforms on the unseen test split of ALFRED, indicating that performance on the unseen validation set may not in itself be a sufficient indicator of whether model improvements generalize to unseen test sets. We highlight this result as we believe it may be a wider phenomenon in machine learning tasks but primarily noticeable only in benchmarks that limit evaluations on test splits, and highlights the need to modify benchmark design to better account for variance in model performance.

ALFRED-L: Investigating the Role of Language for Action Learning in Interactive Visual Environments
Arjun Akula | Spandana Gella | Aishwarya Padmakumar | Mahdi Namazifar | Mohit Bansal | Jesse Thomason | Dilek Hakkani-Tur
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Embodied Vision and Language Task Completion requires an embodied agent to interpret natural language instructions and egocentric visual observations to navigate through and interact with environments. In this work, we examine ALFRED, a challenging benchmark for embodied task completion, with the goal of gaining insight into how effectively models utilize language. We find evidence that sequence-to-sequence and transformer-based models trained on this benchmark are not sufficiently sensitive to changes in input language instructions. Next, we construct a new test split – ALFRED-L to test whether ALFRED models can generalize to task structures not seen during training that intuitively require the same types of language understanding required in ALFRED. Evaluation of existing models on ALFRED-L suggests that (a) models are overly reliant on the sequence in which objects are visited in typical ALFRED trajectories and fail to adapt to modifications of this sequence and (b) models trained with additional augmented trajectories are able to adapt relatively better to such changes in input language instructions.

Dialog Acts for Task Driven Embodied Agents
Spandana Gella | Aishwarya Padmakumar | Patrick Lange | Dilek Hakkani-Tur
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Embodied agents need to be able to interact in natural language – understanding task descriptions and asking appropriate follow up questions to obtain necessary information to be effective at successfully accomplishing tasks for a wide range of users. In this work, we propose a set of dialog acts for modelling such dialogs and annotate the TEACh dataset that includes over 3,000 situated, task oriented conversations (consisting of 39.5k utterances in total) with dialog acts. To our knowledge,TEACh-DA is the first large scale dataset of dialog act annotations for embodied task completion. Furthermore, we demonstrate the use of this annotated dataset in training models for tagging the dialog acts of a given utterance, predicting the dialog act of the next response given a dialog history, and use the dialog acts to guide agent’s non-dialog behaviour. In particular, our experiments on the TEACh Execution from Dialog History task where the model predicts the sequence of low level actions to be executed in the environment for embodied task completion, demonstrate that dialog acts can improve end performance by up to 2 points compared to the system without dialog acts.


pdf bib
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics
Malihe Alikhani | Valts Blukis | Parisa Kordjamshidi | Aishwarya Padmakumar | Hao Tan
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics

Generative Conversational Networks
Alexandros Papangelis | Karthik Gopalakrishnan | Aishwarya Padmakumar | Seokhwan Kim | Gokhan Tur | Dilek Hakkani-Tur
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Inspired by recent work in meta-learning and generative teaching networks, we propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data (given some seed data) and then train themselves from that data to perform a given task. We use reinforcement learning to optimize the data generation process where the reward signal is the agent’s performance on the task. The task can be any language-related task, from intent detection to full task-oriented conversations. In this work, we show that our approach is able to generalise from seed data and performs well in limited data and limited computation settings, with significant gains for intent detection and slot tagging across multiple datasets: ATIS, TOD, SNIPS, and Restaurants8k. We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data. We also conduct an analysis of the novelty of the generated data and provide generated examples for intent detection, slot tagging, and non-goal oriented conversations.


Learning a Policy for Opportunistic Active Learning
Aishwarya Padmakumar | Peter Stone | Raymond Mooney
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.


Integrated Learning of Dialog Strategies and Semantic Parsing
Aishwarya Padmakumar | Jesse Thomason | Raymond J. Mooney
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Natural language understanding and dialog management are two integral components of interactive dialog systems. Previous research has used machine learning techniques to individually optimize these components, with different forms of direct and indirect supervision. We present an approach to integrate the learning of both a dialog strategy using reinforcement learning, and a semantic parser for robust natural language understanding, using only natural dialog interaction for supervision. Experimental results on a simulated task of robot instruction demonstrate that joint learning of both components improves dialog performance over learning either of these components alone.