Baber Khalid


2021

COSMic: A Coherence-Aware Generation Metric for Image Descriptions
Mert Inan | Piyush Sharma | Baber Khalid | Radu Soricut | Matthew Stone | Malihe Alikhani
Findings of the Association for Computational Linguistics: EMNLP 2021

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text. We address this weakness by introducing the first discourse-aware learned generation metric for evaluating image descriptions. Our approach is inspired by computational theories of discourse for capturing information goals using coherence. We present a dataset of image–description pairs annotated with coherence relations. We then train a coherence-aware metric on a subset of the Conceptual Captions dataset and measure its effectiveness, that is, its ability to predict human ratings of output captions, on a test set composed of out-of-domain images. On the outputs of a number of state-of-the-art coherence-aware caption generation models, our proposed metric achieves a higher Kendall correlation coefficient with human judgments than several other metrics, including recently proposed learned metrics such as BLEURT and BERTScore.

2020

Combining Cognitive Modeling and Reinforcement Learning for Clarification in Dialogue
Baber Khalid | Malihe Alikhani | Matthew Stone
Proceedings of the 28th International Conference on Computational Linguistics

In many domains, dialogue systems need to work collaboratively with users to successfully reconstruct the meaning the user had in mind. In this paper, we show how cognitive models of users’ communicative strategies can be leveraged in a reinforcement learning approach to dialogue planning to enable interactive systems to give targeted, effective feedback about the system’s understanding. We describe a prototype system that collaborates on reference tasks that distinguish arbitrarily varying color patches from similar distractors, and use experiments with crowd workers and analyses of our learned policies to document that our approach leads to context-sensitive clarification strategies that focus on key missing information, elicit correct answers that the system understands, and contribute to increasing dialogue success.