Approaches to bias assessment usually require technical skills that, by design, leave discrimination experts out. In this paper we present EDIA, a tool that enables discrimination experts to explore social biases in word embeddings and masked language models. Experts can then characterize those biases so that their presence can be assessed more systematically and actions can be planned to address them. They can work interactively to assess the effects of different characterizations of bias in a given word embedding or language model, which helps turn informal intuitions into concrete resources for systematic testing.
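To make the idea of a "characterization of bias" concrete, the following minimal sketch shows one common way of scoring bias in a word embedding: comparing how close a target word sits to two contrasting attribute word lists. The embedding lookup, the word lists, and the scoring rule are illustrative assumptions, not EDIA's actual interface or method.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias_score(word, attr_a, attr_b, emb):
    # Mean similarity of `word` to attribute set A minus its mean similarity to set B.
    sim_a = np.mean([cosine(emb[word], emb[a]) for a in attr_a])
    sim_b = np.mean([cosine(emb[word], emb[b]) for b in attr_b])
    return sim_a - sim_b

# Hypothetical embedding lookup and word lists, for illustration only.
emb = {w: np.random.randn(300) for w in ["nurse", "she", "her", "he", "him"]}
print(bias_score("nurse", ["she", "her"], ["he", "him"], emb))
```

A positive score here would indicate that the target word is closer, on average, to the first attribute list than to the second; an expert could vary the word lists interactively to probe different characterizations of the same bias.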
Referring expression resolution is the task of identifying the referent of a natural language expression, for example "the woman behind the other woman getting a massage". In this paper we investigate the kinds of referring expressions on which current transformer-based models fail. Motivated by this analysis, we identify the weakening of natural spatial constraints as one of the causes and propose a model that aims to restore them. We evaluate the proposed model on different datasets for the task, showing improved performance on the most challenging kinds of referring expressions. Finally, we present a thorough analysis of the kinds of errors that the new model fixes and those that remain open challenges for the task.
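As a rough illustration of what enforcing a spatial constraint can look like, the sketch below boosts candidate regions whose bounding boxes satisfy a simple geometric relation with respect to a landmark object. This is a hypothetical re-ranking heuristic with made-up boxes and weights, not the model proposed in the paper.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Axis-aligned bounding box: top-left corner plus width and height.
    x: float
    y: float
    w: float
    h: float

def left_of(a: Box, b: Box) -> bool:
    # Geometric check: the centre of `a` lies to the left of the centre of `b`.
    return a.x + a.w / 2 < b.x + b.w / 2

def rescore(candidates, model_scores, relation, landmark, bonus=0.2):
    # Boost candidates whose boxes satisfy the spatial relation w.r.t. the landmark.
    return [score + (bonus if relation(box, landmark) else 0.0)
            for box, score in zip(candidates, model_scores)]

# Toy usage for "the person left of the table" (boxes and scores are made up).
candidates = [Box(10, 40, 30, 80), Box(200, 40, 30, 80)]
table = Box(120, 60, 60, 40)
print(rescore(candidates, [0.55, 0.60], left_of, table))
```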
Visual Dialog is assumed to require the dialog history to generate correct responses during a dialog. However, it is not clear from previous work how much of the dialog history is actually needed. In this paper we define what it means for a visual question to require dialog history, and we release a subset of the GuessWhat?! questions whose answers completely change depending on the dialog history. We propose a novel interpretable representation that visually grounds dialog history: the Region under Discussion. It constrains the image's spatial features according to a semantic representation of the history, inspired by the information-structure notion of Question under Discussion. We evaluate the representation on task-specific multimodal models and on the multimodal transformer LXMERT.
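A minimal sketch of the general idea of constraining spatial features to a region derived from the dialog history is given below; the IoU threshold, the filtering rule, and the toy inputs are illustrative assumptions, not the paper's implementation.

```python
def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def constrain_to_region(object_boxes, object_features, region, threshold=0.1):
    # Keep only the visual features of detected objects that overlap the
    # region under discussion derived from the dialog history.
    return [(box, feat) for box, feat in zip(object_boxes, object_features)
            if iou(box, region) >= threshold]

# Toy usage: two detected objects, a region covering the left part of the image.
boxes = [(0, 0, 50, 50), (300, 0, 350, 50)]
features = ["feat_person", "feat_lamp"]  # stand-ins for feature vectors
print(constrain_to_region(boxes, features, region=(0, 0, 200, 400)))
```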