Reuben Cohn-Gordon


2020

pdf
Pragmatic Issue-Sensitive Image Captioning
Allen Nie | Reuben Cohn-Gordon | Christopher Potts
Findings of the Association for Computational Linguistics: EMNLP 2020

Image captioning systems need to produce texts that are not only true but also relevant in that they are properly aligned with the current issues. For instance, in a newspaper article about a sports event, a caption that not only identifies the player in a picture but also comments on their ethnicity could create unwanted reader reactions. To address this, we propose Issue-Sensitive Image Captioning (ISIC). In ISIC, the captioner is given a target image and an issue, which is a set of images partitioned in a way that specifies what information is relevant. For the sports article, we could construct a partition that places images into equivalence classes based on player position. To model this task, we use an extension of the Rational Speech Acts model. Our extension is built on top of state-of-the-art pretrained neural image captioners and explicitly uses image partitions to control caption generation. In both automatic and human evaluations, we show that these models generate captions that are descriptive and issue-sensitive. Finally, we show how ISIC can complement and enrich the related task of Visual Question Answering.

pdf
Communication-based Evaluation for Natural Language Generation
Benjamin Newman | Reuben Cohn-Gordon | Christopher Potts
Proceedings of the Society for Computation in Linguistics 2020

2019

pdf
Lost in Machine Translation: A Method to Reduce Meaning Loss
Reuben Cohn-Gordon | Noah Goodman
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

A desideratum of high-quality translation systems is that they preserve meaning, in the sense that two sentences with different meanings should not translate to one and the same sentence in another language. However, state-of-the-art systems often fail in this regard, particularly in cases where the source and target languages partition the “meaning space” in different ways. For instance, “I cut my finger.” and “I cut my finger off.” describe different states of the world but are translated to French (by both Fairseq and Google Translate) as “Je me suis coupé le doigt.”, which is ambiguous as to whether the finger is detached. More generally, translation systems are typically many-to-one (non-injective) functions from source to target language, which in many cases results in important distinctions in meaning being lost in translation. Building on Bayesian models of informative utterance production, we present a method to define a less ambiguous translation system in terms of an underlying pre-trained neural sequence-to-sequence model. This method increases injectivity, resulting in greater preservation of meaning as measured by improvement in cycle-consistency, without impeding translation quality (measured by BLEU score).

pdf
An Incremental Iterated Response Model of Pragmatics
Reuben Cohn-Gordon | Noah Goodman | Christopher Potts
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

2018

pdf
Pragmatically Informative Image Captioning with Character-Level Inference
Reuben Cohn-Gordon | Noah Goodman | Christopher Potts
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters (“a”, “b”, “c”, ...) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our model outperforms a non-pragmatic baseline as well as a word-level RSA captioner.