Scott Grigsby


2024

pdf
Adding multimodal capabilities to a text-only translation model
Vipin Vijayan | Braeden Bowen | Scott Grigsby | Timothy Anderson | Jeremy Gwinnup
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

While most current work in multimodal machine translation (MMT) uses the Multi30k dataset for training and evaluation, we find that the resulting models overfit to the Multi30k dataset to an extreme degree. Consequently, these models perform very badly when evaluated against typical text-only testing sets such as the newstest datasets. In order to perform well on both Multi30k and typical text-only datasets, we use a performant text-only machine translation (MT) model as the starting point of our MMT model. We add vision-text adapter layers connected via gating mechanisms to the MT model, and incrementally transform the MT model into an MMT model by 1) pre-training using vision-based masking of the source text and 2) fine-tuning on Multi30k. We achieve a state-of-the-art performance on the Multi30k 2016 en-de test set of 46.5 BLEU4 score and 0.61 CoMMuTE score via this approach while retaining the performance of the original text-only MT model against the newstest dataset.

pdf
Detecting concrete visual tokens for Multimodal Machine Translation
Braeden Bowen | Vipin Vijayan | Scott Grigsby | Timothy Anderson | Jeremy Gwinnup
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

The challenge of visual grounding and masking in multimodal machine translation (MMT) systems has encouraged varying approaches to the detection and selection of visually-grounded text tokens for masking. We introduce new methods for detection of visually and contextually relevant (concrete) tokens from source sentences, including detection with natural language processing (NLP), detection with object detection, and a joint detection-verification technique. We also introduce new methods for selection of detected tokens, including shortest n tokens, longest n tokens, and all detected concrete tokens. We utilize the GRAM MMT architecture to train models against synthetically collated multimodal datasets of source images with masked sentences, showing performance improvements and improved usage of visual context during translation tasks over the baseline model.

2022

pdf
Semantic Novelty Detection and Characterization in Factual Text Involving Named Entities
Nianzu Ma | Sahisnu Mazumder | Alexander Politowicz | Bing Liu | Eric Robertson | Scott Grigsby
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Much of the existing work on text novelty detection has been studied at the topic level, i.e., identifying whether the topic of a document or a sentence is novel or not. Little work has been done at the fine-grained semantic level (or contextual level). For example, given that we know Elon Musk is the CEO of a technology company, the sentence “Elon Musk acted in the sitcom The Big Bang Theory” is novel and surprising because normally a CEO would not be an actor. Existing topic-based novelty detection methods work poorly on this problem because they do not perform semantic reasoning involving relations between named entities in the text and their background knowledge. This paper proposes an effective model (called PAT-SND) to solve the problem, which can also characterize the novelty. An annotated dataset is also created. Evaluation shows that PAT-SND outperforms 10 baselines by large margins.

2021

pdf
Semantic Novelty Detection in Natural Language Descriptions
Nianzu Ma | Alexander Politowicz | Sahisnu Mazumder | Jiahua Chen | Bing Liu | Eric Robertson | Scott Grigsby
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper proposes to study a fine-grained semantic novelty detection task, which can be illustrated with the following example. It is normal that a person walks a dog in the park, but if someone says “A man is walking a chicken in the park”, it is novel. Given a set of natural language descriptions of normal scenes, we want to identify descriptions of novel scenes. We are not aware of any existing work that solves the problem. Although existing novelty or anomaly detection algorithms are applicable, since they are usually topic-based, they perform poorly on our fine-grained semantic novelty detection task. This paper proposes an effective model (called GAT-MA) to solve the problem and also contributes a new dataset. Experimental evaluation shows that GAT-MA outperforms 11 baselines by large margins.