Sam Wiseman


2021

pdf bib
On Generalization in Coreference Resolution
Shubham Toshniwal | Patrick Xia | Sam Wiseman | Karen Livescu | Kevin Gimpel
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model on this heterogeneous data mixture by using data augmentation to account for annotation differences and sampling to balance the data quantities. We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance, leading to better generalization in coreference resolution models. This work contributes a new benchmark for robust coreference resolution and multiple new state-of-the-art results.

pdf bib
WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections
Mingda Chen | Sam Wiseman | Kevin Gimpel
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Data-to-text Generation by Splicing Together Nearest Neighbors
Sam Wiseman | Arturs Backurs | Karl Stratos
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We propose to tackle data-to-text generation tasks by directly splicing together retrieved segments of text from “neighbor” source-target pairs. Unlike recent work that conditions on retrieved neighbors but generates text token-by-token, left-to-right, we learn a policy that directly manipulates segments of neighbor text, by inserting or replacing them in partially constructed generations. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way perform on par with strong baselines in terms of automatic and human evaluation, but allow for more interpretable and controllable generation.

2020

pdf bib
Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks
Shubham Toshniwal | Sam Wiseman | Allyson Ettinger | Karen Livescu | Kevin Gimpel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models. Recent work doing incremental coreference resolution using just the global representation of entities shows practical benefits but requires keeping all entities in memory, which can be impractical for long documents. We argue that keeping all entities in memory is unnecessary, and we propose a memory-augmented neural network that tracks only a small bounded number of entities at a time, thus guaranteeing a linear runtime in length of document. We show that (a) the model remains competitive with models with high memory and computational requirements on OntoNotes and LitBank, and (b) the model learns an efficient memory management strategy easily outperforming a rule-based strategy

pdf bib
ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation
Lifu Tu | Richard Yuanzhe Pang | Sam Wiseman | Kevin Gimpel
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model. In particular, we view our non-autoregressive translation system as an inference network (Tu and Gimpel, 2018) trained to minimize the autoregressive teacher energy. This contrasts with the popular approach of training a non-autoregressive model on a distilled corpus consisting of the beam-searched outputs of such a teacher model. Our approach, which we call ENGINE (ENerGy-based Inference NEtworks), achieves state-of-the-art non-autoregressive results on the IWSLT 2014 DE-EN and WMT 2016 RO-EN datasets, approaching the performance of autoregressive models.

pdf bib
Discrete Latent Variable Representations for Low-Resource Text Classification
Shuning Jin | Sam Wiseman | Karl Stratos | Karen Livescu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.

2019

pdf bib
Label-Agnostic Sequence Labeling by Copying Nearest Neighbors
Sam Wiseman | Karl Stratos
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Retrieve-and-edit based approaches to structured prediction, where structures associated with retrieved neighbors are edited to form new structures, have recently attracted increased interest. However, much recent work merely conditions on retrieved structures (e.g., in a sequence-to-sequence framework), rather than explicitly manipulating them. We show we can perform accurate sequence labeling by explicitly (and only) copying labels from retrieved neighbors. Moreover, because this copying is label-agnostic, we can achieve impressive performance in zero-shot sequence-labeling tasks. We additionally consider a dynamic programming approach to sequence labeling in the presence of retrieved neighbors, which allows for controlling the number of distinct (copied) segments used to form a prediction, and leads to both more interpretable and accurate predictions.

pdf bib
Controllable Paraphrase Generation with a Syntactic Exemplar
Mingda Chen | Qingming Tang | Sam Wiseman | Kevin Gimpel
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori. In this work, we propose a novel task, where the syntax of a generated sentence is controlled rather by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. We also develop a variational model with a neural module specifically designed for capturing syntactic knowledge and several multitask training objectives to promote disentangled representation learning. Empirically, the proposed model is observed to achieve improvements over baselines and learn to capture desirable characteristics.

pdf bib
A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations
Mingda Chen | Qingming Tang | Sam Wiseman | Kevin Gimpel
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics. We show we can achieve better disentanglement between semantic and syntactic representations by training with multiple losses, including losses that exploit aligned paraphrastic sentences and word-order information. We evaluate our models on standard semantic similarity tasks and novel syntactic similarity tasks. Empirically, we find that the model with the best performing syntactic and semantic representations also gives rise to the most disentangled representations.

2018

pdf bib
Entity Tracking Improves Cloze-style Reading Comprehension
Luong Hoang | Sam Wiseman | Alexander Rush
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent work has improved on modeling for reading comprehension tasks with simple approaches such as the Attention Sum-Reader; however, automatic systems still significantly trail human performance. Analysis suggests that many of the remaining hard instances are related to the inability to track entity-references throughout documents. This work focuses on these hard entity tracking cases with two extensions: (1) additional entity features, and (2) training with a multi-task tracking objective. We show that these simple modifications improve performance both independently and in combination, and we outperform the previous state of the art on the LAMBADA dataset by 8 pts, particularly on difficult entity examples. We also effectively match the performance of more complicated models on the named entity portion of the CBT dataset.

pdf bib
Learning Neural Templates for Text Generation
Sam Wiseman | Stuart Shieber | Alexander Rush
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. We show that this model learns useful templates, and that these templates make generation both more interpretable and controllable. Furthermore, we show that this approach scales to real data sets and achieves strong performance nearing that of encoder-decoder text generation models.

bib
Deep Latent Variable Models of Natural Language
Alexander Rush | Yoon Kim | Sam Wiseman
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

The proposed tutorial will cover deep latent variable models both in the case where exact inference over the latent variables is tractable and when it is not. The former case includes neural extensions of unsupervised tagging and parsing models. Our discussion of the latter case, where inference cannot be performed tractably, will restrict itself to continuous latent variables. In particular, we will discuss recent developments both in neural variational inference (e.g., relating to Variational Auto-encoders) and in implicit density modeling (e.g., relating to Generative Adversarial Networks). We will highlight the challenges of applying these families of methods to NLP problems, and discuss recent successes and best practices.

2017

pdf bib
Challenges in Data-to-Document Generation
Sam Wiseman | Stuart Shieber | Alexander Rush
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

2016

pdf bib
Learning Global Features for Coreference Resolution
Sam Wiseman | Alexander M. Rush | Stuart M. Shieber
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Antecedent Prediction Without a Pipeline
Sam Wiseman | Alexander M. Rush | Stuart Shieber
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

pdf bib
Sequence-to-Sequence Learning as Beam-Search Optimization
Sam Wiseman | Alexander M. Rush
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution
Sam Wiseman | Alexander M. Rush | Stuart Shieber | Jason Weston
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)