Karen Livescu


2021

On Generalization in Coreference Resolution
Shubham Toshniwal | Patrick Xia | Sam Wiseman | Karen Livescu | Kevin Gimpel
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model on this heterogeneous data mixture by using data augmentation to account for annotation differences and sampling to balance the data quantities. We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance, leading to better generalization in coreference resolution models. This work contributes a new benchmark for robust coreference resolution and multiple new state-of-the-art results.

Substructure Substitution: Structured Data Augmentation for NLP
Haoyue Shi | Karen Livescu | Kevin Gimpel
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

On the Role of Supervision in Unsupervised Constituency Parsing
Haoyue Shi | Karen Livescu | Kevin Gimpel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing F1 score on the Wall Street Journal (WSJ) development set (1,700 sentences). We introduce strong baselines for them, by training an existing supervised parsing model (Kitaev and Klein, 2018) on the same labeled examples they access. When training on the 1,700 examples, or even when using only 50 examples for training and 5 for development, such a few-shot parsing approach can outperform all the unsupervised parsing methods by a significant margin. Few-shot parsing can be further improved by a simple data augmentation method and self-training. This suggests that, in order to arrive at fair conclusions, we should carefully consider the amount of labeled data used for model development. We propose two protocols for future work on unsupervised parsing: (i) use fully unsupervised criteria for hyperparameter tuning and model selection; (ii) use as few labeled examples as possible for model development, and compare to few-shot parsing trained on the same labeled examples.

Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks
Shubham Toshniwal | Sam Wiseman | Allyson Ettinger | Karen Livescu | Kevin Gimpel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models. Recent work doing incremental coreference resolution using just the global representation of entities shows practical benefits but requires keeping all entities in memory, which can be impractical for long documents. We argue that keeping all entities in memory is unnecessary, and we propose a memory-augmented neural network that tracks only a small bounded number of entities at a time, thus guaranteeing a linear runtime in the length of the document. We show that (a) the model remains competitive with models with high memory and computational requirements on OntoNotes and LitBank, and (b) the model learns an efficient memory management strategy that easily outperforms a rule-based strategy.

A Cross-Task Analysis of Text Span Representations
Shubham Toshniwal | Haoyue Shi | Bowen Shi | Lingyu Gao | Karen Livescu | Kevin Gimpel
Proceedings of the 5th Workshop on Representation Learning for NLP

Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluation of six span representation methods using eight pretrained language representation models across six tasks, including two tasks that we introduce. We find that, although some simple span representations are fairly reliable across tasks, in general the optimal span representation varies by task, and can also vary within different facets of individual tasks. We also find that the choice of span representation has a bigger impact with a fixed pretrained encoder than with a fine-tuned encoder.

Discrete Latent Variable Representations for Low-Resource Text Classification
Shuning Jin | Sam Wiseman | Karl Stratos | Karen Livescu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.

PeTra: A Sparsely Supervised Memory Model for People Tracking
Shubham Toshniwal | Allyson Ettinger | Kevin Gimpel | Karen Livescu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots. PeTra is trained using sparse annotation from the GAP pronoun resolution dataset and outperforms a prior memory model on the task while using a simpler architecture. We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while retaining strong performance. To measure the people tracking capability of memory models, we (a) propose a new diagnostic evaluation based on counting the number of unique entities in text, and (b) conduct a small scale human evaluation to compare evidence of people tracking in the memory logs of PeTra relative to a previous approach. PeTra is highly effective in both evaluations, demonstrating its ability to track people in its memory despite being trained with limited annotation.

2019

Visually Grounded Neural Syntax Acquisition
Haoyue Shi | Jiayuan Mao | Kevin Gimpel | Karen Livescu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision. The model learns by looking at natural images and reading paired captions. VG-NSL generates constituency parse trees of texts, recursively composes representations for constituents, and matches them with images. We define concreteness of constituents by their matching scores with images, and use it to guide the parsing of text. Experiments on the MSCOCO data set show that VG-NSL outperforms various unsupervised parsing approaches that do not use visual grounding, in terms of F1 scores against gold parse trees. We find that VG-NSL is much more stable with respect to the choice of random initialization and the amount of training data. We also find that the concreteness acquired by VG-NSL correlates well with a similar measure defined by linguists. Finally, we also apply VG-NSL to multiple languages in the Multi30K data set, showing that our model consistently outperforms prior unsupervised approaches.

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
Sameer Bansal | Herman Kamper | Karen Livescu | Adam Lopez | Sharon Goldwater
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. Through an ablation study, we find that the pre-trained encoder (acoustic model) accounts for most of the improvement, despite the fact that the shared language in these tasks is the target language text, not the source language audio. Applying this insight, we show that pre-training on ASR helps ST even when the ASR language differs from both source and target ST languages: pre-training on French ASR also improves Spanish-English ST. Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.

2018

Variational Sequential Labelers for Semi-Supervised Learning
Mingda Chen | Qingming Tang | Karen Livescu | Kevin Gimpel
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce a family of multitask variational methods for semi-supervised sequence labeling. Our model family consists of a latent-variable generative model and a discriminative labeler. The generative models use latent variables to define the conditional probability of a word given its context, drawing inspiration from word prediction objectives commonly used in learning word embeddings. The labeler helps inject discriminative information into the latent space. We explore several latent variable configurations, including ones with hierarchical structure, which enables the model to account for both label-specific and word-specific information. Our models consistently outperform standard sequential baselines on 8 sequence labeling datasets, and improve further with unlabeled data.

Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information
Trang Tran | Shubham Toshniwal | Mohit Bansal | Kevin Gimpel | Karen Livescu | Mari Ostendorf
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody.

2017

Learning to Embed Words in Context for Syntactic Tasks
Lifu Tu | Kevin Gimpel | Karen Livescu
Proceedings of the 2nd Workshop on Representation Learning for NLP

We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes.

2016

Mapping Unseen Words to Task-Trained Embedding Spaces
Pranava Swaroop Madhyastha | Mohit Bansal | Kevin Gimpel | Karen Livescu
Proceedings of the 1st Workshop on Representation Learning for NLP

Charagram: Embedding Words and Sentences via Character n-grams
John Wieting | Mohit Bansal | Kevin Gimpel | Karen Livescu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

From Paraphrase Database to Compositional Paraphrase Model and Back
John Wieting | Mohit Bansal | Kevin Gimpel | Karen Livescu
Transactions of the Association for Computational Linguistics, Volume 3

The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates. However, it is still unclear how it can best be used, due to the heuristic nature of the confidences and its necessarily incomplete coverage. We propose models to leverage the phrase pairs from the PPDB to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB’s internal scores while simultaneously improving its coverage. They allow for learning phrase embeddings as well as improved word embeddings. Moreover, we introduce two new, manually annotated datasets to evaluate short-phrase paraphrasing models. Using our paraphrase model trained using PPDB, we achieve state-of-the-art results on standard word and bigram similarity tasks and beat strong baselines on our new short phrase paraphrase tasks.

Deep Multilingual Correlation for Improved Word Embeddings
Ang Lu | Weiran Wang | Mohit Bansal | Kevin Gimpel | Karen Livescu
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Revisiting Word Neighborhoods for Speech Recognition
Preethi Jyothi | Karen Livescu
Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM

Tailoring Continuous Word Representations for Dependency Parsing
Mohit Bansal | Kevin Gimpel | Karen Livescu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
David Yarowsky | Timothy Baldwin | Anna Korhonen | Karen Livescu | Steven Bethard
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
Hao Tang | Joseph Keshet | Karen Livescu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2010

Domain Adaptation with Unlabeled Data for Dialog Act Tagging
Anna Margolis | Karen Livescu | Mari Ostendorf
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

2008

Invited talk: Phonological Models in Automatic Speech Recognition
Karen Livescu
Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology

2004

Feature-based Pronunciation Modeling for Speech Recognition
Karen Livescu | James Glass
Proceedings of HLT-NAACL 2004: Short Papers