Chris Dyer

Also published as: Christopher Dyer, Christopher J. Dyer


2023

pdf
Machine Learning for Ancient Languages: A Survey
Thea Sommerschield | Yannis Assael | John Pavlopoulos | Vanessa Stefanak | Andrew Senior | Chris Dyer | John Bodel | Jonathan Prag | Ion Androutsopoulos | Nando de Freitas
Computational Linguistics, Volume 49, Issue 3 - September 2023

Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses at a scale and level of detail that are reshaping the humanities, much as microscopes and telescopes have transformed the sciences. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, identifying promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.

2022

pdf bib
Proceedings of the 7th Workshop on Representation Learning for NLP
Spandana Gella | He He | Bodhisattwa Prasad Majumder | Burcu Can | Eleonora Giunchiglia | Samuel Cahyawijaya | Sewon Min | Maximilian Mozes | Xiang Lorraine Li | Isabelle Augenstein | Anna Rogers | Kyunghyun Cho | Edward Grefenstette | Laura Rimell | Chris Dyer
Proceedings of the 7th Workshop on Representation Learning for NLP

pdf
Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Laurent Sartran | Samuel Barrett | Adhiguna Kuncoro | Miloš Stanojević | Phil Blunsom | Chris Dyer
Transactions of the Association for Computational Linguistics, Volume 10

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism—one that is independent of composed syntactic representations—plays an important role in current successful models of long text.

2021

pdf
Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index
Roma Patel | Marta Garnelo | Ian Gemp | Chris Dyer | Yoram Bachrach
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The input vocabulary and the representations learned are crucial to the performance of neural NLP models. Using the full vocabulary results in less explainable and more memory-intensive models, with the embedding layer often constituting the majority of model parameters. It is thus common to use a smaller vocabulary to lower memory requirements and construct more interpretable models. We propose a vocabulary selection method that views words as members of a team trying to maximize the model’s performance. We apply power indices from cooperative game theory, including the Shapley value and Banzhaf index, that measure the relative importance of individual team members in accomplishing a joint task. We approximately compute these indices to identify the most influential words. Our empirical evaluation examines multiple NLP tasks, including sentence and document classification, question answering and textual entailment. We compare to baselines that select words based on frequency, TF-IDF and regression coefficients under L1 regularization, and show that this game-theoretic vocabulary selection outperforms all baselines on a range of different tasks and datasets.
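
As a rough illustration of the core idea, the sketch below estimates per-word Shapley values by Monte Carlo sampling over permutations; `model_score` is a hypothetical stand-in for training and evaluating a model restricted to a given vocabulary, not the paper's implementation.

```python
import random

def shapley_estimates(words, model_score, num_samples=200):
    """Monte Carlo estimate of each word's Shapley value.

    `model_score(vocab_subset)` is a hypothetical callable returning task
    performance when the model is restricted to `vocab_subset`; it stands in
    for training and evaluating an NLP model on that vocabulary.
    """
    values = {w: 0.0 for w in words}
    for _ in range(num_samples):
        perm = random.sample(words, len(words))   # random ordering of the "team"
        prefix = set()
        prev = model_score(prefix)
        for w in perm:
            prefix.add(w)
            curr = model_score(prefix)
            values[w] += curr - prev              # marginal contribution of w
            prev = curr
    return {w: v / num_samples for w, v in values.items()}

if __name__ == "__main__":
    # Toy scoring function: performance grows with coverage of "useful" words.
    useful = {"not", "good", "bad"}
    score = lambda vocab: len(vocab & useful) / len(useful)
    print(shapley_estimates(["not", "good", "bad", "the", "a"], score))
```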

pdf
Better Chinese Sentence Segmentation with Reinforcement Learning
Srivatsan Srinivasan | Chris Dyer
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Diverse Pretrained Context Encodings Improve Document Translation
Domenic Donato | Lei Yu | Chris Dyer
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We propose a new architecture for adapting a sentence-level sequence-to-sequence transformer by incorporating multiple pre-trained document context signals and assess the impact on translation performance of (1) different pretraining approaches for generating these signals, (2) the quantity of parallel data for which document context is available, and (3) conditioning on source, target, or source and target contexts. Experiments on the NIST Chinese-English, and IWSLT and WMT English-German tasks support four general conclusions: that using pre-trained context representations markedly improves sample efficiency, that adequate parallel data resources are crucial for learning to use document context, that jointly conditioning on multiple context representations outperforms any single representation, and that source context is more valuable for translation performance than target side context. Our best multi-context model consistently outperforms the best existing context-aware transformers.

2020

pdf
Learning Robust and Multilingual Speech Representations
Kazuya Kawakami | Luyu Wang | Chris Dyer | Phil Blunsom | Aaron van den Oord
Findings of the Association for Computational Linguistics: EMNLP 2020

Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating the representations in terms of their ability to improve the performance of speech recognition systems on read English (e.g. Wall Street Journal and LibriSpeech). This evaluation methodology overlooks two important desiderata that speech representations should have: robustness to domain shifts and transferability to other languages. In this paper we learn representations from up to 8000 hours of diverse and noisy speech data and evaluate the representations by looking at their robustness to domain shifts and their ability to improve recognition performance in many languages. We find that our representations confer significant robustness advantages to the resulting recognition systems: we see significant improvements in out-of-domain transfer relative to baseline feature sets and the features likewise provide improvements in 25 phonetically diverse languages.

pdf
Better Document-Level Machine Translation with Bayes’ Rule
Lei Yu | Laurent Sartran | Wojciech Stokowiec | Wang Ling | Lingpeng Kong | Phil Blunsom | Chris Dyer
Transactions of the Association for Computational Linguistics, Volume 8

We show that Bayes’ rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit because parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output document and the “reverse translation probability” of translating the candidate output back into the source language. Our proposed model uses a powerful autoregressive language model as the prior on target language documents, but it assumes that each sentence is translated independently from the target to the source language. Crucially, at test time, when a source document is observed, the document language model prior induces dependencies between the translations of the source sentences in the posterior. The model’s independence assumption not only enables efficient use of available data, but it additionally admits a practical left-to-right beam-search algorithm for carrying out inference. Experiments show that our model benefits from using cross-sentence context in the language model, and it outperforms existing document translation approaches.
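
A minimal sketch of the scoring rule described above, assuming a trained document language model and a sentence-level target-to-source translation model are available as callables; `doc_lm_logprob` and `reverse_tm_logprob` are hypothetical names, not the paper's code.

```python
def posterior_logprob(candidate_doc, source_doc, doc_lm_logprob, reverse_tm_logprob):
    """Score a candidate target document y for source document x via Bayes' rule:
        log p(y | x) = log p(y) + sum_i log p(x_i | y_i) + const
    `doc_lm_logprob(y)` is the document language model prior and
    `reverse_tm_logprob(x_i, y_i)` the sentence-level reverse translation model;
    both are hypothetical stand-ins here."""
    score = doc_lm_logprob(candidate_doc)                       # prior over target documents
    for src_sent, tgt_sent in zip(source_doc, candidate_doc):   # sentences translated independently
        score += reverse_tm_logprob(src_sent, tgt_sent)         # "reverse translation probability"
    return score

# Toy usage with constant stand-ins:
src = ["s1", "s2"]
cand = ["t1", "t2"]
print(posterior_logprob(cand, src, lambda d: -5.0, lambda s, t: -2.0))   # -9.0
```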

pdf
Syntactic Structure Distillation Pretraining for Bidirectional Encoders
Adhiguna Kuncoro | Lingpeng Kong | Daniel Fried | Dani Yogatama | Laura Rimell | Chris Dyer | Phil Blunsom
Transactions of the Association for Computational Linguistics, Volume 8

Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Hence, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they still benefit from more explicit syntactic biases. To answer this question, we introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining, by distilling the syntactically informative predictions of a hierarchical—albeit harder to scale—syntactic language model. Since BERT models masked words in bidirectional context, we propose to distill the approximate marginal distribution over words in context from the syntactic LM. Our approach reduces relative error by 2–21% on a diverse set of structured prediction tasks, although we obtain mixed results on the GLUE benchmark. Our findings demonstrate the benefits of syntactic biases, even for representation learners that exploit large amounts of data, and contribute to a better understanding of where syntactic biases are helpful in benchmarks of natural language understanding.
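
The distillation term can be pictured as a cross-entropy between the teacher's marginal over the masked word and the student's prediction. The snippet below is a sketch of that objective only, with toy numbers and hypothetical inputs, not the paper's training code.

```python
import math

def distillation_loss(teacher_probs, student_logprobs):
    """Token-level distillation term: cross-entropy between the syntactic
    teacher's (approximate) marginal distribution over the masked word and the
    student's predicted distribution. Both arguments map candidate words to
    probabilities / log-probabilities."""
    return -sum(p * student_logprobs[w] for w, p in teacher_probs.items())

teacher = {"dog": 0.7, "cat": 0.2, "car": 0.1}
student = {w: math.log(p) for w, p in {"dog": 0.5, "cat": 0.3, "car": 0.2}.items()}
print(distillation_loss(teacher, student))
```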

pdf
The DeepMind Chinese–English Document Translation System at WMT2020
Lei Yu | Laurent Sartran | Po-Sen Huang | Wojciech Stokowiec | Domenic Donato | Srivatsan Srinivasan | Alek Andreev | Wang Ling | Sona Mokra | Agustin Dal Lago | Yotam Doron | Susannah Young | Phil Blunsom | Chris Dyer
Proceedings of the Fifth Conference on Machine Translation

This paper describes the DeepMind submission to the Chinese→English constrained data track of the WMT2020 Shared Task on News Translation. The submission employs a noisy channel factorization as the backbone of a document translation system. This approach allows the flexible combination of a number of independent component models which are further augmented with back-translation, distillation, fine-tuning with in-domain data, Monte-Carlo Tree Search decoding, and improved uncertainty estimation. In order to address persistent issues with the premature truncation of long sequences, we included specialized length models and sentence segmentation techniques. Our final system provides a 9.9 BLEU point improvement over a baseline Transformer on our test set (newstest 2019).

pdf
Learning to Segment Actions from Observation and Narration
Daniel Fried | Jean-Baptiste Alayrac | Phil Blunsom | Chris Dyer | Stephen Clark | Aida Nematzadeh
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision used in training, and we find that both task structure and narrative language provide large benefits in segmentation quality.

pdf
A Probabilistic Generative Model for Typographical Analysis of Early Modern Printing
Kartik Goyal | Chris Dyer | Christopher Warren | Maxwell G’Sell | Taylor Berg-Kirkpatrick
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a deep and interpretable probabilistic generative model to analyze glyph shapes in printed Early Modern documents. We focus on clustering extracted glyph images into underlying templates in the presence of multiple confounding sources of variance. Our approach introduces a neural editor model that first generates well-understood printing phenomena like spatial perturbations from template parameters via interpretable latent variables, and then modifies the result by generating a non-interpretable latent vector responsible for inking variations, jitter, noise from the archiving process, and other unforeseen phenomena associated with Early Modern printing. Critically, by introducing an inference network whose input is restricted to the visual residual between the observation and the interpretably-modified template, we are able to control and isolate what the vector-valued latent variable captures. We show that our approach outperforms rigid interpretable clustering baselines (cf. Ocular) and overly-flexible deep generative models (VAE) alike on the task of completely unsupervised discovery of typefaces in mixed-fonts documents.

2019

pdf
Compound Probabilistic Context-Free Grammars for Grammar Induction
Yoon Kim | Chris Dyer | Alexander Rush
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our context-free rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this context-dependent grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods for grammar induction from words with neural language models.
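
The generative story can be pictured as: draw a per-sentence latent vector z, compute rule probabilities conditioned on z, then expand nonterminals top-down. The code below is an illustrative toy with a hypothetical `rule_scorer`; it omits the collapsed variational inference used for learning.

```python
import numpy as np

def sample_sentence(rules, rule_scorer, start="S", rng=np.random.default_rng(0)):
    """Generative story of a compound PCFG, as a sketch: draw a per-sentence
    latent vector z, then expand nonterminals with probabilities
    softmax(rule_scorer(lhs, rhs, z)). `rules` maps each nonterminal to its
    candidate right-hand sides; `rule_scorer` is a hypothetical neural scorer."""
    z = rng.normal(size=16)                       # per-sentence latent variable
    def expand(symbol):
        if symbol not in rules:                   # terminal symbol
            return [symbol]
        cands = rules[symbol]
        scores = np.array([rule_scorer(symbol, rhs, z) for rhs in cands])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        rhs = cands[rng.choice(len(cands), p=probs)]
        return [tok for s in rhs for tok in expand(s)]
    return expand(start)

grammar = {"S": [("NP", "VP")], "NP": [("dogs",), ("cats",)], "VP": [("bark",), ("sleep",)]}
print(sample_sentence(grammar, lambda lhs, rhs, z: float(z[:len(rhs)].sum())))
```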

pdf
Scalable Syntax-Aware Language Models Using Knowledge Distillation
Adhiguna Kuncoro | Chris Dyer | Laura Rimell | Stephen Clark | Phil Blunsom
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of training data. To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from. On targeted syntactic evaluations, we find that, while sequential LSTMs perform much better than previously reported, our proposed technique substantially improves on this baseline, yielding a new state of the art. Our findings and analysis affirm the importance of structural biases, even in models that learn from large amounts of data.

pdf
Learning to Discover, Ground and Use Words with Segmental Neural Language Models
Kazuya Kawakami | Chris Dyer | Phil Blunsom
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation models that treat word segmentation as an isolated task, our model unifies word discovery, learning how words fit together to form sentences, and, by conditioning the model on visual context, how words’ meanings ground in representations of nonlinguistic modalities. Experiments show that the unconditional model learns predictive distributions better than character LSTM models, discovers words competitively with nonparametric Bayesian word segmentation models, and that modeling language conditional on visual context improves performance on both.
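
One way to picture the marginalization is a forward dynamic program over segmentations. The sketch below assumes a hypothetical `segment_logprob` scorer in place of the neural model and ignores the visual-context conditioning.

```python
import math

def logaddexp(a, b):
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def marginal_logprob(chars, segment_logprob, max_len=5):
    """Forward dynamic program over all segmentations of `chars`.

    alpha[j] = log-probability of the prefix chars[:j] summed over all ways of
    splitting it into segments; `segment_logprob(segment, history_end)` is a
    hypothetical stand-in for the neural model's score of generating `segment`
    given the character history up to position `history_end`."""
    n = len(chars)
    alpha = [float("-inf")] * (n + 1)
    alpha[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            seg = chars[i:j]
            alpha[j] = logaddexp(alpha[j], alpha[i] + segment_logprob(seg, i))
    return alpha[n]

# Toy scorer: every segment of length L costs L * log(0.5).
print(marginal_logprob("abcd", lambda seg, i: len(seg) * math.log(0.5)))
```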

pdf
Unsupervised Recurrent Neural Network Grammars
Yoon Kim | Alexander Rush | Lei Yu | Adhiguna Kuncoro | Chris Dyer | Gábor Melis
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.

pdf
An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search
Kartik Goyal | Chris Dyer | Taylor Berg-Kirkpatrick
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of inexact search, globally normalized neural models are still more effective than their locally normalized counterparts. Further, since our training approach is sensitive to warm-starting with pre-trained models, we also propose a novel initialization strategy based on self-normalization for pre-training globally normalized models. We perform analysis of our approach on two tasks: CCG supertagging and Machine Translation, and demonstrate the importance of global normalization under different conditions while using search-aware training.

pdf
Comparing Top-Down and Bottom-Up Neural Generative Dependency Models
Austin Matthews | Graham Neubig | Chris Dyer
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recurrent neural network grammars generate sentences using phrase-structure syntax and perform very well on both parsing and language modeling. To explore whether generative dependency models are similarly effective, we propose two new generative models of dependency syntax. Both models use recurrent neural nets to avoid making explicit independence assumptions, but they differ in the order used to construct the trees: one builds the tree bottom-up and the other top-down, which profoundly changes the estimation problem faced by the learner. We evaluate the two models on three typologically different languages: English, Arabic, and Japanese. While both generative models improve parsing performance over a discriminative baseline, they are significantly less effective than non-syntactic LSTM language models. Surprisingly, little difference between the construction orders is observed for either parsing or language modeling.

pdf
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Po-Sen Huang | Robert Stanforth | Johannes Welbl | Chris Dyer | Dani Yogatama | Sven Gowal | Krishnamurthy Dvijotham | Pushmeet Kohli
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this work, we approach the problem from the opposite direction: to formally verify a system’s robustness against a predefined class of adversarial attacks. We study text classification under synonym replacements or character flip perturbations. We propose modeling these input perturbations as a simplex and then using Interval Bound Propagation – a formal model verification method. We modify the conventional log-likelihood training objective to train models that can be efficiently verified, which would otherwise come with exponential search complexity. The resulting models show only little difference in terms of nominal accuracy, but have much improved verified accuracy under perturbations and come with an efficiently computable formal guarantee on worst case adversaries.
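
The core propagation step is simple to write down. The sketch below pushes a box (interval) perturbation through an affine layer and a ReLU, which is the standard IBP computation rather than the simplex construction the paper uses for discrete text perturbations.

```python
import numpy as np

def affine_interval(W, b, lower, upper):
    """Propagate elementwise interval bounds [lower, upper] on the input
    through an affine layer x -> W @ x + b (standard IBP step)."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius        # worst case over the input box
    return new_center - new_radius, new_center + new_radius

def relu_interval(lower, upper):
    """Monotone activations act on the bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Example: bounds on a 2-unit layer when each input may be perturbed by +/-0.1.
W, b = np.array([[1.0, -2.0], [0.5, 0.3]]), np.zeros(2)
x = np.array([0.2, -0.4])
lo, hi = affine_interval(W, b, x - 0.1, x + 0.1)
lo, hi = relu_interval(lo, hi)
print(lo, hi)
```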

pdf
Text Genre and Training Data Size in Human-like Parsing
John Hale | Adhiguna Kuncoro | Keith Hall | Chris Dyer | Jonathan Brennan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. These measures were recorded as participants listened to a spoken recitation of the same literary text that was supplied as input to the neural parser. Given more training data, the system derives a better cognitive model — but only when the training examples come from the same textual genre. This finding is consistent with the idea that humans adapt syntactic expectations to particular genres during language comprehension (Kaan and Chun, 2018; Branigan and Pickering, 2017).

2018

pdf
The NarrativeQA Reading Comprehension Challenge
Tomáš Kočiský | Jonathan Schwarz | Phil Blunsom | Chris Dyer | Karl Moritz Hermann | Gábor Melis | Edward Grefenstette
Transactions of the Association for Computational Linguistics, Volume 6

Reading comprehension (RC)—in contrast to information retrieval—requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.

pdf
LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better
Adhiguna Kuncoro | Chris Dyer | John Hale | Dani Yogatama | Stephen Clark | Phil Blunsom
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Language exhibits hierarchical structure, but recent work using a subject-verb agreement diagnostic argued that state-of-the-art language models, LSTMs, fail to learn long-range syntax-sensitive dependencies. Using the same diagnostic, we show that, in fact, LSTMs do succeed in learning such dependencies—provided they have enough capacity. We then explore whether models that have access to explicit syntactic information learn agreement more effectively, and how the way in which this structural information is incorporated into the model impacts performance. We find that the mere presence of syntactic information does not improve accuracy, but when model architecture is determined by syntax, number agreement is improved. Further, we find that the choice of how syntactic structure is built affects how well number agreement is learned: top-down construction outperforms left-corner and bottom-up variants in capturing non-local structural dependencies.

pdf
Finding syntax in human encephalography with beam search
John Hale | Chris Dyer | Adhiguna Kuncoro | Jonathan Brennan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recurrent neural network grammars (RNNGs) are generative models of (tree, string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics such as word surprisal and parser action count. When used as regressors against human electrophysiological responses to naturalistic text, they derive two amplitude effects: an early peak and a P600-like later peak. By contrast, a non-syntactic neural language model yields no reliable effects. Model comparisons attribute the early peak to syntactic composition within the RNNG. This pattern of results recommends the RNNG+beam search combination as a mechanistic model of the syntactic processing that occurs during normal human language comprehension.

pdf
Using Morphological Knowledge in Open-Vocabulary Neural Language Models
Austin Matthews | Graham Neubig | Chris Dyer
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Languages with productive morphology pose problems for language models that generate words from a fixed vocabulary. Although character-based models allow any possible word type to be generated, they are linguistically naïve: they must discover that words exist and are delimited by spaces—basic linguistic facts that are built into the structure of word-based models. We introduce an open-vocabulary language model that incorporates more sophisticated linguistic knowledge by predicting words using a mixture of three generative processes: (1) by generating words as a sequence of characters, (2) by directly generating full word forms, and (3) by generating words as a sequence of morphemes that are combined using a hand-written morphological analyzer. Experiments on Finnish, Turkish, and Russian show that our model outperforms character sequence models and other strong baselines on intrinsic and extrinsic measures. Furthermore, we show that our model learns to exploit morphological knowledge encoded in the analyzer, and, as a byproduct, it can perform effective unsupervised morphological disambiguation.
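
The mixture idea can be written compactly. In the sketch below the three component log-probability functions and the gate logits are hypothetical toys standing in for the learned character, word, and morpheme-analyzer generators.

```python
import math

def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def word_logprob(word, mixture_logits, component_logprobs):
    """Open-vocabulary word probability as a mixture of generators, a sketch of
    the idea (not the paper's exact parameterization):
        p(w) = sum_k pi_k * p_k(w),   pi = softmax(mixture_logits)
    `component_logprobs` is a list of hypothetical callables, e.g. a character
    model, a full-form word model, and a morpheme-sequence model."""
    log_pi = [l - _logsumexp(mixture_logits) for l in mixture_logits]
    return _logsumexp([lp + f(word) for lp, f in zip(log_pi, component_logprobs)])

# Toy components: 2^-len(w), a constant 0.1, and a constant 0.05 respectively.
toy = [lambda w: len(w) * math.log(0.5), lambda w: math.log(0.1), lambda w: math.log(0.05)]
print(word_logprob("cats", [0.0, 0.0, 0.0], toy))
```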

pdf
Syntactic Scaffolds for Semantic Structures
Swabha Swayamdipta | Sam Thomson | Kenton Lee | Luke Zettlemoyer | Chris Dyer | Noah A. Smith
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks. Syntactic scaffolds avoid expensive syntactic processing at runtime, only making use of a treebank during training, through a multitask objective. We improve over strong baselines on PropBank semantics, frame semantics, and coreference resolution, achieving competitive performance on all three tasks.

2017

pdf bib
Proceedings of the 2nd Workshop on Representation Learning for NLP
Phil Blunsom | Antoine Bordes | Kyunghyun Cho | Shay Cohen | Chris Dyer | Edward Grefenstette | Karl Moritz Hermann | Laura Rimell | Jason Weston | Scott Yih
Proceedings of the 2nd Workshop on Representation Learning for NLP

pdf
Reference-Aware Language Models
Zichao Yang | Phil Blunsom | Chris Dyer | Wang Ling
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a general class of language models that treat reference as discrete stochastic latent variables. This decision allows for the creation of entity mentions by accessing external databases of referents (required by, e.g., dialogue generation) or past internal state (required to explicitly model coreferentiality). Beyond simple copying, our coreference model can additionally refer to a referent using varied mention forms (e.g., a reference to “Jane” can be realized as “she”), a characteristic feature of reference in natural languages. Experiments on three representative applications show our model variants outperform models based on deterministic attention and standard language modeling baselines.

pdf
What Do Recurrent Neural Network Grammars Learn About Syntax?
Adhiguna Kuncoro | Miguel Ballesteros | Lingpeng Kong | Chris Dyer | Graham Neubig | Noah A. Smith
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model’s latent attention largely agreeing with predictions made by hand-crafted head rules, albeit with some important differences). By training grammars without nonterminal labels, we find that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.

pdf
Neural Machine Translation with Recurrent Attention Modeling
Zichao Yang | Zhiting Hu | Yuntian Deng | Chris Dyer | Alex Smola
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future. We improve upon the attention model of Bahdanau et al. (2014) by explicitly modeling the relationship between previous and subsequent attention levels for each word using one recurrent network per input word. This architecture easily captures informative features, such as fertility and regularities in relative distortion. In experiments, we show our parameterization of attention improves translation quality.

pdf
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
Wang Ling | Dani Yogatama | Chris Dyer | Phil Blunsom
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Solving algebraic word problems requires executing a series of arithmetic operations—a program—to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.

pdf
Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling
Kazuya Kawakami | Chris Dyer | Phil Blunsom
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the “bursty” distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism that learns to reuse previously generated words. To validate our model we construct a new open-vocabulary language modeling corpus (the Multilingual Wikipedia Corpus; MWC) from comparable Wikipedia articles in 7 typologically diverse languages and demonstrate the effectiveness of our model across this range of languages.

pdf
Ontology-Aware Token Embeddings for Prepositional Phrase Attachment
Pradeep Dasigi | Waleed Ammar | Chris Dyer | Eduard Hovy
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase (PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4% absolute points, which amounts to a 34.4% relative reduction in errors.

pdf
Differentiable Scheduled Sampling for Credit Assignment
Kartik Goyal | Chris Dyer | Taylor Berg-Kirkpatrick
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding in sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure, a well-known technique for correcting exposure bias, we introduce a new training objective that is continuous and differentiable everywhere and can provide informative gradients near points where previous decoding decisions change their value. By using a related approximation, we also demonstrate a similar approach to sample-based training. We show that our approach outperforms both standard cross-entropy training and scheduled sampling procedures in two sequence prediction tasks: named entity recognition and machine translation.
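
The continuous relaxation at the heart of the method replaces the hard argmax embedding lookup with a peaked-softmax average of embeddings; a minimal numpy sketch (not the paper's code):

```python
import numpy as np

def soft_argmax_embedding(logits, embedding_matrix, temperature=0.1):
    """Continuous relaxation of greedy decoding: instead of feeding the
    embedding of argmax(logits) to the next step, feed a convex combination
    of embeddings weighted by a peaked softmax. As temperature -> 0 this
    approaches the hard argmax choice but remains differentiable."""
    z = logits / temperature
    z = z - z.max()                       # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return probs @ embedding_matrix       # (vocab,) @ (vocab, dim) -> (dim,)

vocab, dim = 5, 4
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab, dim))
print(soft_argmax_embedding(rng.normal(size=vocab), E))
```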

pdf bib
Greedy Transition-Based Dependency Parsing with Stack LSTMs
Miguel Ballesteros | Chris Dyer | Yoav Goldberg | Noah A. Smith
Computational Linguistics, Volume 43, Issue 2 - June 2017

We introduce a greedy transition-based parser that learns to represent parser states using recurrent neural networks. Our primary innovation that enables us to do this efficiently is a new control structure for sequential neural networks—the stack long short-term memory unit (LSTM). Like the conventional stack data structures used in transition-based parsers, elements can be pushed to or popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous space embedding of the stack contents. Our model captures three facets of the parser’s state: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of transition actions taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their internal structures. In addition, we compare two different word representations: (i) standard word vectors based on look-up tables and (ii) character-based models of words. Although standard word embedding models work well in all languages, the character-based models improve the handling of out-of-vocabulary words, particularly in morphologically rich languages. Finally, we discuss the use of dynamic oracles in training the parser. During training, dynamic oracles alternate between sampling parser states from the training data and from the model as it is being learned, making the model more robust to the kinds of errors that will be made at test time. Training our model with dynamic oracles yields a linear-time greedy parser with very competitive performance.
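
The control structure can be sketched independently of any particular recurrent cell: keep the full history of cell states so that push runs one recurrent step and pop restores the previous summary in constant time. The `toy_cell` below is a hypothetical stand-in (a running sum), not an actual LSTM.

```python
class StackLSTM:
    """Minimal sketch of the stack LSTM control structure (not the paper's code).

    `cell(x, state)` is a hypothetical recurrent cell: given an input and the
    previous state, it returns a new state whose summary embedding is state[0].
    Keeping the full history of states lets pop() restore the embedding of the
    remaining stack contents in O(1)."""
    def __init__(self, cell, initial_state):
        self.cell = cell
        self.states = [initial_state]   # states[i] summarizes the bottom i elements

    def push(self, x):
        self.states.append(self.cell(x, self.states[-1]))

    def pop(self):
        return self.states.pop()

    def embedding(self):
        return self.states[-1][0]       # continuous summary of current stack contents

def toy_cell(x, state):
    # Stand-in for an LSTM step: the "summary" is just a running sum.
    return (state[0] + x, None)

s = StackLSTM(toy_cell, initial_state=(0.0, None))
s.push(1.0)
s.push(2.0)
print(s.embedding())   # 3.0
s.pop()
print(s.embedding())   # 1.0
```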

pdf bib
Should Neural Network Architecture Reflect Linguistic Structure?
Chris Dyer
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

I explore the hypothesis that conventional neural network models (e.g., recurrent neural networks) are incorrectly biased for making linguistically sensible generalizations when learning, and that a better class of models is based on architectures that reflect hierarchical structures for which considerable behavioral evidence exists. I focus on the problem of modeling and representing the meanings of sentences. On the generation front, I introduce recurrent neural network grammars (RNNGs), a joint, generative model of phrase-structure trees and sentences. RNNGs operate via a recursive syntactic process reminiscent of probabilistic context-free grammar generation, but decisions are parameterized using RNNs that condition on the entire (top-down, left-to-right) syntactic derivation history, thus relaxing context-free independence assumptions, while retaining a bias toward explaining decisions via “syntactically local” conditioning contexts. Experiments show that RNNGs obtain better results in generating language than models that don’t exploit linguistic structure. On the representation front, I explore unsupervised learning of syntactic structures based on distant semantic supervision using a reinforcement-learning algorithm. The learner seeks a syntactic structure that provides a compositional architecture that produces a good representation for a downstream semantic task. Although the inferred structures are quite different from traditional syntactic analyses, the performance on the downstream tasks surpasses that of systems that use sequential RNNs and tree-structured RNNs based on treebank dependencies. This is joint work with Adhi Kuncoro, Dani Yogatama, Miguel Ballesteros, Phil Blunsom, Ed Grefenstette, Wang Ling, and Noah A. Smith.

2016

pdf bib
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP
Dipanjan Das | Chris Dyer | Manaal Faruqui | Yulia Tsvetkov
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

pdf
Attention-based Multimodal Neural Machine Translation
Po-Yao Huang | Frederick Liu | Sz-Rung Shiang | Jean Oh | Chris Dyer
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Manaal Faruqui | Yulia Tsvetkov | Pushpendre Rastogi | Chris Dyer
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

pdf
Correlation-based Intrinsic Evaluation of Word Vector Representations
Yulia Tsvetkov | Manaal Faruqui | Chris Dyer
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

pdf
Posterior regularization for Joint Modeling of Multiple Structured Prediction Tasks with Soft Constraints
Kartik Goyal | Chris Dyer
Proceedings of the Workshop on Structured Prediction for NLP

pdf
Mining Parallel Corpora from Sina Weibo and Twitter
Wang Ling | Luís Marujo | Chris Dyer | Alan W. Black | Isabel Trancoso
Computational Linguistics, Volume 42, Issue 2 - June 2016

pdf
Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Yulia Tsvetkov | Manaal Faruqui | Wang Ling | Brian MacWhinney | Chris Dyer
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Synthesizing Compound Words for Machine Translation
Austin Matthews | Eva Schlinger | Alon Lavie | Chris Dyer
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay | Manaal Faruqui | Chris Dyer | Dan Roth
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Many Languages, One Parser
Waleed Ammar | George Mulcaire | Miguel Ballesteros | Chris Dyer | Noah A. Smith
Transactions of the Association for Computational Linguistics, Volume 4

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.

pdf
CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss
Jeffrey Flanigan | Chris Dyer | Noah A. Smith | Jaime Carbonell
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
Recurrent Neural Network Grammars
Chris Dyer | Adhiguna Kuncoro | Miguel Ballesteros | Noah A. Smith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Neural Architectures for Named Entity Recognition
Guillaume Lample | Miguel Ballesteros | Sandeep Subramanian | Kazuya Kawakami | Chris Dyer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Morphological Inflection Generation Using Character Sequence to Sequence Learning
Manaal Faruqui | Yulia Tsvetkov | Graham Neubig | Chris Dyer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Generation from Abstract Meaning Representation using Tree Transducers
Jeffrey Flanigan | Chris Dyer | Noah A. Smith | Jaime Carbonell
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
Trevor Cohn | Cong Duy Vu Hoang | Ekaterina Vymolova | Kaisheng Yao | Chris Dyer | Gholamreza Haffari
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
Yulia Tsvetkov | Sunayana Sitaram | Manaal Faruqui | Guillaume Lample | Patrick Littell | David Mortensen | Alan W Black | Lori Levin | Chris Dyer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Hierarchical Attention Networks for Document Classification
Zichao Yang | Diyi Yang | Chris Dyer | Xiaodong He | Alex Smola | Eduard Hovy
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Semantic Parsing with Semi-Supervised Sequential Autoencoders
Tomáš Kočiský | Gábor Melis | Edward Grefenstette | Chris Dyer | Wang Ling | Phil Blunsom | Karl Moritz Hermann
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Generalizing and Hybridizing Count-based and Neural Language Models
Graham Neubig | Chris Dyer
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings
Akash Bharadwaj | David Mortensen | Chris Dyer | Jaime Carbonell
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser
Adhiguna Kuncoro | Miguel Ballesteros | Lingpeng Kong | Chris Dyer | Noah A. Smith
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Character Sequence Models for Colorful Words
Kazuya Kawakami | Chris Dyer | Bryan Routledge | Noah A. Smith
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Training with Exploration Improves a Greedy Stack LSTM Parser
Miguel Ballesteros | Yoav Goldberg | Chris Dyer | Noah A. Smith
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Transition-Based Dependency Parsing with Heuristic Backtracking
Jacob Buckman | Miguel Ballesteros | Chris Dyer
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

bib
Practical Neural Networks for NLP: From Theory to Code
Chris Dyer | Yoav Goldberg | Graham Neubig
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

This tutorial aims to bring NLP researchers up to speed with the current techniques in deep learning and neural networks, and show them how they can turn their ideas into practical implementations. We will start with simple classification models (logistic regression and multilayer perceptrons) and cover more advanced patterns that come up in NLP such as recurrent networks for sequence tagging and prediction problems, structured networks (e.g., compositional architectures based on syntax trees), structured output spaces (sequences and trees), attention for sequence-to-sequence transduction, and feature induction for complex algorithm states. A particular emphasis will be on learning to represent complex objects as recursive compositions of simpler objects. This representation will characterize standard objects in NLP, such as the composition of characters and morphemes into words, and words into sentences and documents. In addition, new opportunities such as learning to embed "algorithm states" such as those used in transition-based parsing and other sequential structured prediction models (for which effective features may be difficult to engineer by hand) will be covered.

Everything in the tutorial will be grounded in code — we will show how to program seemingly complex neural-net models using toolkits based on the computation-graph formalism. Computation graphs decompose complex computations into a DAG, with nodes representing inputs, target outputs, parameters, or (sub)differentiable functions (e.g., "tanh", "matrix multiply", and "softmax"), and edges representing data dependencies. These graphs can be run "forward" to make predictions and compute errors (e.g., log loss, squared error) and then "backward" to compute derivatives with respect to model parameters. In particular we'll cover the Python bindings of the CNN library. CNN has been designed from the ground up for NLP applications, dynamically structured NNs, rapid prototyping, and a transparent data and execution model.
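
A toy scalar computation graph makes the forward/backward mechanics concrete; the snippet below is an illustration of the formalism described above, not the CNN/DyNet API.

```python
class Node:
    """A scalar node in a computation graph (illustrative only)."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value, self.parents, self.local_grads = value, parents, local_grads
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

def backward(node, seed=1.0):
    """Reverse-mode sweep: accumulate d(output)/d(node) along graph edges."""
    node.grad += seed
    for parent, local in zip(node.parents, node.local_grads):
        backward(parent, seed * local)

# loss = w * x + b  with w=2, x=3, b=1  ->  d loss / d w = x = 3
w, x, b = Node(2.0), Node(3.0), Node(1.0)
loss = add(mul(w, x), b)
backward(loss)
print(loss.value, w.grad)   # 7.0 3.0
```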

pdf
Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs
Swabha Swayamdipta | Miguel Ballesteros | Chris Dyer | Noah A. Smith
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

pdf
Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Patrick Littell | David R. Mortensen | Kartik Goyal | Chris Dyer | Lori Levin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition – capitalization – is absent, as the language’s Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.

pdf
The Role of Context in Neural Morphological Disambiguation
Qinlan Shen | Daniel Clothiaux | Emily Tagtow | Patrick Littell | Chris Dyer
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Languages with rich morphology often introduce sparsity in language processing tasks. While morphological analyzers can reduce this sparsity by providing morpheme-level analyses for words, they will often introduce ambiguity by returning multiple analyses for the same surface form. The problem of disambiguating between these morphological parses is further complicated by the fact that a correct parse for a word depends not only on the surface form but also on other words in its context. In this paper, we present a language-agnostic approach to morphological disambiguation. We address the problem of using context in morphological disambiguation by presenting several LSTM-based neural architectures that encode long-range surface-level and analysis-level contextual dependencies. We applied our approach to Turkish, Russian, and Arabic to compare effectiveness across languages, matching state-of-the-art results in two of the three languages. Our results also demonstrate that while context plays a role in learning how to disambiguate, the type and amount of context needed varies between languages.

pdf
Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik
Patrick Littell | Kartik Goyal | David R. Mortensen | Alexa Little | Chris Dyer | Lori Levin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of “Linguistic Rapid Response” to potential emergency humanitarian relief situations. In the absence of large annotated corpora, parallel corpora, treebanks, bilingual lexica, etc., we found the following to be effective: exploiting distributional regularities in monolingual data, projecting information across closely related languages, and utilizing human linguist judgments. We show promising results on both a four-month exercise in Sorani and a two-day exercise in Tajik, achieved with minimal annotation costs.

pdf
PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors
David R. Mortensen | Patrick Littell | Akash Bharadwaj | Kartik Goyal | Chris Dyer | Lori Levin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data. In particular, we show that phonological features outperform character-based models. PanPhon is a database relating over 5,000 IPA segments to 21 subsegmental articulatory features. We show that this database boosts performance in various NER-related tasks. Phonologically aware neural CRF models built on PanPhon features are able to perform better on monolingual Spanish and Turkish NER tasks than character-based models. They have also been shown to work well in transfer models (as between Uzbek and Turkish). PanPhon features also contribute measurably to Orthography-to-IPA conversion tasks.
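
The kind of representation the paper advocates can be pictured with a toy feature table; the entries below are for illustration only and are not drawn from the actual PanPhon database or its API.

```python
# A toy articulatory feature table (illustrative, not PanPhon's actual entries):
# each IPA segment maps to a vector of {+1, -1} feature values.
FEATURES = {
    "p": {"voice": -1, "labial": +1, "nasal": -1},
    "b": {"voice": +1, "labial": +1, "nasal": -1},
    "m": {"voice": +1, "labial": +1, "nasal": +1},
}
FEATURE_NAMES = ["voice", "labial", "nasal"]

def feature_vector(segment):
    return [FEATURES[segment][f] for f in FEATURE_NAMES]

def hamming_similarity(a, b):
    """Share of features two segments agree on; this is the kind of signal a
    phonologically aware model can exploit that one-hot characters cannot."""
    va, vb = feature_vector(a), feature_vector(b)
    return sum(x == y for x, y in zip(va, vb)) / len(va)

print(hamming_similarity("p", "b"), hamming_similarity("p", "m"))
```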

2015

pdf
Constraint-Based Models of Lexical Borrowing
Yulia Tsvetkov | Waleed Ammar | Chris Dyer
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models
Sujay Kumar Jauhar | Chris Dyer | Eduard Hovy
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Two/Too Simple Adaptations of Word2Vec for Syntax Problems
Wang Ling | Chris Dyer | Alan W. Black | Isabel Trancoso
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Unsupervised POS Induction with Word Embeddings
Chu-Cheng Lin | Waleed Ammar | Chris Dyer | Lori Levin
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Retrofitting Word Vectors to Semantic Lexicons
Manaal Faruqui | Jesse Dodge | Sujay Kumar Jauhar | Chris Dyer | Eduard Hovy | Noah A. Smith
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
Dan Garrette | Chris Dyer | Jason Baldridge | Noah A. Smith
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf
Transition-Based Dependency Parsing with Stack Long Short-Term Memory
Chris Dyer | Miguel Ballesteros | Wang Ling | Austin Matthews | Noah A. Smith
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Gaussian LDA for Topic Models with Word Embeddings
Rajarshi Das | Manzil Zaheer | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Unifying Bayesian Inference and Vector Space Models for Improved Decipherment
Qing Dou | Ashish Vaswani | Kevin Knight | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Sparse Overcomplete Word Vector Representations
Manaal Faruqui | Yulia Tsvetkov | Dani Yogatama | Chris Dyer | Noah A. Smith
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Lexicon Stratification for Translating Out-of-Vocabulary Words
Yulia Tsvetkov | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Frame-Semantic Role Labeling with Heterogeneous Annotations
Meghana Kshirsagar | Sam Thomson | Nathan Schneider | Jaime Carbonell | Noah A. Smith | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Non-distributional Word Vector Representations
Manaal Faruqui | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Automatic Keyword Extraction on Twitter
Luís Marujo | Wang Ling | Isabel Trancoso | Chris Dyer | Alan W. Black | Anatole Gershman | David Martins de Matos | João Neto | Jaime Carbonell
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs
Miguel Ballesteros | Chris Dyer | Noah A. Smith
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Not All Contexts Are Created Equal: Better Word Representations with Variable Attention
Wang Ling | Yulia Tsvetkov | Silvio Amir | Ramón Fermandez | Chris Dyer | Alan W Black | Isabel Trancoso | Chu-Cheng Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Wang Ling | Chris Dyer | Alan W Black | Isabel Trancoso | Ramón Fermandez | Silvio Amir | Luís Marujo | Tiago Luís
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Evaluation of Word Vector Representations by Subspace Alignment
Yulia Tsvetkov | Manaal Faruqui | Wang Ling | Guillaume Lample | Chris Dyer
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Humor Recognition and Humor Anchor Extraction
Diyi Yang | Alon Lavie | Chris Dyer | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Book Reviews: Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M. Bender
Chris Dyer
Computational Linguistics, Volume 41, Issue 1 - March 2015

2014

pdf
Augmenting English Adjective Senses with Supersenses
Yulia Tsvetkov | Nathan Schneider | Dirk Hovy | Archna Bhatia | Manaal Faruqui | Chris Dyer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification. Results show that accuracy for automatic adjective type classification is high, but synsets are considerably more difficult to classify, even for trained human annotators. We release the manually annotated data, the classifier, and the induced supersense labeling of 12,304 WordNet adjective synsets.

pdf
A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness
Archna Bhatia | Mandy Simons | Lori Levin | Yulia Tsvetkov | Chris Dyer | Jordan Bender
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information, which we call communicative functions, associated with linguistic descriptions such as “a story about my speech”, “the story”, “every time I give it”, “this slideshow”. A survey of the literature suggests that definiteness does not express a single communicative function but is a grammaticalization of many such functions, for example, identifiability, familiarity, uniqueness, specificity. Our annotation scheme unifies ideas from previous research on definiteness while attempting to remove redundancy and make it easily annotatable. This annotation scheme encodes the communicative functions of definiteness rather than the grammatical forms of definiteness. We assume that the communicative functions are largely maintained across languages while the grammaticalization of this information may vary. One of the final goals is to use our semantically annotated corpora to discover how definiteness is grammaticalized in different languages. We release our annotated corpora for English and Hindi, and sample annotations for Hebrew and Russian, together with an annotation manual.

pdf
Dual Subtitles as Parallel Corpora
Shikun Zhang | Wang Ling | Chris Dyer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we leverage the existence of dual subtitles as a source of parallel data. Dual subtitles present viewers with two languages simultaneously and are generally aligned at the segment level, which removes the need to perform this alignment automatically. This is desirable because the extracted parallel data does not contain the alignment errors present in previous work, which aligns separate subtitle files for the same movie. We present a simple heuristic to detect and extract dual subtitles and show that more than 20 million sentence pairs can be extracted for the Mandarin-English language pair. We also show that extracting data from this source can be a viable solution for improving machine translation systems in the domain of subtitles.
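As an illustration only (the paper's actual detection heuristic is not reproduced here), one simple way to spot a dual-subtitle segment and split it into a sentence pair is to test for the presence of both scripts; the character ranges and the one-line-per-language assumption below are mine, not the authors'.

```python
import re

# Hypothetical sketch of dual-subtitle detection for the Mandarin-English pair:
# a segment that contains both CJK and Latin text is treated as dual, and the
# two scripts are split into a sentence pair.
CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")

def split_dual_segment(segment):
    """Return a (Mandarin, English) pair if the segment contains both scripts,
    assuming one line per language; otherwise return None."""
    lines = [l.strip() for l in segment.splitlines() if l.strip()]
    zh = [l for l in lines if CJK.search(l)]
    en = [l for l in lines if LATIN.search(l) and not CJK.search(l)]
    if zh and en:
        return " ".join(zh), " ".join(en)
    return None

print(split_dual_segment("你好，世界\nHello, world"))
# ('你好，世界', 'Hello, world')
```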

pdf
A Dependency Parser for Tweets
Lingpeng Kong | Nathan Schneider | Swabha Swayamdipta | Archna Bhatia | Chris Dyer | Noah A. Smith
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Language Modeling with Power Low Rank Ensembles
Ankur P. Parikh | Avneesh Saluja | Chris Dyer | Eric Xing
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Latent-Variable Synchronous CFGs for Hierarchical Translation
Avneesh Saluja | Chris Dyer | Shay B. Cohen
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Metaphor Detection with Cross-Lingual Model Transfer
Yulia Tsvetkov | Leonid Boytsov | Anatole Gershman | Eric Nyberg | Chris Dyer
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
A Discriminative Graph-Based Parser for the Abstract Meaning Representation
Jeffrey Flanigan | Sam Thomson | Jaime Carbonell | Chris Dyer | Noah A. Smith
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Distributed Representations of Geographically Situated Language
David Bamman | Chris Dyer | Noah A. Smith
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Community Evaluation and Exchange of Word Vectors at wordvectors.org
Manaal Faruqui | Chris Dyer
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf
Simplified Dependency Annotations with GFL-Web
Michael T. Mordowanec | Nathan Schneider | Chris Dyer | Noah A. Smith
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf
Automatic Classification of Communicative Functions of Definiteness
Archna Bhatia | Chu-Cheng Lin | Nathan Schneider | Yulia Tsvetkov | Fatima Talib Al-Raisi | Laleh Roostapour | Jordan Bender | Abhimanu Kumar | Lori Levin | Mandy Simons | Chris Dyer
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf
CMU: Arc-Factored, Discriminative Semantic Dependency Parsing
Sam Thomson | Brendan O’Connor | Jeffrey Flanigan | David Bamman | Jesse Dodge | Swabha Swayamdipta | Nathan Schneider | Chris Dyer | Noah A. Smith
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf
Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut
Nathan Schneider | Emily Danchik | Chris Dyer | Noah A. Smith
Transactions of the Association for Computational Linguistics, Volume 2

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.
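A hedged sketch of the representational idea follows, assuming a simplified tagset (the paper's actual tags and strength distinctions may differ): uppercase tags mark MWE tokens, a lowercase tag marks tokens inside a gap, and decoding skips over the gap.

```python
# Hedged sketch: BIO-style chunking extended so a multiword expression (MWE)
# may contain a gap. "make ... decision" is a gappy MWE; "a big" sits in its gap.
tokens = ["make", "a", "big", "decision"]
tags = ["B", "o", "o", "I"]   # B/I: MWE tokens; o: gap-internal, outside the MWE

def decode_mwes(tokens, tags):
    """Group tokens into MWEs, skipping gap-internal (lowercase) tags."""
    mwes, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                mwes.append(current)
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
    if current:
        mwes.append(current)
    return mwes

print(decode_mwes(tokens, tags))   # [['make', 'decision']]
```

Because the gap is encoded in the tag sequence itself, standard sequence-tagging machinery (e.g., a feature-rich discriminative tagger) can be applied without special-casing discontiguous expressions.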

pdf
Locally Non-Linear Learning for Statistical Machine Translation via Discretization and Structured Regularization
Jonathan H. Clark | Chris Dyer | Alon Lavie
Transactions of the Association for Computational Linguistics, Volume 2

Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. In this work, we introduce a technique for learning arbitrary, rule-local, non-linear feature transforms that improve model expressivity, but do not sacrifice the efficient inference and learning associated with linear models. To demonstrate the value of our technique, we discard the customary log transform of lexical probabilities and drop the phrasal translation probability in favor of raw counts. We observe that our algorithm learns a variation of a log transform that leads to better translation quality compared to the explicit log transform. We conclude that non-linear responses play an important role in SMT, an observation that we hope will inform the efforts of feature engineers.
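The following is a minimal sketch, under my own assumptions about bin boundaries and feature names, of the discretization idea: a real-valued feature is replaced by an indicator for the bin it falls into, so a linear model can give each bin its own weight and thereby approximate a non-linear (for example, log-like) response. The paper's structured regularization, which relates the weights of such derived features, is omitted here.

```python
import bisect

# Hedged sketch of feature discretization for a linear translation model:
# a raw count feature becomes a one-hot indicator over invented bins.
BOUNDARIES = [1, 2, 5, 10, 50, 100, 1000]

def discretize(name, value):
    """Return a one-hot dict of indicator features for the bin containing value."""
    bin_id = bisect.bisect_right(BOUNDARIES, value)
    return {f"{name}_bin{bin_id}": 1.0}

print(discretize("phrase_count", 7))    # {'phrase_count_bin3': 1.0}
print(discretize("phrase_count", 500))  # {'phrase_count_bin6': 1.0}
```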

pdf
Learning from Post-Editing: Online Model Adaptation for Statistical Machine Translation
Michael Denkowski | Chris Dyer | Alon Lavie
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Improving Vector Space Word Representations Using Multilingual Correlation
Manaal Faruqui | Chris Dyer
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation
Yulia Tsvetkov | Florian Metze | Chris Dyer
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Real time adaptive machine translation: cdec and TransCenter
Michael Denkowski | Alon Lavie | Isabel Lacruz | Chris Dyer
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas

cdec Realtime and TransCenter provide an end-to-end experimental setup for machine translation post-editing research. Realtime provides a framework for building adaptive MT systems that learn from post-editor feedback, while TransCenter provides a web-based translation interface that connects users to these systems and logs post-editing activity. This combination allows the straightforward deployment of MT systems specifically for post-editing and analysis of translator productivity when working with adaptive systems. Both toolkits are freely available under open source licenses.

pdf
Real Time Adaptive Machine Translation for Post-Editing with cdec and TransCenter
Michael Denkowski | Alon Lavie | Isabel Lacruz | Chris Dyer
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

pdf
Weakly-Supervised Bayesian Learning of a CCG Supertagger
Dan Garrette | Chris Dyer | Jason Baldridge | Noah A. Smith
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf
The CMU Machine Translation Systems at WMT 2014
Austin Matthews | Waleed Ammar | Archna Bhatia | Weston Feely | Greg Hanneman | Eva Schlinger | Swabha Swayamdipta | Yulia Tsvetkov | Alon Lavie | Chris Dyer
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf
Crowdsourcing High-Quality Parallel Data Extraction from Twitter
Wang Ling | Luís Marujo | Chris Dyer | Alan W. Black | Isabel Trancoso
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf
The CMU Submission for the Shared Task on Language Identification in Code-Switched Data
Chu-Cheng Lin | Waleed Ammar | Lori Levin | Chris Dyer
Proceedings of the First Workshop on Computational Approaches to Code Switching

2013

pdf
Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search
Jeffrey Flanigan | Chris Dyer | Jaime Carbonell
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
Olutobi Owoputi | Brendan O’Connor | Chris Dyer | Kevin Gimpel | Nathan Schneider | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
A Simple, Fast, and Effective Reparameterization of IBM Model 2
Chris Dyer | Victor Chahuneau | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Supersense Tagging for Arabic: the MT-in-the-Middle Attack
Nathan Schneider | Behrang Mohit | Chris Dyer | Kemal Oflazer | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Knowledge-Rich Morphological Priors for Bayesian Language Models
Victor Chahuneau | Noah A. Smith | Chris Dyer
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the 2013 NAACL HLT Demonstration Session
Chris Dyer | Derrick Higgins
Proceedings of the 2013 NAACL HLT Demonstration Session

pdf
Paraphrasing 4 Microblog Normalization
Wang Ling | Chris Dyer | Alan W Black | Isabel Trancoso
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
A Systematic Exploration of Diversity in Machine Translation
Kevin Gimpel | Dhruv Batra | Chris Dyer | Gregory Shakhnarovich
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
Translating into Morphologically Rich Languages with Synthetic Phrases
Victor Chahuneau | Eva Schlinger | Noah A. Smith | Chris Dyer
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
Identifying the L1 of non-native writers: the CMU-Haifa system
Yulia Tsvetkov | Naama Twitto | Nathan Schneider | Noam Ordan | Manaal Faruqui | Victor Chahuneau | Shuly Wintner | Chris Dyer
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

pdf
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References
Waleed Ammar | Victor Chahuneau | Michael Denkowski | Greg Hanneman | Wang Ling | Austin Matthews | Kenton Murray | Nicola Segall | Alon Lavie | Chris Dyer
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options
Yulia Tsvetkov | Chris Dyer | Lori Levin | Archna Bhatia
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf
A Framework for (Under)specifying Dependency Syntax without Overloading Annotators
Nathan Schneider | Brendan O’Connor | Naomi Saphra | David Bamman | Manaal Faruqui | Noah A. Smith | Chris Dyer | Jason Baldridge
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Distributions on Minimalist Grammar Derivations
Tim Hunter | Chris Dyer
Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13)

pdf
Microblogs as Parallel Corpora
Wang Ling | Guang Xiang | Chris Dyer | Alan Black | Isabel Trancoso
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
An Information Theoretic Approach to Bilingual Word Clustering
Manaal Faruqui | Chris Dyer
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf
Transliteration by Sequence Labeling with Lattice Encodings and Reranking
Waleed Ammar | Chris Dyer | Noah Smith
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

pdf
A Bayesian Model for Learning SCFGs with Discontiguous Rules
Abby Levenberg | Chris Dyer | Phil Blunsom
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
One System, Many Domains: Open-Domain Statistical Machine Translation via Feature Augmentation
Jonathan Clark | Alon Lavie | Chris Dyer
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

In this paper, we introduce a simple technique for incorporating domain information into a statistical machine translation system that significantly improves translation quality when test data comes from multiple domains. Our approach augments (conjoins) standard translation model and language model features with domain indicator features and requires only minimal modifications to the optimization and decoding procedures. We evaluate our method on two language pairs with varying numbers of domains, and observe significant improvements of up to 1.0 BLEU.
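As a hedged sketch of the feature-augmentation (conjunction) idea, with invented feature names and domain labels: each baseline feature is retained and additionally copied into a domain-conjoined version, so the optimizer can learn shared weights plus per-domain corrections.

```python
# Hedged sketch of domain feature augmentation: every standard translation/
# language model feature is kept as-is and also conjoined with an indicator
# of the current domain. Feature names and domains are illustrative only.
def augment_with_domain(features, domain):
    """Conjoin every feature with the domain indicator, keeping the originals."""
    augmented = dict(features)
    for name, value in features.items():
        augmented[f"{name}__domain={domain}"] = value
    return augmented

features = {"tm_phrase_logprob": -1.2, "lm_logprob": -8.5}
print(augment_with_domain(features, "news"))
# {'tm_phrase_logprob': -1.2, 'lm_logprob': -8.5,
#  'tm_phrase_logprob__domain=news': -1.2, 'lm_logprob__domain=news': -8.5}
```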

pdf
Bayesian Language Modelling of German Compounds
Jan A. Botha | Chris Dyer | Phil Blunsom
Proceedings of COLING 2012

pdf
Learning Semantics and Selectional Preference of Adjective-Noun Pairs
Karl Moritz Hermann | Chris Dyer | Phil Blunsom | Stephen Pulman
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
Patrick Simianer | Stefan Riezler | Chris Dyer
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf
Predicting a Scientific Community’s Response to an Article
Dani Yogatama | Michael Heilman | Brendan O’Connor | Chris Dyer | Bryan R. Routledge | Noah A. Smith
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Unsupervised Word Alignment with Arbitrary Features
Chris Dyer | Jonathan H. Clark | Alon Lavie | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
Jonathan H. Clark | Chris Dyer | Alon Lavie | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
The CMU-ARK German-English Translation System
Chris Dyer | Kevin Gimpel | Jonathan H. Clark | Noah A. Smith
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf
Unsupervised Bilingual POS Tagging with Markov Random Fields
Desai Chen | Chris Dyer | Shay Cohen | Noah Smith
Proceedings of the First workshop on Unsupervised Learning in NLP

2010

pdf
Discriminative Word Alignment with a Function Word Reordering Model
Hendra Setiawan | Chris Dyer | Philip Resnik
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf
The University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation
Vladimir Eidelman | Chris Dyer | Philip Resnik
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf
Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Ann Irvine | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Ziyuan Wang | Jonathan Weese | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
Chris Dyer | Adam Lopez | Juri Ganitkevitch | Jonathan Weese | Ferhan Ture | Phil Blunsom | Hendra Setiawan | Vladimir Eidelman | Philip Resnik
Proceedings of the ACL 2010 System Demonstrations

pdf
Two monolingual parses are better than one (synchronous parse)
Chris Dyer
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf
Context-free reordering, finite-state translation
Chris Dyer | Philip Resnik
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Data-Intensive Text Processing with MapReduce
Jimmy Lin | Chris Dyer
NAACL HLT 2010 Tutorial Abstracts

2009

pdf
Using a maximum entropy model to build segmentation lattices for MT
Chris Dyer
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Data Intensive Text Processing with MapReduce
Jimmy Lin | Chris Dyer
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

pdf
Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices
Shankar Kumar | Wolfgang Macherey | Chris Dyer | Franz Och
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
A Gibbs Sampler for Phrasal Synchronous Grammar Induction
Phil Blunsom | Trevor Cohn | Chris Dyer | Miles Osborne
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Sanjeev Khudanpur | Lane Schwartz | Wren N. G. Thornton | Jonathan Weese | Omar F. Zaidan
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

pdf
Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
Zhifei Li | Chris Callison-Burch | Chris Dyer | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Jonathan Weese | Omar Zaidan
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf
The University of Maryland Statistical Machine Translation System for the Fourth Workshop on Machine Translation
Chris Dyer | Hendra Setiawan | Yuval Marton | Philip Resnik
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf
Monte Carlo inference and maximization for phrase-based translation
Abhishek Arun | Chris Dyer | Barry Haddow | Phil Blunsom | Adam Lopez | Philipp Koehn
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

pdf
Fast, Easy, and Cheap: Construction of Statistical Machine Translation Models with MapReduce
Chris Dyer | Aaron Cordova | Alex Mont | Jimmy Lin
Proceedings of the Third Workshop on Statistical Machine Translation

pdf
Generalizing Word Lattice Translation
Christopher Dyer | Smaranda Muresan | Philip Resnik
Proceedings of ACL-08: HLT

2007

pdf
The University of Maryland translation system for IWSLT 2007
Christopher J. Dyer
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the University of Maryland statistical machine translation system used in the IWSLT 2007 evaluation. Our focus was threefold: the use of hierarchical phrase-based models in spoken language translation, the incorporation of sub-lexical information in model estimation via morphological analysis (Arabic) and word and character segmentation (Chinese), and the use of n-gram sequence models for source-side punctuation prediction. Our efforts yield significant improvements in Chinese-English and Arabic-English translation tasks for both spoken language and human transcription conditions.

pdf
The “Noisier Channel”: Translation from Morphologically Complex Languages
Christopher J. Dyer
Proceedings of the Second Workshop on Statistical Machine Translation

pdf
Moses: Open Source Toolkit for Statistical Machine Translation
Philipp Koehn | Hieu Hoang | Alexandra Birch | Chris Callison-Burch | Marcello Federico | Nicola Bertoldi | Brooke Cowan | Wade Shen | Christine Moran | Richard Zens | Chris Dyer | Ondřej Bojar | Alexandra Constantin | Evan Herbst
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
