Donald Metzler


2021

How Reliable are Model Diagnostics?
Vamsi Aribandi | Yi Tay | Donald Metzler
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Are Pretrained Convolutions Better than Pretrained Transformers?
Yi Tay | Mostafa Dehghani | Jai Prakash Gupta | Vamsi Aribandi | Dara Bahri | Zhen Qin | Donald Metzler
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In the era of pre-trained language models, Transformers are the de facto choice of model architecture. While recent research has shown promise for entirely convolutional (CNN) architectures, they have not been explored under the pre-train-fine-tune paradigm. In the context of language models, are convolutional models competitive with Transformers when pre-trained? This paper investigates this research question and presents several interesting findings. Across an extensive set of experiments on 8 datasets/tasks, we find that CNN-based pre-trained models are competitive and outperform their Transformer counterparts in certain scenarios, albeit with caveats. Overall, the findings outlined in this paper suggest that conflating pre-training and architectural advances is misguided and that the two should be considered independently. We believe our research paves the way for a healthy amount of optimism in alternative architectures.
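The comparison above rests on swapping the self-attention sub-layer of a Transformer block for a convolutional one while keeping the pre-train-fine-tune pipeline fixed. The sketch below is a minimal illustration of that idea, assuming a depthwise-separable convolution encoder block in PyTorch; the `ConvEncoderBlock` name and all hyperparameters are illustrative and not taken from the paper.

```python
# Minimal sketch (not the paper's implementation): a Transformer-style encoder
# block whose token-mixing sub-layer is a depthwise-separable convolution
# instead of self-attention. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class ConvEncoderBlock(nn.Module):
    def __init__(self, d_model=512, kernel_size=7, d_ff=2048, dropout=0.1):
        super().__init__()
        # Depthwise convolution mixes information along the sequence axis;
        # the pointwise (1x1) convolution mixes channels.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        residual = x
        h = self.norm1(x).transpose(1, 2)          # Conv1d expects (B, C, L)
        h = self.pointwise(self.depthwise(h)).transpose(1, 2)
        x = residual + self.dropout(h)
        return x + self.dropout(self.ffn(self.norm2(x)))

x = torch.randn(2, 128, 512)                       # toy batch of embeddings
print(ConvEncoderBlock()(x).shape)                 # torch.Size([2, 128, 512])
```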

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
Yikang Shen | Yi Tay | Che Zheng | Dara Bahri | Donald Metzler | Aaron Courville
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

There are two major classes of natural language grammars: dependency grammar, which models one-to-one correspondences between words, and constituency grammar, which models how one or several words assemble into constituents. While previous unsupervised parsing methods mostly focus on inducing only one class of grammar, we introduce a novel model, StructFormer, that can induce dependency and constituency structure at the same time. To achieve this, we propose a new parsing framework that can jointly generate a constituency tree and a dependency graph. We then integrate the induced dependency relations into the Transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism. Experimental results show that our model achieves strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling at the same time.
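Since the abstract's central mechanism is a dependency-constrained self-attention, here is a minimal sketch of how an induced soft dependency distribution could be folded into an attention layer as a differentiable mask. It is an assumption-laden illustration, not StructFormer's actual code: `dep_probs` stands in for whatever head-selection distribution the parser induces, and the mixing weight `gamma` is hypothetical.

```python
# Minimal sketch (assumptions, not StructFormer's implementation): blend a
# soft dependency distribution into standard scaled dot-product attention.
import torch
import torch.nn.functional as F

def dependency_constrained_attention(q, k, v, dep_probs, gamma=0.5):
    """q, k, v: (batch, heads, seq, d_head).
    dep_probs: (batch, seq, seq) soft parent distribution from an induced
    dependency parse (rows sum to 1); gamma is a hypothetical mixing weight."""
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # content-based scores
    attn = F.softmax(scores, dim=-1)                   # (B, H, L, L)
    # Differentiable constraint: interpolate between content attention and
    # the induced dependency distribution, shared across heads.
    constrained = (1 - gamma) * attn + gamma * dep_probs.unsqueeze(1)
    return constrained @ v

B, H, L, D = 2, 4, 10, 16
q = k = v = torch.randn(B, H, L, D)
dep_probs = F.softmax(torch.randn(B, L, L), dim=-1)
print(dependency_constrained_attention(q, k, v, dep_probs).shape)  # (2, 4, 10, 16)
```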

2020

Reverse Engineering Configurations of Neural Text Generation Models
Yi Tay | Dara Bahri | Che Zheng | Clifford Brunk | Donald Metzler | Andrew Tomkins
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent advances in neural text generation modeling have raised a number of societal concerns about how such approaches might be used maliciously. It is therefore desirable to develop a deeper understanding of the fundamental properties of such models. The study of artifacts that emerge in machine-generated text as a result of modeling choices is a nascent research area, and the extent to which these artifacts surface in generated text is still unclear. In the spirit of better understanding generative text models and their artifacts, we propose the new task of distinguishing which of several variants of a given model generated a piece of text. Specifically, we conduct an extensive suite of diagnostic tests to observe whether modeling choices (e.g., sampling methods, top-k probabilities, model architectures) leave detectable artifacts in the text they generate. Our key finding, backed by a rigorous set of experiments, is that such artifacts are present and that different modeling choices can be inferred from generated text alone. This suggests that neural text generators may be more sensitive to various modeling choices than previously thought.
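The task described above, inferring which model variant produced a piece of text, can be framed as ordinary supervised classification over generated samples. Below is a minimal hedged sketch of that framing with scikit-learn using a bag-of-character-n-grams classifier; the toy texts, labels, and feature choice are placeholders rather than the paper's experimental setup.

```python
# Minimal sketch (illustrative, not the paper's setup): treat "which decoding
# configuration generated this text?" as a text classification problem.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus: texts generated under two hypothetical decoding settings.
texts = [
    "the movie was good and the movie was good",   # e.g., small top-k
    "a shimmering quarrel of violet umbrellas",    # e.g., high-temperature sampling
    "the food was good and the food was fine",
    "crimson algorithms hum beneath forgotten tides",
]
labels = ["top_k_small", "temp_high", "top_k_small", "temp_high"]

# Character n-grams can pick up surface artifacts (repetition, rare-word rate)
# that differ across decoding configurations.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["the soup was good and the soup was good"]))  # likely 'top_k_small'
```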

2012

Structured Event Retrieval over Microblog Archives
Donald Metzler | Congxing Cai | Eduard Hovy
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
Donald Metzler | Eduard Hovy | Chunliang Zhang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Contextual Bearing on Linguistic Variation in Social Media
Stephan Gouws | Donald Metzler | Congxing Cai | Eduard Hovy
Proceedings of the Workshop on Language in Social Media (LSM 2011)

Unsupervised Mining of Lexical Variants from Noisy Text
Stephan Gouws | Dirk Hovy | Donald Metzler
Proceedings of the First workshop on Unsupervised Learning in NLP

2009

Search Engine Adaptation by Feedback Control Adjustment for Time-sensitive Query
Ruiqiang Zhang | Yi Chang | Zhaohui Zheng | Donald Metzler | Jian-yun Nie
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers