2021
Generic Oracles for Structured Prediction
Christoph Teichmann, Antoine Venant
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
When learned without exploration, local models for structured prediction tasks are subject to exposure bias and cannot be trained without detailed guidance. Active Imitation Learning (AIL), also known in NLP as Dynamic Oracle Learning, is a general technique for working around these issues by allowing the exploration of different outputs at training time. AIL requires oracle feedback: an oracle is any algorithm which can, given a partial candidate solution and gold annotation, find the correct (minimum loss) next output to produce. This paper describes a general finite state technique for deriving oracles. The technique described is also efficient and will greatly expand the tasks for which AIL can be used.
2020
Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty
Katherine Keith, Christoph Teichmann, Brendan O’Connor, Edgar Meij
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which has been shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors’ annotations and text measurements, we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find that for this application (1) some annotator disagreements about economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword-matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.
2019
Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing
Chunyang Xiao, Christoph Teichmann, Konstantine Arkoudas
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges
While sequence-to-sequence (seq2seq) models achieve state-of-the-art performance in many natural language processing tasks, they can be too slow for real-time applications. One performance bottleneck is predicting the most likely next token over a large vocabulary; methods to circumvent this bottleneck are a current research topic. We focus specifically on using seq2seq models for semantic parsing, where we observe that grammars often exist which specify valid formal representations of utterance semantics. By developing a generic approach for restricting the predictions of a seq2seq model to grammatically permissible continuations, we arrive at a widely applicable technique for speeding up semantic parsing. The technique leads to a 74% speed-up on an in-house dataset with a large vocabulary, compared to the same neural model without grammatical restrictions.
2018
The ACL Anthology: Current State and Future Directions
Daniel Gildea, Min-Yen Kan, Nitin Madnani, Christoph Teichmann, Martín Villalba
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)
The Association for Computational Linguistics’ Anthology is the open source archive and the main source for the scientific literature of computational linguistics and natural language processing. The ACL Anthology is currently maintained exclusively by community volunteers and has to be available and up-to-date at all times. We first discuss the current, open source approach used to achieve this, and then discuss how the planned use of Docker images will improve the Anthology’s long-term stability. This change will make it easier for researchers to utilize Anthology data for experimentation. We believe the ACL community can directly benefit from the extension-friendly architecture of the Anthology. We end by issuing an open challenge, reviewer matching, that we encourage the community to rally towards.
Discovering User Groups for Natural Language Generation
Nikos Engonopoulos, Christoph Teichmann, Alexander Koller
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
We present a model which predicts how individual users of a dialog system understand and produce utterances based on user groups. In contrast to previous work, these user groups are not specified beforehand, but learned in training. We evaluate on two referring expression (RE) generation tasks; our experiments show that our model can identify user groups and learn how to most effectively talk to them, and can dynamically assign unseen users to the correct groups as they interact with the system.
2017
Coarse-To-Fine Parsing for Expressive Grammar Formalisms
Christoph Teichmann, Alexander Koller, Jonas Groschwitz
Proceedings of the 15th International Conference on Parsing Technologies
We generalize coarse-to-fine parsing to grammar formalisms that are more expressive than PCFGs and/or describe languages of trees or graphs. We evaluate our algorithm on PCFG, PTAG, and graph parsing. While we achieve the expected performance gains on PCFGs, coarse-to-fine does not help for PTAG and can even slow down parsing for graphs. We discuss the implications of this finding.
Alto: Rapid Prototyping for Parsing and Translation
Johannes Gontrum, Jonas Groschwitz, Alexander Koller, Christoph Teichmann
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
We present Alto, a rapid prototyping tool for new grammar formalisms. Alto implements generic but efficient algorithms for parsing, translation, and training for a range of monolingual and synchronous grammar formalisms. It can easily be extended to new formalisms, which makes all of these algorithms immediately available for the new formalism.
Generating Contrastive Referring Expressions
Martín Villalba, Christoph Teichmann, Alexander Koller
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The referring expressions (REs) produced by a natural language generation (NLG) system can be misunderstood by the hearer, even when they are semantically correct. In an interactive setting, the NLG system can try to recognize such misunderstandings and correct them. We present an algorithm for generating corrective REs that use contrastive focus (“no, the BLUE button”) to emphasize the information the hearer most likely misunderstood. We show empirically that these contrastive REs are preferred over REs without contrast marking.
2016
Adaptive Importance Sampling from Finite State Automata
Christoph Teichmann, Kasimir Wansing, Alexander Koller
Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata
2015
Graph parsing with s-graph grammars
Jonas Groschwitz, Alexander Koller, Christoph Teichmann
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
A New Implementation for Canonical Text Services
Jochen Tiepmar, Christoph Teichmann, Gerhard Heyer, Monica Berti, Gregory Crane
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)
2011
Reducing the Size of the Representation for the uDOP-Estimate
Christoph Teichmann
Proceedings of the First workshop on Unsupervised Learning in NLP