Nathan Schneider

2021

pdf bib
Mischievous nominal constructions in Universal Dependencies
Nathan Schneider | Amir Zeldes
Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021)

pdf bib abs
Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories
Jakob Prange | Nathan Schneider | Vivek Srikumar
Transactions of the Association for Computational Linguistics, Volume 9

Abstract Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.

pdf bib abs
Classifying Divergences in Cross-lingual AMR Pairs
Shira Wein | Nathan Schneider
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

Translation divergences are varied and widespread, challenging approaches that rely on parallel text. To annotate translation divergences, we propose a schema grounded in the Abstract Meaning Representation (AMR), a sentence-level semantic framework instantiated for a number of languages. By comparing parallel AMR graphs, we can identify specific points of divergence. Each divergence is labeled with both a type and a cause. We release a small corpus of annotated English-Spanish data, and analyze the annotations in our corpus.

pdf bib abs
Subcategorizing Adverbials in Universal Conceptual Cognitive Annotation
Zhuxin Wang | Jakob Prange | Nathan Schneider
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

Universal Conceptual Cognitive Annotation (UCCA) is a semantic annotation scheme that organizes texts into coarse predicate-argument structure, offering broad coverage of semantic phenomena. At the same time, there is still need for a finer-grained treatment of many of the categories. The Adverbial category is of special interest, as it covers a wide range of fundamentally different meanings such as negation, causation, aspect, and event quantification. In this paper we introduce a refinement annotation scheme for UCCA’s Adverbial category, showing that UCCA Adverbials can indeed be subcategorized into at least 7 semantic types, and doing so can help clarify and disambiguate the otherwise coarse-grained labels. We provide a preliminary set of annotation guidelines, as well as pilot annotation experiments with high inter-annotator agreement, confirming the validity of the scheme.

pdf bib abs
A Balanced and Broadly Targeted Computational Linguistics Curriculum
Emma Manning | Nathan Schneider | Amir Zeldes
Proceedings of the Fifth Workshop on Teaching NLP

This paper describes the primarily-graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years. We reflect on the principles behind our curriculum choices, including recognizing the various academic backgrounds and goals of our students; teaching a variety of skills with an emphasis on working directly with data; encouraging collaboration and interdisciplinary work; and including languages beyond English. We reflect on challenges we have encountered, such as the difficulty of teaching programming skills alongside NLP fundamentals, and discuss areas for future growth.

pdf bib abs
Lexical Semantic Recognition
Nelson F. Liu | Daniel Hershcovich | Michael Kranzlein | Nathan Schneider
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification / classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area.

pdf bib abs
BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology
Luke Gessler | Nathan Schneider
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses. Rather than build a WSD system as in previous work, we investigate contextualized embedding neighborhoods directly, formulating a query-by-example nearest neighbor retrieval task and examining ranking performance for words and senses in different frequency bands. In an evaluation on two English sense-annotated corpora, we find that several popular CWE models all outperform a random baseline even for proportionally rare senses, without explicit sense supervision. However, performance varies considerably even among models with similar architectures and pretraining regimes, with especially large differences for rare word senses, revealing that CWE models are not all created equal when it comes to approximating word senses in their native representations.

pdf bib abs
Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets
Michael Kranzlein | Nelson F. Liu | Nathan Schneider
Findings of the Association for Computational Linguistics: EMNLP 2021

For interpreting the behavior of a probabilistic model, it is useful to measure a model’s calibration—the extent to which it produces reliable confidence scores. We address the open problem of calibration for tagging models with sparse tagsets, and recommend strategies to measure and reduce calibration error (CE) in such models. We show that several post-hoc recalibration techniques all reduce calibration error across the marginal distribution for two existing sequence taggers. Moreover, we propose tag frequency grouping (TFG) as a way to measure calibration error in different frequency bands. Further, recalibrating each group separately promotes a more equitable reduction of calibration error across the tag frequency spectrum.

pdf bib
CCG Supertagging as Top-down Tree Generation
Jakob Prange | Nathan Schneider | Vivek Srikumar
Proceedings of the Society for Computation in Linguistics 2021

pdf bib
SNACS Annotation of Case Markers and Adpositions in Hindi
Aryaman Arora | Nitin Venkateswaran | Nathan Schneider
Proceedings of the Society for Computation in Linguistics 2021

pdf bib
Supersense and Sensibility: Proxy Tasks for Semantic Annotation of Prepositions
Luke Gessler | Shira Wein | Nathan Schneider
Proceedings of the Society for Computation in Linguistics 2021

pdf bib abs
Referenceless Parsing-Based Evaluation of AMR-to-English Generation
Emma Manning | Nathan Schneider
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Reference-based automatic evaluation metrics are notoriously limited for NLG due to their inability to fully capture the range of possible outputs. We examine a referenceless alternative: evaluating the adequacy of English sentences generated from Abstract Meaning Representation (AMR) graphs by parsing into AMR and comparing the parse directly to the input. We find that the errors introduced by automatic AMR parsing substantially limit the effectiveness of this approach, but a manual editing study indicates that as parsing improves, parsing-based evaluation has the potential to outperform most reference-based metrics.

pdf bib abs
Probabilistic, Structure-Aware Algorithms for Improved Variety, Accuracy, and Coverage of AMR Alignments
Austin Blodgett | Nathan Schneider
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present algorithms for aligning components of Abstract Meaning Representation (AMR) graphs to spans in English sentences. We leverage unsupervised learning in combination with heuristics, taking the best of both worlds from previous AMR aligners. Our unsupervised models, however, are more sensitive to graph substructures, without requiring a separate syntactic parse. Our approach covers a wider variety of AMR substructures than previously considered, achieves higher coverage of nodes and edges, and does so with higher accuracy. We will release our LEAMR datasets and aligner for use in research on AMR parsing, generation, and evaluation.

pdf bib abs
Putting Words in BERT’s Mouth: Navigating Contextualized Vector Spaces with Pseudowords
Taelin Karidi | Yichu Zhou | Nathan Schneider | Omri Abend | Vivek Srikumar
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses. By inducing a contextualized “pseudoword” vector as a stand-in for a static embedding in the input layer, and then performing masked prediction of a word in the sentence, we are able to investigate the geometry of the BERT-space in a controlled manner around individual instances. Using our method on a set of carefully constructed sentences targeting highly ambiguous English words, we find substantial regularity in the contextualized space, with regions that correspond to distinct word senses; but between these regions there are occasionally “sense voids”—regions that do not correspond to any intelligible sense.

pdf bib
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Alexis Palmer | Nathan Schneider | Natalie Schluter | Guy Emerson | Aurelie Herbelot | Xiaodan Zhu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

2020

pdf bib abs
PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English
Michael Kranzlein | Emma Manning | Siyao Peng | Shira Wein | Aryaman Arora | Nathan Schneider
Proceedings of the 14th Linguistic Annotation Workshop

We present the Prepositions Annotated with Supsersense Tags in Reddit International English (“PASTRIE”) corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish. The annotations are comprehensive, covering all preposition types and tokens in the sample. Along with the corpus, we provide analysis of distributional patterns across the included L1s and a discussion of the influence of L1s on L2 preposition choice.

pdf bib abs
Supersense and Sensibility: Proxy Tasks for Semantic Annotation of Prepositions
Luke Gessler | Shira Wein | Nathan Schneider
Proceedings of the 14th Linguistic Annotation Workshop

Prepositional supersense annotation is time-consuming and requires expert training. Here, we present two sensible methods for obtaining prepositional supersense annotations indirectly by eliciting surface substitution and similarity judgments. Four pilot studies suggest that both methods have potential for producing prepositional supersense annotations that are comparable in quality to expert annotations.

pdf bib abs
Sprucing up Supersenses: Untangling the Semantic Clusters of Accompaniment and Purpose
Jena D. Hwang | Nathan Schneider | Vivek Srikumar
Proceedings of the 14th Linguistic Annotation Workshop

We reevaluate an existing adpositional annotation scheme with respect to two thorny semantic domains: accompaniment and purpose. ‘Accompaniment’ broadly speaking includes two entities situated together or participating in the same event, while ‘purpose’ broadly speaking covers the desired outcome of an action, the intended use or evaluated use of an entity, and more. We argue the policy in the SNACS scheme for English should be recalibrated with respect to these clusters of interrelated meanings without adding complexity to the overall scheme. Our analysis highlights tradeoffs in lumping vs. splitting decisions as well as the flexibility afforded by the construal analysis.

pdf bib abs
Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics
Daniel Hershcovich | Nathan Schneider | Dotan Dvir | Jakob Prange | Miryam de Lhoneux | Omri Abend
Proceedings of the 28th International Conference on Computational Linguistics

Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping between meaning representations from different frameworks using two complementary methods: (i) a rule-based converter, and (ii) a supervised delexicalized parser that parses to one framework using only information from the other as features. We apply these methods to convert the STREUSLE corpus (with syntactic and lexical semantic annotations) to UCCA (a graph-structured full-sentence meaning representation). Both methods yield surprisingly accurate target representations, close to fully supervised UCCA parser quality—indicating that UCCA annotations are partially redundant with STREUSLE annotations. Despite this substantial convergence between frameworks, we find several important areas of divergence.

pdf bib abs
A Human Evaluation of AMR-to-English Generation Systems
Emma Manning | Shira Wein | Nathan Schneider
Proceedings of the 28th International Conference on Computational Linguistics

Most current state-of-the art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation. In this work, we present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types, for several recent AMR generation systems. We discuss the relative quality of these systems and how our results compare to those of automatic metrics, finding that while the metrics are mostly successful in ranking systems overall, collecting human judgments allows for more nuanced comparisons. We also analyze common errors made by these systems.

pdf bib abs
Cross-lingual Semantic Representation for NLP with UCCA
Omri Abend | Dotan Dvir | Daniel Hershcovich | Jakob Prange | Nathan Schneider
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

This is an introductory tutorial to UCCA (Universal Conceptual Cognitive Annotation), a cross-linguistically applicable framework for semantic representation, with corpora annotated in English, German and French, and ongoing annotation in Russian and Hebrew. UCCA builds on extensive typological work and supports rapid annotation. The tutorial will provide a detailed introduction to the UCCA annotation guidelines, design philosophy and the available resources; and a comparison to other meaning representations. It will also survey the existing parsing work, including the findings of three recent shared tasks, in SemEval and CoNLL, that addressed UCCA parsing. Finally, the tutorial will present recent applications and extensions to the scheme, demonstrating its value for natural language processing in a range of languages and domains.

pdf bib abs
K-SNACS: Annotating Korean Adposition Semantics
Jena D. Hwang | Hanwool Choe | Na-Rae Han | Nathan Schneider
Proceedings of the Second International Workshop on Designing Meaning Representations

While many languages use adpositions to encode semantic relationships between content words in a sentence (e.g., agentivity or temporality), the details of how adpositions work vary widely across languages with respect to both form and meaning. In this paper, we empirically adapt the SNACS framework (Schneider et al., 2018) to Korean, a language that is typologically distant from English—the language SNACS was based on. We apply the SNACS framework to annotate the highly popular novellaThe Little Prince with semantic supersense labels over allKorean postpositions. Thus, we introduce the first broad-coverage corpus annotated with Korean postposition semantics and provide a detailed analysis of the corpus with an apples-to-apples comparison between Korean and English annotations

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.

pdf bib abs
(Re)construing Meaning in NLP
Sean Trott | Tiago Timponi Torrent | Nancy Chang | Nathan Schneider
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Human speakers have an extensive toolkit of ways to express themselves. In this paper, we engage with an idea largely absent from discussions of meaning in natural language understanding—namely, that the way something is expressed reflects different ways of conceptualizing or construing the information being conveyed. We first define this phenomenon more precisely, drawing on considerable prior work in theoretical cognitive semantics and psycholinguistics. We then survey some dimensions of construed meaning and show how insights from construal could inform theoretical and practical work in NLP.

pdf bib abs
Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi
Aryaman Arora | Luke Gessler | Nathan Schneider
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted). Previous work has attempted to predict schwa deletion in a rule-based fashion using prosodic or phonetic analysis. We present the first statistical schwa deletion classifier for Hindi, which relies solely on the orthography as the input and outperforms previous approaches. We trained our model on a newly-compiled pronunciation lexicon extracted from various online dictionaries. Our best Hindi model achieves state of the art performance, and also achieves good performance on a closely related language, Punjabi, without modification.

2019

pdf bib abs
An Improved Approach for Semantic Graph Composition with CCG
Austin Blodgett | Nathan Schneider
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

This paper builds on previous work using Combinatory Categorial Grammar (CCG) to derive a transparent syntax-semantics interface for Abstract Meaning Representation (AMR) parsing. We define new semantics for the CCG combinators that is better suited to deriving AMR graphs. In particular, we define relation-wise alternatives for the application and composition combinators: these require that the two constituents being combined overlap in one AMR relation. We also provide a new semantics for type raising, which is necessary for certain constructions. Using these mechanisms, we suggest an analysis of eventive nouns, which present a challenge for deriving AMR graphs. Our theoretical analysis will facilitate future work on robust and transparent AMR parsing using CCG.

Research on adpositions and possessives in multiple languages has led to a small inventory of general-purpose meaning classes that disambiguate tokens. Importantly, that work has argued for a principled separation of the semantic role in a scene from the function coded by morphosyntax. Here, we ask whether this approach can be generalized beyond adpositions and possessives to cover all scene participants—including subjects and objects—directly, without reference to a frame lexicon. We present new guidelines for English and the results of an interannotator agreement study.

pdf bib abs
Semantically Constrained Multilayer Annotation: The Case of Coreference
Jakob Prange | Nathan Schneider | Omri Abend
Proceedings of the First International Workshop on Designing Meaning Representations

We propose a coreference annotation scheme as a layer on top of the Universal Conceptual Cognitive Annotation foundational layer, treating units in predicate-argument structure as a basis for entity and event mentions. We argue that this allows coreference annotators to sidestep some of the challenges faced in other schemes, which do not enforce consistency with predicate-argument structure and vary widely in what kinds of mentions they annotate and how. The proposed approach is examined with a pilot annotation study and compared with annotations from other schemes.

pdf bib abs
Made for Each Other: Broad-Coverage Semantic Structures Meet Preposition Supersenses
Jakob Prange | Nathan Schneider | Omri Abend
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013) is a typologically-informed, broad-coverage semantic annotation scheme that describes coarse-grained predicate-argument structure but currently lacks semantic roles. We argue that lexicon-free annotation of the semantic roles marked by prepositions, as formulated by Schneider et al. (2018), is complementary and suitable for integration within UCCA. We show empirically for English that the schemes, though annotated independently, are compatible and can be combined in a single semantic graph. A comparison of several approaches to parsing the integrated representation lays the groundwork for future research on this task.

2018

Semantic relations are often signaled with prepositional or possessive marking—but extreme polysemy bedevils their analysis and automatic interpretation. We introduce a new annotation scheme, corpus, and task for the disambiguation of prepositions and possessives in English. Unlike previous approaches, our annotations are comprehensive with respect to types and tokens of these markers; use broadly applicable supersense classes rather than fine-grained dictionary definitions; unite prepositions and possessives under the same class inventory; and distinguish between a marker’s lexical contribution and the role it marks in the context of a predicate or scene. Strong interannotator agreement rates, as well as encouraging disambiguation results with established supervised methods, speak to the viability of the scheme and task.

pdf bib abs
Discourse Coherence: Concurrent Explicit and Implicit Relations
Hannah Rohde | Alexander Johnson | Nathan Schneider | Bonnie Webber
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Theories of discourse coherence posit relations between discourse segments as a key feature of coherent text. Our prior work suggests that multiple discourse relations can be simultaneously operative between two segments for reasons not predicted by the literature. Here we test how this joint presence can lead participants to endorse seemingly divergent conjunctions (e.g., BUT and SO) to express the link they see between two segments. These apparent divergences are not symptomatic of participant naivety or bias, but arise reliably from the concurrent availability of multiple relations between segments – some available through explicit signals and some via inference. We believe that these new results can both inform future progress in theoretical work on discourse coherence and lead to higher levels of performance in discourse parsing.

pdf bib abs
Parsing Tweets into Universal Dependencies
Yijia Liu | Yi Zhu | Wanxiang Che | Bing Qin | Nathan Schneider | Noah A. Smith
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We study the problem of analyzing tweets with universal dependencies (UD). We extend the UD guidelines to cover special constructions in tweets that affect tokenization, part-of-speech tagging, and labeled dependencies. Using the extended guidelines, we create a new tweet treebank for English (Tweebank v2) that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong et al. (2014). We characterize the disagreements between our annotators and show that it is challenging to deliver consistent annotation due to ambiguity in understanding and explaining tweets. Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD. To overcome the annotation noise without sacrificing computational efficiency, we propose a new method to distill an ensemble of 20 transition-based parsers into a single one. Our parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and outperforms parsers that are state-of-the-art on other treebanks in both accuracy and speed.

pdf bib abs
A Structured Syntax-Semantics Interface for English-AMR Alignment
Ida Szubert | Adam Lopez | Nathan Schneider
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Abstract Meaning Representation (AMR) annotations are often assumed to closely mirror dependency syntax, but AMR explicitly does not require this, and the assumption has never been tested. To test it, we devise an expressive framework to align AMR graphs to dependency graphs, which we use to annotate 200 AMRs. Our annotation explains how 97% of AMR edges are evoked by words or syntax. Previously existing AMR alignment frameworks did not allow for mapping AMR onto syntax, and as a consequence they explained at most 23%. While we find that there are indeed many cases where AMR annotations closely mirror syntax, there are also pervasive differences. We use our annotations to test a baseline AMR-to-syntax aligner, finding that this task is more difficult than AMR-to-string alignment; and to pinpoint errors in an AMR parser. We make our data and code freely available for further research on AMR parsing and generation, and the relationship of AMR to syntax.

pdf bib
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Agata Savary | Carlos Ramisch | Jena D. Hwang | Nathan Schneider | Melanie Andresen | Sameer Pradhan | Miriam R. L. Petruck
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

pdf bib abs
Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses
Nathan Schneider
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

I will describe an unorthodox approach to lexical semantic annotation that prioritizes corpus coverage, democratizing analysis of a wide range of expression types. I argue that a lexicon-free lexical semantics—defined in terms of units and supersense tags—is an appetizing direction for NLP, as it is robust, cost-effective, easily understood, not too language-specific, and can serve as a foundation for richer semantic structure. Linguistic delicacies from the STREUSLE and DiMSUM corpora, which have been multiword- and supersense-annotated, attest to the veritable smörgåsbord of noncanonical constructions in English, including various flavors of prepositions, MWEs, and other curiosities. Bio: Nathan Schneider is an annotation schemer and computational modeler for natural language. As Assistant Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage semantic analysis: designing linguistic meaning representations, annotating them in corpora, and automating them with statistical natural language processing techniques. A central focus in this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. He has inhabited UC Berkeley (BA in Computer Science and Linguistics), Carnegie Mellon University (Ph.D. in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya and leader of NERT, he continues to play with data and algorithms for linguistic meaning.

pdf bib abs
Annotation of Tense and Aspect Semantics for Sentential AMR
Lucia Donatelli | Michael Regan | William Croft | Nathan Schneider
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Although English grammar encodes a number of semantic contrasts with tense and aspect marking, these semantics are currently ignored by Abstract Meaning Representation (AMR) annotations. This paper extends sentence-level AMR to include a coarse-grained treatment of tense and aspect semantics. The proposed framework augments the representation of finite predications to include a four-way temporal distinction (event time before, up to, at, or after speech time) and several aspectual distinctions (including static vs. dynamic, habitual vs. episodic, and telic vs. atelic). This will enable AMR to be used for NLP tasks and applications that require sophisticated reasoning about time and event structure.

pdf bib abs
Constructing an Annotated Corpus of Verbal MWEs for English
Abigail Walsh | Claire Bonial | Kristina Geeraert | John P. McCrae | Nathan Schneider | Clarissa Somers
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs. The criteria for corpus selection, the categories of MWEs used, and the training process are discussed, along with the particular issues that led to revisions in edition 1.1 of the annotation guidelines. Finally, an overview of the characteristics of the final annotated corpus is presented, as well as some discussion on inter-annotator agreement.

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

pdf bib
Semantic Supersenses for English Possessives
Austin Blodgett | Nathan Schneider
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib abs
The NLTK FrameNet API: Designing for Discoverability with a Rich Linguistic Resource
Nathan Schneider | Chuck Wooters
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

A new Python API, integrated within the NLTK suite, offers access to the FrameNet 1.7 lexical database. The lexicon (structured in terms of frames) as well as annotated sentences can be processed programatically, or browsed with human-readable displays via the interactive Python prompt.

pdf bib abs
Double Trouble: The Problem of Construal in Semantic Annotation of Adpositions
Jena D. Hwang | Archna Bhatia | Na-Rae Han | Tim O’Gorman | Vivek Srikumar | Nathan Schneider
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000 word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that an adposition’s lexical contribution is equivalent to the role/relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition’s lexical function so they can be annotated at scale—supporting automatic, statistical processing of domain-general language—and discuss how this representation would allow for a simpler inventory of labels.

pdf bib
Proceedings of the 11th Linguistic Annotation Workshop
Nathan Schneider | Nianwen Xue
Proceedings of the 11th Linguistic Annotation Workshop

pdf bib
Exploring Substitutability through Discourse Adverbials and Multiple Judgments
Hannah Rohde | Anna Dickinson | Nathan Schneider | Annie Louis | Bonnie Webber
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

2016

pdf bib
Filling in the Blanks in Understanding Discourse Adverbials: Consistency, Conflict, and Context-Dependence in a Crowdsourced Elicitation Task
Hannah Rohde | Anna Dickinson | Nathan Schneider | Christopher N. L. Clark | Annie Louis | Bonnie Webber
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf bib abs
Inconsistency Detection in Semantic Annotation
Nora Hollenstein | Nathan Schneider | Bonnie Webber
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Inconsistencies are part of any manually annotated corpus. Automatically finding these inconsistencies and correcting them (even manually) can increase the quality of the data. Past research has focused mainly on detecting inconsistency in syntactic annotation. This work explores new approaches to detecting inconsistency in semantic annotation. Two ranking methods are presented in this paper: a discrepancy ranking and an entropy ranking. Those methods are then tested and evaluated on multiple corpora annotated with multiword expressions and supersense labels. The results show considerable improvements in detecting inconsistency candidates over a random baseline. Possible applications of methods for inconsistency detection are improving the annotation procedure as well as the guidelines and correcting errors in completed annotations.

pdf bib
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
Nathan Schneider | Dirk Hovy | Anders Johannsen | Marine Carpuat
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
A Hierarchy with, of, and for Preposition Supersenses
Nathan Schneider | Vivek Srikumar | Jena D. Hwang | Martha Palmer
Proceedings of The 9th Linguistic Annotation Workshop

pdf bib
What I’ve learned about annotating informal text (and why you shouldn’t take my word for it)
Nathan Schneider
Proceedings of The 9th Linguistic Annotation Workshop

pdf bib
Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representations on Sequence Labelling Tasks
Lizhen Qu | Gabriela Ferraro | Liyuan Zhou | Weiwei Hou | Nathan Schneider | Timothy Baldwin
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Frame-Semantic Role Labeling with Heterogeneous Annotations
Meghana Kshirsagar | Sam Thomson | Nathan Schneider | Jaime Carbonell | Noah A. Smith | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
A Corpus and Model Integrating Multiword Expressions and Supersenses
Nathan Schneider | Noah A. Smith
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
The Logic of AMR: Practical, Unified, Graph-Based Sentence Semantics for NLP
Nathan Schneider | Jeffrey Flanigan | Tim O’Gorman
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Getting the Roles Right: Using FrameNet in NLP
Collin F. Baker | Nathan Schneider | Miriam R. L. Petruck | Michael Ellsworth
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

2014

pdf bib
Frame-Semantic Parsing
Dipanjan Das | Desai Chen | André F. T. Martins | Nathan Schneider | Noah A. Smith
Computational Linguistics, Volume 40, Issue 1 - March 2014

We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification. Results show that accuracy for automatic adjective type classification is high, but synsets are considerably more difficult to classify, even for trained human annotators. We release the manually annotated data, the classifier, and the induced supersense labeling of 12,304 WordNet adjective synsets.

pdf bib abs
Comprehensive Annotation of Multiword Expressions in a Social Web Corpus
Nathan Schneider | Spencer Onuffer | Nora Kazour | Emily Danchik | Michael T. Mordowanec | Henrietta Conrad | Noah A. Smith
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for multiword expressions, including those containing gaps.

pdf bib
Simplified Dependency Annotations with GFL-Web
Michael T. Mordowanec | Nathan Schneider | Chris Dyer | Noah A. Smith
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib abs
Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut
Nathan Schneider | Emily Danchik | Chris Dyer | Noah A. Smith
Transactions of the Association for Computational Linguistics, Volume 2

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.

2013

pdf bib
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
Olutobi Owoputi | Brendan O’Connor | Chris Dyer | Kevin Gimpel | Nathan Schneider | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Supersense Tagging for Arabic: the MT-in-the-Middle Attack
Nathan Schneider | Behrang Mohit | Chris Dyer | Kemal Oflazer | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Book Review: Design Patterns in Fluid Construction Grammar edited by Luc Steels
Nathan Schneider | Reut Tsarfaty
Computational Linguistics, Volume 39, Issue 2 - June 2013

2012

pdf bib
Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
Nathan Schneider | Behrang Mohit | Kemal Oflazer | Noah A. Smith
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Recall-Oriented Learning of Named Entities in Arabic Wikipedia
Behrang Mohit | Nathan Schneider | Rishav Bhowmick | Kemal Oflazer | Noah A. Smith
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011