Daniel Chen

2022

Contrast Sets for Stativity of English Verbs in Context
Daniel Chen | Alexis Palmer
Proceedings of the 29th International Conference on Computational Linguistics

For the task of classifying verbs in context as dynamic or stative, current models approach human performance, but only for particular data sets. To better understand the performance of such models, and how well they are able to generalize beyond particular test sets, we apply the contrast set (Gardner et al., 2020) methodology to stativity classification. We create nearly 300 contrastive pairs by perturbing test set instances just enough to change their labels from one class to the other, while preserving coherence, meaning, and well-formedness. Contrastive evaluation shows that a model with near-human performance on an in-distribution test set degrades substantially when applied to transformed examples, indicating that the stative vs. dynamic classification task is more complex than the model performance might otherwise suggest. Code and data are freely available.

My Case, For an Adposition: Lexical Polysemy of Adpositions and Case Markers in Finnish and Latin
Daniel Chen | Mans Hulden
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Adpositions and case markers contain a high degree of polysemy and participate in unique semantic role configurations. We present a novel application of the SNACS supersense hierarchy to Finnish and Latin data by manually annotating adposition and case marker tokens in Finnish and Latin translations of Chapters IV-V of Le Petit Prince (The Little Prince). We evaluate the computational validity of the semantic role annotation categories by grouping raw, contextualized Multilingual BERT embeddings using k-means clustering.

2021

AutoAspect: Automatic Annotation of Tense and Aspect for Uniform Meaning Representations
Daniel Chen | Martha Palmer | Meagan Vigus
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

We present AutoAspect, a novel, rule-based annotation tool for labeling tense and aspect. The pilot version annotates English data. The aspect labels are designed specifically for Uniform Meaning Representations (UMR), an annotation schema that aims to encode crosslingual semantic information. The annotation tool combines syntactic and semantic cues to assign aspects on a sentence-by-sentence basis, following a sequence of rules that each output a UMR aspect. Identified events proceed through the sequence until they are assigned an aspect. We achieve a recall of 76.17% for identifying UMR events and an accuracy of 62.57% on all identified events, with high precision values for 2 of the aspect labels.

2020

Sequence-to-sequence models have proven to be highly successful in learning morphological inflection from examples, as the series of SIGMORPHON/CoNLL shared tasks has shown. It is usually assumed, however, that a linguist working with inflectional examples could in principle develop a gold standard-level morphological analyzer and generator that would surpass a trained neural network model in accuracy of predictions, but that it may require significant amounts of human labor. In this paper, we discuss an experiment where a group of people with some linguistic training develop 25+ grammars as part of the shared task and weigh the cost/benefit ratio of developing grammars by hand. We also present tools that can help linguists triage difficult, complex morphophonological phenomena within a language and hypothesize inflectional class membership. We conclude that a significant development effort by trained linguists to analyze and model morphophonological patterns is required in order to surpass the accuracy of neural models.

2019

Venues

coling: 1
dmr: 1
law: 1
lrec: 1
naacl: 1
sigmorphon: 1